arxiv_cs_ai February 10, 2026

Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems

Translated: 2026/2/14 8:13:48

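Japanese Translation (rendered in English)

We study the joint operation and sizing of cooling infrastructure for commercial HVAC systems, using reinforcement learning to minimize life-cycle cost over a 30-year horizon. The cooling system consists of a fixed-capacity electric chiller and a thermal energy storage (TES) unit, which are operated together to meet stochastic hourly cooling demand under time-varying electricity prices. Life-cycle cost covers both capital expenditure and discounted operating cost, including electricity consumption and maintenance. The main challenge is a strong asymmetry in capital costs: adding one unit of chiller capacity is far more expensive than adding an equivalent amount of TES capacity. Finding the right combination of chiller and TES capacities while guaranteeing zero loss of cooling load under optimal operation is therefore a difficult co-design problem. To address it, we formulate chiller operation for a fixed infrastructure configuration as a finite-horizon Markov Decision Process (MDP) whose control action is the chiller part-load ratio (PLR), and solve it with a Deep Q Network (DQN) with a constrained action space. The learned DQN policy minimizes electricity cost over historical traces of cooling demand and electricity prices. The trained policy is evaluated for each candidate chiller-TES capacity configuration; we then keep only the configurations that fully satisfy the cooling demand and minimize life-cycle cost over this feasible set. With this method, the optimal chiller and thermal energy storage capacities are 700 and 1500, respectively.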
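
The life-cycle cost objective described above can be sketched as a formula. The notation is an assumption introduced here for illustration and is not taken from the paper: $C_{\mathrm{ch}}$ and $C_{\mathrm{TES}}$ are the chiller and TES capacities, $c_{\mathrm{ch}}$ and $c_{\mathrm{TES}}$ are per-unit capital costs (assumed linear in capacity), $O_y$ is the operating cost in year $y$ (electricity plus maintenance) under the learned policy, $r$ is the discount rate (assumed to apply at year end), and $\mathcal{F}$ is the set of configurations with zero loss of cooling load.

```latex
\[
\mathrm{LCC}(C_{\mathrm{ch}}, C_{\mathrm{TES}})
  = c_{\mathrm{ch}}\, C_{\mathrm{ch}} + c_{\mathrm{TES}}\, C_{\mathrm{TES}}
  + \sum_{y=1}^{30} \frac{O_y(C_{\mathrm{ch}}, C_{\mathrm{TES}})}{(1+r)^{y}},
\qquad
(C_{\mathrm{ch}}^{*}, C_{\mathrm{TES}}^{*})
  = \arg\min_{(C_{\mathrm{ch}}, C_{\mathrm{TES}}) \in \mathcal{F}}
    \mathrm{LCC}(C_{\mathrm{ch}}, C_{\mathrm{TES}})
\]
```

The capital-cost asymmetry mentioned in the abstract corresponds to $c_{\mathrm{ch}} \gg c_{\mathrm{TES}}$ in this notation.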

Original Content

arXiv:2601.22880v2 Announce Type: replace-cross

Abstract: We study the joint operation and sizing of cooling infrastructure for commercial HVAC systems using reinforcement learning, with the objective of minimizing life-cycle cost over a 30-year horizon. The cooling system consists of a fixed-capacity electric chiller and a thermal energy storage (TES) unit, jointly operated to meet stochastic hourly cooling demands under time-varying electricity prices. The life-cycle cost accounts for both capital expenditure and discounted operating cost, including electricity consumption and maintenance. A key challenge arises from the strong asymmetry in capital costs: increasing chiller capacity by one unit is far more expensive than an equivalent increase in TES capacity. As a result, identifying the right combination of chiller and TES sizes, while ensuring zero loss-of-cooling-load under optimal operation, is a non-trivial co-design problem. To address this, we formulate the chiller operation problem for a fixed infrastructure configuration as a finite-horizon Markov Decision Process (MDP), in which the control action is the chiller part-load ratio (PLR). The MDP is solved using a Deep Q Network (DQN) with a constrained action space. The learned DQN RL policy minimizes electricity cost over historical traces of cooling demand and electricity prices. For each candidate chiller-TES sizing configuration, the trained policy is evaluated. We then restrict attention to configurations that fully satisfy the cooling demand and perform a life-cycle cost minimization over this feasible set to identify the cost-optimal infrastructure design. Using this approach, we determine the optimal chiller and thermal energy storage capacities to be 700 and 1500, respectively.
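
The two-stage procedure in the abstract (evaluate a policy per configuration, then minimize life-cycle cost over the feasible configurations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the lossless TES dynamics, the constant COP, the discrete PLR grid, the cost figures in `CostParams`, the omission of maintenance cost, the treatment of one trace as one year of operation, and the simple `price_threshold_policy` that stands in for the trained DQN policy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

PLR_GRID = (0.0, 0.25, 0.5, 0.75, 1.0)  # assumed discrete part-load ratios
Policy = Callable[[List[float], float], float]  # (feasible PLRs, price) -> PLR


@dataclass(frozen=True)
class CostParams:
    """Illustrative cost assumptions; none of these numbers come from the paper."""
    chiller_capex_per_unit: float = 1000.0
    tes_capex_per_unit: float = 100.0  # cheaper per unit, mirroring the asymmetry
    discount_rate: float = 0.05
    horizon_years: int = 30


def feasible_plr_actions(chiller_cap: float, tes_cap: float, tes_level: float,
                         demand: float, plr_grid: Sequence[float] = PLR_GRID) -> List[float]:
    """Constrained action set: PLRs that meet demand without overfilling the TES."""
    feasible = []
    for plr in plr_grid:
        next_level = tes_level + plr * chiller_cap - demand  # assumed lossless storage
        if 0.0 <= next_level <= tes_cap:
            feasible.append(plr)
    return feasible


def price_threshold_policy(threshold: float) -> Policy:
    """Hypothetical stand-in for a trained DQN: charge when cheap, coast otherwise."""
    def policy(actions: List[float], price: float) -> float:
        return max(actions) if price <= threshold else min(actions)
    return policy


def simulate(policy: Policy, chiller_cap: float, tes_cap: float,
             demand: Sequence[float], price: Sequence[float],
             cop: float = 3.0) -> Tuple[float, float]:
    """Roll the policy over one trace; return (electricity cost, unmet cooling load)."""
    level, cost, unmet = 0.0, 0.0, 0.0
    for d, p in zip(demand, price):
        actions = feasible_plr_actions(chiller_cap, tes_cap, level, d)
        plr = policy(actions, p) if actions else 1.0  # no feasible action: run flat out
        cooling = plr * chiller_cap
        available = level + cooling
        served = min(d, available)
        unmet += d - served
        level = min(tes_cap, available - served)  # any excess cooling is discarded
        cost += p * cooling / cop  # electricity drawn = cooling / COP (assumed constant)
    return cost, unmet


def life_cycle_cost(chiller_cap: float, tes_cap: float, annual_opex: float,
                    params: CostParams) -> float:
    """Capital expenditure plus operating cost discounted over the horizon."""
    capex = (params.chiller_capex_per_unit * chiller_cap
             + params.tes_capex_per_unit * tes_cap)
    discounted = sum(annual_opex / (1.0 + params.discount_rate) ** year
                     for year in range(1, params.horizon_years + 1))
    return capex + discounted


def select_design(candidates: Iterable[Tuple[float, float]], demand: Sequence[float],
                  price: Sequence[float], policy: Policy,
                  params: CostParams) -> Optional[Tuple[float, float, float]]:
    """Return (life-cycle cost, chiller capacity, TES capacity) of the cheapest
    configuration with zero unmet load, or None if no candidate is feasible."""
    best = None
    for chiller_cap, tes_cap in candidates:
        opex, unmet = simulate(policy, chiller_cap, tes_cap, demand, price)
        if unmet > 0.0:
            continue  # configuration loses cooling load, so it is excluded
        lcc = life_cycle_cost(chiller_cap, tes_cap, opex, params)
        if best is None or lcc < best[0]:
            best = (lcc, chiller_cap, tes_cap)
    return best


# Toy trace: cheap off-peak hours with no demand, then expensive peak demand.
toy_demand = [0, 0, 0, 0, 4, 4]
toy_price = [1, 1, 1, 1, 5, 5]
toy_best = select_design([(2, 4), (4, 0), (2, 8)], toy_demand, toy_price,
                         price_threshold_policy(2.0), CostParams())
print(toy_best)
```

In this toy setting a small chiller paired with enough storage can pre-charge during cheap hours and still meet the peak, which is the kind of trade-off the co-design search explores. A real study would replace the threshold policy with the trained DQN and use historical demand and price traces.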