arxiv_cs_ai February 10, 2026


Degradation-Aware Frequency Regulation of a Heterogeneous Battery Fleet via Reinforcement Learning

Translated: 2026/2/14 8:13:14


Original Content

arXiv:2601.22865v2 Announce Type: replace-cross

Abstract: Battery energy storage systems are increasingly deployed as fast-responding resources for grid balancing services such as frequency regulation and for mitigating renewable generation uncertainty. However, repeated charging and discharging induces cycling degradation and reduces battery lifetime. This paper studies the real-time scheduling of a heterogeneous battery fleet that collectively tracks a stochastic balancing signal subject to per-battery ramp-rate and capacity constraints, while minimizing long-term cycling degradation. Cycling degradation is fundamentally path-dependent: it is determined by charge-discharge cycles formed by the state-of-charge (SoC) trajectory and is commonly quantified via rainflow cycle counting. This non-Markovian structure makes it difficult to express degradation as an additive per-time-step cost, complicating classical dynamic programming approaches. We address this challenge by formulating the fleet scheduling problem as a Markov decision process (MDP) with constrained action space and designing a dense proxy reward that provides informative feedback at each time step while remaining aligned with long-term cycle-depth reduction. To scale learning to large state-action spaces induced by fine-grained SoC discretization and asymmetric per-battery constraints, we develop a function-approximation reinforcement learning method using an Extreme Learning Machine (ELM) as a random nonlinear feature map combined with linear temporal-difference learning. We evaluate the proposed approach on a toy Markovian signal model and on a Markovian model trained from real-world regulation signal traces obtained from the University of Delaware, and demonstrate consistent reductions in cycle-depth occurrence and degradation metrics compared to baseline scheduling policies.
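The abstract says that cycling degradation is path-dependent and is commonly quantified with rainflow cycle counting on the SoC trajectory. The sketch below implements a standard three-point rainflow counter in the style of ASTM E1049. It then scores the counted cycles with a power-law cycle-depth stress function `k * depth**alpha`.

Several parts of this sketch are assumptions made for illustration:
- The stress model itself is assumed. It is not necessarily the paper's degradation metric.
- The values of `k` and `alpha` are hypothetical.
- All function names are hypothetical.

```python
def turning_points(series):
    """Reduce a SoC series to its reversal points (local extrema and endpoints)."""
    points = []
    for x in series:
        if points and x == points[-1]:
            continue  # flat step: adds no new reversal
        if len(points) >= 2 and (points[-1] - points[-2]) * (x - points[-1]) > 0:
            points[-1] = x  # still moving in the same direction: extend the segment
        else:
            points.append(x)
    return points


def rainflow(series):
    """Three-point rainflow counting; returns (depth, count) pairs, count 0.5 or 1.0."""
    stack, cycles = [], []
    for point in turning_points(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])  # most recent range
            y = abs(stack[-2] - stack[-3])  # range tested for closure
            if x < y:
                break
            if len(stack) == 3:
                cycles.append((y, 0.5))  # y contains the starting point: half cycle
                stack.pop(0)
            else:
                cycles.append((y, 1.0))  # closed full cycle: drop its two points
                last = stack.pop()
                del stack[-2:]
                stack.append(last)
    # Ranges left on the stack (the residue) count as half cycles.
    cycles.extend((abs(b - a), 0.5) for a, b in zip(stack, stack[1:]))
    return cycles


def cycle_histogram(series):
    """Aggregate rainflow counts by cycle depth."""
    hist = {}
    for depth, count in rainflow(series):
        hist[depth] = hist.get(depth, 0.0) + count
    return hist


def cycle_degradation(soc, k=1e-4, alpha=2.0):
    """Hypothetical power-law stress: sum over cycles of count * k * depth**alpha."""
    return sum(count * k * depth ** alpha for depth, count in rainflow(soc))
```

Consider two SoC paths that move the same total energy (0.8 of capacity):
- one swing from 0.5 to 0.9 and back;
- two swings from 0.5 to 0.7 and back.

With `alpha > 1`, the single deep swing scores higher than the two shallow swings. This is the path dependence that makes a per-step additive cost hard to write down.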
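The fleet must collectively track the balancing signal while each battery respects its own ramp-rate and capacity limits. These limits are what make the MDP's action space constrained and state-dependent. The sketch below enumerates the feasible joint actions for one time step.

It rests on these illustrative assumptions:
- SoC is discretized into integer levels.
- An action is a per-battery change in level, with positive meaning charging.
- Tracking of the signal is exact.
- The function names are hypothetical.

```python
from itertools import product


def battery_actions(soc_level, max_level, ramp):
    """Feasible SoC-level changes for one battery under its ramp and capacity limits."""
    low = max(-ramp, -soc_level)             # cannot discharge below empty
    high = min(ramp, max_level - soc_level)  # cannot charge above full
    return range(low, high + 1)


def fleet_actions(soc_levels, max_levels, ramps, target):
    """All joint actions whose total level change equals the required signal."""
    per_battery = [
        battery_actions(s, m, r) for s, m, r in zip(soc_levels, max_levels, ramps)
    ]
    return [a for a in product(*per_battery) if sum(a) == target]
```

For example, take an empty battery (4 levels, ramp 2) and a full battery (3 levels, ramp 1). A target of +1 allows only `(1, 0)` and `(2, -1)`.

Because the joint set comes from a Cartesian product, its size grows exponentially with the number of batteries and with the resolution of the SoC grid. This is one way to see how large state-action spaces arise.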
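The abstract does not spell out the dense proxy reward. It says only that the reward gives feedback at every step while staying aligned with long-term cycle-depth reduction. The sketch below is one hypothetical construction with that property; it is not the authors' reward.

The idea is to charge each step for the growth in stress of the currently open half-cycle. The rewards along any monotone SoC segment then telescope to minus that half-cycle's stress. The class name, the 0.5 half-cycle weight, and the power-law stress with `k` and `alpha` are all assumptions.

```python
class HalfCycleProxyReward:
    """Hypothetical dense reward: penalize growth of the open half-cycle's stress.

    Along one monotone SoC segment the per-step rewards sum to
    -0.5 * k * depth**alpha, the assumed stress of that half-cycle.
    """

    def __init__(self, soc0, k=1e-4, alpha=2.0):
        self.k, self.alpha = k, alpha
        self.anchor = soc0   # SoC where the current half-cycle started
        self.prev = soc0     # SoC at the previous step
        self.direction = 0   # +1 charging, -1 discharging, 0 not moved yet

    def _stress(self, depth):
        return 0.5 * self.k * depth ** self.alpha

    def step(self, soc):
        """Return the reward for moving from the previous SoC to `soc`."""
        move = soc - self.prev
        step_dir = (move > 0) - (move < 0)
        if step_dir != 0 and self.direction != 0 and step_dir != self.direction:
            self.anchor = self.prev  # reversal: a new half-cycle starts here
        if step_dir != 0:
            self.direction = step_dir
        old_depth = abs(self.prev - self.anchor)
        new_depth = abs(soc - self.anchor)
        self.prev = soc
        return -(self._stress(new_depth) - self._stress(old_depth))
```

This signal tracks half-cycles between consecutive reversals. Unlike rainflow counting, it does not pair nested cycles, so it is only a proxy for the path-dependent metric. It does, however, penalize one deep swing more than several shallow swings with the same throughput.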
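The learning method pairs an Extreme Learning Machine, used as a fixed random nonlinear feature map, with linear temporal-difference learning. The sketch below shows that structure using a plain TD(0) update of the output weights on a generic input vector. That vector could, for example, encode SoC levels, the signal state, and an action.

The following choices are assumptions, not the paper's configuration:
- Gaussian input weights.
- Uniform biases.
- A `tanh` activation.
- The step size and discount.
- The class names.

```python
import math
import random


class ELMFeatures:
    """ELM hidden layer: phi(x) = tanh(W x + b), with W and b drawn once and frozen."""

    def __init__(self, input_dim, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def __call__(self, x):
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(self.W, self.b)
        ]


class LinearTD:
    """TD(0) on fixed ELM features: V(x) = theta . phi(x); only theta is learned."""

    def __init__(self, features, alpha=0.05, gamma=0.95):
        self.phi = features
        self.theta = [0.0] * len(features.b)
        self.alpha, self.gamma = alpha, gamma

    def value(self, x):
        return sum(t * f for t, f in zip(self.theta, self.phi(x)))

    def update(self, x, reward, x_next, done=False):
        """One TD(0) step on the output weights; returns the TD error."""
        feats = self.phi(x)
        v = sum(t * f for t, f in zip(self.theta, feats))
        target = reward if done else reward + self.gamma * self.value(x_next)
        delta = target - v
        self.theta = [t + self.alpha * delta * f for t, f in zip(self.theta, feats)]
        return delta
```

The hidden layer is sampled once and never trained. Each update is therefore an ordinary linear TD step, whatever the raw state encoding. Since every `tanh` feature lies in (-1, 1), a step size of 0.05 with 20 hidden units keeps `alpha * ||phi||^2` at or below 1, so the update does not overshoot.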
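The evaluation uses a Markovian signal model trained from real-world regulation signal traces. One minimal way to fit such a model works as follows:
1. Normalize the signal to [-1, 1].
2. Discretize it into equal-width bins.
3. Count bin-to-bin transitions.
4. Normalize each row of counts.

The bin layout, the smoothing constant, and the function names are assumptions. The University of Delaware traces are not reproduced here, so the check below runs on a synthetic alternating trace.

```python
import random


def discretize(value, n_bins, lo=-1.0, hi=1.0):
    """Map a signal value to one of n_bins equal-width bins over [lo, hi]."""
    clipped = min(max(value, lo), hi)
    index = int((clipped - lo) / (hi - lo) * n_bins)
    return min(index, n_bins - 1)  # value == hi belongs to the last bin


def fit_transition_matrix(trace, n_bins, smoothing=1e-3):
    """Row-normalized transition counts plus a small additive smoothing term."""
    states = [discretize(v, n_bins) for v in trace]
    counts = [[smoothing] * n_bins for _ in range(n_bins)]
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def sample_signal(P, start, steps, rng):
    """Simulate a bin-index path of length steps + 1 from the fitted chain."""
    state, path = start, [start]
    for _ in range(steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path
```

The smoothing term keeps every row a valid probability distribution, even for bins the trace never visits. Without it, an unvisited bin would give a row of zeros and a division by zero.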