arxiv_cs_lg February 10, 2026

Cerebellar-Inspired Residual Control for Fault Recovery: From Inference-Time Adaptation to Structural Consolidation

Translated: 2026/3/15 13:05:22

cerebellar-inspired fault-recovery reinforcement-learning inference-time-adaptation robotics

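Translation

arXiv:2602.07227v1 Announce Type: new Abstract: Robot policies deployed in real-world environments can encounter faults after training, in situations where retraining, exploration, or system identification is not practical. We introduce a cerebellar-inspired, inference-time residual control framework that adds online corrective actions to a frozen reinforcement learning policy, enabling fault recovery without changing the base policy parameters. The framework implements core cerebellar principles: fixed feature expansion for high-dimensional pattern separation, parallel microzone-style residual pathways, and local error-driven plasticity with excitatory and inhibitory eligibility traces that operate at different time scales. These mechanisms allow fast, localized correction of post-training disturbances while avoiding destabilizing global policy updates. A conservative, performance-driven meta-adaptation regulates residual authority and plasticity, preserving nominal behavior and suppressing unnecessary intervention. Experiments on MuJoCo benchmarks with actuator, dynamics, and environmental perturbations show improvements of up to +66% on \texttt{HalfCheetah-v5} and +53% on \texttt{Humanoid-v5} under moderate faults, graceful degradation under severe shifts, and complementary robustness from consolidating persistent residual corrections into the policy parameters.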

Original Content

arXiv:2602.07227v1 Announce Type: new Abstract: Robotic policies deployed in real-world environments often encounter post-training faults, where retraining, exploration, or system identification are impractical. We introduce an inference-time, cerebellar-inspired residual control framework that augments a frozen reinforcement learning policy with online corrective actions, enabling fault recovery without modifying base policy parameters. The framework instantiates core cerebellar principles, including high-dimensional pattern separation via fixed feature expansion, parallel microzone-style residual pathways, and local error-driven plasticity with excitatory and inhibitory eligibility traces operating at distinct time scales. These mechanisms enable fast, localized correction under post-training disturbances while avoiding destabilizing global policy updates. A conservative, performance-driven meta-adaptation regulates residual authority and plasticity, preserving nominal behavior and suppressing unnecessary intervention. Experiments on MuJoCo benchmarks under actuator, dynamic, and environmental perturbations show improvements of up to $+66\%$ on \texttt{HalfCheetah-v5} and $+53\%$ on \texttt{Humanoid-v5} under moderate faults, with graceful degradation under severe shifts and complementary robustness from consolidating persistent residual corrections into policy parameters.
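The abstract names its mechanisms but gives no equations, so the following is only a minimal NumPy sketch of how they could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the class name `ResidualController`, the random ReLU expansion, the per-pathway feature masks, the trace decay rates, the form of the error signal passed to `update`, and the threshold rule in `adapt_authority`.

```python
import numpy as np


class ResidualController:
    """Hypothetical online residual corrector for a frozen base policy."""

    def __init__(self, obs_dim, act_dim, n_features=256, n_zones=4, lr=1e-2,
                 fast_decay=0.5, slow_decay=0.95, inhib_gain=0.5,
                 authority=0.2, max_authority=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random expansion (never trained), standing in for pattern separation.
        self.proj = rng.normal(size=(n_features, obs_dim)) / np.sqrt(obs_dim)
        self.bias = rng.uniform(-1.0, 1.0, size=n_features)
        # Each "microzone" pathway reads its own fixed random subset of features.
        self.mask = (rng.random((n_zones, n_features)) < 0.5).astype(float)
        self.w = np.zeros((n_zones, act_dim, n_features))
        # Eligibility traces at two time scales (assumed decay rates).
        self.e_fast = np.zeros(n_features)  # excitatory, short memory
        self.e_slow = np.zeros(n_features)  # inhibitory, long memory
        self.lr, self.inhib_gain = lr, inhib_gain
        self.fast_decay, self.slow_decay = fast_decay, slow_decay
        self.authority, self.max_authority = authority, max_authority

    def features(self, obs):
        return np.maximum(0.0, self.proj @ obs + self.bias)

    def act(self, obs, base_action):
        """Return the base action plus a bounded residual; base policy is untouched."""
        phi = self.features(np.asarray(obs, dtype=float))
        self.e_fast = self.fast_decay * self.e_fast + (1 - self.fast_decay) * phi
        self.e_slow = self.slow_decay * self.e_slow + (1 - self.slow_decay) * phi
        # Average the pathway outputs, then bound the correction by the authority.
        pre = np.einsum("zaf,zf->a", self.w, self.mask * phi) / len(self.w)
        residual = self.authority * np.tanh(pre)
        return np.clip(base_action + residual, -1.0, 1.0)

    def update(self, error):
        """Local rule: only readout weights change; `error` has shape (act_dim,)."""
        trace = self.e_fast - self.inhib_gain * self.e_slow
        # Broadcasts to (n_zones, act_dim, n_features).
        self.w -= self.lr * error[None, :, None] * (self.mask * trace)[:, None, :]

    def adapt_authority(self, recent_return, nominal_return, margin=0.0, step=0.05):
        """Grow authority only when returns are degraded; otherwise shrink it."""
        if recent_return < nominal_return - margin:
            self.authority = min(self.max_authority, self.authority + step)
        else:
            self.authority = max(0.0, self.authority - step)


# Toy usage: a constant actuator offset that the residual has to cancel.
ctrl = ResidualController(obs_dim=4, act_dim=2)
obs, base = np.ones(4), np.zeros(2)
target = base + 0.1  # stand-in for the action that would compensate the fault
for _ in range(200):
    action = ctrl.act(obs, base)
    ctrl.update(action - target)  # assumed error signal: applied minus desired
print(np.abs(ctrl.act(obs, base) - target).max())
```

With zero-initialized readout weights the residual is exactly zero, so the controller reproduces the base action until an error signal arrives. This is one simple way to express the "preserve nominal behavior" property the abstract describes. The consolidation step (folding persistent corrections back into policy parameters) is not sketched here because the abstract does not say how it is done.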