arxiv_cs_lg February 10, 2026

The Laplacian Keyboard: Beyond the Linear Span

reinforcement-learning, laplacian-eigenvectors, meta-policy, zero-shot-learning, sample-efficiency

Original Content

arXiv:2602.07730v1 Announce Type: new

Abstract: Across scientific disciplines, Laplacian eigenvectors serve as a fundamental basis for simplifying complex systems, from signal processing to quantum mechanics. In reinforcement learning (RL), these eigenvectors provide a natural basis for approximating reward functions; however, their use is typically limited to their linear span, which restricts expressivity in complex environments. We introduce the Laplacian Keyboard (LK), a hierarchical framework that goes beyond the linear span. LK constructs a task-agnostic library of options from these eigenvectors, forming a behavior basis guaranteed to contain the optimal policy for any reward within the linear span. A meta-policy learns to stitch these options dynamically, enabling efficient learning of policies outside the original linear constraints. We establish theoretical bounds on zero-shot approximation error and demonstrate empirically that LK surpasses zero-shot solutions while achieving improved sample efficiency compared to standard RL methods.
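The linear-span limitation mentioned in the abstract can be illustrated with a small numerical sketch. This is not the LK method, and nothing in it comes from the paper: the 20-state chain environment, the choice of k = 4 eigenvectors, and the helper names `chain_laplacian` and `project_onto_span` are all assumptions made for illustration. The sketch builds the graph Laplacian of a chain of states, takes its k smoothest eigenvectors as a basis, and compares how well that basis represents a reward inside its span versus a sparse goal reward outside it.

```python
import numpy as np


def chain_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-state path graph (hypothetical toy environment)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0  # edge from state i to state i + 1
    A[idx + 1, idx] = 1.0  # symmetric edge back
    return np.diag(A.sum(axis=1)) - A


def project_onto_span(reward, basis):
    """Orthogonal projection of a reward vector onto the columns of an orthonormal basis."""
    coeffs = basis.T @ reward
    return basis @ coeffs


n_states, k = 20, 4  # assumed sizes, chosen only for the example
L = chain_laplacian(n_states)

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(L)
basis = eigvecs[:, :k]  # the k smoothest Laplacian eigenvectors

# A reward built as a linear combination of the basis lies inside the span.
w = np.array([0.5, -1.0, 2.0, 0.3])
r_in = basis @ w
err_in = np.linalg.norm(r_in - project_onto_span(r_in, basis))

# A sparse goal reward (1 at the last state, 0 elsewhere) lies outside the span.
r_out = np.zeros(n_states)
r_out[-1] = 1.0
err_out = np.linalg.norm(r_out - project_onto_span(r_out, basis))

print(f"residual for in-span reward:     {err_in:.2e}")
print(f"residual for sparse goal reward: {err_out:.2e}")
```

The in-span reward is recovered up to floating-point error, while the sparse reward leaves a large residual. That gap is the kind of expressivity limit the abstract says restricts approaches confined to the linear span.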