arxiv_cs_ai April 24, 2026

Replay-buffer engineering for noise-robust quantum circuit optimization

quantum-circuit-optimization, deep-reinforcement-learning, replay-buffer, temporal-difference, quantum-computation

Original Content

arXiv:2604.21863v1 Announce Type: cross

Abstract: Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step, and the routine discard of noiseless trajectories when retraining under hardware noise. We address all three by treating the replay buffer as a primary algorithmic lever for quantum optimization. We introduce ReaPER$+$, an annealed replay rule that transitions from TD-error-driven prioritization early in training to reliability-aware sampling as value estimates mature. It achieves $4$-$32\times$ gains in sample efficiency over fixed PER, ReaPER, and uniform replay while consistently discovering more compact circuits across quantum compilation and QAS benchmarks; validation on LunarLander-v3 confirms that the principle is domain-agnostic. Furthermore, we eliminate the quantum-classical evaluation bottleneck in curriculum RL by introducing OptCRLQAS, which amortizes expensive evaluations over multiple architectural edits, cutting wall-clock time per episode by up to $67.5\%$ on a 12-qubit optimization problem without degrading solution quality. Finally, we introduce a lightweight replay-buffer transfer scheme that warm-starts noisy-setting learning by reusing noiseless trajectories, without network-weight transfer or $\epsilon$-greedy pretraining. This reduces steps to chemical accuracy by up to $85$-$90\%$ and final energy error by up to $90\%$ relative to from-scratch baselines on 6-, 8-, and 12-qubit molecular tasks. Together, these results establish that experience storage, sampling, and transfer are decisive levers for scalable, noise-robust quantum circuit optimization.
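The abstract describes an annealed replay rule but does not give its formula, so the following is only a minimal sketch of what such a rule could look like, not the paper's ReaPER$+$ implementation. It assumes a linear annealing schedule, a per-transition reliability score in $[0, 1]$ supplied by the caller, and a product-style blend; the names `annealed_priorities`, `sample_indices`, `reliabilities`, and `total_steps` are all hypothetical.

```python
import random


def annealed_priorities(td_errors, reliabilities, step, total_steps, eps=1e-6):
    """Blend TD-error priority with a reliability-weighted priority.

    Assumptions (not from the paper): the blend weight `lam` grows linearly
    from 0 to 1 over `total_steps`, and each reliability lies in [0, 1].
    """
    lam = min(max(step / total_steps, 0.0), 1.0)
    priorities = []
    for delta, rel in zip(td_errors, reliabilities):
        # lam = 0: priority follows |TD error| only.
        # lam = 1: priority is |TD error| scaled by the reliability score.
        blend = (1.0 - lam) + lam * rel
        # eps keeps every transition sampleable and the weight sum positive.
        priorities.append(abs(delta) * blend + eps)
    return priorities


def sample_indices(priorities, batch_size, rng):
    """Draw buffer indices with probability proportional to priority."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    td = [0.5, 2.0, 0.1]    # illustrative TD errors
    rel = [0.9, 0.2, 0.7]   # illustrative reliability scores
    prios = annealed_priorities(td, rel, step=50, total_steps=100)
    print(sample_indices(prios, batch_size=4, rng=rng))
```

Under these assumptions, two transitions with equal TD error are sampled equally often at the start of training, while late in training the one with the less reliable target is drawn less often. A real prioritized buffer would typically also apply importance-sampling corrections, which this sketch omits.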