arxiv_cs_ai February 10, 2026

Action-to-Action Flow Matching

Translated: 2026/3/7 12:30:38

action-to-action, flow-matching, diffusion-based-policy

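Translated Summary

>Diffusion-based policies have recently achieved remarkable results in robotics by treating action prediction as a conditional denoising process. In standard practice, however, sampling from random Gaussian noise takes several iterative steps to produce clean actions, and the resulting inference latency is a major bottleneck for real-time control. This paper challenges the need for uninformed noise sampling and proposes Action-to-Action flow matching (A2A), a policy paradigm that initializes generation from the previous action instead of from random noise. Whereas existing methods treat proprioceptive action feedback as a static condition, A2A embeds historical proprioceptive sequences into a high-dimensional latent space and uses that embedding as the starting point for action generation. This design avoids costly iterative denoising while still capturing the robot's physical dynamics and temporal continuity. Extensive experiments show high training efficiency, fast inference, and improved generalization. A2A generates high-quality actions in as few as a single inference step (0.56 ms latency), is more robust to visual perturbations, and generalizes better to unseen configurations. Finally, the authors extend A2A to video generation, which suggests broader applicability to temporal modeling. Project site: https://lorenzo-0-0.github.io/A2A_Flow_Matching.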

Original Content

arXiv:2602.07322v1 Announce Type: cross Abstract: Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. However, the standard practice of sampling from random Gaussian noise often requires multiple iterative steps to produce clean actions, leading to high inference latency that incurs a major bottleneck for real-time control. In this paper, we challenge the necessity of uninformed noise sampling and propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous action. Unlike existing methods that treat proprioceptive action feedback as static conditions, A2A leverages historical proprioceptive sequences, embedding them into a high-dimensional latent space as the starting point for action generation. This design bypasses costly iterative denoising while effectively capturing the robot's physical dynamics and temporal continuity. Extensive experiments demonstrate that A2A exhibits high training efficiency, fast inference speed, and improved generalization. Notably, A2A enables high-quality action generation in as few as a single inference step (0.56 ms latency), and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations. Lastly, we also extend A2A to video generation, demonstrating its broader versatility in temporal modeling. Project site: https://lorenzo-0-0.github.io/A2A_Flow_Matching.
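The core idea in the abstract, starting the flow from the previous action instead of from Gaussian noise, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a straight-line flow-matching path, works directly in a 2-D action space (the abstract describes embedding proprioceptive history into a latent space, which is skipped here), and replaces the learned network with a hypothetical `oracle` velocity field. All names and numeric values are illustrative.

```python
import math
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field on a straight-line path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, steps=1):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical consecutive actions on a smooth trajectory.
prev_action = [0.50, 0.20]
next_action = [0.55, 0.24]

# Uninformed starting point: one draw from a standard Gaussian.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in prev_action]

# Stand-in for a trained network: the exact straight-line velocity
# from the previous action to the next one (an assumption, not a model).
def oracle(x, t):
    return target_velocity(prev_action, next_action)

# A single Euler step from the previous action lands on the next action.
one_step = euler_sample(prev_action, oracle, steps=1)

print("distance to target from previous action:", l2(prev_action, next_action))
print("distance to target from Gaussian noise: ", l2(noise, next_action))
print("one-step error from previous action:    ", l2(one_step, next_action))
```

In this toy setting the previous action already sits close to the target, so the flow has only a short distance to cover. That is one intuition for why an informed starting point might need fewer integration steps than a noise sample, though the sketch says nothing about how the actual method performs.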