arxiv_cs_ai, April 24, 2026


Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

Translated: 2026/4/24 20:35:07

schrodinger-bridge, visual-navigation, embodied-ai, optimal-transport, diffusion-models


Original Content

arXiv:2604.05673v2 Announce Type: replace-cross Abstract: Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into continuous, long-horizon action trajectories. While generative policies based on diffusion models and Schrödinger Bridges (SB) effectively capture multimodal action distributions, they require dozens of integration steps due to high-variance stochastic transport, which is a critical barrier for real-time robotic control. We propose Rectified Schrödinger Bridge Matching (RSBM), a framework that exploits a shared velocity-field structure between standard Schrödinger Bridges ($\varepsilon=1$, maximum-entropy transport) and deterministic Optimal Transport ($\varepsilon\to 0$, as in Conditional Flow Matching), controlled by a single entropic regularization parameter $\varepsilon$. We prove two key results: (1) the functional form of the conditional velocity field is invariant across the entire $\varepsilon$-spectrum (Velocity Structure Invariance), enabling a single network to serve all regularization strengths; and (2) reducing $\varepsilon$ linearly decreases the conditional velocity variance, enabling more stable coarse-step ODE integration. Anchored to a learned conditional prior that shortens transport distance, RSBM operates at an intermediate $\varepsilon$ that balances multimodal coverage and path straightness. Empirically, while standard bridges require $\geq 10$ steps to converge, RSBM achieves over 94% cosine similarity and a 92% success rate in merely 3 integration steps, without distillation or multi-stage training, substantially narrowing the gap between high-fidelity generative policies and the low-latency demands of Embodied AI.
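The two theoretical claims (an $\varepsilon$-independent velocity form and a velocity variance that is linear in $\varepsilon$) can be illustrated with a minimal NumPy sketch. It assumes the Brownian-bridge conditional path commonly used in Schrödinger-bridge variants of conditional flow matching, $x_t = (1-t)x_0 + t x_1 + \sqrt{\varepsilon t(1-t)}\,z$, whose conditional velocity is $u_t(x \mid x_0, x_1) = \frac{1-2t}{2t(1-t)}(x - \mu_t) + (x_1 - x_0)$. The abstract does not give the paper's exact parameterization, so treat this as an illustration under that assumption. The helpers `bridge_sample`, `conditional_velocity`, `velocity_variance`, and `empirical_velocity_variance` are hypothetical names, and the paired endpoints are toy data.

```python
import numpy as np


def bridge_sample(x0, x1, t, eps, rng):
    """Draw x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Assumed parameterization: x_t = (1 - t) x0 + t x1 + sqrt(eps t (1 - t)) z,
    with z ~ N(0, I). eps = 0 collapses to the straight-line (OT/CFM) path.
    """
    mu_t = (1.0 - t) * x0 + t * x1
    sigma_t = np.sqrt(eps * t * (1.0 - t))
    return mu_t + sigma_t * rng.standard_normal(np.shape(mu_t))


def conditional_velocity(x, x0, x1, t):
    """Conditional velocity u_t(x | x0, x1) of the bridge above, for 0 < t < 1.

    eps is not an argument: for every eps > 0, sigma_t'/sigma_t equals
    (1 - 2t) / (2 t (1 - t)), and at eps = 0 the first term vanishes because
    x lies on the mean path. One functional form covers the whole eps-spectrum.
    """
    mu_t = (1.0 - t) * x0 + t * x1
    return (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t)) * (x - mu_t) + (x1 - x0)


def velocity_variance(eps, t):
    """Closed-form per-dimension variance of u_t given (x0, x1); linear in eps."""
    return eps * (1.0 - 2.0 * t) ** 2 / (4.0 * t * (1.0 - t))


def empirical_velocity_variance(eps, t, n=200_000, seed=0):
    """Monte Carlo estimate of the same variance with toy paired endpoints."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)  # hypothetical source samples
    x1 = x0 + 1.0                # hypothetical paired targets
    x = bridge_sample(x0, x1, t, eps, rng)
    u = conditional_velocity(x, x0, x1, t)
    # Subtract the deterministic drift x1 - x0; what remains is bridge noise.
    return float(np.var(u - (x1 - x0)))


if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.1, 0.0):
        print(eps, velocity_variance(eps, 0.25), empirical_velocity_variance(eps, 0.25))
```

Under this assumed parameterization, the closed form follows from substituting $x - \mu_t = \sigma_t z$: the noise part of $u_t$ is $\frac{1-2t}{2t(1-t)}\sigma_t z$, whose variance $\varepsilon(1-2t)^2 / (4t(1-t))$ scales linearly with $\varepsilon$ and vanishes at $\varepsilon = 0$, where $u_t$ reduces to the constant $x_1 - x_0$.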
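Why path straightness matters for few-step sampling can be seen with a generic fixed-step Euler ODE sampler applied to two toy velocity fields. This is a minimal sketch, not the paper's method. `euler_sample`, `straight_field`, and `rotation_field` are hypothetical helpers. `x_prior` only stands in for the learned conditional prior. The cosine similarity is computed over the flattened trajectory, which is an assumption about how the reported metric is defined. A perfectly straight (constant-velocity) path is integrated exactly in 3 Euler steps, while a curved path accumulates endpoint error that shrinks only as the step count grows.

```python
import numpy as np


def euler_sample(velocity_fn, x_init, num_steps=3):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with uniform Euler steps."""
    x = np.asarray(x_init, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


def cosine_similarity(a, b):
    """Cosine similarity between two trajectories, flattened to vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def straight_field(x_prior, target):
    """Constant velocity: the straight-line path from x_prior to target."""
    return lambda x, t: target - x_prior


def rotation_field(x, t):
    """A curved toy field: one full rotation of a 2-D point over t in [0, 1]."""
    return 2.0 * np.pi * np.array([-x[1], x[0]])


if __name__ == "__main__":
    # Hypothetical trajectory of 4 waypoints in 2-D.
    target = np.array([[0.5, 0.0], [1.0, 0.2], [1.5, 0.5], [2.0, 1.0]])
    x_prior = np.zeros_like(target)
    pred = euler_sample(straight_field(x_prior, target), x_prior, num_steps=3)
    print("straight path, 3 steps, cosine:", cosine_similarity(pred, target))

    start = np.array([1.0, 0.0])  # the exact flow returns here at t = 1
    for n in (3, 10, 100):
        err = np.linalg.norm(euler_sample(rotation_field, start, n) - start)
        print(f"curved path, {n} steps, endpoint error: {err:.3f}")
```

The design choice mirrors the trade-off named in the abstract. The straighter the learned transport, the closer the velocity field is to constant along each path, and the less a coarse Euler step deviates from the true trajectory.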