arxiv_cs_ai February 10, 2026

TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control

Translated: 2026/3/7 12:35:16

text-op, humanoid-robotics, motion-generation, real-time-control

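Translation

Recent advances in whole-body motion tracking for humanoids have made it possible to execute diverse, highly coordinated motions on real hardware. However, existing controllers are usually driven either by predefined motion trajectories, which offer little flexibility when the user's intent changes, or by continuous human teleoperation, which requires constant human involvement and limits autonomy. This work addresses the problem of how to drive a universal humanoid controller in a real-time, interactive manner. We propose TextOp, a real-time text-driven humanoid motion generation and control framework that supports streaming language commands and modification of instructions during execution. TextOp uses a two-level architecture: a high-level autoregressive motion diffusion model continuously generates short-horizon kinematic trajectories conditioned on the current text input, and a low-level motion tracking policy executes those trajectories on a physical humanoid robot. By bridging interactive motion generation and robust whole-body control, TextOp allows free-form expression of intent and enables smooth transitions between multiple challenging behaviors, such as dancing and jumping, within a single continuous motion execution. Extensive real-robot experiments and offline evaluations demonstrate instant responsiveness, smooth whole-body motion, and precise control.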

Original Content

arXiv:2602.07439v1
Announce Type: cross

Abstract: Recent advances in humanoid whole-body motion tracking have enabled the execution of diverse and highly coordinated motions on real hardware. However, existing controllers are commonly driven either by predefined motion trajectories, which offer limited flexibility when user intent changes, or by continuous human teleoperation, which requires constant human involvement and limits autonomy. This work addresses the problem of how to drive a universal humanoid controller in a real-time and interactive manner. We present TextOp, a real-time text-driven humanoid motion generation and control framework that supports streaming language commands and on-the-fly instruction modification during execution. TextOp adopts a two-level architecture in which a high-level autoregressive motion diffusion model continuously generates short-horizon kinematic trajectories conditioned on the current text input, while a low-level motion tracking policy executes these trajectories on a physical humanoid robot. By bridging interactive motion generation with robust whole-body control, TextOp unlocks free-form intent expression and enables smooth transitions across multiple challenging behaviors such as dancing and jumping, within a single continuous motion execution. Extensive real-robot experiments and offline evaluations demonstrate instant responsiveness, smooth whole-body motion, and precise control. The project page and the open-source code are available at https://text-op.github.io/
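
The two-level loop described in the abstract can be illustrated with a minimal Python sketch. This is not TextOp's implementation: every name here (`StreamingController`, `toy_generate`, `toy_track`, `run`) is hypothetical, the generator is a simple 1-D interpolator standing in for the autoregressive motion diffusion model, and the tracker is a proportional update standing in for the learned tracking policy. The sketch only shows the control flow, in which a short-horizon chunk is generated from the current text command and the history of earlier frames, and the command may change between chunks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Pose = List[float]  # hypothetical: one kinematic frame (here a single 1-D value)


@dataclass
class StreamingController:
    # Stand-in for the high-level generator: (command, history, horizon) -> frames.
    generate: Callable[[str, Sequence[Pose], int], List[Pose]]
    # Stand-in for the low-level tracking policy: (state, target) -> next state.
    track: Callable[[Pose, Pose], Pose]
    horizon: int = 4  # assumed chunk length, chosen only for illustration
    history: List[Pose] = field(default_factory=list)

    def step_chunk(self, command: str, state: Pose) -> Pose:
        """Generate one short-horizon chunk for the current command and track it."""
        chunk = self.generate(command, self.history, self.horizon)
        for target in chunk:
            state = self.track(state, target)
        # Keep generated frames so the next chunk continues from them (autoregression).
        self.history.extend(chunk)
        return state


def toy_generate(command: str, history: Sequence[Pose], horizon: int) -> List[Pose]:
    """Toy generator: each command maps to a 1-D goal; frames ease toward it."""
    goals = {"stand": 0.0, "wave": 1.0, "jump": 2.0}  # assumed toy vocabulary
    goal = goals.get(command, 0.0)
    last = history[-1][0] if history else 0.0
    frames = []
    for _ in range(horizon):
        last += 0.5 * (goal - last)
        frames.append([last])
    return frames


def toy_track(state: Pose, target: Pose) -> Pose:
    """Toy tracker: move the state a fixed fraction of the way to the target."""
    return [s + 0.8 * (t - s) for s, t in zip(state, target)]


def run(commands: Sequence[str], state: Pose):
    """Feed one command per chunk; the command may change between chunks."""
    ctrl = StreamingController(toy_generate, toy_track)
    log = []
    for cmd in commands:
        state = ctrl.step_chunk(cmd, state)
        log.append((cmd, round(state[0], 3)))
    return log


if __name__ == "__main__":
    for cmd, value in run(["wave", "wave", "jump"], [0.0]):
        print(cmd, value)
```

Because each chunk starts from the last generated frame rather than from scratch, switching the command from "wave" to "jump" in this toy produces a continuous trajectory, which mirrors the kind of smooth mid-execution transition the abstract describes.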