arxiv_cs_cv 2026/2/10

SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning

Translated: 2026/3/15 13:02:21

specprune-vla, vision-language-action, model-pruning, inference-acceleration, transformer-efficiency

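Translation (from Japanese)

arXiv:2509.05614v2 Announce Type: replace Abstract: Pruning is an essential technique for accelerating compute-heavy models; it works by eliminating computation on unimportant values. It has recently been applied to speed up inference in Vision-Language-Action (VLA) models as well. However, existing acceleration methods focus only on local information from the current action step and ignore the global context, which in some scenarios lowers the success rate by more than 20% and limits the achievable speedup. This paper notes that VLA tasks exhibit spatial-temporal consistency and proposes the key insight that token selection should combine local information with the model's global context. Building on this, we propose SpecPrune-VLA, a training-free, two-level pruning method with heuristic control. (1) Action-level static pruning: we use global history and local attention to statically reduce the number of visual tokens per action. (2) Layer-level dynamic pruning: we prune tokens adaptively based on layer-wise importance. (3) Lightweight action-aware controller: we classify each action as coarse- or fine-grained according to the speed of the end effector and adjust the pruning aggressiveness accordingly. Extensive experiments show that SpecPrune-VLA achieves up to a 1.57× speedup in LIBERO simulation and a 1.70× speedup on real-world tasks, with only a negligible drop in success rate.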

Original Content

arXiv:2509.05614v2

Announce Type: replace

Abstract: Pruning is a typical acceleration technique for compute-bound models: it removes computation on unimportant values. Recently, it has been applied to accelerate Vision-Language-Action (VLA) model inference. However, existing acceleration methods focus on local information from the current action step and ignore the global context, leading to a success-rate drop of more than 20% and limited speedup in some scenarios. In this paper, we point out the spatial-temporal consistency in VLA tasks, namely that input images in consecutive steps exhibit high similarity, and propose the key insight that token selection should combine local information with the global context of the model. Based on this, we propose SpecPrune-VLA, a training-free, two-level pruning method with heuristic control. (1) Action-level static pruning: we leverage global history and local attention to statically reduce visual tokens per action. (2) Layer-level dynamic pruning: we prune tokens adaptively per layer based on layer-wise importance. (3) Lightweight action-aware controller: we classify actions as coarse- or fine-grained by the speed of the end effector and adjust pruning aggressiveness accordingly. Extensive experiments show that SpecPrune-VLA achieves up to 1.57$\times$ speedup in LIBERO simulation and 1.70$\times$ on real-world tasks, with negligible success rate degradation.
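The three components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the speed threshold, the keep ratios, the linear mixing weight `alpha`, and all score values are hypothetical, chosen only to show how a global score and a local score might be combined for token selection. The abstract also does not say in which direction the controller adjusts aggressiveness; the sketch assumes that fast (coarse) motions tolerate harder pruning.

```python
from typing import List, Sequence


def classify_action(speed: float, threshold: float = 0.05) -> str:
    """Label an action by end-effector speed (the threshold is a made-up value)."""
    return "coarse" if speed >= threshold else "fine"


def keep_ratio(granularity: str) -> float:
    """Assumed policy: keep fewer tokens on coarse motions, more on fine ones."""
    return 0.5 if granularity == "coarse" else 0.75


def static_prune(global_scores: Sequence[float],
                 local_scores: Sequence[float],
                 ratio: float,
                 alpha: float = 0.5) -> List[int]:
    """Action-level step: keep the top `ratio` share of visual tokens,
    ranked by a weighted mix of global-history and local-attention scores."""
    if len(global_scores) != len(local_scores):
        raise ValueError("score lists must have the same length")
    n = len(global_scores)
    k = max(1, int(ratio * n))
    combined = [alpha * g + (1.0 - alpha) * l
                for g, l in zip(global_scores, local_scores)]
    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:k])  # return kept indices in original token order


def dynamic_prune(kept: Sequence[int],
                  layer_scores: Sequence[float],
                  drop: int) -> List[int]:
    """Layer-level step: among the surviving tokens, drop the `drop`
    tokens with the lowest importance at this layer."""
    keep_n = max(1, len(kept) - drop)
    ranked = sorted(kept, key=lambda i: layer_scores[i], reverse=True)
    return sorted(ranked[:keep_n])


# Toy scores for 8 visual tokens (illustrative numbers, not measured data).
GLOBAL = [0.9, 0.1, 0.8, 0.0, 0.3, 0.2, 0.6, 0.1]
LOCAL = [0.7, 0.1, 0.2, 0.0, 0.9, 0.2, 0.2, 0.3]
LAYER = [0.3, 0.0, 0.9, 0.0, 0.1, 0.0, 0.5, 0.0]

ratio = keep_ratio(classify_action(speed=0.20))  # fast motion, treated as coarse
kept = static_prune(GLOBAL, LOCAL, ratio)        # once per action
kept = dynamic_prune(kept, LAYER, drop=1)        # repeated per layer in practice
print(kept)
```

In this sketch the static step runs once per action and the dynamic step would run again at each subsequent layer on the shrinking token set, which mirrors the two-level structure the abstract describes.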