arxiv_cs_cv February 10, 2026

ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking

Translated: 2026/3/15 5:02:27

reasoning, multi-object-tracking, benchmark, vision-language-models, arxiv

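Translation (rendered from the Japanese)

arXiv:2505.20381v2 Announce Type: replace Abstract: Referring Multi-Object Tracking (RMOT) aims to track targets specified by language instructions. However, existing RMOT paradigms are designed largely around explicit instructions and fail to generalize to complex instructions that require logical reasoning. To overcome this, we propose Reasoning-based Multi-Object Tracking (ReaMOT), a new task that requires models to identify and track, through logical reasoning, targets that satisfy implicit constraints. To advance the field, we built a comprehensive benchmark called the ReaMOT Challenge. It consists of: (1) a large-scale dataset of 1,156 instructions, categorized into high-level reasoning and low-level perception, covering 423,359 image-language pairs across 869 diverse scenes; and (2) a tailored metric suite for jointly evaluating reasoning accuracy and tracking robustness. In addition, we propose ReaTrack, a training-free framework that combines the reasoning capabilities of a Thinking-variant large vision-language model (LVLM) with the precise temporal modeling of SAM2. Extensive experiments on the ReaMOT Challenge benchmark demonstrate the effectiveness of our ReaTrack framework.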

Original Content

arXiv:2505.20381v2 Announce Type: replace Abstract: Referring Multi-Object Tracking (RMOT) aims to track targets specified by language instructions. However, existing RMOT paradigms are largely designed for explicit instructions and consequently fail to generalize to complex instructions that require logical reasoning. To overcome this, we propose Reasoning-based Multi-Object Tracking (ReaMOT), a novel task that requires models to identify and track targets that satisfy implicit constraints via logical reasoning. To advance this field, we construct the ReaMOT Challenge, a comprehensive benchmark comprising: (1) a large-scale dataset with 1,156 instructions categorized into High-Level Reasoning and Low-Level Perception, covering 423,359 image-language pairs across 869 diverse scenes; and (2) a tailored metric suite designed to jointly evaluate reasoning accuracy and tracking robustness. Furthermore, we propose ReaTrack, a training-free framework that synergizes the reasoning capabilities of a Thinking-variant Large Vision-Language Model (LVLM) with the precise temporal modeling of SAM2. Extensive experiments on the ReaMOT Challenge benchmark demonstrate the effectiveness of our ReaTrack framework.
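
The abstract does not say how ReaTrack connects the two models, so the following is only an illustrative sketch of one way a training-free "reason, then propagate" composition could look. Everything in it is an assumption made for illustration: the function name reason_then_track, the box format, the two callable interfaces, and the choice of a single key frame are hypothetical and are not taken from the paper, from any LVLM, or from the SAM2 API. The toy stand-ins exist only so the sketch runs.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]

# Hypothetical interfaces (assumptions, not real library signatures):
#   Reasoner:   (frame, instruction) -> boxes of the targets the instruction implies
#   Propagator: (frames, start_index, box) -> {frame_index: box} for one target
Reasoner = Callable[[Any, str], List[Box]]
Propagator = Callable[[Sequence[Any], int, Box], Dict[int, Box]]


def reason_then_track(
    frames: Sequence[Any],
    instruction: str,
    reasoner: Reasoner,
    propagator: Propagator,
    key_frame: int = 0,
) -> Dict[int, Dict[int, Box]]:
    """Compose two frozen models without any training.

    The reasoner selects targets on one key frame; the propagator then
    carries each selected target through the remaining frames.
    Returns {track_id: {frame_index: box}}.
    """
    tracks: Dict[int, Dict[int, Box]] = {}
    for track_id, box in enumerate(reasoner(frames[key_frame], instruction)):
        tracks[track_id] = propagator(frames, key_frame, box)
    return tracks


# --- Toy stand-ins so the sketch is runnable; they do no real inference. ---
def toy_reasoner(frame: Any, instruction: str) -> List[Box]:
    # Pretends that exactly one object satisfies the instruction.
    return [(10.0, 10.0, 50.0, 50.0)]


def toy_propagator(frames: Sequence[Any], start: int, box: Box) -> Dict[int, Box]:
    # A static "tracker": repeats the same box on every frame from start onward.
    return {i: box for i in range(start, len(frames))}


frames = ["frame0", "frame1", "frame2"]  # placeholders for image arrays
tracks = reason_then_track(
    frames,
    "the person most likely to cross the road",  # made-up implicit instruction
    toy_reasoner,
    toy_propagator,
)
```

Passing the two models in as plain callables keeps the composition free of any trained glue, which is the sense of "training-free" assumed here; a real system would replace the toy functions with actual model calls.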