arxiv_cs_lg February 10, 2026

Understanding Dynamic Compute Allocation in Recurrent Transformers

Translated: 2026/3/15 8:09:42

recurrent-transformers, dynamic-compute-allocation, token-level-adaptation, neural-complexity, machine-learning

Original Content

arXiv:2602.08864v1 Announce Type: cross

Abstract: Token-level adaptive computation seeks to reduce inference cost by allocating more computation to harder tokens and less to easier ones. However, prior work is primarily evaluated on natural-language benchmarks using task-level metrics, where token-level difficulty is unobservable and confounded with architectural factors, making it unclear whether compute allocation truly aligns with underlying complexity. We address this gap through three contributions. First, we introduce a complexity-controlled evaluation paradigm using algorithmic and synthetic language tasks with parameterized difficulty, enabling direct testing of token-level compute allocation. Second, we propose ANIRA, a unified recurrent Transformer framework that supports per-token variable-depth computation while isolating compute allocation decisions from other model factors. Third, we use this framework to conduct a systematic analysis of token-level adaptive computation across alignment with complexity, generalization, and decision timing. Our results show that compute allocation aligned with task complexity can emerge without explicit difficulty supervision, but such alignment does not imply algorithmic generalization: models fail to extrapolate to unseen input sizes despite allocating additional computation. We further find that early compute decisions rely on static structural cues, whereas online halting more closely tracks algorithmic execution state.
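The abstract contrasts two decision timings for per-token depth in a recurrent (weight-tied) model: an early decision fixed once from the token's input, and online halting re-evaluated after every recurrent step. The pure-Python sketch below illustrates only that distinction under stated assumptions. It is not ANIRA. `recurrent_step`, `halting_prob`, `run_early`, `run_online`, the 0.99 threshold, and all weights are hypothetical choices for illustration, and the halting rule is a simplified cumulative rule loosely in the spirit of Adaptive Computation Time, without its remainder weighting.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def recurrent_step(h: Vector, x: Vector) -> Vector:
    """Hypothetical weight-tied block: the same update is reused at every
    depth, and the token input x is re-injected on each iteration."""
    return [math.tanh(0.5 * h_i + x_i) for h_i, x_i in zip(h, x)]


def halting_prob(h: Vector) -> float:
    """Hypothetical halting head: sigmoid of the mean hidden activation."""
    mean = sum(h) / len(h)
    return 1.0 / (1.0 + math.exp(-4.0 * mean))


def run_early(
    tokens: Sequence[Vector],
    depth_fn: Callable[[Vector], int],
    max_steps: int,
) -> Tuple[List[Vector], List[int]]:
    """Early decision: each token's depth is chosen from its input alone,
    before any recurrence runs, so it cannot react to the evolving state."""
    states, depths = [], []
    for x in tokens:
        k = max(1, min(max_steps, depth_fn(x)))  # clamp depth to [1, max_steps]
        h = [0.0] * len(x)
        for _ in range(k):
            h = recurrent_step(h, x)
        states.append(h)
        depths.append(k)
    return states, depths


def run_online(
    tokens: Sequence[Vector],
    max_steps: int,
    threshold: float = 0.99,
) -> Tuple[List[Vector], List[int]]:
    """Online halting: after every step, read the current state and stop once
    the accumulated halting probability reaches `threshold` (or at max_steps)."""
    states, depths = [], []
    for x in tokens:
        h = [0.0] * len(x)
        cumulative = 0.0
        steps = 0
        while steps < max_steps:
            h = recurrent_step(h, x)
            steps += 1
            cumulative += halting_prob(h)
            if cumulative >= threshold:
                break
        states.append(h)
        depths.append(steps)
    return states, depths


if __name__ == "__main__":
    tokens = [[1.0, 1.0], [-1.0, -1.0]]
    _, online_depths = run_online(tokens, max_steps=8)
    # The early policy here sees only a static structural cue: vector length.
    _, early_depths = run_early(tokens, depth_fn=lambda x: len(x), max_steps=8)
    print("online depths:", online_depths)  # online depths: [2, 8]
    print("early depths:", early_depths)    # early depths: [2, 2]
```

Because `depth_fn` in this toy sees only the token's length, both tokens receive the same depth, whereas the online rule, which reads the evolving hidden state, stops the first token after two steps and runs the second to the cap. This mirrors the early-versus-online distinction the abstract draws in toy form only; it does not reproduce the paper's models or results.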