arxiv_cs_lg 2026年2月10日

有理型伝送子 (Rational Transductors)

Rational Transductors

Translated: 2026/3/15 14:09:21

rational-transductorstransformerscomplexity-theoryweighted-finite-automatamachine-learning

Japanese Translation

arXiv:2602.07599v1 発表タイプ：new 要約：標準的な Transformer は語義モデリングでは優れていますが、硬直した順次論理や状態追跡には苦戦します。理論的な研究によると、自己注意力は（厳密な注意力の下では）$\\\\\\\\\\\\AC^0$クラス、（軟らかい注意力の下では）$\\\\\\\\\\\\TC^0$クラスに制限されており、中間の chain-of-thought なしでは順次問題に対する堅牢な長さ一般化を支持することが難しいことが多いです。この研究では、我々は重み付き有限自動機（WFA）から導出された行列値の再帰を用いた二流アーキテクチャである「有理型伝送子（Rational Transductors）」を導入します。$\\\\\\\\\\\\emph{Deep Rational Injection}$ スキームを通じて注意力機構に有理型状態情報を注入することにより、我々のフレームワークは Transformer の表現力を厳密に一般化し、すべての正規語法を捕らえ、$\\\\\\\\\\\\NC^1$ 問題（例：論理式的評価）や、奇偶性やモジュラーカウントのような基本的な区別を解決します。また、$O(L + \\log T)$ の並列時間複雑性を維持します。私たちはこのアーキテクチャを厳密な学習理論に基づき支持しています：$\\\\\\\\\\\\emph{Random Rational Features}$ が順次依存に対する普遍的基底として機能し、我々の初期化戦略を正当化すると同時に、$\\\\\\\\\\\\emph{Differentiable Rational Feature}$ レジムは表現のコンパクトさのギャップを埋めるために必要であることは証明されています。理論的分析と実証的な結果は、Rational Transductors が「正規ギャップ」を解決し、標準的な Transformer が失敗するアルゴリズムタスクにおいて堅牢な長さ一般化を実現する一方で、従来の RNN の順次計算上のボトルネックを持たないことを示しています。

Original Content

arXiv:2602.07599v1 Announce Type: new Abstract: Standard Transformers excel at semantic modeling but struggle with rigid sequential logic and state tracking. Theoretical work establishes that self-attention is limited to $\AC^0$ (under hard attention) or $\TC^0$ (under soft attention), complexity classes that often fail to support robust length generalization on sequential problems without intermediate chain-of-thought. In this work, we introduce \emph{Rational Transductors}, a dual-stream architecture that augments the Transformer with a matrix-valued recurrence derived from Weighted Finite Automata (WFA). By injecting rational state information into the attention mechanism via a \emph{Deep Rational Injection} scheme, our framework strictly generalizes the expressive power of Transformers to capture all Regular Languages, $\NC^1$-complete problems (such as Boolean Formula Evaluation), and fundamental separations like Parity and Modular Counting, while preserving $O(L + \log T)$ parallel time complexity. We ground the architecture in a rigorous learning theory: we prove that \emph{Random Rational Features} act as a universal basis for sequential dependencies, justifying our initialization strategy, while establishing that the \emph{Differentiable Rational Feature} regime is necessary to close the representational compactness gap. Theoretical analysis and empirical results demonstrate that Rational Transductors solve the "Regular Gap," enabling robust length generalization on algorithmic tasks where standard Transformers fail, without the sequential computational bottlenecks of traditional RNNs.