arxiv_cs_lg 2026年4月24日

普遍的アテンションシミュレーターの存在に関する研究

On the Existence of Universal Simulators of Attention

Translated: 2026/4/24 20:07:49

transformersartificial-intelligencetheoretical-computer-sciencemachine-learningturing-completeness

Japanese Translation

arXiv:2506.18739v2 発表タイプ：置換要旨：以前のトランジューマーの学習可能に関する研究は、主に訓練を通じて特定のアルゴリズム的パターンを近似する能力を検討することに焦点を当てており、主にデータ駆動型的であり、確率的保証を提供するに留まりました。一方、表現力は、理論的に此类構造によって計算可能な問題を対処するために考案されました。これらの結果は、トランジューマーのチューリング完全性を証明し、回路の複雑性或論理に関する制限を検討しました。学習性と表現性の交差点に立つ中、残される問題は、 extbf{計算モデルとしてのトランジューマーが、任意のメカニズム、あるいは特に、その裏の操作をシミュレートできるか？} です。本研究では、トランジューマーエンコーダーが単純なアテンションメカニズムをシミュレートする能力を調査します。トランジューマーエンコーダーからなる普遍シミュレーター $\\ ext{U}$ を構成し、アテンションの出力と、トランジューマー計算のための形式枠組みである RASP を用いて、下位レベルの要素行列と活性化操作を複製するアルゴリズム的解法を示しました。以前は学習によってのみ近似される被认为のみが、アルゴリズム的に達成可能でデータ非依存な解の存在を示しました。

Original Content

arXiv:2506.18739v2 Announce Type: replace Abstract: Previous work on the learnability of transformers \textemdash\ focused primarily on examining their ability to approximate specific algorithmic patterns through training \textemdash\ has largely been data-driven, offering only probabilistic guarantees rather than deterministic solutions. Expressivity, on the contrary, has been devised to address the problems \emph{computable} by such architecture theoretically. These results proved the Turing-completeness of transformers, investigated bounds focused on circuit complexity, and formal logic. Being at the crossroad between learnability and expressivity, the question remains: \emph{can a transformer, as a computational model, simulate an arbitrary attention mechanism, or in particular, the underlying operations?} In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a universal simulator $\mathcal{U}$ composed of transformer encoders, we present algorithmic solutions to replicate attention outputs and the underlying elementary matrix and activation operations via RASP, a formal framework for transformer computation. We show the existence of an algorithmically achievable, data-agnostic solution, previously known to be approximated only by learning.