arxiv_cs_lg April 20, 2026

Layerwise Dynamics for In-Context Classification in Transformers

Translated: 2026/4/20 11:05:43

transformers, in-context-learning, machine-learning, neural-networks, classification

arXiv:2604.11613v2 Announce Type: replace

Abstract: Transformers can perform in-context classification from a few labeled examples, yet the inference-time algorithm remains opaque. We study multi-class linear classification in the hard no-margin regime and make the computation identifiable by enforcing feature- and label-permutation equivariance at every layer. This enables interpretability while maintaining functional equivalence and yields highly structured weights. From these models we extract an explicit depth-indexed recursion: an end-to-end identified, emergent update rule inside a softmax transformer, to our knowledge the first of its kind. Attention matrices formed from mixed feature-label Gram structure drive coupled updates of training points, labels, and the test probe. The resulting dynamics implement a geometry-driven algorithmic motif, which can provably amplify class separation and yields robust expected class alignment.
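To make the phrase "attention matrices formed from mixed feature-label Gram structure" more concrete, the following toy sketch may help. It is not the recursion extracted in the paper, which the abstract does not spell out. It assumes a hypothetical residual update in which each token holds a feature part and a label part, the attention scores are a weighted sum of the feature Gram matrix and the label Gram matrix, and both parts are updated with the same softmax attention weights. The names `layer` and `predict`, the coefficients `a`, `b`, `eta`, and the toy data are all illustrative assumptions.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer(xs, ys, a=1.0, b=1.0, eta=0.5):
    """One hypothetical layer: coupled update of features and labels.

    Scores use a mixed Gram structure, a * <x_i, x_j> + b * <y_i, y_j>.
    Because only inner products enter, permuting feature coordinates or
    class indices commutes with the update (permutation equivariance).
    """
    n = len(xs)
    new_xs, new_ys = [], []
    for i in range(n):
        scores = [a * dot(xs[i], xs[j]) + b * dot(ys[i], ys[j]) for j in range(n)]
        attn = softmax(scores)
        mixed_x = [sum(attn[j] * xs[j][k] for j in range(n)) for k in range(len(xs[i]))]
        mixed_y = [sum(attn[j] * ys[j][c] for j in range(n)) for c in range(len(ys[i]))]
        new_xs.append([xs[i][k] + eta * mixed_x[k] for k in range(len(mixed_x))])
        new_ys.append([ys[i][c] + eta * mixed_y[c] for c in range(len(mixed_y))])
    return new_xs, new_ys


def predict(train_x, train_labels, probe_x, num_classes, depth=3):
    """Run `depth` layers and read the class scores off the test probe."""
    xs = [list(x) for x in train_x] + [list(probe_x)]
    ys = [[1.0 if c == lab else 0.0 for c in range(num_classes)] for lab in train_labels]
    ys.append([0.0] * num_classes)  # the probe starts with an empty label slot
    for _ in range(depth):  # depth-indexed recursion over the same update
        xs, ys = layer(xs, ys)
    probe_scores = ys[-1]
    return probe_scores.index(max(probe_scores)), probe_scores


# Toy data (illustrative only): three well-separated classes in the plane.
train_x = [[2.0, 0.0], [1.8, 0.3], [0.0, 2.0], [0.2, 1.9], [-2.0, -2.0], [-1.7, -2.1]]
train_labels = [0, 0, 1, 1, 2, 2]
probe = [1.9, 0.1]

pred, scores = predict(train_x, train_labels, probe, num_classes=3)
print("predicted class:", pred)
print("probe label scores:", scores)
```

In this sketch the probe's label slot accumulates label mass from the training points it attends to, so a probe lying near one class cluster ends up with its largest score on that class. Relabeling the classes only permutes the resulting score vector, which is the kind of label-permutation equivariance the abstract describes enforcing at every layer.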