arxiv_cs_lg April 24, 2026


Transformers Can Learn Connectivity in Some Graphs but Not Others

Translated: 2026/4/24 20:12:51

transformers, large-language-models, transitivity, directed-graphs, causal-inference


Original Content

arXiv:2509.22343v2 Announce Type: replace-cross Abstract: Reasoning capability is essential to ensure the factual correctness of the responses of transformer-based Large Language Models (LLMs), and robust reasoning about transitive relations is instrumental in many settings, such as causal inference. Hence, it is essential to investigate the capability of transformers in the task of inferring transitive relations (e.g., knowing A causes B and B causes C, then A causes C). The task of inferring transitive relations is equivalent to the task of connectivity in directed graphs (e.g., knowing there is a path from A to B, and there is a path from B to C, then there is a path from A to C). Past research focused on whether transformers can learn to infer transitivity from in-context examples provided in the input prompt. However, transformers' capability to infer transitive relations from training examples, and how scaling affects this ability, remain unexplored. In this study, we seek to answer this question by generating directed graphs to train transformer models of varying sizes and evaluate their ability to infer transitive relations for various graph sizes. Our findings suggest that transformers are capable of learning connectivity on "grid-like" directed graphs where each node can be embedded in a low-dimensional subspace, and connectivity is easily inferable from the embeddings of the nodes. We find that the dimensionality of the underlying grid graph is a strong predictor of transformers' ability to learn the connectivity task, where higher-dimensional grid graphs pose a greater challenge than low-dimensional grid graphs. In addition, we observe that increasing the model scale leads to progressively better generalization when inferring connectivity over grid graphs. However, if the graph is not a grid graph and contains many disconnected components, transformers struggle to learn the connectivity task, especially when the number of components is large.
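The abstract equates inferring transitive relations with reachability in a directed graph. The Python sketch below illustrates that equivalence under stated assumptions; it is not the authors' data-generation code. It assumes a hypothetical "grid" construction in which nodes are integer tuples in {0, ..., n-1}^d and each edge increments exactly one coordinate by 1. Under that assumption, a path from u to v exists exactly when u <= v in every coordinate, which is one concrete reading of "connectivity is easily inferable from the embeddings of the nodes". A second hypothetical graph, made of k disjoint directed chains, stands in for the "many disconnected components" case. The helper names grid_graph, disjoint_chains, reachable_pairs, and dominates are illustrative only.

```python
from collections import deque
from itertools import product


def grid_graph(n: int, d: int) -> dict[tuple[int, ...], list[tuple[int, ...]]]:
    """Hypothetical d-dimensional directed grid on {0..n-1}^d (an assumption,
    not the paper's definition): each node points to every neighbour obtained
    by incrementing exactly one coordinate by 1 while staying inside the grid."""
    adj = {}
    for node in product(range(n), repeat=d):
        succ = []
        for i in range(d):
            if node[i] + 1 < n:
                succ.append(node[:i] + (node[i] + 1,) + node[i + 1:])
        adj[node] = succ
    return adj


def disjoint_chains(k: int, length: int) -> dict[tuple[int, int], list[tuple[int, int]]]:
    """Hypothetical non-grid graph: k disjoint directed paths, node (c, j) -> (c, j + 1)."""
    adj = {}
    for c in range(k):
        for j in range(length):
            adj[(c, j)] = [(c, j + 1)] if j + 1 < length else []
    return adj


def reachable_pairs(adj: dict) -> set:
    """Transitive closure by BFS from every node: all (u, v), u != v, with a path u -> v."""
    pairs = set()
    for src in adj:
        seen = {src}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        pairs.update((src, v) for v in seen if v != src)
    return pairs


def dominates(u: tuple, v: tuple) -> bool:
    """Coordinate rule for the assumed grid: u reaches v iff u <= v in every coordinate."""
    return all(a <= b for a, b in zip(u, v))


if __name__ == "__main__":
    grid = grid_graph(n=4, d=2)
    closure = reachable_pairs(grid)
    nodes = list(grid)
    rule = {(u, v) for u in nodes for v in nodes if u != v and dominates(u, v)}
    print(len(closure), closure == rule)  # prints: 84 True

    chains = disjoint_chains(k=3, length=4)
    chain_closure = reachable_pairs(chains)
    print(len(chain_closure))  # prints: 18 (6 ordered pairs per chain, none across chains)
```

On the assumed grid, the BFS closure and the coordinate rule agree exactly, so reachability labels can be read off node coordinates. On the disjoint-chain graph, the same rule fails: dominates((0, 0), (1, 1)) is true, yet the two nodes lie in different components, so reachability there also depends on component membership.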