arxiv_cs_lg 2026年2月10日

Transformer を無教師学習アルゴリズムとして：ガウス混合モデルに関する研究

Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures

Translated: 2026/3/15 9:05:09

transformersunsupervised-learninggaussian-mixturesdeep-learningmachine-learning

Japanese Translation

arXiv:2505.11918v2 Announce Type: replace 【要約】 Transformer アーキテクチャは、現代の人工知能において驚くべき能力を示しており、そのうち推論時に内部モデルを明示的に学習する能力が、事前トレーニングされた大規模言語モデルの理解において極めて重要な役割を果たしていることは広く信じられています。しかし、最近の多くの研究は、コンテキスト内学習などの教師あり学習の課題に焦点を当てており、無教師学習の分野はほとんど未開拓のままであります。本論文では、統計推測の観点から、ガウス混合モデル (GMM) という基本的な無教師学習の課題を解決する Transformer の能力を検証しました。我々は、共有する Transformer バックボーンを使って複数の GMM タスクを同時に解決するための Transformer ベースの学習フレームワークである TGMM を提案しました。学習されたモデルは、実験的に古典的な手法である Expectation-Maximization (EM) やスペクトルアルゴリズムの限界を効果的に緩和できると同時に、分布シフトに対する合理性の有る頑健性を示しています。理論的には、Transformer が EM アルゴリズムとスペクトル法の主要な構成要素である 3 乗テンソル幂イテレーションを両方とも近似できることを証明しました。これらの結果は、実用的な成功と理論的な理解の間のギャップを埋め、Transformer を無教師学習のための汎用的なツールとして位置づけました。

Original Content

arXiv:2505.11918v2 Announce Type: replace Abstract: The transformer architecture has demonstrated remarkable capabilities in modern artificial intelligence, among which the capability of implicitly learning an internal model during inference time is widely believed to play a key role in the under standing of pre-trained large language models. However, most recent works have been focusing on studying supervised learning topics such as in-context learning, leaving the field of unsupervised learning largely unexplored. This paper investigates the capabilities of transformers in solving Gaussian Mixture Models (GMMs), a fundamental unsupervised learning problem through the lens of statistical estimation. We propose a transformer-based learning framework called TGMM that simultaneously learns to solve multiple GMM tasks using a shared transformer backbone. The learned models are empirically demonstrated to effectively mitigate the limitations of classical methods such as Expectation-Maximization (EM) or spectral algorithms, at the same time exhibit reasonable robustness to distribution shifts. Theoretically, we prove that transformers can approximate both the EM algorithm and a core component of spectral methods (cubic tensor power iterations). These results bridge the gap between practical success and theoretical understanding, positioning transformers as versatile tools for unsupervised learning.