arxiv_cs_lg April 20, 2026

Comparing the latent features of universal machine-learning interatomic potentials

Translated: 2026/4/20 11:07:30

machine-learning-interatomic-potentials, universal-mlip, latent-features, chemical-space, structure-prediction

Original Content

arXiv:2512.05717v3 Announce Type: replace-cross

Abstract: The past few years have seen the development of "universal" machine-learning interatomic potentials (uMLIPs) capable of approximating the ground-state potential energy surface across a wide range of chemical structures and compositions with reasonable accuracy. While these models differ in the architecture and the dataset used, they share the ability to compress a staggering amount of chemical information into descriptive latent features. Herein, we systematically analyze what the different uMLIPs have learned by quantitatively assessing the relative information content of their latent features with feature reconstruction errors, and observing how the trends are affected by the choice of training set and training protocol. We find that uMLIPs encode the chemical space in significantly distinct ways, with substantial cross-model feature reconstruction errors. When variants of the same model architecture are considered, trends become dependent on the dataset, target, and training protocol of choice. We also observe that fine-tuning of a uMLIP retains a strong pre-training bias in the latent features. Finally, we discuss how atom-level features, which are directly output by MLIPs, can be compressed into global structure-level features via concatenation of progressive cumulants, each adding significantly new information about the variability across the atomic environments within a given system.
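The two technical ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the reconstruction error is computed or how the cumulants are formed. The sketch assumes that a feature reconstruction error is the held-out error of a linear (ridge) map from one model's standardized features to another's, and that cumulants are taken element-wise per feature dimension, ignoring cross-dimension terms. The function names `reconstruction_error` and `cumulant_pool`, and all parameters, are hypothetical.

```python
import numpy as np


def _standardize(train, test):
    """Center with the training mean and scale so the training set has unit total variance."""
    mean = train.mean(axis=0)
    scale = np.sqrt(((train - mean) ** 2).sum(axis=1).mean())
    scale = scale if scale > 0 else 1.0
    return (train - mean) / scale, (test - mean) / scale


def reconstruction_error(source, target, train_fraction=0.5, ridge=1e-10, seed=0):
    """Held-out error of linearly reconstructing `target` features from `source` features.

    Assumed definition (not taken from the paper): both arrays have shape
    (n_samples, n_features); values near 0 mean `source` linearly contains the
    information in `target`, values near 1 mean it contains almost none of it.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    order = np.random.default_rng(seed).permutation(len(source))
    n_train = int(train_fraction * len(source))
    train, test = order[:n_train], order[n_train:]

    x_train, x_test = _standardize(source[train], source[test])
    y_train, y_test = _standardize(target[train], target[test])

    # Ridge-regularized least squares: W = (X^T X + ridge * I)^-1 X^T Y
    gram = x_train.T @ x_train + ridge * np.eye(x_train.shape[1])
    weights = np.linalg.solve(gram, x_train.T @ y_train)

    residual = y_test - x_test @ weights
    return float(np.sqrt((residual ** 2).sum(axis=1).mean()))


def cumulant_pool(atom_features, order=3):
    """Concatenate element-wise cumulants 1..order over the atoms of one structure.

    `atom_features` has shape (n_atoms, n_features); the result has shape
    (order * n_features,). Only orders 1 to 4 are covered in this sketch.
    """
    if not 1 <= order <= 4:
        raise ValueError("order must be between 1 and 4")
    atom_features = np.asarray(atom_features, dtype=float)
    mean = atom_features.mean(axis=0)
    centered = atom_features - mean
    m2 = (centered ** 2).mean(axis=0)  # second cumulant: population variance
    m3 = (centered ** 3).mean(axis=0)  # third cumulant: third central moment
    m4 = (centered ** 4).mean(axis=0)
    k4 = m4 - 3.0 * m2 ** 2            # fourth cumulant
    return np.concatenate([mean, m2, m3, k4][:order])
```

Under these assumptions the error is asymmetric: `reconstruction_error(a, b)` and `reconstruction_error(b, a)` differ when one feature set contains information the other lacks, which is what makes it useful for comparing relative information content. In `cumulant_pool`, `order=1` reduces to plain mean pooling, and each higher order appends one more descriptor of how the atomic environments vary within the structure.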