arxiv_cs_cv April 20, 2026

Social-JEPA: Emergent Geometric Isomorphism

Translated: 2026/4/20 10:51:22

social-je, geometric-isomorphism, world-models, multi-agent-learning, latent-space-alignment

Original Content

arXiv:2603.02263v2 Announce Type: replace

Abstract: World models compress rich sensory streams into compact latent codes that anticipate future observations. We let separate agents acquire such models from distinct viewpoints of the same environment without any parameter sharing or coordination. After training, their internal representations exhibit a striking emergent property: the two latent spaces are related by an approximate linear isometry, enabling transparent translation between them. This geometric consensus survives large viewpoint shifts and scant overlap in raw pixels. Leveraging the learned alignment, a classifier trained on one agent can be ported to the other with no additional gradient steps, while distillation-like migration accelerates later learning and markedly reduces total compute. The findings reveal that predictive learning objectives impose strong regularities on representation geometry, suggesting a lightweight path to interoperability among decentralized vision systems. The code is available at https://anonymous.4open.science/r/Social-JEPA-5C57.
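
To make the "approximate linear isometry" claim concrete, the sketch below shows one common way such a map could be estimated and used. It is not taken from the paper or its repository. It assumes that paired latent codes from the two agents are available for the same moments in the environment, fits an orthogonal matrix by the standard orthogonal Procrustes solution (via SVD), and then re-expresses a linear classifier's weights so that it runs on the second agent's latents without any gradient steps. The function names, the synthetic data, the latent dimension, and the noise level are all hypothetical.

```python
import numpy as np


def fit_orthogonal_map(z_a, z_b):
    """Orthogonal Procrustes: orthogonal Q minimizing ||z_a @ Q - z_b||_F.

    z_a, z_b: (n, d) arrays of paired latents (pairing is an assumption here).
    """
    u, _, vt = np.linalg.svd(z_a.T @ z_b)
    return u @ vt


def port_linear_classifier(w_a, q):
    """Map weights trained on agent A's latents onto agent B's latents.

    If z_b ~= z_a @ q, then z_a ~= z_b @ q.T, so z_a @ w_a ~= z_b @ (q.T @ w_a).
    """
    return q.T @ w_a


# Synthetic stand-in for two agents' latents (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
dim, n = 8, 500
z_a = rng.normal(size=(n, dim))
q_true, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
z_b = z_a @ q_true + 0.01 * rng.normal(size=(n, dim))  # isometry plus small noise

# Fit the map on a small set of paired anchors, then evaluate on all samples.
q_hat = fit_orthogonal_map(z_a[:100], z_b[:100])
rel_err = np.linalg.norm(z_a @ q_hat - z_b) / np.linalg.norm(z_b)

# A linear classifier "trained" on agent A (random weights here), ported to agent B.
w_a = rng.normal(size=(dim, 3))
w_b = port_linear_classifier(w_a, q_hat)
agreement = np.mean((z_a @ w_a).argmax(axis=1) == (z_b @ w_b).argmax(axis=1))

print(f"relative alignment error: {rel_err:.4f}")
print(f"prediction agreement after porting: {agreement:.3f}")
```

Constraining the map to be orthogonal, rather than fitting an unconstrained least-squares matrix, matches the isometry claim: distances and angles between latent codes are preserved, so a linear decision boundary in one space stays linear, with the same margins, in the other.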