arxiv_cs_ai February 10, 2026

Weak-Driven Learning: How Weak Agents Make Strong Agents Stronger

Translated: 2026/3/7 9:55:40

post-training, weak-agents, strong-agents, entropy-dynamics, machine-learning

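Translation

arXiv:2602.08222v1 Announce Type: new. Abstract: As post-training optimization takes a central role in improving the performance of large language models, a saturation bottleneck keeps appearing: once a model becomes highly confident, additional training brings diminishing returns. Existing methods keep reinforcing the target predictions, yet informative supervision signals remain hidden in the weak states from the model's own training history. Based on this observation, the authors propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that uses weak checkpoints to guide continued optimization. By identifying recoverable learning gaps through entropy dynamics and reinforcing them through compensatory learning, WMSS improves strong agents beyond the usual post-training saturation point. Experiments on mathematical reasoning and code generation datasets show effective performance improvements with no additional inference cost.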

Original Content

arXiv:2602.08222v1 Announce Type: new

Abstract: As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements, while incurring zero additional inference cost.
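The abstract does not say how entropy dynamics are measured or how learning gaps are selected, so the following is only an illustrative sketch of one possible signal, not the WMSS procedure. It assumes per-position logits are available from an earlier (weak) checkpoint and a later (strong) checkpoint, and it flags positions whose predictive entropy fell by at least a threshold. The function names, the `min_drop` parameter, and the thresholding rule are all hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_gaps(weak_logits, strong_logits, min_drop=0.5):
    """Return indices of positions whose entropy fell by at least `min_drop`
    nats between the weak and the strong checkpoint.

    Hypothetical criterion for illustration only; the abstract does not
    specify how WMSS selects positions.
    """
    flagged = []
    for i, (w, s) in enumerate(zip(weak_logits, strong_logits)):
        drop = entropy(softmax(w)) - entropy(softmax(s))
        if drop >= min_drop:
            flagged.append(i)
    return flagged


# Toy example with a 4-way vocabulary and two positions.
weak = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
strong = [[10.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(entropy_gaps(weak, strong))
```

In the toy data, the first position moves from a uniform prediction to a sharply peaked one while the second is unchanged, so only the first exceeds the assumed threshold. A real pipeline would read logits from saved model checkpoints and would need the selection rule defined in the paper itself.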