arxiv_cs_lg April 20, 2026


CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning

Translated: 2026/4/20 11:07:43

curriculum-learning, machine-translation, large-language-models, reinforcement-learning, preference-optimization


Original Content

arXiv:2601.05858v2 Announce Type: replace-cross

Abstract: Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference optimization, but they leave a key aspect largely underexplored: the order in which data samples are given during training. We address this topic by integrating curriculum learning into various state-of-the-art preference optimization algorithms to boost MT performance. We introduce a novel curriculum learning strategy with restarts (CLewR), which reiterates the easy-to-hard curriculum multiple times during training to effectively mitigate the catastrophic forgetting of easy examples. We demonstrate consistent gains across several model families (Gemma2, Qwen2.5, Llama3.1) and preference optimization techniques. We publicly release our code at https://github.com/alexandra-dragomir/CLewR.
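The abstract does not spell out the schedule in detail. The following is a minimal Python sketch of one plausible reading of "reiterating the easy-to-hard curriculum multiple times", not the authors' implementation (their repository is the authoritative source). It assumes every preference pair already carries a scalar difficulty score. How CLewR actually measures difficulty is not stated here, and `PreferencePair`, `difficulty`, `single_curriculum`, and `curriculum_with_restarts` are all hypothetical names. A naive single-pass curriculum is included only as a point of contrast.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical MT preference example: source, preferred and rejected translation."""
    source: str
    chosen: str
    rejected: str
    difficulty: float  # assumed precomputed score; lower means easier


def single_curriculum(data: List[PreferencePair], num_passes: int) -> Iterator[PreferencePair]:
    """Illustrative baseline: one easy-to-hard ordering stretched over the whole run.

    Every repetition of an easy example is consumed before any harder one,
    so easy examples are never revisited once training moves on.
    """
    for ex in sorted(data, key=lambda p: p.difficulty):
        for _ in range(num_passes):
            yield ex


def curriculum_with_restarts(data: List[PreferencePair], num_restarts: int) -> Iterator[PreferencePair]:
    """Restart-style schedule (a sketch): repeat the full easy-to-hard pass num_restarts times."""
    if num_restarts < 1:
        raise ValueError("num_restarts must be at least 1")
    ordered = sorted(data, key=lambda p: p.difficulty)  # one easy-to-hard pass
    for _ in range(num_restarts):
        yield from ordered  # each restart begins again from the easiest example


if __name__ == "__main__":
    data = [
        PreferencePair("src-a", "good-a", "bad-a", difficulty=0.9),
        PreferencePair("src-b", "good-b", "bad-b", difficulty=0.1),
        PreferencePair("src-c", "good-c", "bad-c", difficulty=0.5),
    ]
    print([ex.source for ex in single_curriculum(data, num_passes=2)])
    # ['src-b', 'src-b', 'src-c', 'src-c', 'src-a', 'src-a']
    print([ex.source for ex in curriculum_with_restarts(data, num_restarts=2)])
    # ['src-b', 'src-c', 'src-a', 'src-b', 'src-c', 'src-a']
```

With restarts, the easiest pair reappears at the start of every pass. The single-pass baseline never returns to it after the first few steps. This is the intuition behind using restarts against forgetting of easy examples. In a real setup, the yielded pairs would be fed in order to whichever preference-optimization loss is being trained.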