arxiv_cs_ai February 10, 2026


The Reasoning Lingua Franca: A Double-Edged Sword for Multilingual AI

Translated: 2026/2/14 8:08:40

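Japanese Translation (rendered in English)

Large Reasoning Models (LRMs) achieve strong performance on mathematical, scientific, and other question-answering tasks, but their multilingual reasoning abilities have not yet been sufficiently explored. When given non-English questions, LRMs tend to reason in English, which raises concerns about interpretability and about their handling of linguistic and cultural nuances. We compared an LRM's reasoning in English with its reasoning in the language of the question on two tasks, MGSM and GPQA Diamond. Beyond measuring answer accuracy, we also analyzed cognitive attributes of the reasoning traces. We found that these cognitive behaviors appear substantially more often in English reasoning traces, and that reasoning in English generally achieves higher final-answer accuracy, with the advantage growing as tasks become more difficult. However, this English-centric strategy is vulnerable to a key failure mode known as "Lost in Translation," in which translation steps introduce errors that reasoning in the language of the question would have avoided.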

Original Content

arXiv:2510.20647v3 Announce Type: replace-cross

Abstract: Large Reasoning Models (LRMs) achieve strong performance on mathematical, scientific, and other question-answering tasks, but their multilingual reasoning abilities remain underexplored. When presented with non-English questions, LRMs often default to reasoning in English, raising concerns about interpretability and the handling of linguistic and cultural nuances. We systematically compare an LRM's reasoning in English with its reasoning in the language of the question. Our evaluation spans two tasks: MGSM and GPQA Diamond. Beyond measuring answer accuracy, we also analyze cognitive attributes in the reasoning traces. We find that English reasoning traces exhibit a substantially higher presence of these cognitive behaviors, and that reasoning in English generally yields higher final-answer accuracy, with the performance gap increasing as tasks become more complex. However, this English-centric strategy is susceptible to a key failure mode: getting "Lost in Translation," where translation steps lead to errors that would have been avoided by reasoning in the language of the question.
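The comparison described in the abstract is paired: the same question is answered once with reasoning in English and once with reasoning in the question's language. The following is a minimal Python sketch of how such a comparison could be tallied, assuming a hypothetical record layout (`Result`), hypothetical condition labels (`"english"` and `"native"`), and hypothetical function names; it is not the authors' code. It computes per-task accuracy under each condition and the gap between them, and it flags items that were answered correctly only when reasoning in the question's language. Such items are only candidates for the "Lost in Translation" failure mode, since the paper attributes that failure to translation steps inside the reasoning trace, which requires inspecting the traces themselves.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One graded model answer (hypothetical record layout, not the paper's)."""
    task: str       # benchmark name, e.g. "MGSM" or "GPQA Diamond"
    item_id: str    # question id, shared by both reasoning conditions
    lang: str       # language of the question, e.g. "ja"
    reasoning: str  # "english" or "native" (language the trace was written in)
    correct: bool   # final answer matched the gold answer


def _rate(correct: int, total: int) -> float:
    """Accuracy, or NaN when a condition has no graded items."""
    return correct / total if total else float("nan")


def accuracy_gap(results):
    """Map task -> (english_acc, native_acc, english_acc - native_acc)."""
    # counts[task][reasoning] = [number correct, number graded]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in results:
        tally = counts[r.task][r.reasoning]
        tally[0] += int(r.correct)
        tally[1] += 1
    summary = {}
    for task, conds in counts.items():
        en = _rate(*conds["english"])
        native = _rate(*conds["native"])
        summary[task] = (en, native, en - native)
    return summary


def lost_in_translation_candidates(results):
    """Items solved with native-language reasoning but failed with English reasoning.

    These are only candidates: confirming the failure mode means reading the
    English trace and checking that a translation step caused the error.
    """
    outcome = {(r.task, r.item_id, r.lang, r.reasoning): r.correct for r in results}
    return sorted(
        (task, item_id, lang)
        for (task, item_id, lang, reasoning), ok in outcome.items()
        if reasoning == "native" and ok
        and outcome.get((task, item_id, lang, "english")) is False
    )


if __name__ == "__main__":
    # Synthetic records for illustration only; not data from the paper.
    demo = [
        Result("MGSM", "q1", "ja", "english", True),
        Result("MGSM", "q1", "ja", "native", True),
        Result("MGSM", "q2", "ja", "english", False),
        Result("MGSM", "q2", "ja", "native", True),
    ]
    print(accuracy_gap(demo))
    print(lost_in_translation_candidates(demo))
```

Running the file prints the per-task accuracy tuple and the candidate list for the synthetic records. Sharing `item_id` across both conditions is what makes the comparison paired, so any accuracy difference is measured on the same questions rather than on different samples.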