arxiv_cs_lg, April 24, 2026

Agnostic Language Identification and Generation

Translated: 2026/4/24 20:09:41

agentic-language-models, statistical-learning-theory, language-identification, language-generation, statistical-rates

Japanese Translation

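arXiv:2601.23258v2 Announce Type: replace Abstract: Recent research on language identification and generation has established tight statistical rates (upper and lower bounds on learning efficiency) at which these tasks can be achieved. These works generally operate under a strong realizability assumption, assuming that the input data is generated from an unknown distribution supported on some language in a given collection of languages. In this work, we relax this realizability assumption entirely and impose no restrictions on the distribution of the input data. We propose objectives for studying both language identification and generation in this more general "agnostic" setting. For both problems, we derive novel and interesting characterizations and nearly tight learning rates.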

Original Content

arXiv:2601.23258v2 Announce Type: replace Abstract: Recent works on language identification and generation have established tight statistical rates at which these tasks can be achieved. These works typically operate under a strong realizability assumption: that the input data is drawn from an unknown distribution necessarily supported on some language in a given collection. In this work, we relax this assumption of realizability entirely, and impose no restrictions on the distribution of the input data. We propose objectives to study both language identification and generation in this more general "agnostic" setup. Across both problems, we obtain novel and interesting characterizations and nearly tight rates.
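To make the realizability assumption concrete, here is a toy sketch under strong simplifying assumptions: each language is a finite set of strings, and the input distribution is given explicitly as a dictionary from strings to probabilities. The realizable case is the one where the distribution's support lies inside some language of the collection; the agnostic case drops that requirement. The `best_in_class` function (picking the language that captures the most probability mass) is a hypothetical benchmark added for illustration only; it is not the objective proposed in the paper, which the abstract does not spell out. All names here (`is_realizable`, `mass_on`, `best_in_class`) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy model: a language is a finite set of strings, and a
# distribution is an explicit mapping from strings to probabilities.
Language = FrozenSet[str]
Distribution = Dict[str, float]


def support(dist: Distribution) -> Set[str]:
    """Strings that receive positive probability under the distribution."""
    return {s for s, p in dist.items() if p > 0}


def is_realizable(dist: Distribution, collection: List[Language]) -> bool:
    """Realizability: the whole support lies inside some language of the collection."""
    supp = support(dist)
    return any(supp <= lang for lang in collection)


def mass_on(dist: Distribution, lang: Language) -> float:
    """Probability that a single draw from dist lands inside lang."""
    return sum(p for s, p in dist.items() if s in lang)


def best_in_class(dist: Distribution, collection: List[Language]) -> int:
    """Illustrative agnostic benchmark (an assumption, not the paper's objective):
    index of the language that captures the most probability mass."""
    return max(range(len(collection)), key=lambda i: mass_on(dist, collection[i]))


if __name__ == "__main__":
    L1 = frozenset({"a", "aa", "aaa"})
    L2 = frozenset({"b", "bb"})
    collection = [L1, L2]

    realizable = {"a": 0.5, "aa": 0.5}           # support {"a", "aa"} fits inside L1
    agnostic = {"a": 0.4, "aa": 0.2, "b": 0.4}   # support spans both languages

    print(is_realizable(realizable, collection))  # True
    print(is_realizable(agnostic, collection))    # False
    print(best_in_class(agnostic, collection))    # 0 (L1 captures about 0.6 of the mass)
```

In the realizable case some language in the collection explains every sample; in the agnostic case no language does, so any identification or generation goal has to be measured against how well the collection fits the data rather than against a perfect match.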