arxiv_cs_ai April 24, 2026


Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs

Translated: 2026/4/24 20:32:29

machine-learning, large-language-models, context-window, artificial-intelligence, evaluation-methods


arXiv:2509.21361v2 Announce Type: replace-cross

Abstract: Large language model (LLM) providers advertise large figures for maximum context window size. To test how context windows hold up in real-world use, we 1) define the concept of a maximum effective context window, 2) formulate a method for testing a context window's effectiveness across various sizes and problem types, and 3) create a standardized way to compare model efficacy as context window size increases, in order to find the point of failure. We collected hundreds of thousands of data points across several models and found significant differences between the reported Maximum Context Window (MCW) size and the Maximum Effective Context Window (MECW) size. Our findings show that the MECW not only differs drastically from the MCW but also shifts based on the problem type. A few top-of-the-line models in our test group failed with as little as 100 tokens in context, and most showed severe degradation in accuracy by 1,000 tokens in context. All models fell far short of their MCW, by as much as 99 percent. Our data reveal that the MECW shifts with the type of problem provided, offering clear and actionable insights into how to improve model accuracy and reduce hallucination rates.
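The abstract does not specify exactly how the point of failure is computed. The following is a minimal sketch under one assumption: the MECW is the largest tested context size whose accuracy stays at or above a chosen threshold, stopping at the first size that falls below it. The shortfall is then 1 - MECW/MCW, expressed as a percentage. The threshold, the function names, and every number below are hypothetical. They are not the authors' method or data. The sketch computes the MECW separately for each problem type because the abstract reports that it shifts with problem type.

```python
from typing import Optional, Sequence, Tuple


def max_effective_context_window(
    results: Sequence[Tuple[int, float]],
    threshold: float,
) -> Optional[int]:
    """Hypothetical rule: largest tested context size before accuracy first drops below threshold.

    results: (context_tokens, accuracy) pairs, one per tested context size.
    threshold: assumed minimum acceptable accuracy (not a value from the paper).
    Returns None if the model already fails at the smallest tested size.
    """
    mecw: Optional[int] = None
    for tokens, accuracy in sorted(results):  # scan from smallest to largest window
        if accuracy < threshold:
            break  # first point of failure: ignore any larger windows
        mecw = tokens
    return mecw


def shortfall_percent(mecw: int, mcw: int) -> float:
    """Percentage by which the effective window falls short of the advertised MCW."""
    return 100.0 * (1 - mecw / mcw)


if __name__ == "__main__":
    # Hypothetical accuracy measurements per problem type, NOT data from the paper.
    runs = {
        "retrieval": [(100, 0.99), (1_000, 0.97), (10_000, 0.93), (50_000, 0.62)],
        "multi-step reasoning": [(100, 0.95), (500, 0.91), (1_000, 0.64)],
    }
    advertised_mcw = 128_000  # hypothetical advertised Maximum Context Window

    for problem_type, results in runs.items():
        mecw = max_effective_context_window(results, threshold=0.9)
        if mecw is None:
            print(f"{problem_type}: fails at the smallest tested size")
        else:
            gap = shortfall_percent(mecw, advertised_mcw)
            print(f"{problem_type}: MECW={mecw} tokens, {gap:.1f}% below MCW")
```

With these made-up numbers, the same model gets an MECW of 10,000 tokens on retrieval and 500 tokens on multi-step reasoning. That is a 92.2% and a 99.6% shortfall against a 128,000-token MCW. Stopping at the first failure, rather than taking the largest size that passes anywhere, is a deliberate choice in this sketch. It keeps a lucky result at a larger size from overstating the usable window.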