arxiv_cs_lg 2026年4月24日

統計学ではなくスケール: モジュール化された医療対話とベイズ信念エンジン

Statistics, Not Scale: Modular Medical Dialogue with Bayesian Belief Engine

Translated: 2026/4/24 19:55:31

large-language-modelsbayesian-inferencemedical-diagnosticsai-architectureprivacy-preserving-computation

Japanese Translation

arXiv:2604.20022v1 発表タイプ：new 要約：大規模言語モデル（LLM）は、自律的な診断エージェントとしてますます展開されていますが、それらは天然言語の通信と確率的推論という、根本的に異なる二つの能力を混同しています。我々は、この混同はエンジニアリング上の欠如ではなく、アーキテクチャ上の欠陥であるという主張を行います。我々は、言語と推論の厳格な分離を強制するモジュール化された診断対話フレームワーク「BMBE（ベイズ医療信念エンジン）」を導入しました。LLM は単に患者の発言を構造化された証拠にパースし、質問を語義化するためのセンサーとして機能し、すべての診断推論は、確定的かつ検証可能なベイズエンジンにおいて行われます。患者データが LLM に入らずに済むため、このアーキテクチャは設計によりプライバシーを確保します。統計解析バックエンドはスタンドアローンモジュールであるため、ターゲット集団に応じて再学習なしで置き換え可能です。この分離は、従来の自律型 LLM が提供できない三つの特性を生み出します：連続的に調整可能な精度と範囲のトレードオフを持つ校正された選択診断、すなわち安価なセンサーとベイズエンジンの組み合わせが、同じ家族から開発された最先端スタンドアローンモデルよりも、数分の一の費用で優越する統計的分離ギャップ、そしてスタンドアローン医師が崩壊させるような、敵対的な患者コミュニケーションスタイルに対する堅牢性。我々は最先端 LLM に対して、実証データと LLM 生成の知識ベースを用いた検証を行い、この優位性は情報面的ではなくアーキテクチャ的なものであることを確認しました。

Original Content

arXiv:2604.20022v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous diagnostic agents, yet they conflate two fundamentally different capabilities: natural-language communication and probabilistic reasoning. We argue that this conflation is an architectural flaw, not an engineering shortcoming. We introduce BMBE (Bayesian Medical Belief Engine), a modular diagnostic dialogue framework that enforces a strict separation between language and reasoning: an LLM serves only as a sensor, parsing patient utterances into structured evidence and verbalising questions, while all diagnostic inference resides in a deterministic, auditable Bayesian engine. Because patient data never enters the LLM, the architecture is private by construction; because the statistical backend is a standalone module, it can be replaced per target population without retraining. This separation yields three properties no autonomous LLM can offer: calibrated selective diagnosis with a continuously adjustable accuracy-coverage tradeoff, a statistical separation gap where even a cheap sensor paired with the engine outperforms a frontier standalone model from the same family at a fraction of the cost, and robustness to adversarial patient communication styles that cause standalone doctors to collapse. We validate across empirical and LLM-generated knowledge bases against frontier LLMs, confirming the advantage is architectural, not informational.