arxiv_cs_lg February 10, 2026

RiskAgent: Integrating Validated Clinical Decision Tools for Evidence-Based Risk Prediction

RiskAgent: Synergizing Language Models with Validated Tools for Evidence-Based Risk Prediction

Translated: 2026/3/15 9:04:05

language-models, clinical-decision-making, evidence-based-medicine, risk-prediction, hallucination

Japanese Translation

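arXiv:2503.03802v2 Announce Type: replace Abstract: Large language models (LLMs) achieve results competitive with human experts on medical examinations. However, applying LLMs to complex clinical decision-making remains a challenge, because it requires a deep understanding of medical knowledge and differs from the standardized, exam-style scenarios used in current approaches. A common approach is to fine-tune LLMs for the target task, but this not only requires large amounts of data and computational resources, it also remains prone to generating "hallucinations". This paper presents RiskAgent, which synergizes language models with hundreds of validated clinical decision tools supported by evidence-based medicine. RiskAgent provides generalizable and faithful recommendations. Our experiments show that RiskAgent not only achieves superior performance on a broad range of clinical risk predictions across diverse scenarios and diseases, but also demonstrates robust generalization in tool learning on the external MedCalc-Bench dataset, as well as in medical reasoning and question answering on three representative benchmarks: MedQA, MedMCQA, and MMLU.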

Original Content

arXiv:2503.03802v2 Announce Type: replace Abstract: Large Language Models (LLMs) achieve competitive results compared to human experts in medical examinations. However, it remains a challenge to apply LLMs to complex clinical decision-making, which requires a deep understanding of medical knowledge and differs from the standardized, exam-style scenarios commonly used in current efforts. A common approach is to fine-tune LLMs for target tasks, which, however, not only requires substantial data and computational resources but also remains prone to generating "hallucinations". In this work, we present RiskAgent, which synergizes language models with hundreds of validated clinical decision tools supported by evidence-based medicine, to provide generalizable and faithful recommendations. Our experiments show that RiskAgent not only achieves superior performance on a broad range of clinical risk predictions across diverse scenarios and diseases, but also demonstrates robust generalization in tool learning on the external MedCalc-Bench dataset, as well as in medical reasoning and question answering on three representative benchmarks, MedQA, MedMCQA, and MMLU.