arxiv_cs_cv April 20, 2026

EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence

echovlm, mixture-of-experts, ultrasound-imaging, vision-language-model, medical-ai

Original Content

arXiv:2509.14977v2 Announce Type: replace

Abstract: Ultrasound imaging has become the preferred imaging modality for early cancer screening because it uses no ionizing radiation, is low cost, and images in real time. However, conventional ultrasound diagnosis relies heavily on physician expertise, which makes it highly subjective and limits diagnostic efficiency. Vision-language models (VLMs) offer a promising solution, but existing general-purpose models have limited knowledge of ultrasound medical tasks, generalize poorly in multi-organ lesion recognition, and are inefficient across multi-task diagnostics. To address these limitations, we propose EchoVLM, a vision-language model designed specifically for ultrasound medical imaging. The model employs a Mixture of Experts (MoE) architecture trained on data spanning seven anatomical regions. This design enables the model to perform multiple tasks, including ultrasound report generation, diagnosis, and visual question answering (VQA). Experimental results show that, on the ultrasound report generation task, EchoVLM improves on Qwen2-VL by 10.15 points in BLEU-1 and 4.77 points in ROUGE-1. These findings suggest that EchoVLM has substantial potential to enhance diagnostic accuracy in ultrasound imaging, providing a viable technical solution for future clinical applications. Source code and model weights are available at https://github.com/Asunatan/EchoVLM.
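
The abstract reports its gains in BLEU-1 and ROUGE-1, both of which measure unigram overlap between a generated report and a reference report. The sketch below shows one simplified way such scores can be computed. It assumes whitespace tokenization, a single reference, BLEU-1 as clipped unigram precision with a brevity penalty, and ROUGE-1 as a unigram F1 score. The abstract does not say which tooling or variant the authors used, so this is an illustration of the metrics and not their evaluation code. The function names and the example sentences are hypothetical.

```python
import math
from collections import Counter


def unigram_overlap(candidate: list[str], reference: list[str]) -> int:
    """Count candidate tokens matched in the reference, clipped per token."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    return sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())


def bleu1(candidate: str, reference: str) -> float:
    """Clipped unigram precision times the BLEU brevity penalty (0 to 1)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precision = unigram_overlap(cand, ref) / len(cand)
    # Penalize candidates that are shorter than the reference.
    penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * precision


def rouge1_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall (0 to 1)."""
    cand, ref = candidate.split(), reference.split()
    overlap = unigram_overlap(cand, ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up toy sentences, not taken from the paper's data.
reference = "hypoechoic nodule in the left thyroid lobe"
candidate = "hypoechoic nodule in the thyroid"
print(f"BLEU-1:  {100 * bleu1(candidate, reference):.2f}")
print(f"ROUGE-1: {100 * rouge1_f1(candidate, reference):.2f}")
```

The scores are multiplied by 100 when printed on the assumption that the reported "points" refer to a 0 to 100 scale, which is a common convention but is not stated in the abstract.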