arxiv_cs_cv April 20, 2026

Intelligent Healthcare Imaging Platform: A VLM-Based Framework for Automated Medical Image Analysis and Clinical Report Generation


Translated: 2026/4/20 10:52:08

healthcare-imaging, vision-language-models, google-gemini, medical-diagnosis, clinical-automation

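Translation

arXiv:2509.13590v3 Announce Type: replace-cross

Abstract: As artificial intelligence (AI) advances rapidly in healthcare imaging, diagnostic medicine and clinical decision-making processes are being transformed. This paper presents an intelligent multimodal framework that leverages Vision-Language Models (VLMs) in healthcare diagnostics. The framework integrates Google Gemini 2.5 Flash to provide automated tumor detection and clinical report generation across multiple imaging modalities, including CT, MRI, X-ray, and ultrasound. The system combines visual feature extraction with natural language processing to enable contextual image interpretation, and it incorporates coordinate verification mechanisms and probabilistic Gaussian modeling of anomaly distribution. Multi-layered visualization techniques generate detailed medical illustrations, overlay comparisons, and statistical representations to strengthen clinical confidence; location measurement showed an average deviation of 80 pixels. Result processing applies precise prompt engineering and textual analysis to extract structured clinical information while preserving interpretability. Experimental evaluations showed high anomaly-detection performance across multiple modalities. The system features a user-friendly Gradio interface for clinical workflow integration and also demonstrates zero-shot learning capabilities that reduce dependence on large datasets. The framework represents a significant advance in automated diagnostic support and radiological workflow efficiency, but clinical validation and multi-center evaluation are needed before widespread adoption.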

Original Content

arXiv:2509.13590v3 Announce Type: replace-cross

Abstract: The rapid advancement of artificial intelligence (AI) in healthcare imaging has revolutionized diagnostic medicine and clinical decision-making processes. This work presents an intelligent multimodal framework for medical image analysis that leverages Vision-Language Models (VLMs) in healthcare diagnostics. The framework integrates Google Gemini 2.5 Flash for automated tumor detection and clinical report generation across multiple imaging modalities including CT, MRI, X-ray, and Ultrasound. The system combines visual feature extraction with natural language processing to enable contextual image interpretation, incorporating coordinate verification mechanisms and probabilistic Gaussian modeling for anomaly distribution. Multi-layered visualization techniques generate detailed medical illustrations, overlay comparisons, and statistical representations to enhance clinical confidence, with location measurement achieving 80 pixels average deviation. Result processing utilizes precise prompt engineering and textual analysis to extract structured clinical information while maintaining interpretability. Experimental evaluations demonstrated high performance in anomaly detection across multiple modalities. The system features a user-friendly Gradio interface for clinical workflow integration and demonstrates zero-shot learning capabilities to reduce dependence on large datasets. This framework represents a significant advancement in automated diagnostic support and radiological workflow efficiency, though clinical validation and multi-center evaluation are necessary prior to widespread adoption.
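The abstract names coordinate verification, Gaussian modeling of anomaly distribution, and an average pixel deviation, but it does not say how any of them is computed. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. The function names, the clamping strategy, the isotropic Gaussian, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


def verify_coordinates(x, y, width, height):
    """Assumed check: clamp a predicted point to the image bounds.

    Returns the clamped point and whether the input was already in bounds.
    """
    in_bounds = 0 <= x < width and 0 <= y < height
    cx = min(max(x, 0), width - 1)
    cy = min(max(y, 0), height - 1)
    return cx, cy, in_bounds


def gaussian_heatmap(cx, cy, width, height, sigma):
    """Assumed model: isotropic 2D Gaussian centred on (cx, cy), peak 1.0.

    Returned as a list of rows, so the value at (x, y) is heat[y][x].
    """
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / two_sigma_sq)
         for x in range(width)]
        for y in range(height)
    ]


def mean_pixel_deviation(predicted, reference):
    """Average Euclidean distance, in pixels, between paired points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty point lists of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    return sum(distances) / len(distances)


# Hypothetical usage on a small 64x64 image.
x, y, ok = verify_coordinates(70, -4, 64, 64)      # out-of-bounds prediction
heat = gaussian_heatmap(x, y, 64, 64, sigma=5.0)   # heat[y][x] is the peak
deviation = mean_pixel_deviation([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```

Under this reading, the reported figure of 80 pixels would be the value `mean_pixel_deviation` returns over a set of predicted and reference locations. The abstract does not state the image resolution or the reference annotations, so the figure cannot be turned into a relative error from the text alone.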