arxiv_cs_cv April 24, 2026

Surgical Vision with Federated Learning for Appendicitis Diagnosis: Results of the FedSurg EndoVis 2024 Challenge

Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge

Translated: 2026/4/24 19:49:58

federated-learning, surgical-ai, appendicitis, computer-vision, privacy-preserving

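Translation (from the Japanese)

arXiv:2510.04772v2 Announce Type: replace Abstract: Developing generalizable surgical AI requires multi-center data, but patient privacy constraints rule out direct data sharing, which makes Federated Learning (FL) a natural candidate solution. Applying FL to complex spatiotemporal surgical video data has not yet been adequately benchmarked. This work presents the FedSurg Challenge, the first international benchmarking study of FL for surgical vision, run as a proof of concept on a multi-center laparoscopic appendectomy dataset (a preliminary subset of Appendix300). Three submissions were evaluated on generalization to an unseen center and on center-specific adaptation. Centralized and Swarm Learning baselines separate the contributions of task difficulty and decentralization to the observed performance. Even when all data were pooled centrally, the F1-score on the unseen center was only 26.31%, and decentralized training added a further, separable performance penalty. Temporal modeling is the dominant architectural factor: video-level spatiotemporal models consistently outperformed frame-level approaches regardless of aggregation strategy. Naive local fine-tuning causes classifier collapse on imbalanced local data. Structured personalized FL with parameter-efficient fine-tuning is a more principled approach to center-specific adaptation. By characterizing the limitations of current FL through rigorous statistical analysis, this paper provides a methodological reference point for robust, privacy-preserving AI systems in surgical video analysis.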

Original Content

arXiv:2510.04772v2 Announce Type: replace Abstract: Developing generalizable surgical AI requires multi-institutional data, yet patient privacy constraints preclude direct data sharing, making Federated Learning (FL) a natural candidate solution. The application of FL to complex, spatiotemporal surgical video data remains largely unbenchmarked. We present the FedSurg Challenge, the first international benchmarking initiative dedicated to FL in surgical vision, evaluated as a proof-of-concept on a multi-center laparoscopic appendectomy dataset (preliminary subset of Appendix300). Three submissions were evaluated on generalization to an unseen center and center-specific adaptation. Centralized and Swarm Learning baselines isolate the contributions of task difficulty and decentralization to observed performance. Even with all data pooled centrally, the task achieved only 26.31% F1-score on the unseen center, while decentralized training introduced an additional, separable performance penalty. Temporal modeling emerges as the dominant architectural factor: video-level spatiotemporal models consistently outperformed frame-level approaches regardless of aggregation strategy. Naive local fine-tuning leads to classifier collapse on imbalanced local data; structured personalized FL with parameter-efficient fine-tuning represents a more principled path toward center-specific adaptation. By characterizing current FL limitations through rigorous statistical analysis, this work establishes a methodological reference point for robust, privacy-preserving AI systems in surgical video analysis.
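
The classifier collapse mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: the three-class 80/15/5 label split is hypothetical, and macro-averaged F1 is assumed only for illustration, since the abstract does not say how its F1-score is averaged. The sketch shows how a model that always predicts the majority class of an imbalanced local dataset can keep a high accuracy while its macro F1 drops sharply.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced local dataset (assumption): 3 classes, 80/15/5 split.
y_true = [0] * 80 + [1] * 15 + [2] * 5

# A collapsed classifier outputs the majority class for every sample.
majority = Counter(y_true).most_common(1)[0][0]
y_collapsed = [majority] * len(y_true)

acc = accuracy(y_true, y_collapsed)
f1 = macro_f1(y_true, y_collapsed)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")  # accuracy=0.80 macro_f1=0.30
```

Under these assumptions, accuracy alone would hide the collapse, which is one reason a class-sensitive metric such as F1 is informative when local data are imbalanced.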