arxiv_cs_cv February 10, 2026

Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room

Translated: 2026/3/15 16:08:08

self-supervised, multi-view, video-anonymization, operating-room, person-detection

Original Content

arXiv:2602.02850v2
Announce Type: replace

Abstract: Privacy preservation is a prerequisite for using video data in Operating Room (OR) research. Effective anonymization relies on the exhaustive localization of every individual; even a single missed detection necessitates extensive manual correction. However, existing approaches face two critical scalability bottlenecks: (1) they usually require manual annotations at each new clinical site to reach high accuracy; (2) although multi-camera setups are widely adopted to resolve single-view ambiguity, camera calibration is typically required whenever cameras are repositioned. To address these problems, we propose a novel self-supervised multi-view video anonymization framework consisting of whole-body person detection and whole-body pose estimation, without annotation or camera calibration. Our core strategy is to enhance the single-view detector by "retrieving" false negatives using temporal and multi-view context, and by conducting self-supervised domain adaptation. We first run an off-the-shelf whole-body person detector in each view with a low score threshold to gather candidate detections. Then, we retrieve the low-score false negatives that are consistent with the high-score detections via tracking and self-supervised uncalibrated multi-view association. These recovered detections serve as pseudo labels to iteratively fine-tune the whole-body detector. Finally, we apply whole-body pose estimation to each detected person, and fine-tune the pose model using its own high-score predictions. Experiments on the 4D-OR dataset of simulated surgeries and on our dataset of real surgeries show the effectiveness of our approach, which achieves over 97% recall. Moreover, we train a real-time whole-body detector using our pseudo labels, achieving comparable performance and highlighting our method's practical applicability. Code will be available at https://github.com/CAMMA-public/OR_anonymization.
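The retrieval step described in the abstract (gather candidates with a low score threshold, then keep low-score candidates that agree with high-score detections) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract specifies tracking and self-supervised uncalibrated multi-view association, whereas the sketch below substitutes a much simpler single-view check based on box overlap (IoU) between neighboring frames. The names `Detection` and `select_pseudo_labels`, and all threshold values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    frame: int                               # frame index within one camera view
    box: tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
    score: float                             # detector confidence in [0, 1]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(dets, low_thr=0.1, high_thr=0.7, iou_thr=0.5, max_gap=1):
    """Keep high-score detections plus low-score ones that overlap a
    high-score detection in a nearby frame.

    Assumption: IoU between neighboring frames stands in for the tracking
    and multi-view association used in the paper. Thresholds are
    illustrative, not values from the paper.
    """
    # Step 1: a low threshold gathers a broad set of candidates.
    candidates = [d for d in dets if d.score >= low_thr]
    # Step 2: high-score candidates are trusted as they are.
    confident = [d for d in candidates if d.score >= high_thr]
    # Step 3: recover low-score candidates that are temporally consistent
    # with a trusted detection (likely false negatives of the detector).
    recovered = [
        d for d in candidates
        if d.score < high_thr and any(
            0 < abs(d.frame - c.frame) <= max_gap and iou(d.box, c.box) >= iou_thr
            for c in confident
        )
    ]
    return confident + recovered
```

In this sketch, the returned list plays the role of the pseudo labels that would be used to fine-tune the detector; repeating detection, retrieval, and fine-tuning would correspond to the iterative scheme the abstract mentions. A low-score box with no trusted neighbor is discarded, which is the simplest way to avoid turning false positives into training labels.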