arxiv_cs_cv April 24, 2026

A Probabilistic Framework for Improving Dense Object Detection in Underwater Image Data via Annealing-Based Data Augmentation

data-augmentation, object-detection, underwater-vision, simulated-annealing, image-processing

Original Content

arXiv:2604.21198v1 (announce type: new)

Abstract: Object detection models typically perform well on images captured in controlled environments with stable lighting, water clarity, and viewpoint, but their performance degrades substantially in real-world underwater settings characterized by high variability and frequent occlusions. In this work, we address these challenges by introducing a novel data augmentation framework designed to improve robustness in dense and unconstrained underwater scenes. Using the DeepFish dataset, which contains images of fish in natural environments, we first generate bounding box annotations from provided segmentation masks to construct a custom detection dataset. We then propose a pseudo-simulated annealing-based augmentation algorithm, inspired by the copy-paste strategy of Deng et al. [1], to synthesize realistic crowded fish scenarios. Our approach improves spatial diversity and object density during training, enabling better generalization to complex scenes. Experimental results show that our method significantly outperforms a baseline YOLOv10 model, particularly on a challenging test set of manually annotated images collected from live-stream footage in the Florida Keys. These results demonstrate the effectiveness of our augmentation strategy for improving detection performance in dense, real-world underwater environments.
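
The two steps named in the abstract, deriving boxes from segmentation masks and placing pasted objects with an annealing-style search, can be illustrated with a minimal sketch. This is not the authors' implementation, since the abstract does not specify the details. All function names here are hypothetical. The sketch assumes each mask contains a single fish instance, and it assumes an energy function equal to the total pairwise overlap area, a uniform re-placement move, and a geometric cooling schedule. The paper's actual energy, moves, and schedule may differ, for example to encourage rather than suppress occlusion.

```python
import math
import random


def mask_to_bbox(mask):
    """Tight box (x_min, y_min, x_max, y_max), inclusive, around the nonzero
    pixels of a 2D mask given as a list of rows. Returns None if the mask is empty.
    Assumption: the mask holds exactly one object instance."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, v in enumerate(row) if v]
    return (min(cols), rows[0], max(cols), rows[-1])


def bbox_to_yolo(bbox, img_w, img_h):
    """Convert an inclusive pixel box to normalized (cx, cy, w, h)."""
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0 + 1, y1 - y0 + 1
    return ((x0 + w / 2) / img_w, (y0 + h / 2) / img_h, w / img_w, h / img_h)


def overlap_area(a, b):
    """Intersection area of two boxes given as (x, y, w, h), top-left origin."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return dx * dy if dx > 0 and dy > 0 else 0


def energy(boxes):
    """Hypothetical energy: total pairwise overlap area (lower is better)."""
    return sum(overlap_area(boxes[i], boxes[j])
               for i in range(len(boxes)) for j in range(i + 1, len(boxes)))


def anneal_layout(sizes, canvas_w, canvas_h, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Choose paste positions for objects of the given (w, h) sizes.
    Each object must fit inside the canvas. Returns (boxes, energy)."""
    if not sizes:
        return [], 0
    rng = random.Random(seed)

    def rand_box(w, h):
        return (rng.randint(0, canvas_w - w), rng.randint(0, canvas_h - h), w, h)

    boxes = [rand_box(w, h) for w, h in sizes]
    cur = energy(boxes)
    best, best_e = list(boxes), cur
    t = t0
    for _ in range(steps):
        # Propose moving one randomly chosen object to a new random position.
        i = rng.randrange(len(boxes))
        cand = list(boxes)
        cand[i] = rand_box(*sizes[i])
        e = energy(cand)
        # Metropolis rule: always accept improvements, and accept worse
        # layouts with a probability that shrinks as the temperature drops.
        if e <= cur or rng.random() < math.exp((cur - e) / t):
            boxes, cur = cand, e
            if cur < best_e:
                best, best_e = list(boxes), cur
        t *= cooling  # geometric cooling schedule
    return best, best_e
```

In use, `mask_to_bbox` followed by `bbox_to_yolo` would yield one label line per source mask, and `anneal_layout([(w, h), ...], canvas_w, canvas_h)` would return candidate paste positions for cropped objects on a background image. Swapping in a different `energy` function changes what kind of crowding the search favors.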