arxiv_cs_ai April 24, 2026

Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data

Translated: 2026/4/24 20:32:43

active-learning, multimodal-learning, deep-learning, annotation-cost, data-science

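Translation (rendered in English)

arXiv:2510.03247v2 Announce Type: replace-cross Abstract: Active learning (AL) is a principled strategy for reducing annotation cost in data-hungry deep learning, but existing AL algorithms focus almost exclusively on unimodal data and overlook the enormous annotation burden in multimodal learning. We propose the first framework for multimodal active learning with data that is not pre-aligned. Here the learner must actively acquire cross-modal alignments rather than labels for already-paired data. This setting captures a practical bottleneck in modern multimodal pipelines, where unimodal features are readily available but high-quality alignments are costly. We develop a new algorithm that combines the principles of uncertainty and diversity in a modality-aware design, achieving linear-time acquisition while applying smoothly to both pool-based and streaming settings. Large-scale experiments on benchmark datasets show that our approach consistently reduces multimodal annotation cost while maintaining performance. For example, on the ColorSwap dataset it reduced annotation requirements by up to 40% with no loss in accuracy.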

Original Content

arXiv:2510.03247v2 Announce Type: replace-cross Abstract: Active learning (AL) is a principled strategy to reduce annotation cost in data-hungry deep learning. However, existing AL algorithms focus almost exclusively on unimodal data, overlooking the substantial annotation burden in multimodal learning. We introduce the first framework for multimodal active learning with unaligned data, where the learner must actively acquire cross-modal alignments rather than labels on pre-aligned pairs. This setting captures the practical bottleneck in modern multimodal pipelines, where unimodal features are easy to obtain but high-quality alignment is costly. We develop a new algorithm that combines uncertainty and diversity principles in a modality-aware design, achieves linear-time acquisition, and applies seamlessly to both pool-based and streaming-based settings. Extensive experiments on benchmark datasets demonstrate that our approach consistently reduces multimodal annotation cost while preserving performance; for instance, on the ColorSwap dataset it cuts annotation requirements by up to 40% without loss in accuracy.
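
The abstract describes an acquisition rule that combines uncertainty and diversity when choosing which cross-modal alignments to request. The sketch below illustrates that general idea only; it is not the paper's algorithm and makes no claim to its linear-time guarantee. The function names (`alignment_uncertainty`, `select_queries`), the margin-based uncertainty score, and the random-anchor bucketing used for diversity are all assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_uncertainty(item, candidates):
    """Hypothetical score: 1 - (top1 - top2) similarity margin.

    High when the best and second-best matches in the other modality
    are nearly tied, i.e. the alignment is ambiguous.
    """
    sims = sorted((cosine(item, c) for c in candidates), reverse=True)
    if len(sims) < 2:
        return 0.0
    return 1.0 - (sims[0] - sims[1])


def select_queries(items, candidates, budget, n_buckets=4, seed=0):
    """Pick up to `budget` item indices whose alignment should be annotated.

    Diversity (assumed scheme): each item is assigned to the nearest of
    `n_buckets` randomly chosen anchor items in a single pass.
    Uncertainty: items inside each bucket are ranked by margin uncertainty.
    Buckets are then visited round-robin until the budget is spent.
    """
    rng = random.Random(seed)
    anchors = rng.sample(range(len(items)), min(n_buckets, len(items)))
    buckets = {a: [] for a in anchors}
    for i, item in enumerate(items):
        nearest = max(anchors, key=lambda a: cosine(item, items[a]))
        buckets[nearest].append((alignment_uncertainty(item, candidates), i))
    for members in buckets.values():
        members.sort(reverse=True)  # most uncertain first

    chosen = []
    target = min(budget, len(items))
    while len(chosen) < target:
        for members in buckets.values():
            if members and len(chosen) < target:
                chosen.append(members.pop(0)[1])
    return chosen


if __name__ == "__main__":
    # Toy embeddings: `items` from one modality, `candidates` from the other.
    items = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [0.7, 0.7]]
    candidates = [[1, 0], [0, 1]]
    print(select_queries(items, candidates, budget=3, n_buckets=2))
```

In a pool-based setting the function would be called once per round on the remaining unaligned items. A streaming variant would instead need to threshold the uncertainty score as each item arrives, which this sketch does not cover.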