arxiv_cs_lg February 10, 2026

Reducing Aleatoric and Epistemic Uncertainty through Multi-modal Data Acquisition

Translated: 2026/3/15 9:03:33

multi-modal-data, uncertainty-quantification, active-learning, artificial-intelligence, data-acquisition

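Japanese Translation (rendered in English)

arXiv:2501.18268v2 Announce Type: replace

Abstract: Modern AI systems need to combine data from multiple modalities, such as text, images, audio, spreadsheets, and time series, to produce accurate and reliable predictions. Multi-modal data brings new opportunities and challenges for disentangling uncertainty. The machine learning community has generally assumed that epistemic uncertainty can be reduced by collecting more data, whereas aleatoric uncertainty is irreducible; in modern AI systems, however, this assumption is called into question when information is obtained from different modalities. This paper proposes an innovative data acquisition framework in which uncertainty disentanglement leads to actionable decisions, with sampling in two directions: sample size and data modality. The main hypothesis is that aleatoric uncertainty decreases as the number of modalities increases, while epistemic uncertainty decreases as more observations are collected. We carried out proofs of concept on two multi-modal datasets to demonstrate the data acquisition framework, which combines ideas from active learning, active feature acquisition, and uncertainty quantification.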

Original Content

arXiv:2501.18268v2 Announce Type: replace

Abstract: To generate accurate and reliable predictions, modern AI systems need to combine data from multiple modalities, such as text, images, audio, spreadsheets, and time series. Multi-modal data introduces new opportunities and challenges for disentangling uncertainty: it is commonly assumed in the machine learning community that epistemic uncertainty can be reduced by collecting more data, while aleatoric uncertainty is irreducible. However, this assumption is challenged in modern AI systems when information is obtained from different modalities. This paper introduces an innovative data acquisition framework where uncertainty disentanglement leads to actionable decisions, allowing sampling in two directions: sample size and data modality. The main hypothesis is that aleatoric uncertainty decreases as the number of modalities increases, while epistemic uncertainty decreases by collecting more observations. We provide proof-of-concept implementations on two multi-modal datasets to showcase our data acquisition framework, which combines ideas from active learning, active feature acquisition, and uncertainty quantification.
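
The abstract does not say how the uncertainty is disentangled or how the acquisition decision is made, so the following is only an illustrative sketch and not the authors' implementation. It assumes an ensemble of classifiers and uses the common entropy-based split: total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the individual members, and the epistemic part is the difference. The names `decompose` and `suggest_action`, the action labels, and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose(member_probs: List[Sequence[float]]) -> Tuple[float, float, float]:
    """Split the predictive uncertainty for a single input.

    member_probs holds one class-probability vector per ensemble member.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = mean entropy of the individual members,
      epistemic = total - aleatoric (disagreement between members).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / n
    # Clamp tiny negative values caused by floating-point rounding.
    epistemic = max(total - aleatoric, 0.0)
    return total, aleatoric, epistemic


def suggest_action(aleatoric: float, epistemic: float,
                   threshold: float = 0.1) -> str:
    """Hypothetical decision rule following the abstract's hypothesis.

    Dominant epistemic uncertainty -> collect more observations.
    Dominant aleatoric uncertainty -> acquire an additional modality.
    The threshold (in nats) is an arbitrary illustrative value.
    """
    if max(aleatoric, epistemic) < threshold:
        return "stop"
    if epistemic >= aleatoric:
        return "acquire_samples"
    return "acquire_modality"


if __name__ == "__main__":
    # Members agree that the input is ambiguous: aleatoric dominates.
    _, a, e = decompose([[0.5, 0.5], [0.5, 0.5]])
    print(suggest_action(a, e))
    # Members are individually confident but disagree: epistemic dominates.
    _, a, e = decompose([[0.99, 0.01], [0.01, 0.99]])
    print(suggest_action(a, e))
```

In this sketch the two sampling directions named in the abstract map onto the two non-stop actions. A real system would also have to weigh acquisition cost and choose which modality or which observations to request, which the abstract does not specify.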