arxiv_cs_cv — February 10, 2026


CAF-Mamba: Mamba-Based Cross-Modal Adaptive Attention Fusion for Multimodal Depression Detection

Translated: 2026/3/15 16:07:35
Tags: mamba, multimodal-learning, depression-detection, cross-modal-attention, arxiv

Japanese Translation

Paper: arXiv:2601.21648v2 Announce type: replace Abstract: Depression is a prevalent mental health disorder that severely impairs daily functioning and quality of life. While recent deep learning approaches to depression detection have shown promise, most rely on a limited set of feature types, overlook explicit cross-modal interactions, and use simple concatenation or static weighting for fusion. To address these limitations, we propose CAF-Mamba, a novel Mamba-based cross-modal adaptive attention fusion framework. CAF-Mamba captures cross-modal interactions both explicitly and implicitly, and dynamically adjusts each modality's contribution through a modality-wise attention mechanism, enabling more effective multimodal fusion. Experiments on two in-the-wild benchmark datasets, LMVD and D-Vlog, show that CAF-Mamba consistently outperforms existing methods and achieves state-of-the-art performance. Our code is available at https://github.com/zbw-zhou/CAF-Mamba

Original Content

arXiv:2601.21648v2 Announce Type: replace Abstract: Depression is a prevalent mental health disorder that severely impairs daily functioning and quality of life. While recent deep learning approaches for depression detection have shown promise, most rely on limited feature types, overlook explicit cross-modal interactions, and employ simple concatenation or static weighting for fusion. To overcome these limitations, we propose CAF-Mamba, a novel Mamba-based cross-modal adaptive attention fusion framework. CAF-Mamba not only captures cross-modal interactions explicitly and implicitly, but also dynamically adjusts modality contributions through a modality-wise attention mechanism, enabling more effective multimodal fusion. Experiments on two in-the-wild benchmark datasets, LMVD and D-Vlog, demonstrate that CAF-Mamba consistently outperforms existing methods and achieves state-of-the-art performance. Our code is available at https://github.com/zbw-zhou/CAF-Mamba.
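The abstract's "modality-wise attention mechanism" — weighting each modality's contribution before fusion — can be illustrated generically: give each modality feature vector a softmax-normalized scalar weight and take the weighted sum. The sketch below is a hypothetical illustration under that reading, not the CAF-Mamba implementation; in the paper the weights would come from a learned scoring head, which is abstracted here into precomputed `scores`.

```python
import math

def modality_attention_fusion(feats, scores):
    """Hypothetical sketch of modality-wise attention fusion
    (not the authors' code).

    feats:  list of M feature vectors, one per modality (equal length)
    scores: list of M raw relevance scores (stand-in for a learned head)
    Returns (fused_vector, modality_weights).
    """
    # Numerically stable softmax over the modality scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]

    # Fused representation: weight-averaged modality features
    dim = len(feats[0])
    fused = [sum(weights[i] * feats[i][d] for i in range(len(feats)))
             for d in range(dim)]
    return fused, weights

# Usage: fuse two modality vectors; equal scores reduce to a plain mean
fused, w = modality_attention_fusion([[1.0, 3.0], [3.0, 1.0]], [0.0, 0.0])
print(fused, w)  # [2.0, 2.0] [0.5, 0.5]
```

Because the weights are recomputed per sample from the scores, the fusion adapts to which modality is most informative for each input, unlike static weighting or plain concatenation.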