arxiv_cs_cv April 24, 2026

UAU-Net: Uncertainty-aware Representation Learning and Evidential Classification for Facial Action Unit Detection

Translated: 2026/4/24 19:41:59

uau-net, facial-action-unit, uncertainty-quantification, evidential-classification, deep-learning

Original Content

arXiv:2604.21227v1, announce type: new

Abstract: Facial action unit (AU) detection remains challenging because it involves heterogeneous, AU-specific uncertainties arising at both the representation and decision stages. Recent methods have improved discriminative feature learning, but they often treat the AU representations as deterministic, overlooking uncertainty caused by visual noise, subject-dependent appearance variations, and ambiguous inter-AU relationships, all of which can substantially degrade robustness. Meanwhile, conventional point-estimation classifiers often provide poorly calibrated confidence, producing overconfident predictions, especially under the severe label imbalance typical of AU datasets. We propose UAU-Net, an Uncertainty-aware AU detection framework that explicitly models uncertainty at both stages. At the representation stage, we introduce CV-AFE, a conditional VAE (CVAE)-based AU feature extraction module that learns probabilistic AU representations by jointly estimating feature means and variances across multiple spatio-temporal scales; conditioning on AU labels further enables CV-AFE to capture uncertainty associated with inter-AU dependencies. At the decision stage, we design AB-ENN, an Asymmetric Beta Evidential Neural Network for multi-label AU detection, which parameterizes predictive uncertainty with Beta distributions and mitigates overconfidence via an asymmetric loss tailored to highly imbalanced binary labels. Extensive experiments on BP4D and DISFA show that UAU-Net achieves strong AU detection performance, and further analyses indicate that modeling uncertainty in both representation learning and evidential prediction improves robustness and reliability.
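The two uncertainty mechanisms named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the abstract does not give the CV-AFE or AB-ENN equations, so everything below is an assumption. The function names, the "evidence + 1" mapping to Beta parameters, the vacuity formula `2 / (alpha + beta)`, and the `pos_weight` factor are hypothetical choices borrowed from common evidential deep learning practice. In particular, the class-weighted expected squared error stands in for the paper's asymmetric loss, whose actual form is not stated.

```python
import math
import random


def sample_gaussian_feature(mean, log_var, rng):
    """Draw one probabilistic feature vector via the reparameterization trick.

    Each dimension is z = mean + sigma * eps with eps ~ N(0, 1), where
    sigma = exp(0.5 * log_var). This is the usual VAE-style sampling step.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def beta_opinion(pos_evidence, neg_evidence):
    """Summarize non-negative evidence for one AU as a Beta(alpha, beta).

    Assumed mapping: alpha = positive evidence + 1, beta = negative evidence + 1.
    Returns the expected activation probability and a vacuity-style
    uncertainty that equals 1.0 when there is no evidence at all.
    """
    alpha = pos_evidence + 1.0
    beta = neg_evidence + 1.0
    strength = alpha + beta
    prob = alpha / strength
    uncertainty = 2.0 / strength
    return prob, uncertainty


def asymmetric_beta_loss(pos_evidence, neg_evidence, label, pos_weight=4.0):
    """Expected squared error under the Beta, up-weighted for active AUs.

    E[(y - p)^2] with p ~ Beta(alpha, beta) equals (y - E[p])^2 + Var[p].
    The pos_weight factor is a hypothetical stand-in for an asymmetric loss
    that compensates for rare positive labels.
    """
    alpha = pos_evidence + 1.0
    beta = neg_evidence + 1.0
    strength = alpha + beta
    mean = alpha / strength
    variance = alpha * beta / (strength * strength * (strength + 1.0))
    risk = (label - mean) ** 2 + variance
    weight = pos_weight if label == 1 else 1.0
    return weight * risk


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_gaussian_feature([0.2, -0.1], [-2.0, -1.0], rng)
    print("sampled feature:", z)
    print("no evidence:", beta_opinion(0.0, 0.0))
    print("strong positive evidence:", beta_opinion(8.0, 0.0))
    print("loss, missed positive:", asymmetric_beta_loss(0.0, 8.0, 1))
    print("loss, missed negative:", asymmetric_beta_loss(8.0, 0.0, 0))
```

In this sketch, an AU with no supporting evidence gets probability 0.5 and maximal uncertainty rather than a confident score, which is the behaviour an evidential head is meant to provide. A missed positive is penalized more heavily than a missed negative with the same amount of evidence, which is one simple way to counter label imbalance.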