arxiv_cs_lg April 24, 2026

Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis

Translated: 2026/4/24 20:13:00

transformer, gait-recognition, multi-modal, biometrics, human-attributes

Original Content

arXiv:2510.10417v2 (announce type: replace-cross)

Abstract: Gait recognition is an important biometric for human identification at a distance, particularly under low-resolution or unconstrained environments. Current works typically focus on either 2D representations (e.g., silhouettes and skeletons) or 3D representations (e.g., meshes and SMPLs), but relying on a single modality often fails to capture the full geometric and dynamic complexity of human walking patterns. In this paper, we propose a multi-modal and multi-task framework that combines 2D temporal silhouettes with 3D SMPL features for robust gait analysis. Beyond identification, we introduce a multitask learning strategy that jointly performs gait recognition and human attribute estimation, including age, body mass index (BMI), and gender. A unified transformer is employed to effectively fuse multi-modal gait features and better learn attribute-related representations, while preserving discriminative identity cues. Extensive experiments on the large-scale BRIAR datasets, collected under challenging conditions such as long-range distances (up to 1 km) and extreme pitch angles (up to 50°), demonstrate that our approach outperforms state-of-the-art methods in gait recognition and provides accurate human attribute estimation. These results highlight the promise of multi-modal and multitask learning for advancing gait-based human understanding in real-world scenarios.
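The abstract does not say how the recognition and attribute objectives are combined during training. One common way to train several heads jointly is a weighted sum of per-task losses, and the sketch below illustrates that idea only. The function names, the choice of loss for each head (cross-entropy for identity, squared error for age and BMI, binary cross-entropy for gender, with gender treated as a binary label), and the weights are all assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one sample from raw class scores (identity head)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def binary_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy from a raw score (gender head)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def squared_error(prediction, target):
    """Squared error for one scalar regression target (age or BMI head)."""
    return (prediction - target) ** 2


def multitask_loss(id_logits, id_label, age_pred, age, bmi_pred, bmi,
                   gender_logit, gender, weights=(1.0, 0.1, 0.1, 0.5)):
    """Weighted sum of the four per-task losses.

    The default weights are hypothetical placeholders, not values from the paper.
    Returns the total and the unweighted per-task terms.
    """
    w_id, w_age, w_bmi, w_gender = weights
    terms = {
        "identity": softmax_cross_entropy(id_logits, id_label),
        "age": squared_error(age_pred, age),
        "bmi": squared_error(bmi_pred, bmi),
        "gender": binary_cross_entropy(gender_logit, gender),
    }
    total = (w_id * terms["identity"] + w_age * terms["age"]
             + w_bmi * terms["bmi"] + w_gender * terms["gender"])
    return total, terms


if __name__ == "__main__":
    # Made-up head outputs for a single sample, for illustration only.
    total, terms = multitask_loss(
        id_logits=[2.0, 0.5, -1.0], id_label=0,
        age_pred=31.0, age=29.0,
        bmi_pred=24.5, bmi=23.0,
        gender_logit=1.2, gender=1,
    )
    print(total, terms)
```

In practice the regression targets would usually be normalized so that the age and BMI terms do not dominate the identity term, and the weights would be tuned on validation data.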