arxiv_cs_ai April 24, 2026


FairQE: Multi-Agent Framework for Mitigating Gender Bias in Translation Quality Estimation

Translated: 2026/4/24 20:16:51

fairqe, quality-estimation, gender-bias, multi-agent, machine-translation


Original Content

arXiv:2604.21420v1 (Announce Type: new)

Abstract: Quality Estimation (QE) aims to assess machine translation quality without reference translations, but recent studies have shown that existing QE models exhibit systematic gender bias. In particular, they tend to favor masculine realizations in gender-ambiguous contexts and may assign higher scores to gender-misaligned translations even when gender is explicitly specified. To address these issues, we propose FairQE, a multi-agent-based, fairness-aware QE framework that mitigates gender bias in both gender-ambiguous and gender-explicit scenarios. FairQE detects gender cues, generates gender-flipped translation variants, and combines conventional QE scores with LLM-based bias-mitigating reasoning through a dynamic bias-aware aggregation mechanism. This design preserves the strengths of existing QE models while calibrating their gender-related biases in a plug-and-play manner. Extensive experiments across multiple gender bias evaluation settings demonstrate that FairQE consistently improves gender fairness over strong QE baselines. Moreover, under MQM-based meta-evaluation following the WMT 2023 Metrics Shared Task, FairQE achieves competitive or improved general QE performance. These results show that gender bias in QE can be effectively mitigated without sacrificing evaluation accuracy, enabling fairer and more reliable translation evaluation.
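The abstract names three steps (detecting gender cues, generating gender-flipped translation variants, and a dynamic bias-aware aggregation of QE scores with LLM-based reasoning) but does not give their exact form. The Python sketch below is a hypothetical toy illustration of how such pieces could fit together; it is not the authors' implementation. The word lists, `flip_gender`, `rule_judge` (a rule-based stand-in for the LLM reasoning agent), the toy scorer `biased_qe`, and the weighting controlled by `tau` and `w_min` are all assumptions made for illustration.

```python
from typing import Callable, Set

# Hypothetical toy cue lexicons: a few English target words plus the Turkish
# source words "adam"/"erkek" (man) and "kadın" (woman). A real system would
# use an LLM agent or a morphological analyser rather than word lists.
MASC = {"he", "him", "his", "himself", "man", "adam", "erkek"}
FEM = {"she", "her", "hers", "herself", "woman", "kadın"}
# One-to-one swaps only; "his"/"her" are skipped because "her" is ambiguous.
SWAP = {"he": "she", "she": "he", "himself": "herself", "herself": "himself",
        "man": "woman", "woman": "man"}

Scorer = Callable[[str, str], float]  # (source, hypothesis) -> score in [0, 1]


def gender_cues(text: str) -> Set[str]:
    """Return which genders the text marks explicitly (toy word lookup)."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    cues = set()
    if words & MASC:
        cues.add("masculine")
    if words & FEM:
        cues.add("feminine")
    return cues


def flip_gender(hypothesis: str) -> str:
    """Build the gender-flipped variant by swapping words listed in SWAP."""
    out = []
    for word in hypothesis.split():
        core = word.strip(".,;:!?")
        swapped = SWAP.get(core.lower())
        if swapped is None:
            out.append(word)
            continue
        if core[:1].isupper():
            swapped = swapped.capitalize()
        out.append(word.replace(core, swapped, 1))  # keeps punctuation
    return " ".join(out)


def rule_judge(source: str, hypothesis: str) -> float:
    """Hypothetical stand-in for the LLM agent: 1.0 if genders agree, else 0.0."""
    return 1.0 if gender_cues(hypothesis) == gender_cues(source) else 0.0


def fair_score(source: str, hypothesis: str, qe: Scorer,
               judge: Scorer = rule_judge, tau: float = 0.2,
               w_min: float = 0.2) -> float:
    """Hypothetical bias-aware aggregation of a QE score and a judge score."""
    flipped = flip_gender(hypothesis)
    s_orig = qe(source, hypothesis)
    if flipped == hypothesis:
        return s_orig  # nothing swappable, so the raw QE score is kept
    s_flip = qe(source, flipped)
    if len(gender_cues(source)) != 1:
        # Ambiguous (or mixed) source: masculine and feminine renderings are
        # equally valid, so both variants receive the same symmetric score.
        return (s_orig + s_flip) / 2
    # Explicit source: the larger the QE gap between the two variants (a sign
    # of gender bias), the more weight the judge's alignment verdict gets.
    w = w_min + (1 - w_min) * min(1.0, abs(s_orig - s_flip) / tau)
    return (1 - w) * s_orig + w * judge(source, hypothesis)


def biased_qe(source: str, hypothesis: str) -> float:
    """Toy QE model that, like the baselines described, favors masculine forms."""
    return 0.7 + (0.1 if "masculine" in gender_cues(hypothesis) else 0.0)


if __name__ == "__main__":
    ambiguous = "O bir doktor."  # Turkish "o" is a gender-neutral pronoun
    for hyp in ("He is a doctor.", "She is a doctor."):
        print(hyp, round(biased_qe(ambiguous, hyp), 2),
              round(fair_score(ambiguous, hyp, biased_qe), 2))
    explicit = "O kadın bir doktor."  # "That woman is a doctor."
    for hyp in ("The woman is a doctor.", "The man is a doctor."):
        print(hyp, round(biased_qe(explicit, hyp), 2),
              round(fair_score(explicit, hyp, biased_qe), 2))
```

With the toy masculine-favoring scorer, the raw QE score prefers "He is a doctor." for the gender-neutral Turkish source, while `fair_score` gives both renderings the same score; for the explicitly feminine source, the misaligned "The man is a doctor." ends up below the aligned translation even though the raw score ranks it higher. Symmetric averaging for ambiguous sources is only one simple choice made for this sketch; the paper's aggregation may weigh the agents differently.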