arxiv_cs_cv February 10, 2026

Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment

image-quality-assessment, reinforcement-learning, multimodal-learning, visual-policy-optimization, machine-learning

Original Content

arXiv:2601.22920v2 Announce Type: replace

Abstract: Image Quality Assessment (IQA) predicts perceptual quality scores consistent with human judgments. Recent RL-based IQA methods built on MLLMs focus on generating visual quality descriptions and scores, ignoring two key reliability limitations: (i) although the model's prediction stability varies significantly across training samples, existing GRPO-based methods apply uniform advantage weighting, thereby amplifying noisy signals from unstable samples in gradient updates; (ii) most works emphasize text-grounded reasoning over images while overlooking the model's visual perception ability of image content. In this paper, we propose Q-Hawkeye, an RL-based reliable visual policy optimization framework that redesigns the learning signal through unified Uncertainty-Aware Dynamic Optimization and Perception-Aware Optimization. Q-Hawkeye estimates predictive uncertainty using the variance of predicted scores across multiple rollouts and leverages this uncertainty to reweight each sample's update strength, stabilizing policy optimization. To strengthen perceptual reliability, we construct paired inputs of degraded images and their original images and introduce an Implicit Perception Loss that constrains the model to ground its quality judgments in genuine visual evidence. Extensive experiments demonstrate that Q-Hawkeye outperforms state-of-the-art methods and generalizes better across multiple datasets. The code and models will be made available.
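The uncertainty-aware reweighting described in the abstract can be illustrated with a small sketch. The abstract states only that the variance of predicted scores across rollouts is used to reweight each sample's update strength; it does not give the formula. Everything specific below is therefore an assumption made for illustration: the exponential weight `exp(-variance / temperature)`, the `temperature` parameter, the negative-absolute-error reward, and all function names. The group-normalized advantage follows the usual GRPO pattern of subtracting the group mean and dividing by the group standard deviation.

```python
import math
from statistics import fmean, pstdev, pvariance


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one sample's rollout group."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def uncertainty_weight(scores, temperature=1.0):
    """Map rollout score variance to a weight in (0, 1].

    Hypothetical weighting: the exponential form and `temperature`
    are illustrative assumptions, not taken from the paper.
    """
    return math.exp(-pvariance(scores) / temperature)


def reweighted_advantages(scores, mos, temperature=1.0):
    """Scale one sample's group advantages by its uncertainty weight.

    `mos` is the human mean opinion score for the image. The reward
    (negative absolute error) is an assumed stand-in for the real reward.
    """
    rewards = [-abs(s - mos) for s in scores]
    weight = uncertainty_weight(scores, temperature)
    return [weight * a for a in group_advantages(rewards)]


# Two hypothetical samples: one with consistent rollouts, one with scattered ones.
stable_scores = [3.1, 3.0, 3.2, 3.1]
unstable_scores = [1.0, 4.5, 2.0, 5.0]

stable_weight = uncertainty_weight(stable_scores)
unstable_weight = uncertainty_weight(unstable_scores)
stable_adv = reweighted_advantages(stable_scores, mos=3.0)
unstable_adv = reweighted_advantages(unstable_scores, mos=3.0)
```

In this sketch the sample with scattered rollout scores receives a much smaller weight than the consistent one, so its advantages contribute less to the gradient update. Uniform advantage weighting corresponds to fixing the weight at 1 for every sample.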