arxiv_cs_cv 2026年2月10日

RealSR-R1: 画像修復における現実世界向け画像超解像度に対するリニアクトルーニングとビジョン言語思考の連鎖

RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought

Translated: 2026/3/15 7:01:46

realesr-r1image-super-resolutionreinforcement-learningchain-of-thoughtvision-language-models

Japanese Translation

arXiv:2506.16796v3 Announce Type: replace 摘要：現実世界画像超解像度は画像修復において最も困難な課題の一つです。しかし、既存の方法は変形された画像の内容を正確に理解できず、再生成された結果は低画質で不自然になりがちです。本研究では、現実世界画像超解像度における理解および推論能力を強化するための RealSR-R1 を提唱します。大規模言語モデル（LLM）における思考の連鎖（Chain of Thought, CoT）の成功に着想を得て、私たちは変形された画像を処理する人間のプロセスを模倣し、視覚と言語の推論を統合する VLCoT フレームワークを提案します。VLCoT は、より包括的なテキストとより高画質の画像を順次生成することで、画像の詳細を正確に修復することを目指しています。従来の监督学習 CoT が現実世界のシナリオに汎用できないという課題に対処するため、Real-World Image Super-Resolution タスクにおいて初めてグループ相対政策最適化（GRPO）を導入しました。我々は VLCoT-GRPO を提案し、4 つの報酬関数を設計しました：(1) 形式報酬、CoT プロセスの標準化に使用；(2) 変形報酬、正確な変形推測を促進するために；(3) 理解報酬、生成されたコンテンツの精度を保証するために；(4) 生成報酬、ここで我々は視覚専門モデルを使用して生成された画像の品質を評価し、より現実的な画像の生成を促すために使用しました。大規模な実験により、我々が提案した RealSR-R1 が現実的な詳細を生成し、画像内容を正確に理解できることを示唆しました。特に、文脈が豊かであるシナリオや変形が激しい画像において、その能力が確認できました。

Original Content

arXiv:2506.16796v3 Announce Type: replace Abstract: Real-World Image Super-Resolution is one of the most challenging task in image restoration. However, existing methods struggle with an accurate understanding of degraded image content, leading to reconstructed results that are both low-fidelity and unnatural. We present RealSR-R1 in this work, which empowers the RealSR models with understanding and reasoning capabilities. Inspired by the success of Chain of Thought (CoT) in large language models (LLMs), we simulate the human process of handling degraded images and propose the VLCoT framework, which integrates vision and language reasoning. The framework aims to precisely restore image details by progressively generating more comprehensive text and higher-resolution images. To overcome the challenge of traditional supervised learning CoT failing to generalize to real-world scenarios, we introduce, for the first time, Group Relative Policy Optimization (GRPO) into the Real-World Image Super-Resolution task. We propose VLCoT-GRPO as a solution, which designs four reward functions: (1) Format reward, used to standardize the CoT process; (2) Degradation reward, to incentivize accurate degradation estimation; (3) Understanding reward, to ensure the accuracy of the generated content; and (4) Generation reward, where we propose using a visual expert model to evaluate the quality of generated images, encouraging the model to generate more realistic images. Extensive experiments demonstrate that our proposed RealSR-R1 can generate realistic details and accurately understand image content, particularly in semantically rich scenes or images with severe degradation.