arxiv_cs_cv February 10, 2026

ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Translated: 2026/3/15 4:02:35

image-generation, diffusion-models, retrieval-augmented-generation, image-rag, visual-synthesis

Original Content

arXiv:2502.09411v2 Announce Type: replace

Abstract: Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) with image generation models. We propose ImageRAG, a method that dynamically retrieves relevant images based on a given text prompt and uses them as context to guide the generation process. Prior approaches that used retrieved images to improve generation trained models specifically for retrieval-based generation. In contrast, ImageRAG leverages the capabilities of existing image conditioning models and does not require RAG-specific training. Our approach is highly adaptable and can be applied across different model types, showing significant improvement in generating rare and fine-grained concepts using different base models. Our project page is available at: https://rotem-shalev.github.io/ImageRAG
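
The retrieve-then-condition idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how retrieval is performed, so the sketch assumes images and prompts share an embedding space and ranks images by cosine similarity. The names `retrieve_references`, `generate_with_references`, `TOY_INDEX`, and `fake_generator` are hypothetical, the embeddings are hand-written toy vectors, and the generator is a stand-in for any existing image-conditioned model.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_references(
    prompt_embedding: Vector, image_index: Dict[str, Vector], k: int = 2
) -> List[str]:
    """Return the ids of the k images most similar to the prompt embedding.

    Assumption: similarity search over precomputed embeddings stands in for
    whatever retrieval mechanism the actual method uses.
    """
    ranked = sorted(
        image_index.items(),
        key=lambda item: cosine_similarity(prompt_embedding, item[1]),
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:k]]


def generate_with_references(
    prompt: str,
    prompt_embedding: Vector,
    image_index: Dict[str, Vector],
    generator: Callable[[str, List[str]], str],
    k: int = 2,
) -> str:
    """Retrieve references at inference time and pass them to the generator.

    No model is trained here: the generator is used as-is and only receives
    extra image context alongside the text prompt.
    """
    references = retrieve_references(prompt_embedding, image_index, k)
    return generator(prompt, references)


# Toy index: image id -> hand-written 3-dimensional embedding (illustrative only).
TOY_INDEX: Dict[str, Vector] = {
    "okapi_01.jpg": [0.9, 0.1, 0.0],
    "okapi_02.jpg": [0.8, 0.2, 0.1],
    "teapot_01.jpg": [0.0, 0.1, 0.9],
}


def fake_generator(prompt: str, references: List[str]) -> str:
    """Hypothetical stand-in for an image-conditioned generation model."""
    return f"{prompt} | refs: {', '.join(references)}"


if __name__ == "__main__":
    print(
        generate_with_references(
            "an okapi in a forest", [1.0, 0.0, 0.0], TOY_INDEX, fake_generator
        )
    )
```

Because retrieval happens per prompt at inference time, the same wrapper could in principle sit in front of different base models, which is the adaptability the abstract emphasizes.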