arxiv_cs_cv April 24, 2026

Addressing Image Authenticity When Cameras Use Generative AI

Translated: 2026/4/24 19:47:05

generative-ai, image-authenticity, computer-vision, deep-learning, image-processing

Japanese Translation

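arXiv:2604.21879v1 Announce Type: new Abstract: The ability of generative AI (GenAI) methods to alter camera images photorealistically has heightened awareness of the authenticity of images shared online. Interestingly, images captured directly by our cameras are regarded as authentic and faithful. However, as deep-learning modules are increasingly integrated into cameras' capture-time hardware, specifically the image signal processor (ISP), images output directly by our cameras may now contain hallucinated content. Hallucinated image content at capture time is usually benign (for example, enhanced edges or textures), but in certain operations, such as AI-based digital zoom or low-light image enhancement, hallucinations can change the semantics and interpretation of the image content. As a result, users may not notice that content in their own camera images is not authentic. This paper addresses the problem by enabling users to recover an 'unhallucinated' version of the camera image, so that the image content is not misinterpreted. Our approach optimizes an image-specific multi-layer perceptron (MLP) decoder together with a modality-specific encoder so that, given a camera image, the image as it was before hallucinated content was added can be recovered. The encoder and MLP are self-contained, do not require access to the camera's ISP, and can be applied to the image after capture. Furthermore, the encoder and MLP decoder require only 180 KB of storage and can easily be stored as metadata within standard image formats such as JPEG and HEIC.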

Original Content

arXiv:2604.21879v1 Announce Type: new Abstract: The ability of generative AI (GenAI) methods to photorealistically alter camera images has raised awareness about the authenticity of images shared online. Interestingly, images captured directly by our cameras are considered authentic and faithful. However, with the increasing integration of deep-learning modules into cameras' capture-time hardware -- namely, the image signal processor (ISP) -- there is now a potential for hallucinated content in images directly output by our cameras. Hallucinated capture-time image content is typically benign, such as enhanced edges or texture, but in certain operations, such as AI-based digital zoom or low-light image enhancement, hallucinations can potentially alter the semantics and interpretation of the image content. As a result, users may not realize that the content in their camera images is not authentic. This paper addresses this issue by enabling users to recover the 'unhallucinated' version of the camera image to avoid misinterpretation of the image content. Our approach works by optimizing an image-specific multi-layer perceptron (MLP) decoder together with a modality-specific encoder so that, given the camera image, we can recover the image before hallucinated content was added. The encoder and MLP are self-contained and can be applied post-capture to the image without requiring access to the camera ISP. Moreover, the encoder and MLP decoder require only 180 KB of storage and can be readily saved as metadata within standard image formats such as JPEG and HEIC.
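
To give a rough sense of what a 180 KB budget allows, the sketch below counts the parameters of a small fully connected MLP and converts that count to storage. The abstract does not state the network architecture, the numeric precision, or whether "KB" means 1000 or 1024 bytes, so the layer sizes, the 32-bit float assumption, and the 1024-byte convention here are all illustrative assumptions and not the authors' configuration.

```python
# Back-of-the-envelope storage check for a small MLP.
# Only the 180 KB figure comes from the abstract; everything else is assumed.

BUDGET_KB = 180  # storage figure quoted in the abstract


def mlp_param_count(layer_sizes):
    """Count weights plus biases of a fully connected MLP.

    layer_sizes lists the width of every layer, input and output included.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def storage_kb(n_params, bytes_per_param=4):
    """Storage in KB (assumed 1024 bytes) for n_params at a given precision."""
    return n_params * bytes_per_param / 1024


def max_params(budget_kb, bytes_per_param=4):
    """Largest parameter count that fits in budget_kb at a given precision."""
    return (budget_kb * 1024) // bytes_per_param


# Hypothetical decoder: 5 inputs (e.g. x, y, r, g, b), three hidden layers
# of width 128, and 3 outputs (r, g, b). Not taken from the paper.
hypothetical_layers = [5, 128, 128, 128, 3]

n_params = mlp_param_count(hypothetical_layers)
print(f"parameters: {n_params}")
print(f"float32 storage: {storage_kb(n_params):.1f} KB of {BUDGET_KB} KB")
print(f"max float32 parameters in budget: {max_params(BUDGET_KB)}")
```

Under these assumptions the hypothetical network fits within the budget with room left for an encoder. Storing weights at 16-bit precision (`bytes_per_param=2`) would double the number of parameters that fit.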