arxiv_cs_lg April 20, 2026

Concept Drift Guided LayerNorm Tuning for Efficient Multimodal Metaphor Identification

concept-drift, layernorm, multimodal, metaphor-identification, clip

Original Content

arXiv:2505.11237v4 Announce Type: replace-cross Abstract: Metaphorical imagination, the ability to connect seemingly unrelated concepts, is fundamental to human cognition and communication. While understanding of linguistic metaphors has advanced significantly, grasping multimodal metaphors, such as those found in internet memes, presents unique challenges due to their unconventional expressions and implied meanings. Existing methods for multimodal metaphor identification often struggle to bridge the gap between literal and figurative interpretations. Additionally, generative approaches that use large language models or text-to-image models, while promising, incur high computational costs. This paper introduces Concept Drift Guided LayerNorm Tuning (CDGLT), a novel and training-efficient framework for multimodal metaphor identification. CDGLT incorporates two key innovations: (1) Concept Drift, a mechanism that applies Spherical Linear Interpolation (SLERP) to cross-modal embeddings from a CLIP encoder to generate a new, divergent concept embedding. This drifted concept helps to narrow the gap between literal features and the figurative task. (2) A prompt construction strategy that adapts feature extraction and fusion with pre-trained language models to the multimodal metaphor identification task. CDGLT achieves state-of-the-art performance on the MET-Meme benchmark while significantly reducing training costs compared to existing generative methods. Ablation studies demonstrate the effectiveness of both Concept Drift and the adapted LayerNorm (LN) Tuning approach. The method represents a significant step towards efficient and accurate multimodal metaphor understanding. The code is available at https://github.com/Qianvenh/CDGLT.
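The SLERP step named in the abstract can be illustrated with a minimal sketch. This is a generic spherical linear interpolation between two vectors, not the authors' implementation: the function names, the toy 2-D vectors standing in for CLIP image and text embeddings, and the interpolation factor `t` are all assumptions made for illustration. The abstract does not say how CDGLT chooses the factor or which embeddings it pairs.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def slerp(a, b, t):
    """Spherical linear interpolation between a and b at fraction t.

    Both inputs are normalized first, so the result lies on the unit
    sphere along the great-circle arc from a (t = 0) to b (t = 1).
    """
    a, b = normalize(a), normalize(b)
    # Clamp the cosine to guard against floating-point drift outside [-1, 1].
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two unit vectors
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Nearly parallel or antipodal: the SLERP weights are ill-conditioned,
        # so fall back to normalized linear interpolation. This raises for
        # exactly antipodal inputs at t = 0.5, where no unique arc exists.
        return normalize([(1 - t) * x + t * y for x, y in zip(a, b)])
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]


# Hypothetical stand-ins for an image embedding and a text embedding.
image_emb = [1.0, 0.0]
text_emb = [0.0, 1.0]
drifted = slerp(image_emb, text_emb, 0.5)
```

With these orthogonal toy inputs and `t = 0.5`, the result is the unit vector halfway along the arc between them. Unlike plain linear interpolation, SLERP keeps the output on the unit sphere, which matters when embeddings are compared by cosine similarity.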