arxiv_cs_lg April 24, 2026

From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation

recipe-recommendation, multimodal-fusion, embeddings, contrastive-learning, user-interaction

Original Content

arXiv:2511.19176v3 Announce Type: replace Abstract: Recipe recommendation has become an essential task in web-based food platforms. A central challenge is effectively leveraging rich multimodal features beyond user-recipe interactions. Our analysis shows that even simple uses of multimodal signals yield competitive performance, suggesting that systematic enhancement of these signals is highly promising. We propose TESMR, a 3-stage framework for recipe recommendation that progressively refines raw multimodal features into effective embeddings through: (1) content-based enhancement using foundation models with multimodal comprehension, (2) relation-based enhancement via message propagation over user-recipe interactions, and (3) learning-based enhancement through contrastive learning with learnable embeddings. Experiments on two real-world datasets show that TESMR outperforms existing methods, achieving 7-15% higher Recall@10.
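The abstract gives no implementation details, so the following is only a rough, illustrative sketch of what stages (2) and (3) could look like in their simplest form. It is not the authors' TESMR code. Everything in it is an assumption made for illustration: the function names (`propagate`, `info_nce`), the single round of mean aggregation, the residual sum on the recipe side, the InfoNCE form of the contrastive loss, and the temperature value. Stage (1) is assumed to have already produced one feature vector per recipe.

```python
import math


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def propagate(recipe_feats, interactions, num_users):
    """One round of mean-aggregation message passing on the user-recipe graph.

    recipe_feats: one feature vector per recipe (assumed output of stage 1).
    interactions: (user_index, recipe_index) pairs.
    """
    dim = len(recipe_feats[0])
    user_nbrs = [[] for _ in range(num_users)]
    recipe_nbrs = [[] for _ in recipe_feats]
    for user, recipe in interactions:
        user_nbrs[user].append(recipe)
        recipe_nbrs[recipe].append(user)

    # Users carry no content features in this sketch, so each user is
    # represented by the mean of the recipes they interacted with.
    user_embs = [mean_vector([recipe_feats[r] for r in nbrs], dim)
                 for nbrs in user_nbrs]

    # Each recipe keeps its own features and adds the mean of its users
    # (a residual-style update, chosen here for simplicity).
    recipe_embs = []
    for feat, nbrs in zip(recipe_feats, recipe_nbrs):
        message = mean_vector([user_embs[u] for u in nbrs], dim)
        recipe_embs.append([own + msg for own, msg in zip(feat, message)])
    return user_embs, recipe_embs


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def info_nce(anchors, positives, temperature=0.2):
    """InfoNCE loss: anchors[k] should match positives[k]; every other
    entry of `positives` acts as an in-batch negative for anchors[k]."""
    total = 0.0
    for k, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[k]
    return total / len(anchors)
```

In a setup like this, `anchors` could be learnable per-recipe embeddings and `positives` the propagated multimodal embeddings of the same recipes, so that minimizing the loss pulls each learnable embedding toward its own recipe's refined features. Whether TESMR pairs its embeddings this way is not stated in the abstract.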