arxiv_cs_ai April 24, 2026

Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval

dense-retrieval, multi-hop-retrieval, contrastive-learning, hotpotqa, machine-learning

Original Content

arXiv:2604.20850v1 Announce Type: cross

Abstract: Dense retrieval systems rank passages by embedding similarity to a query, but multi-hop questions require passages that are associatively related through shared reasoning chains. We introduce Association-Augmented Retrieval (AAR), a lightweight transductive reranking method that trains a small MLP (4.2M parameters) to learn associative relationships between passages in embedding space using contrastive learning on co-occurrence annotations. At inference time, AAR reranks an initial dense retrieval candidate set using bi-directional association scoring. On HotpotQA, AAR improves passage Recall@5 from 0.831 to 0.916 (+8.6 points) without evaluation-set tuning, with gains concentrated on hard questions where the dense baseline fails (+28.5 points). On MuSiQue, AAR achieves +10.1 points in the transductive setting. An inductive model trained on training-split associations and evaluated on unseen validation associations shows no significant improvement, suggesting that the method captures corpus-specific co-occurrences rather than transferable patterns. Ablation studies support this interpretation: training on semantically similar but non-associated passage pairs degrades retrieval below the baseline, while shuffling association pairs causes severe degradation. A downstream QA evaluation shows that retrieval gains translate to a +6.4 exact match improvement. The method adds 3.7 ms per query, trains in under two minutes on a single GPU, and requires no LLM-based indexing.
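The abstract names the ingredients (a small MLP over passage embeddings, a contrastive objective on co-occurring passage pairs, and bi-directional association scoring at rerank time) but gives neither the scoring formula nor the loss. The following is therefore only a minimal NumPy sketch of how such a pipeline might be wired together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the two-layer `AssociationMLP` and its sizes, the InfoNCE-style loss in `info_nce`, the symmetric average in `bidirectional_association`, and the seed-plus-boost rule with its hypothetical `n_seeds` and `alpha` parameters in `rerank`. Training is omitted, so the toy example uses a hand-built linear map in place of a trained model.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


class AssociationMLP:
    """Hypothetical two-layer MLP: passage embedding -> association embedding.

    Weights are random (untrained) and sizes are illustrative only.
    """

    def __init__(self, dim, hidden, rng):
        self.w1 = rng.normal(0.0, dim ** -0.5, size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, hidden ** -0.5, size=(hidden, dim))
        self.b2 = np.zeros(dim)

    def __call__(self, x):
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU
        return l2_normalize(h @ self.w2 + self.b2)


def info_nce(anchor_proj, positives, temperature=0.05):
    """Assumed contrastive loss with in-batch negatives.

    Row i of `positives` is the annotated co-occurring passage for anchor i;
    every other row in the batch acts as a negative.
    """
    logits = anchor_proj @ positives.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def bidirectional_association(assoc_fn, emb):
    """Average of f(e_i).e_j and f(e_j).e_i for every candidate pair (assumed form)."""
    forward = assoc_fn(emb) @ emb.T
    return 0.5 * (forward + forward.T)


def rerank(query, cand_emb, assoc_fn, n_seeds=1, alpha=1.0):
    """Keep the top dense hits as seeds (n_seeds >= 1), then reorder the other
    candidates by query similarity plus alpha times their best seed association."""
    sim = cand_emb @ query
    order = np.argsort(-sim, kind="stable")
    seeds, rest = order[:n_seeds], order[n_seeds:]
    assoc = bidirectional_association(assoc_fn, cand_emb)
    boost = assoc[np.ix_(rest, seeds)].max(axis=1)
    rest_order = np.argsort(-(sim[rest] + alpha * boost), kind="stable")
    return np.concatenate([seeds, rest[rest_order]])


# Toy data: passage 2 is dissimilar to the query but associated with passage 0.
query = np.array([1.0, 0.0, 0.0])
cands = np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]])
swap = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
toy_assoc = lambda x: x @ swap  # hand-built stand-in for a trained MLP

print(rerank(query, cands, toy_assoc, alpha=0.0))  # dense similarity only
print(rerank(query, cands, toy_assoc, alpha=1.0))  # with association boost
```

In the toy data, passage 2 has zero similarity to the query, so similarity-only ranking places it last. With `alpha=1.0`, its association with the top-ranked passage moves it ahead of passage 1, which is the kind of second-hop promotion the abstract describes. A real setup would replace `toy_assoc` with an `AssociationMLP` instance trained on the corpus's co-occurrence pairs.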