arxiv_cs_lg 2026/4/24

ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

Translated: 2026/4/24 20:12:18

reasoning, large-language-models, ranking, reinforcement-learning, sft

Original Content

arXiv:2508.07050v3 Announce Type: replace-cross

Abstract: Large Language Model (LLM) based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models (LRMs), many studies have demonstrated that step-by-step reasoning at test time helps improve listwise ranking performance. However, due to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in many complex ranking scenarios, and the ranking ability of reasoning-intensive rerankers remains largely underdeveloped. In this paper, we first propose an automated reasoning-intensive training data synthesis framework, which sources training queries and passages from diverse domains and applies DeepSeek-R1 to generate high-quality training labels. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage training approach, which includes a cold-start supervised fine-tuning (SFT) stage and a reinforcement learning (RL) stage. During the RL stage, we design a novel multi-view ranking reward tailored to the multi-turn nature of listwise ranking. Extensive experiments demonstrate that our trained reasoning-intensive reranker ReasonRank significantly outperforms existing baselines and also achieves much lower latency than the pointwise reranker. Our code is available at https://github.com/8421BCD/ReasonRank.
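The abstract refers to the multi-turn nature of listwise ranking without spelling out the procedure. One common way listwise rerankers handle a candidate list longer than a single prompt is a sliding window that moves from the bottom of the list to the top, reranking one window per turn. The Python sketch below illustrates that general idea only and is not taken from the ReasonRank code. The names `sliding_window_rerank`, `rank_window`, and `toy_rank_window`, and the default window and stride sizes, are hypothetical; the toy ranker stands in for an LLM call and simply counts word overlap with the query.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given a query and a window of passages, return the
# window's indices ordered from most to least relevant.
RankWindow = Callable[[str, Sequence[str]], List[int]]


def sliding_window_rerank(
    query: str,
    passages: Sequence[str],
    rank_window: RankWindow,
    window_size: int = 20,  # assumed default, not from the paper
    stride: int = 10,       # assumed default, not from the paper
) -> List[str]:
    """Rerank passages by moving a window from the end of the list to the front."""
    if window_size <= 0 or stride <= 0 or stride > window_size:
        raise ValueError("require 0 < stride <= window_size")

    ranked = list(passages)
    end = len(ranked)
    while end > 0:
        start = max(0, end - window_size)
        window = ranked[start:end]
        order = rank_window(query, window)
        # Guard against a ranker that drops or repeats passages.
        if sorted(order) != list(range(len(window))):
            raise ValueError("rank_window must return a permutation of the window")
        ranked[start:end] = [window[i] for i in order]
        if start == 0:
            break
        end -= stride  # each turn overlaps the previous window by window_size - stride
    return ranked


def toy_rank_window(query: str, window: Sequence[str]) -> List[int]:
    """Stand-in for an LLM reranker: order by word overlap with the query."""
    query_words = set(query.lower().split())
    scores = [len(query_words & set(p.lower().split())) for p in window]
    # sorted() is stable, so ties keep their incoming order.
    return sorted(range(len(window)), key=lambda i: -scores[i])


if __name__ == "__main__":
    docs = ["a", "b", "c x", "d", "x y"]
    print(sliding_window_rerank("x y", docs, toy_rank_window, window_size=3, stride=2))
```

Because each turn only sees one window, a relevant passage can move toward the top across turns only through the overlap between windows, which is one reason a reward that looks at a single turn in isolation may not reflect the final ranking.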