arxiv_cs_ai April 20, 2026

STRIDE-ED: A Strategy-Grounded Stepwise Reasoning Framework for Empathetic Dialogue Systems

stride-ed empathetic-dialogue llm reinforcement-learning decision-making

Original Content

arXiv:2604.07100v2 Announce Type: replace-cross

Abstract: Empathetic dialogue requires not only recognizing a user's emotional state but also making strategy-aware, context-sensitive decisions throughout response generation. However, the lack of a comprehensive empathy strategy framework, explicit task-aligned multi-stage reasoning, and high-quality strategy-aware data fundamentally limits existing approaches, preventing them from effectively modeling empathetic dialogue as a complex, multi-stage cognitive and decision-making process. To address these challenges, we propose STRIDE-ED, a STRategy-grounded, Interpretable, and DEep reasoning framework that models Empathetic Dialogue through structured, strategy-conditioned reasoning. To support effective learning, we develop a strategy-aware data refinement pipeline integrating LLM-based annotation, multi-model consistency-weighted evaluation, and dynamic sampling to construct high-quality training data aligned with empathetic strategies. Furthermore, we adopt a two-stage training paradigm that combines supervised fine-tuning with multi-objective reinforcement learning to better align model behaviors with target emotions, empathetic strategies, and response formats. Extensive experiments demonstrate that STRIDE-ED generalizes across diverse open-source LLMs and consistently outperforms existing methods on both automatic metrics and human evaluations.
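The abstract names three alignment targets for the reinforcement learning stage (target emotions, empathetic strategies, and response formats) but does not say how they are combined. As a minimal sketch only, one common way to fold several objectives into a single scalar reward is a weighted sum. Everything below is an assumption made for illustration and is not taken from the paper: the tag-based output format, the `RewardWeights` values, the exact-match scoring, and the function name `multi_objective_reward`.

```python
import re
from dataclasses import dataclass

# Hypothetical output format (NOT from the paper): the model is assumed to emit
# an emotion tag, a strategy tag, and the final response, in that order.
_OUTPUT_PATTERN = re.compile(
    r"<emotion>(?P<emotion>.+?)</emotion>\s*"
    r"<strategy>(?P<strategy>.+?)</strategy>\s*"
    r"<response>(?P<response>.+?)</response>",
    re.DOTALL,
)


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative weights for each objective; the values are arbitrary."""
    emotion: float = 1.0
    strategy: float = 1.0
    fmt: float = 0.5


def multi_objective_reward(
    output: str,
    target_emotion: str,
    target_strategy: str,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of format, emotion, and strategy rewards (assumed design)."""
    match = _OUTPUT_PATTERN.fullmatch(output.strip())
    if match is None:
        # A malformed output earns nothing, since the other two objectives
        # cannot be scored without the tags.
        return 0.0
    emotion_ok = match["emotion"].strip().lower() == target_emotion.lower()
    strategy_ok = match["strategy"].strip().lower() == target_strategy.lower()
    # Booleans act as 0/1 indicators in the weighted sum.
    return weights.fmt + weights.emotion * emotion_ok + weights.strategy * strategy_ok


sample = (
    "<emotion>sad</emotion>"
    "<strategy>validation</strategy>"
    "<response>That sounds really hard.</response>"
)
print(multi_objective_reward(sample, "sad", "validation"))
```

In this sketch a well-formed output earns the format reward, plus one more term for each label that matches its target. A real system would likely replace exact matching with learned or model-based scorers.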