arxiv_cs_ai April 24, 2026

Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

Translated: 2026/4/24 20:15:52

generative-ai, large-language-model, fine-tuning, online-reviews, preference-learning

Original Content

arXiv:2604.21209v1 Announce Type: new Abstract: Online reviews play a pivotal role in consumers' decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains unaddressed because responding to their rapidly growing volume requires considerable human labor. While generative AI models have achieved remarkable success in a range of tasks, they are general-purpose models and may not align well with domain-specific human preferences. To tailor these general generative AI models to domain-specific applications, finetuning is commonly employed. Nevertheless, several challenges persist in finetuning with domain-specific data, including hallucinations, difficulty in representing domain-specific human preferences, and over-conservatism in offline policy optimization. To address these challenges, we propose a novel preference finetuning method that aligns an LLM with domain-specific human preferences for generating online review responses. Specifically, we first identify the source of hallucination and propose an effective context augmentation approach to mitigate LLM hallucination. To represent human preferences, we propose a novel theory-driven preference finetuning approach that automatically constructs human preference pairs in the online review domain. Additionally, we propose a curriculum learning approach to further enhance preference finetuning. To overcome the over-conservatism of existing offline preference finetuning methods, we propose a novel density estimation-based support constraint method that relaxes this conservatism, and we mathematically prove its superior theoretical guarantees. Extensive evaluations substantiate the superiority of our proposed preference finetuning method.
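The abstract does not state the training objective, so the following is only a reference sketch, not the authors' method. It assumes the "existing offline preference finetuning method" is something like Direct Preference Optimization (DPO), whose implicit KL regularization toward a frozen reference model (strength set by `beta`) is one commonly cited source of conservatism. It also includes a hypothetical hinge-style `support_penalty` that illustrates the general idea of a support constraint: it only penalizes responses whose estimated data density falls below a threshold, rather than pulling every response toward the reference distribution. The function names, the `log_threshold` and `weight` parameters, and the penalty form are illustrative assumptions. The log-density is assumed to come from an unspecified density estimator.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Arguments are sequence log-probabilities of each response under the
    policy being finetuned and under the frozen reference model. A larger
    beta corresponds to stronger implicit KL regularization toward the
    reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


def support_penalty(
    log_density: float, log_threshold: float, weight: float = 1.0
) -> float:
    """Hypothetical support-style constraint (illustration only).

    log_density is assumed to come from a density estimator fitted to the
    offline response data. The penalty is zero wherever the estimated
    log-density clears the threshold and grows linearly in log-density
    below it, so it only restricts responses that leave the estimated support.
    """
    return weight * max(0.0, log_threshold - log_density)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
    # Policy shifts probability toward the chosen response: loss drops below log(2).
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0) < math.log(2))  # True
    print(support_penalty(log_density=-3.0, log_threshold=-5.0))  # 0.0
    print(support_penalty(log_density=-8.0, log_threshold=-5.0))  # 3.0
```

A penalty that stays inactive inside the data support, instead of a per-sample pull toward the reference model, is the general intuition behind support constraints in offline reinforcement learning. The abstract does not state how the paper estimates density or how it combines such a constraint with its preference loss.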