arxiv_cs_ai April 20, 2026

RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity

role-conflict, llm, contextual-sensitivity, benchmarking, social-dilemmas

Original Content

arXiv:2509.25897v2 (announce type: replace-cross)

Abstract: People often encounter role conflicts, social dilemmas in which the expectations of multiple roles clash and cannot be fulfilled simultaneously. As large language models (LLMs) increasingly navigate these social dynamics, a critical research question emerges: when faced with such dilemmas, do LLMs prioritize dynamic contextual cues or learned preferences? To address this, we introduce RoleConflictBench, a novel benchmark designed to measure the contextual sensitivity of LLMs in role conflict scenarios. To enable objective evaluation within this subjective domain, we employ situational urgency as a constraint for decision-making. We construct the dataset through a three-stage pipeline that generates over 13,000 realistic scenarios across 65 roles in five social domains by systematically varying the urgency of competing situations. This controlled setup enables us to quantitatively measure contextual sensitivity, determining whether model decisions align with the situational context or are overridden by learned role preferences. Our analysis of 10 LLMs reveals that models deviate substantially from this objective baseline. Instead of responding to dynamic contextual cues, their decisions are predominantly governed by preferences toward specific social roles.
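The abstract does not spell out how contextual sensitivity is scored, so the following is only a minimal sketch of one way the idea could be operationalized, not the benchmark's actual metric. It assumes each scenario pairs two roles, each with an integer urgency level, and records which role the model chose to prioritize. The `Scenario` record, both function names, and the example data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record: two competing roles, their urgencies, and the model's pick."""
    role_a: str
    role_b: str
    urgency_a: int
    urgency_b: int
    chosen: str  # role the model prioritized


def contextual_sensitivity(scenarios):
    """Share of scenarios (ties excluded) where the model chose the more urgent role."""
    decisive = [s for s in scenarios if s.urgency_a != s.urgency_b]
    if not decisive:
        return 0.0
    hits = sum(
        1
        for s in decisive
        if s.chosen == (s.role_a if s.urgency_a > s.urgency_b else s.role_b)
    )
    return hits / len(decisive)


def role_pick_rates(scenarios):
    """For each role, how often it was chosen out of the scenarios it appeared in."""
    appearances = Counter()
    picks = Counter()
    for s in scenarios:
        appearances[s.role_a] += 1
        appearances[s.role_b] += 1
        picks[s.chosen] += 1
    return {role: picks[role] / n for role, n in appearances.items()}


# Made-up data for illustration only; not taken from the paper.
EXAMPLE = [
    Scenario("parent", "employee", 3, 1, "parent"),
    Scenario("parent", "employee", 1, 3, "parent"),
    Scenario("friend", "employee", 1, 2, "employee"),
    Scenario("friend", "parent", 2, 1, "parent"),
]

if __name__ == "__main__":
    print(contextual_sensitivity(EXAMPLE))
    print(role_pick_rates(EXAMPLE))
```

In this toy data the model always picks "parent" whenever that role appears, regardless of urgency, which is the kind of role-driven pattern the abstract describes: a high pick rate for one role alongside a low urgency-alignment score.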