arxiv_cs_ai February 10, 2026

Can LLMs Truly Reproduce Human Personality? An Analysis of AI and Human Behavioral Alignment in Dialogue-Based Dispute Resolution

Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution

Translated: 2026/3/7 8:50:24

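Japanese Translation (English rendering)

> Large language models (LLMs) are increasingly used to imitate human-like behavior in social settings such as legal mediation, negotiation, and dispute resolution. However, it is still unclear whether these simulations reproduce the personality-behavior patterns observed in humans. Human personality, for example, shapes how individuals handle social situations, and it also influences strategic choices and behavior in emotionally charged exchanges. The question is therefore: can LLMs prompted with personality traits reproduce the personality-driven differences seen in human conflict behavior? To investigate, we present an evaluation framework that allows human-human and LLM-LLM behavior in dispute resolution dialogues to be compared directly with respect to Big Five Inventory (BFI) personality traits, using interpretable metrics tied to strategic behavior and conflict outcomes. We also present a new method for creating datasets of LLM dispute resolution dialogues whose scenarios and personality traits are matched to human conversations. Finally, we apply the framework to three contemporary closed-source LLMs and show that the way personality manifests in conflict differs significantly across the LLMs when compared with human data, which calls into question the use of personality-prompted agents as reliable behavioral proxies. The work underscores the need for psychological grounding and validation of AI simulations before they are used in the real world.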

Original Content

arXiv:2602.07414v1
Announce Type: new

Abstract: Large language models (LLMs) are increasingly used to simulate human behavior in social settings such as legal mediation, negotiation, and dispute resolution. However, it remains unclear whether these simulations reproduce the personality-behavior patterns observed in humans. Human personality, for instance, shapes how individuals navigate social interactions, including strategic choices and behaviors in emotionally charged interactions. This raises the question: Can LLMs, when prompted with personality traits, reproduce personality-driven differences in human conflict behavior? To explore this, we introduce an evaluation framework that enables direct comparison of human-human and LLM-LLM behaviors in dispute resolution dialogues with respect to Big Five Inventory (BFI) personality traits. This framework provides a set of interpretable metrics related to strategic behavior and conflict outcomes. We additionally contribute a novel dataset creation methodology for LLM dispute resolution dialogues with matched scenarios and personality traits with respect to human conversations. Finally, we demonstrate the use of our evaluation framework with three contemporary closed-source LLMs and show significant divergences in how personality manifests in conflict across different LLMs compared to human data, challenging the assumption that personality-prompted agents can serve as reliable behavioral proxies in socially impactful applications. Our work highlights the need for psychological grounding and validation in AI simulations before real-world use.