arxiv_cs_lg April 24, 2026

LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

Translated: 2026/4/24 20:11:14

llm, generative-agents, machine-learning, behavioral-modeling, self-report-data

Original Content

arXiv:2411.10109v2 Announce Type: replace-cross Abstract: Machine learning can predict human behavior well when substantial structured data and well-defined outcomes are available, but these models are typically limited to specific outcomes and cannot readily be applied to new domains. We test whether large language models (LLMs) can support a more general-purpose approach by building person-specific simulations (i.e., "generative agents") grounded in self-report data. Using data from a diverse national sample of 1,052 Americans, we build agents from (i) two-hour, semi-structured interviews (elicited using the American Voices Project interview schedule), (ii) structured surveys (the General Social Survey and Big Five personality inventory), or (iii) both sources combined. On held-out General Social Survey items, agent accuracy reached 83% (interview only), 82% (surveys only), and 86% (combined) of participants' two-week test-retest consistency, compared with agents prompted only with individuals' demographics (74%). Agents predicted personality traits and behaviors in experiments with similar accuracy, and reduced disparities in accuracy across racial and ideological groups relative to demographics-only baselines. Together, these results show that LLM agents grounded in rich qualitative or quantitative self-report data can support general-purpose simulation of individuals across outcomes, without requiring task-specific training data.
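The headline percentages are relative figures: an agent's accuracy on held-out items is expressed as a fraction of how consistently participants reproduce their own answers when retested two weeks later. The sketch below illustrates that ratio and one simple way to measure an accuracy gap across groups. It is a minimal illustration under stated assumptions, not the authors' code: the function names (`accuracy`, `normalized_accuracy`, `group_gap`), the exact-match scoring of categorical answers, comparing agent predictions against the first survey wave, the max-minus-min gap metric, and all toy answers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def accuracy(predicted, actual):
    """Fraction of items on which two answer lists agree (exact match)."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("answer lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def normalized_accuracy(agent_answers, wave1_answers, wave2_answers):
    """Assumed scoring rule: agent accuracy against the participant's
    wave-1 answers, divided by the participant's own wave-2 vs. wave-1
    (test-retest) consistency."""
    agent_acc = accuracy(agent_answers, wave1_answers)
    retest_acc = accuracy(wave2_answers, wave1_answers)
    if retest_acc == 0:
        raise ValueError("test-retest consistency is zero; ratio is undefined")
    return agent_acc / retest_acc


def group_gap(records):
    """records: iterable of (group_label, normalized_accuracy) pairs.

    Returns (max minus min of per-group mean scores, dict of per-group means).
    This gap metric is a hypothetical choice for illustration.
    """
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)
    means = {group: mean(scores) for group, scores in by_group.items()}
    return max(means.values()) - min(means.values()), means


if __name__ == "__main__":
    # Toy answers to five hypothetical categorical items for one participant.
    wave1 = ["a", "b", "c", "a", "a"]  # first survey wave
    wave2 = ["a", "b", "d", "a", "a"]  # same person, two weeks later
    agent = ["a", "c", "c", "d", "a"]  # agent's predictions
    score = normalized_accuracy(agent, wave1, wave2)
    print(round(score, 2))  # prints 0.75 (agent 3/5 vs. retest 4/5)

    # Hypothetical per-participant normalized scores tagged by group.
    gap, means = group_gap([("group_x", 0.9), ("group_x", 0.8), ("group_y", 0.7)])
    print(round(gap, 2), {g: round(m, 2) for g, m in means.items()})
```

Dividing by test-retest consistency treats a participant's own two-week self-agreement as a practical ceiling, so a normalized score near 1.0 means the agent matches the person about as well as the person matches themselves.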