arxiv_cs_ai February 10, 2026

Accelerating Social Science Research via Agentic Hypothesization and Experimentation

Translated: 2026/3/7 9:44:47

experigen, bayesian, statistical, experts, baker-test

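Translated Summary

Data-driven social science research depends on repeated cycles of observation, hypothesis generation, and experimental validation, which makes it inherently slow. Recent data-driven methods speed up parts of this cycle but largely fail to support scientific discovery end to end. The paper introduces EXPERIGEN, an agentic framework that operationalizes end-to-end discovery through a two-phase search inspired by Bayesian optimization: a Generator proposes candidate hypotheses and an Experimenter evaluates them empirically. Across multiple domains, EXPERIGEN discovers 2-4x more statistically significant hypotheses, which are 7-17 percent more predictive than those from prior approaches, and it extends naturally to more complex data regimes such as multimodal and relational datasets. Beyond statistical performance, hypotheses must be novel, empirically grounded, and actionable to drive real scientific progress. To assess these qualities, the authors ran an expert review of machine-generated hypotheses with feedback from senior faculty: of 25 reviewed hypotheses, 88 percent were rated moderately or strongly novel, 70 percent were deemed impactful and worth pursuing, and most showed rigor comparable to senior graduate-level research. Finally, because ultimate validation requires real-world evidence, the authors conducted the first A/B test of LLM-generated hypotheses and observed statistically significant results (p < 1e-6) with a large effect size of 344 percent.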

Original Content

arXiv:2602.07983v1 Announce Type: new Abstract: Data-driven social science research is inherently slow, relying on iterative cycles of observation, hypothesis generation, and experimental validation. While recent data-driven methods promise to accelerate parts of this process, they largely fail to support end-to-end scientific discovery. To address this gap, we introduce EXPERIGEN, an agentic framework that operationalizes end-to-end discovery through a Bayesian optimization inspired two-phase search, in which a Generator proposes candidate hypotheses and an Experimenter evaluates them empirically. Across multiple domains, EXPERIGEN consistently discovers 2-4x more statistically significant hypotheses that are 7-17 percent more predictive than prior approaches, and naturally extends to complex data regimes including multimodal and relational datasets. Beyond statistical performance, hypotheses must be novel, empirically grounded, and actionable to drive real scientific progress. To evaluate these qualities, we conduct an expert review of machine-generated hypotheses, collecting feedback from senior faculty. Among 25 reviewed hypotheses, 88 percent were rated moderately or strongly novel, 70 percent were deemed impactful and worth pursuing, and most demonstrated rigor comparable to senior graduate-level research. Finally, recognizing that ultimate validation requires real-world evidence, we conduct the first A/B test of LLM-generated hypotheses, observing statistically significant results with p less than 1e-6 and a large effect size of 344 percent.
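
The propose-then-evaluate loop described in the abstract can be illustrated with a small sketch. This is not EXPERIGEN's implementation and does not reproduce its Bayesian-optimization-inspired two-phase search, whose details the abstract does not give. The names `generator`, `experimenter`, `discover`, and `make_toy_rows` are hypothetical, the permutation test and Bonferroni correction are stand-ins chosen for illustration, and the toy data is synthetic.

```python
import random
from statistics import fmean


def permutation_p_value(treated, control, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(treated) - fmean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_treated]) - fmean(pooled[n_treated:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def generator(feature_names):
    """Hypothetical Generator: one candidate hypothesis per binary feature,
    read as 'this feature is associated with the outcome'."""
    yield from feature_names


def experimenter(rows, feature, outcome="y"):
    """Hypothetical Experimenter: split rows on the feature and test the outcome."""
    treated = [r[outcome] for r in rows if r[feature] == 1]
    control = [r[outcome] for r in rows if r[feature] == 0]
    if not treated or not control:
        return 1.0  # nothing to compare, so treat the hypothesis as unsupported
    return permutation_p_value(treated, control)


def discover(rows, feature_names, alpha=0.05):
    """Keep hypotheses that stay significant after a Bonferroni correction."""
    candidates = list(generator(feature_names))
    threshold = alpha / len(candidates)
    p_values = {f: experimenter(rows, f) for f in candidates}
    return {f: p for f, p in p_values.items() if p <= threshold}


def make_toy_rows(n=200):
    """Synthetic data (an assumption, not from the paper): 'signal' shifts the
    outcome by 2.0, while 'noise' is balanced against it and has no effect."""
    rows = []
    for i in range(n):
        signal = i % 2
        noise = (i // 2) % 2
        jitter = ((i // 4) % 5) * 0.1
        rows.append({"signal": signal, "noise": noise, "y": 2.0 * signal + jitter})
    return rows
```

Calling `discover(make_toy_rows(), ["signal", "noise"])` keeps only the `signal` hypothesis. A real system would replace `generator` with a model that proposes hypotheses and would need a multiple-comparison strategy suited to the number of candidates it tests.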