arxiv_cs_ai February 10, 2026

Scalable Delphi: Large Language Models for Structured Risk Estimation

delphi, risk-assessment, large-language-models, structured-risk-estimation

Original Content

arXiv:2602.08889v1 Announce Type: new

Abstract: Quantitative risk assessment in high-stakes domains relies on structured expert elicitation to estimate unobservable properties. The gold standard, the Delphi method, produces calibrated, auditable judgments but requires months of coordination and specialist time, placing rigorous risk assessment out of reach for most applications. We investigate whether Large Language Models (LLMs) can serve as scalable proxies for structured expert elicitation. We propose Scalable Delphi, adapting the classical protocol for LLMs with diverse expert personas, iterative refinement, and rationale sharing. Because target quantities are typically unobservable, we develop an evaluation framework based on necessary conditions: calibration against verifiable proxies, sensitivity to evidence, and alignment with human expert judgment. We evaluate in the domain of AI-augmented cybersecurity risk, using three capability benchmarks and independent human elicitation studies. LLM panels achieve strong correlations with benchmark ground truth (Pearson r=0.87-0.95), improve systematically as evidence is added, and align with human expert panels; in one comparison, the LLM panel is closer to a human panel than the two human panels are to each other. This demonstrates that LLM-based elicitation can extend structured expert judgment to settings where traditional methods are infeasible, reducing elicitation time from months to minutes.
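The abstract names two mechanical pieces that a small sketch may help make concrete: iterative refinement of panel estimates across Delphi rounds, and a Pearson correlation check against verifiable proxies. The Python below is only an illustration under stated assumptions and is not the paper's implementation. The names `make_stub_panelist`, `run_delphi`, and `pearson_r` are hypothetical, the panelists are offline stubs standing in for persona-prompted LLM calls, and the feedback shared between rounds is reduced to the group median rather than written rationales.

```python
import math
import statistics
from typing import Callable, List, Optional, Sequence

# Hypothetical panelist signature (an assumption, not from the paper):
# (question, own previous estimate, group median) -> probability in [0, 1].
Panelist = Callable[[str, Optional[float], Optional[float]], float]


def make_stub_panelist(prior: float, pull: float) -> Panelist:
    """Offline stand-in for a persona-prompted LLM expert.

    Starts at `prior`, then moves a fraction `pull` of the way toward
    the group median on each later round.
    """
    def panelist(question: str,
                 own_previous: Optional[float],
                 group_median: Optional[float]) -> float:
        if own_previous is None or group_median is None:
            return prior  # first round: independent estimate
        return own_previous + pull * (group_median - own_previous)
    return panelist


def run_delphi(question: str,
               panel: Sequence[Panelist],
               rounds: int = 3) -> List[List[float]]:
    """Run `rounds` Delphi rounds and return the estimates from each round."""
    estimates = [p(question, None, None) for p in panel]
    history = [estimates]
    for _ in range(rounds - 1):
        median = statistics.median(estimates)  # anonymized group feedback
        estimates = [p(question, prev, median)
                     for p, prev in zip(panel, estimates)]
        history.append(estimates)
    return history


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between panel estimates and proxy ground truth."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    panel = [make_stub_panelist(0.2, 0.5),
             make_stub_panelist(0.5, 0.5),
             make_stub_panelist(0.8, 0.5)]
    for i, estimates in enumerate(run_delphi("example question", panel), 1):
        print(f"round {i}: median={statistics.median(estimates):.3f}")
```

In a real setting each stub would be replaced by a model call that also receives the other panelists' rationales, and `pearson_r` would compare the final panel medians over many questions with benchmark values.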