arxiv_cs_ai February 10, 2026

SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management

Translated: 2026/3/7 8:22:52

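Japanese Translation (rendered in English)

Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, which motivates applying them to real-world supply chain management. Supply chain workflows, however, demand reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, and current models still struggle to deliver it. To evaluate LLM performance systematically in this setting, we introduce SupChain-Bench, a unified real-world benchmark that assesses both supply chain domain knowledge and long-horizon tool-based orchestration grounded in standard operating procedures (SOPs). Our experiments reveal substantial gaps in execution reliability across models. We also propose SupChain-ReAct, a framework that does not rely on SOPs and instead autonomously synthesizes executable procedures for tool use; it achieves the strongest and most consistent tool-calling performance. Our work establishes a principled benchmark for studying reliable long-horizon orchestration in real-world operational settings and shows that LLM-based supply chain agents still have significant room for improvement.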

Original Content

arXiv:2602.07342v1, Announce Type: new

Abstract: Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain workflows require reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, which remains challenging for current models. To systematically evaluate LLM performance in this setting, we introduce SupChain-Bench, a unified real-world benchmark that assesses both supply chain domain knowledge and long-horizon tool-based orchestration grounded in standard operating procedures (SOPs). Our experiments reveal substantial gaps in execution reliability across models. We further propose SupChain-ReAct, an SOP-free framework that autonomously synthesizes executable procedures for tool use, achieving the strongest and most consistent tool-calling performance. Our work establishes a principled benchmark for studying reliable long-horizon orchestration in real-world operational settings and highlights significant room for improvement in LLM-based supply chain agents.