arxiv_cs_ai February 10, 2026

Evaluating Retrieval-Augmented Generation Variants for Natural Language-Based SQL and API Call Generation

Translated: 2026/3/7 11:36:43

sql, nlp, api, generation

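Japanese Translation (English rendering)

Enterprise systems increasingly need natural language interfaces that translate user requests into structured operations such as SQL queries and REST API calls. Large language models (LLMs) show promise for code generation [Chen et al., 2021; Huynh and Lin, 2025], but their effectiveness in domain-specific enterprise settings remains underexplored, especially when retrieval and modification tasks must be handled together. This paper evaluates three retrieval-augmented generation (RAG) variants [Lewis et al., 2021], namely standard RAG, Self-RAG [Asai et al., 2024], and CoRAG [Wang et al., 2025], on SQL query generation, REST API call generation, and a combined task that requires dynamic task classification. Using SAP Transactional Banking as a realistic enterprise use case, we build a new test dataset covering both modalities and evaluate 18 experimental configurations under database-only, API-only, and hybrid documentation contexts. The results show that retrieval is essential: without it, exact match accuracy is 0% on every task, whereas retrieval raises execution accuracy to as much as 79.30% and component match accuracy to as much as 78.86%. CoRAG is the most robust variant in hybrid documentation settings, achieving a statistically significant improvement on the combined task (10.29% exact match versus 7.45% for standard RAG), driven mainly by stronger SQL generation (15.32% versus 11.56%). Our findings identify retrieval-policy design as a key determinant of production-grade natural language interfaces: under documentation heterogeneity, iterative query decomposition outperforms both top-k retrieval and binary relevance filtering.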

Original Content

arXiv:2602.07086v1 Announce Type: cross

Abstract: Enterprise systems increasingly require natural language interfaces that can translate user requests into structured operations such as SQL queries and REST API calls. While large language models (LLMs) show promise for code generation [Chen et al., 2021; Huynh and Lin, 2025], their effectiveness in domain-specific enterprise contexts remains underexplored, particularly when both retrieval and modification tasks must be handled jointly. This paper presents a comprehensive evaluation of three retrieval-augmented generation (RAG) variants [Lewis et al., 2021] -- standard RAG, Self-RAG [Asai et al., 2024], and CoRAG [Wang et al., 2025] -- across SQL query generation, REST API call generation, and a combined task requiring dynamic task classification. Using SAP Transactional Banking as a realistic enterprise use case, we construct a novel test dataset covering both modalities and evaluate 18 experimental configurations under database-only, API-only, and hybrid documentation contexts.

Results demonstrate that RAG is essential: Without retrieval, exact match accuracy is 0% across all tasks, whereas retrieval yields substantial gains in execution accuracy (up to 79.30%) and component match accuracy (up to 78.86%). Critically, CoRAG proves most robust in hybrid documentation settings, achieving statistically significant improvements in the combined task (10.29% exact match vs. 7.45% for standard RAG), driven primarily by superior SQL generation performance (15.32% vs. 11.56%). Our findings establish retrieval-policy design as a key determinant of production-grade natural language interfaces, showing that iterative query decomposition outperforms both top-k retrieval and binary relevance filtering under documentation heterogeneity.
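The abstract contrasts three retrieval policies: top-k retrieval, binary relevance filtering, and iterative query decomposition. The following is a minimal sketch of how those policies differ mechanically, not the paper's implementation. Everything in it is an assumption made for illustration: the four-document corpus is invented, `score` is a toy token-overlap measure standing in for a real retriever, `default_judge` stands in for a model-based relevance decision, and the sub-queries passed to `decomposed` are supplied by hand, whereas an iterative system would generate them step by step.

```python
from typing import Callable, List

# Invented toy corpus mixing database and API documentation (not from the paper).
DOCS: List[str] = [
    "table accounts columns account_id balance currency",
    "table payments columns payment_id account_id amount",
    "endpoint get accounts returns account balance",
    "endpoint post payments creates payment",
]


def score(query: str, doc: str) -> int:
    """Toy relevance score: count of distinct query tokens present in the doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def top_k(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Top-k retrieval: keep the k highest-scoring documents."""
    # sorted() is stable, so tied documents keep their corpus order.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def default_judge(query: str, doc: str) -> bool:
    """Hypothetical yes/no relevance judge; a real system might ask a model."""
    return score(query, doc) >= 2


def relevance_filter(
    query: str,
    docs: List[str],
    k: int = 2,
    judge: Callable[[str, str], bool] = default_judge,
) -> List[str]:
    """Binary relevance filtering: retrieve top-k, then drop rejected documents."""
    return [d for d in top_k(query, docs, k) if judge(query, d)]


def decomposed(sub_queries: List[str], docs: List[str], k: int = 1) -> List[str]:
    """Query decomposition: retrieve per sub-query and merge, skipping duplicates."""
    merged: List[str] = []
    for sub_query in sub_queries:
        for doc in top_k(sub_query, docs, k):
            if doc not in merged:
                merged.append(doc)
    return merged
```

On this toy corpus, `top_k("account balance then create payment", DOCS, k=2)` returns the two account documents and misses the payment documentation, while `decomposed(["account balance", "create payment"], DOCS, k=1)` returns one account document and one payment document. This only illustrates why splitting a mixed request can help retrieval over a mixed corpus; it says nothing about the accuracies the paper measured.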