arxiv_cs_ai February 10, 2026

6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks

Translated: 2026/3/7 14:14:29

artificial-intelligence, networking, machine-learning, ai-native-technology, expert-systems

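Translated Summary

This paper introduces 6G-Bench, an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. It defines a taxonomy of 30 decision-making tasks (T1–T30) drawn from ongoing 6G and AI-agent standardization work in 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, grouped into five standardization-aligned capability categories. From 113,475 scenarios, the authors generate a balanced pool of 10,000 very-hard multiple-choice questions whose task-conditioned prompts require multi-step quantitative reasoning under uncertainty and worst-case regret minimization over multi-turn horizons. After automated filtering and expert human validation, 3,722 questions are kept as a high-confidence evaluation set, and the full pool is released to support training and fine-tuning of 6G-specialized models. Using 6G-Bench, the authors evaluate 22 foundation models covering dense and mixture-of-experts architectures, short- and long-context designs (up to 1M tokens), and both open-weight and proprietary systems. Deterministic single-shot accuracy (pass@1) ranges from 0.22 to 0.82 across models, indicating large differences in semantic reasoning capability. Leading models reach 0.87–0.89 accuracy on intent and policy reasoning, and a selective robustness analysis on reasoning-intensive tasks gives pass@5 values from 0.20 to 0.91. To support open science and reproducibility, the dataset is available on GitHub at https://github.com/maferrag/6G-Bench.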
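The abstract states that the task-conditioned prompts enforce worst-case regret minimization over multi-turn horizons, but it does not give the exact formulation. As a minimal sketch, assuming the textbook (Savage) minimax-regret criterion with per-turn utilities summed over the horizon, the Python below shows the kind of computation such a question asks for. The function names (`cumulative_payoffs`, `minimax_regret`) and every payoff number are hypothetical illustrations, not taken from the 6G-Bench dataset or code.

```python
from typing import List, Sequence, Tuple


def cumulative_payoffs(per_turn: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Hypothetical helper: per_turn[a][s][t] is the utility of action a in
    scenario s at turn t; returns the total utility per (action, scenario)."""
    return [[sum(turns) for turns in scenarios] for scenarios in per_turn]


def minimax_regret(payoffs: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Savage minimax-regret choice (an assumed formulation).

    payoffs[a][s] is the cumulative utility of action a under scenario s.
    Regret(a, s) = best utility achievable in s minus the utility of a in s.
    Returns the action with the smallest worst-case regret, plus each
    action's worst-case regret.
    """
    n_scenarios = len(payoffs[0])
    # Best utility that any action achieves in each scenario.
    best = [max(row[s] for row in payoffs) for s in range(n_scenarios)]
    # Worst-case (maximum over scenarios) regret of each action.
    worst = [max(best[s] - row[s] for s in range(n_scenarios)) for row in payoffs]
    # min() returns the first minimum, so ties go to the lowest action index.
    choice = min(range(len(payoffs)), key=lambda a: worst[a])
    return choice, worst


# Made-up example: 3 candidate actions x 3 scenarios x 2 turns.
per_turn = [
    [[5, 5], [1, 1], [3, 3]],  # action 0
    [[5, 4], [3, 3], [2, 3]],  # action 1
    [[6, 6], [0, 0], [1, 1]],  # action 2
]
choice, worst = minimax_regret(cumulative_payoffs(per_turn))
print(choice, worst)  # 1 [4, 3, 6]
```

Action 2 has the highest payoff in scenario 0 but the largest worst-case regret. The minimax-regret rule instead selects action 1, which is never more than 3 units behind the best available action in any scenario.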

Original Content

arXiv:2602.08675v1 Announce Type: cross

Abstract: This paper introduces 6G-Bench, an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. 6G-Bench defines a taxonomy of 30 decision-making tasks (T1–T30) extracted from ongoing 6G and AI-agent standardization activities in 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, and organizes them into five standardization-aligned capability categories. Starting from 113,475 scenarios, we generate a balanced pool of 10,000 very-hard multiple-choice questions using task-conditioned prompts that enforce multi-step quantitative reasoning under uncertainty and worst-case regret minimization over multi-turn horizons. After automated filtering and expert human validation, 3,722 questions are retained as a high-confidence evaluation set, while the full pool is released to support training and fine-tuning of 6G-specialized models. Using 6G-Bench, we evaluate 22 foundation models spanning dense and mixture-of-experts architectures, short- and long-context designs (up to 1M tokens), and both open-weight and proprietary systems. Across models, deterministic single-shot accuracy (pass@1) spans a wide range from 0.22 to 0.82, highlighting substantial variation in semantic reasoning capability. Leading models achieve intent and policy reasoning accuracy in the range 0.87–0.89, while selective robustness analysis on reasoning-intensive tasks shows pass@5 values ranging from 0.20 to 0.91. To support open science and reproducibility, we release the 6G-Bench dataset on GitHub: https://github.com/maferrag/6G-Bench
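The abstract reports deterministic single-shot accuracy (pass@1) and, for selected reasoning-intensive tasks, pass@5, but it does not say how pass@5 is computed. A minimal sketch, assuming the unbiased pass@k estimator commonly used for code-generation benchmarks (n sampled answers per question, c of them correct), is shown below. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and do not come from the 6G-Bench repository.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (assumed definition).

    n: number of sampled answers, c: how many are correct, k: attempt budget.
    Returns P(at least one of k answers drawn without replacement
    from the n samples is correct).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k wrong answers exist, so every k-subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over questions; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Deterministic single-shot decoding: one sample per question, so pass@1 is plain accuracy.
print(mean_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1))  # 0.75

# Five samples per question: with n == k, pass@5 means "any of the 5 answers is correct".
print(mean_pass_at_k([(5, 0), (5, 1), (5, 5)], k=5))
```

With one greedy sample per question the estimator collapses to ordinary accuracy, which matches how the abstract describes pass@1. When more than k samples are drawn, the combinatorial form avoids the upward bias of simply checking the first k answers.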