arxiv_cs_ai February 10, 2026

Multi-Agentic AI for Fairness-Aware and Accelerated Multi-modal Large Model Inference in Real-world Mobile Edge Networks

Translated: 2026/3/7 12:25:29

machine-learning, natural-language-processing, large-models, semantic-understanding, infrastructure-management

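Translation (English rendering)

Generative AI (GenAI) has transformed applications in natural language processing and content creation. Centralized inference, however, is held back by high latency, limited customizability, and privacy concerns. Deploying large models (LMs) in mobile edge networks has emerged as a promising remedy, but it brings challenges of its own: heterogeneous multi-modal LMs differ in resource demands and inference speed, the variety of prompt and output modalities complicates orchestration, and resource-limited edge infrastructure is poorly suited to running several LMs concurrently. In response, we propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks. It comprises a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents, all powered by foundation language models. The agents reason in natural language over runtime telemetry and historical experience to jointly optimize prompt routing and LM deployment. For evaluation, we built a city-wide testbed that supports network monitoring, containerized LM deployment, intra-server resource management, and inter-server communication. In experiments, the framework cuts average latency by more than 80% and raises fairness (Normalized Jain index) to 0.90 compared with the baselines. It also adapts quickly without fine-tuning, which makes it a generalizable approach to optimizing GenAI services in edge environments.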

Original Content

arXiv:2602.07215v1 Announce Type: cross Abstract: Generative AI (GenAI) has transformed applications in natural language processing and content creation, yet centralized inference remains hindered by high latency, limited customizability, and privacy concerns. Deploying large models (LMs) in mobile edge networks emerges as a promising solution. However, it also poses new challenges, including heterogeneous multi-modal LMs with diverse resource demands and inference speeds, varied prompt/output modalities that complicate orchestration, and resource-limited infrastructure ill-suited for concurrent LM execution. In response, we propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks. Our solution includes a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents, all powered by foundation language models. These agents cooperatively optimize prompt routing and LM deployment through natural language reasoning over runtime telemetry and historical experience. To evaluate its performance, we further develop a city-wide testbed that supports network monitoring, containerized LM deployment, intra-server resource management, and inter-server communications. Experiments demonstrate that our solution reduces average latency by over 80% and improves fairness (Normalized Jain index) to 0.90 compared to other baselines. Moreover, our solution adapts quickly without fine-tuning, offering a generalizable solution for optimizing GenAI services in edge environments.
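
The abstract reports fairness as a "Normalized Jain index" but does not define it. As a rough illustration only, the sketch below computes the standard Jain fairness index, J = (sum x)^2 / (n * sum x^2), and one common normalization that rescales it from its natural range [1/n, 1] to [0, 1]. The normalization formula, the function names, and the choice of what the values represent (for example, a per-service performance score) are assumptions made here, not details taken from the paper.

```python
from typing import Sequence


def jain_index(values: Sequence[float]) -> float:
    """Standard Jain fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        # All-zero input: treated as perfectly equal (a convention assumed here).
        return 1.0
    return (total * total) / (n * sum_sq)


def normalized_jain_index(values: Sequence[float]) -> float:
    """Rescale the Jain index from [1/n, 1] to [0, 1].

    Assumed normalization: (J - 1/n) / (1 - 1/n). The paper may define it differently.
    """
    n = len(values)
    if n == 1:
        # A single participant is trivially treated fairly.
        return 1.0
    j = jain_index(values)
    return (j - 1.0 / n) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # Hypothetical per-service scores, used only to exercise the functions.
    print(normalized_jain_index([1.0, 1.0, 1.0, 1.0]))  # equal shares
    print(normalized_jain_index([1.0, 0.0, 0.0, 0.0]))  # one service gets everything
```

Under this assumed definition, equal shares score 1 and a fully concentrated allocation scores 0, so a reported value of 0.90 would sit close to the equal-share end of the scale.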