arxiv_cs_cv April 24, 2026

Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts

Translated: 2026/4/24 19:48:20

multimodal, question-answering, multichart, llm, benchmarking

Original Content

arXiv:2604.21344v1 Announce Type: cross Abstract: Charts are widely used to present complex information. Deriving meaningful insights in real-world contexts often requires interpreting multiple related charts together. Research on understanding multi-chart images has not been extensively explored. We introduce PolyChartQA, a mid-scale dataset specifically designed for question answering over multi-chart images. PolyChartQA comprises 534 multi-chart images (with a total of 2,297 sub-charts) sourced from peer-reviewed computer science research publications and 2,694 QA pairs. We evaluate the performance of nine state-of-the-art Multimodal Language Models (MLMs) on PolyChartQA across question type, difficulty, question source, and key structural characteristics of multi-charts. Our results show a 27.4% LLM-based accuracy (L-Accuracy) drop on human-authored questions compared to MLM-generated questions, and a 5.39% L-Accuracy gain with our proposed prompting method.
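
The abstract reports L-Accuracy broken down by question type, difficulty, and question source, but it does not define the metric beyond calling it LLM-based. The sketch below assumes the simplest reading: an LLM judge marks each model answer correct or incorrect, and L-Accuracy is the fraction marked correct within each group. The function name, the record fields, and the sample records are all hypothetical and are not taken from PolyChartQA.

```python
from collections import defaultdict


def l_accuracy_by_group(records, key):
    """Return {group: fraction of answers the judge marked correct}.

    Assumption: each record carries a boolean "judged_correct" verdict
    produced by an LLM judge; the paper's actual protocol may differ.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        correct[group] += int(rec["judged_correct"])
    return {group: correct[group] / totals[group] for group in totals}


# Made-up records for illustration only; these are not PolyChartQA results.
records = [
    {"question_source": "human", "difficulty": "hard", "judged_correct": False},
    {"question_source": "human", "difficulty": "easy", "judged_correct": True},
    {"question_source": "mlm", "difficulty": "easy", "judged_correct": True},
    {"question_source": "mlm", "difficulty": "hard", "judged_correct": True},
]

by_source = l_accuracy_by_group(records, "question_source")
by_difficulty = l_accuracy_by_group(records, "difficulty")
```

Grouping by a different key (for example a hypothetical "question_type" field) reuses the same function, which is how a single set of judge verdicts could yield each of the breakdowns the abstract lists.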