arxiv_cs_cv February 10, 2026

UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents

Original Content

arXiv:2602.07038v1 Announce Type: new

Abstract: Key Information Extraction (KIE) from real-world documents remains challenging due to substantial variation in layout structure, visual quality, and task-specific information requirements. Recent Large Multimodal Models (LMMs) have shown promise for performing end-to-end KIE directly from document images. To enable a comprehensive and systematic evaluation across realistic and diverse application scenarios, we introduce UNIKIE-BENCH, a unified benchmark designed to rigorously evaluate the KIE capabilities of LMMs. UNIKIE-BENCH consists of two complementary tracks: a constrained-category KIE track with scenario-predefined schemas that reflect practical application needs, and an open-category KIE track that requires extracting any key information explicitly present in the document. Experiments on 15 state-of-the-art LMMs reveal substantial performance degradation under diverse schema definitions, long-tail key fields, and complex layouts, along with pronounced performance disparities across document types and scenarios. These findings underscore persistent challenges in grounding accuracy and layout-aware reasoning for LMM-based KIE. All code and datasets are available at https://github.com/NEUIR/UNIKIE-BENCH.
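
The difference between the two tracks can be illustrated with a small sketch. This is not the benchmark's evaluation code: the abstract does not state which metric is used, so exact-match F1 over key-value pairs is an assumption, and the `normalize` and `field_f1` helpers, the receipt fields, and the schema are all hypothetical. In the constrained-category setting only keys from a predefined schema are scored; in the open-category setting every predicted key counts.

```python
from typing import Dict, Optional, Set


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


def field_f1(
    pred: Dict[str, str],
    gold: Dict[str, str],
    schema: Optional[Set[str]] = None,
) -> float:
    """Exact-match F1 over (key, value) pairs (an assumed metric, not the paper's).

    schema given -> constrained-category: only keys in the schema are scored.
    schema None  -> open-category: every predicted and gold key is scored.
    """
    if schema is not None:
        pred = {k: v for k, v in pred.items() if k in schema}
        gold = {k: v for k, v in gold.items() if k in schema}

    pred_pairs = {(k, normalize(v)) for k, v in pred.items()}
    gold_pairs = {(k, normalize(v)) for k, v in gold.items()}

    true_positives = len(pred_pairs & gold_pairs)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or empty gold annotations

    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical receipt example: the model adds a field that is not annotated
# ("cashier") and misses one that is ("date").
gold = {"vendor": "ACME Corp", "total": "42.00", "date": "2026-02-10"}
pred = {"vendor": "acme corp", "total": "42.00", "cashier": "Bob"}

constrained_score = field_f1(pred, gold, schema={"vendor", "total"})
open_score = field_f1(pred, gold)
```

Under these assumptions the same prediction scores perfectly against the two-field schema but is penalized in the open setting for both the extra and the missing field, which shows why the two tracks can rank models differently.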