arxiv_cs_ai 2026-04-24

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

Translated: 2026/4/24 20:27:03

numerical-reasoning, self-supervised-learning, table-qa, continual-pretraining, finqa

Original Content

arXiv:2604.21495v1 Announce Type: cross Abstract: Models for numerical reasoning over expert-domain tables often exhibit high in-domain accuracy but limited robustness to domain shift. Models trained with supervised fine-tuning (SFT) on specific datasets tend to rely on header-operation shortcuts rather than structural reasoning. We introduce TaNOS, a continual pre-training framework comprising three components: (i) header anonymization to reduce lexical memorization, (ii) operation sketches that provide minimal structural cues, and (iii) self-supervised pretraining that constructs correctness-guaranteed program-question pairs from given tables in a program-first manner. By decoupling domain semantics from numerical operation structure, TaNOS improves the transferability of numerical reasoning. Applied to an 8B instruction-tuned model, TaNOS achieves 80.13% execution accuracy on FinQA with only 10% of the training data, outperforming both the SFT baseline trained on the full training data (73.97%) and proprietary models such as GPT-5 and Gemini-2.5-Pro. Furthermore, in the domain-shift experiments, TaNOS displays a nearly negligible cross-domain gap (<2pp), whereas standard SFT shows a gap of over 10pp. These results suggest that structural guidance with operation sketches, header-agnostic representations, and correctness-guaranteed self-supervision can improve the robustness of numerical reasoning across diverse expert-domain tables.
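
The abstract names three ingredients without giving implementation details, so the following is only a minimal Python sketch of how they could fit together, not the authors' code. Everything in it is an assumption made for illustration: the table layout (row header to column header to value), the placeholder names `row_0` and `col_0`, the program format of `(op, left, right)` steps, the single "relative change" program template, and all function names. The idea it illustrates is the program-first ordering: a program is sampled from the table and executed to obtain the answer, and only then is a question rendered from it, so the answer is correct by construction.

```python
import random

# Hypothetical operation set; the paper's actual set is not given in the abstract.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def anonymize_headers(table):
    """Replace row and column headers with neutral placeholders."""
    rows = list(table)
    cols = list(table[rows[0]])
    return {
        f"row_{i}": {f"col_{j}": table[r][c] for j, c in enumerate(cols)}
        for i, r in enumerate(rows)
    }

def execute(program, table):
    """Run a program given as a list of (op, left, right) steps.

    An argument is either an int (index of an earlier step's result)
    or a (row, col) tuple that addresses a table cell.
    """
    results = []
    for op, left, right in program:
        args = []
        for arg in (left, right):
            if isinstance(arg, int):
                args.append(results[arg])
            else:
                row, col = arg
                args.append(table[row][col])
        results.append(OPS[op](*args))
    return results[-1]

def operation_sketch(program):
    """Keep only the operation sequence, dropping all arguments."""
    return [op for op, _, _ in program]

def sample_pair(table, rng):
    """Program-first sampling: build the program, execute it, then write the question.

    Assumes the sampled base cell is nonzero (it is used as a divisor).
    """
    row = rng.choice(sorted(table))
    old_col, new_col = rng.sample(sorted(table[row]), 2)
    program = [
        ("subtract", (row, new_col), (row, old_col)),
        ("divide", 0, (row, old_col)),
    ]
    answer = execute(program, table)
    question = f"What is the relative change in {row} from {old_col} to {new_col}?"
    return {
        "question": question,
        "program": program,
        "sketch": operation_sketch(program),
        "answer": answer,
    }

table = {
    "Revenue": {"2022": 100.0, "2023": 120.0},
    "Net income": {"2022": 10.0, "2023": 15.0},
}
anon = anonymize_headers(table)
pair = sample_pair(anon, random.Random(0))
print(pair["question"], pair["sketch"], pair["answer"])
```

Because the question mentions only placeholders such as `row_0`, a model trained on such pairs cannot key on domain vocabulary like "Revenue". It has to use the table structure and the operation sketch, which is the decoupling of domain semantics from operation structure that the abstract describes.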