arxiv_cs_lg February 10, 2026

TACIT: Transformation-Aware Capturing of Implicit Thought

tact, diffusion-model, visual-reasoning, transformer, flow-matching

Original Content

arXiv:2602.07061v1 Announce Type: new

Abstract: We present TACIT (Transformation-Aware Capturing of Implicit Thought), a diffusion-based transformer for interpretable visual reasoning. Unlike language-based reasoning systems, TACIT operates entirely in pixel space using rectified flow, enabling direct visualization of the reasoning process at each inference step. We demonstrate the approach on maze-solving, where the model learns to transform images of unsolved mazes into solutions. Key results on 1 million synthetic maze pairs include:

- 192x reduction in training loss over 100 epochs
- 22.7x improvement in L2 distance to ground truth
- Only 10 Euler steps required (vs. 100-1000 for typical diffusion models)

Quantitative analysis reveals a striking phase transition phenomenon: the solution remains invisible for 68% of the transformation (zero recall), then emerges abruptly at t=0.70 within just 2% of the process. Most remarkably, 100% of samples exhibit simultaneous emergence across all spatial regions, ruling out sequential path construction and providing evidence for holistic rather than algorithmic reasoning. This "eureka moment" pattern -- long incubation followed by sudden crystallization -- parallels insight phenomena in human cognition. The pixel-space design with noise-free flow matching provides a foundation for understanding how neural networks develop implicit reasoning strategies that operate below and before language.
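
The abstract's inference procedure (a fixed number of Euler steps along a flow from the unsolved image to the solved one, with recall inspected at each step) can be illustrated with a minimal sketch. This is not the authors' code. The names `euler_sample`, `path_recall`, and `oracle_velocity` are hypothetical, the 8x8 "maze" is a toy array, and the velocity field is an oracle that already knows the answer, standing in for a trained network. Because of that, the toy exercises only the mechanics and does not reproduce the phase transition reported in the abstract.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and every intermediate frame, so each step
    of the transformation can be inspected.
    """
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    trajectory = [x.copy()]
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
        trajectory.append(x.copy())
    return x, trajectory

def path_recall(image, target_mask, threshold=0.5):
    """Fraction of ground-truth solution pixels that are 'on' in image.

    The 0.5 threshold is an assumption made for this sketch.
    """
    predicted = image > threshold
    return float((predicted & target_mask).sum() / target_mask.sum())

# Toy data (assumption): a blank 8x8 "maze" whose solution is one row.
unsolved = np.zeros((8, 8))
solved = unsolved.copy()
solved[3, :] = 1.0
target_mask = solved > 0.5

# Oracle stand-in for a learned velocity network: on a straight path
# between the two images, the velocity is the constant solved - unsolved.
def oracle_velocity(x, t):
    return solved - unsolved

final, trajectory = euler_sample(oracle_velocity, unsolved, num_steps=10)
recalls = [path_recall(frame, target_mask) for frame in trajectory]
```

With a trained model, `oracle_velocity` would be replaced by the network's prediction for the current frame and time. Plotting `recalls` against t = k / num_steps would then be one way to look for the abrupt emergence the abstract describes.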