arxiv_cs_ai April 24, 2026

CoFEE: Reasoning Control for LLM-Based Feature Discovery

large-language-models, feature-discovery, reasoning-control, machine-learning, inductive-bias

Original Content

arXiv:2604.21584v1 (announce type: new)

Abstract: Feature discovery from complex unstructured data is fundamentally a reasoning problem: it requires identifying abstractions that are predictive of a target outcome while avoiding leakage, proxies, and post-outcome signals. Building on ever-improving Large Language Models (LLMs), our method provides a structured approach to this challenge. LLMs are well suited to the task because they can process large amounts of information, but unconstrained feature generation can lead to weak features. In this work, we study reasoning control in LLMs by inducing cognitive behaviors that improve feature discovery. We introduce CoFEE (Cognitive Feature Engineering Engine), a reasoning control framework that enforces cognitive behaviors in how the LLM reasons during feature discovery. From a machine learning perspective, these cognitive behaviors act as structured inductive biases over the space of candidate features generated by the model. These behaviors have been exploited with success in ML models, and include backward chaining from outcomes, subgoal decomposition, verification against observability and leakage criteria, and explicit backtracking from rejected reasoning paths. In a controlled comparison, we show that enforcing cognitive behaviors yields features with higher empirical predictability than those produced by unconstrained vanilla LLM prompts. CoFEE achieves an average Success Rate Score that is 15.2% higher than the vanilla approach, while generating 29% fewer features and reducing costs by 53.3%. Using held-out feature evaluation, we assess whether cognitively induced features generalize beyond the data used for discovery. Our results indicate that, in our evaluated setting, reasoning control is associated with improvements in the quality and efficiency of LLM-based feature discovery.
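The abstract names verification against observability and leakage criteria as one of the enforced behaviors, without giving an implementation. The following is a minimal sketch of what such a check could look like, not CoFEE's actual code. The `CandidateFeature` record, its two boolean fields, and the `verify` and `discover` functions are all hypothetical names introduced here for illustration. In a real pipeline the flags would presumably come from the LLM's own reasoning or from a separate audit step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateFeature:
    """Hypothetical record for one proposed feature (not taken from the paper)."""
    name: str
    observable_at_prediction_time: bool  # can it be computed before the outcome is known?
    derived_from_outcome: bool           # does it leak or follow from the outcome?


def verify(feature: CandidateFeature) -> bool:
    """Accept a feature only if it is observable and not derived from the outcome."""
    return feature.observable_at_prediction_time and not feature.derived_from_outcome


def discover(candidates: list[CandidateFeature]) -> tuple[list[str], list[str]]:
    """Split candidates into accepted and rejected feature names.

    Rejected names are kept in their own list so that a later step can see
    which proposals were already ruled out instead of silently dropping them.
    """
    accepted: list[str] = []
    rejected: list[str] = []
    for feature in candidates:
        if verify(feature):
            accepted.append(feature.name)
        else:
            rejected.append(feature.name)
    return accepted, rejected


if __name__ == "__main__":
    # Toy candidates, invented purely for illustration.
    toy = [
        CandidateFeature("tenure_months", True, False),
        CandidateFeature("refund_issued_after_cancellation", False, True),
    ]
    print(discover(toy))
```

Keeping the rejected list explicit is a design choice in this sketch: it gives a later reasoning step a record of discarded proposals, which loosely mirrors the backtracking behavior the abstract mentions.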