arxiv_cs_lg February 10, 2026

Learnable Chernoff Baselines for Inference-Time Alignment

Translated: 2026/3/15 14:47:51

learnable-chernoff-baselines, inference-time-alignment, kl-regularization, rejection-sampling, diffusion-models

Original Content

arXiv:2602.07738v1
Announce Type: new

Abstract: We study inference-time reward-guided alignment for generative models. Existing methods often rely on either architecture-specific adaptations or computationally costly inference procedures. We introduce Learnable Chernoff Baselines (LCBs) as a method for efficiently and approximately sampling from the exponentially tilted kernels that arise from KL-regularized reward alignment. Using only black-box sampling access to the pretrained model, LCBs implement a form of rejection sampling with adaptively selected acceptance probabilities, which allows fine-grained control over inference-compute scaling. We establish total-variation guarantees to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling while using substantially fewer queries to the pretrained model.
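For orientation, the "ideal rejection sampling" that the abstract uses as its reference point can be sketched in a few lines. This is not the LCB method itself, whose details the abstract does not give. The sketch assumes the standard form of the KL-regularized objective, max over pi of E_pi[r(x)] - beta * KL(pi || p), whose solution is the exponentially tilted distribution pi*(x) proportional to p(x) * exp(r(x) / beta). It also assumes a known upper bound r_max on the reward. The names sample_base, reward, beta, and r_max are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def tilted_rejection_sample(sample_base, reward, beta, r_max, rng, max_queries=10_000):
    """Draw one exact sample from pi*(x) proportional to p(x) * exp(reward(x) / beta).

    sample_base(rng) is a black-box sampler for the base model p, and r_max must
    upper-bound reward(x) for every x. Returns (sample, queries_used).
    """
    for queries_used in range(1, max_queries + 1):
        x = sample_base(rng)
        # Acceptance probability lies in (0, 1] because reward(x) <= r_max.
        accept_prob = math.exp((reward(x) - r_max) / beta)
        if rng.random() < accept_prob:
            return x, queries_used
    raise RuntimeError("no proposal accepted within max_queries")


def empirical_frequencies(n_samples, seed=0):
    """Toy check (assumed setup): p uniform on {0, 1, 2, 3}, reward(x) = x, beta = 1."""
    rng = random.Random(seed)
    counts = Counter()
    total_queries = 0
    for _ in range(n_samples):
        x, used = tilted_rejection_sample(
            sample_base=lambda r: r.randrange(4),
            reward=lambda x: float(x),
            beta=1.0,
            r_max=3.0,
            rng=rng,
        )
        counts[x] += 1
        total_queries += used
    freqs = {x: counts[x] / n_samples for x in range(4)}
    return freqs, total_queries
```

In this toy setup the accepted samples should follow weights proportional to exp(x), and the mean acceptance probability is about 0.39, so each accepted sample costs roughly 2.6 base-model queries in expectation. That cost grows as beta shrinks or as high-reward samples become rarer under p, which is the query overhead the abstract says LCB sampling reduces.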