arxiv_cs_lg February 10, 2026

BFTS: Thompson Sampling with Bayesian Additive Regression Trees

Translated: 2026/3/15 7:04:13

bayesian-additive-regression-trees, thompson-sampling, contextual-bandits, reinforcement-learning, mobile-health

Original Content

arXiv:2602.07767v1 Announce Type: cross

Abstract: Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Conversely, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.
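The exploration loop that BFTS plugs BART into can be sketched as generic contextual Thompson Sampling: each round, draw one reward function from the posterior, act greedily with respect to that draw, observe the reward, and update the posterior. The sketch below is a minimal, stdlib-only illustration under stated assumptions and is not the authors' implementation. In particular, the BART posterior is swapped out for a deliberately simple Beta-Bernoulli table over discrete (context, arm) pairs. The names `BetaTablePosterior`, `thompson_sampling`, and `toy_env`, the binary engagement reward, and all parameters are hypothetical. In BFTS, step 1 would presumably draw one sum-of-trees function from the BART posterior (BART posteriors are usually sampled by MCMC), and step 4 would add the new observation to the data that sampler conditions on.

```python
import random
from collections import defaultdict


class BetaTablePosterior:
    """Hypothetical stand-in for the BART reward model (NOT BART).

    Keeps an independent Beta(1, 1) prior over the success probability of
    every (context, arm) pair and updates it with binary rewards.
    """

    def __init__(self):
        self.successes = defaultdict(int)  # observed rewards of 1 per (context, arm)
        self.failures = defaultdict(int)   # observed rewards of 0 per (context, arm)

    def sample_reward_fn(self, rng):
        """Draw one reward function from the current posterior.

        Values are drawn lazily and cached, so every call of the returned
        function within one round sees the same posterior sample.
        """
        cache = {}

        def f(context, arm):
            key = (context, arm)
            if key not in cache:
                cache[key] = rng.betavariate(1 + self.successes[key],
                                             1 + self.failures[key])
            return cache[key]

        return f

    def update(self, context, arm, reward):
        key = (context, arm)
        if reward:
            self.successes[key] += 1
        else:
            self.failures[key] += 1


def thompson_sampling(true_prob, contexts, n_arms, horizon, seed=0):
    """Contextual Thompson Sampling; returns cumulative expected regret per round."""
    rng = random.Random(seed)
    model = BetaTablePosterior()
    total, history = 0.0, []
    for _ in range(horizon):
        x = rng.choice(contexts)                               # observe a context
        f = model.sample_reward_fn(rng)                        # 1. sample from posterior
        arm = max(range(n_arms), key=lambda a: f(x, a))        # 2. act greedily on the sample
        reward = 1 if rng.random() < true_prob(x, arm) else 0  # 3. observe reward
        model.update(x, arm, reward)                           # 4. update posterior
        best = max(true_prob(x, a) for a in range(n_arms))
        total += best - true_prob(x, arm)                      # expected regret this round
        history.append(total)
    return history


def toy_env(context, arm):
    # Hypothetical environment: which arm is better flips with the context.
    return 0.7 if arm == context % 2 else 0.3


regret = thompson_sampling(toy_env, contexts=[0, 1, 2, 3], n_arms=2, horizon=2000)
print(f"cumulative regret after {len(regret)} rounds: {regret[-1]:.1f}")
```

Because the agent acts on a single posterior draw, it explores an arm roughly in proportion to how plausible it is that the arm is optimal. The quality of the posterior's uncertainty therefore governs regret, which is why the abstract argues for a reward model such as BART that is both flexible on tabular contexts and fully probabilistic.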