arxiv_cs_lg April 24, 2026

Online Survival Analysis: A Bandit Approach under Cox PH Model

survival-analysis, cox-ph-model, multi-armed-bandit, online-learning, regret-bounds

Original Content

arXiv:2604.20296v1 Announce Type: cross

Abstract: Survival analysis is a widely used statistical framework for modeling time-to-event data under censoring. Classical methods, such as the Cox proportional hazards (Cox PH) model, offer a semiparametric approach to estimating the effects of covariates on the hazard function. Despite its importance, survival analysis has been largely unexplored in online settings, particularly within the bandit framework, where decisions must be made sequentially to optimize treatments as new data arrive over time. In this work, we take an initial step toward integrating survival analysis into a purely online learning setting under the Cox PH model, addressing key challenges including staggered entry, delayed feedback, and right censoring. We adapt three canonical bandit algorithms to balance exploration and exploitation, with theoretical guarantees of sublinear regret bounds. Extensive simulations and semi-real experiments using SEER cancer data demonstrate that our approach enables rapid and effective learning of near-optimal treatment policies.
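
To make the setting concrete, here is a minimal Python sketch of two ingredients the abstract names: the data visible at a given calendar time under staggered entry and right censoring, and the standard Cox partial log-likelihood evaluated on that data. It is an illustration under stated assumptions, not the authors' algorithm: the function names `snapshot` and `cox_partial_log_likelihood` are hypothetical, event times are assumed to have no ties, and the abstract does not say which estimator or which three bandit algorithms the paper uses.

```python
import math


def snapshot(entry_times, durations, now):
    """Return (index, observed_time, event) for subjects enrolled by `now`.

    Hypothetical helper: `durations` are the true time-to-event values,
    which a real learner would not know in advance.
    """
    visible = []
    for i, (entry, duration) in enumerate(zip(entry_times, durations)):
        if entry > now:
            continue  # staggered entry: subject not yet enrolled
        follow_up = now - entry
        if duration <= follow_up:
            visible.append((i, duration, 1))   # event already observed
        else:
            visible.append((i, follow_up, 0))  # right-censored at `now`
    return visible


def cox_partial_log_likelihood(beta, covariates, times, events):
    """Cox partial log-likelihood for right-censored data (assumes no ties)."""
    # Linear predictor x_i . beta for every subject.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects enter only through risk sets
        # Risk set: subjects still under observation at time t_i.
        risk = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += scores[i] - math.log(risk)
    return total
```

In an online loop, a learner could call `snapshot` at each decision time and maximize the partial likelihood over `beta` on the visible data before choosing the next treatment. Subjects whose outcomes have not yet occurred still contribute through the risk sets, which is how delayed feedback enters the estimate.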