arxiv_cs_lg 2026年2月10日

Waterstein距離における誤差バウンドを持つ効率的な分布学習

Efficient Distribution Learning with Error Bounds in Wasserstein Distance

Translated: 2026/3/15 15:01:47

waterstein-distanceoptimal-transportmachine-learningdistribution-estimationoptimal-clustering

Japanese Translation

arXiv:2602.08063v1 Announce Type: new 要約：Waterstein距離は確率分布間の距離を定量化する鍵となる指標として登場し、機械学習、制御理論、意思決定理論、生体システムなど多岐にわたる分野に応用されている。その結果、未知の分布をWaterstein距離において非漸近的で計算が容易な誤差バウンド付きで学習することは、多くの分野における基本的な課題となっています。本稿では、未知の確率分布$\\\\mathbb{P}\\\ o$を有限のサンプリングセットから近似離散分布$\\hat{\\mathbb{P}}\\\to$として再構成し、$\\mathbb{P}\\\ o$と$\\hat{\\mathbb{P}}\\\to$間のWaterstein距離をバウンドするよう、新たなアルゴリズム的・理論的枠組みを提案します。我々の枠組みは最適輸送、非線型最適化、集中不等式を利用しています。具体的には、$\\mathbb{P}\\\ o$が未知であるとしても、$\\mathbb{P}\\\ o$と$\\hat{\\mathbb{P}}\\\to$間のWaterstein距離は、$\\hat{\\mathbb{P}}\\\to$のサポートサイズのみに依存するサイズを持つ実行可能な最適化問題（混合整数線型計画）を解くことで、高い確信度で効率的にバウンドすることが示されます。これにより、$\\hat{\\mathbb{P}}\\\to$のサポートを最適に見つけ出し、Waterstein距離誤差を最小化するスマートなクラスタリングアルゴリズムの開発を可能にします。ベンチマークの結果では、我々のアプローチが一般にサポートサイズが明らかに小さく、誤差バウンドがより狭い近似分布を返すことで、状態-of-the-artに準ずる方法よりも優れていることを示しました。

Original Content

arXiv:2602.08063v1 Announce Type: new Abstract: The Wasserstein distance has emerged as a key metric to quantify distances between probability distributions, with applications in various fields, including machine learning, control theory, decision theory, and biological systems. Consequently, learning an unknown distribution with non-asymptotic and easy-to-compute error bounds in Wasserstein distance has become a fundamental problem in many fields. In this paper, we devise a novel algorithmic and theoretical framework to approximate an unknown probability distribution $\mathbb{P}$ from a finite set of samples by an approximate discrete distribution $\widehat{\mathbb{P}}$ while bounding the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$. Our framework leverages optimal transport, nonlinear optimization, and concentration inequalities. In particular, we show that, even if $\mathbb{P}$ is unknown, the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$ can be efficiently bounded with high confidence by solving a tractable optimization problem (a mixed integer linear program) of a size that only depends on the size of the support of $\widehat{\mathbb{P}}$. This enables us to develop intelligent clustering algorithms to optimally find the support of $\widehat{\mathbb{P}}$ while minimizing the Wasserstein distance error. On a set of benchmarks, we demonstrate that our approach outperforms state-of-the-art comparable methods by generally returning approximating distributions with substantially smaller support and tighter error bounds.