arxiv_cs_cv April 24, 2026

Deep kernel video approximation for unsupervised action segmentation

unsupervised-action-segmentation, deep-kernel-space, maximum-mean-discrepancy, neural-tangent-kernel, video-processing

Original Content

arXiv:2604.21572v1 Announce Type: new

Abstract: This work focuses on per-video unsupervised action segmentation, which is of interest to applications where storing large datasets is either not possible or not permitted. We propose to segment videos by learning in a deep kernel space so as to approximate the underlying frame distribution as closely as possible. To measure the closeness between the original video distribution and its approximation, we rely on maximum mean discrepancy (MMD), a geometry-preserving metric in distribution space that therefore gives more reliable estimates. Moreover, unlike the commonly used optimal transport metric, MMD is both easier to optimize and faster. We use neural tangent kernels (NTKs) to define the kernel space where MMD operates, for two reasons: they have greater descriptive power than fixed kernels, and they sidestep the trivial solution when the inputs (the video approximation) and the kernel function are learned jointly. Finally, we show competitive results against state-of-the-art per-video methods on six standard benchmarks. Additionally, our method achieves higher F1 scores than prior agglomerative work when the number of segments is unknown.
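To make the MMD criterion concrete, the following is a minimal sketch of a biased squared-MMD estimate between a set of frame features and a small weighted set of points standing in for a video approximation. It is not the paper's implementation: a fixed Gaussian (RBF) kernel is swapped in for the neural tangent kernel, and the toy 2-D features, the segment-mean prototypes, and the names `rbf_kernel`, `mmd2`, and `bandwidth` are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2(x, y, y_weights=None, bandwidth=1.0):
    """Biased estimate of squared MMD between uniformly weighted x and weighted y."""
    wx = np.full(x.shape[0], 1.0 / x.shape[0])
    if y_weights is None:
        wy = np.full(y.shape[0], 1.0 / y.shape[0])
    else:
        wy = np.asarray(y_weights, dtype=float)
        wy = wy / wy.sum()  # normalize so the weights form a distribution
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return float(wx @ k_xx @ wx + wy @ k_yy @ wy - 2.0 * (wx @ k_xy @ wy))


# Toy "video" (assumption): 60 frames of one action, then 40 frames of another,
# each frame represented by a 2-D feature vector.
rng = np.random.default_rng(0)
frames = np.vstack([
    rng.normal(0.0, 0.1, size=(60, 2)),
    rng.normal(5.0, 0.1, size=(40, 2)),
])

# Two candidate approximations: one prototype per true segment (weighted by
# segment length) versus a single prototype for the whole video.
two_segments = np.vstack([frames[:60].mean(axis=0), frames[60:].mean(axis=0)])
one_segment = frames.mean(axis=0, keepdims=True)

mmd2_two = mmd2(frames, two_segments, y_weights=[60, 40])
mmd2_one = mmd2(frames, one_segment)
print("two-segment approximation:", mmd2_two)
print("one-segment approximation:", mmd2_one)
```

In this toy setting the two-prototype approximation yields a much smaller squared MMD than the single prototype, which matches the intuition that a good segmentation should reproduce the frame distribution closely. With a learned kernel such as an NTK, the kernel matrices would come from the network rather than from a fixed bandwidth.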