arxiv_cs_ai April 24, 2026

Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction

Translated: 2026/4/24 20:36:02

micro-video-popularity-prediction, spatio-temporal-enlargement, temporal-dynamics, memory-bank, video-recommendation-systems

Original Content

arXiv:2604.20311v2 Announce Type: replace-cross

Abstract: Micro-video popularity prediction (MVPP) aims to forecast the future popularity of videos on online media, which is essential for applications such as content recommendation and traffic allocation. In real-world scenarios, it is critical for MVPP approaches to understand both the temporal dynamics of a given video (temporal) and its historical relevance to other videos (spatial). However, existing approaches suffer from limitations in both dimensions: temporally, they rely on sparse short-range sampling that restricts content perception; spatially, they depend on flat retrieval memory with limited capacity and low efficiency, hindering scalable knowledge utilization. To overcome these limitations, we propose a unified framework that achieves joint spatio-temporal enlargement, enabling precise perception of extremely long video sequences while supporting a scalable memory bank that can infinitely expand to incorporate all relevant historical videos. Technically, we employ a Temporal Enlargement driven by a frame scoring module that extracts highlight cues from video frames through two complementary pathways: sparse sampling and dense perception. Their outputs are adaptively fused to enable robust long-sequence content understanding. For Spatial Enlargement, we construct a Topology-Aware Memory Bank that hierarchically clusters historically relevant content based on topological relationships. Instead of directly expanding memory capacity, we update the encoder features of the corresponding clusters when incorporating new videos, enabling unbounded historical association without unbounded storage growth. Extensive experiments on three widely used MVPP benchmarks demonstrate that our method consistently outperforms 11 strong baselines across mainstream metrics, achieving robust improvements in both prediction accuracy and ranking consistency.
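
The abstract does not spell out how cluster features are updated, so the following is only a minimal sketch of the general idea that a memory can absorb an unbounded stream of videos while its storage stays fixed. It swaps in a simple, flat nearest-centroid scheme with a running mean, not the paper's hierarchical Topology-Aware Memory Bank. The class name `ClusterMemory`, the parameters `max_clusters` and `new_cluster_threshold`, and the use of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ClusterMemory:
    """Hypothetical fixed-capacity memory: one running-mean feature per cluster."""

    def __init__(self, max_clusters, new_cluster_threshold=0.5):
        if max_clusters < 1:
            raise ValueError("max_clusters must be at least 1")
        self.max_clusters = max_clusters
        self.threshold = new_cluster_threshold  # assumed similarity cut-off
        self.centroids = []  # one feature vector per cluster
        self.counts = []     # number of videos absorbed by each cluster

    def _nearest(self, feature):
        """Return (index, similarity) of the most similar stored cluster."""
        sims = [cosine(feature, c) for c in self.centroids]
        best = max(range(len(sims)), key=sims.__getitem__)
        return best, sims[best]

    def add(self, feature):
        """Absorb one video feature and return the index of its cluster."""
        if self.centroids:
            idx, sim = self._nearest(feature)
        else:
            idx, sim = -1, -1.0
        # Open a new cluster only while capacity remains and nothing is similar.
        if len(self.centroids) < self.max_clusters and sim < self.threshold:
            self.centroids.append(list(feature))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Otherwise update the matched cluster's feature in place
        # (incremental mean), so storage does not grow with the stream.
        self.counts[idx] += 1
        n = self.counts[idx]
        self.centroids[idx] = [
            c + (x - c) / n for c, x in zip(self.centroids[idx], feature)
        ]
        return idx

    def retrieve(self, feature):
        """Return (index, cluster feature) for the cluster closest to the query."""
        idx, _ = self._nearest(feature)
        return idx, self.centroids[idx]


bank = ClusterMemory(max_clusters=2)
for video_feature in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [-1.0, -1.0]):
    bank.add(video_feature)
print(len(bank.centroids), bank.counts)
```

In this sketch four videos are absorbed but only two feature vectors are ever stored; a flat retrieval memory would instead keep one entry per video and grow with the stream.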