arxiv_cs_cv April 24, 2026

Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos

Translated: 2026/4/24 19:49:21

video-generation, 360-video, computer-vision, panoramic-visuals, deep-learning

Original Content

arXiv:2504.07940v3 Announce Type: replace Abstract: 360° videos have emerged as a promising medium to represent our dynamic visual world. Compared to the "tunnel vision" of standard cameras, their borderless field of view offers a more complete perspective of our surroundings. While existing video models excel at producing standard videos, their ability to generate full panoramic videos remains elusive. In this paper, we investigate the task of video-to-360° generation: given a perspective video as input, our goal is to generate a full panoramic video that is consistent with the original video. Unlike conventional video generation tasks, the output's field of view is significantly larger, and the model is required to have a deep understanding of both the spatial layout of the scene and the dynamics of objects to maintain spatio-temporal consistency. To address these challenges, we first leverage the abundant 360° videos available online and develop a high-quality data filtering pipeline to curate pairwise training data. We then carefully design a series of geometry- and motion-aware operations to facilitate the learning process and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective videos. In addition, we showcase its potential applications, including video stabilization, camera viewpoint control, and interactive visual question answering.
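The abstract does not spell out the geometry, but the task implies that the input perspective video fills only a small part of each 360° output frame, and the model must synthesize the rest. As a hedged illustration, and not the authors' geometry-aware operations, this minimal sketch marks which pixels of an equirectangular frame a single forward-facing pinhole view covers. It assumes zero yaw, pitch, and roll, no lens distortion, and known horizontal and vertical fields of view; the function names are hypothetical.

```python
import math


def perspective_coverage_mask(pano_w, pano_h, hfov_deg, vfov_deg):
    """Mark which equirectangular pixels a forward-facing pinhole camera sees.

    Hypothetical setup: camera at the sphere's center, looking along +z with
    zero yaw/pitch/roll and no lens distortion.
    Returns a pano_h x pano_w list of booleans.
    """
    tan_h = math.tan(math.radians(hfov_deg) / 2.0)
    tan_v = math.tan(math.radians(vfov_deg) / 2.0)
    mask = []
    for v in range(pano_h):
        # Latitude runs from +pi/2 (top row) to -pi/2 (bottom row); sample pixel centers.
        lat = math.pi / 2.0 - (v + 0.5) / pano_h * math.pi
        row = []
        for u in range(pano_w):
            # Longitude runs from -pi (left edge) to +pi (right edge).
            lon = (u + 0.5) / pano_w * 2.0 * math.pi - math.pi
            # Unit ray direction: x right, y up, z forward.
            x = math.cos(lat) * math.sin(lon)
            y = math.sin(lat)
            z = math.cos(lat) * math.cos(lon)
            # Rays pointing behind the camera never reach the sensor; otherwise
            # compare normalized image-plane coordinates with the FOV bounds.
            visible = z > 0 and abs(x / z) <= tan_h and abs(y / z) <= tan_v
            row.append(visible)
        mask.append(row)
    return mask


def solid_angle_fraction(mask):
    """Fraction of the full sphere covered by the mask, weighting rows by cos(latitude)."""
    pano_h = len(mask)
    covered = total = 0.0
    for v, row in enumerate(mask):
        lat = math.pi / 2.0 - (v + 0.5) / pano_h * math.pi
        weight = math.cos(lat)  # equirectangular rows near the poles cover less area
        covered += weight * sum(row)
        total += weight * len(row)
    return covered / total


if __name__ == "__main__":
    mask = perspective_coverage_mask(720, 360, hfov_deg=90, vfov_deg=90)
    # A 90 x 90 degree view is one cube face, so this should be close to 1/6.
    print(f"covered fraction of the sphere: {solid_angle_fraction(mask):.3f}")
```

The cos(latitude) weighting matters because equirectangular pixels near the poles are stretched; counting raw pixels would overstate how much of the scene the input covers. Narrower fields of view cover even less of the sphere, which is one way to see why the output's field of view is "significantly larger" than the input's.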