arxiv_cs_cv April 20, 2026

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

Translated: 2026/4/20 10:46:48

video-editing, visual-effects, reinforcement-learning, generative-ai, computer-vision

Original Content

arXiv:2604.16272v1 Announce Type: new

Abstract: As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluator for comparing editing systems. Existing resources are limited by small scale, missing edited outputs, or the absence of human quality labels, while current evaluation often relies on expensive manual inspection or generic vision-language model judges that are not specialized for editing quality. We introduce VEFX-Dataset, a human-annotated dataset containing 5,049 video editing examples across 9 major editing categories and 32 subcategories, each labeled along three decoupled dimensions: Instruction Following, Rendering Quality, and Edit Exclusivity. Building on VEFX-Dataset, we propose VEFX-Reward, a reward model designed specifically for video editing quality assessment. VEFX-Reward jointly processes the source video, the editing instruction, and the edited video, and predicts per-dimension quality scores via ordinal regression. We further release VEFX-Bench, a benchmark of 300 curated video-prompt pairs for standardized comparison of editing systems. Experiments show that VEFX-Reward aligns more strongly with human judgments than generic VLM judges and prior reward models on both standard IQA/VQA metrics and group-wise preference evaluation. Using VEFX-Reward as an evaluator, we benchmark representative commercial and open-source video editing systems, revealing a persistent gap between visual plausibility, instruction following, and edit locality in current models.
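The abstract says that VEFX-Reward predicts per-dimension scores via ordinal regression but does not give the formulation. The sketch below illustrates one common variant, a cumulative-threshold scheme in which each rating level contributes a binary "is the rating above level k?" decision. It is an illustration under stated assumptions, not the authors' implementation: the four-level scale (`NUM_LEVELS`), the function names, and the expected-score decoding are all hypothetical, and the logits stand in for whatever a real model head would emit.

```python
import math

# The three dimension names come from the abstract; everything else is assumed.
DIMENSIONS = ("instruction_following", "rendering_quality", "edit_exclusivity")
NUM_LEVELS = 4  # assumption: the abstract does not state the rating scale


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def encode_ordinal(label: int, num_levels: int = NUM_LEVELS) -> list[int]:
    """Turn a 1-based rating into K-1 binary targets: is the rating > k?"""
    if not 1 <= label <= num_levels:
        raise ValueError(f"label must be in 1..{num_levels}, got {label}")
    return [1 if label > k else 0 for k in range(1, num_levels)]


def ordinal_loss(logits: list[float], label: int) -> float:
    """Sum of binary cross-entropies over the K-1 threshold logits."""
    targets = encode_ordinal(label, len(logits) + 1)
    # BCE with logits: softplus(z) - t * z
    return sum(softplus(z) - t * z for z, t in zip(logits, targets))


def expected_score(logits: list[float]) -> float:
    """Decode threshold logits into a continuous score in [1, K]."""
    return 1.0 + sum(sigmoid(z) for z in logits)


def score_edit(logits_by_dim: dict[str, list[float]]) -> dict[str, float]:
    """Decode one score per labeled dimension."""
    return {dim: expected_score(logits_by_dim[dim]) for dim in DIMENSIONS}


if __name__ == "__main__":
    # Made-up logits for a single (source video, instruction, edited video) triple.
    example = {
        "instruction_following": [3.0, 1.0, -2.0],
        "rendering_quality": [2.0, 0.0, -1.0],
        "edit_exclusivity": [0.5, -1.5, -3.0],
    }
    for dim, score in score_edit(example).items():
        print(f"{dim}: {score:.2f}")
```

Treating the rating as a set of ordered thresholds, instead of as independent classes, penalizes a prediction more the further it lands from the human label, which is why ordinal regression suits graded quality judgments.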