arxiv_cs_cv February 10, 2026

TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation

Translated: 2026/3/15 18:04:52

teleboost, video-generation, reinforcement-learning, supervised-fine-tuning, post-training

Original Content

arXiv:2602.07595v1 Announce Type: new

Abstract: Post-training is the decisive step for converting a pretrained video generator into a production-oriented model that is instruction-following, controllable, and robust over long temporal horizons. This report presents a systematic post-training framework that organizes supervised policy shaping, reward-driven reinforcement learning, and preference-based refinement into a single stability-constrained optimization stack. The framework is designed around practical video-generation constraints, including high rollout cost, temporally compounding failure modes, and feedback that is heterogeneous, uncertain, and often weakly discriminative. By treating optimization as a staged, diagnostic-driven process rather than a collection of isolated tricks, the report summarizes a cohesive recipe for improving perceptual fidelity, temporal coherence, and prompt adherence while preserving the controllability established at initialization. The resulting framework provides a clear blueprint for building scalable post-training pipelines that remain stable, extensible, and effective in real-world deployment settings.