This paper presents FloVD, a novel optical-flow-based video diffusion model for camera-controllable video generation. FloVD leverages optical flow maps to represent the motions of the camera and moving objects. This approach offers two key benefits. First, since optical flow can be estimated directly from videos, our approach allows training on arbitrary videos without ground-truth camera parameters. Second, as background optical flow encodes 3D correlations across different viewpoints, our method enables detailed camera control by leveraging the background motion. To synthesize natural object motion while supporting detailed camera control, our framework adopts a two-stage video synthesis pipeline consisting of optical flow generation and flow-conditioned video synthesis. Extensive experiments demonstrate the superiority of our method over previous approaches in terms of accurate camera control and natural object motion synthesis.
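The background-flow idea above can be made concrete with a short sketch: given a per-pixel depth map and a relative camera pose, the camera-induced (background) flow follows from backprojecting pixels to 3D and reprojecting them into the target view. This is a minimal illustration of depth-based 3D warping under a pinhole camera model; the function name and interface are hypothetical, not FloVD's actual code.

import torch

# Illustrative sketch (not FloVD's actual implementation): compute the
# camera-induced background flow by backprojecting source pixels with a
# depth map and reprojecting them into the target view.
def camera_flow_from_depth(depth, K, RT_src, RT_tgt):
    # depth:  (H, W) per-pixel depth of the source frame
    # K:      (3, 3) pinhole intrinsics
    # RT_*:   (4, 4) world-to-camera extrinsics of source / target views
    H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)           # (H, W, 3)

    # Backproject pixels to 3D points in the source camera frame.
    cam_pts = (torch.linalg.inv(K) @ pix.reshape(-1, 3).T) * depth.reshape(1, -1)

    # Transform the points from the source camera to the target camera frame.
    cam_h = torch.cat([cam_pts, torch.ones_like(cam_pts[:1])], dim=0)  # (4, N)
    tgt_pts = (RT_tgt @ torch.linalg.inv(RT_src) @ cam_h)[:3]

    # Project into the target image plane and take the 2D displacement.
    proj = K @ tgt_pts
    proj = proj[:2] / proj[2:].clamp(min=1e-6)
    return proj.T.reshape(H, W, 2) - pix[..., :2]                      # (H, W, 2)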
Given an image and camera parameters, our framework synthesizes video frames that follow the input camera trajectory. First, we synthesize two sets of optical flow maps that represent camera and object motions, respectively. These two sets of flow maps are then integrated and fed into the flow-conditioned video synthesis model, enabling camera-controllable video generation.
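In code, the two-stage pipeline described above might look as follows. The class and attribute names are hypothetical placeholders for FloVD's components, and the simple additive integration of the two flow fields is an assumption for illustration only.

import torch

# Hedged sketch of the two-stage pipeline: stage 1 generates camera and
# object flow maps, stage 2 synthesizes video frames conditioned on the
# integrated flow. All module names are illustrative assumptions.
class TwoStagePipeline:
    def __init__(self, flow_generator, video_synthesizer):
        self.flow_generator = flow_generator        # stage 1: optical flow generation
        self.video_synthesizer = video_synthesizer  # stage 2: flow-conditioned synthesis

    @torch.no_grad()
    def __call__(self, image, intrinsics, trajectory):
        # Camera-motion flow, derived from the input trajectory and scene geometry.
        camera_flow = self.flow_generator.camera_flow(image, intrinsics, trajectory)
        # Object-motion flow, generated for the moving content in the scene.
        object_flow = self.flow_generator.object_flow(image)

        # Integrate both motion fields into a single per-frame flow condition
        # (simple addition here; the actual integration scheme may differ).
        full_flow = camera_flow + object_flow

        # Synthesize video frames following the input camera trajectory.
        return self.video_synthesizer(image, full_flow)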
Note: The second and third examples are derivative works synthesized by FloVD using images from the internet. These examples will be removed if there are any copyright issues.
(Left) Input video. (Right) Edited video.
@article{jin2025flovd,
title = {FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis},
author = {Jin, Wonjoon and Dai, Qi and Luo, Chong and Baek, Seung-Hwan and Cho, Sunghyun},
journal = {arXiv preprint arXiv:2502.08244},
year = {2025},
}