FloVD: Optical Flow Meets
Video Diffusion Model for
Camera-Controlled Video Synthesis

    1POSTECH     2Microsoft Research Asia

FloVD is a camera-controllable video generation framework that leverages optical flow,
enabling accurate camera control and natural object motion synthesis.

photo
(Left) Our method enables camera-controlled video synthesis through optical flow, allowing for complex camera movements (dolly zoom). (Right) Synthesized video frames with 'zoom-out' camera motion. The x-t slice reveals pixel value changes along the red line. Our method shows natural object motion and accurate camera control, while CameraCtrl produces objects without motion and MotionCtrl produces artifacts.

Abstract

This paper presents FloVD, a novel optical-flow-based video diffusion model for camera-controllable video generation. FloVD leverages optical flow maps to represent the motions of the camera and moving objects. This approach offers two key benefits. First, since optical flow can be directly estimated from videos, our approach allows for the use of arbitrary training videos without ground-truth camera parameters. Second, as background optical flow encodes 3D correlation across different viewpoints, our method enables detailed camera control by leveraging the background motion. To synthesize natural object motion while supporting detailed camera control, our framework adopts a two-stage video synthesis pipeline consisting of optical flow generation and flow-conditioned video synthesis. Extensive experiments demonstrate the superiority of our method over previous approaches in terms of accurate camera control and natural object motion synthesis.
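To make the second benefit concrete, camera-induced background flow can be derived from a depth map and a relative camera pose by unprojecting pixels to 3D, applying the rigid transform, and reprojecting. The sketch below is illustrative only and is not the paper's implementation; the function name and interfaces are assumptions.

```python
import numpy as np

def camera_induced_flow(depth, K, R, t):
    """Illustrative sketch: background optical flow induced by a camera
    move, given per-pixel depth, intrinsics K, and relative pose [R|t].
    This is a generic pinhole-camera derivation, not FloVD's exact code."""
    H, W = depth.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Homogeneous pixel coordinates, shape (3, H*W).
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    # Unproject to 3D points in the source camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Rigid transform into the target camera frame, then project.
    pts_t = R @ pts + t.reshape(3, 1)
    proj = K @ pts_t
    uv = proj[:2] / proj[2:3]
    # Flow = displacement of each pixel between the two views.
    return (uv - pix[:2]).reshape(2, H, W)
```

A sanity check: with an identity rotation and zero translation, the induced flow is zero everywhere, since the reprojected pixels coincide with the originals.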

Key Contributions

  • We propose FloVD, a camera-controllable video generation framework that leverages optical flow.
  • FloVD enables high-quality video synthesis by utilizing arbitrary training videos without ground-truth camera parameters.
  • FloVD allows for accurate camera control by leveraging the background motion encoded in optical flow.
  • FloVD adopts a two-stage pipeline for detailed camera control and high-quality video synthesis.

FloVD Framework

Given an image and camera parameters, our framework synthesizes video frames that follow the input camera trajectory.
First, we synthesize two sets of optical flow maps that represent camera and object motions, respectively.
Then, the two sets of flow maps are integrated and fed into the flow-conditioned video synthesis model, enabling camera-controllable video generation.
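The two-stage pipeline above can be sketched as follows. All function names and the simple additive flow integration are placeholders for illustration; they are assumptions, not the released FloVD API.

```python
import numpy as np

def two_stage_synthesis(image, camera_traj, camera_flow_fn,
                        object_flow_model, video_model):
    """Illustrative sketch of a two-stage, flow-conditioned pipeline.
    All callables are hypothetical stand-ins for the actual models."""
    # Stage 1a: camera-induced background flow from the input trajectory.
    camera_flow = camera_flow_fn(image, camera_traj)   # (T, 2, H, W)
    # Stage 1b: generated object-motion flow for moving foreground content.
    object_flow = object_flow_model(image)             # (T, 2, H, W)
    # Integrate the two flow fields (simple summation as a placeholder).
    combined_flow = camera_flow + object_flow
    # Stage 2: flow-conditioned video synthesis from the combined flow.
    return video_model(image, combined_flow)
```

The key design point is the separation of concerns: camera motion is specified deterministically via background flow, while object motion is generated, so the second stage only needs to render video consistent with one combined flow field.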

photo

Video Results

Video synthesis results with complex camera trajectories

Video synthesis results with natural object motions (with 'stop' camera motion)

Application: Video synthesis results with dolly zoom

Note: The second and third examples are derivative works synthesized by FloVD using images from the internet.
These examples will be removed if any copyright issue arises.

Application: Temporally-consistent video editing

(Left) Input video       (Right) Edited video


News

  • Code will be released soon!

Related Links

  • We use CameraCtrl as our baseline model.
  • We borrow the website template from Nerfies. Thanks for the source code.

BibTeX

@article{jin2025flovd,
    title   = {FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis},
    author  = {Jin, Wonjoon and Dai, Qi and Luo, Chong and Baek, Seung-Hwan and Cho, Sunghyun},
    journal = {arXiv preprint arXiv:2502.08244},
    year    = {2025},
  }