Generalizable Novel-View Synthesis
using a Stereo Camera

Haechan Lee1,*   Wonjoon Jin1,*   Seung-Hwan Baek1   Sunghyun Cho1
1POSTECH GSAI & CSE

StereoNeRF takes multi-view stereo-camera images and synthesizes high-quality novel-view images.

photo

Abstract

In this paper, we propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images. Since recent stereo matching methods have demonstrated accurate geometry prediction, we introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction. To this end, we propose a novel framework, dubbed StereoNeRF, which integrates stereo matching into a NeRF-based generalizable view synthesis approach. StereoNeRF is equipped with three key components to effectively exploit stereo matching in novel-view synthesis: a stereo feature extractor, depth-guided plane sweeping, and a stereo depth loss. Moreover, we propose the StereoNVS dataset, the first multi-view dataset of stereo-camera images, encompassing a wide variety of both real and synthetic scenes. Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis.
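To give a concrete sense of what depth-guided plane sweeping means, the sketch below samples per-pixel candidate depths in a narrow band centered on the stereo-predicted depth, instead of sweeping a full uniform depth range. This is only an illustrative sketch, not the paper's implementation: the band width (`rel_range`) and the linear sampling scheme are assumptions.

```python
import numpy as np

def depth_guided_candidates(stereo_depth, n_planes=8, rel_range=0.2):
    """Sample n_planes candidate depths per pixel, confined to a band
    around the stereo-predicted depth map (H, W).

    Returns an array of shape (n_planes, H, W)."""
    lo = stereo_depth * (1.0 - rel_range)        # near bound per pixel
    hi = stereo_depth * (1.0 + rel_range)        # far bound per pixel
    t = np.linspace(0.0, 1.0, n_planes)          # (n_planes,)
    # Broadcast to (n_planes, H, W): linear interpolation between bounds.
    return lo[None] + t[:, None, None] * (hi - lo)[None]

# Toy example: a constant stereo depth of 2.0 m on a 4x4 image.
depths = depth_guided_candidates(np.full((4, 4), 2.0), n_planes=8)
```

Compared with a scene-wide uniform sweep, concentrating the candidates near the stereo depth spends the same number of planes on a much finer depth resolution where the surface is likely to be.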

Video Results

1. Challenge

Challenges in generalizable novel-view synthesis. Novel-view synthesis is a long-standing ill-posed problem. The feed-forward nature of generalizable view synthesis exacerbates this ill-posedness, leading to inaccurate geometry and low-quality rendering results.

photo

2. Motivation

The superior performance of stereo estimation motivates us to integrate it into the generalizable novel-view synthesis framework. The stereoscopic constraint and large-scale stereo datasets have facilitated the remarkable generalization capability of stereo estimation compared to learning-based MVS methods.

photo

3. Comparison with a Baseline

Since our proposed framework effectively leverages the stereoscopic prior in stereo-camera images, our method achieves better synthesis quality than the baseline method, GeoNeRF, particularly in scenes with complex structures or textureless regions.

photo

4. Key Method

Our key contribution is a novel framework, dubbed StereoNeRF, which integrates stereo matching into an existing generalizable view synthesis approach. To this end, we feed stereo-correlated features from a pre-trained stereo estimation network into our stereo feature extractor.

photo
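The stereo depth loss can be understood as penalizing disagreement between the depth rendered by the NeRF branch and the depth predicted by the stereo network. The following is a minimal sketch under common conventions, not the authors' exact formulation: the confidence-weighted L1 form and the `eps` stabilizer are assumptions.

```python
import numpy as np

def stereo_depth_loss(rendered_depth, stereo_depth, confidence, eps=1e-8):
    """Confidence-weighted L1 loss between the NeRF-rendered depth map
    and the stereo-predicted depth map (both shaped (H, W)).

    `confidence` in [0, 1] down-weights unreliable stereo pixels."""
    diff = np.abs(rendered_depth - stereo_depth)
    return float((confidence * diff).sum() / (confidence.sum() + eps))

# Toy example: 2x2 depth maps; the last pixel's stereo depth is
# unreliable, so its (large) error is masked out.
rendered = np.array([[1.0, 2.0], [3.0, 4.0]])
stereo   = np.array([[1.1, 2.0], [3.0, 5.0]])
conf     = np.array([[1.0, 1.0], [1.0, 0.0]])
loss = stereo_depth_loss(rendered, stereo, conf)
```

Masking by stereo confidence is the key design point: supervision is applied only where the stereo prior is trustworthy, so occluded or textureless pixels do not pull the rendered geometry toward wrong depths.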

Main comparison

We compare our method with several baseline methods: GNT, IBRNet, GeoNeRF, and NeuRay.

photo

StereoNVS Dataset

We propose the StereoNVS dataset, the first multi-view dataset of stereo-camera images, encompassing a wide variety of both real and synthetic scenes. Below are example scenes from the StereoNVS-Real and StereoNVS-Synthetic datasets.

photo

News

  • Our paper is accepted to CVPR 2024.
  • Our dataset is released! Check out our Google Drive Link!
  • Code will be released soon!

Related Links

  • We employ UniMatch as our stereo estimation network thanks to its remarkable generalization capability.
  • We use GeoNeRF as our baseline model.
  • For StereoNVS-Synthetic, we render multi-view stereoscopic images from 3D-FRONT.
  • We borrow a website template from Nerfies. Thanks for the source code.

BibTeX

@inproceedings{lee2024generalizable,
  author    = {Lee, Haechan and Jin, Wonjoon and Baek, Seung-Hwan and Cho, Sunghyun},
  title     = {Generalizable Novel-View Synthesis using a Stereo Camera},
  booktitle = {CVPR},
  year      = {2024},
}