Room-Scale 3D Reconstruction via Video Diffusion Priors and Parallax-Inducing SfM
Reconstructing high-fidelity, room-scale 3D scenes from a single RGB image remains a persistent challenge. Recent methods like G4Splat have made significant strides
in sparse-view reconstruction by combining 2D Gaussian Splatting with generative priors,
specifically utilizing Stable Virtual Camera for single-view inputs, and enforcing planar constraints to regularize geometry. However,
G4Splat still struggles in single-view scenarios with heavily occluded regions, where geometric priors fail to capture complex object interactions.
We present an enhanced G4Splat pipeline that addresses these limitations through two key innovations. First, we replace the standard MASt3R-SfM initialization with
Depth Anything V3 (DA3), lifting the single view into a denser, metric-accurate point cloud that provides a stronger anchor for the Gaussian optimization, bypassing the MAtCha-based chart
alignment strategy, which often fails in non-overlapping regions. Second, we introduce a parallax-inducing camera trajectory strategy for the generative inpainting stage. Unlike standard rotational
paths or the plane-centric view selection used in the baseline, our "Wiggle & Dolly" trajectories force the video diffusion model to hallucinate valid depth cues and
resolve disocclusions by actively moving into the scene. This approach provides a robust geometric baseline, significantly improving single-view reconstruction quality in the complex,
unseen regions that prior methods struggle to resolve.
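To make the first innovation concrete, the sketch below shows the standard pinhole back-projection that lifts a metric depth map (such as one predicted by Depth Anything V3) into a camera-space point cloud for initializing the Gaussians. The function name, intrinsics, and toy constant-depth input are illustrative; the actual DA3 inference API and the paper's initialization details are not reproduced here.

```python
import numpy as np

def unproject_depth(depth, K):
    """Lift a metric depth map (H, W) into a camera-space point cloud (H*W, 3).

    depth: per-pixel metric depth in meters; K: 3x3 pinhole intrinsics.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixel coords
    rays = pix @ np.linalg.inv(K).T       # back-projected rays, z-component 1
    pts = rays * depth.reshape(-1, 1)     # scale each ray by its metric depth
    return pts

# Toy example: a flat wall 2 m away under plausible VGA intrinsics
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
depth = np.full((480, 640), 2.0)
pts = unproject_depth(depth, K)
print(pts.shape)  # (307200, 3)
```

A denser, metrically scaled cloud of this form gives the Gaussian optimization absolute positions to anchor to, rather than the up-to-scale geometry a chart-alignment step would have to reconcile.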
Pipeline Highlights: Metric Depth Initialization, Parallax-Inducing Generative Inpainting, and Plane-Based Geometric Regularization.
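The parallax-inducing trajectory idea can be sketched as camera poses that combine a lateral sinusoidal offset ("wiggle") with a steady forward push ("dolly"). The amplitudes, frame count, and identity rotation below are illustrative assumptions, not the paper's actual parameters; a real trajectory would also re-orient each camera toward the scene.

```python
import numpy as np

def wiggle_dolly_trajectory(n_frames=25, wiggle_amp=0.05, dolly_dist=0.3):
    """Camera-to-world translations for a hypothetical 'Wiggle & Dolly' path.

    Lateral sinusoidal offsets (wiggle) induce parallax between neighboring
    frames, while a linear forward push (dolly) moves into the scene to
    reveal disoccluded regions. Units are meters; rotations are kept at
    identity for simplicity.
    """
    poses = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        x = wiggle_amp * np.sin(2 * np.pi * 2 * t)  # two lateral wiggle cycles
        z = dolly_dist * t                          # linear dolly forward
        T = np.eye(4)
        T[:3, 3] = [x, 0.0, z]
        poses.append(T)
    return np.stack(poses)

poses = wiggle_dolly_trajectory()
print(poses.shape)  # (25, 4, 4)
```

Because consecutive frames translate rather than merely rotate, the video diffusion model must synthesize depth-consistent content behind occluders instead of reusing a single projected view.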
Three example pairs: single input image (left) and generated scene (right).
Figure 1: Results of single-image 3D reconstruction using our proposed pipeline.
Our method tackles the sparse-view problem in three stages:
Surface normals and depth maps are estimated to regularize the Gaussian splat orientations and positions during optimization. This prevents floating artifacts and encourages accurate geometry from the input view(s).
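The regularization described above can be sketched as a simple loss combining an L1 depth term with a cosine normal-alignment term. The function, weights, and toy inputs are hypothetical stand-ins, not the paper's exact objective.

```python
import numpy as np

def geometry_regularizer(render_depth, prior_depth, render_normal, prior_normal,
                         w_depth=1.0, w_normal=0.5):
    """Hypothetical depth/normal regularization term for splat optimization.

    Penalizes L1 error against the monocular depth prior and misalignment
    (1 - cosine similarity) against the predicted surface normals.
    Shapes: depth maps (H, W); normal maps (H, W, 3) with unit vectors.
    Weights are illustrative, not taken from the paper.
    """
    depth_loss = np.abs(render_depth - prior_depth).mean()
    cos = np.sum(render_normal * prior_normal, axis=-1)
    normal_loss = (1.0 - cos).mean()
    return w_depth * depth_loss + w_normal * normal_loss

# Sanity check: identical renders and priors incur zero penalty
H, W = 4, 4
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0  # all surfaces facing the camera
loss = geometry_regularizer(np.ones((H, W)), np.ones((H, W)), normals, normals)
print(loss)  # 0.0
```

Terms of this shape discourage floaters (which violate the depth prior) and keep splat orientations consistent with the estimated surfaces.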
Scene 1: Outdoor scene with geometric detail
Scene 2: Room reconstruction
This work builds upon G4Splat and Depth Anything V3. We thank the authors for making their code and models publicly available.