GeoEdit

GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising

ECCV 2026

Yi He1,* Jiangming Wang3,* Xinyu Wang1 Mark Fong4 Songchun Zhang5 Yuxuan Xue6,‡ Hai-Tao Zheng1,2,† Yue Ma5,†

1 Shenzhen International Graduate School, Tsinghua University    2 Pengcheng Laboratory, Shenzhen   
3 Sun Yat-sen University    4 Peking University    5 HKUST    6 University of Tübingen

* Equal contribution    ‡ Project leader    † Corresponding authors

Teaser Image

Precisely manipulating objects in a single photograph (translation, rotation, scaling) while obeying 3D physical constraints remains unsolved for diffusion-based editors. Current 2D methods lack spatial awareness and produce perspective violations. Forcing structural proxies into the latent space also disrupts variance homogeneity, and the resulting self-attention leakage leads to ghosting and background blur. The core difficulty is asymmetric: the relocated object must follow a rigid geometry, yet the uncovered background needs freedom to synthesize plausible content. We present GeoEdit, a training-free Lift-Manipulate-Render-Denoise pipeline that satisfies both constraints. We decouple scene and object in 3D, align them through point correspondence, and render a geometry-aligned proxy with a structural depth map. A Dual-Branch Denoising stage then refines this proxy: a video diffusion backbone preserves object identity, while 3D constraints are injected into the foreground within a narrow denoising window at matching noise variance (variance-homogeneous injection). The background denoises freely. Because the injected signal matches the native latent statistics, self-attention stays undisturbed. We also introduce GeoEditBench, a pose-aware benchmark covering both translation and rotation. Experiments confirm consistent gains in geometric accuracy, identity fidelity, and background quality, validated by automatic metrics and human studies. Code and data will be publicly released.
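The variance-homogeneous injection described above can be illustrated with a minimal sketch. This is not the released GeoEdit code: it assumes a DDPM-style variance-preserving schedule (the actual video backbone may use a different parameterization), and the names `noise_to_level`, `inject_foreground`, `fg_mask`, `alpha_bars`, and `window` are hypothetical. The idea shown is that the clean proxy latent is first noised to the same level as the current latent, then blended into the foreground only, and only inside a narrow range of denoising steps.

```python
import numpy as np


def noise_to_level(clean_latent, alpha_bar_t, rng):
    """Forward-diffuse a clean latent to the noise level given by alpha_bar_t.

    Assumes a DDPM-style variance-preserving schedule (an assumption of this sketch).
    """
    eps = rng.standard_normal(clean_latent.shape)
    return np.sqrt(alpha_bar_t) * clean_latent + np.sqrt(1.0 - alpha_bar_t) * eps


def inject_foreground(latent_t, proxy_latent, fg_mask, step, alpha_bars, window, rng):
    """Blend the noised proxy into the foreground during a narrow step window.

    latent_t:     current noisy latent at denoising step `step`
    proxy_latent: clean latent of the rendered geometry-aligned proxy
    fg_mask:      1 inside the relocated object, 0 in the background
    window:       (first_step, last_step), inclusive; hypothetical parameter
    """
    first_step, last_step = window
    if not (first_step <= step <= last_step):
        # Outside the window both branches denoise freely.
        return latent_t
    # Match the proxy's noise variance to the current step before mixing,
    # so the injected region has the same statistics as its surroundings.
    noised_proxy = noise_to_level(proxy_latent, alpha_bars[step], rng)
    # The background (mask == 0) keeps the current latent untouched.
    return fg_mask * noised_proxy + (1.0 - fg_mask) * latent_t
```

In a sampling loop, `inject_foreground` would be called once per step before the denoiser; the window bounds and the schedule values are tuning choices that the abstract does not specify.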

Framework


Framework Overview

Overview of the proposed method. Top: the decoupled 3D reconstruction and precise alignment pipeline. Bottom: the dual-branch denoising architecture, featuring warm-start initialization and variance-homogeneous injection.
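As a rough illustration of the Manipulate and Render steps in the top half of the figure, the sketch below applies a rigid edit (rotation, scaling, translation) to an object point cloud and projects it to a sparse depth map. It is a simplified stand-in rather than the authors' renderer: it assumes a pinhole camera with the principal point at the image center, and the helpers `rotation_y`, `manipulate`, and `render_depth` are hypothetical names.

```python
import numpy as np


def rotation_y(angle_rad):
    """Rotation matrix about the camera's y (vertical) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def manipulate(points, rotation, translation, scale=1.0):
    """Rotate and scale an (N, 3) point cloud about its centroid, then translate."""
    centroid = points.mean(axis=0)
    return (points - centroid) @ rotation.T * scale + centroid + translation


def render_depth(points, focal, width, height):
    """Project points with a pinhole camera and keep the nearest depth per pixel.

    Pixels that no point lands on stay at +inf. Assumes the principal point
    sits at the image center (an assumption of this sketch).
    """
    depth = np.full((height, width), np.inf)
    in_front = points[:, 2] > 0
    x, y, z = points[in_front].T
    u = np.round(focal * x / z + width / 2).astype(int)
    v = np.round(focal * y / z + height / 2).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # Z-buffer: an unbuffered minimum handles several points hitting one pixel.
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    return depth
```

Because the edit is applied in 3D before projection, perspective effects such as foreshortening after a rotation or apparent shrinkage after a translation in depth follow from the camera model instead of being approximated in 2D.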

Results


Each example is displayed as Input / Manipulation / Result.

Qualitative Comparisons



Qualitative comparisons between baselines and our approach. Compared with state-of-the-art methods, our method better preserves the background and maintains stronger geometric consistency.