GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising
ECCV 2026
1 Shenzhen International Graduate School, Tsinghua University
2 Pengcheng Laboratory, Shenzhen
3 Sun Yat-sen University
4 Peking University
5 HKUST
6 University of Tübingen
* Equal contribution ‡ Project leader † Corresponding authors
Precisely manipulating objects in a single photograph (translation, rotation, scaling) while obeying 3D physical constraints remains unsolved for diffusion-based editors. Current 2D methods lack spatial awareness and produce perspective violations. Forcing structural proxies into the latent space also disrupts variance homogeneity, and the resulting self-attention leakage leads to ghosting and background blur. The core difficulty is asymmetric: the relocated object must follow a rigid geometry, yet the uncovered background needs freedom to synthesize plausible content. We present GeoEdit, a training-free Lift-Manipulate-Render-Denoise pipeline that satisfies both constraints. We decouple scene and object in 3D, align them through point correspondence, and render a geometry-aligned proxy with a structural depth map. A Dual-Branch Denoising stage then refines this proxy: a video diffusion backbone preserves object identity, while 3D constraints are injected into the foreground within a narrow denoising window at matching noise variance (variance-homogeneous injection). The background denoises freely. Because the injected signal matches the native latent statistics, self-attention stays undisturbed. We also introduce GeoEditBench, a pose-aware benchmark covering both translation and rotation. Experiments confirm consistent gains in geometric accuracy, identity fidelity, and background quality, validated by automatic metrics and human studies. Code and data will be publicly released.
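The foreground injection described above can be illustrated with a minimal NumPy sketch. It is not the released GeoEdit code: the function names, the `(start, end)` window convention, and the use of a standard DDPM-style forward process with cumulative signal coefficient `alpha_bar` are all assumptions made for illustration. The idea shown is only this: inside the window, the proxy latent is first noised to the current step's noise level and then blended into the foreground through a mask, while the background latent is left to the denoiser.

```python
import numpy as np

def noise_to_level(clean, alpha_bar, rng):
    """Forward-diffuse a clean latent to the noise level given by alpha_bar.

    Assumes the usual DDPM parameterization:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.
    """
    eps = rng.standard_normal(clean.shape)
    return np.sqrt(alpha_bar) * clean + np.sqrt(1.0 - alpha_bar) * eps

def inject_foreground(latent, proxy_latent, fg_mask, alpha_bar, step, window, rng):
    """Blend a variance-matched proxy into the foreground (hypothetical helper).

    latent:       current noisy latent of the sample being denoised
    proxy_latent: clean latent of the rendered geometry-aligned proxy
    fg_mask:      1 inside the relocated object, 0 in the background
    window:       (start, end) step range in which injection is active
    """
    start, end = window
    if not (start <= step < end):
        # Outside the window both regions denoise freely.
        return latent
    noised_proxy = noise_to_level(proxy_latent, alpha_bar, rng)
    # Foreground takes the noised proxy; background keeps its own latent.
    return fg_mask * noised_proxy + (1.0 - fg_mask) * latent
```

Because the proxy is noised with the same `alpha_bar` as the current step, the injected foreground has the same noise variance as the surrounding latent, which is the property the abstract calls variance-homogeneous. How the actual method chooses the window and the noise schedule is not specified here.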
Framework
Overview of the proposed method. Top: decoupled 3D reconstruction and precise alignment pipeline. Bottom: dual-branch denoising architecture with warm-start initialization and variance-homogeneous injection.
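The geometric core of the lift, manipulate, and render stages can be sketched with a pinhole camera model. This is an illustrative assumption, not the reconstruction pipeline shown in the figure: the intrinsics matrix `K`, the helper names, and the per-pixel depth input are all hypothetical, and the real system reconstructs and aligns full 3D assets rather than sparse points.

```python
import numpy as np

def unproject(pixels, depth, K):
    """Lift (N, 2) pixel coordinates with (N,) depths to (N, 3) camera-space points."""
    ones = np.ones((pixels.shape[0], 1))
    rays = np.hstack([pixels, ones]) @ np.linalg.inv(K).T
    return rays * depth[:, None]

def rigid_transform(points, R, t):
    """Apply a rotation R (3x3) and translation t (3,) to (N, 3) points."""
    return points @ R.T + t

def project(points, K):
    """Project (N, 3) camera-space points to (N, 2) pixels and return (N,) depths."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3], points[:, 2]

# Assumed intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

pixels = np.array([[420.0, 240.0]])   # an object pixel right of the image center
depth = np.array([2.0])               # its depth in scene units

points = unproject(pixels, depth, K)
# Push the object 2 units farther from the camera, without rotating it.
moved = rigid_transform(points, np.eye(3), np.array([0.0, 0.0, 2.0]))
new_pixels, new_depth = project(moved, K)
```

In this toy case the point moves toward the image center as it recedes, which is the perspective behavior a purely 2D translate-and-paste edit would miss. The re-projected depths play the role of a structural depth map for the proxy.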
Results
Gallery
Each example is displayed as Input / Manipulation / Result.
Qualitative Comparisons
Qualitative comparisons between baselines and our approach. Compared with state-of-the-art methods, our method better preserves the background and maintains geometric consistency.