GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising

ECCV 2026

Yi He¹,* Jiangming Wang³,* Xinyu Wang¹ Mark Fong⁴ Songchun Zhang⁵ Yuxuan Xue⁶,‡ Hai-Tao Zheng¹,²,† Yue Ma⁵,†

¹ Shenzhen International Graduate School, Tsinghua University ² Pengcheng Laboratory, Shenzhen
³ Sun Yat-sen University ⁴ Peking University ⁵ HKUST ⁶ University of Tübingen

* Equal contribution ‡ Project leader † Corresponding authors

Precisely manipulating objects in a single photograph (translation, rotation, scaling) while obeying 3D physical constraints remains unsolved for diffusion-based editors. Current 2D methods lack spatial awareness and produce perspective violations. Forcing structural proxies into the latent space also disrupts variance homogeneity, and the resulting self-attention leakage leads to ghosting and background blur. The core difficulty is asymmetric: the relocated object must follow a rigid geometry, yet the uncovered background needs freedom to synthesize plausible content. We present GeoEdit, a training-free Lift-Manipulate-Render-Denoise pipeline that satisfies both constraints. We decouple scene and object in 3D, align them through point correspondence, and render a geometry-aligned proxy with a structural depth map. A Dual-Branch Denoising stage then refines this proxy: a video diffusion backbone preserves object identity, while 3D constraints are injected into the foreground within a narrow denoising window at matching noise variance (variance-homogeneous injection). The background denoises freely. Because the injected signal matches the native latent statistics, self-attention stays undisturbed. We also introduce GeoEditBench, a pose-aware benchmark covering both translation and rotation. Experiments confirm consistent gains in geometric accuracy, identity fidelity, and background quality, validated by automatic metrics and human studies. Code and data will be publicly released.
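The lift, manipulate, and render steps described above can be illustrated with a minimal geometric sketch. It assumes a pinhole camera with known intrinsics `K` and a per-pixel depth for the object, neither of which is specified on this page. The function names `lift`, `manipulate`, and `render` are hypothetical and stand in for whatever 3D reconstruction and rendering the actual pipeline uses; this is not the authors' implementation.

```python
import numpy as np

def lift(pixels, depth, K):
    """Back-project (u, v) pixels with per-pixel depth into camera-space 3D points."""
    uv1 = np.column_stack([pixels, np.ones(len(pixels))])  # homogeneous pixels, (N, 3)
    rays = uv1 @ np.linalg.inv(K).T                         # K^-1 [u, v, 1]^T for each row
    return rays * depth[:, None]                            # scale each ray by its depth

def manipulate(points, R, t, s=1.0):
    """Scale and rotate about the object centroid, then translate (assumed edit model)."""
    c = points.mean(axis=0)
    return ((points - c) * s) @ R.T + c + t

def render(points, K):
    """Project 3D points back to pixels and return the new depth as a structural cue."""
    proj = points @ K.T
    z = proj[:, 2]
    return proj[:, :2] / z[:, None], z

# Hypothetical intrinsics and two object pixels at 2 m depth.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pixels = np.array([[320.0, 240.0], [420.0, 240.0]])
depth = np.array([2.0, 2.0])

pts = lift(pixels, depth, K)
moved = manipulate(pts, np.eye(3), np.array([0.0, 0.0, 2.0]))  # push 2 m away from the camera
new_pixels, new_depth = render(moved, K)
```

Pushing the object from 2 m to 4 m halves its pixel offset from the principal point, which is the kind of perspective-consistent change that a purely 2D translation would miss.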

Framework

Overview of the proposed method. Top: decoupled 3D reconstruction and precise alignment pipeline. Bottom: dual-branch denoising architecture featuring warm-start initialization and variance-homogeneous injection.
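One way to read variance-homogeneous injection is sketched below: the rendered proxy latent is first noised to the current step's noise level, then blended into the foreground only, and only inside a window of denoising steps. Everything concrete here is an assumption made for illustration: the function name `inject_foreground`, the `alpha_t` / `sigma_t` schedule values, the window bounds, and the latent shape. The actual backbone, schedule, and window used by GeoEdit are not given on this page.

```python
import numpy as np

def inject_foreground(x_t, proxy_latent, fg_mask, alpha_t, sigma_t, step, window, rng):
    """Blend a proxy latent, noised to the current level, into the foreground region.

    x_t          : current noisy latent, e.g. shape (C, H, W)
    proxy_latent : clean latent of the geometry-aligned proxy, same shape
    fg_mask      : 1 inside the relocated object, 0 elsewhere, broadcastable to x_t
    window       : (lo, hi) step range in which injection is active (assumed)
    """
    lo, hi = window
    if not (lo <= step < hi):
        return x_t  # outside the window, the whole latent denoises freely
    noise = rng.standard_normal(proxy_latent.shape)
    # Noise the proxy to the same level as x_t so the injected region
    # matches the latent's native statistics at this step.
    noised_proxy = alpha_t * proxy_latent + sigma_t * noise
    # Foreground takes the noised proxy; background keeps the model's own latent.
    return fg_mask * noised_proxy + (1.0 - fg_mask) * x_t

rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 8, 8))
proxy = np.zeros((4, 8, 8))
mask = np.zeros((8, 8))
mask[:, :4] = 1.0  # left half is the object

inside = inject_foreground(x_t, proxy, mask, 0.6, 0.8, step=3, window=(2, 5), rng=rng)
outside = inject_foreground(x_t, proxy, mask, 0.6, 0.8, step=7, window=(2, 5), rng=rng)
```

In this sketch the background latent is never overwritten, so it stays free to synthesize content for the uncovered region, while the foreground is constrained only during the chosen window.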

Results

Gallery

Each example is displayed as Input / Manipulation / Result.

Edit types shown: Resize, Translation, Rotation, Replace, Camera, Combination.

Qualitative Comparisons

Qualitative comparisons between baselines and our approach. Compared with state-of-the-art methods, our model better preserves the background and maintains geometric consistency.
