EditP23

3D Editing via Propagation of Image Prompts to Multi-View

We propagate user-provided, single-view 2D edits to the multi-view representation of a 3D asset. This enables fast, mask-free, high-fidelity, and consistent 3D editing with intuitive control.

Abstract

We present EditP23, a method for mask-free 3D editing that propagates 2D image edits to multi-view representations in a 3D-consistent manner. In contrast to traditional approaches that rely on text-based prompting or explicit spatial masks, EditP23 enables intuitive edits by conditioning on a pair of images: an original view and its user-edited counterpart. These prompts guide an edit-aware flow in the latent space of a pre-trained multi-view diffusion model, coherently propagating the edit across views. Operating in a feed-forward manner, without optimization, our method preserves the object's identity in both structure and appearance. We demonstrate its effectiveness across diverse object categories and edit types, achieving high fidelity to the source object without requiring masks.

Method Overview

Edit-guided denoising at one timestep
Overview. At a single diffusion timestep, the source branch (top) feeds the original noisy grid, conditioned on the source view, into the multi-view diffusion model, which predicts the velocity toward the source. The edit branch (bottom) feeds the current edited grid, conditioned on the user-edited target view, and predicts the velocity toward the target. Subtracting the two yields vΔ, an edit-only delta that guides the next update of the edited grid.
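To make the two-branch step concrete, here is a minimal sketch of one edit-guided update, written against a generic velocity-predicting (flow-matching) multi-view diffusion model with a plain Euler update. The names mv_model, source_view, target_view, and edit_scale, as well as the exact way vΔ is folded into the update, are illustrative assumptions rather than the released EditP23 implementation.

    import torch

    @torch.no_grad()
    def edit_guided_step(
        mv_model,        # multi-view diffusion model: (noisy_grid, t, cond) -> predicted velocity
        x_src_t,         # original multi-view grid, noised to timestep t (source branch input)
        x_edit_t,        # current edited multi-view grid at timestep t (edit branch input)
        source_view,     # original single-view image used as the source prompt
        target_view,     # user-edited counterpart used as the target prompt
        t, dt,           # current timestep and Euler step size
        edit_scale=1.0,  # hypothetical knob controlling how strongly the edit delta is applied
    ):
        # Source branch: velocity pointing toward the unedited source grid.
        v_src = mv_model(x_src_t, t, cond=source_view)

        # Edit branch: velocity pointing toward the edited target, conditioned on the edited view.
        v_edit = mv_model(x_edit_t, t, cond=target_view)

        # Edit-only delta: subtracting the branches isolates the change introduced by the user.
        v_delta = v_edit - v_src

        # Guide the edited grid with the source velocity plus the (optionally scaled) edit delta,
        # then take a plain Euler step on the edited grid.
        v_guided = v_src + edit_scale * v_delta
        return x_edit_t + dt * v_guided

With edit_scale set to 1.0 the guided velocity reduces to the edit-branch prediction; values above 1.0 would amplify the edit delta relative to the source trajectory, in the spirit of guidance-style sampling.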