EditP23

3D Editing via Propagation of Image Prompts to Multi-View

We propagate user-provided, single-view 2D edits to the multi-view representation of a 3D asset. This enables fast, mask-free, high-fidelity, and consistent 3D editing with intuitive control.

Abstract

We present EditP23, a method for mask-free 3D editing that propagates 2D image edits to multi-view representations in a 3D-consistent manner. In contrast to traditional approaches that rely on text-based prompting or explicit spatial masks, EditP23 enables intuitive edits by conditioning on a pair of images: an original view and its user-edited counterpart. These prompts guide an edit-aware flow in the latent space of a pre-trained multi-view diffusion model, coherently propagating the edit across views. Operating in a feed-forward manner, without test-time optimization, our method preserves the object’s identity in both structure and appearance. We demonstrate its effectiveness across diverse object categories and edit types, achieving high fidelity to the source object without requiring masks.

How EditP23 Works: From a 2D Edit to a 3D Model

The EditP23 pipeline propagates a single 2D edit into a full, 3D-consistent modification of the object. The process is designed to be intuitive, requiring only one edited view to guide the entire 3D update. Here’s how it unfolds:

1. Rendering & Initial Setup

The process begins with a 3D object, which is rendered into a multi-view grid (mv-grid) of six viewpoints, plus an additional fixed prompt view.
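
To make the setup concrete, the sketch below shows one way six renders could be tiled into an mv-grid alongside a fixed prompt view. The `render_view` helper, the camera angles, and the 3x2 layout are illustrative assumptions, not the exact renderer or camera configuration used by the multi-view backbone.

```python
import numpy as np

def build_mv_grid(mesh, render_view, grid_rows=3, grid_cols=2):
    """Render a 3D object into a tiled multi-view grid plus a prompt view.

    `render_view(mesh, azimuth_deg, elevation_deg)` is a hypothetical helper
    returning an H x W x 3 image; the six camera angles and the 3x2 layout
    below are illustrative, not the backbone's exact configuration.
    """
    angles = [(30, 20), (90, -10), (150, 20), (210, -10), (270, 20), (330, -10)]
    views = [render_view(mesh, az, el) for az, el in angles]

    # Tile the six renders into one grid image that the diffusion model denoises jointly.
    rows = [np.concatenate(views[r * grid_cols:(r + 1) * grid_cols], axis=1)
            for r in range(grid_rows)]
    mv_grid = np.concatenate(rows, axis=0)

    # A separate, fixed frontal render serves as the prompt view the user will edit in 2D.
    prompt_view = render_view(mesh, 0, 0)
    return mv_grid, prompt_view
```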

2. The 2D Edit

The user takes the prompt view and modifies it with any preferred 2D editing tool, such as manual painting or a generative image editor. This user-edited image becomes the target view, which guides the 3D edit.

3. Edit Propagation via Multi-View Diffusion

The core of the method is a technique called "edit-aware denoising". At each denoising timestep, the method runs two parallel branches within a multi-view diffusion model:

  • Source Branch: The original source mv-grid and source view are fed into the model to predict the velocity towards the original object.
  • Target Branch: The current, in-progress grid is conditioned on the target view to predict the velocity towards the final, edited object.

Figure: Edit-guided denoising at one timestep.

By subtracting the source prediction from the target prediction, the model obtains an "edit-only delta" (vΔ), as sketched below. This delta isolates the user's intended changes, ensuring the rest of the object's structure and appearance are preserved. The delta then guides the update for the next iteration, consistently propagating the edit across all views in the grid.
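
Here is a minimal sketch of a single edit-aware denoising step. The `mv_model` callable and its signature, the Euler-style update, and the `guidance` weight are assumptions for illustration; only the two-branch structure and the subtraction that yields vΔ follow the description above, and the actual model and scheduler may differ.

```python
import torch

@torch.no_grad()
def edit_aware_step(mv_model, z_tgt, z_src, src_view, tgt_view, t, dt, guidance=1.0):
    """One edit-aware denoising step (illustrative sketch).

    `mv_model(latent, prompt_view, t)` is assumed to return the predicted flow
    velocity for a multi-view grid latent at timestep t. `z_src` is the source
    mv-grid latent on its own trajectory; `z_tgt` is the current, in-progress
    edited grid latent.
    """
    # Source branch: velocity pointing toward the original object.
    v_src = mv_model(z_src, src_view, t)
    # Target branch: velocity pointing toward the edited object.
    v_tgt = mv_model(z_tgt, tgt_view, t)

    # Edit-only delta: isolates what the user's edit adds on top of the source.
    v_delta = v_tgt - v_src

    # Advance the edited latent. With guidance == 1.0 this follows the target
    # prediction; larger values emphasize the isolated edit direction.
    # (The update rule here is an assumption, not the paper's exact scheduler.)
    z_tgt_next = z_tgt + dt * (v_src + guidance * v_delta)
    return z_tgt_next, v_delta
```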

4. Final 3D Reconstruction

Once the diffusion process completes, it yields the final, edited multi-view grid. This grid is then converted into a textured 3D mesh by a reconstruction module such as InstantMesh. The output is a fully edited 3D object.
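
As a rough sketch of the hand-off, the edited grid can be split back into per-view images before reconstruction. The 3x2 layout is carried over from the earlier sketch, and `reconstruct_mesh` is a hypothetical placeholder rather than InstantMesh's actual API.

```python
import numpy as np

def grid_to_views(mv_grid, grid_rows=3, grid_cols=2):
    """Split the edited mv-grid image back into individual view images.

    The 3x2 layout is an assumption carried over from the setup sketch above.
    """
    h = mv_grid.shape[0] // grid_rows
    w = mv_grid.shape[1] // grid_cols
    return [mv_grid[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(grid_rows) for c in range(grid_cols)]

# Hypothetical hand-off; InstantMesh's real entry point and arguments differ.
# views = grid_to_views(edited_grid)
# mesh = reconstruct_mesh(views, camera_poses)  # e.g., an InstantMesh-style reconstructor
```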