Dual Encoders for Diffusion-based Image Inpainting

Dezhi Zheng, Kaijun Deng, Jinbao Wang, Linlin Shen

Research output: Journal Publication › Conference article › peer-review

Abstract

Current diffusion-based inpainting models struggle to preserve unmasked regions and to generate highly coherent content. Additionally, it is hard for them to generate meaningful content for 3D inpainting. To tackle these challenges, we design a plug-and-play branch that runs through the entire generation process to enhance existing models. Specifically, we utilize dual encoders, a Convolutional Neural Network (CNN) encoder and the pre-trained Variational AutoEncoder (VAE) encoder, to encode masked images. The latent code and the feature map from the dual encoders are fed to the diffusion model simultaneously. In addition, we apply Zero-padded initialization to solve the problem of mode collapse caused by this branch. Experiments on BrushBench and EditBench demonstrate that models equipped with our plug-and-play branch produce more coherent inpainting results, and our model achieves new state-of-the-art results.
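The core idea of the abstract — an auxiliary encoder branch whose contribution is initialized to zero so the pre-trained diffusion model is undisturbed at the start of training — can be illustrated with a minimal NumPy sketch. This is our reading of the described design, not the paper's implementation; the function names, the linear fusion, and the interpretation of "Zero-padded initialization" as a zero-initialized projection are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(latent, cnn_feature, W):
    """Plug-and-play branch (hypothetical): add a projected CNN
    feature map onto the VAE latent code before denoising."""
    return latent + W @ cnn_feature

# Toy stand-ins for the dual encoders' outputs on a masked image.
latent = np.arange(4.0)          # from the pre-trained VAE encoder
cnn_feature = rng.normal(size=8)  # from the trainable CNN encoder

# Zero-initialized projection: at the start of training the branch is a
# no-op, so the pre-trained model's behavior is preserved (avoiding the
# mode-collapse issue the abstract attributes to an untrained branch).
W = np.zeros((4, 8))

fused = fuse(latent, cnn_feature, W)
print(np.allclose(fused, latent))  # the branch contributes nothing yet
```

As `W` is trained away from zero, the CNN feature gradually steers generation; the same zero-init trick is used in other adapter-style branches for diffusion models.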

Keywords

  • Diffusion models
  • Image inpainting

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
