Abstract
Current diffusion-based inpainting models struggle to preserve unmasked regions or to generate highly coherent content, and they also have difficulty producing meaningful content for 3D inpainting. To tackle these challenges, we design a plug-and-play branch that runs through the entire generation process to enhance existing models. Specifically, we encode masked images with dual encoders: a Convolutional Neural Network (CNN) encoder and the pre-trained Variational AutoEncoder (VAE) encoder. The latent code and the feature map from the dual encoders are fed into the diffusion model simultaneously. In addition, we apply zero-padded initialization to resolve the mode collapse this branch can otherwise cause. Experiments on BrushBench and EditBench demonstrate that our plug-and-play branch improves the coherence of inpainting in existing models, and our model achieves new state-of-the-art results.
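The abstract's two key ingredients, a dual-encoder branch and zero-padded initialization, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation; the module name, channel sizes, and the way the branch is injected (a 1x1 projection added to the VAE latent) are all assumptions for illustration. The point is that a zero-initialized projection makes the branch a no-op at the start of training, so the pre-trained diffusion model's behaviour is untouched until the branch learns something useful.

```python
import torch
import torch.nn as nn

class DualEncoderBranch(nn.Module):
    """Hypothetical sketch of a dual-encoder plug-and-play branch:
    a lightweight CNN encoder yields a spatial feature map, while a
    frozen, pre-trained VAE encoder (not shown) yields a latent code."""

    def __init__(self, in_ch=4, feat_ch=64, latent_ch=4):
        super().__init__()
        # CNN encoder: three stride-2 convs to mirror the VAE's 8x downsampling
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1),
        )
        # Zero-padded initialization (assumed form): the projection that
        # injects the branch starts at zero, so at initialization the
        # branch contributes nothing and cannot collapse the base model.
        self.proj = nn.Conv2d(feat_ch, latent_ch, 1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, masked_image, vae_latent):
        feat = self.cnn(masked_image)        # CNN feature map
        return vae_latent + self.proj(feat)  # zero at init -> pure VAE latent

masked = torch.randn(1, 4, 64, 64)  # masked image: RGB + binary mask channel
latent = torch.randn(1, 4, 8, 8)    # VAE latent of the same masked image
branch = DualEncoderBranch()
out = branch(masked, latent)
print(torch.allclose(out, latent))  # True: zero-init leaves the latent intact
```

During training, gradients flow into `proj`, so the CNN feature map is blended in gradually; both conditioning signals then reach the diffusion model together, as the abstract describes.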
| Original language | English |
|---|---|
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| DOIs | |
| Publication status | Published - 2025 |
| Externally published | Yes |
| Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025, Hyderabad, India, 6 Apr 2025 → 11 Apr 2025 |
Keywords
- Diffusion models
- Image inpainting
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering