MergeTalk: Audio-Driven Talking Head Generation From Single Image With Feature Merge

Jian Gao, Chang Shu, Ximin Zheng, Zheng Lu, Nengsheng Bao

Research output: Journal Publication › Article › peer-review

Abstract

Audio-driven talking head generation has wide real-world applications but remains challenging due to problems such as audio-lip synchronization, head pose, identity preservation, and video quality. We propose a novel two-stage framework that uses explicit 3D face images, rendered from a 3D model driven by the audio input, as intermediate features. We devise two independent 3D motion parameter generation networks that produce expression and pose parameters for the popular 3DMM model, solving the audio-lip synchronization problem and yielding natural head poses without losing identity information. To improve the quality of the final talking head, e.g., by avoiding facial distortion and artifacts, we propose a novel face feature merge network that accurately extracts and fuses the background, identity information, and facial texture from the source image with the lip movements and head poses from the 3D face images, and generates the final videos with generative adversarial networks. Extensive experiments show that our framework outperforms SOTA methods in several aspects and has good generalization ability.
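To make the two-stage flow concrete, below is a minimal sketch of the pipeline the abstract describes: audio features drive two independent 3DMM parameter networks (expression and pose), a renderer turns those parameters into an intermediate 3D face image, and a feature-merge generator fuses it with the source image. All module names, dimensions, and the renderer/generator stubs are hypothetical illustrations, not the paper's released code.

```python
import torch
import torch.nn as nn

class MotionParamNet(nn.Module):
    """Audio feature window -> one block of 3DMM parameters
    (one instance for expression, a separate one for head pose)."""
    def __init__(self, audio_dim: int, param_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, param_dim),
        )

    def forward(self, audio_feat: torch.Tensor) -> torch.Tensor:
        return self.net(audio_feat)

class RendererStub(nn.Module):
    """Stand-in for a 3DMM renderer: decodes the predicted parameters
    into a coarse intermediate 3D face image."""
    def __init__(self, expr_dim: int, pose_dim: int, img_size: int = 64):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Linear(expr_dim + pose_dim, 3 * img_size * img_size)

    def forward(self, expr: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        x = self.fc(torch.cat([expr, pose], dim=-1))
        return x.view(-1, 3, self.img_size, self.img_size)

class FeatureMergeGenerator(nn.Module):
    """Stand-in for the feature-merge GAN generator: fuses the source image
    (background, identity, texture) with the rendered 3D face (lips, pose)."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, source: torch.Tensor, face_3d: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([source, face_3d], dim=1))

# One per-frame forward pass through both stages:
audio = torch.randn(1, 80)          # per-frame audio feature (e.g., a mel window)
source = torch.randn(1, 3, 64, 64)  # single source image
expr_net, pose_net = MotionParamNet(80, 64), MotionParamNet(80, 6)
renderer, generator = RendererStub(64, 6), FeatureMergeGenerator()

expr, pose = expr_net(audio), pose_net(audio)    # stage 1: 3DMM parameters
frame = generator(source, renderer(expr, pose))  # stage 2: merge and synthesize
print(frame.shape)  # torch.Size([1, 3, 64, 64])
```

Keeping the expression and pose branches as separate networks mirrors the paper's design choice: lip synchronization and head motion are driven by the audio independently, so neither objective interferes with the other, and identity is carried by the source image rather than by the audio branches.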

Original language: English
Pages (from-to): 1850-1854
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 31
DOIs
Publication status: Published - 2024

Keywords

  • 3DMM
  • GAN
  • Talking head generation
  • feature merge

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics
