TextFace: Text-to-Style Mapping Based Face Generation and Manipulation

Xianxu Hou; Xiaokang Zhang; Yudong Li; Linlin Shen

doi:10.1109/TMM.2022.3160360

Research output: Journal Publication › Article › peer-review

16 Citations (Scopus)

Abstract

As a subtopic of text-to-image synthesis, text-to-face generation has great potential in face-related applications. In this paper, we propose a generic text-to-face framework, namely, TextFace, to achieve diverse and high-quality face image generation from text descriptions. We introduce text-to-style mapping, a novel method where the text description can be directly encoded into the latent space of a pretrained StyleGAN. Guided by our text-image similarity matching and face captioning-based text alignment, the textual latent code can be fed into the generator of a well-trained StyleGAN to produce diverse face images with high resolution (1024×1024). Furthermore, our model inherently supports semantic face editing using text descriptions. Finally, experimental results quantitatively and qualitatively demonstrate the superior performance of our model.
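The abstract names two ingredients, a text-to-style mapping into the StyleGAN latent space and a text-image similarity matching objective, without giving their exact form. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes a plain linear mapper from a text embedding to a latent code and a cosine-distance matching loss. The function names `map_text_to_style` and `similarity_matching_loss` are hypothetical, and the pretrained StyleGAN generator and the text/image encoders are omitted entirely.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def map_text_to_style(text_embedding, weights, bias):
    """Hypothetical linear text-to-style mapper: latent = W @ e + b.

    ASSUMPTION: the paper's mapper is not specified in the abstract;
    a single linear layer is used here purely for illustration.
    """
    return [
        sum(w * e for w, e in zip(row, text_embedding)) + b
        for row, b in zip(weights, bias)
    ]


def similarity_matching_loss(text_embedding, image_embedding):
    """Assumed matching loss: 1 - cosine similarity.

    It is 0 when the two embeddings point in the same direction
    and grows as they diverge.
    """
    return 1.0 - cosine_similarity(text_embedding, image_embedding)
```

In a real pipeline the latent code returned by the mapper would be passed to a pretrained generator, and the loss would compare embeddings of the input text and of the generated image. Both steps need trained networks and are left out here.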

Original language	English
Pages (from-to)	3409-3419
Number of pages	11
Journal	IEEE Transactions on Multimedia
Volume	25
DOIs	https://doi.org/10.1109/TMM.2022.3160360
Publication status	Published - 2023
Externally published	Yes

Keywords

GANs
cross modal
text-guided semantic face manipulation
text-to-face generation
text-to-image generation

ASJC Scopus subject areas

Signal Processing
Electrical and Electronic Engineering
Media Technology
Computer Science Applications

Access to Document

10.1109/TMM.2022.3160360

Cite this

@article{59a84146ed344b599e7a645389bc4a32,
title = "TextFace: Text-to-Style Mapping Based Face Generation and Manipulation",
abstract = "As a subtopic of text-to-image synthesis, text-to-face generation has great potential in face-related applications. In this paper, we propose a generic text-to-face framework, namely, TextFace, to achieve diverse and high-quality face image generation from text descriptions. We introduce text-to-style mapping, a novel method where the text description can be directly encoded into the latent space of a pretrained StyleGAN. Guided by our text-image similarity matching and face captioning-based text alignment, the textual latent code can be fed into the generator of a well-trained StyleGAN to produce diverse face images with high resolution (1024×1024). Furthermore, our model inherently supports semantic face editing using text descriptions. Finally, experimental results quantitatively and qualitatively demonstrate the superior performance of our model.",
keywords = "GANs, cross modal, text-guided semantic face manipulation, text-to-face generation, text-to-image generation",
author = "Xianxu Hou and Xiaokang Zhang and Yudong Li and Linlin Shen",
note = "Publisher Copyright: {\textcopyright} 1999-2012 IEEE.",
year = "2023",
doi = "10.1109/TMM.2022.3160360",
language = "English",
volume = "25",
pages = "3409--3419",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - JOUR
T1  - TextFace
T2  - Text-to-Style Mapping Based Face Generation and Manipulation
AU  - Hou, Xianxu
AU  - Zhang, Xiaokang
AU  - Li, Yudong
AU  - Shen, Linlin
PY  - 2023
Y1  - 2023
N2  - As a subtopic of text-to-image synthesis, text-to-face generation has great potential in face-related applications. In this paper, we propose a generic text-to-face framework, namely, TextFace, to achieve diverse and high-quality face image generation from text descriptions. We introduce text-to-style mapping, a novel method where the text description can be directly encoded into the latent space of a pretrained StyleGAN. Guided by our text-image similarity matching and face captioning-based text alignment, the textual latent code can be fed into the generator of a well-trained StyleGAN to produce diverse face images with high resolution (1024×1024). Furthermore, our model inherently supports semantic face editing using text descriptions. Finally, experimental results quantitatively and qualitatively demonstrate the superior performance of our model.
AB  - As a subtopic of text-to-image synthesis, text-to-face generation has great potential in face-related applications. In this paper, we propose a generic text-to-face framework, namely, TextFace, to achieve diverse and high-quality face image generation from text descriptions. We introduce text-to-style mapping, a novel method where the text description can be directly encoded into the latent space of a pretrained StyleGAN. Guided by our text-image similarity matching and face captioning-based text alignment, the textual latent code can be fed into the generator of a well-trained StyleGAN to produce diverse face images with high resolution (1024×1024). Furthermore, our model inherently supports semantic face editing using text descriptions. Finally, experimental results quantitatively and qualitatively demonstrate the superior performance of our model.
KW  - GANs
KW  - cross modal
KW  - text-guided semantic face manipulation
KW  - text-to-face generation
KW  - text-to-image generation
UR  - http://www.scopus.com/inward/record.url?scp=85126695399&partnerID=8YFLogxK
U2  - 10.1109/TMM.2022.3160360
DO  - 10.1109/TMM.2022.3160360
M3  - Article
AN  - SCOPUS:85126695399
SN  - 1520-9210
VL  - 25
SP  - 3409
EP  - 3419
JO  - IEEE Transactions on Multimedia
JF  - IEEE Transactions on Multimedia
ER  -
