Abstract
Text-to-image synthesis aims to synthesize an image from a given text description, which is especially useful for applications such as image editing and graphic design. The main challenges of text-to-image synthesis are to generate images that are visually realistic and semantically consistent with the given text description. In this paper, we propose several enhancements to the conditional generative model widely used for text-to-image synthesis: text conditioning augmentation, feature matching, and an L1 distance loss function. Text conditioning augmentation expands the text embedding feature space to improve the semantic consistency of the model. Feature matching encourages the model to synthesize more photo-realistic images and enriches the variation of image content. In addition, the L1 distance loss allows the model to generate images that closely resemble the real images. Empirical results on the CUB-200-2011 dataset demonstrate that the text-to-image synthesis conditional generative model with the proposed enhancements yields the highest Inception Score and Structural Similarity Index.
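Two of the enhancements described above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's actual architecture: the variable names, embedding dimensions, and single linear projections are assumptions. Conditioning augmentation samples the conditioning vector from a Gaussian whose mean and log-variance are derived from the text embedding (regularized by a KL penalty), and the L1 loss is the mean absolute pixel difference between generated and real images.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioning_augmentation(text_emb, w_mu, w_logvar):
    """Sample a conditioning vector from a Gaussian parameterized by the
    text embedding, using the reparameterization trick.
    w_mu and w_logvar are hypothetical learned projection matrices."""
    mu = text_emb @ w_mu
    logvar = text_emb @ w_logvar
    eps = rng.standard_normal(mu.shape)
    c = mu + np.exp(0.5 * logvar) * eps
    # KL divergence to N(0, I) keeps the conditioning manifold smooth
    kl = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))
    return c, kl

def l1_loss(fake_img, real_img):
    """Mean absolute pixel difference between generated and real images."""
    return np.mean(np.abs(fake_img - real_img))

# Toy shapes: 4 captions, 128-d text embeddings, 16-d conditioning vectors
text_emb = rng.standard_normal((4, 128))
w_mu = rng.standard_normal((128, 16)) * 0.01
w_logvar = rng.standard_normal((128, 16)) * 0.01
c, kl = conditioning_augmentation(text_emb, w_mu, w_logvar)

fake = rng.random((4, 64, 64, 3))
real = rng.random((4, 64, 64, 3))
loss = l1_loss(fake, real)
print(c.shape, kl, loss)
```

In a full model these losses would be added to the conditional GAN's adversarial objective; here they are computed on random arrays purely to show the shapes and the non-negativity of each term.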
Original language | English |
---|---|
Pages (from-to) | 1-7 |
Number of pages | 7 |
Journal | IAENG International Journal of Computer Science |
Volume | 49 |
Issue number | 1 |
Publication status | Published - 2022 |
Externally published | Yes |
Keywords
- cGANs
- conditional generative adversarial networks
- GANs
- generative adversarial network
- text-to-image synthesis
ASJC Scopus subject areas
- General Computer Science