Recent Advances in Text-to-Image Synthesis: Approaches, Datasets and Future Research Prospects

Yong Xuan Tan, Chin Poo Lee, Mai Neo, Kian Ming Lim, Jit Yan Lim, Ali Alqahtani

Research output: Journal Publication › Article › peer-review

2 Citations (Scopus)

Abstract

Text-to-image synthesis is an active area of research that aims to generate images from textual descriptions. The central goal of this field is to produce images that match the given textual description in terms of both semantic consistency and image realism. Although text-to-image synthesis has made remarkable progress in recent years, it still faces several challenges, chiefly in the level of image realism and semantic consistency achieved. To address these challenges, various approaches have been proposed, most of which rely on Generative Adversarial Networks (GANs). This paper reviews the existing text-to-image synthesis approaches, which are categorized into four groups: image realism, multiple scene, semantic enhancement, and style transfer. In addition to discussing the existing approaches, this paper reviews the datasets widely used for text-to-image synthesis, including Oxford-102, CUB-200-2011, and COCO, and discusses the evaluation metrics used in the field, including Inception Score, Fréchet Inception Distance, Structural Similarity Index, R-precision, Visual-Semantic Similarity, and Semantic Object Accuracy. Finally, the paper compiles the reported performance of existing works in the field.
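
For readers unfamiliar with the metrics listed above, the Fréchet Inception Distance (FID) offers a representative example: it models the Inception-v3 feature embeddings of real and generated images as Gaussians and measures the distance between the two distributions. A standard formulation (given here as general background, not quoted from this paper) is

FID = \|\mu_r - \mu_g\|_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) denote the mean and covariance of the real and generated feature distributions, respectively; a lower FID indicates that the generated images are statistically closer to the real ones.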

Original language: English
Pages (from-to): 88099-88115
Number of pages: 17
Journal: IEEE Access
Volume: 11
DOIs:
Publication status: Published - 2023
Externally published: Yes

Keywords

  • GAN
  • generative adversarial networks
  • generative model
  • review
  • survey
  • text-to-image synthesis

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
