Enhanced Text-to-Image Synthesis with Self-Supervision

Yong Xuan Tan; Chin Poo Lee; Mai Neo; Kian Ming Lim; Jit Yan Lim

doi:10.1109/ACCESS.2023.3268869

Research output: Journal Publication › Article › peer-review

6 Citations (Scopus)

Abstract

The task of Text-to-Image synthesis is a difficult challenge, especially when dealing with low-data regimes, where the number of training samples is limited. In order to address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) has been proposed. The method employs a bi-level architecture, which allows for the use of self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. In addition to the use of self-supervision, SS-TiGAN also investigates various techniques to address the stability issues that arise in Generative Adversarial Networks. By implementing these techniques, the proposed SS-TiGAN has achieved a new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of the SS-TiGAN method in synthesizing high-quality, realistic images from text descriptions under low-data regimes.
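The rotation-based self-supervision mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract only says that training samples are increased "by generating rotation variants"; the four angles (0, 90, 180, 270 degrees), the use of the rotation index as an auxiliary label, and the helper names `rotate90` and `rotation_variants` are assumptions made for illustration.

```python
def rotate90(image):
    """Rotate a 2-D grid (a list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def rotation_variants(images):
    """Expand each image into four rotated copies with rotation labels.

    Assumption: rotations of 0, 90, 180 and 270 degrees, labelled 0..3.
    The abstract does not state the angles that SS-TiGAN uses.
    """
    samples, labels = [], []
    for image in images:
        variant = [list(row) for row in image]  # copy, leave the input untouched
        for k in range(4):
            samples.append(variant)
            labels.append(k)  # hypothetical target for a rotation-prediction task
            variant = rotate90(variant)
    return samples, labels


# Toy usage: one 2x2 "image" becomes four training samples.
images = [[[1, 2], [3, 4]]]
samples, labels = rotation_variants(images)
print(len(samples), labels)
```

Under these assumptions, a dataset of N images yields 4N samples, which is the sense in which rotation variants enlarge a small training set.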

Original language	English
Pages (from-to)	1
Number of pages	1
Journal	IEEE Access
Volume	11
DOIs	https://doi.org/10.1109/ACCESS.2023.3268869
Publication status	Published - 20 Aug 2023
Externally published	Yes

Keywords

Computer architecture
GAN
generative adversarial networks
Generative adversarial networks
generative model
Generators
Image synthesis
self-supervised learning
Semantics
Text mining
text-to-image synthesis
Visualization

ASJC Scopus subject areas

General Computer Science
General Materials Science
General Engineering
Electrical and Electronic Engineering

Access to Document

10.1109/ACCESS.2023.3268869

Cite this

@article{5352751885d74589844430c55344b8e5,
title = "Enhanced Text-to-Image Synthesis with Self-Supervision",
abstract = "The task of Text-to-Image synthesis is a difficult challenge, especially when dealing with low-data regimes, where the number of training samples is limited. In order to address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) has been proposed. The method employs a bi-level architecture, which allows for the use of self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. In addition to the use of self-supervision, SS-TiGAN also investigates various techniques to address the stability issues that arise in Generative Adversarial Networks. By implementing these techniques, the proposed SS-TiGAN has achieved a new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of the SS-TiGAN method in synthesizing high-quality, realistic images from text descriptions under low-data regimes.",

keywords = "Computer architecture, GAN, generative adversarial networks, Generative adversarial networks, generative model, Generators, Image synthesis, self-supervised learning, Semantics, Text mining, text-to-image synthesis, Visualization",
author = "Tan, {Yong Xuan} and Lee, {Chin Poo} and Neo, Mai and Lim, {Kian Ming} and Lim, {Jit Yan}",
note = "Publisher Copyright: Author",
year = "2023",
month = aug,
day = "20",
doi = "10.1109/ACCESS.2023.3268869",
language = "English",
volume = "11",
pages = "1",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR
T1 - Enhanced Text-to-Image Synthesis with Self-Supervision
AU - Tan, Yong Xuan
AU - Lee, Chin Poo
AU - Neo, Mai
AU - Lim, Kian Ming
AU - Lim, Jit Yan
N1 - Publisher Copyright: Author
PY - 2023/8/20
Y1 - 2023/8/20
AB - The task of Text-to-Image synthesis is a difficult challenge, especially when dealing with low-data regimes, where the number of training samples is limited. In order to address this challenge, the Self-Supervision Text-to-Image Generative Adversarial Networks (SS-TiGAN) has been proposed. The method employs a bi-level architecture, which allows for the use of self-supervision to increase the number of training samples by generating rotation variants. This, in turn, maximizes the diversity of the model representation and enables the exploration of high-level object information for more detailed image construction. In addition to the use of self-supervision, SS-TiGAN also investigates various techniques to address the stability issues that arise in Generative Adversarial Networks. By implementing these techniques, the proposed SS-TiGAN has achieved a new state-of-the-art performance on two benchmark datasets, Oxford-102 and CUB. These results demonstrate the effectiveness of the SS-TiGAN method in synthesizing high-quality, realistic images from text descriptions under low-data regimes.

KW - Computer architecture
KW - GAN
KW - generative adversarial networks
KW - Generative adversarial networks
KW - generative model
KW - Generators
KW - Image synthesis
KW - self-supervised learning
KW - Semantics
KW - Text mining
KW - text-to-image synthesis
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=85153801523&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3268869
DO - 10.1109/ACCESS.2023.3268869
M3 - Article
AN - SCOPUS:85153801523
SN - 2169-3536
VL - 11
SP - 1
JO - IEEE Access
JF - IEEE Access
ER -
