LAP-GAN: Label augmentation with perceptual loss for self-supervised text-to-image synthesis

Research output: Journal Publication, Article, peer-review

Abstract

Generative Adversarial Networks (GANs) have demonstrated remarkable progress in generating realistic images from text descriptions. However, learning the complex data distributions across both text and image domains presents significant challenges, such as limited variety in the generated images. Previous works have addressed these challenges by incorporating self-supervision into text-to-image GAN approaches. Yet a limitation of these approaches is that the auxiliary self-supervised tasks ignore the semantic information of the input text-image pairs during training, leading to goal inconsistency within the discriminator. To overcome this, we propose a novel text-to-image synthesis method, Label Augmented Perceptual Generative Adversarial Networks (LAP-GAN), which integrates label augmented discriminators and a perceptual loss mechanism. The label augmented discriminators augment the label for the self-supervised task to consider semantic information, thereby aligning the objectives of the image modeling and self-supervision tasks into a single, unified goal. Concurrently, the perceptual loss mechanism leverages high-level semantic features, complementing self-supervision to refine the image synthesis process. On three benchmark datasets, the proposed LAP-GAN achieves high-quality image synthesis, representing a significant advancement in text-to-image generation. The source code for the proposed LAP-GAN is available at: https://github.com/Jityan/lapgan.
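
The idea of folding the self-supervised task into a single discriminator label space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the label scheme, so the rotation task, the three source classes, and the names `joint_label`, `split_label`, and `perceptual_distance` are all assumptions made for illustration. The perceptual term is shown as a plain mean squared difference between feature vectors, which is one common form and may differ from the loss used in the paper.

```python
from typing import Sequence, Tuple

# Hypothetical label spaces (assumptions, not taken from the paper):
# a 4-way rotation task as the self-supervised signal, and three
# text-image source classes that carry the semantic information.
ROTATIONS = (0, 90, 180, 270)
SOURCES = ("real_matched", "real_mismatched", "generated")


def joint_label(source: str, rotation: int) -> int:
    """Map a (source, rotation) pair to one index in a single label space."""
    return SOURCES.index(source) * len(ROTATIONS) + ROTATIONS.index(rotation)


def split_label(label: int) -> Tuple[str, int]:
    """Invert joint_label, recovering the source class and the rotation."""
    source_idx, rotation_idx = divmod(label, len(ROTATIONS))
    return SOURCES[source_idx], ROTATIONS[rotation_idx]


def perceptual_distance(features_a: Sequence[float],
                        features_b: Sequence[float]) -> float:
    """Mean squared difference between two equal-length feature vectors."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(features_a, features_b)]
    return sum(squared) / len(squared)
```

With these assumed label spaces, a discriminator head would predict one of 3 x 4 = 12 joint classes instead of solving a real/fake task and a rotation task separately, which is one way a single, unified objective can be expressed.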

Original language: English
Article number: 129005
Journal: Expert Systems with Applications
Volume: 296
Publication status: Published - 15 Jan 2026

Free Keywords

  • GANs
  • Generative model
  • Label augmentation
  • Self-supervised learning
  • Text-to-image

ASJC Scopus subject areas

  • General Engineering
  • Computer Science Applications
  • Artificial Intelligence
