Abstract
Generative Adversarial Networks (GANs) have demonstrated remarkable progress in generating realistic images from text descriptions. However, learning the complex joint data distribution across the text and image domains remains challenging and often results in limited image variety. Previous works have addressed these challenges by incorporating self-supervision into text-to-image GAN approaches. Yet these approaches are limited in that their auxiliary self-supervised tasks ignore the semantic information of the input text-image pairs during training, leading to goal inconsistency within the discriminator. To overcome this, we propose a novel text-to-image synthesis method, Label Augmented Perceptual Generative Adversarial Networks (LAP-GAN), which integrates label-augmented discriminators with a perceptual loss mechanism. The label-augmented discriminators augment the labels of the self-supervised task with semantic information, aligning the objectives of the image-modeling and self-supervision tasks into a single, unified goal. Concurrently, the perceptual loss mechanism leverages high-level semantic features, complementing self-supervision to refine the image synthesis process. On three benchmark datasets, the proposed LAP-GAN achieves high-quality image synthesis, representing a significant advancement in text-to-image generation. The source code for the proposed LAP-GAN is available at: https://github.com/Jityan/lapgan.
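A perceptual loss, as mentioned in the abstract, compares images in a high-level feature space rather than pixel space. The sketch below is only an illustration of that general idea, not the paper's actual implementation: the `features` extractor here is a stand-in built from fixed random convolution kernels, whereas real systems typically use a pretrained network's activations.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive valid-mode 2-D convolution (one toy feature-extractor layer)."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def features(img, kernels):
    """Stand-in for a pretrained network's high-level feature maps (conv + ReLU)."""
    return np.stack([np.maximum(conv2d_valid(img, k), 0.0) for k in kernels])

def perceptual_loss(real, fake, kernels):
    """Mean squared error between feature maps, not between raw pixels."""
    return np.mean((features(real, kernels) - features(fake, kernels)) ** 2)

rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # fixed "layers"
real = rng.random((16, 16))
fake = rng.random((16, 16))
print(perceptual_loss(real, real, kernels))  # identical images give zero loss
print(perceptual_loss(real, fake, kernels))  # differing images give positive loss
```

Matching feature statistics rather than pixels is what lets such a loss reward semantic similarity even when images differ pixel-by-pixel.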
| Original language | English |
|---|---|
| Article number | 129005 |
| Journal | Expert Systems with Applications |
| Volume | 296 |
| DOIs | |
| Publication status | Published - 15 Jan 2026 |
Free Keywords
- GANs
- Generative model
- Label augmentation
- Self-supervised learning
- Text-to-image
ASJC Scopus subject areas
- General Engineering
- Computer Science Applications
- Artificial Intelligence