QARR-QGF: A Dual-Module Data Augmentation Framework for Enhanced Few-Shot Question Answering

Siao Wah Tan, Chin Poo Lee, Kian Ming Lim, Ali Alqahtani

Research output: Journal Publication › Article › peer-review

Abstract

Few-shot question answering (QA) aims to train effective QA models using very limited annotated data, but existing approaches often struggle to generalize due to insufficient training examples. This paper proposes a two-stage data augmentation framework, called Question-Answer Replacement and Removal and Question Generation and Filtering (QARR-QGF), to address this challenge. The QARR module enhances pretraining data by systematically replacing and removing question-answer pairs to create more diverse examples. During fine-tuning, the QGF module generates paraphrased questions and applies semantic filtering to retain high-quality training samples. The framework is evaluated using three widely used generative models: Longformer-Encoder-Decoder (LED), BART, and T5, on the SQuAD, HotpotQA, and Natural Questions datasets. Experimental results show that QARR-QGF consistently improves performance across all datasets and few-shot settings. For example, the QARR-QGF-T5 model achieves F1 scores of 82.3% on SQuAD, 59.9% on HotpotQA, and 59.0% on Natural Questions in the 16-shot setting, outperforming previous state-of-the-art methods. These results demonstrate the effectiveness of QARR-QGF in improving few-shot QA performance by generating richer and more diverse training data.
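The filtering step of the QGF module can be illustrated with a small sketch. The abstract does not specify how semantic similarity is measured or which thresholds are used, so everything below is an assumption made for illustration: the Jaccard token overlap is a crude stand-in for a real semantic similarity model, and the function names and the `low` and `high` cutoffs are hypothetical rather than taken from the paper.

```python
import re


def tokens(text):
    """Lowercase word tokens; a crude stand-in for a sentence embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap between the two token sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def filter_paraphrases(original, candidates, low=0.3, high=0.95):
    """Keep candidates that stay close to the original question
    (score >= low) without being near-duplicates (score < high).
    Both thresholds are hypothetical values chosen for this sketch."""
    kept = []
    for cand in candidates:
        score = similarity(original, cand)
        if low <= score < high:
            kept.append(cand)
    return kept


original = "Who wrote the novel Dracula?"
candidates = [
    "Who wrote the novel Dracula?",             # identical, adds no diversity
    "Who is the author of the novel Dracula?",  # paraphrase with the same meaning
    "When was the telephone invented?",         # unrelated question
]
print(filter_paraphrases(original, candidates))
```

In this sketch only the second candidate survives: the first scores 1.0 and is rejected as a duplicate, and the third shares too few tokens with the original. A real system would likely replace `similarity` with an embedding-based score, which can recognize paraphrases that share few surface words.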

Original language: English
Pages (from-to): 160722-160736
Number of pages: 15
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

Keywords

  • deep learning
  • few-shot
  • generative model
  • language model
  • question answering

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
