Abstract
With the rapid growth in the number of information sources, from thousands to millions of research papers, blogs, and other documents, extracting meaningful insights has become a challenge. This study develops a Question Answering (QA) system using the Haystack framework, capable of retrieving relevant documents and extracting answers from lengthy texts. The implemented QA system consists of four main components: an indexing pipeline, a document store, a searching pipeline, and an evaluation module. Additionally, this study investigates how different model architectures affect performance in retrieving and extracting answers. Two retrievers, the BM25 Retriever and the Dense Passage Retriever, and five readers, BERT, RoBERTa, ALBERT, MiniLM, and ELECTRA, are examined. The models are tested and evaluated on the SQuAD datasets. The combination of the BM25 Retriever and the RoBERTa Reader achieved the best performance, with an F1-score of 0.9301 and an Exact Match score of 0.8956 on the SQuAD2.0 dataset.
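
As a rough illustration of the described architecture, the sketch below wires the retriever and reader into an extractive QA pipeline with the Haystack 1.x API. The in-memory BM25-enabled document store, the `deepset/roberta-base-squad2` reader checkpoint, and the sample passage are illustrative assumptions, not details reported in the paper.

```python
# Minimal extractive QA sketch in Haystack 1.x: indexing into a document store,
# sparse BM25 retrieval, and a RoBERTa reader fine-tuned on SQuAD 2.0.
# Store choice, model checkpoint, and sample text are assumptions for illustration.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Indexing pipeline / document store: write passages into a BM25-enabled store.
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([
    {"content": "The Haystack framework provides pipelines for document "
                "retrieval and extractive question answering over large corpora."},
])

# Searching pipeline: BM25 retriever narrows the corpus, the reader extracts a span.
retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

# Run a query; top_k values control how many documents and answer spans are returned.
result = pipeline.run(
    query="What does the Haystack framework provide?",
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}},
)
print(result["answers"][0].answer)
```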
Original language | English |
---|---|
Pages (from-to) | 18-23 |
Number of pages | 6 |
Journal | Proceedings of the IEEE Conference on Systems, Process and Control, ICSPC |
Issue number | 2024 |
DOIs | |
Publication status | Published - 2024 |
Event | 12th IEEE Conference on Systems, Process and Control, ICSPC 2024 - Malacca, Malaysia. Duration: 7 Dec 2024 → … |
Keywords
- ALBERT
- BERT
- BM25 retriever
- dense passage retriever
- ELECTRA
- MiniLM
- natural language processing
- question answering
- RoBERTa
- transformer
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Information Systems
- Information Systems and Management
- Safety, Risk, Reliability and Quality
- Control and Optimization
- Modelling and Simulation
- Education