Question Answering with Language Models

Jia Yin Wong; Chin Poo Lee; Kian Ming Lim; Jit Yan Lim; Jashila Nair Mogan

doi:10.1109/ICSPC63060.2024.10862271

Research output: Journal Publication › Conference article › peer-review

Abstract

With the significant increase in the number of resources providing various types of information, ranging from thousands to millions of sources including research papers, blogs, and more, extracting and providing meaningful insights has become a challenge. This study develops a Question Answering (QA) system using the Haystack framework, capable of retrieving relevant documents and extracting answers from lengthy texts. The implemented QA system consists of four main components: an indexing pipeline, a document store, a searching pipeline, and an evaluation module. Additionally, this study investigates how different model architectures affect performance in retrieving and extracting answers. Two different retrievers, BM25 Retriever and Dense Passage Retriever, and five different readers, BERT, RoBERTa, ALBERT, MiniLM, and ELECTRA, are examined. The models are tested and evaluated using the SQuAD datasets. The combination of BM25 Retriever and RoBERTa Reader achieved the best performance, with an F1-score of 0.9301 and an Exact Match score of 0.8956 on the SQuAD2.0 dataset.
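
The abstract reports F1 and Exact Match scores but does not show how the evaluation module computes them. As a minimal sketch only, assuming the conventional SQuAD-style definitions (answers are lowercased, stripped of punctuation and English articles, then compared token by token), the two metrics could be computed as follows. The function names are hypothetical and are not taken from the paper or from Haystack.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    if not pred_tokens or not true_tokens:
        # Assumption: an empty prediction is only correct for an
        # unanswerable question (empty reference), as in SQuAD2.0.
        return float(pred_tokens == true_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, under these assumptions `exact_match("The Eiffel Tower", "eiffel tower")` counts as a match, while `f1_score("in Paris France", "Paris")` gives partial credit because only one of the three predicted tokens overlaps with the reference. Dataset-level scores are usually the mean of these per-question values.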

Original language	English
Pages (from-to)	18-23
Number of pages	6
Journal	Proceedings of the IEEE Conference on Systems, Process and Control, ICSPC
Issue number	2024
DOIs	https://doi.org/10.1109/ICSPC63060.2024.10862271
Publication status	Published - 2024
Event	12th IEEE Conference on Systems, Process and Control, ICSPC 2024 - Malacca, Malaysia Duration: 7 Dec 2024 → …

Keywords

ALBERT
BERT
BM25 retriever
dense passage retriever
ELECTRA
MiniLM
natural language processing
question answering
RoBERTa
transformer

ASJC Scopus subject areas

Artificial Intelligence
Computer Science Applications
Information Systems
Information Systems and Management
Safety, Risk, Reliability and Quality
Control and Optimization
Modelling and Simulation
Education

Cite this

@article{ccc60528819d43908c734ce3da8edd75,
  title = "Question Answering with Language Models",
  abstract = "With the significant increase in the number of resources providing various types of information, ranging from thousands to millions of sources including research papers, blogs, and more, extracting and providing meaningful insights has become a challenge. This study develops a Question Answering (QA) system using the Haystack framework, capable of retrieving relevant documents and extracting answers from lengthy texts. The implemented QA system consists of four main components: an indexing pipeline, a document store, a searching pipeline, and an evaluation module. Additionally, this study investigates how different model architectures affect performance in retrieving and extracting answers. Two different retrievers, BM25 Retriever and Dense Passage Retriever, and five different readers, BERT, RoBERTa, ALBERT, MiniLM, and ELECTRA, are examined. The models are tested and evaluated using the SQuAD datasets. The combination of BM25 Retriever and RoBERTa Reader achieved the best performance, with an F1-score of 0.9301 and an Exact Match score of 0.8956 on the SQuAD2.0 dataset.",
  keywords = "ALBERT, BERT, BM25 retriever, dense passage retriever, ELECTRA, MiniLM, natural language processing, question answering, RoBERTa, transformer",
  author = "Wong, {Jia Yin} and Lee, {Chin Poo} and Lim, {Kian Ming} and Lim, {Jit Yan} and Mogan, {Jashila Nair}",
  note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 12th IEEE Conference on Systems, Process and Control, ICSPC 2024 ; Conference date: 07-12-2024",
  year = "2024",
  doi = "10.1109/ICSPC63060.2024.10862271",
  language = "English",
  pages = "18--23",
  journal = "Proceedings of the IEEE Conference on Systems, Process and Control, ICSPC",
  issn = "2769-8378",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "2024",
}

TY  - JOUR
T1  - Question Answering with Language Models
AU  - Wong, Jia Yin
AU  - Lee, Chin Poo
AU  - Lim, Kian Ming
AU  - Lim, Jit Yan
AU  - Mogan, Jashila Nair
PY  - 2024
Y1  - 2024
N2  - With the significant increase in the number of resources providing various types of information, ranging from thousands to millions of sources including research papers, blogs, and more, extracting and providing meaningful insights has become a challenge. This study develops a Question Answering (QA) system using the Haystack framework, capable of retrieving relevant documents and extracting answers from lengthy texts. The implemented QA system consists of four main components: an indexing pipeline, a document store, a searching pipeline, and an evaluation module. Additionally, this study investigates how different model architectures affect performance in retrieving and extracting answers. Two different retrievers, BM25 Retriever and Dense Passage Retriever, and five different readers, BERT, RoBERTa, ALBERT, MiniLM, and ELECTRA, are examined. The models are tested and evaluated using the SQuAD datasets. The combination of BM25 Retriever and RoBERTa Reader achieved the best performance, with an F1-score of 0.9301 and an Exact Match score of 0.8956 on the SQuAD2.0 dataset.
AB  - With the significant increase in the number of resources providing various types of information, ranging from thousands to millions of sources including research papers, blogs, and more, extracting and providing meaningful insights has become a challenge. This study develops a Question Answering (QA) system using the Haystack framework, capable of retrieving relevant documents and extracting answers from lengthy texts. The implemented QA system consists of four main components: an indexing pipeline, a document store, a searching pipeline, and an evaluation module. Additionally, this study investigates how different model architectures affect performance in retrieving and extracting answers. Two different retrievers, BM25 Retriever and Dense Passage Retriever, and five different readers, BERT, RoBERTa, ALBERT, MiniLM, and ELECTRA, are examined. The models are tested and evaluated using the SQuAD datasets. The combination of BM25 Retriever and RoBERTa Reader achieved the best performance, with an F1-score of 0.9301 and an Exact Match score of 0.8956 on the SQuAD2.0 dataset.
KW  - ALBERT
KW  - BERT
KW  - BM25 retriever
KW  - dense passage retriever
KW  - ELECTRA
KW  - MiniLM
KW  - natural language processing
KW  - question answering
KW  - RoBERTa
KW  - transformer
UR  - http://www.scopus.com/inward/record.url?scp=105001473037&partnerID=8YFLogxK
U2  - 10.1109/ICSPC63060.2024.10862271
DO  - 10.1109/ICSPC63060.2024.10862271
M3  - Conference article
AN  - SCOPUS:105001473037
SN  - 2769-8378
SP  - 18
EP  - 23
JO  - Proceedings of the IEEE Conference on Systems, Process and Control, ICSPC
JF  - Proceedings of the IEEE Conference on Systems, Process and Control, ICSPC
IS  - 2024
T2  - 12th IEEE Conference on Systems, Process and Control, ICSPC 2024
Y2  - 7 December 2024
ER  - 
