RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network

Kian Long Tan; Chin Poo Lee; Kalaiarasi Sonai Muthu Anbananthen; Kian Ming Lim

doi:10.1109/ACCESS.2022.3152828

RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network

Kian Long Tan, Chin Poo Lee, Kalaiarasi Sonai Muthu Anbananthen, Kian Ming Lim

Research output: Journal Publication › Article › peer-review

189 Citations (Scopus)

Abstract

Due to the rapid development of technology, social media has become more and more common in human daily life. Social media is a platform for people to express their feelings, feedback, and opinions. To understand the sentiment context of the text, sentiment analysis plays the role to determine whether the sentiment of the text is positive, negative, neutral or any other personal feeling. Sentiment analysis is prominent from the perspective of business or politics where it highly impacts the strategic decision making. The challenges of sentiment analysis are attributable to the lexical diversity, imbalanced dataset and long-distance dependencies of the texts. In view of this, a data augmentation technique with GloVe word embedding is leveraged to synthesize more lexically diverse samples by similar word vector replacements. The data augmentation also focuses on the oversampling of the minority classes to mitigate the imbalanced dataset problems. Apart from that, the existing sentiment analysis mostly leverages sequence models to encode the long-distance dependencies. Nevertheless, the sequence models require a longer execution time as the processing is done sequentially. On the other hand, the Transformer models require less computation time with parallelized processing. To that end, this paper proposes a hybrid deep learning method that combines the strengths of sequence model and Transformer model while suppressing the limitations of sequence model. Specifically, the proposed model integrates Robustly optimized BERT approach and Long Short-Term Memory for sentiment analysis. The Robustly optimized BERT approach maps the words into a compact meaningful word embedding space while the Long Short-Term Memory model captures the long-distance contextual semantics effectively. The experimental results demonstrate that the proposed hybrid model outshines the state-of-the-art methods by achieving F1-scores of 93%, 91%, and 90% on IMDb dataset, Twitter US Airline Sentiment dataset, and Sentiment140 dataset, respectively.

Original language	English
Pages (from-to)	21517-21525
Number of pages	9
Journal	IEEE Access
Volume	10
DOIs	https://doi.org/10.1109/ACCESS.2022.3152828
Publication status	Published - 2022
Externally published	Yes

Keywords

long short-term memory
LSTM
recurrent neural network
RNN
RoBERTa
Sentiment
transformer

ASJC Scopus subject areas

General Computer Science
General Materials Science
General Engineering
Electrical and Electronic Engineering

Access to Document

10.1109/ACCESS.2022.3152828

Cite this

@article{d36cfb4619cd43f0a74e1725e1bc33d6,

title = "RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network",

abstract = "Due to the rapid development of technology, social media has become more and more common in human daily life. Social media is a platform for people to express their feelings, feedback, and opinions. To understand the sentiment context of the text, sentiment analysis plays the role to determine whether the sentiment of the text is positive, negative, neutral or any other personal feeling. Sentiment analysis is prominent from the perspective of business or politics where it highly impacts the strategic decision making. The challenges of sentiment analysis are attributable to the lexical diversity, imbalanced dataset and long-distance dependencies of the texts. In view of this, a data augmentation technique with GloVe word embedding is leveraged to synthesize more lexically diverse samples by similar word vector replacements. The data augmentation also focuses on the oversampling of the minority classes to mitigate the imbalanced dataset problems. Apart from that, the existing sentiment analysis mostly leverages sequence models to encode the long-distance dependencies. Nevertheless, the sequence models require a longer execution time as the processing is done sequentially. On the other hand, the Transformer models require less computation time with parallelized processing. To that end, this paper proposes a hybrid deep learning method that combines the strengths of sequence model and Transformer model while suppressing the limitations of sequence model. Specifically, the proposed model integrates Robustly optimized BERT approach and Long Short-Term Memory for sentiment analysis. The Robustly optimized BERT approach maps the words into a compact meaningful word embedding space while the Long Short-Term Memory model captures the long-distance contextual semantics effectively. The experimental results demonstrate that the proposed hybrid model outshines the state-of-the-art methods by achieving F1-scores of 93%, 91%, and 90% on IMDb dataset, Twitter US Airline Sentiment dataset, and Sentiment140 dataset, respectively.",

keywords = "long short-term memory, LSTM, recurrent neural network, RNN, RoBERTa, Sentiment, transformer",

author = "Tan, {Kian Long} and Lee, {Chin Poo} and Anbananthen, {Kalaiarasi Sonai Muthu} and Lim, {Kian Ming}",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2022",

doi = "10.1109/ACCESS.2022.3152828",

language = "English",

volume = "10",

pages = "21517--21525",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network

AU - Tan, Kian Long

AU - Lee, Chin Poo

AU - Anbananthen, Kalaiarasi Sonai Muthu

AU - Lim, Kian Ming

PY - 2022

Y1 - 2022

N2 - Due to the rapid development of technology, social media has become more and more common in human daily life. Social media is a platform for people to express their feelings, feedback, and opinions. To understand the sentiment context of the text, sentiment analysis plays the role to determine whether the sentiment of the text is positive, negative, neutral or any other personal feeling. Sentiment analysis is prominent from the perspective of business or politics where it highly impacts the strategic decision making. The challenges of sentiment analysis are attributable to the lexical diversity, imbalanced dataset and long-distance dependencies of the texts. In view of this, a data augmentation technique with GloVe word embedding is leveraged to synthesize more lexically diverse samples by similar word vector replacements. The data augmentation also focuses on the oversampling of the minority classes to mitigate the imbalanced dataset problems. Apart from that, the existing sentiment analysis mostly leverages sequence models to encode the long-distance dependencies. Nevertheless, the sequence models require a longer execution time as the processing is done sequentially. On the other hand, the Transformer models require less computation time with parallelized processing. To that end, this paper proposes a hybrid deep learning method that combines the strengths of sequence model and Transformer model while suppressing the limitations of sequence model. Specifically, the proposed model integrates Robustly optimized BERT approach and Long Short-Term Memory for sentiment analysis. The Robustly optimized BERT approach maps the words into a compact meaningful word embedding space while the Long Short-Term Memory model captures the long-distance contextual semantics effectively. The experimental results demonstrate that the proposed hybrid model outshines the state-of-the-art methods by achieving F1-scores of 93%, 91%, and 90% on IMDb dataset, Twitter US Airline Sentiment dataset, and Sentiment140 dataset, respectively.

AB - Due to the rapid development of technology, social media has become more and more common in human daily life. Social media is a platform for people to express their feelings, feedback, and opinions. To understand the sentiment context of the text, sentiment analysis plays the role to determine whether the sentiment of the text is positive, negative, neutral or any other personal feeling. Sentiment analysis is prominent from the perspective of business or politics where it highly impacts the strategic decision making. The challenges of sentiment analysis are attributable to the lexical diversity, imbalanced dataset and long-distance dependencies of the texts. In view of this, a data augmentation technique with GloVe word embedding is leveraged to synthesize more lexically diverse samples by similar word vector replacements. The data augmentation also focuses on the oversampling of the minority classes to mitigate the imbalanced dataset problems. Apart from that, the existing sentiment analysis mostly leverages sequence models to encode the long-distance dependencies. Nevertheless, the sequence models require a longer execution time as the processing is done sequentially. On the other hand, the Transformer models require less computation time with parallelized processing. To that end, this paper proposes a hybrid deep learning method that combines the strengths of sequence model and Transformer model while suppressing the limitations of sequence model. Specifically, the proposed model integrates Robustly optimized BERT approach and Long Short-Term Memory for sentiment analysis. The Robustly optimized BERT approach maps the words into a compact meaningful word embedding space while the Long Short-Term Memory model captures the long-distance contextual semantics effectively. The experimental results demonstrate that the proposed hybrid model outshines the state-of-the-art methods by achieving F1-scores of 93%, 91%, and 90% on IMDb dataset, Twitter US Airline Sentiment dataset, and Sentiment140 dataset, respectively.

KW - long short-term memory

KW - LSTM

KW - recurrent neural network

KW - RNN

KW - RoBERTa

KW - Sentiment

KW - transformer

UR - http://www.scopus.com/inward/record.url?scp=85125305235&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2022.3152828

DO - 10.1109/ACCESS.2022.3152828

M3 - Article

AN - SCOPUS:85125305235

SN - 2169-3536

VL - 10

SP - 21517

EP - 21525

JO - IEEE Access

JF - IEEE Access

ER -

RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network

Abstract

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this