TY - GEN
T1 - A Data-driven Affective Text Classification Analysis
AU - Ardakani, Saeid Pourroostaei
AU - Zhou, Can
AU - Wu, Xuting
AU - Ma, Yingrui
AU - Che, Jizhou
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Affective texts play a key role in sentiment classification/prediction and decision making. They are increasingly used to form and/or share sentiments in financial, economic and/or political applications. However, processing time increases exponentially for large affective textual datasets. Moreover, casual expressions such as emoji, slang, abbreviations and misspelled words usually complicate data analysis (i.e., text classification). This paper proposes a pipeline model consisting of data pre-processing, feature extraction and classification model training to classify affective text datasets. It offers three contributions, namely emoji recovery, misspelled word correction and abbreviation translation, which result in maximised classification accuracy. A rigorous experimental plan is designed to evaluate the performance of the proposed approach according to three factors: dataset size (i.e., small, medium and large), NLP feature extraction technique (i.e., TF-IDF, word2vec and BERT) and classification model (i.e., MLP, Logistic Regression, Naive Bayes and SVM). In addition, the proposed approach is compared with a well-known Deep Learning sentiment analysis approach, named sentimentDLmodel, which provides pre-trained sentiment analysis. According to the results, the proposed approach significantly outperforms the benchmarks in terms of classification model accuracy in most cases.
KW - Big data
KW - NLP
KW - Sentiment analysis
KW - Social media datasets
KW - Spark
UR - http://www.scopus.com/inward/record.url?scp=85125875546&partnerID=8YFLogxK
U2 - 10.1109/ICMLA52953.2021.00038
DO - 10.1109/ICMLA52953.2021.00038
M3 - Conference contribution
AN - SCOPUS:85125875546
T3 - Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
SP - 199
EP - 204
BT - Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
A2 - Wani, M. Arif
A2 - Sethi, Ishwar K.
A2 - Shi, Weisong
A2 - Qu, Guangzhi
A2 - Raicu, Daniela Stan
A2 - Jin, Ruoming
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
Y2 - 13 December 2021 through 16 December 2021
ER -