A Data-driven Affective Text Classification Analysis

Saeid Pourroostaei Ardakani, Can Zhou, Xuting Wu, Yingrui Ma, Jizhou Che

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Affective texts play a key role in sentiment classification, prediction and decision making. They are increasingly used to form and share sentiments in financial, economic and political applications. However, processing time increases exponentially for large affective text datasets. Moreover, casual expressions such as emoji, slang, abbreviations and misspelled words usually complicate data analysis (i.e., text classification). This paper proposes a pipeline model consisting of data pre-processing, feature extraction and classification model training to classify affective text datasets. It offers three contributions, namely emoji recovery, misspelled-word correction and abbreviation translation, that together maximise classification accuracy. A rigorous experimental plan is designed to evaluate the performance of the proposed approach according to three factors: dataset size (i.e., small, medium and large), NLP feature extraction technique (i.e., TF-IDF, word2vec and BERT) and classification model (i.e., MLP, Logistic Regression, Naive Bayes and SVM). In addition, the proposed approach is compared with a well-known Deep Learning sentiment analysis approach, named sentimentDLmodel, which relies on a pre-trained sentiment analysis model. According to the results, the proposed approach significantly outperforms the benchmarks in terms of classification accuracy in most cases.
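
The three pre-processing steps named in the abstract (emoji recovery, abbreviation translation and misspelled-word correction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup tables, the tiny vocabulary, the function names and the use of `difflib` for spelling correction are all assumptions made for illustration, and the abstract does not say which resources or order of steps the paper uses.

```python
import difflib
import re

# Hypothetical lookup tables, for illustration only; the resources actually
# used in the paper are not described in the abstract.
EMOJI_TO_TEXT = {"🙂": "happy", "😢": "sad", ":)": "happy", ":(": "sad"}
ABBREVIATIONS = {"gr8": "great", "thx": "thanks"}
VOCABULARY = ["great", "thanks", "happy", "sad", "movie", "terrible", "the", "was"]


def recover_emoji(text):
    """Replace each known emoji or emoticon with a sentiment-bearing word."""
    for symbol, word in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, f" {word} ")
    return text


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def translate_abbreviation(token):
    """Expand a known abbreviation; return other tokens unchanged."""
    return ABBREVIATIONS.get(token, token)


def correct_spelling(token):
    """Map an unknown token to its closest vocabulary word, if one is close enough."""
    if token in VOCABULARY:
        return token
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.75)
    return matches[0] if matches else token


def preprocess(text):
    """Run emoji recovery, abbreviation translation, then spelling correction."""
    tokens = []
    for token in tokenize(recover_emoji(text)):
        # Abbreviations are expanded first so they are not "corrected" as typos.
        for part in translate_abbreviation(token).split():
            tokens.append(correct_spelling(part))
    return tokens


print(preprocess("The moive was terible :("))
```

The cleaned tokens would then feed a feature extractor such as TF-IDF and a classifier such as Logistic Regression, both of which are among the options the abstract lists.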

Original language: English
Title of host publication: Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
Editors: M. Arif Wani, Ishwar K. Sethi, Weisong Shi, Guangzhi Qu, Daniela Stan Raicu, Ruoming Jin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 199-204
Number of pages: 6
ISBN (Electronic): 9781665443371
DOIs
Publication status: Published - 2021
Event: 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021 - Virtual, Online, United States
Duration: 13 Dec 2021 to 16 Dec 2021

Publication series

Name: Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

Conference

Conference: 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
Country/Territory: United States
City: Virtual, Online
Period: 13/12/21 to 16/12/21

Keywords

  • Big data
  • NLP
  • Sentiment analysis
  • Social media datasets
  • Spark

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Health Informatics
  • Artificial Intelligence
  • Computer Science Applications
