A Data-driven Affective Text Classification Analysis

Saeid Pourroostaei Ardakani; Can Zhou; Xuting Wu; Yingrui Ma; Jizhou Che

doi:10.1109/ICMLA52953.2021.00038

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

Affective texts play a key role in sentiment classification/prediction and decision making. They are increasingly used to form and/or share sentiments in financial, economic and/or political applications. However, processing time increases exponentially for large affective textual datasets. Moreover, casual expressions such as emoji, slang, abbreviations and misspelled words usually complicate data analysis (i.e., text classification). This paper proposes a pipeline model consisting of data pre-processing, feature extraction and classification model training to classify affective text datasets. It offers three contributions, namely emoji recovery, misspelled-word correction and abbreviation translation, that result in maximised classification accuracy. A rigorous experimental plan is designed to evaluate the performance of the proposed approach according to three factors: dataset size (i.e., small, medium and large), NLP feature extraction technique (i.e., TF-IDF, word2vec and BERT) and classification model (i.e., MLP, Logistic Regression, Naive Bayes and SVM). In addition, the proposed approach is compared with a well-known Deep Learning sentiment analysis approach, named sentimentDLmodel, which addresses a pre-trained sentiment analysis. According to the results, the proposed approach significantly outperforms the benchmarks in terms of classification model accuracy in most cases.
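The three pre-processing steps named in the abstract (emoji recovery, abbreviation translation and misspelled-word correction) can be pictured with a minimal sketch. This is not the authors' implementation: the lookup tables `EMOJI_TO_TEXT`, `ABBREVIATIONS` and `VOCABULARY`, the function names, and the use of Python's `difflib` for spelling correction are all assumptions made purely for illustration, and the feature extraction and classifier stages are left out.

```python
import difflib
import re

# Hypothetical lookup tables for illustration only; the abstract does not
# say which resources the paper uses.
EMOJI_TO_TEXT = {"🙂": "happy", "😢": "sad", ":)": "happy", ":(": "sad"}
ABBREVIATIONS = {"gr8": "great", "idk": "i do not know", "u": "you"}
VOCABULARY = ["great", "happy", "sad", "movie", "terrible", "love",
              "this", "you", "i", "do", "not", "know", "was"]


def recover_emoji(text):
    """Replace each known emoji or emoticon with a descriptive word."""
    for symbol, word in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, f" {word} ")
    return text


def translate_abbreviations(tokens):
    """Expand known abbreviations; an expansion may yield several tokens."""
    expanded = []
    for token in tokens:
        expanded.extend(ABBREVIATIONS.get(token, token).split())
    return expanded


def correct_misspellings(tokens):
    """Map out-of-vocabulary tokens to the closest vocabulary word, if any."""
    corrected = []
    for token in tokens:
        if token in VOCABULARY:
            corrected.append(token)
            continue
        match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.75)
        corrected.append(match[0] if match else token)
    return corrected


def preprocess(text):
    """Run the three assumed cleaning steps and return a list of tokens."""
    text = recover_emoji(text.lower())
    tokens = re.findall(r"[a-z0-9']+", text)
    tokens = translate_abbreviations(tokens)
    return correct_misspellings(tokens)


if __name__ == "__main__":
    print(preprocess("This movei was gr8 🙂"))
```

In a full pipeline of the kind the abstract describes, the resulting tokens would then be passed to a feature extractor (for example TF-IDF) and a classifier; those stages are omitted here to keep the sketch self-contained.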

Original language	English
Title of host publication	Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
Editors	M. Arif Wani, Ishwar K. Sethi, Weisong Shi, Guangzhi Qu, Daniela Stan Raicu, Ruoming Jin
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	199-204
Number of pages	6
ISBN (Electronic)	9781665443371
DOIs	https://doi.org/10.1109/ICMLA52953.2021.00038
Publication status	Published - 2021
Event	20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021 - Virtual, Online, United States
Duration	13 Dec 2021 → 16 Dec 2021

Publication series

Name	Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

Conference

Conference	20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
Country/Territory	United States
City	Virtual, Online
Period	13/12/21 → 16/12/21

Keywords

Big data
NLP
Sentiment analysis
Social media datasets
Spark

ASJC Scopus subject areas

Safety, Risk, Reliability and Quality
Health Informatics
Artificial Intelligence
Computer Science Applications

Access to Document

10.1109/ICMLA52953.2021.00038

Cite this

Ardakani, S. P., Zhou, C., Wu, X., Ma, Y., & Che, J. (2021). A Data-driven Affective Text Classification Analysis. In M. A. Wani, I. K. Sethi, W. Shi, G. Qu, D. S. Raicu, & R. Jin (Eds.), Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021 (pp. 199-204). (Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICMLA52953.2021.00038

Ardakani, Saeid Pourroostaei ; Zhou, Can ; Wu, Xuting et al. / A Data-driven Affective Text Classification Analysis. Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021. editor / M. Arif Wani ; Ishwar K. Sethi ; Weisong Shi ; Guangzhi Qu ; Daniela Stan Raicu ; Ruoming Jin. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 199-204 (Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021).

@inproceedings{74ef07aa494648b0b8a88b69903f9450,

title = "A Data-driven Affective Text Classification Analysis",

abstract = "Affective texts play a key role in sentiment classification/prediction and decision making. They are being increasingly used to form and/or share sentiments in financial, economic and/or political applications. However, the processing time is exponentially increased for large affective textual datasets. Moreover, casual expressions such as emoji, slang, abbreviation and misspelling words usually make data analysis (i.e., text classification) complicated. This paper proposes a pipeline model consisting of data pre-processing, feature extraction and classification model training to classify affective text datasets. It offers three contributions including Emoji recovery, misspelling word correction and abbreviation translation that results in maximised classification accuracy. A rigorous experimental plan is designed to evaluate the performance of the proposed approach according to three factors including dataset size (i.e., small, medium and large), NLP feature extraction technique (i.e., TF-IDF, word2vec and BERT) and classification model (i.e., MLP, Logistic Regression, Naive Bayes and SVM). In addition, the proposed approach is compared with a well-known Deep Learning sentiment analysis approach, named sentimentDLmodel, which addresses a pre-trained sentiment analysis. According to the results, the proposed approach significantly outperforms benchmarks in terms of classification model accuracy for most cases.",

keywords = "Big data, NLP, Sentiment analysis, Social media datasets, Spark",

author = "Ardakani, {Saeid Pourroostaei} and Can Zhou and Xuting Wu and Yingrui Ma and Jizhou Che",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021 ; Conference date: 13-12-2021 Through 16-12-2021",

year = "2021",

doi = "10.1109/ICMLA52953.2021.00038",

language = "English",

series = "Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "199--204",

editor = "Wani, {M. Arif} and Sethi, {Ishwar K.} and Weisong Shi and Guangzhi Qu and Raicu, {Daniela Stan} and Ruoming Jin",

booktitle = "Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021",

address = "United States",

}

Ardakani, SP, Zhou, C, Wu, X, Ma, Y & Che, J 2021, A Data-driven Affective Text Classification Analysis. in MA Wani, IK Sethi, W Shi, G Qu, DS Raicu & R Jin (eds), Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021. Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021, Institute of Electrical and Electronics Engineers Inc., pp. 199-204, 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021, Virtual, Online, United States, 13/12/21. https://doi.org/10.1109/ICMLA52953.2021.00038

A Data-driven Affective Text Classification Analysis. / Ardakani, Saeid Pourroostaei; Zhou, Can; Wu, Xuting et al.
Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021. ed. / M. Arif Wani; Ishwar K. Sethi; Weisong Shi; Guangzhi Qu; Daniela Stan Raicu; Ruoming Jin. Institute of Electrical and Electronics Engineers Inc., 2021. p. 199-204 (Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021).

TY - GEN

T1 - A Data-driven Affective Text Classification Analysis

AU - Ardakani, Saeid Pourroostaei

AU - Zhou, Can

AU - Wu, Xuting

AU - Ma, Yingrui

AU - Che, Jizhou

PY - 2021

Y1 - 2021

AB - Affective texts play a key role in sentiment classification/prediction and decision making. They are being increasingly used to form and/or share sentiments in financial, economic and/or political applications. However, the processing time is exponentially increased for large affective textual datasets. Moreover, casual expressions such as emoji, slang, abbreviation and misspelling words usually make data analysis (i.e., text classification) complicated. This paper proposes a pipeline model consisting of data pre-processing, feature extraction and classification model training to classify affective text datasets. It offers three contributions including Emoji recovery, misspelling word correction and abbreviation translation that results in maximised classification accuracy. A rigorous experimental plan is designed to evaluate the performance of the proposed approach according to three factors including dataset size (i.e., small, medium and large), NLP feature extraction technique (i.e., TF-IDF, word2vec and BERT) and classification model (i.e., MLP, Logistic Regression, Naive Bayes and SVM). In addition, the proposed approach is compared with a well-known Deep Learning sentiment analysis approach, named sentimentDLmodel, which addresses a pre-trained sentiment analysis. According to the results, the proposed approach significantly outperforms benchmarks in terms of classification model accuracy for most cases.

KW - Big data

KW - NLP

KW - Sentiment analysis

KW - Social media datasets

KW - Spark

UR - http://www.scopus.com/inward/record.url?scp=85125875546&partnerID=8YFLogxK

U2 - 10.1109/ICMLA52953.2021.00038

DO - 10.1109/ICMLA52953.2021.00038

M3 - Conference contribution

AN - SCOPUS:85125875546

T3 - Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

SP - 199

EP - 204

BT - Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

A2 - Wani, M. Arif

A2 - Sethi, Ishwar K.

A2 - Shi, Weisong

A2 - Qu, Guangzhi

A2 - Raicu, Daniela Stan

A2 - Jin, Ruoming

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

Y2 - 13 December 2021 through 16 December 2021

ER -

Ardakani SP, Zhou C, Wu X, Ma Y, Che J. A Data-driven Affective Text Classification Analysis. In Wani MA, Sethi IK, Shi W, Qu G, Raicu DS, Jin R, editors, Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021. Institute of Electrical and Electronics Engineers Inc. 2021. p. 199-204. (Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021). doi: 10.1109/ICMLA52953.2021.00038
