A Relabeling Approach to Handling the Class Imbalance Problem for Logistic Regression

Yazhe Li; Niall Adams; Tony Bellotti

doi:10.1080/10618600.2021.1978470

School of Computer Science

Research output: Journal Publication › Article › peer-review

7 Citations (Scopus)

Abstract

Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less than the majority class. This characteristic is endemic in many domains. Work by Owen has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this article, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An expectation–maximization algorithm is formalized to serve as a tool for efficiently computing this relabeling. Modeling on such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on real datasets. Supplemental materials for the article are available online.

Original language	English
Pages (from-to)	241-253
Number of pages	13
Journal	Journal of Computational and Graphical Statistics
Volume	31
Issue number	1
Early online date	9 Nov 2021
DOIs	https://doi.org/10.1080/10618600.2021.1978470
Publication status	Published Online - 9 Nov 2021

Keywords

EM
High imbalance
Logistic regression
Relabeling

ASJC Scopus subject areas

Statistics and Probability
Discrete Mathematics and Combinatorics
Statistics, Probability and Uncertainty

Cite this

@article{9ed296f0228844e395ecb3b7d5c4c5d7,
  title = "A Relabeling Approach to Handling the Class Imbalance Problem for Logistic Regression",
  abstract = "Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less than the majority class. This characteristic is endemic in many domains. Work by Owen has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this article, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An expectation–maximization algorithm is formalized to serve as a tool for efficiently computing this relabeling. Modeling on such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on real datasets. Supplemental materials for the article are available online.",
  keywords = "EM, High imbalance, Logistic regression, Relabeling",
  author = "Yazhe Li and Niall Adams and Tony Bellotti",
  note = "Publisher Copyright: {\textcopyright} 2021 The Author(s). Published with license by Taylor \& Francis Group, LLC.",
  year = "2021",
  month = nov,
  day = "9",
  doi = "10.1080/10618600.2021.1978470",
  language = "English",
  volume = "31",
  pages = "241--253",
  journal = "Journal of Computational and Graphical Statistics",
  issn = "1061-8600",
  publisher = "Taylor and Francis Ltd.",
  number = "1",
}

TY  - JOUR
T1  - A Relabeling Approach to Handling the Class Imbalance Problem for Logistic Regression
AU  - Li, Yazhe
AU  - Adams, Niall
AU  - Bellotti, Tony
PY  - 2021/11/9
Y1  - 2021/11/9
N2  - Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less than the majority class. This characteristic is endemic in many domains. Work by Owen has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this article, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An expectation–maximization algorithm is formalized to serve as a tool for efficiently computing this relabeling. Modeling on such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on real datasets. Supplemental materials for the article are available online.
AB  - Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less than the majority class. This characteristic is endemic in many domains. Work by Owen has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this article, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An expectation–maximization algorithm is formalized to serve as a tool for efficiently computing this relabeling. Modeling on such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on real datasets. Supplemental materials for the article are available online.
KW  - EM
KW  - High imbalance
KW  - Logistic regression
KW  - Relabeling
UR  - http://www.scopus.com/inward/record.url?scp=85119285282&partnerID=8YFLogxK
U2  - 10.1080/10618600.2021.1978470
DO  - 10.1080/10618600.2021.1978470
M3  - Article
AN  - SCOPUS:85119285282
SN  - 1061-8600
VL  - 31
SP  - 241
EP  - 253
JO  - Journal of Computational and Graphical Statistics
JF  - Journal of Computational and Graphical Statistics
IS  - 1
ER  - 
