Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction

Zixue Zhao, Tianxiang Cui, Shusheng Ding, Jiawei Li, Anthony Graham Bellotti

Research output: Journal Publication > Article > peer-review

1 Citation (Scopus)

Abstract

Credit risk prediction relies heavily on historical data provided by financial institutions. The goal is to identify commonalities among defaulting users based on existing information. However, data on defaulters is often limited, so credit datasets are typically skewed: positive samples (defaults) are far fewer than negative samples (non-defaults). This poses a serious challenge known as the class imbalance problem, which can substantially degrade data quality and the effectiveness of predictive models. To address the problem, various resampling techniques have been proposed and studied extensively. Despite this research, there is no consensus on the most effective technique. The choice of resampling technique is closely related to the dataset size and imbalance ratio, and its effectiveness varies across classifiers. Moreover, there is a notable gap in research on suitable techniques for extremely imbalanced datasets. This study therefore compares popular resampling techniques across different datasets and classifiers and proposes a novel hybrid sampling method tailored to extremely imbalanced datasets. Our experimental results demonstrate that the new technique significantly enhances classifier predictive performance, shedding light on effective strategies for managing the class imbalance problem in credit risk prediction.
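To make the terms "imbalance ratio" and "resampling" concrete, the following is a minimal sketch of plain random oversampling on toy data. It is not the hybrid method proposed in the article, which the abstract does not describe; the function names, the toy features, and the 95/5 class split are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made up): 95 non-defaults (label 0) and 5 defaults (label 1).
rows = [[i] for i in range(100)]
labels = [0] * 95 + [1] * 5

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(imbalance_ratio(labels))           # ratio before resampling
print(imbalance_ratio(balanced_labels))  # ratio after resampling
```

In practice, resampling of this kind is applied to the training split only, so that evaluation data keeps the original class distribution.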

Original language: English
Article number: 701
Journal: Mathematics
Volume: 12
Issue number: 5
DOIs
Publication status: Published - Mar 2024

Keywords

  • class imbalance
  • credit risk prediction
  • resampling

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Mathematics
  • Engineering (miscellaneous)
