TY - GEN
T1 - State-of-the-Art Machine Learning Models for Detecting and Mitigating Disparities in Healthcare
AU - Shen, Yuan
AU - Mahmud, Mufti
AU - Rai, Teena
AU - Brown, David J.
AU - He, Jun
AU - Rahman, Muhammad Arifur
AU - Baldwin, David R.
AU - Kaur, Jaspreet
AU - O’Dowd, Emma
AU - Hubbard, Richard B.
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - Machine learning models have been applied to various healthcare tasks. Such models include both inherently interpretable models and black-box models. In most cases, these models are capable of achieving high accuracy. It is also known that a model should be well calibrated. Recently, the issue of algorithmic bias in clinical predictive models has attracted attention, because such bias can result in disparities in health care that disadvantage some subgroups of the population. The aim is to detect such disparities and then remove them. From this perspective, the predictors used by a model need to be divided into sensitive variables and the rest. Sensitive variables include age and race, among others. Among these disparities, the most comprehensible are so-called data disparities. A target population usually includes a large number of subgroups, many of which can be quite small. When the population data are used to train a predictive model, the characteristics of the resulting outcomes will be largely dominated by a few major subgroups. On the other hand, when models are fitted to individual subgroup data, the data in some small subgroups are expected to be insufficient for proper model training, thus producing disparate predicted outcomes. Most clinical predictive models do not include domain-specific knowledge. Causal inference allows experts’ knowledge to be incorporated into the relations among the predictive variables. The resulting model is referred to as a causal-effect model. This approach can help mitigate the disparate outcomes for small subgroups thanks to the inclusion of domain knowledge. The principled approach is to find a different but related data set. Generally, this can be done within the framework of transfer learning. Apart from re-training approaches, domain adaptation can be used to project a number of source domains jointly onto a target domain. The resulting target domain is expected to have sufficient data even for small subgroups. It has been debated whether or not protected variables/characteristics (such as race and gender) should be used in clinical predictive models.
AB - Machine learning models have been applied to various healthcare tasks. Such models include both inherently interpretable models and black-box models. In most cases, these models are capable of achieving high accuracy. It is also known that a model should be well calibrated. Recently, the issue of algorithmic bias in clinical predictive models has attracted attention, because such bias can result in disparities in health care that disadvantage some subgroups of the population. The aim is to detect such disparities and then remove them. From this perspective, the predictors used by a model need to be divided into sensitive variables and the rest. Sensitive variables include age and race, among others. Among these disparities, the most comprehensible are so-called data disparities. A target population usually includes a large number of subgroups, many of which can be quite small. When the population data are used to train a predictive model, the characteristics of the resulting outcomes will be largely dominated by a few major subgroups. On the other hand, when models are fitted to individual subgroup data, the data in some small subgroups are expected to be insufficient for proper model training, thus producing disparate predicted outcomes. Most clinical predictive models do not include domain-specific knowledge. Causal inference allows experts’ knowledge to be incorporated into the relations among the predictive variables. The resulting model is referred to as a causal-effect model. This approach can help mitigate the disparate outcomes for small subgroups thanks to the inclusion of domain knowledge. The principled approach is to find a different but related data set. Generally, this can be done within the framework of transfer learning. Apart from re-training approaches, domain adaptation can be used to project a number of source domains jointly onto a target domain. The resulting target domain is expected to have sufficient data even for small subgroups. It has been debated whether or not protected variables/characteristics (such as race and gender) should be used in clinical predictive models.
KW - Algorithmic Bias
KW - Causal Structure
KW - Conditional Tree Inference
KW - Deep Transfer Learning
KW - Domain Adaptation
KW - Health Disparities
KW - Hospital Readmission
UR - https://www.scopus.com/pages/publications/105029059786
U2 - 10.1007/978-3-032-13022-8_26
DO - 10.1007/978-3-032-13022-8_26
M3 - Conference contribution
AN - SCOPUS:105029059786
SN - 9783032130242
T3 - Lecture Notes in Computer Science
SP - 375
EP - 385
BT - HCI International 2025 – Late Breaking Papers - 27th International Conference on Human-Computer Interaction, HCII 2025, Proceedings
A2 - Duffy, Vincent G.
A2 - Gao, Qin
A2 - Zhou, Jia
PB - Springer Science and Business Media Deutschland GmbH
T2 - Late breaking papers from the 27th International Conference on Human-Computer Interaction, HCI International 2025
Y2 - 22 June 2025 through 27 June 2025
ER -