TY - GEN
T1 - Explainable Boosting Machines for Lung Cancer Prediction and Explanation
AU - Shen, Yuan
AU - Mahmud, Mufti
AU - Rai, Teena
AU - He, Jun
AU - Arifur Rahman, Muhammad
AU - Brown, David J.
AU - Kaur, Jaspreet
AU - Baldwin, David R.
AU - O’Dowd, Emma
AU - Hubbard, Richard B.
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Clinical predictive models have played an important role in healthcare. An important task in lung cancer care is to identify, from a selected population, those participants in a screening program who have a higher risk of lung cancer. Electronic Healthcare Records (EHRs) data can be acquired from primary care and have been used to emulate a screening program. One example of such an EHR dataset is the Clinical Practice Research Datalink (CPRD), which covers 4.5% of the UK population. In this paper, we provide a worked example of this task, employing the Explainable Boosting Machine (EBM) as the predictive model and the CPRD dataset as the EHR data. The EBM is a prominent example of an inherently interpretable model (IIM). IIMs produce predictions of target variables and model explanations simultaneously. More importantly, EBMs represent a family of non-linear IIMs, a generalisation that significantly extends logistic regression. EBMs have been developed as an end-to-end system at Microsoft Research. The system provides powerful visualisation tools for evaluating both model predictions and explanations. However, EBM users often want to know more of the technical details of the EBM itself. We therefore provide a brief introduction to Generalised Additive Models, Gradient Boosting, Boosted Trees, and Bagging Ensembles. Finally, we present two EBM-based use cases in the healthcare domain as well as an illustrative example of lung cancer prediction and explanation.
AB - Clinical predictive models have played an important role in healthcare. An important task in lung cancer care is to identify, from a selected population, those participants in a screening program who have a higher risk of lung cancer. Electronic Healthcare Records (EHRs) data can be acquired from primary care and have been used to emulate a screening program. One example of such an EHR dataset is the Clinical Practice Research Datalink (CPRD), which covers 4.5% of the UK population. In this paper, we provide a worked example of this task, employing the Explainable Boosting Machine (EBM) as the predictive model and the CPRD dataset as the EHR data. The EBM is a prominent example of an inherently interpretable model (IIM). IIMs produce predictions of target variables and model explanations simultaneously. More importantly, EBMs represent a family of non-linear IIMs, a generalisation that significantly extends logistic regression. EBMs have been developed as an end-to-end system at Microsoft Research. The system provides powerful visualisation tools for evaluating both model predictions and explanations. However, EBM users often want to know more of the technical details of the EBM itself. We therefore provide a brief introduction to Generalised Additive Models, Gradient Boosting, Boosted Trees, and Bagging Ensembles. Finally, we present two EBM-based use cases in the healthcare domain as well as an illustrative example of lung cancer prediction and explanation.
KW - Collinearity
KW - Feature Importance Measures
KW - Feature Selection
KW - Greedy Approach to Boosting
KW - Ischemic Heart Disease
KW - Lung Cancer
KW - Missing Values
KW - Rectal Cancer
KW - Round-Robin Cycle
UR - https://www.scopus.com/pages/publications/105020244855
U2 - 10.1007/978-3-032-04657-4_13
DO - 10.1007/978-3-032-04657-4_13
M3 - Conference contribution
AN - SCOPUS:105020244855
SN - 9783032046567
T3 - Communications in Computer and Information Science
SP - 184
EP - 199
BT - Applied Intelligence and Informatics - 4th International Conference, AII 2024, Revised Selected Papers
A2 - Mahmud, Mufti
A2 - Kaiser, M. Shamim
A2 - Kamruzzaman, Joarder
A2 - Iftekharuddin, Khan
A2 - Ahad, Md Atiqur Rahman
A2 - Zhong, Ning
PB - Springer Science and Business Media Deutschland GmbH
T2 - 4th International Conference on Applied Intelligence and Informatics, AII 2024
Y2 - 18 December 2024 through 20 December 2024
ER -