Explainable Boosting Machines for Lung Cancer Prediction and Explanation

Yuan Shen, Mufti Mahmud, Teena Rai, Jun He, Muhammad Arifur Rahman, David J. Brown, Jaspreet Kaur, David R. Baldwin, Emma O’Dowd, Richard B. Hubbard

Research output: Chapter in Book/Conference proceeding; Conference contribution; peer-review

Abstract

Clinical predictive models have played an important role in healthcare. An important task in lung cancer care is to identify, from a selected population, the participants in a screening program who are at higher risk of lung cancer. Moreover, Electronic Healthcare Records (EHRs) can be acquired from primary care and have been used to emulate a screening program. One example of such an EHR dataset is the Clinical Practice Research Datalink (CPRD), which covers 4.5% of the UK population. In this paper, we provide a worked example of this task, employing an Explainable Boosting Machine (EBM) as the predictive model and the CPRD dataset as the source of EHRs. The EBM is a prominent example of an inherently interpretable model (IIM). IIMs produce predictions of the target variable and explanations of the model simultaneously. More importantly, EBMs form a family of non-linear IIMs, a generalisation that significantly extends logistic regression. EBMs have been developed at Microsoft Research as an end-to-end system that provides powerful visualisation tools for evaluating both model predictions and explanations. EBM users, however, often want more technical detail about EBMs themselves, so we provide a brief introduction to Generalised Additive Models, Gradient Boosting, Boosted Trees, and Bagging Ensembles. Finally, we present two EBM-based use cases in the healthcare domain, as well as an illustrative example of lung cancer prediction and explanation.
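The abstract describes an EBM as a generalised additive model fitted with gradient boosting, and the keywords mention a greedy, round-robin cycle over features. The Python sketch below illustrates that idea under loudly stated assumptions; it is not the authors' code or Microsoft's implementation. The class `CyclicBoostedGAM`, the helper `quantile_cuts`, the quantile binning, and all parameter values are hypothetical. Each boosting step fits the log-loss residuals y - p on a single feature using one leaf per bin (a stand-in for the shallow trees an EBM grows), and features are visited in turn, so the fitted model keeps the form logit(p) = intercept + f_1(x_1) + ... + f_d(x_d). Bagging and pairwise interaction terms are omitted.

```python
import bisect
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def quantile_cuts(column, n_bins):
    """Hypothetical helper: approximate quantile cut points for one feature."""
    values = sorted(column)
    return [values[len(values) * k // n_bins] for k in range(1, n_bins)]


class CyclicBoostedGAM:
    """Hypothetical toy model, not the InterpretML EBM.

    logit(p) = intercept + sum_j f_j(x_j), where each shape function f_j is a
    lookup table over quantile bins. Training is gradient boosting on log loss
    that updates one feature at a time in a round-robin cycle. Bagging and
    pairwise interaction terms are deliberately omitted.
    """

    def __init__(self, n_bins=4, n_cycles=200, learning_rate=0.1):
        self.n_bins = n_bins
        self.n_cycles = n_cycles
        self.learning_rate = learning_rate

    def _bins(self, row):
        # bisect_right returns a bin index in 0..n_bins-1 for each feature value.
        return [bisect.bisect_right(cuts, x) for cuts, x in zip(self.cuts, row)]

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.cuts = [quantile_cuts([row[j] for row in X], self.n_bins) for j in range(d)]
        binned = [self._bins(row) for row in X]
        base_rate = sum(y) / n  # assumes both classes are present
        self.intercept = math.log(base_rate / (1.0 - base_rate))
        self.shapes = [[0.0] * self.n_bins for _ in range(d)]
        logits = [self.intercept] * n
        for _ in range(self.n_cycles):
            for j in range(d):  # round-robin: each feature gets one boosting step per cycle
                # The negative gradient of log loss with respect to the logit is y - p.
                sums, counts = [0.0] * self.n_bins, [0] * self.n_bins
                for i in range(n):
                    b = binned[i][j]
                    sums[b] += y[i] - sigmoid(logits[i])
                    counts[b] += 1
                # Base learner on feature j alone: one leaf per bin, predicting the mean residual.
                step = [self.learning_rate * s / c if c else 0.0 for s, c in zip(sums, counts)]
                for b in range(self.n_bins):
                    self.shapes[j][b] += step[b]
                for i in range(n):
                    logits[i] += step[binned[i][j]]
        # Centre each shape function over the training data and move its mean into
        # the intercept; this leaves every prediction unchanged.
        for j in range(d):
            mean_j = sum(self.shapes[j][row[j]] for row in binned) / n
            self.shapes[j] = [v - mean_j for v in self.shapes[j]]
            self.intercept += mean_j
        return self

    def explain(self, row):
        """Local explanation: each feature's additive contribution to the logit."""
        return [self.shapes[j][b] for j, b in enumerate(self._bins(row))]

    def predict_proba(self, row):
        return sigmoid(self.intercept + sum(self.explain(row)))


# Usage on synthetic data (hypothetical): only feature 0 determines the label.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if x0 > 0.5 else 0 for x0, _ in X]
model = CyclicBoostedGAM().fit(X, y)
accuracy = sum((model.predict_proba(r) > 0.5) == (t == 1) for r, t in zip(X, y)) / len(X)
print("training accuracy:", accuracy)
print("shape of feature 0 per bin:", [round(v, 2) for v in model.shapes[0]])
print("contributions for [0.9, 0.5]:", [round(v, 2) for v in model.explain([0.9, 0.5])])
```

Because the model is additive on the logit scale, `explain` returns per-feature contributions that, together with the intercept, reproduce the prediction exactly; this is the sense in which an inherently interpretable model predicts and explains at once. For real work, the open-source InterpretML package exposes EBMs as `ExplainableBoostingClassifier` in `interpret.glassbox`.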

Original language: English
Title of host publication: Applied Intelligence and Informatics - 4th International Conference, AII 2024, Revised Selected Papers
Editors: Mufti Mahmud, M. Shamim Kaiser, Joarder Kamruzzaman, Khan Iftekharuddin, Md Atiqur Rahman Ahad, Ning Zhong
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 184-199
Number of pages: 16
ISBN (Print): 9783032046567
Publication status: Published - 2025
Event: 4th International Conference on Applied Intelligence and Informatics, AII 2024 - London, United Kingdom
Duration: 18 Dec 2024 - 20 Dec 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2607 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 4th International Conference on Applied Intelligence and Informatics, AII 2024
Country/Territory: United Kingdom
City: London
Period: 18/12/24 - 20/12/24

Keywords

  • Collinearity
  • Feature Importance Measures
  • Feature Selection
  • Greedy Approach to Boosting
  • Ischemic Heart Disease
  • Lung Cancer
  • Missing Values
  • Rectal Cancer
  • Round-Robin Cycle

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics

