Understanding Feature Importance of Prediction Models Based on Lung Cancer Primary Care Data

Teena Rai, Yuan Shen, Jun He, Mufti Mahmud, David J. Brown, Jaspreet Kaur, Emma O'Dowd, David R. Baldwin, Richard Hubbard

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Machine learning (ML) models in healthcare are increasingly common, but their lack of interpretability makes them unsuitable for use in clinical practice. In the medical field, it is vital to clarify to clinicians and patients the rationale behind a model's high-probability prediction for a specific disease in an individual patient. This transparency fosters trust, facilitates informed decision-making, and empowers both clinicians and patients to understand the underlying factors driving the model's output. This paper aims to incorporate explainability into ML models such as Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and Multilayer Perceptron (MLP) for use with Clinical Practice Research Datalink (CPRD) data, and to interpret them in terms of feature importance to identify the most important features for distinguishing between lung cancer and non-lung cancer cases. The SHapley Additive exPlanations (SHAP) method has been used in this work to interpret the models. We use SHAP to gain insights into explaining individual predictions as well as interpreting the models globally. The feature importance from SHAP is compared with the default feature importance of the models to identify any discrepancies between the results. Based on the experimental findings, the default feature importance from the tree-based models is consistent with SHAP: the features 'age' and 'smoking status' serve as the top features for predicting lung cancer among patients. Additionally, this work shows that feature importance for a single patient may vary depending on the employed model, leading to differing predictions. Finally, the work concludes that individual-level explanation of feature importance is crucial in mission-critical applications like healthcare to better understand personal health and lifestyle factors in the early prediction of diseases that may lead to terminal illness.
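As a minimal illustration of the attribution idea behind SHAP (not the paper's pipeline, which uses CPRD data and the SHAP library), the sketch below computes exact Shapley values by subset enumeration for a small toy model. The feature names, weights, and baseline values are hypothetical stand-ins; for a linear model, each feature's Shapley value reduces to its weight times the feature's deviation from the baseline.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley attributions for the prediction f(x).

    f          : model taking a full feature vector (list of floats)
    x          : instance to explain
    background : reference values substituted for "absent" features
                 (here a single baseline vector, i.e. mean imputation)
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their value from x; the rest are
        # replaced by the background (expected) value.
        z = [x[i] if i in subset else background[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for s in combinations(others, r):
                # Standard Shapley weight |S|!(n-|S|-1)!/n!
                w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += w * (value(set(s) | {i}) - value(set(s)))
    return phi

# Toy linear "risk score": weights are hypothetical effects of
# age, smoking status, and two filler features.
weights = [0.6, 0.3, 0.05, 0.05]
model = lambda z: sum(w * v for w, v in zip(weights, z))
baseline = [50.0, 0.0, 1.0, 2.0]   # e.g. cohort means
patient  = [70.0, 1.0, 1.0, 2.0]

phi = shapley_values(model, patient, baseline)
# For a linear model, phi[i] == weights[i] * (patient[i] - baseline[i]),
# so 'age' (index 0) contributes 12.0 and 'smoking status' 0.3 here,
# and the attributions sum to f(patient) - f(baseline).
```

Exact enumeration is exponential in the number of features, which is why practical tools such as the SHAP library rely on model-specific approximations (e.g. TreeExplainer for RF and XGBoost).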

Original language: English
Title of host publication: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359312
DOIs
Publication status: Published - 2024
Externally published: Yes
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: 30 Jun 2024 - 5 Jul 2024

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 30/06/24 - 5/07/24

Keywords

  • CPRD
  • SHAP
  • feature importance
  • interpretability
  • lung cancer
  • machine learning

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
