Decision Tree Approaches to Select High Risk Patients for Lung Cancer Screening Based on the UK Primary Care Data

Teena Rai, Yuan Shen, Jaspreet Kaur, Jun He, Mufti Mahmud, David J. Brown, David R. Baldwin, Emma O’Dowd, Richard Hubbard

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Lung cancer has the highest cancer mortality rate in the UK. Most patients are diagnosed at an advanced stage because common symptoms of lung cancer, such as cough, pain, dyspnoea and anorexia, are also present in other diseases. This partly contributes to the low survival rate. It is therefore crucial to screen high-risk patients for lung cancer at an early stage using computed tomography (CT) scans. As shown in a previous study, the estimated survival rate for screened patients identified with stage I lung cancer was 88%, compared with only 5% for those with stage IV lung cancer. This paper aims to build tree-based machine learning models for predicting lung cancer risk by extracting significant factors associated with lung cancer. The study used Clinical Practice Research Datalink (CPRD) data, which are anonymised patient data collected from 945 general practices across the UK. Two tree-based models (decision tree and random forest) are developed and implemented. Their performance is compared with that of a logistic regression model in terms of accuracy, Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity and specificity, and both tree-based models achieve better results. As for interpretability, however, it was found that, unlike coefficients in logistic regression, the default feature importance in random forests and decision trees is non-negative, so it does not indicate whether a factor raises or lowers risk. This makes tree-based models less interpretable than logistic regression.
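The four evaluation measures named in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and no CPRD data is involved. AUROC is computed here from its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks a positive case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and AUROC for predicted risk scores."""
    # The threshold is an assumption for this sketch; the abstract does not state one.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auroc": auroc(y_true, scores),
    }


# Made-up toy labels and risk scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
metrics = evaluate(y_true, scores)
print(metrics)
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas AUROC summarises ranking quality across all thresholds, which is one reason the measures are usually reported together.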

Original language: English
Title of host publication: Artificial Intelligence in Medicine - 21st International Conference on Artificial Intelligence in Medicine, AIME 2023, Proceedings
Editors: Jose M. Juarez, Mar Marcos, Gregor Stiglic, Allan Tucker
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 35-39
Number of pages: 5
ISBN (Print): 9783031343438
Publication status: Published - 2023
Externally published: Yes
Event: 21st International Conference on Artificial Intelligence in Medicine, AIME 2023 - Portoroz, Slovenia
Duration: 12 Jun 2023 to 15 Jun 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13897 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Artificial Intelligence in Medicine, AIME 2023
Country/Territory: Slovenia
City: Portoroz
Period: 12/06/23 to 15/06/23

Keywords

  • CPRD
  • decision tree
  • explainable AI
  • lung cancer early detection
  • lung cancer screening
  • primary care
  • random forest

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
