Decision Tree Approaches to Select High Risk Patients for Lung Cancer Screening Based on the UK Primary Care Data

Teena Rai; Yuan Shen; Jaspreet Kaur; Jun He; Mufti Mahmud; David J. Brown; David R. Baldwin; Emma O’Dowd; Richard Hubbard

doi:10.1007/978-3-031-34344-5_4

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

9 Citations (Scopus)

Abstract

Lung cancer has the highest cancer mortality rate in the UK. Most patients are diagnosed at an advanced stage because common symptoms of lung cancer, such as cough, pain, dyspnoea and anorexia, are also present in other diseases. This contributes to the low survival rate. It is therefore crucial to screen high-risk patients for lung cancer at an early stage using computed tomography (CT) scans. A previous study showed that patients who were screened and identified with stage I lung cancer had an estimated survival rate of 88%, compared with only 5% for patients with stage IV lung cancer. This paper aims to build tree-based machine learning models that predict lung cancer risk by extracting significant factors associated with lung cancer. The study uses Clinical Practice Research Datalink (CPRD) data, which are anonymised patient data collected from 945 general practices across the UK. Two tree-based models (a decision tree and a random forest) are developed and implemented. Their performance is compared with that of a logistic regression model in terms of accuracy, Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity and specificity, and both tree-based models achieve better results. In terms of interpretability, however, the default feature importance in random forests and decision trees is non-negative, unlike the coefficients in logistic regression, so it does not indicate the direction of a factor's association with lung cancer. This makes tree-based models less interpretable than logistic regression.
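
The evaluation metrics and the interpretability point in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use CPRD data: the eight toy patients, the scores, the 0.5 threshold and all function names are assumptions made up for illustration. It computes accuracy, sensitivity, specificity and AUROC from scratch, then contrasts a Gini impurity decrease (the quantity behind the usual default tree feature importance, which is never negative) with a log odds ratio (which equals the coefficient of a logistic regression fitted on a single binary predictor, and carries a sign).

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def auroc(y_true, scores):
    """Probability that a random case scores above a random non-case (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def gini_decrease(feature, labels):
    """Impurity decrease from splitting on a binary 0/1 feature."""
    groups = [[y for x, y in zip(feature, labels) if x == v] for v in (0, 1)]
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups)
    return gini(labels) - weighted

def log_odds_ratio(feature, labels):
    """Log odds ratio of the outcome for feature == 1 versus feature == 0."""
    cells = list(zip(feature, labels))
    a = sum(1 for x, y in cells if x == 1 and y == 1)
    b = sum(1 for x, y in cells if x == 1 and y == 0)
    c = sum(1 for x, y in cells if x == 0 and y == 1)
    d = sum(1 for x, y in cells if x == 0 and y == 0)
    return math.log((a * d) / (b * c))

# Toy data (invented): 1 = lung cancer, 0 = no lung cancer.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical model risk scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 decision threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)   # share of true cases that are flagged
specificity = tn / (tn + fp)   # share of non-cases that are not flagged
auc = auroc(y_true, scores)

# Two hypothetical binary factors: one raises risk, the other is its mirror image.
risk_factor = [1, 1, 1, 0, 1, 0, 0, 0]
protective = [1 - x for x in risk_factor]

imp_risk = gini_decrease(risk_factor, y_true)
imp_prot = gini_decrease(protective, y_true)
lor_risk = log_odds_ratio(risk_factor, y_true)
lor_prot = log_odds_ratio(protective, y_true)

print(accuracy, sensitivity, specificity, auc)
print(imp_risk, imp_prot)   # same non-negative value for both factors
print(lor_risk, lor_prot)   # opposite signs
```

In this toy case the two factors receive the same importance from the impurity-based measure, while the log odds ratio separates the risk-raising factor from the protective one by sign, which is the interpretability gap the abstract describes.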

Original language	English
Title of host publication	Artificial Intelligence in Medicine - 21st International Conference on Artificial Intelligence in Medicine, AIME 2023, Proceedings
Editors	Jose M. Juarez, Mar Marcos, Gregor Stiglic, Allan Tucker
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	35-39
Number of pages	5
ISBN (Print)	9783031343438
DOIs	https://doi.org/10.1007/978-3-031-34344-5_4
Publication status	Published - 2023
Externally published	Yes
Event	21st International Conference on Artificial Intelligence in Medicine, AIME 2023, Portoroz, Slovenia. Duration: 12 Jun 2023 → 15 Jun 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13897 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	21st International Conference on Artificial Intelligence in Medicine, AIME 2023
Country/Territory	Slovenia
City	Portoroz
Period	12/06/23 → 15/06/23

Keywords

CPRD
decision tree
explainable AI
lung cancer early detection
lung cancer screening
primary care
random forest

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Rai, T., Shen, Y., Kaur, J., He, J., Mahmud, M., Brown, D. J., Baldwin, D. R., O’Dowd, E., & Hubbard, R. (2023). Decision Tree Approaches to Select High Risk Patients for Lung Cancer Screening Based on the UK Primary Care Data. In J. M. Juarez, M. Marcos, G. Stiglic, & A. Tucker (Eds.), Artificial Intelligence in Medicine - 21st International Conference on Artificial Intelligence in Medicine, AIME 2023, Proceedings (pp. 35-39). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13897 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-34344-5_4

@inproceedings{0335b21a664542e0bd1567629205eb60,

title = "Decision Tree Approaches to Select High Risk Patients for Lung Cancer Screening Based on the UK Primary Care Data",

keywords = "CPRD, decision tree, explainable AI, lung cancer early detection, lung cancer screening, primary care, random forest",

author = "Teena Rai and Yuan Shen and Jaspreet Kaur and Jun He and Mufti Mahmud and Brown, \{David J.\} and Baldwin, \{David R.\} and Emma O{\textquoteright}Dowd and Richard Hubbard",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 21st International Conference on Artificial Intelligence in Medicine, AIME 2023 ; Conference date: 12-06-2023 Through 15-06-2023",

year = "2023",

doi = "10.1007/978-3-031-34344-5\_4",

language = "English",

isbn = "9783031343438",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "35--39",

editor = "Juarez, \{Jose M.\} and Mar Marcos and Gregor Stiglic and Allan Tucker",

booktitle = "Artificial Intelligence in Medicine - 21st International Conference on Artificial Intelligence in Medicine, AIME 2023, Proceedings",

address = "Germany",

}
