Logistic Regression Approach to a Joint Classification and Feature Selection in Lung Cancer Screening Using CPRD Data

Yuan Shen; Jaspreet Kaur; Mufti Mahmud; David J. Brown; Jun He; Muhammad Arifur Rahman; David R. Baldwin; Emma O’Dowd; Richard B. Hubbard

doi:10.1007/978-981-99-1916-1_15

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Lung cancer is one of the deadliest cancers in the world, and its mortality rate is high when it is diagnosed late. Early detection is therefore crucial to improving the survival rate, and lung cancer screening is one of the most important intervention tools. However, screening is cost-effective only when we can accurately select the sub-population at highest risk of lung cancer. It is hypothesised that this selection task can be done cost-effectively using clinical data (e.g. demographic, lifestyle and comorbidity variables) rather than low-dose CT. This work uses clinical data extracted from the Clinical Practice Research Datalink (CPRD). The goal is to test whether this approach can achieve comparable or even better selection performance than the alternative approach adopted in [54], which uses clinical data from lung cancer screening trials. In this paper, we further adapt the logistic regression model for a joint classification and feature selection analysis. The model is implemented in an ‘ensemble learning’ manner to deal with the severe ‘class imbalance’ problem. The resulting sensitivity and specificity are slightly better than those reported in [54]. We also identified a comorbidity factor, COPD, and a smoking-related factor, smk-status, as the two most discriminative features.
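The abstract does not say how the ensemble is built or how features are selected, so the following is only a minimal sketch of one common way to combine the two ideas, not the authors' implementation. It assumes (a) each ensemble member is trained on all positive cases plus an equal-sized random draw of negative cases, and (b) feature selection comes from an L1 penalty that shrinks weak weights to zero. All function names, hyperparameters and the toy data are hypothetical; the data are synthetic and are not CPRD records.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Predicted probability of the positive class for one record."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=300):
    """L1-penalised logistic regression via proximal gradient descent (assumed method)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            gb += err / n
            for j in range(d):
                gw[j] += err * xi[j] / n
        b -= lr * gb
        for j in range(d):
            v = w[j] - lr * gw[j]
            # Soft-thresholding: weights of weak features shrink to exactly zero.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b


def fit_balanced_ensemble(X, y, n_models=5, seed=0):
    """Train each member on all positives plus an equal-sized random draw of negatives."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    models = []
    for _ in range(n_models):
        idx = pos + rng.sample(neg, len(pos))
        models.append(fit_l1_logreg([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_proba(models, x):
    """Average the member probabilities."""
    return sum(score(w, b, x) for w, b in models) / len(models)


def feature_importance(models):
    """Mean absolute weight per feature across the ensemble."""
    d = len(models[0][0])
    return [sum(abs(w[j]) for w, _ in models) / len(models) for j in range(d)]


def make_toy_data(n_pos=70, n_neg=1400):
    """Synthetic records (1:20 imbalance): columns 0 and 1 carry signal, column 2 does not."""
    X, y = [], []
    for i in range(n_pos):
        X.append([int(i % 5 != 0), int(i % 7 != 0), i % 2])
        y.append(1)
    for i in range(n_neg):
        X.append([int(i % 5 == 0), int(i % 7 == 0), i % 2])
        y.append(0)
    return X, y


X, y = make_toy_data()
models = fit_balanced_ensemble(X, y)
importance = feature_importance(models)
high_risk = predict_proba(models, [1, 1, 0])
low_risk = predict_proba(models, [0, 0, 0])
print(importance, high_risk, low_risk)
```

In this toy setting the two informative columns should receive much larger mean absolute weights than the uninformative one, which is the sense in which a single penalised model family can serve both classification and feature ranking. Sensitivity and specificity would then follow from thresholding the averaged probability on held-out data.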

Original language	English
Title of host publication	Proceedings of Trends in Electronics and Health Informatics - TEHI 2022
Editors	Mufti Mahmud, Claudia Mendoza-Barrera, M. Shamim Kaiser, Anirban Bandyopadhyay, Kanad Ray, Eduardo Lugo
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	191-206
Number of pages	16
ISBN (Print)	9789819919154
DOIs	https://doi.org/10.1007/978-981-99-1916-1_15
Publication status	Published - 2023
Externally published	Yes
Event	2nd International Conference on Trends in Electronics and Health Informatics, TEHI 2022, Puebla, Mexico, 7 Dec 2022 → 9 Dec 2022

Publication series

Name	Lecture Notes in Networks and Systems
Volume	675 LNNS
ISSN (Print)	2367-3370
ISSN (Electronic)	2367-3389

Conference

Conference	2nd International Conference on Trends in Electronics and Health Informatics, TEHI 2022
Country/Territory	Mexico
City	Puebla
Period	7/12/22 → 9/12/22

Keywords

CPRD
Cancer screening
Classification
Cost-effectiveness
Early detection
Feature selection
Imbalanced classification
Logistic regression
Lung cancer

ASJC Scopus subject areas

Control and Systems Engineering
Signal Processing
Computer Networks and Communications

Cite this

Shen, Y., Kaur, J., Mahmud, M., Brown, D. J., He, J., Rahman, M. A., Baldwin, D. R., O’Dowd, E., & Hubbard, R. B. (2023). Logistic Regression Approach to a Joint Classification and Feature Selection in Lung Cancer Screening Using CPRD Data. In M. Mahmud, C. Mendoza-Barrera, M. S. Kaiser, A. Bandyopadhyay, K. Ray, & E. Lugo (Eds.), Proceedings of Trends in Electronics and Health Informatics - TEHI 2022 (pp. 191-206). (Lecture Notes in Networks and Systems; Vol. 675 LNNS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-1916-1_15

@inproceedings{f5ead27240cc4900bf5991f2454ce3a1,
  title = "Logistic Regression Approach to a Joint Classification and Feature Selection in Lung Cancer Screening Using CPRD Data",
  abstract = "Lung cancer is one of the most deadly cancers in the world. Its mortality rate is high when the cancer is diagnosed late. Therefore, early detection is a crucial factor for an increase in survival rate, and lung cancer screening is one of the most important intervention tools. However, the screening would be cost-effective only when we can accurately select a sub-population which is at the most risk of lung cancer. It is hypothesised that this selection task can be done cost-effectively when we use clinical data (e.g. demographic, lifestyle and comorbidity variables) rather than low-dose CT. This work used the clinical data extracted from Clinical Practice Research Datalink (CPRD). The goal is to test whether this approach can achieve comparable or even better selection performance when compared to an alternative approach using clinical data from lung cancer screening trials. The latter approach is adopted in [54]. In this paper, we further adapt the logistic regression model for a joint classification and feature selection analysis. The model is implemented in an {\textquoteleft}ensemble learning{\textquoteright} manner to deal with severe {\textquoteleft}class imbalance{\textquoteright} problems. It is observed that the sensitivity and specificity results are slightly better than those reported in [54]. Also, we identified a comorbidity factor COPD and a smoking-related factor smk-status as the two most discriminative features.",
  keywords = "CPRD, Cancer screening, Classification, Cost-effectiveness, Early detection, Feature selection, Imbalanced classification, Logistic regression, Lung cancer",
  author = "Yuan Shen and Jaspreet Kaur and Mufti Mahmud and Brown, {David J.} and Jun He and Rahman, {Muhammad Arifur} and Baldwin, {David R.} and Emma O{\textquoteright}Dowd and Hubbard, {Richard B.}",
  note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.; 2nd International Conference on Trends in Electronics and Health Informatics, TEHI 2022 ; Conference date: 07-12-2022 Through 09-12-2022",
  year = "2023",
  doi = "10.1007/978-981-99-1916-1_15",
  language = "English",
  isbn = "9789819919154",
  series = "Lecture Notes in Networks and Systems",
  publisher = "Springer Science and Business Media Deutschland GmbH",
  pages = "191--206",
  editor = "Mufti Mahmud and Claudia Mendoza-Barrera and Kaiser, {M. Shamim} and Anirban Bandyopadhyay and Kanad Ray and Eduardo Lugo",
  booktitle = "Proceedings of Trends in Electronics and Health Informatics - TEHI 2022",
  address = "Germany",
}

TY - GEN
T1 - Logistic Regression Approach to a Joint Classification and Feature Selection in Lung Cancer Screening Using CPRD Data
AU - Shen, Yuan
AU - Kaur, Jaspreet
AU - Mahmud, Mufti
AU - Brown, David J.
AU - He, Jun
AU - Rahman, Muhammad Arifur
AU - Baldwin, David R.
AU - O’Dowd, Emma
AU - Hubbard, Richard B.
PY - 2023
Y1 - 2023
AB - Lung cancer is one of the most deadly cancers in the world. Its mortality rate is high when the cancer is diagnosed late. Therefore, early detection is a crucial factor for an increase in survival rate, and lung cancer screening is one of the most important intervention tools. However, the screening would be cost-effective only when we can accurately select a sub-population which is at the most risk of lung cancer. It is hypothesised that this selection task can be done cost-effectively when we use clinical data (e.g. demographic, lifestyle and comorbidity variables) rather than low-dose CT. This work used the clinical data extracted from Clinical Practice Research Datalink (CPRD). The goal is to test whether this approach can achieve comparable or even better selection performance when compared to an alternative approach using clinical data from lung cancer screening trials. The latter approach is adopted in [54]. In this paper, we further adapt the logistic regression model for a joint classification and feature selection analysis. The model is implemented in an ‘ensemble learning’ manner to deal with severe ‘class imbalance’ problems. It is observed that the sensitivity and specificity results are slightly better than those reported in [54]. Also, we identified a comorbidity factor COPD and a smoking-related factor smk-status as the two most discriminative features.
KW - CPRD
KW - Cancer screening
KW - Classification
KW - Cost-effectiveness
KW - Early detection
KW - Feature selection
KW - Imbalanced classification
KW - Logistic regression
KW - Lung cancer
UR - http://www.scopus.com/inward/record.url?scp=85164968575&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-1916-1_15
DO - 10.1007/978-981-99-1916-1_15
M3 - Conference contribution
AN - SCOPUS:85164968575
SN - 9789819919154
T3 - Lecture Notes in Networks and Systems
SP - 191
EP - 206
BT - Proceedings of Trends in Electronics and Health Informatics - TEHI 2022
A2 - Mahmud, Mufti
A2 - Mendoza-Barrera, Claudia
A2 - Kaiser, M. Shamim
A2 - Bandyopadhyay, Anirban
A2 - Ray, Kanad
A2 - Lugo, Eduardo
PB - Springer Science and Business Media Deutschland GmbH
T2 - 2nd International Conference on Trends in Electronics and Health Informatics, TEHI 2022
Y2 - 7 December 2022 through 9 December 2022
ER -
