Logistic Regression Approach to a Joint Classification and Feature Selection in Lung Cancer Screening Using CPRD Data

Yuan Shen, Jaspreet Kaur, Mufti Mahmud, David J. Brown, Jun He, Muhammad Arifur Rahman, David R. Baldwin, Emma O’Dowd, Richard B. Hubbard

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Lung cancer is one of the deadliest cancers in the world. Its mortality rate is high when the cancer is diagnosed late. Early detection is therefore crucial for improving the survival rate, and lung cancer screening is one of the most important intervention tools. However, screening is cost-effective only when we can accurately select the sub-population at the highest risk of lung cancer. It is hypothesised that this selection task can be done cost-effectively using clinical data (e.g. demographic, lifestyle and comorbidity variables) rather than low-dose CT. This work used clinical data extracted from the Clinical Practice Research Datalink (CPRD). The goal is to test whether this approach can achieve comparable or even better selection performance than an alternative approach that uses clinical data from lung cancer screening trials. The latter approach is adopted in [54]. In this paper, we further adapt the logistic regression model for a joint classification and feature selection analysis. The model is implemented in an ‘ensemble learning’ manner to deal with the severe ‘class imbalance’ problem. The observed sensitivity and specificity are slightly better than those reported in [54]. We also identified a comorbidity factor, COPD, and a smoking-related factor, smk-status, as the two most discriminative features.
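
The abstract does not say how the ensemble or the feature selection is constructed, so the following is only an illustrative sketch and not the authors' implementation. It assumes two things that are not stated in the source: that each ensemble member is trained on a balanced subset (all minority-class cases plus an equally sized random sample of the majority class), and that feature selection comes from an L1 penalty that drives uninformative weights to exactly zero. All function names and parameter values here are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.01, lr=0.1, epochs=200):
    # Proximal gradient descent for L1-penalised logistic regression
    # (the L1 penalty is an assumption, not taken from the paper).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n  # the intercept is not penalised
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, gw)]
    return w, b


def balanced_subsets(y, n_models, rng):
    # Yield index lists: every minority case plus an equal-sized
    # random sample of the majority class (assumed under-sampling scheme).
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_models):
        yield minority + rng.sample(majority, len(minority))


def fit_ensemble(X, y, n_models=10, seed=0, **kw):
    # Train one penalised model per balanced subset.
    rng = random.Random(seed)
    return [
        fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], **kw)
        for idx in balanced_subsets(y, n_models, rng)
    ]


def predict(models, x, threshold=0.5):
    # Average the members' predicted probabilities, then threshold.
    p = sum(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for w, b in models)
    return int(p / len(models) >= threshold)


def selection_frequency(models):
    # Fraction of ensemble members that keep a nonzero weight per feature.
    d = len(models[0][0])
    return [sum(1 for w, _ in models if w[j] != 0.0) / len(models) for j in range(d)]


def sensitivity_specificity(y_true, y_pred):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

In a sketch like this, features with a high selection frequency across ensemble members would be the candidates for "most discriminative", and sensitivity and specificity would be computed on held-out data rather than on the balanced training subsets.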

Original language: English
Title of host publication: Proceedings of Trends in Electronics and Health Informatics - TEHI 2022
Editors: Mufti Mahmud, Claudia Mendoza-Barrera, M. Shamim Kaiser, Anirban Bandyopadhyay, Kanad Ray, Eduardo Lugo
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 191-206
Number of pages: 16
ISBN (Print): 9789819919154
Publication status: Published - 2023
Externally published: Yes
Event: 2nd International Conference on Trends in Electronics and Health Informatics, TEHI 2022 - Puebla, Mexico
Duration: 7 Dec 2022 to 9 Dec 2022

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 675 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 2nd International Conference on Trends in Electronics and Health Informatics, TEHI 2022
Country/Territory: Mexico
City: Puebla
Period: 7/12/22 to 9/12/22

Keywords

  • CPRD
  • Cancer screening
  • Classification
  • Cost-effectiveness
  • Early detection
  • Feature selection
  • Imbalanced classification
  • Logistic regression
  • Lung cancer

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
