TY - GEN
T1 - ViTMed: Vision Transformer for Medical Image Analysis
T2 - 11th International Conference on Information and Communication Technology, ICoICT 2023
AU - Lim, Yu Jie
AU - Lim, Kian Ming
AU - Chang, Roy Kwang Yang
AU - Lee, Chin Poo
AU - Lim, Jit Yan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The COVID-19 global health crisis has presented daunting challenges to medical professionals, making accurate and efficient diagnoses more important than ever. In view of this, this paper proposes a Vision Transformer model, ViTMed, with an attention mechanism to classify CT scan images for more effective diagnosis of COVID-19. Each input CT scan image is represented as a sequence of tokens, and a transformer captures global and local dependencies between features through a self-attention mechanism. The core element of ViTMed is the transformer encoder with a multi-headed attention (MHA) mechanism and a feed-forward network, which enables the model to learn hierarchical representations of the image and make more informed predictions. The proposed ViTMed achieves promising performance with fewer parameters and computations than conventional Convolutional Neural Networks. Experimental results show that ViTMed outperforms state-of-the-art approaches on all three public COVID-19 benchmark datasets, achieving 98.38%, 90.48%, and 99.17% accuracy on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively. The datasets contain 2482, 746, and 19685 samples, respectively, and comprise two to three classes: COVID, non-COVID, and non-informative cases.
AB - The COVID-19 global health crisis has presented daunting challenges to medical professionals, making accurate and efficient diagnoses more important than ever. In view of this, this paper proposes a Vision Transformer model, ViTMed, with an attention mechanism to classify CT scan images for more effective diagnosis of COVID-19. Each input CT scan image is represented as a sequence of tokens, and a transformer captures global and local dependencies between features through a self-attention mechanism. The core element of ViTMed is the transformer encoder with a multi-headed attention (MHA) mechanism and a feed-forward network, which enables the model to learn hierarchical representations of the image and make more informed predictions. The proposed ViTMed achieves promising performance with fewer parameters and computations than conventional Convolutional Neural Networks. Experimental results show that ViTMed outperforms state-of-the-art approaches on all three public COVID-19 benchmark datasets, achieving 98.38%, 90.48%, and 99.17% accuracy on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively. The datasets contain 2482, 746, and 19685 samples, respectively, and comprise two to three classes: COVID, non-COVID, and non-informative cases.
KW - Attention
KW - COVID-19
KW - CT-Scan
KW - Medical Image Analysis
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85174423662&partnerID=8YFLogxK
U2 - 10.1109/ICoICT58202.2023.10262548
DO - 10.1109/ICoICT58202.2023.10262548
M3 - Conference contribution
AN - SCOPUS:85174423662
T3 - International Conference on ICT Convergence
SP - 277
EP - 282
BT - 2023 11th International Conference on Information and Communication Technology, ICoICT 2023
PB - IEEE Computer Society
Y2 - 23 August 2023 through 24 August 2023
ER -