TY - GEN
T1 - ViTMed: Vision Transformer for Medical Image Analysis
T2 - 11th International Conference on Information and Communication Technology, ICoICT 2023
AU - Lim, Yu Jie
AU - Lim, Kian Ming
AU - Chang, Roy Kwang Yang
AU - Lee, Chin Poo
AU - Lim, Jit Yan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The COVID-19 global health crisis has presented daunting challenges to medical professionals, making accurate and efficient diagnoses more important than ever. In view of this, this paper proposes a Vision Transformer model, ViTMed, with an attention mechanism to classify CT scan images for more effective diagnosis of COVID-19. Each input CT scan image is represented as a sequence of tokens, and a transformer captures global and local dependencies between features through a self-attention mechanism. The core element of ViTMed is the transformer encoder with a multi-headed attention (MHA) mechanism and a feed-forward network, which enables the model to learn hierarchical representations of the image and make more informed predictions. The proposed ViTMed achieves promising performance with fewer parameters and computations than conventional Convolutional Neural Networks. Experimental results show that ViTMed outperforms state-of-the-art approaches on all three public COVID-19 benchmark datasets, achieving 98.38%, 90.48%, and 99.17% accuracy on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively. The datasets contain 2482, 746, and 19685 samples, respectively, and comprise two to three classes: COVID, non-COVID, and non-informative cases.
AB - The COVID-19 global health crisis has presented daunting challenges to medical professionals, making accurate and efficient diagnoses more important than ever. In view of this, this paper proposes a Vision Transformer model, ViTMed, with an attention mechanism to classify CT scan images for more effective diagnosis of COVID-19. Each input CT scan image is represented as a sequence of tokens, and a transformer captures global and local dependencies between features through a self-attention mechanism. The core element of ViTMed is the transformer encoder with a multi-headed attention (MHA) mechanism and a feed-forward network, which enables the model to learn hierarchical representations of the image and make more informed predictions. The proposed ViTMed achieves promising performance with fewer parameters and computations than conventional Convolutional Neural Networks. Experimental results show that ViTMed outperforms state-of-the-art approaches on all three public COVID-19 benchmark datasets, achieving 98.38%, 90.48%, and 99.17% accuracy on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively. The datasets contain 2482, 746, and 19685 samples, respectively, and comprise two to three classes: COVID, non-COVID, and non-informative cases.
KW - Attention
KW - COVID-19
KW - CT-Scan
KW - Medical Image Analysis
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85174423662&partnerID=8YFLogxK
U2 - 10.1109/ICoICT58202.2023.10262548
DO - 10.1109/ICoICT58202.2023.10262548
M3 - Conference contribution
AN - SCOPUS:85174423662
T3 - International Conference on ICT Convergence
SP - 277
EP - 282
BT - 2023 11th International Conference on Information and Communication Technology, ICoICT 2023
PB - IEEE Computer Society
Y2 - 23 August 2023 through 24 August 2023
ER -