TY - GEN
T1 - EchoCardMAE
T2 - 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
AU - Yang, Xuan
AU - Xu, Rui
AU - Ye, Xinchen
AU - Wang, Zhihui
AU - Zhang, Miao
AU - Wang, Yi
AU - Fan, Xin
AU - Wang, Hongkai
AU - Yue, Qingxiong
AU - He, Xiangjian
AU - Chen, Yen Wei
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - Echocardiography, a vital cardiac imaging modality, faces challenges due to limited annotated data, impeding the application of deep learning. This paper introduces EchoCardMAE, a customized masked video autoencoder framework designed to leverage unlabeled echocardiography data and enhance performance across diverse cardiac tasks. EchoCardMAE addresses key challenges in echocardiogram analysis through three innovations built upon masked video modeling (MVM): (1) Key Area Masking, which concentrates feature learning on the diagnostically relevant sector of the image; (2) Temporal-Invariant Alignment Loss, promoting feature consistency across different clips of the same echocardiogram; and (3) Reconstruction Denoising, improving robustness to speckle noise inherent in echocardiography. We comprehensively evaluated EchoCardMAE on three public datasets, demonstrating state-of-the-art results in ejection fraction (EF) estimation, myocardial infarction (MI) prediction, and cardiac segmentation. For example, on the EchoNet-Dynamic dataset, EchoCardMAE achieved an EF estimation mean absolute error (MAE) of 3.78 and a left ventricular segmentation mDice of 92.96, surpassing existing methods. The code is available at https://github.com/m1dsolo/EchoCardMAE.
KW - Echocardiography
KW - Foundation Model
KW - Mask Video Modeling
UR - https://www.scopus.com/pages/publications/105018077760
U2 - 10.1007/978-3-032-05169-1_17
DO - 10.1007/978-3-032-05169-1_17
M3 - Conference contribution
AN - SCOPUS:105018077760
SN - 9783032051684
T3 - Lecture Notes in Computer Science
SP - 171
EP - 180
BT - Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, 2025, Proceedings
A2 - Gee, James C.
A2 - Hong, Jaesung
A2 - Sudre, Carole H.
A2 - Golland, Polina
A2 - Park, Jinah
A2 - Alexander, Daniel C.
A2 - Iglesias, Juan Eugenio
A2 - Venkataraman, Archana
A2 - Kim, Jong Hyo
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 September 2025 through 27 September 2025
ER -