TY - JOUR
T1 - Development and Validation of a Deep Learning System to Detect Glaucomatous Optic Neuropathy Using Fundus Photographs
AU - Liu, Hanruo
AU - Li, Liu
AU - Wormstone, I. Michael
AU - Qiao, Chunyan
AU - Zhang, Chun
AU - Liu, Ping
AU - Li, Shuning
AU - Wang, Huaizhou
AU - Mou, Dapeng
AU - Pang, Ruiqi
AU - Yang, Diya
AU - Zangwill, Linda M.
AU - Moghimi, Sasan
AU - Hou, Huiyuan
AU - Bowd, Christopher
AU - Jiang, Lai
AU - Chen, Yihan
AU - Hu, Man
AU - Xu, Yongli
AU - Kang, Hong
AU - Ji, Xin
AU - Chang, Robert
AU - Tham, Clement
AU - Cheung, Carol
AU - Ting, Daniel Shu Wei
AU - Wong, Tien Yin
AU - Wang, Zulin
AU - Weinreb, Robert N.
AU - Xu, Mai
AU - Wang, Ningli
N1 - Publisher Copyright: © 2019 American Medical Association. All rights reserved.
PY - 2019/12
Y1 - 2019/12
N2 - Importance: A deep learning system (DLS) that could automatically detect glaucomatous optic neuropathy (GON) with high sensitivity and specificity could expedite screening for GON. Objective: To establish a DLS for detection of GON using retinal fundus images and glaucoma diagnosis with convoluted neural networks (GD-CNN) that has the ability to be generalized across populations. Design, Setting, and Participants: In this cross-sectional study, a DLS was developed for automated classification of GON using retinal fundus images obtained from the Chinese Glaucoma Study Alliance (CGSA), the Handan Eye Study, and online databases. A total of 241 032 images were selected as the training data set. The images were entered into the databases on June 9, 2009, obtained on July 11, 2018, and analyses were performed on December 15, 2018. The generalization of the DLS was tested in several validation data sets, which allowed assessment of the DLS in a clinical setting without exclusions, testing against variable image quality based on fundus photographs obtained from websites, evaluation in a population-based study that reflects a natural distribution of patients with glaucoma within the cohort, and an additive data set that has a diverse ethnic distribution. An online learning system was established to transfer the trained and validated DLS to generalize the results with fundus images from new sources. To better understand the DLS decision-making process, a prediction visualization test was performed that identified regions of the fundus images utilized by the DLS for diagnosis. Exposures: Use of a deep learning system. Main Outcomes and Measures: Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity for the DLS with reference to professional graders. Results: From a total of 274 413 fundus images initially obtained from CGSA, 269 601 images passed initial image quality review and were graded for GON. A total of 241 032 images (definite GON 29 865 [12.4%], probable GON 11 046 [4.6%], unlikely GON 200 121 [83.0%]) from 68 013 patients were selected using random sampling to train the GD-CNN model. The GD-CNN model was validated and evaluated using the remaining 28 569 images from CGSA. The AUC of the GD-CNN model in primary local validation data sets was 0.996 (95% CI, 0.995-0.998), with sensitivity of 96.2% and specificity of 97.7%. The most common reason for both false-negative and false-positive grading by GD-CNN (51 of 119 [46.3%] and 191 of 588 [32.3%]) and manual grading (50 of 113 [44.2%] and 183 of 538 [34.0%]) was pathologic or high myopia. Conclusions and Relevance: Application of GD-CNN to fundus images from different settings and varying image quality demonstrated high sensitivity, specificity, and generalizability for detecting GON. These findings suggest that an automated DLS could enhance current screening programs in a cost-effective and time-efficient manner.
AB - Importance: A deep learning system (DLS) that could automatically detect glaucomatous optic neuropathy (GON) with high sensitivity and specificity could expedite screening for GON. Objective: To establish a DLS for detection of GON using retinal fundus images and glaucoma diagnosis with convoluted neural networks (GD-CNN) that has the ability to be generalized across populations. Design, Setting, and Participants: In this cross-sectional study, a DLS was developed for automated classification of GON using retinal fundus images obtained from the Chinese Glaucoma Study Alliance (CGSA), the Handan Eye Study, and online databases. A total of 241 032 images were selected as the training data set. The images were entered into the databases on June 9, 2009, obtained on July 11, 2018, and analyses were performed on December 15, 2018. The generalization of the DLS was tested in several validation data sets, which allowed assessment of the DLS in a clinical setting without exclusions, testing against variable image quality based on fundus photographs obtained from websites, evaluation in a population-based study that reflects a natural distribution of patients with glaucoma within the cohort, and an additive data set that has a diverse ethnic distribution. An online learning system was established to transfer the trained and validated DLS to generalize the results with fundus images from new sources. To better understand the DLS decision-making process, a prediction visualization test was performed that identified regions of the fundus images utilized by the DLS for diagnosis. Exposures: Use of a deep learning system. Main Outcomes and Measures: Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity for the DLS with reference to professional graders. Results: From a total of 274 413 fundus images initially obtained from CGSA, 269 601 images passed initial image quality review and were graded for GON. A total of 241 032 images (definite GON 29 865 [12.4%], probable GON 11 046 [4.6%], unlikely GON 200 121 [83.0%]) from 68 013 patients were selected using random sampling to train the GD-CNN model. The GD-CNN model was validated and evaluated using the remaining 28 569 images from CGSA. The AUC of the GD-CNN model in primary local validation data sets was 0.996 (95% CI, 0.995-0.998), with sensitivity of 96.2% and specificity of 97.7%. The most common reason for both false-negative and false-positive grading by GD-CNN (51 of 119 [46.3%] and 191 of 588 [32.3%]) and manual grading (50 of 113 [44.2%] and 183 of 538 [34.0%]) was pathologic or high myopia. Conclusions and Relevance: Application of GD-CNN to fundus images from different settings and varying image quality demonstrated high sensitivity, specificity, and generalizability for detecting GON. These findings suggest that an automated DLS could enhance current screening programs in a cost-effective and time-efficient manner.
UR - http://www.scopus.com/inward/record.url?scp=85072161323&partnerID=8YFLogxK
U2 - 10.1001/jamaophthalmol.2019.3501
DO - 10.1001/jamaophthalmol.2019.3501
M3 - Article
C2 - 31513266
AN - SCOPUS:85072161323
SN - 2168-6165
VL - 137
SP - 1353
EP - 1360
JO - JAMA Ophthalmology
JF - JAMA Ophthalmology
IS - 12
ER -