Global–local multi-stage temporal convolutional network for cataract surgery phase recognition

Lixin Fang, Lei Mou, Yuanyuan Gu, Yan Hu, Bang Chen, Xu Chen, Yang Wang, Jiang Liu, Yitian Zhao

Research output: Journal Publication › Article › peer-review


Background: Surgical video phase recognition is an essential technique in computer-assisted surgical systems for monitoring surgical procedures. It can help surgeons standardize procedures and improve postsurgical assessment and indexing. However, the high similarity between phases and the temporal variations of cataract videos remain the greatest challenges for video phase recognition.

Methods: In this paper, we introduce a global–local multi-stage temporal convolutional network (GL-MSTCN) to capture the subtle differences between highly similar surgical phases and to mitigate the temporal variations of surgical videos. The method consists of a triple-stream network (i.e., pupil stream, instrument stream, and video frame stream) and a multi-stage temporal convolutional network. The triple-stream network first detects the pupil and surgical instrument regions in each frame separately and then extracts fine-grained semantic features of the video frames. The multi-stage temporal convolutional network improves surgical phase recognition by capturing longer time-series features through dilated convolutional layers with varying receptive fields.

Results: Our method is validated on the CSVideo dataset, with 32 cataract surgery videos, and on the public Cataract101 dataset, with 101 cataract surgery videos. It outperforms state-of-the-art approaches, reaching 95.8% and 96.5% accuracy, respectively.

Conclusions: The experimental results show that combining global and local feature information helps the model capture fine-grained features and mitigate temporal and spatial variations, thus improving the surgical phase recognition performance of the proposed GL-MSTCN.
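The Methods summary attributes the longer temporal context to dilated convolutional layers with varying receptive fields. The following is a minimal sketch of that idea only, not the authors' implementation. The abstract does not state the kernel size, the number of layers, or the dilation schedule; the kernel size of 3 and the doubling dilations used here are assumptions borrowed from common temporal convolutional network designs, and the function names `receptive_field`, `doubling_dilations`, and `dilated_conv1d` are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Input frames that can influence one output frame of a stack of
    stride-1 dilated 1-D convolutions (one layer per dilation value)."""
    rf = 1
    for d in dilations:
        # Each layer widens the temporal window by (kernel_size - 1) * d frames.
        rf += (kernel_size - 1) * d
    return rf


def doubling_dilations(num_layers):
    """Assumed schedule: the dilation doubles at every layer (1, 2, 4, ...)."""
    return [2 ** i for i in range(num_layers)]


def dilated_conv1d(x, w, dilation):
    """Same-length dilated 1-D convolution with zero padding (odd kernel size)."""
    half = (len(w) - 1) // 2
    n = len(x)
    out = []
    for t in range(n):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = t + (j - half) * dilation
            if 0 <= idx < n:  # taps that fall outside the sequence read zero padding
                acc += wj * x[idx]
        out.append(acc)
    return out


def impulse_span(num_frames, kernel_size, dilations):
    """Push a single-frame impulse through the stack and count the frames it reaches."""
    x = [0.0] * num_frames
    x[num_frames // 2] = 1.0
    for d in dilations:
        x = dilated_conv1d(x, [1.0] * kernel_size, d)
    return sum(1 for v in x if v != 0.0)


if __name__ == "__main__":
    print(receptive_field(3, [1, 2, 4]))              # 15
    print(impulse_span(31, 3, [1, 2, 4]))             # 15
    print(receptive_field(3, doubling_dilations(10)))  # 2047
```

Under these assumptions, three layers already cover 15 frames and ten layers cover 2047 frames, which illustrates how a stack of dilated layers can span long stretches of a surgical video without pooling or recurrence.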

Original language: English
Article number: 82
Journal: BioMedical Engineering Online
Issue number: 1
Publication status: Published - Dec 2022
Externally published: Yes


  • Cataract surgery videos
  • Deep learning
  • Surgical phase recognition
  • Temporal convolutional networks

ASJC Scopus subject areas

  • Radiological and Ultrasound Technology
  • Biomaterials
  • Biomedical Engineering
  • Radiology, Nuclear Medicine and Imaging

