CWT-ViT: A time–frequency representation and vision transformer-based framework for automated robotic surgical skill assessment

Yiming Zhang, Ying Weng, Boding Wang

Research output: Journal Publication › Article › peer-review

1 Citation (Scopus)

Abstract

Surgical skill assessment currently relies on manual observation by senior surgeons, a process that is inherently time-consuming and subjective. Hence, there is a need for machine learning-based automated robotic surgical skill assessment. However, existing machine learning-based works operate in either the time domain or the frequency domain and have not investigated the time–frequency domain. To fill this research gap, we explore the representation of surgical motion data in the time–frequency domain. In this study, we propose a novel automated robotic surgical skill assessment framework called Continuous Wavelet Transform-Vision Transformer (CWT-ViT). We apply the continuous wavelet transform, a time–frequency representation method, to convert robotic surgery kinematic data into synthesized images. Furthermore, by taking advantage of prior knowledge of the da Vinci surgical system, we design a four-branch architecture in which each branch represents a robotic manipulator. We have conducted extensive experiments and achieved comparable results on the benchmark robotic surgical skill dataset, the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our proposed CWT-ViT framework demonstrates the feasibility of applying time–frequency representation to automated robotic surgical skill assessment using kinematic data. The code is available at https://github.com/yiming95/CWT-ViT-Surgery.
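
As a rough illustration of the pipeline the abstract describes, the sketch below turns multichannel kinematic signals into per-manipulator stacks of scalogram images using a hand-written Morlet continuous wavelet transform. It is not the authors' implementation (see their repository for that). The NumPy-only transform, the choice of wavelet, the `w0` parameter, the scale range, the min-max normalisation, and the assumption that the input has 76 columns laid out as four contiguous blocks of 19 (one block per manipulator) are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def morlet_cwt(signal, scales, w0=6.0):
    """Magnitude of a Morlet CWT; returns shape (len(scales), len(signal)).

    Illustrative NumPy-only transform, not the paper's code.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the wavelet on +/- 4 scale widths, capped so that the
        # kernel is never longer than the signal.
        half = int(min(4 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        wavelet = np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)
        # Correlation with the scaled wavelet, written as a convolution
        # with the reversed conjugate; 1/sqrt(s) is the usual normalisation.
        coef = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
        out[i] = np.abs(coef) / np.sqrt(s)
    return out


def to_image(scalogram):
    """Min-max normalise a scalogram to [0, 1] so it can be used as an image."""
    lo, hi = scalogram.min(), scalogram.max()
    return (scalogram - lo) / (hi - lo + 1e-12)


def manipulator_images(kinematics, scales, n_branches=4):
    """Split (T, C) kinematics into n_branches column blocks and build one
    image stack per block, each of shape (C / n_branches, len(scales), T).

    Assumes the columns are grouped contiguously by manipulator.
    """
    groups = np.split(np.asarray(kinematics, dtype=float), n_branches, axis=1)
    return [
        np.stack([to_image(morlet_cwt(g[:, c], scales)) for c in range(g.shape[1])])
        for g in groups
    ]


# Synthetic stand-in for one trial: 256 time steps, 76 kinematic variables.
rng = np.random.default_rng(0)
demo = rng.standard_normal((256, 76))
scales = np.arange(1, 33)
images = manipulator_images(demo, scales)
print(len(images), images[0].shape)
```

Each of the four stacks could then be passed to its own branch of a model. In practice a library transform such as `pywt.cwt` from PyWavelets would likely replace the hand-written loop.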

Original language: English
Article number: 125064
Journal: Expert Systems with Applications
Volume: 258
Publication status: Published - 15 Dec 2024

Keywords

  • Automated robotic surgical skill assessment
  • Decision support systems
  • Deep learning
  • Medical data analysis
  • Surgical data science
  • Time–frequency representation

ASJC Scopus subject areas

  • General Engineering
  • Computer Science Applications
  • Artificial Intelligence
