
Cross-modal interaction and multi-source visual fusion for video generation in fetal cardiac screening

  • Guosong Zhu
  • Erqiang Deng
  • Zhen Qin
  • Fazlullah Khan*
  • Wei Wei
  • Gautam Srivastava
  • Hu Xiong
  • Saru Kumari

* Corresponding author for this work

Research output: Journal Publication, Article, peer-review

10 Citations (Scopus)

Abstract

To address the difficulty of preserving data for dynamic visualization in fetal ultrasound screening, a novel framework is proposed that generates fetal four-chamber echocardiogram videos by incorporating multi-source visual fusion and understanding. The framework uses a spectrogram-ultrasound synchronizer to align the ultrasound images in time, ensuring that the generated video matches the actual heartbeat rhythm. It then applies frame interpolation with nonlinear bidirectional motion prediction to synthesize the video. By integrating a Transformer model for autoregressive generation of visual semantic sequences, the framework can generate high-resolution frames. Experiments show a CLIP-Similarity of 96.23% and a DINOv2-Similarity of 99.77%. In addition, a multimodal dataset of fetal echocardiogram examinations has been constructed.
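
The abstract reports embedding-based similarity scores but does not define them here. As a rough illustration only, the sketch below assumes such a score is the mean cosine similarity between per-frame embeddings of a generated video and a reference video. The function names `cosine_similarity` and `mean_frame_similarity` are hypothetical, and the embeddings are assumed to have been extracted beforehand by an image encoder such as CLIP or DINOv2; the paper's actual metric may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_frame_similarity(
    generated: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity over aligned frame pairs.

    Assumption: frame i of the generated video corresponds to frame i
    of the reference video (the two are already time-aligned).
    """
    if not generated or len(generated) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    scores = [cosine_similarity(g, r) for g, r in zip(generated, reference)]
    return sum(scores) / len(scores)
```

Under this assumption, a value near 1.0 (reported as a percentage) would indicate that generated frames sit close to the reference frames in the encoder's embedding space.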

Original language: English
Article number: 102510
Journal: Information Fusion
Volume: 111
Publication status: Published - Nov 2024

Free Keywords

  • Cross-modal synchronization
  • Fetal echocardiogram scenario
  • Multi-source visual fusion and understanding
  • Transformer model
  • Visual data generation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
  • Hardware and Architecture
