Cross-modal interaction and multi-source visual fusion for video generation in fetal cardiac screening

Guosong Zhu, Erqiang Deng, Zhen Qin, Fazlullah Khan, Wei Wei, Gautam Srivastava, Hu Xiong, Saru Kumari

Research output: Journal Publication › Article › peer-review

Abstract

To address the limited preservation of dynamic visual data in fetal ultrasound screening, a novel framework is proposed for generating fetal four-chamber echocardiogram videos with multi-source visual fusion and understanding. The framework uses a spectrogram-ultrasound synchronizer to align ultrasound images in time, ensuring that the generated video matches the actual heartbeat rhythm. It then synthesizes the video through frame interpolation with nonlinear bidirectional motion prediction. By integrating a Transformer model for the autoregressive generation of visual semantic sequences, the framework produces high-resolution frames. Experiments show a CLIP similarity of 96.23% and a DINOv2 similarity of 99.77%. In addition, a multimodal dataset of fetal echocardiogram examinations has been constructed.
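As an illustration of the frame-interpolation step, the sketch below shows how an intermediate frame could be synthesized from two ultrasound frames and their bidirectional optical flows. This is a minimal stand-in assuming PyTorch: the quadratic-in-t flow blend (the approximation popularized by Super SloMo), the `backward_warp` helper, and all tensor names are illustrative assumptions, since the abstract does not specify the paper's exact nonlinear motion model.

```python
# Minimal sketch of bidirectional frame interpolation with a quadratic-in-t
# flow blend. Illustrative only; not the authors' actual implementation.
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Backward-warp `frame` (B, C, H, W) with a dense flow field (B, 2, H, W)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype),
        torch.arange(w, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(frame.device)  # (1, 2, H, W)
    coords = grid + flow
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid_norm, align_corners=True)

def interpolate_frame(i0, i1, flow_01, flow_10, t):
    """Synthesize the frame at time t in (0, 1) between frames i0 and i1.

    flow_01 / flow_10 are the forward and backward optical flows between
    the two frames, e.g. from any dense optical-flow estimator.
    """
    # Quadratic-in-t approximation of the flows from time t to each endpoint.
    flow_t0 = -(1 - t) * t * flow_01 + t * t * flow_10
    flow_t1 = (1 - t) * (1 - t) * flow_01 - t * (1 - t) * flow_10
    warped0 = backward_warp(i0, flow_t0)
    warped1 = backward_warp(i1, flow_t1)
    # Blend the two warped frames, weighted by temporal proximity.
    return (1 - t) * warped0 + t * warped1
```

In this sketch, `flow_01` and `flow_10` would come from an optical-flow estimator run between consecutive echocardiogram frames; the t-squared terms make the interpolated motion nonlinear in time, which is why a blend of this family suits accelerating cardiac motion better than a linear split.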

Original language: English
Article number: 102510
Journal: Information Fusion
Volume: 111
Publication status: Published - Nov 2024

Keywords

  • Cross-modal synchronization
  • Fetal echocardiogram scenario
  • Multi-source visual fusion and understanding
  • Transformer model
  • Visual data generation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
  • Hardware and Architecture
