Bidirectional joint estimation of optical flow and scene flow with Gaussian-guided attention

Weichen Dai, Xiaoyang Weng, Donglei Sun, Yuhang Ming, Wanzeng Kong

Research output: Journal Publication › Article › peer-review

Abstract

Optical flow and scene flow, estimated jointly from images and point clouds, recover the motion field and have extensive applications in robotics. Although the two modalities are complementary, fusion-based estimation often neglects the fact that visual images inherently carry more information: image perception is dense, whereas point clouds sample three-dimensional space sparsely and non-uniformly. To further exploit this fine-grained visual information and the complementarity between the two modalities, we propose a method for the bidirectional joint estimation of optical flow and scene flow with Gaussian-guided attention. Features extracted from both modalities are fused bidirectionally via geometric projection, which preserves their geometric relationships. Applying Gaussian attention to the fused features then exploits the spatial information embedded in both the dense images and the sparse point clouds to capture matching-prior knowledge. Because the proposed method fully uses the fine-grained information of both modalities, it yields better joint estimates of optical flow and scene flow. Experimental results show that the proposed method achieves competitive performance on public datasets and outperforms prior methods on the FlyingThings3D and KITTI datasets. The code of this work is available at https://github.com/HDU-ASL/camliga.
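The Gaussian-guided attention idea can be illustrated with a minimal PyTorch sketch. This is not the released implementation from the repository above; the function name, tensor shapes, and the `sigma` parameter are illustrative assumptions. The sketch shows only the core mechanism suggested by the abstract: attention logits between projected point features and image features are biased by a Gaussian prior over their 2D distance on the image plane.

```python
# A minimal sketch of Gaussian-guided cross attention, assuming the attention
# between image and point features is modulated by a Gaussian prior over the
# 2D distance between pixels and projected points. All names are hypothetical,
# not the authors' API.
import torch
import torch.nn.functional as F


def gaussian_guided_attention(img_feat, pts_feat, pts_uv, img_uv, sigma=8.0):
    """Cross attention from point features to image features, biased by a
    Gaussian matching prior on 2D projection distance.

    img_feat: (N, C) features at N sampled pixel locations
    pts_feat: (M, C) features of M points
    pts_uv:   (M, 2) 2D projections of the points onto the image plane
    img_uv:   (N, 2) pixel coordinates of the sampled image features
    """
    # Standard scaled dot-product attention logits between the two modalities.
    logits = pts_feat @ img_feat.t() / img_feat.shape[1] ** 0.5   # (M, N)

    # Gaussian matching prior: pixels near a point's projection are a priori
    # more likely to correspond to that point.
    dist2 = torch.cdist(pts_uv, img_uv).pow(2)                    # (M, N)
    prior = -dist2 / (2.0 * sigma ** 2)

    # Adding the log-space prior makes the softmax weights proportional to
    # exp(logits) * exp(prior), i.e. the Gaussian acts multiplicatively.
    attn = F.softmax(logits + prior, dim=-1)
    return attn @ img_feat                                        # (M, C)


if __name__ == "__main__":
    img_feat = torch.randn(1024, 64)     # features at 1024 sampled pixels
    pts_feat = torch.randn(256, 64)      # features of 256 points
    img_uv = torch.rand(1024, 2) * 256   # pixel coordinates, 256x256 image
    pts_uv = torch.rand(256, 2) * 256    # projected point coordinates
    fused = gaussian_guided_attention(img_feat, pts_feat, pts_uv, img_uv)
    print(fused.shape)                   # torch.Size([256, 64])
```

In this reading, `sigma` controls how far the attention may look from a point's projection, encoding the spatial matching prior that the paper attributes to the Gaussian guidance.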
Original language: English
Article number: 443
Number of pages: 11
Journal: Signal, Image and Video Processing
Volume: 19
Publication status: Published - 2 Apr 2025

Keywords

  • Optical flow
  • Scene flow
  • Multi-modal fusion
  • Motion estimation
