Abstract
Optical flow and scene flow, estimated from images and point clouds respectively, jointly describe the motion field and have extensive applications in robotics. Although the two modalities are complementary, fusion-based estimation often neglects the fact that visual images inherently carry more information: image perception is dense, whereas point clouds sample three-dimensional space sparsely and non-uniformly. To further exploit this fine-grained visual information and the complementarity between the two modalities, we propose a method for the bidirectional joint estimation of optical flow and scene flow with Gaussian-guided attention. Features extracted from the two modalities are fused bidirectionally via geometric projection, preserving their geometric relationships. Gaussian attention applied to the fused features further exploits the spatial information embedded in both dense images and sparse point clouds to capture matching-prior knowledge. Because the proposed method fully utilizes the fine-grained information of both modalities, it enables better joint estimation of optical flow and scene flow. Experimental results show that the proposed method achieves competitive performance on public datasets, outperforming competing methods on the FlyingThings3D and KITTI datasets. The code of this work is available at https://github.com/HDU-ASL/camliga.
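To make the two ideas named in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' code: (1) fusing per-point features with image features via geometric projection, and (2) attention whose logits are biased by a Gaussian spatial prior over projected positions. All names here (`fuse_by_projection`, `GaussianAttention`, the bandwidth `sigma`, the pinhole intrinsics `K`) are hypothetical placeholders for the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse_by_projection(pts, pt_feats, img_feats, K):
    """Project 3D points into the image and gather co-located pixel features.

    pts:       (B, N, 3) camera-frame 3D points
    pt_feats:  (B, N, C) per-point features
    img_feats: (B, C, H, W) dense image feature map
    K:         (3, 3) pinhole intrinsics
    Returns fused (B, N, 2C) features and (B, N, 2) pixel coordinates.
    """
    B, _, H, W = img_feats.shape
    uvw = pts @ K.T                                      # homogeneous pixel coords
    uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-6)    # (u, v) per point
    # Normalize to [-1, 1] and bilinearly sample the image feature map.
    grid = torch.stack([uv[..., 0] / (W - 1),
                        uv[..., 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(img_feats, grid.unsqueeze(1), align_corners=True)
    sampled = sampled.squeeze(2).transpose(1, 2)         # (B, N, C)
    return torch.cat([pt_feats, sampled], dim=-1), uv


class GaussianAttention(nn.Module):
    """Self-attention with a Gaussian prior over 2D projected distances."""

    def __init__(self, dim, sigma=8.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.scale = dim ** -0.5
        self.sigma = sigma  # bandwidth of the spatial matching prior

    def forward(self, feats, uv):
        q, k, v = self.qkv(feats).chunk(3, dim=-1)
        logits = torch.einsum("bnc,bmc->bnm", q, k) * self.scale
        # Nearby projections are more likely matches: add a log-space
        # Gaussian bias that decays with squared pixel distance.
        bias = -torch.cdist(uv, uv).pow(2) / (2 * self.sigma ** 2)
        return torch.softmax(logits + bias, dim=-1) @ v
```

In this reading, `sigma` controls how strongly spatially close projections are favored as matches; the bias acts as the matching prior the abstract refers to, while the projection step supplies the geometric correspondence between the sparse points and the dense image grid.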
Original language | English |
---|---|
Article number | 443 |
Number of pages | 11 |
Journal | Signal, Image and Video Processing |
Volume | 19 |
DOIs | |
Publication status | Published - 2 Apr 2025 |
Keywords
- Optical flow
- Scene flow
- Multi-modal fusion
- Motion estimation