Abstract
Optical flow and scene flow, estimated from images and point clouds respectively, jointly describe the motion field and have extensive applications in robotics. Although the two modalities are complementary, fusion-based estimation often neglects the fact that visual images inherently carry more information: image perception is dense, whereas point clouds sample three-dimensional space sparsely and non-uniformly. To further exploit this fine-grained visual information and the complementarity between the two modalities, we propose a method for the bidirectional joint estimation of optical flow and scene flow with Gaussian-guided attention. Features extracted from the two modalities are fused bidirectionally via geometric projection, preserving their geometric relationships. Gaussian attention applied to the fused features further exploits the spatial information embedded in both dense images and sparse point clouds to capture matching-prior knowledge. Because the proposed method fully utilizes the fine-grained information of both modalities, it enables better joint estimation of optical flow and scene flow. Experimental results show that the proposed method achieves competitive performance on public datasets, outperforming competing methods on the FlyingThings3D and KITTI datasets. The code of this work is available at https://github.com/HDU-ASL/camliga.
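To make the two ideas named in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' code: (1) fusing per-point features with image features via geometric projection, and (2) attention whose logits are biased by a Gaussian spatial prior over projected positions. All names here (`fuse_by_projection`, `GaussianAttention`, the bandwidth `sigma`, the pinhole intrinsics `K`) are hypothetical placeholders for the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse_by_projection(pts, pt_feats, img_feats, K):
    """Project 3D points into the image and gather co-located pixel features.

    pts:       (B, N, 3) camera-frame 3D points
    pt_feats:  (B, N, C) per-point features
    img_feats: (B, C, H, W) dense image feature map
    K:         (3, 3) pinhole intrinsics
    Returns fused (B, N, 2C) features and (B, N, 2) pixel coordinates.
    """
    B, _, H, W = img_feats.shape
    uvw = pts @ K.T                                      # homogeneous pixel coords
    uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-6)    # (u, v) per point
    # Normalize to [-1, 1] and bilinearly sample the image feature map.
    grid = torch.stack([uv[..., 0] / (W - 1),
                        uv[..., 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(img_feats, grid.unsqueeze(1), align_corners=True)
    sampled = sampled.squeeze(2).transpose(1, 2)         # (B, N, C)
    return torch.cat([pt_feats, sampled], dim=-1), uv


class GaussianAttention(nn.Module):
    """Self-attention with a Gaussian prior over 2D projected distances."""

    def __init__(self, dim, sigma=8.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.scale = dim ** -0.5
        self.sigma = sigma  # bandwidth of the spatial matching prior

    def forward(self, feats, uv):
        q, k, v = self.qkv(feats).chunk(3, dim=-1)
        logits = torch.einsum("bnc,bmc->bnm", q, k) * self.scale
        # Nearby projections are more likely matches: add a log-space
        # Gaussian bias that decays with squared pixel distance.
        bias = -torch.cdist(uv, uv).pow(2) / (2 * self.sigma ** 2)
        return torch.softmax(logits + bias, dim=-1) @ v
```

In this reading, `sigma` controls how strongly spatially close projections are favored as matches; the bias acts as the matching prior the abstract refers to, while the projection step supplies the geometric correspondence between the sparse points and the dense image grid.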
Original language | English |
---|---|
Article number | 443 |
Number of pages | 11 |
Journal | Signal, Image and Video Processing |
Volume | 19 |
DOIs | |
Publication status | Published - 2 Apr 2025 |
Keywords
- Optical flow
- Scene flow
- Multi-modal fusion
- Motion estimation