MDNet: Multi-Modal Cooperative Perception via Spatial Alignment of Modal Decision-Making

Junyang He, Xiaoheng Deng, Jinsong Gui, Tao Zhang, Xiangjian He

Research output: Journal Publication › Article › peer-review

Abstract

Through Internet of Things (IoT) communication technology, cooperative perception enhances a vehicle's capacity to perceive its surroundings while driving by integrating and synchronizing sensor data from multiple agents. As single-modality cooperative perception techniques have matured, there has been a growing trend in recent years toward integrating multi-modal data from heterogeneous sensors. However, owing to the data heterogeneity inherent in diverse sensors, Bird's Eye View (BEV) maps generated from different sensor types may exhibit local discrepancies in the spatial representation of entity positions. Furthermore, in realistic noisy environments, individual agents may produce uncertain and flawed feature representations. This uncertainty exacerbates the local inconsistency, causing misalignment of detected targets during BEV alignment and fusion and thereby reducing detection accuracy. To address these problems, we propose a modal decision-making spatial alignment cooperative perception network (MDNet). First, the network generates BEV feature maps through dense depth-image supervision for voxel feature extraction and modal-guided selective feature fusion. Next, we improve object detection accuracy by spatially aligning the BEV representations generated from the two distinct sensors, both globally and locally. In addition, we employ a cascaded centralized pyramid strategy in the message fusion stage, enabling flexible sampling across horizontal and vertical spatial dimensions and promoting deep interaction among multiple agents. We conduct quantitative and qualitative experiments on the public OPV2V and DAIR-V2X-C benchmarks; MDNet exhibits superior performance and stronger robustness on the 3D object detection task, yielding more precise detection results.
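The alignment-then-fusion idea sketched in the abstract can be illustrated schematically: first correct a global spatial offset between two BEV grids, then fuse them with per-cell confidence weights so that uncertain local features contribute less. The functions below are a minimal hypothetical sketch of that general pattern (integer translation and softmax weighting are our simplifying assumptions), not MDNet's actual implementation:

```python
import numpy as np

def global_align(bev, shift):
    """Globally align a BEV grid by an integer (dy, dx) translation,
    zero-filling the cells exposed at the border."""
    dy, dx = shift
    out = np.zeros_like(bev)
    H, W = bev.shape[:2]
    # Destination and source windows for the shifted copy.
    dst_y = slice(max(dy, 0), H + min(dy, 0))
    dst_x = slice(max(dx, 0), W + min(dx, 0))
    src_y = slice(max(-dy, 0), H + min(-dy, 0))
    src_x = slice(max(-dx, 0), W + min(-dx, 0))
    out[dst_y, dst_x] = bev[src_y, src_x]
    return out

def confidence_fusion(bev_a, bev_b, conf_a, conf_b):
    """Fuse two spatially aligned BEV feature maps (H, W, C) using a
    per-cell softmax over scalar confidence maps (H, W)."""
    w = np.exp(conf_a) / (np.exp(conf_a) + np.exp(conf_b))
    return w[..., None] * bev_a + (1.0 - w)[..., None] * bev_b
```

With equal confidences the fusion reduces to a per-cell average, while a noisier modality (lower confidence) is smoothly down-weighted; in a learned system the confidence maps would be predicted by the network rather than supplied by hand.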

Original language: English
Journal: IEEE Internet of Things Journal
Publication status: Accepted/In press - 2025

Keywords

  • 3D Object Detection
  • Cooperative Perception
  • Internet of Things (IoT)
  • Multi-Agent Perception
  • Multi-Modal Fusion
  • Vehicle-to-Vehicle Application

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications
  • Computer Networks and Communications
