Vision Foundation Model Guided Multimodal Fusion Network for Remote Sensing Semantic Segmentation

  • Chen Pan
  • Xijian Fan
  • Tardi Tjahjadi
  • Haiyan Guan
  • Liyong Fu
  • Qiaolin Ye
  • Ruili Wang

Research output: Journal Publication, Article, peer-review

5 Citations (Scopus)

Abstract

With the rapid development of Earth observation sensors, the fusion of remote sensing (RS) data for multimodal semantic segmentation has attracted significant research attention in recent years. Fusing multimodal data is challenging because differences in image acquisition mechanisms among sensors lead to misalignment. To mitigate this challenge, this article presents VSGNet, a novel multimodal fusion framework designed for RS semantic segmentation. The work uses vision structure guidance derived from a vision foundation model to achieve accurate segmentation without the need for auxiliary sensors. Specifically, the framework incorporates a cross-modal collaborative network for feature embedding that blends a convolutional neural network and a vision transformer to capture both local information and long-range dependencies from the input modalities. Subsequently, a multiscale cross-modal feature fusion component, comprising fusion enhancement and feature recalibration modules, is proposed to emphasize the adaptive multiscale interaction of diverse complementary cues between the modalities while suppressing the impact of noise and uncertainties present in RS data. Extensive experiments conducted on four diverse RS datasets, i.e., ISPRS Potsdam, ISPRS Vaihingen, LoveDA, and tree mapping, demonstrate that VSGNet outperforms state-of-the-art RS semantic segmentation models.

Original language: English
Pages (from-to): 9409-9431
Number of pages: 23
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Volume: 18
Publication status: Published - 2025
Externally published: Yes

Free Keywords

  • Cross-modal fusion
  • land cover mapping
  • semantic segmentation
  • vision foundation model (VFM)

ASJC Scopus subject areas

  • Computers in Earth Sciences
  • Atmospheric Science
