Rethinking feature aggregation for deep RGB-D salient object detection

Yuanfang Zhang, Jiangbin Zheng, Long Li, Nian Liu, Wenjing Jia, Xiaochen Fan, Chengpei Xu, Xiangjian He

Research output: Journal Publication › Article › peer-review

7 Citations (Scopus)


Two-stream UNet-based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet adopts only a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn rich feature interactions, while the latter aggregates the improved low-level features with high-level features, thus promoting their representation ability. For the two-stream architecture, we further propose an early aggregation scheme that aggregates and propagates multi-modal encoder features at each level, thereby improving the encoder's capability. We also propose a factorized attention module that efficiently modulates the feature aggregation at each feature node with multiple learned attention factors. Experimental results demonstrate that each of the proposed components progressively improves RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.
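The idea of modulating feature aggregation with cheaply learned attention factors can be sketched as follows. This is an illustrative PyTorch sketch, not the paper's exact module: the class name `FactorizedAttention`, the single 1×1 gating convolution, and the per-path scalar maps are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class FactorizedAttention(nn.Module):
    """Hypothetical sketch: gate K incoming feature maps with learned
    per-path attention factors before summing them at a feature node."""

    def __init__(self, channels, num_paths):
        super().__init__()
        # A single 1x1 conv predicts one spatial attention map per
        # aggregation path from the concatenated inputs -- a low-cost
        # alternative to computing full attention for every path.
        self.gate = nn.Conv2d(channels * num_paths, num_paths, kernel_size=1)

    def forward(self, feats):
        # feats: list of K tensors, each of shape (B, C, H, W)
        x = torch.cat(feats, dim=1)               # (B, K*C, H, W)
        a = torch.sigmoid(self.gate(x))           # (B, K, H, W) attention factors
        # Weight each incoming feature by its factor, then sum.
        return sum(a[:, k:k + 1] * f for k, f in enumerate(feats))
```

For example, aggregating three 64-channel feature maps at one decoder node:

```python
agg = FactorizedAttention(channels=64, num_paths=3)
y = agg([torch.randn(2, 64, 32, 32) for _ in range(3)])
# y has the same shape as each input: (2, 64, 32, 32)
```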

Original language: English
Pages (from-to): 463-473
Number of pages: 11
Publication status: Published - 29 Jan 2021
Externally published: Yes


Keywords

  • Feature aggregation
  • Gated attention
  • RGB-D saliency detection
  • UNet

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence


