Abstract
Weakly supervised salient object detection (WSOD) methods try to enrich sparse labels in various ways to obtain more saliency cues. One effective approach is to use pseudo labels from multiple unsupervised self-learning methods, but inaccurate and inconsistent pseudo labels can ultimately degrade detection performance. To tackle this problem, we develop a new multi-source WSOD framework, WBNet, that effectively combines pseudo-background (non-salient region) labels with scribble labels to obtain more accurate salient features. We first design a comprehensive salient pseudo-mask generator from multiple self-learning features. Next, we pioneer the exploration of generating salient pseudo-labels via the point-prompted and box-prompted Segment Anything Model (SAM). Finally, WBNet leverages a pixel-level Feature Aggregation Module (FAM), a mask-level Transformer-decoder (TFD), and an auxiliary Edge Prediction Module (EPM) with a hybrid loss function to handle complex saliency detection tasks. In comprehensive comparisons with state-of-the-art methods on five widely used datasets, the proposed method significantly improves saliency detection performance. The code and results are publicly available at https://github.com/yiwangtz/WBNet.
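As an illustration of how pseudo-background labels might be combined with scribble labels, the sketch below merges the two sources into one sparse label map and computes a partial binary cross-entropy over the labeled pixels only. This is a minimal sketch under stated assumptions and is not taken from the WBNet code: the function names `merge_labels` and `partial_bce`, the label encoding (1, 0, -1), and the rule that scribbles take priority are all hypothetical, and partial cross-entropy is a common choice in scribble-supervised segmentation rather than a confirmed component of the paper's hybrid loss.

```python
import math

# Hypothetical label encoding (an assumption, not from the paper).
FOREGROUND, BACKGROUND, UNLABELED = 1, 0, -1


def merge_labels(scribble, pseudo_background):
    """Fill unlabeled scribble pixels with pseudo-background labels.

    scribble: 2D list with values in {1, 0, -1}.
    pseudo_background: 2D list of bools, True where a pseudo-label
    marks the pixel as non-salient.
    Scribble annotations are never overwritten (assumed priority rule).
    """
    merged = []
    for s_row, b_row in zip(scribble, pseudo_background):
        merged.append([
            BACKGROUND if (s == UNLABELED and b) else s
            for s, b in zip(s_row, b_row)
        ])
    return merged


def partial_bce(pred, labels, eps=1e-7):
    """Binary cross-entropy averaged over labeled pixels only."""
    total, count = 0.0, 0
    for p_row, l_row in zip(pred, labels):
        for p, l in zip(p_row, l_row):
            if l == UNLABELED:
                continue  # unlabeled pixels contribute no supervision
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -math.log(p) if l == FOREGROUND else -math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


# Toy 2x2 example: one foreground scribble, one background scribble,
# and one extra pixel marked as background by a pseudo-label.
scribble = [[1, -1], [-1, 0]]
pseudo_bg = [[False, True], [False, False]]
pred = [[0.9, 0.2], [0.5, 0.1]]  # predicted saliency probabilities

merged = merge_labels(scribble, pseudo_bg)
loss = partial_bce(pred, merged)
print(merged, round(loss, 4))
```

In this toy case the pseudo-background label raises the number of supervised pixels from two to three, while the remaining unlabeled pixel is still ignored by the loss.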
Original language | English |
---|---|
Article number | 110579 |
Journal | Pattern Recognition |
Volume | 154 |
DOIs | |
Publication status | Published - Oct 2024 |
Keywords
- Neural networks
- Pseudo labels
- Salient object detection
- Scribble labels
- Transformer
- Weak supervision
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence