Abstract
Scribble annotations offer a practical alternative to pixel-wise labels in video salient object detection (V-SOD). However, their sparse foreground coverage and ambiguous boundaries introduce background interference and error propagation, degrading segmentation accuracy across frames. To address this issue, we propose a novel Knowledge-sharing Hierarchical Memory Fusion Network (KHMF-Net) for scribble-supervised V-SOD. The core of our framework is a Hierarchical Memory Bank (HMB) that stores initial scribbles, historical high-confidence regions, and historical full salient maps, enabling long-term spatiotemporal context modeling to suppress error propagation. Additionally, we introduce an Adaptive Memory Fusion (AMF) module to dynamically integrate multi-confidence features, providing reliable guidance during salient mask expansion. To address background interference, we design an Interactive Equalized Matching (IEM) module with reference-wise softmax, ensuring balanced contributions from reference frame pixels. A dual-attention knowledge-sharing mechanism is further proposed to enhance IEM by transferring high-performance attention features from a Teacher to a Student module, improving segmentation accuracy. Experimental results demonstrate that KHMF-Net's hierarchical memory architecture and effective background-target discrimination enable state-of-the-art performance on three scribble-annotated datasets, even exceeding some fully supervised approaches. The module and predicted maps are publicly available at https://github.com/TOMMYWHY/KHMF-Net.
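The reference-wise softmax mentioned for the IEM module can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `softmax` and `matching_weights`, the dot-product similarity, and the feature shapes are all assumptions made for illustration. It only shows the normalization direction the abstract's wording suggests, where each reference pixel's weights are normalized over the query pixels so that no single reference pixel (for example, a background pixel that resembles many query pixels) can dominate the matching.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def matching_weights(ref_feats, qry_feats, reference_wise=True):
    """Hypothetical matching step (illustrative only).

    ref_feats: (N_ref, C) features of reference-frame pixels.
    qry_feats: (N_qry, C) features of query-frame pixels.
    Returns an (N_ref, N_qry) weight matrix.
    """
    # Assumed similarity measure: plain dot product between pixel features.
    sim = ref_feats @ qry_feats.T
    if reference_wise:
        # Normalize over query pixels: every reference pixel's row sums to 1,
        # so all reference pixels contribute the same total weight.
        return softmax(sim, axis=1)
    # Conventional choice: normalize over reference pixels, so every query
    # pixel's column sums to 1 and reference contributions may be unbalanced.
    return softmax(sim, axis=0)

rng = np.random.default_rng(0)
ref = rng.normal(size=(5, 8))   # 5 reference pixels, 8 channels
qry = rng.normal(size=(7, 8))   # 7 query pixels, 8 channels

w_ref = matching_weights(ref, qry, reference_wise=True)
w_qry = matching_weights(ref, qry, reference_wise=False)
```

In this sketch, `w_ref.sum(axis=1)` is a vector of ones (one per reference pixel), whereas `w_qry.sum(axis=0)` is a vector of ones (one per query pixel); the two differ only in which axis is normalized.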
| Original language | English |
|---|---|
| Pages (from-to) | 177-183 |
| Number of pages | 7 |
| Journal | Pattern Recognition Letters |
| Volume | 196 |
| Publication status | Published - Oct 2025 |
Keywords
- Knowledge-sharing
- Scribble-supervised
- Video salient object detection
- Weakly supervised
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence