CLIMS++: Cross Language Image Matching with Automatic Context Discovery for Weakly Supervised Semantic Segmentation

Jinheng Xie, Songhe Deng, Xianxu Hou, Zhaochuan Luo, Linlin Shen, Yawen Huang, Yefeng Zheng, Mike Zheng Shou

Research output: Journal Publication › Article › peer-review

Abstract

While promising results have been achieved in weakly-supervised semantic segmentation (WSSS), the limited supervision provided by image-level tags inevitably induces reliance on discriminative regions and spurious relations between target classes and background regions. As a result, Class Activation Maps (CAMs) tend to activate only the most discriminative object regions while falsely including many class-related background regions. Without pixel-level supervision, it is difficult to enlarge the foreground activation and suppress these false background activations. In this paper, we propose a novel framework for WSSS, Cross Language Image Matching with Automatic Context Discovery (CLIMS++), based on the recently introduced Contrastive Language-Image Pre-training (CLIP) model. The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress class-related background regions in CAMs. In particular, we design object, background region, and text label matching losses to guide the model to excite more reasonable object regions for each category. In addition, we propose to automatically discover spurious relations between foreground categories and backgrounds, and on this basis design a background suppression loss that suppresses the activation of class-related backgrounds. These designs enable the proposed CLIMS++ to generate more complete and compact activation maps for the target objects. Extensive experiments on the PASCAL VOC 2012 and MS COCO 2014 datasets show that CLIMS++ significantly outperforms previous state-of-the-art methods.
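The matching idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the negative-log cosine-similarity form of each loss, the mask-weighted mean that stands in for an image encoder, the hand-made vectors that stand in for CLIP text embeddings, and all function and key names. It only shows how text supervision could reward a CAM that covers the object and penalize one that spills onto class-related context.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def masked_embedding(pixel_feats, mask):
    """Hypothetical stand-in for an image encoder: mask-weighted mean feature."""
    dim = len(pixel_feats[0])
    total = sum(mask) + 1e-8
    return [
        sum(m * f[d] for m, f in zip(mask, pixel_feats)) / total
        for d in range(dim)
    ]


def matching_losses(pixel_feats, cam, text_obj, text_ctx, eps=1e-6):
    """Assumed loss forms for one class; lower is better for each term."""
    def clamp(s):
        # Keep similarities strictly inside (0, 1) so the logs stay finite.
        return min(max(s, eps), 1.0 - eps)

    fg = masked_embedding(pixel_feats, cam)                     # CAM-weighted region
    bg = masked_embedding(pixel_feats, [1.0 - a for a in cam])  # complement region

    s_obj = clamp(cosine(fg, text_obj))  # foreground should match the class text
    s_bg = clamp(cosine(bg, text_obj))   # complement should not match the class text
    s_ctx = clamp(cosine(fg, text_ctx))  # foreground should not match the context text

    return {
        "object_text": -math.log(s_obj),
        "background_text": -math.log(1.0 - s_bg),
        "context_suppression": -math.log(1.0 - s_ctx),
    }


# Toy scene: two "object" pixels and two "context" pixels in a 2-D feature space.
pixel_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
text_obj = [1.0, 0.0]  # stand-in embedding for the class label text
text_ctx = [0.0, 1.0]  # stand-in embedding for a co-occurring context text

compact_cam = [0.9, 0.9, 0.1, 0.1]  # activates mostly the object
leaky_cam = [0.9, 0.9, 0.9, 0.9]    # also activates the context

compact = matching_losses(pixel_feats, compact_cam, text_obj, text_ctx)
leaky = matching_losses(pixel_feats, leaky_cam, text_obj, text_ctx)
print(sum(compact.values()) < sum(leaky.values()))
```

In this toy setting the compact map scores lower on all three terms than the leaky one, which is the qualitative behaviour the abstract attributes to the matching and suppression losses. A real system would replace the stand-ins with CLIP image and text encoders and would find the context texts automatically.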

Original language: English
Pages (from-to): 5569-5588
Number of pages: 20
Journal: International Journal of Computer Vision
Volume: 133
Issue number: 8
Publication status: Published - Aug 2025
Externally published: Yes

Keywords

  • Multi-modal learning
  • Semantic segmentation
  • Weakly-supervised learning

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
