CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder

Ye Huang; Di Kang; Liang Chen; Wenjing Jia; Xiangjian He; Lixin Duan; Xuefei Zhe; Linchao Bao

doi:10.1109/TCSVT.2024.3395132

School of Computer Science

Research output: Journal Publication › Article › peer-review

Abstract

Semantic segmentation has recently achieved notable advances by exploiting 'class-level' contextual information during learning, e.g., the Object Contextual Representation (OCR) and Context Prior (CPNet) approaches. However, these approaches simply concatenate class-level information to pixel features to boost pixel representation learning, which cannot fully utilize intra-class and inter-class contextual information. Moreover, these approaches learn soft class centers based on coarse mask prediction, which is prone to error accumulation. To better exploit class-level information, we propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning, motivated by the observation that humans can recognize an object on its own, regardless of which other objects it appears with. In addition, we design a dedicated decoder for CAR (named CARD), which consists of a novel spatial token mixer and an upsampling module, to maximize its gain for existing baselines while remaining highly efficient in terms of computational cost. Specifically, CAR consists of three novel loss functions. The first encourages more compact representations within each class, the second directly maximizes the distance between different class centers, and the third further increases the distance between pixels and the centers of the other classes. Furthermore, the class centers in our approach are generated directly from the ground truth instead of from an error-prone coarse prediction. CAR can be applied directly to most existing segmentation models during training, including OCR and CPNet, and can substantially improve their accuracy at no additional inference overhead. Extensive experiments and ablation studies on multiple benchmark datasets demonstrate that the proposed CAR boosts the accuracy of all baseline models by up to 2.23% mIoU with superior generalization ability.
CARD outperforms state-of-the-art approaches on multiple benchmarks with a highly efficient architecture. The code will be available at https://github.com/edwardyehuang/CAR.
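The three CAR regularizers described in the abstract can be illustrated with a minimal, framework-free sketch. It assumes per-pixel feature vectors already flattened into a list and ground-truth labels at the same resolution. The squared Euclidean distance for the intra-class term, the hinge with a hypothetical `margin` for the two inter-class terms, the `ignore_label=255` convention, and all function names are illustrative assumptions, not the exact formulation in the paper or the code in the CAR repository.

```python
import math
from collections import defaultdict

# Hypothetical helpers illustrating the three CAR-style regularizers on
# flattened per-pixel features; not the paper's or the repository's code.

def class_centers(features, labels, ignore_label=255):
    """Average the features of all pixels sharing a ground-truth label."""
    sums = {}
    counts = defaultdict(int)
    for f, y in zip(features, labels):
        if y == ignore_label:  # skip unlabeled pixels (assumed convention)
            continue
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intra_class_loss(features, labels, centers, ignore_label=255):
    """Loss 1 (assumed form): mean squared distance from each pixel to its own class center."""
    terms = [dist(f, centers[y]) ** 2
             for f, y in zip(features, labels) if y != ignore_label]
    return sum(terms) / len(terms) if terms else 0.0


def inter_center_loss(centers, margin=1.0):
    """Loss 2 (assumed form): hinge penalty on pairs of class centers closer than margin."""
    keys = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - dist(centers[a], centers[b]))
               for a, b in pairs) / len(pairs)


def center_to_pixel_loss(features, labels, centers, margin=1.0, ignore_label=255):
    """Loss 3 (assumed form): hinge penalty on pixels within margin of another class's center."""
    terms = [max(0.0, margin - dist(f, c))
             for f, y in zip(features, labels) if y != ignore_label
             for k, c in centers.items() if k != y]
    return sum(terms) / len(terms) if terms else 0.0


# Usage: two 2-D "pixel" features per class, centers taken from ground truth.
features = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]]
labels = [0, 0, 1, 1]
centers = class_centers(features, labels)
total = (intra_class_loss(features, labels, centers)
         + inter_center_loss(centers)
         + center_to_pixel_loss(features, labels, centers))
print(round(total, 4))
```

Because the centers are computed from ground-truth labels rather than from a coarse prediction, these terms need labels and can only be evaluated during training, which is consistent with the abstract's claim of no additional inference overhead. A practical version would operate on batched tensors in a deep learning framework; plain lists are used here only for readability.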

Original language	English
Pages (from-to)	9024-9038
Number of pages	15
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	34
Issue number	10
DOIs	https://doi.org/10.1109/TCSVT.2024.3395132
Publication status	Published - 2024

Keywords

COCO-Stuff
PASCAL Context
Semantic segmentation
Cityscapes
representation learning

ASJC Scopus subject areas

Media Technology
Electrical and Electronic Engineering

Cite this

@article{2ffed19a0adf4f00b6ffe1c8dd120ce6,

title = "{CARD}: Semantic Segmentation with Efficient Class-Aware Regularized Decoder",

keywords = "COCO-Stuff, PASCAL Context, Semantic segmentation, Cityscapes, representation learning",

author = "Ye Huang and Di Kang and Liang Chen and Wenjing Jia and Xiangjian He and Lixin Duan and Xuefei Zhe and Linchao Bao",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2024",

doi = "10.1109/TCSVT.2024.3395132",

language = "English",

volume = "34",

pages = "9024--9038",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "10",

}

TY - JOUR

T1 - CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder

AU - Huang, Ye

AU - Kang, Di

AU - Chen, Liang

AU - Jia, Wenjing

AU - He, Xiangjian

AU - Duan, Lixin

AU - Zhe, Xuefei

AU - Bao, Linchao

PY - 2024

Y1 - 2024

KW - COCO-Stuff

KW - PASCAL Context

KW - Semantic segmentation

KW - Cityscapes

KW - representation learning

UR - http://www.scopus.com/inward/record.url?scp=85192186308&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2024.3395132

DO - 10.1109/TCSVT.2024.3395132

M3 - Article

AN - SCOPUS:85192186308

SN - 1051-8215

VL - 34

SP - 9024

EP - 9038

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 10

ER -
