Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition

Weicheng Xie; Zhibin Peng; Linlin Shen; Wenya Lu; Yang Zhang; Siyang Song

doi:10.1109/TIP.2024.3378459

Research output: Journal Publication › Article › peer-review

7 Citations (Scopus)

Abstract

Convolutional neural networks (CNNs) have significantly improved facial expression recognition. However, current training still suffers from inconsistent learning intensities among different layers, i.e., the feature representations in the shallow layers are not sufficiently learned compared with those in the deep layers. To this end, this work proposes a contrastive learning framework to align the feature semantics of shallow and deep layers, followed by an attention module that represents the multi-scale features in a weight-adaptive manner. The proposed algorithm has three main merits. First, the learning intensity of the shallow-layer features, defined as the magnitude of the backpropagation gradient, is enhanced by cross-layer contrastive learning. Second, the latent semantics in the shallow-layer and deep-layer features are explored and aligned during contrastive learning, so the fine-grained characteristics of expressions can be taken into account in feature representation learning. Third, by integrating the multi-scale features from multiple layers with an attention module, our algorithm achieves state-of-the-art performance of 92.21%, 89.50%, and 62.82% on three in-the-wild expression databases, RAF-DB, FERPlus, and SFEW, respectively, and the second-best performance, 65.29%, on the AffectNet dataset. Our code will be made publicly available.
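The abstract does not give the exact form of the cross-layer contrastive objective, so the following is only a minimal sketch under stated assumptions: it uses a generic InfoNCE-style loss in which the shallow-layer embedding of a sample is pulled toward the deep-layer embedding of the same sample and pushed away from the deep-layer embeddings of other samples in the batch. The function name `cross_layer_info_nce`, the temperature value, and the use of plain Python lists for pooled and projected features are all hypothetical choices for illustration, not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_layer_info_nce(shallow, deep, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed loss form, not from the paper).

    shallow[i] and deep[i] are embeddings of the same sample taken from a
    shallow and a deep layer; they form the positive pair, and deep[j] for
    j != i serve as negatives.
    """
    s = [l2_normalize(v) for v in shallow]
    d = [l2_normalize(v) for v in deep]
    total = 0.0
    for i, si in enumerate(s):
        # Cosine similarity of shallow sample i with every deep embedding.
        logits = [sum(a * b for a, b in zip(si, dj)) / temperature for dj in d]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(s)


# Toy batch: aligned shallow/deep pairs versus deliberately mismatched pairs.
shallow_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = cross_layer_info_nce(shallow_feats, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = cross_layer_info_nce(shallow_feats, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < shuffled_loss)
```

In a setup like this, the loss is small when shallow and deep embeddings of the same sample agree and large when they do not, so minimizing it sends an additional gradient signal directly to the shallow layer, which is one plausible reading of how the learning intensity of shallow features could be increased.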

Original language	English
Pages (from-to)	2514-2529
Number of pages	16
Journal	IEEE Transactions on Image Processing
Volume	33
DOIs	https://doi.org/10.1109/TIP.2024.3378459
Publication status	Published - 2024
Externally published	Yes

Keywords

Facial expression recognition
contrastive learning
latent semantic alignment
multi-layer attention

ASJC Scopus subject areas

Software
Computer Graphics and Computer-Aided Design

Cite this

@article{5b7f9824e6d5410cab3abdce91763978,

title = "Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition",

abstract = "Convolutional neural networks (CNNs) have significantly improved facial expression recognition. However, current training still suffers from inconsistent learning intensities among different layers, i.e., the feature representations in the shallow layers are not sufficiently learned compared with those in the deep layers. To this end, this work proposes a contrastive learning framework to align the feature semantics of shallow and deep layers, followed by an attention module that represents the multi-scale features in a weight-adaptive manner. The proposed algorithm has three main merits. First, the learning intensity of the shallow-layer features, defined as the magnitude of the backpropagation gradient, is enhanced by cross-layer contrastive learning. Second, the latent semantics in the shallow-layer and deep-layer features are explored and aligned during contrastive learning, so the fine-grained characteristics of expressions can be taken into account in feature representation learning. Third, by integrating the multi-scale features from multiple layers with an attention module, our algorithm achieves state-of-the-art performance of 92.21%, 89.50%, and 62.82% on three in-the-wild expression databases, RAF-DB, FERPlus, and SFEW, respectively, and the second-best performance, 65.29%, on the AffectNet dataset. Our code will be made publicly available.",

keywords = "Facial expression recognition, contrastive learning, latent semantic alignment, multi-layer attention",

author = "Weicheng Xie and Zhibin Peng and Linlin Shen and Wenya Lu and Yang Zhang and Siyang Song",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2024",

doi = "10.1109/TIP.2024.3378459",

language = "English",

volume = "33",

pages = "2514--2529",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition

AU - Xie, Weicheng

AU - Peng, Zhibin

AU - Shen, Linlin

AU - Lu, Wenya

AU - Zhang, Yang

AU - Song, Siyang

PY - 2024

Y1 - 2024

N2 - Convolutional neural networks (CNNs) have significantly improved facial expression recognition. However, current training still suffers from inconsistent learning intensities among different layers, i.e., the feature representations in the shallow layers are not sufficiently learned compared with those in the deep layers. To this end, this work proposes a contrastive learning framework to align the feature semantics of shallow and deep layers, followed by an attention module that represents the multi-scale features in a weight-adaptive manner. The proposed algorithm has three main merits. First, the learning intensity of the shallow-layer features, defined as the magnitude of the backpropagation gradient, is enhanced by cross-layer contrastive learning. Second, the latent semantics in the shallow-layer and deep-layer features are explored and aligned during contrastive learning, so the fine-grained characteristics of expressions can be taken into account in feature representation learning. Third, by integrating the multi-scale features from multiple layers with an attention module, our algorithm achieves state-of-the-art performance of 92.21%, 89.50%, and 62.82% on three in-the-wild expression databases, RAF-DB, FERPlus, and SFEW, respectively, and the second-best performance, 65.29%, on the AffectNet dataset. Our code will be made publicly available.

KW - Facial expression recognition

KW - contrastive learning

KW - latent semantic alignment

KW - multi-layer attention

UR - http://www.scopus.com/inward/record.url?scp=85189501402&partnerID=8YFLogxK

U2 - 10.1109/TIP.2024.3378459

DO - 10.1109/TIP.2024.3378459

M3 - Article

C2 - 38530732

AN - SCOPUS:85189501402

SN - 1057-7149

VL - 33

SP - 2514

EP - 2529

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

ER -
