Self-supervised learning of dynamic representations for static images

Siyang Song; Enrique Sanchez; Linlin Shen; Michel Valstar

doi:10.1109/ICPR48806.2021.9412942

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

9 Citations (Scopus)

Abstract

Facial actions are spatio-temporal signals by nature, and therefore their modeling is crucially dependent on the availability of temporal information. In this paper, we focus on inferring such temporal dynamics of facial actions when no explicit temporal information is available, i.e., from still images. We present a novel self-supervised learning approach to capture multiple scales of temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation. In particular: 1. We propose a framework that infers a dynamic representation (DR) from a still image, capturing the bi-directional flow of time within a short time-window centered on the input image; 2. We show that the proposed rank loss can exploit the temporal evolution of facial appearance to self-supervise training without target representations, allowing the network to represent dynamics more broadly; 3. We propose a multiple temporal scale approach that infers DRs for different window lengths (MDR) from a still image. We empirically validate the value of our approach on the task of frame ranking, and show how our proposed MDR attains state-of-the-art results on BP4D for AU intensity estimation and on SEMAINE for dimensional affect estimation, using only still images at test time.
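The self-supervision idea in points 2 and 3 can be illustrated with a generic sketch. This is not the authors' implementation. The helper names (`window_indices`, `multi_scale_windows`, `pairwise_rank_loss`), the margin-based hinge form (a RankSVM-style pairwise constraint) and the window half-lengths are all assumptions made for illustration. The frame indices inside a window centred on a still frame provide the temporal ordering for free. A pairwise loss then penalises any pair of frames whose scores do not increase with time. Windows of several half-lengths stand in for the multiple temporal scales.

```python
from typing import List, Sequence


def window_indices(num_frames: int, center: int, half_length: int) -> List[int]:
    """Frame indices of a window centred on `center`, clipped to [0, num_frames - 1]."""
    start = max(0, center - half_length)
    end = min(num_frames - 1, center + half_length)
    return list(range(start, end + 1))


def multi_scale_windows(num_frames: int, center: int,
                        half_lengths: Sequence[int]) -> List[List[int]]:
    """One window per temporal scale; the half-lengths are hypothetical choices."""
    return [window_indices(num_frames, center, h) for h in half_lengths]


def pairwise_rank_loss(scores: Sequence[float], margin: float = 1.0) -> float:
    """Mean hinge loss over all pairs i < j of temporally ordered frames.

    scores[t] is a scalar assigned to frame t (for example, a projection of a
    learned representation). Each pair contributes
    max(0, margin - (scores[j] - scores[i])), so the loss is zero only when,
    for every pair, the later frame scores at least `margin` higher than the
    earlier one. The ordering comes from frame indices, so no manual labels
    are needed.
    """
    n = len(scores)
    if n < 2:
        return 0.0
    total = 0.0
    num_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += max(0.0, margin - (scores[j] - scores[i]))
            num_pairs += 1
    return total / num_pairs


if __name__ == "__main__":
    # Two hypothetical scales around frame 2 of a 100-frame clip.
    print(multi_scale_windows(100, 2, [1, 3]))  # [[1, 2, 3], [0, 1, 2, 3, 4, 5]]
    print(pairwise_rank_loss([0.0, 1.0, 2.0]))  # 0.0: scores rise with time
    print(pairwise_rank_loss([2.0, 1.0, 0.0]))  # positive: order is violated
```

Averaging this loss over the windows of each scale is one way to picture a multi-scale objective. How the paper actually defines its rank loss and combines scales is not stated in this record.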

Original language	English
Title of host publication	Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1619-1626
Number of pages	8
ISBN (Electronic)	9781728188089
DOIs	https://doi.org/10.1109/ICPR48806.2021.9412942
Publication status	Published - 2020
Externally published	Yes
Event	25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy. Duration: 10 Jan 2021 → 15 Jan 2021

Publication series

Name	Proceedings - International Conference on Pattern Recognition
ISSN (Print)	1051-4651

Conference

Conference	25th International Conference on Pattern Recognition, ICPR 2020
Country/Territory	Italy
City	Virtual, Milan
Period	10 Jan 2021 → 15 Jan 2021

ASJC Scopus subject areas

Computer Vision and Pattern Recognition

Cite this

Song, S., Sanchez, E., Shen, L., & Valstar, M. (2020). Self-supervised learning of dynamic representations for static images. In Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition (pp. 1619-1626). Article 9412942 (Proceedings - International Conference on Pattern Recognition). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR48806.2021.9412942

@inproceedings{190e7557cb3947a8aa605633630d048d,

title = "Self-supervised learning of dynamic representations for static images",

abstract = "Facial actions are spatio-temporal signals by nature, and therefore their modeling is crucially dependent on the availability of temporal information. In this paper, we focus on inferring such temporal dynamics of facial actions when no explicit temporal information is available, i.e. from still images. We present a novel self-supervised learning approach to capture multiple scales of temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation. In particular: 1. We propose a framework that infers a dynamic representation (DR) from a still image, capturing the bi-directional flow of time within a short time-window centered at the input image; 2. We show that the proposed rank loss can apply facial temporal evolution to self-supervise the training process without using target representations, allowing the network to represent dynamics more broadly; 3. We propose a multiple temporal scale approach that infers DRs for different window lengths (MDR) from a still image. We empirically validate the value of our approach on the task of frame ranking, and show how our proposed MDR attains state of the art results on BP4D for AU intensity estimation and on SEMAINE for dimensional affect estimation, using only still images at test time.",

author = "Siyang Song and Enrique Sanchez and Linlin Shen and Michel Valstar",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE; 25th International Conference on Pattern Recognition, ICPR 2020; Conference date: 10-01-2021 through 15-01-2021",

year = "2020",

doi = "10.1109/ICPR48806.2021.9412942",

language = "English",

series = "Proceedings - International Conference on Pattern Recognition",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1619--1626",

booktitle = "Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition",

address = "United States",

}

Song, S, Sanchez, E, Shen, L & Valstar, M 2020, Self-supervised learning of dynamic representations for static images. in Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition, 9412942, Proceedings - International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., pp. 1619-1626, 25th International Conference on Pattern Recognition, ICPR 2020, Virtual, Milan, Italy, 10/01/21. https://doi.org/10.1109/ICPR48806.2021.9412942

Self-supervised learning of dynamic representations for static images. / Song, Siyang; Sanchez, Enrique; Shen, Linlin et al.
Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2020. p. 1619-1626, 9412942 (Proceedings - International Conference on Pattern Recognition).

TY - GEN

T1 - Self-supervised learning of dynamic representations for static images

AU - Song, Siyang

AU - Sanchez, Enrique

AU - Shen, Linlin

AU - Valstar, Michel

PY - 2020

Y1 - 2020

AB - Facial actions are spatio-temporal signals by nature, and therefore their modeling is crucially dependent on the availability of temporal information. In this paper, we focus on inferring such temporal dynamics of facial actions when no explicit temporal information is available, i.e. from still images. We present a novel self-supervised learning approach to capture multiple scales of temporal dynamics, with an application to facial Action Unit (AU) intensity estimation and dimensional affect estimation. In particular: 1. We propose a framework that infers a dynamic representation (DR) from a still image, capturing the bi-directional flow of time within a short time-window centered at the input image; 2. We show that the proposed rank loss can apply facial temporal evolution to self-supervise the training process without using target representations, allowing the network to represent dynamics more broadly; 3. We propose a multiple temporal scale approach that infers DRs for different window lengths (MDR) from a still image. We empirically validate the value of our approach on the task of frame ranking, and show how our proposed MDR attains state of the art results on BP4D for AU intensity estimation and on SEMAINE for dimensional affect estimation, using only still images at test time.

UR - http://www.scopus.com/inward/record.url?scp=85110442260&partnerID=8YFLogxK

U2 - 10.1109/ICPR48806.2021.9412942

DO - 10.1109/ICPR48806.2021.9412942

M3 - Conference contribution

AN - SCOPUS:85110442260

T3 - Proceedings - International Conference on Pattern Recognition

SP - 1619

EP - 1626

BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 25th International Conference on Pattern Recognition, ICPR 2020

Y2 - 10 January 2021 through 15 January 2021

ER -

Song S, Sanchez E, Shen L, Valstar M. Self-supervised learning of dynamic representations for static images. In Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc. 2020. p. 1619-1626. 9412942. (Proceedings - International Conference on Pattern Recognition). doi: 10.1109/ICPR48806.2021.9412942