Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition

Bizhu Wu; Mingyan Wu; Haoqin Ji; Linlin Shen

doi:10.1109/IJCB54206.2022.10007979

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Recently, researchers have achieved significant results in the skeleton based action recognition task. To better model the skeleton sequences, existing methods learned the feature representations in the self-supervised setting by solving pretext tasks, such as predicting the order of a shuffled skeleton sequence or verifying whether a given skeleton sequence is shuffled or not. However, these pretext tasks are either too challenging or too easy for the encoder to obtain a proper skeleton representation for action recognition. Therefore, we propose a novel self-pretraining pretext task, Which One Is Better (WOIB), to identify which one is more temporally coherent, given two shuffled skeleton sequences. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets with different network architectures show significant improvements in recognition accuracy, demonstrating that such a well-designed pretext task is general and able to drive the encoder to learn more discriminative representations.
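
The pair-comparison idea in the abstract can be illustrated with a small sketch. The abstract does not say how the shuffled sequences are generated or how temporal coherence is quantified, so everything below is an assumption made for illustration: coherence is taken to be the number of frames left in their original position, and the helper names `displace_frames` and `make_woib_pair` are hypothetical, not taken from the paper.

```python
import random


def displace_frames(seq, k, rng):
    """Return a copy of seq in which exactly k frames (k >= 2) leave their original position.

    Assumption for illustration only: the paper's actual shuffling scheme is not
    described in the abstract.
    """
    if not 2 <= k <= len(seq):
        raise ValueError("k must be between 2 and len(seq)")
    chosen = sorted(rng.sample(range(len(seq)), k))
    perm = list(range(len(seq)))
    # Cyclic shift among the chosen positions, so no chosen frame stays in place.
    for pos, src in zip(chosen, chosen[-1:] + chosen[:-1]):
        perm[pos] = src
    return [seq[i] for i in perm]


def make_woib_pair(seq, k_low, k_high, rng):
    """Return (first, second, label); label is 0 if `first` is the more coherent one, else 1."""
    if not k_low < k_high:
        raise ValueError("k_low must be smaller than k_high")
    more_coherent = displace_frames(seq, k_low, rng)
    less_coherent = displace_frames(seq, k_high, rng)
    # Randomise which slot holds the more coherent sequence so the label is not trivial.
    if rng.random() < 0.5:
        return more_coherent, less_coherent, 0
    return less_coherent, more_coherent, 1


# Toy "skeleton sequence": 16 frames, each holding 25 placeholder (x, y, z) joints.
rng = random.Random(0)
sequence = [[(float(t), 0.0, 0.0)] * 25 for t in range(16)]
first, second, label = make_woib_pair(sequence, k_low=2, k_high=8, rng=rng)
```

In a setup like this, an encoder would receive `first` and `second` and be trained with a binary classification loss against `label`; the encoder and the training loop are left out because the abstract gives no details about them.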

Original language	English
Title of host publication	2022 IEEE International Joint Conference on Biometrics, IJCB 2022
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781665463942
DOIs	https://doi.org/10.1109/IJCB54206.2022.10007979
Publication status	Published - 2022
Externally published	Yes
Event	2022 IEEE International Joint Conference on Biometrics, IJCB 2022 - Abu Dhabi, United Arab Emirates
Duration	10 Oct 2022 → 13 Oct 2022

Publication series

Name	2022 IEEE International Joint Conference on Biometrics, IJCB 2022

Conference

Conference	2022 IEEE International Joint Conference on Biometrics, IJCB 2022
Country/Territory	United Arab Emirates
City	Abu Dhabi
Period	10/10/22 → 13/10/22

ASJC Scopus subject areas

Agricultural and Biological Sciences (miscellaneous)
Computer Vision and Pattern Recognition
Health Informatics
Instrumentation

Access to Document

10.1109/IJCB54206.2022.10007979

Cite this

Wu, B., Wu, M., Ji, H., & Shen, L. (2022). Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition. In 2022 IEEE International Joint Conference on Biometrics, IJCB 2022 (2022 IEEE International Joint Conference on Biometrics, IJCB 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCB54206.2022.10007979

@inproceedings{9619a79f502d4e639a6f0be7b6513240,

title = "Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition",

abstract = "Recently, researchers have achieved significant results in the skeleton based action recognition task. To better model the skeleton sequences, existing methods learned the feature representations in the self-supervised setting by solving pretext tasks, such as predicting the order of a shuffled skeleton sequence or verifying whether a given skeleton sequence is shuffled or not. However, these pretext tasks are either too challenging or too easy for the encoder to obtain a proper skeleton representation for action recognition. Therefore, we propose a novel self-pretraining pretext task, Which One Is Better (WOIB), to identify which one is more temporally coherent, given two shuffled skeleton sequences. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets with different network architectures show significant improvements in recognition accuracy, demonstrating that such a well-designed pretext task is general and able to drive the encoder to learn more discriminative representations.",

author = "Bizhu Wu and Mingyan Wu and Haoqin Ji and Linlin Shen",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE; 2022 IEEE International Joint Conference on Biometrics, IJCB 2022; Conference date: 10-10-2022 through 13-10-2022",

year = "2022",

doi = "10.1109/IJCB54206.2022.10007979",

language = "English",

series = "2022 IEEE International Joint Conference on Biometrics, IJCB 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2022 IEEE International Joint Conference on Biometrics, IJCB 2022",

address = "United States",

}

Wu, B, Wu, M, Ji, H & Shen, L 2022, Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition. in 2022 IEEE International Joint Conference on Biometrics, IJCB 2022. 2022 IEEE International Joint Conference on Biometrics, IJCB 2022, Institute of Electrical and Electronics Engineers Inc., 2022 IEEE International Joint Conference on Biometrics, IJCB 2022, Abu Dhabi, United Arab Emirates, 10/10/22. https://doi.org/10.1109/IJCB54206.2022.10007979

Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition. / Wu, Bizhu; Wu, Mingyan; Ji, Haoqin et al.
2022 IEEE International Joint Conference on Biometrics, IJCB 2022. Institute of Electrical and Electronics Engineers Inc., 2022. (2022 IEEE International Joint Conference on Biometrics, IJCB 2022).

TY - GEN

T1 - Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition

AU - Wu, Bizhu

AU - Wu, Mingyan

AU - Ji, Haoqin

AU - Shen, Linlin

PY - 2022

Y1 - 2022

N2 - Recently, researchers have achieved significant results in the skeleton based action recognition task. To better model the skeleton sequences, existing methods learned the feature representations in the self-supervised setting by solving pretext tasks, such as predicting the order of a shuffled skeleton sequence or verifying whether a given skeleton sequence is shuffled or not. However, these pretext tasks are either too challenging or too easy for the encoder to obtain a proper skeleton representation for action recognition. Therefore, we propose a novel self-pretraining pretext task, Which One Is Better (WOIB), to identify which one is more temporally coherent, given two shuffled skeleton sequences. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets with different network architectures show significant improvements in recognition accuracy, demonstrating that such a well-designed pretext task is general and able to drive the encoder to learn more discriminative representations.

AB - Recently, researchers have achieved significant results in the skeleton based action recognition task. To better model the skeleton sequences, existing methods learned the feature representations in the self-supervised setting by solving pretext tasks, such as predicting the order of a shuffled skeleton sequence or verifying whether a given skeleton sequence is shuffled or not. However, these pretext tasks are either too challenging or too easy for the encoder to obtain a proper skeleton representation for action recognition. Therefore, we propose a novel self-pretraining pretext task, Which One Is Better (WOIB), to identify which one is more temporally coherent, given two shuffled skeleton sequences. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets with different network architectures show significant improvements in recognition accuracy, demonstrating that such a well-designed pretext task is general and able to drive the encoder to learn more discriminative representations.

UR - http://www.scopus.com/inward/record.url?scp=85147258924&partnerID=8YFLogxK

U2 - 10.1109/IJCB54206.2022.10007979

DO - 10.1109/IJCB54206.2022.10007979

M3 - Conference contribution

AN - SCOPUS:85147258924

T3 - 2022 IEEE International Joint Conference on Biometrics, IJCB 2022

BT - 2022 IEEE International Joint Conference on Biometrics, IJCB 2022

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 IEEE International Joint Conference on Biometrics, IJCB 2022

Y2 - 10 October 2022 through 13 October 2022

ER -
