TY - GEN
T1 - Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition
AU - Wu, Bizhu
AU - Wu, Mingyan
AU - Ji, Haoqin
AU - Shen, Linlin
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Recently, researchers have achieved significant results in the skeleton-based action recognition task. To better model skeleton sequences, existing methods learn feature representations in a self-supervised setting by solving pretext tasks, such as predicting the order of a shuffled skeleton sequence or verifying whether a given skeleton sequence is shuffled. However, these pretext tasks are either too challenging or too easy for the encoder to learn a proper skeleton representation for action recognition. Therefore, we propose a novel self-pretraining pretext task, Which One Is Better (WOIB), which, given two shuffled skeleton sequences, identifies the one that is more temporally coherent. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets with different network architectures show significant improvements in recognition accuracy. These results demonstrate that the proposed pretext task is general and drives the encoder to learn more discriminative representations.
AB - Recently, researchers have achieved significant results in the skeleton-based action recognition task. To better model skeleton sequences, existing methods learn feature representations in a self-supervised setting by solving pretext tasks, such as predicting the order of a shuffled skeleton sequence or verifying whether a given skeleton sequence is shuffled. However, these pretext tasks are either too challenging or too easy for the encoder to learn a proper skeleton representation for action recognition. Therefore, we propose a novel self-pretraining pretext task, Which One Is Better (WOIB), which, given two shuffled skeleton sequences, identifies the one that is more temporally coherent. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets with different network architectures show significant improvements in recognition accuracy. These results demonstrate that the proposed pretext task is general and drives the encoder to learn more discriminative representations.
UR - http://www.scopus.com/inward/record.url?scp=85147258924&partnerID=8YFLogxK
U2 - 10.1109/IJCB54206.2022.10007979
DO - 10.1109/IJCB54206.2022.10007979
M3 - Conference contribution
AN - SCOPUS:85147258924
T3 - 2022 IEEE International Joint Conference on Biometrics, IJCB 2022
BT - 2022 IEEE International Joint Conference on Biometrics, IJCB 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Joint Conference on Biometrics, IJCB 2022
Y2 - 10 October 2022 through 13 October 2022
ER -