Abstract
Hand gesture recognition (HGR) serves as a fundamental way of communication and interaction for human beings. While HGR can be applied in human-computer interaction (HCI) to facilitate user interaction, it can also be utilized to bridge language barriers. For instance, HGR can be utilized to recognize sign language, a visual language represented by hand gestures and used by deaf and mute communities all over the world as a primary means of communication. The hand-crafted approach to vision-based HGR typically involves multiple stages of specialized processing, such as hand-crafted feature extraction methods, which are usually designed to deal with specific challenges. Hence, the effectiveness of the system and its ability to deal with varied challenges across multiple datasets are heavily reliant on the methods being utilized. In contrast, deep learning approaches such as the convolutional neural network (CNN) adapt to varied challenges via supervised learning. However, attaining satisfactory generalization on unseen data depends not only on the architecture of the CNN, but also on the quantity and variety of the training data. Therefore, a customized network architecture dubbed the enhanced densely connected convolutional neural network (EDenseNet) is proposed for vision-based hand gesture recognition. The modified transition layer in EDenseNet further strengthens feature propagation by utilizing a bottleneck layer to propagate reused features to all the feature maps in a bottleneck manner, while the following Conv layer smooths out unwanted features. Differences between EDenseNet and DenseNet are discerned, and the resulting performance gains are scrutinized in an ablation study. Furthermore, numerous data augmentation techniques are utilized to attenuate the effect of data scarcity by increasing the quantity of training data and enriching its variety, further improving generalization. Experiments have been carried out on multiple datasets, namely one NUS hand gesture dataset and two American Sign Language (ASL) datasets. The proposed EDenseNet obtains 98.50% average accuracy without augmented data and 99.64% average accuracy with augmented data, outperforming other deep learning driven instances in both settings.
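To make the description of the modified transition layer concrete, the following is a minimal PyTorch sketch of a DenseNet-style transition block built from the abstract alone: a 1×1 bottleneck Conv that propagates reused features across all feature maps, followed by a Conv layer intended to smooth out unwanted features, and a pooling step between dense blocks. The class name `EnhancedTransition`, the BN-ReLU-Conv ordering, the 3×3 kernel of the smoothing Conv, the average pooling, and the channel counts are all assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class EnhancedTransition(nn.Module):
    """Hypothetical sketch of the modified transition layer described in the
    abstract: a 1x1 bottleneck Conv propagates reused features to all feature
    maps, and a following Conv layer smooths out unwanted features before
    downsampling. Kernel sizes, BN/ReLU ordering, and pooling choice are
    assumptions, not taken from the paper."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # 1x1 bottleneck: compress/propagate reused features across channels.
        self.bottleneck = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        )
        # Follow-up Conv layer that "smooths out" unwanted features.
        self.smooth = nn.Sequential(
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3,
                      padding=1, bias=False),
        )
        # Downsampling between dense blocks, as in standard DenseNet transitions.
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.bottleneck(x)
        x = self.smooth(x)
        return self.pool(x)


if __name__ == "__main__":
    # Illustrative usage between two dense blocks (channel counts are made up).
    layer = EnhancedTransition(in_channels=256, out_channels=128)
    out = layer(torch.randn(1, 256, 32, 32))
    print(out.shape)  # torch.Size([1, 128, 16, 16])
```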
| Original language | English |
| --- | --- |
| Article number | 114797 |
| Journal | Expert Systems with Applications |
| Volume | 175 |
| DOIs | |
| Publication status | Published - 1 Aug 2021 |
| Externally published | Yes |
Keywords
- Convolutional neural network (CNN)
- Enhanced densely connected convolutional neural network (EDenseNet)
- Hand gesture recognition
- Sign language recognition
ASJC Scopus subject areas
- General Engineering
- Computer Science Applications
- Artificial Intelligence