TY - GEN
T1 - Automatic Speech-Based Smoking Status Identification
AU - Ma, Zhizhong
AU - Singh, Satwinder
AU - Qiu, Yuanhang
AU - Hou, Feng
AU - Wang, Ruili
AU - Bullen, Christopher
AU - Chu, Joanna Ting Wai
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Identifying the smoking status of a speaker from speech has a range of applications including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification mainly focuses on employing the speaker's low-level acoustic features such as fundamental frequency (F0), jitter, and shimmer. However, the use of high-level acoustic features, such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) for smoking status identification, has rarely been explored. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F0, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach for smoking status identification by employing ResNet along with these acoustic features. We also explore a data augmentation technique for smoking status identification to further improve the performance. Finally, we present a comparison of identification accuracy results for each feature settings, and obtain the best accuracy of 82.3%, a relative improvement of 12.7% and 29.8% on the initial audio classification approach and rule-based approach, respectively.
AB - Identifying the smoking status of a speaker from speech has a range of applications including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification mainly focuses on employing the speaker's low-level acoustic features such as fundamental frequency (F0), jitter, and shimmer. However, the use of high-level acoustic features, such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) for smoking status identification, has rarely been explored. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F0, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach for smoking status identification by employing ResNet along with these acoustic features. We also explore a data augmentation technique for smoking status identification to further improve the performance. Finally, we present a comparison of identification accuracy results for each feature settings, and obtain the best accuracy of 82.3%, a relative improvement of 12.7% and 29.8% on the initial audio classification approach and rule-based approach, respectively.
KW - Acoustic features
KW - Smoking status identification
KW - Speech processing
UR - http://www.scopus.com/inward/record.url?scp=85135010492&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-10467-1_11
DO - 10.1007/978-3-031-10467-1_11
M3 - Conference contribution
AN - SCOPUS:85135010492
SN - 9783031104664
T3 - Lecture Notes in Networks and Systems
SP - 193
EP - 203
BT - Intelligent Computing - Proceedings of the 2022 Computing Conference
A2 - Arai, Kohei
PB - Springer Science and Business Media Deutschland GmbH
T2 - Computing Conference, 2022
Y2 - 14 July 2022 through 15 July 2022
ER -