Automatic Speech-Based Smoking Status Identification

Zhizhong Ma; Satwinder Singh; Yuanhang Qiu; Feng Hou; Ruili Wang; Christopher Bullen; Joanna Ting Wai Chu

doi:10.1007/978-3-031-10467-1_11

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Identifying the smoking status of a speaker from speech has a range of applications, including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification has mainly focused on the speaker's low-level acoustic features, such as fundamental frequency (F₀), jitter, and shimmer. However, the use of high-level acoustic features, such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) features, for smoking status identification has rarely been explored. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F₀, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach for smoking status identification that employs ResNet with these acoustic features. We also explore a data augmentation technique to further improve performance. Finally, we compare identification accuracy across feature settings and obtain a best accuracy of 82.3%, a relative improvement of 12.7% and 29.8% over the initial audio classification approach and the rule-based approach, respectively.
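
The relative improvements quoted in the abstract can be turned back into approximate baseline accuracies. The sketch below assumes that "relative improvement" means (best - baseline) / baseline; the abstract does not state the baseline figures, so the values computed here are implied estimates rather than reported results, and the function name `implied_baseline` is purely illustrative.

```python
def implied_baseline(best_accuracy: float, relative_improvement: float) -> float:
    """Return the baseline accuracy implied by a relative improvement.

    Assumption (not stated in the abstract):
        relative_improvement = (best_accuracy - baseline) / baseline
    """
    return best_accuracy / (1.0 + relative_improvement)


best = 82.3  # best accuracy quoted in the abstract, in percent

# Implied accuracy of the initial audio classification approach (12.7% relative gain).
audio_baseline = implied_baseline(best, 0.127)

# Implied accuracy of the rule-based approach (29.8% relative gain).
rule_baseline = implied_baseline(best, 0.298)

print(f"audio classification baseline: about {audio_baseline:.1f}%")
print(f"rule-based baseline: about {rule_baseline:.1f}%")
```

Under this assumption the two baselines come out at roughly 73.0% and 63.4%. A different definition of relative improvement (for example, one based on error rate) would give different numbers.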

Original language	English
Title of host publication	Intelligent Computing - Proceedings of the 2022 Computing Conference
Editors	Kohei Arai
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	193-203
Number of pages	11
ISBN (Print)	9783031104664
DOIs	https://doi.org/10.1007/978-3-031-10467-1_11
Publication status	Published - 2022
Externally published	Yes
Event	Computing Conference, 2022 - Virtual, Online; Duration: 14 Jul 2022 → 15 Jul 2022

Publication series

Name	Lecture Notes in Networks and Systems
Volume	508 LNNS
ISSN (Print)	2367-3370
ISSN (Electronic)	2367-3389

Conference

Conference	Computing Conference, 2022
City	Virtual, Online
Period	14/07/22 → 15/07/22

Keywords

Acoustic features
Smoking status identification
Speech processing

ASJC Scopus subject areas

Control and Systems Engineering
Signal Processing
Computer Networks and Communications

Access to Document

10.1007/978-3-031-10467-1_11

Cite this

Ma, Z., Singh, S., Qiu, Y., Hou, F., Wang, R., Bullen, C., & Chu, J. T. W. (2022). Automatic Speech-Based Smoking Status Identification. In K. Arai (Ed.), Intelligent Computing - Proceedings of the 2022 Computing Conference (pp. 193-203). (Lecture Notes in Networks and Systems; Vol. 508 LNNS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-10467-1_11

@inproceedings{a6b24710a2254dacb388666d35e01319,

title = "Automatic Speech-Based Smoking Status Identification",

abstract = "Identifying the smoking status of a speaker from speech has a range of applications, including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification has mainly focused on the speaker's low-level acoustic features, such as fundamental frequency (F0), jitter, and shimmer. However, the use of high-level acoustic features, such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) features, for smoking status identification has rarely been explored. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F0, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach for smoking status identification that employs ResNet with these acoustic features. We also explore a data augmentation technique to further improve performance. Finally, we compare identification accuracy across feature settings and obtain a best accuracy of 82.3\%, a relative improvement of 12.7\% and 29.8\% over the initial audio classification approach and the rule-based approach, respectively.",

keywords = "Acoustic features, Smoking status identification, Speech processing",

author = "Zhizhong Ma and Satwinder Singh and Yuanhang Qiu and Feng Hou and Ruili Wang and Christopher Bullen and Chu, {Joanna Ting Wai}",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; Computing Conference, 2022 ; Conference date: 14-07-2022 Through 15-07-2022",

year = "2022",

doi = "10.1007/978-3-031-10467-1_11",

language = "English",

isbn = "9783031104664",

series = "Lecture Notes in Networks and Systems",

volume = "508",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "193--203",

editor = "Kohei Arai",

booktitle = "Intelligent Computing - Proceedings of the 2022 Computing Conference",

address = "Germany",

}

Ma, Z, Singh, S, Qiu, Y, Hou, F, Wang, R, Bullen, C & Chu, JTW 2022, Automatic Speech-Based Smoking Status Identification. in K Arai (ed.), Intelligent Computing - Proceedings of the 2022 Computing Conference. Lecture Notes in Networks and Systems, vol. 508 LNNS, Springer Science and Business Media Deutschland GmbH, pp. 193-203, Computing Conference, 2022, Virtual, Online, 14/07/22. https://doi.org/10.1007/978-3-031-10467-1_11

Automatic Speech-Based Smoking Status Identification. / Ma, Zhizhong; Singh, Satwinder; Qiu, Yuanhang et al.
Intelligent Computing - Proceedings of the 2022 Computing Conference. ed. / Kohei Arai. Springer Science and Business Media Deutschland GmbH, 2022. p. 193-203 (Lecture Notes in Networks and Systems; Vol. 508 LNNS).

TY - GEN

T1 - Automatic Speech-Based Smoking Status Identification

AU - Ma, Zhizhong

AU - Singh, Satwinder

AU - Qiu, Yuanhang

AU - Hou, Feng

AU - Wang, Ruili

AU - Bullen, Christopher

AU - Chu, Joanna Ting Wai

PY - 2022

Y1 - 2022

AB - Identifying the smoking status of a speaker from speech has a range of applications, including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification has mainly focused on the speaker's low-level acoustic features, such as fundamental frequency (F0), jitter, and shimmer. However, the use of high-level acoustic features, such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) features, for smoking status identification has rarely been explored. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F0, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach for smoking status identification that employs ResNet with these acoustic features. We also explore a data augmentation technique to further improve performance. Finally, we compare identification accuracy across feature settings and obtain a best accuracy of 82.3%, a relative improvement of 12.7% and 29.8% over the initial audio classification approach and the rule-based approach, respectively.

KW - Acoustic features

KW - Smoking status identification

KW - Speech processing

UR - http://www.scopus.com/inward/record.url?scp=85135010492&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-10467-1_11

DO - 10.1007/978-3-031-10467-1_11

M3 - Conference contribution

AN - SCOPUS:85135010492

SN - 9783031104664

T3 - Lecture Notes in Networks and Systems

SP - 193

EP - 203

BT - Intelligent Computing - Proceedings of the 2022 Computing Conference

A2 - Arai, Kohei

PB - Springer Science and Business Media Deutschland GmbH

T2 - Computing Conference, 2022

Y2 - 14 July 2022 through 15 July 2022

ER -
