Abstract
In recent years, research into automatically recognizing and classifying diverse acoustic events in audio recordings has grown rapidly. This work has direct implications for fields such as speech recognition, music information retrieval, and environmental sound monitoring. This study introduces a novel approach to acoustic event classification based on a fine-tuned EfficientNet-B0 model. To mitigate overfitting, the training dataset is expanded with data augmentation techniques: pitch shifting, time stretching, noise addition, and time shifting. The augmented audio signals are then converted, via the Short-Time Fourier Transform (STFT), into log Mel-spectrograms, which serve as input to the proposed fine-tuned EfficientNet-B0 architecture. Experimental results on two benchmark datasets show promising performance, with validation accuracies of 89.44% on ESC-10 and 74.23% on ESC-50.
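The preprocessing pipeline described in the abstract (waveform augmentation followed by STFT-based log Mel-spectrogram extraction) can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the function names, the noise factor, the shift amount, `n_fft=1024`, `hop=512`, `n_mels=64`, and the HTK-style mel formula are all assumptions, since the abstract does not state them. Pitch shifting and time stretching are omitted because they require a phase vocoder or resampling; libraries such as librosa provide them as `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`. The EfficientNet-B0 classifier itself is not shown.

```python
import numpy as np

def add_noise(y, noise_factor=0.005, rng=None):
    """Noise addition: add white Gaussian noise (noise_factor is an assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    return y + noise_factor * rng.standard_normal(len(y))

def time_shift(y, shift):
    """Time shifting: circularly shift the waveform by `shift` samples."""
    return np.roll(y, shift)

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def stft_power(y, n_fft=1024, hop=512):
    """Power spectrogram |STFT|^2 using a Hann window; frames are columns."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[t * hop : t * hop + n_fft] * window for t in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2

def log_mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=64, eps=1e-10):
    """Log (dB) Mel-spectrogram of shape (n_mels, n_frames)."""
    mel = mel_filterbank(sr, n_fft, n_mels) @ stft_power(y, n_fft, hop)
    return 10.0 * np.log10(mel + eps)

# Usage: augment a 1 s, 440 Hz test tone and extract log Mel-spectrograms
sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)
rng = np.random.default_rng(0)
augmented = [y, add_noise(y, rng=rng), time_shift(y, sr // 10)]
specs = [log_mel_spectrogram(a, sr) for a in augmented]
print(specs[0].shape)  # (64, 42)
```

Each augmented copy yields its own 2-D spectrogram, so the augmentations multiply the number of image-like training inputs available to a CNN such as EfficientNet-B0.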
Original language | English |
---|---|
Pages (from-to) | 13-17 |
Number of pages | 5 |
Journal | Proceedings of the IEEE Conference on Systems, Process and Control, ICSPC |
Issue number | 2024 |
DOIs | |
Publication status | Published - 2024 |
Event | 12th IEEE Conference on Systems, Process and Control, ICSPC 2024 - Malacca, Malaysia; Duration: 7 Dec 2024 → … |
Keywords
- acoustic event classification
- EfficientNet
- Log Mel-spectrograms
- pitch shifting
- time stretching
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Information Systems
- Information Systems and Management
- Safety, Risk, Reliability and Quality
- Control and Optimization
- Modelling and Simulation
- Education