Abstract
Insufficient training in firefighting techniques increases the risk of injuries and fatalities among firefighters. Human activity recognition methods show promising potential for performance monitoring and evaluation; however, existing studies focus mainly on individual modalities. This limited scope makes it difficult to distinguish between intricate tasks, such as those encountered in firefighting operations. This study introduces a multimodal decision-fusion network designed to overcome this limitation by integrating vision data from three distinct cameras and sensor data from four wearable devices. The proposed network combines a vision-focused Video Swin network with a sensor-driven Sensor Transformer network. The results show that vision-based methods alone are insufficient to accurately classify firefighting training activities, whereas the proposed decision-fusion network improves classification, achieving a mean F1-score of 95.73% and outperforming an existing hybrid machine learning network.
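The abstract describes decision-level (late) fusion of two modality-specific classifiers. A minimal sketch of that general idea, assuming the two networks each emit per-class logits and that fusion is a weighted average of their class probabilities (the function names, weighting scheme, and toy values below are illustrative assumptions, not the paper's actual method):

```python
import numpy as np

def softmax(logits):
    """Convert a batch of logits to class probabilities."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decision_fusion(vision_logits, sensor_logits, w_vision=0.5):
    """Late (decision-level) fusion: average the per-modality class
    probabilities with weight w_vision, then pick the argmax class.
    The equal default weight is an assumption for illustration."""
    p_vision = softmax(np.asarray(vision_logits, dtype=float))
    p_sensor = softmax(np.asarray(sensor_logits, dtype=float))
    fused = w_vision * p_vision + (1.0 - w_vision) * p_sensor
    return fused.argmax(axis=-1), fused

# Toy example: the vision model slightly favors class 0, the sensor
# model strongly favors class 1; fusion resolves to class 1.
labels, fused = decision_fusion([[2.0, 1.0, 0.1]], [[0.5, 2.5, 0.2]])
```

In practice each modality's weight would be tuned on validation data, but this shows why fusing decisions can correct a single weak modality: a confident sensor prediction can outvote an uncertain vision one.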
| Original language | English |
|---|---|
| Article number | 6011704 |
| Journal | IEEE Sensors Letters |
| Volume | 9 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published Online - Sept 2025 |
Free Keywords
- activity recognition
- deep learning
- multimodal fusion
- sensor fusion
- transformer
- video processing
ASJC Scopus subject areas
- Instrumentation
- Electrical and Electronic Engineering