TY - GEN
T1 - Edge deep learning for low capabilities devices
AU - Filippou, Fotios
AU - Foukalas, Fotis
AU - Tsiftsis, Theodoros
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Nowadays, Deep Learning (DL) is used to build applications in domains such as object detection, image classification, and speech-to-text. Deep Neural Networks (DNNs) are the core of Deep Learning, as they offer remarkable accuracy and performance across a variety of tasks. Despite their powerful capabilities, DNNs often require substantial computational resources, which can be difficult to provide, especially when deploying them on edge devices. These models therefore have to be optimized before deployment. Optimizing a model means making it smaller and more efficient without losing too much performance: even though techniques such as pruning reduce the number of parameters, the goal is to keep accuracy and speed as close as possible to those of the original model. We present a hybrid solution combining two techniques, pruning and quantization. Pruning eliminates inessential weights and connections in order to reduce the model size. Once the unnecessary parameters are removed, the model is quantized by converting the weights of the remaining parameters from 32-bit floating-point precision to half precision or to INT8. We verify and validate the performance of this hybrid approach using the COCO dataset (which contains 80 classes) and the pre-trained YOLOv8 model. Finally, the hybrid model is deployed on two different edge devices for benchmarking: the NVIDIA Jetson Nano (4 GB) and the Raspberry Pi 5 (16 GB).
AB - Nowadays, Deep Learning (DL) is used to build applications in domains such as object detection, image classification, and speech-to-text. Deep Neural Networks (DNNs) are the core of Deep Learning, as they offer remarkable accuracy and performance across a variety of tasks. Despite their powerful capabilities, DNNs often require substantial computational resources, which can be difficult to provide, especially when deploying them on edge devices. These models therefore have to be optimized before deployment. Optimizing a model means making it smaller and more efficient without losing too much performance: even though techniques such as pruning reduce the number of parameters, the goal is to keep accuracy and speed as close as possible to those of the original model. We present a hybrid solution combining two techniques, pruning and quantization. Pruning eliminates inessential weights and connections in order to reduce the model size. Once the unnecessary parameters are removed, the model is quantized by converting the weights of the remaining parameters from 32-bit floating-point precision to half precision or to INT8. We verify and validate the performance of this hybrid approach using the COCO dataset (which contains 80 classes) and the pre-trained YOLOv8 model. Finally, the hybrid model is deployed on two different edge devices for benchmarking: the NVIDIA Jetson Nano (4 GB) and the Raspberry Pi 5 (16 GB).
KW - deep learning
KW - edge devices
KW - object detection
KW - pruning
KW - quantization
KW - YOLO
UR - https://www.scopus.com/pages/publications/105022593008
U2 - 10.1109/BALKANCOM65827.2025.11185953
DO - 10.1109/BALKANCOM65827.2025.11185953
M3 - Conference contribution
AN - SCOPUS:105022593008
T3 - Balkancom 2025 - 8th International Balkan Conference on Communications and Networking: Empowering Connections, Enabling Innovation
BT - Balkancom 2025 - 8th International Balkan Conference on Communications and Networking
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Balkan Conference on Communications and Networking, Balkancom 2025
Y2 - 17 June 2025 through 20 June 2025
ER -