Learning efficient detector with semi-supervised adaptive distillation

Shitao Tang, Litong Feng, Wenqi Shao, Zhanghui Kuang, Wayne Zhang, Zheng Lu

Research output: Contribution to conference › Paper › peer-review

1 Citation (Scopus)

Abstract

Convolutional Neural Network based object detection techniques produce accurate results but are often time-consuming, and knowledge distillation is a popular model-compression approach to speed them up. In this paper, we propose a Semi-supervised Adaptive Distillation (SAD) framework to accelerate single-stage detectors while still improving overall accuracy. We introduce an Adaptive Distillation Loss (ADL) that enables the student model to mimic the teacher's logits adaptively, paying more attention to two types of hard samples: hard-to-learn samples predicted by the teacher model with low certainty, and hard-to-mimic samples with a large gap between the teacher's and the student's predictions. We then show that the student model can be improved further in the semi-supervised setting with the help of ADL. Our experiments validate that, for distillation on unlabeled data, ADL achieves better performance than existing data distillation using both soft and hard targets. On the COCO dataset, SAD makes a student detector with a ResNet-50 backbone outperform its teacher with a ResNet-101 backbone, while the student has half of the teacher's computational complexity.
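The abstract describes ADL as a distillation loss that up-weights two kinds of hard samples: those the teacher predicts with low certainty (hard-to-learn) and those where the student's prediction diverges strongly from the teacher's (hard-to-mimic). The following is a minimal NumPy sketch of that idea for the per-anchor binary classification outputs of a single-stage detector; the specific weighting form and the `beta`/`gamma` hyper-parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_distillation_loss(student_logits, teacher_logits, beta=1.5, gamma=1.0):
    """Sketch of an adaptive distillation loss over binary detection logits.

    The adaptive weight grows with (a) the teacher's entropy, so samples the
    teacher is uncertain about (hard-to-learn) count more, and (b) the
    teacher-student KL divergence, so samples the student fails to mimic
    (hard-to-mimic) count more. Assumed form; see the paper for the real ADL.
    """
    eps = 1e-7
    q = np.clip(sigmoid(teacher_logits), eps, 1 - eps)  # teacher probabilities
    p = np.clip(sigmoid(student_logits), eps, 1 - eps)  # student probabilities
    # Binary KL divergence KL(q || p): the hard-to-mimic term
    kl = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    # Teacher entropy: the hard-to-learn term (peaks when the teacher is uncertain)
    entropy = -(q * np.log(q) + (1 - q) * np.log(1 - q))
    # Adaptive weight in [0, 1): approaches 1 for harder samples
    weight = (1.0 - np.exp(-(kl + gamma * entropy))) ** beta
    return np.mean(weight * kl)
```

With this shape, a student that matches the teacher exactly incurs zero loss, and the loss grows faster than plain KL as the teacher-student gap widens, which is the "more attention on hard samples" behaviour the abstract describes.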

Original language: English
Publication status: Published - 2020
Event: 30th British Machine Vision Conference, BMVC 2019 - Cardiff, United Kingdom
Duration: 9 Sep 2019 - 12 Sep 2019

Conference

Conference: 30th British Machine Vision Conference, BMVC 2019
Country/Territory: United Kingdom
City: Cardiff
Period: 9/09/19 - 12/09/19

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
