Delving into the Scale Variance Problem in Object Detection

Junliang Chen; Xiaodong Zhao; Linlin Shen

doi:10.1109/ICTAI52525.2021.00145

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Object detection has made substantial progress in the last decade, owing to the ability of convolution to extract the local context of objects. However, object scales are diverse, and a standard convolution can only process single-scale input. The capability of traditional convolution, with its fixed receptive field, to deal with this scale variance problem is thus limited. Multi-scale feature representation has been proven to be an effective way to mitigate the scale variance problem. Recent research mainly adopts partial connections between certain scales, or aggregates features from all scales and focuses on global information across the scales. However, the information across the spatial and depth dimensions is ignored. Motivated by this, we propose the multi-scale convolution (MSConv) to handle this problem. By considering scale, spatial and depth information at the same time, MSConv is able to process multi-scale input more comprehensively. MSConv is effective and computationally efficient, adding only a small computational cost. For most single-stage object detectors, replacing the traditional convolutions with MSConvs in the detection head can bring more than 2.5% improvement in AP (on the COCO 2017 dataset), with only a 3% increase in FLOPs. MSConv is also flexible and effective for two-stage object detectors. When extended to mainstream two-stage object detectors, MSConv can bring up to 3.0% improvement in AP. Our best model under single-scale testing achieves 48.9% AP on the COCO 2017 test-dev split, which surpasses many state-of-the-art methods.

Original language	English
Title of host publication	Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021
Publisher	IEEE Computer Society
Pages	902-909
Number of pages	8
ISBN (Electronic)	9781665408981
DOIs	https://doi.org/10.1109/ICTAI52525.2021.00145
Publication status	Published - 2021
Externally published	Yes
Event	33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021 - Virtual, Online, United States
Duration	1 Nov 2021 → 3 Nov 2021

Publication series

Name	Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume	2021-November
ISSN (Print)	1082-3409

Conference

Conference	33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021
Country/Territory	United States
City	Virtual, Online
Period	1/11/21 → 3/11/21

Keywords

multi-scale convolution
object detection
scale variance

ASJC Scopus subject areas

Software
Artificial Intelligence
Computer Science Applications

Cite this

Chen, J., Zhao, X., & Shen, L. (2021). Delving into the Scale Variance Problem in Object Detection. In Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021 (pp. 902-909). (Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI; Vol. 2021-November). IEEE Computer Society. https://doi.org/10.1109/ICTAI52525.2021.00145

@inproceedings{6141105b578f4c5fbef1be225ee1e0d0,
title = "Delving into the Scale Variance Problem in Object Detection",
abstract = "Object detection has made substantial progress in the last decade, due to the capability of convolution in extracting local context of objects. However, the scales of objects are diverse and current convolution can only process single-scale input. The capability of traditional convolution with a fixed receptive field in dealing with such a scale variance problem, is thus limited. Multiscale feature representation has been proven to be an effective way to mitigate the scale variance problem. Recent researches mainly adopt partial connection with certain scales, or aggregate features from all scales and focus on the global information across the scales. However, the information across spatial and depth dimensions is ignored. Inspired by this, we propose the multi-scale convolution (MSConv) to handle this problem. Taking into consideration scale, spatial and depth information at the same time, MSConv is able to process multi-scale input more comprehensively. MSConv is effective and computationally efficient, with only a small increase of computational cost. For most of the single-stage object detectors, replacing the traditional convolutions with MSConvs in the detection head can bring more than 2.5% improvement in AP (on COCO 2017 dataset), with only 3% increase of FLOPs. MSConv is also flexible and effective for two-stage object detectors. When extended to the mainstream two-stage object detectors, MSConv can bring up to 3.0% improvement in AP. Our best model under single-scale testing achieves 48.9% AP on COCO 2017 test-dev split, which surpasses many state-of-the-art methods.",

keywords = "multi-scale convolution, object detection, scale variance",
author = "Junliang Chen and Xiaodong Zhao and Linlin Shen",
note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021 ; Conference date: 01-11-2021 Through 03-11-2021",
year = "2021",
doi = "10.1109/ICTAI52525.2021.00145",
language = "English",
series = "Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI",
publisher = "IEEE Computer Society",
pages = "902--909",
booktitle = "Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021",
address = "United States",
}

Chen, J, Zhao, X & Shen, L 2021, Delving into the Scale Variance Problem in Object Detection. in Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021. Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, vol. 2021-November, IEEE Computer Society, pp. 902-909, 33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021, Virtual, Online, United States, 1/11/21. https://doi.org/10.1109/ICTAI52525.2021.00145

Delving into the Scale Variance Problem in Object Detection. / Chen, Junliang; Zhao, Xiaodong; Shen, Linlin.
Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021. IEEE Computer Society, 2021. p. 902-909 (Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI; Vol. 2021-November).

TY - GEN
T1 - Delving into the Scale Variance Problem in Object Detection
AU - Chen, Junliang
AU - Zhao, Xiaodong
AU - Shen, Linlin
PY - 2021
Y1 - 2021
N2 - Object detection has made substantial progress in the last decade, due to the capability of convolution in extracting local context of objects. However, the scales of objects are diverse and current convolution can only process single-scale input. The capability of traditional convolution with a fixed receptive field in dealing with such a scale variance problem, is thus limited. Multiscale feature representation has been proven to be an effective way to mitigate the scale variance problem. Recent researches mainly adopt partial connection with certain scales, or aggregate features from all scales and focus on the global information across the scales. However, the information across spatial and depth dimensions is ignored. Inspired by this, we propose the multi-scale convolution (MSConv) to handle this problem. Taking into consideration scale, spatial and depth information at the same time, MSConv is able to process multi-scale input more comprehensively. MSConv is effective and computationally efficient, with only a small increase of computational cost. For most of the single-stage object detectors, replacing the traditional convolutions with MSConvs in the detection head can bring more than 2.5% improvement in AP (on COCO 2017 dataset), with only 3% increase of FLOPs. MSConv is also flexible and effective for two-stage object detectors. When extended to the mainstream two-stage object detectors, MSConv can bring up to 3.0% improvement in AP. Our best model under single-scale testing achieves 48.9% AP on COCO 2017 test-dev split, which surpasses many state-of-the-art methods.

KW - multi-scale convolution
KW - object detection
KW - scale variance
UR - http://www.scopus.com/inward/record.url?scp=85123922711&partnerID=8YFLogxK
U2 - 10.1109/ICTAI52525.2021.00145
DO - 10.1109/ICTAI52525.2021.00145
M3 - Conference contribution
AN - SCOPUS:85123922711
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 902
EP - 909
BT - Proceedings - 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI 2021
PB - IEEE Computer Society
T2 - 33rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2021
Y2 - 1 November 2021 through 3 November 2021
ER -
