TAIT: One-Shot Full-Integer Lightweight DNN Quantization via Tunable Activation Imbalance Transfer

Weixiong Jiang; Heng Yu; Xinzhe Liu; Hao Sun; Rui Li; Yajun Ha

doi:10.1109/DAC18074.2021.9586109

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

13 Citations (Scopus)

Abstract

Both parameter quantization and depthwise convolution are essential measures for providing high-accuracy, lightweight, and resource-friendly solutions when deploying deep neural networks (DNNs) on edge-AI devices. However, combining the two can have adverse effects: the resulting network suffers from either significant accuracy loss or long finetuning time. In addition, contemporary quantization methods are applied only to weight and activation values, not to bias and scaling-factor values, which makes them less practical for ASIC/FPGA accelerators. To solve these issues, we propose a novel quantization framework optimized for depthwise convolution networks. We find that the uniformity of the value range within a tensor can serve as a predictor of the tensor's quantization error. Guided by this predictor, we develop a mechanism called Tunable Activation Imbalance Transfer (TAIT), which tunes the value-range uniformity between an activated feature map and the weights that follow it. Moreover, TAIT supports full-integer quantization. We demonstrate TAIT on SkyNet and deploy it on an FPGA. Compared to the state of the art, our quantization framework and system design achieve 2.2%+ IoU, 2.4× speed, and 1.8× energy-efficiency improvements without any finetuning.
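
The range-uniformity idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the symmetric per-tensor quantizer, the single-output pointwise layer, every function name, and the exponent `alpha` used as the tuning knob are assumptions made for illustration only. The sketch divides each activation channel by a per-channel factor and multiplies the matching weight by the same factor, so the floating-point output is unchanged while the channel ranges become more or less uniform.

```python
def quantize(values, bits=8):
    """Symmetric per-tensor uniform quantization, then dequantization (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    if peak == 0:
        return list(values)
    step = peak / qmax
    return [round(v / step) * step for v in values]


def channel_ranges(acts):
    """Largest absolute value in each channel."""
    return [max(abs(v) for v in ch) for ch in acts]


def transfer_imbalance(acts, weights, alpha):
    """Move range imbalance from activations to weights.

    alpha is a hypothetical knob: 0 leaves everything untouched,
    1 gives every activation channel the same range.
    """
    ranges = channel_ranges(acts)
    target = max(ranges)
    factors = [(r / target) ** alpha if r > 0 else 1.0 for r in ranges]
    new_acts = [[v / f for v in ch] for ch, f in zip(acts, factors)]
    new_weights = [w * f for w, f in zip(weights, factors)]
    return new_acts, new_weights


def pointwise_output(acts, weights):
    """Single-output 1x1 layer: weighted sum over channels at each position."""
    n = len(acts[0])
    return [sum(ch[i] * w for ch, w in zip(acts, weights)) for i in range(n)]


def quantized_output(acts, weights, bits=8):
    """Quantize activations (one shared range) and weights, then apply the layer."""
    n = len(acts[0])
    q_flat = quantize([v for ch in acts for v in ch], bits)
    q_acts = [q_flat[c * n:(c + 1) * n] for c in range(len(acts))]
    return pointwise_output(q_acts, quantize(weights, bits))


# Toy non-negative (post-activation) feature map with very uneven channel ranges.
acts = [[0.0, 40.0, 100.0, 75.0], [0.0, 0.3, 0.9, 0.5]]
weights = [0.02, 1.5]
reference = pointwise_output(acts, weights)

for alpha in (0.0, 0.5, 1.0):
    a, w = transfer_imbalance(acts, weights, alpha)
    approx = quantized_output(a, w)
    err = max(abs(x - y) for x, y in zip(reference, approx))
    print(f"alpha={alpha}: activation ranges={channel_ranges(a)}, max abs error={err:.4f}")
```

Running the loop prints the quantization error for each setting of `alpha`, which makes the trade-off visible: uniform activation ranges come at the price of less uniform weights, so an intermediate setting may be preferable for a given layer.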

Original language	English
Title of host publication	2021 58th ACM/IEEE Design Automation Conference, DAC 2021
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1027-1032
Number of pages	6
ISBN (Electronic)	9781665432740
DOIs	https://doi.org/10.1109/DAC18074.2021.9586109
Publication status	Published - 5 Dec 2021
Event	58th ACM/IEEE Design Automation Conference, DAC 2021 - San Francisco, United States; Duration: 5 Dec 2021 → 9 Dec 2021

Publication series

Name	Proceedings - Design Automation Conference
Volume	2021-December
ISSN (Print)	0738-100X

Conference

Conference	58th ACM/IEEE Design Automation Conference, DAC 2021
Country/Territory	United States
City	San Francisco
Period	5 Dec 2021 → 9 Dec 2021

ASJC Scopus subject areas

Computer Science Applications
Control and Systems Engineering
Electrical and Electronic Engineering
Modelling and Simulation

Cite this

Jiang, W., Yu, H., Liu, X., Sun, H., Li, R., & Ha, Y. (2021). TAIT: One-Shot Full-Integer Lightweight DNN Quantization via Tunable Activation Imbalance Transfer. In 2021 58th ACM/IEEE Design Automation Conference, DAC 2021 (pp. 1027-1032). (Proceedings - Design Automation Conference; Vol. 2021-December). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DAC18074.2021.9586109

@inproceedings{dc901d5eaaf6424e91086c754586ac7e,

title = "TAIT: One-Shot Full-Integer Lightweight DNN Quantization via Tunable Activation Imbalance Transfer",

abstract = "Both parameter quantization and depthwise convolution are essential measures to provide high-accuracy, lightweight, and resource-friendly solutions when deploying deep neural networks (DNNs) onto edge-AI devices. However, combining the two methodologies may lead to adverse effects: It either suffers from significant accuracy loss or long finetuning time. Besides, contemporary quantization methods are only selectively applied to weight and activation values but not bias and scaling factor values, making them less practical for ASIC/FPGA accelerators. To solve these issues, we propose a novel quantization framework that is effectively optimized for depthwise convolution networks. We discover that the uniformity of the value range within a tensor can serve as a predictor for the tensor's quantization error. Under the guidance of this predictor, we develop a mechanism called Tunable Activation Imbalance Transfer (TAIT), which tunes the value range uniformity between an activated feature map and its latter weights. Moreover, TAIT fully supports full-integer quantization. We demonstrate TAIT on SkyNet and deploy it on FPGA. Compared to the state-of-the-art, our quantization framework and system design achieve 2.2\%+ IoU, 2.4 × speed, and 1.8 × energy efficiency improvements, without any requirement of finetuning.",

author = "Weixiong Jiang and Heng Yu and Xinzhe Liu and Hao Sun and Rui Li and Yajun Ha",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 58th ACM/IEEE Design Automation Conference, DAC 2021 ; Conference date: 05-12-2021 Through 09-12-2021",

year = "2021",

month = dec,

day = "5",

doi = "10.1109/DAC18074.2021.9586109",

language = "English",

series = "Proceedings - Design Automation Conference",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1027--1032",

booktitle = "2021 58th ACM/IEEE Design Automation Conference, DAC 2021",

address = "United States",

}

Jiang, W, Yu, H, Liu, X, Sun, H, Li, R & Ha, Y 2021, TAIT: One-Shot Full-Integer Lightweight DNN Quantization via Tunable Activation Imbalance Transfer. in 2021 58th ACM/IEEE Design Automation Conference, DAC 2021. Proceedings - Design Automation Conference, vol. 2021-December, Institute of Electrical and Electronics Engineers Inc., pp. 1027-1032, 58th ACM/IEEE Design Automation Conference, DAC 2021, San Francisco, United States, 5/12/21. https://doi.org/10.1109/DAC18074.2021.9586109

TAIT: One-Shot Full-Integer Lightweight DNN Quantization via Tunable Activation Imbalance Transfer. / Jiang, Weixiong; Yu, Heng; Liu, Xinzhe et al.
2021 58th ACM/IEEE Design Automation Conference, DAC 2021. Institute of Electrical and Electronics Engineers Inc., 2021. p. 1027-1032 (Proceedings - Design Automation Conference; Vol. 2021-December).

TY - GEN

T1 - TAIT

T2 - 58th ACM/IEEE Design Automation Conference, DAC 2021

AU - Jiang, Weixiong

AU - Yu, Heng

AU - Liu, Xinzhe

AU - Sun, Hao

AU - Li, Rui

AU - Ha, Yajun

PY - 2021/12/5

Y1 - 2021/12/5

AB - Both parameter quantization and depthwise convolution are essential measures to provide high-accuracy, lightweight, and resource-friendly solutions when deploying deep neural networks (DNNs) onto edge-AI devices. However, combining the two methodologies may lead to adverse effects: It either suffers from significant accuracy loss or long finetuning time. Besides, contemporary quantization methods are only selectively applied to weight and activation values but not bias and scaling factor values, making them less practical for ASIC/FPGA accelerators. To solve these issues, we propose a novel quantization framework that is effectively optimized for depthwise convolution networks. We discover that the uniformity of the value range within a tensor can serve as a predictor for the tensor's quantization error. Under the guidance of this predictor, we develop a mechanism called Tunable Activation Imbalance Transfer (TAIT), which tunes the value range uniformity between an activated feature map and its latter weights. Moreover, TAIT fully supports full-integer quantization. We demonstrate TAIT on SkyNet and deploy it on FPGA. Compared to the state-of-the-art, our quantization framework and system design achieve 2.2%+ IoU, 2.4 × speed, and 1.8 × energy efficiency improvements, without any requirement of finetuning.

UR - http://www.scopus.com/inward/record.url?scp=85114648944&partnerID=8YFLogxK

U2 - 10.1109/DAC18074.2021.9586109

DO - 10.1109/DAC18074.2021.9586109

M3 - Conference contribution

AN - SCOPUS:85114648944

T3 - Proceedings - Design Automation Conference

SP - 1027

EP - 1032

BT - 2021 58th ACM/IEEE Design Automation Conference, DAC 2021

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 5 December 2021 through 9 December 2021

ER -
