An FPGA-based Multi-Core Overlay Processor for Transformer-based Models

Shaoqiang Lu, Tiandong Zhao, Rumin Zhang, Ting Jung Lin, Chen Wu, Lei He

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Transformer-based models have achieved extensive success with increasingly large numbers of parameters and computations, for which many multi-core accelerators have been developed. Nevertheless, they suffer from limited throughput due to either low operating frequency or high communication overhead between cores. This paper proposes an FPGA-based multi-core overlay processor, MCore-OPU, to optimize intra-core computation and inter-core communication. First, we boost the operating frequency of the processing element (PE) array to double that of the rest of the processor to improve the intra-core throughput. Second, we develop on-chip synchronization routers to reduce expensive off-chip memory traffic, where only the partial sum and maximum are communicated between cores rather than entire vectors for layer normalization and softmax. Meanwhile, we optimize the multi-core model allocation and scheduling to minimize the inter-core communications and maximize the intra-core computation efficiency. The MCore-OPU is implemented with four cores and four DDRs on the Xilinx U200 FPGA, where the PE array runs at 600 MHz and the rest runs at 300 MHz. Experimental results show that the MCore-OPU outperforms other FPGA-based accelerators by 1.24x-1.39x and the A100 GPU by 5.31x-5.81x in throughput per DSP for BERT, ViT, GPT-2 and LLaMA inference.
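
The idea of exchanging only a partial sum and a maximum for softmax can be illustrated with a minimal software sketch. This is not the MCore-OPU implementation: the function names (`local_stats`, `combine_stats`, `sharded_softmax`) and the choice to model each core as a Python list shard are assumptions made for illustration. Each core reduces its shard to two scalars, the scalars are merged, and every core then normalizes its own shard locally.

```python
import math

def local_stats(shard):
    """Per-core step (hypothetical): local maximum and the sum of
    exponentials taken relative to that local maximum."""
    m = max(shard)
    s = sum(math.exp(x - m) for x in shard)
    return m, s

def combine_stats(stats):
    """Merge step (hypothetical stand-in for the synchronization router):
    only one (max, partial sum) pair per core is exchanged."""
    g = max(m for m, _ in stats)
    # Rescale each partial sum from its local maximum to the global one.
    total = sum(s * math.exp(m - g) for m, s in stats)
    return g, total

def sharded_softmax(shards):
    """Softmax over a vector split across cores, one shard per core."""
    g, total = combine_stats([local_stats(sh) for sh in shards])
    return [[math.exp(x - g) / total for x in sh] for sh in shards]

def reference_softmax(xs):
    """Single-core softmax used only to check the sharded result."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    t = sum(e)
    return [v / t for v in e]
```

With four shards standing in for four cores, the sharded result should match the single-core softmax up to floating-point rounding, while only two scalars per core cross the core boundary instead of the full vector. A similar reduction over partial sums could plausibly serve layer normalization, though the exact quantities exchanged there are not stated in the abstract.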

Original language: English
Title of host publication: 2024 International Symposium of Electronics Design Automation, ISEDA 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 697-702
Number of pages: 6
ISBN (Electronic): 9798350352030
Publication status: Published - 2024
Externally published: Yes
Event: 2024 International Symposium of Electronics Design Automation, ISEDA 2024 - Xi'an, China
Duration: 10 May 2024 to 13 May 2024

Publication series

Name: 2024 International Symposium of Electronics Design Automation, ISEDA 2024

Conference

Conference: 2024 International Symposium of Electronics Design Automation, ISEDA 2024
Country/Territory: China
City: Xi'an
Period: 10/05/24 to 13/05/24

Keywords

  • FPGA Overlay Processor
  • Multi-Core
  • Synchronization Router
  • Transformer

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering
  • Electronic, Optical and Magnetic Materials
  • Control and Optimization
  • Modelling and Simulation
  • Atomic and Molecular Physics, and Optics
