MambaOPU: An FPGA Overlay Processor for State-space-duality-based Mamba Models

Shaoqiang Lu, Xuliang Yu, Tiandong Zhao, Siyuan Miao, Xinsong Sheng, Chen Wu, Liang Zhao, Ting Jung Lin, Lei He

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

State-space models (SSMs), such as Mamba, have emerged as a promising alternative to Transformers. However, the recently developed Mamba2, which is based on state space duality (SSD), is highly memory-bound and suffers from low computational efficiency. This inefficiency arises from its irregular broadcast element-wise multiplications and structured sparse computations. In this work, we propose MambaOPU, an FPGA overlay processor that accelerates SSD. First, to reduce memory overhead, we introduce a software-hardware co-optimized operator fusion framework. Specifically, operator merging combines adjacent broadcast multiplication and summation operations into a single descriptor, while operator backward shifting embeds segment multiplication into subsequent operations. Both techniques shorten the computation path and improve computational efficiency. Second, to improve sparse computation efficiency, we skip computations over zero regions using a tensor-reorder-and-group algorithm combined with a sparse-predefined data fetcher. Additionally, since Mamba combines linear operations with SSD, we develop a reconfigurable systolic array that improves data reuse across the different computation modes. Extensive experimental results demonstrate that MambaOPU achieves up to 1812× and 880.79× higher normalized throughput, and up to 12908× and 24.27× higher energy efficiency, than an Intel Xeon Gold 6348 CPU and an NVIDIA A100 GPU, respectively.
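To make the "structured sparse computations" and "zero-region" skipping above concrete, the sketch below compares the recurrent form of a single-channel SSD layer with its dual (quadratic) form, y_i = Σ_{j≤i} exp(a_{j+1} + … + a_i)(C_i · B_j) x_j. Because the mask is causal, every tile that lies strictly above the diagonal is all-zero and can be skipped outright, and only diagonal tiles need element-level masking. This is a generic illustration under stated assumptions (one channel, a scalar log-decay a_t per step, plain Python lists), not MambaOPU's tensor-reorder-and-group algorithm or its data fetcher. The functions `ssd_recurrent` and `ssd_tiled_causal` and the `tile` parameter are hypothetical names introduced only for this example.

```python
import math
import random


def ssd_recurrent(x, a, B, C):
    """Reference recurrence (illustrative): h_t = exp(a_t) * h_{t-1} + B_t * x_t; y_t = C_t . h_t."""
    n = len(B[0])
    h = [0.0] * n
    y = []
    for t in range(len(x)):
        decay = math.exp(a[t])
        h = [decay * h[k] + B[t][k] * x[t] for k in range(n)]
        y.append(sum(C[t][k] * h[k] for k in range(n)))
    return y


def ssd_tiled_causal(x, a, B, C, tile=2):
    """Dual (quadratic) form evaluated tile by tile (hypothetical helper).

    Tiles lying strictly above the diagonal contain only masked (zero) entries,
    so they are skipped; only diagonal tiles need per-element masking.
    """
    length = len(x)
    # prefix[i] = a_0 + ... + a_{i-1}, so a_{j+1} + ... + a_i = prefix[i+1] - prefix[j+1]
    prefix = [0.0]
    for v in a:
        prefix.append(prefix[-1] + v)
    y = [0.0] * length
    computed = skipped = 0
    n_tiles = (length + tile - 1) // tile
    for ti in range(n_tiles):
        for tj in range(n_tiles):
            if tj > ti:  # every entry in this tile has j > i: zero region, skip it
                skipped += 1
                continue
            computed += 1
            for i in range(ti * tile, min((ti + 1) * tile, length)):
                for j in range(tj * tile, min((tj + 1) * tile, length)):
                    if j > i:  # element-level causal mask, only hit on diagonal tiles
                        continue
                    decay = math.exp(prefix[i + 1] - prefix[j + 1])
                    cb = sum(c * b for c, b in zip(C[i], B[j]))
                    y[i] += decay * cb * x[j]
    return y, computed, skipped


# Usage: both forms agree; with 8 steps and tile=2, 10 of the 16 tiles are computed and 6 skipped.
random.seed(0)
L, N = 8, 3
x = [random.uniform(-1, 1) for _ in range(L)]
a = [-random.uniform(0.0, 0.5) for _ in range(L)]
B = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(L)]
C = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(L)]
y_ref = ssd_recurrent(x, a, B, C)
y_tiled, computed, skipped = ssd_tiled_causal(x, a, B, C, tile=2)
print(max(abs(p - q) for p, q in zip(y_ref, y_tiled)) < 1e-9, computed, skipped)  # True 10 6
```

In this sketch the decay factor and the C_i · B_j product are formed on the fly inside the inner loop instead of being materialized as separate matrices first. This is loosely analogous in spirit to fusing adjacent multiplication and summation steps, but how MambaOPU actually encodes such fusion in its descriptors is not described here.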

Original language: English
Title of host publication: 2025 62nd ACM/IEEE Design Automation Conference, DAC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331503048
Publication status: Published - 2025
Externally published: Yes
Event: 62nd ACM/IEEE Design Automation Conference, DAC 2025 - San Francisco, United States
Duration: 22 Jun 2025 to 25 Jun 2025

Publication series

Name: Proceedings - Design Automation Conference
ISSN (Print): 0738-100X

Conference

Conference: 62nd ACM/IEEE Design Automation Conference, DAC 2025
Country/Territory: United States
City: San Francisco
Period: 22/06/25 to 25/06/25

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modelling and Simulation
