Tsou/Chen/2022a: This is SPATEM! A Spatial-Temporal Optimization Framework for Efficient Inference on ReRAM-based CNN Accelerator

Bibtype Inproceedings
Bibkey Tsou/Chen/2022a
Author Tsou, Yen-Ting and Chen, Kuan-Hsun and Yang, Chia-Lin and Cheng, Hsiang-Yun and Chen, Jian-Jia and Tsai, Der-Yu
Title This is SPATEM! A Spatial-Temporal Optimization Framework for Efficient Inference on ReRAM-based CNN Accelerator
Booktitle 27th Asia and South Pacific Design Automation Conference (ASP-DAC)
Pages 702-707
Publisher IEEE
Abstract Resistive memory-based computing-in-memory (CIM) has been considered a promising solution for accelerating convolutional neural network (CNN) inference: it stores the weights in crossbar memory arrays and performs in-situ matrix-vector multiplications (MVMs) in an analog manner. Several techniques assume that a whole crossbar can operate concurrently and discuss how to map the weights onto crossbar arrays efficiently. In practice, however, the accumulated effect of per-cell current deviation and Analog-to-Digital-Converter overhead may greatly degrade inference accuracy. This motivates the concept of the Operation Unit (OU), by which each per-cycle operation in a crossbar involves only a limited number of wordlines and bitlines, so as to preserve satisfactory inference accuracy. With OU-based operations, the weight mapping and the scheduling strategy for parallelizing CNN convolution operations should take communication overhead and resource utilization into account to optimize inference acceleration. In this work, we propose SPATEM, the first optimization framework that efficiently executes MVMs with OU-based operations on ReRAM-based CIM accelerators. It decouples the design space into tractable steps, models the expected inference latency, and derives an optimized spatial-temporal-aware scheduling strategy. Compared with state-of-the-art approaches, the experimental results show that the scheduling strategy derived by SPATEM achieves on average a 29.24% inference latency reduction with 31.28% less communication overhead by exploiting more originally unused crossbar cells.
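To make the OU concept from the abstract concrete, the sketch below decomposes a matrix-vector multiplication into OU-sized sub-operations: each "cycle" activates only a limited tile of wordlines and bitlines, and the partial sums are accumulated digitally. This is an illustrative assumption for exposition, not the paper's implementation; the OU dimensions, the function name `ou_based_mvm`, and the cycle count are all hypothetical.

```python
def ou_based_mvm(weights, x, ou_rows=2, ou_cols=1):
    """Compute the crossbar MVM x @ weights via OU-sized tiles.

    weights: list of n_wordlines rows, each with n_bitlines entries
             (cell conductances on the crossbar)
    x:       input vector of length n_wordlines (wordline voltages)
    ou_rows, ou_cols: wordlines/bitlines activated per OU operation
                      (illustrative values, not the paper's configuration)
    Returns (output vector, number of OU operations/cycles used).
    """
    n_rows, n_cols = len(weights), len(weights[0])
    out = [0.0] * n_cols
    cycles = 0
    for r in range(0, n_rows, ou_rows):          # tile over wordlines
        for c in range(0, n_cols, ou_cols):      # tile over bitlines
            # One OU operation: only this tile's cells contribute,
            # limiting accumulated current deviation and ADC load.
            for cc in range(c, min(c + ou_cols, n_cols)):
                acc = 0.0
                for rr in range(r, min(r + ou_rows, n_rows)):
                    acc += weights[rr][cc] * x[rr]
                out[cc] += acc                   # digital partial-sum add
            cycles += 1
    return out, cycles

# Tiny demo: 3 wordlines x 2 bitlines, all-ones input.
W = [[1, 2], [3, 4], [5, 6]]
x = [1, 1, 1]
y, n_cycles = ou_based_mvm(W, x)
```

With 2x1 OUs, the 3x2 crossbar needs ceil(3/2) * ceil(2/1) = 4 OU operations instead of one full-array cycle, which is exactly the latency/accuracy trade-off that SPATEM's scheduling strategy optimizes across crossbars.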
Year 2022
Project SFB876-A1
Doi 10.1109/ASP-DAC52403.2022.9712536
Issn 2153-697X