Accelerating Quantized DNN Layers on RISC-V with a STAR MAC Unit
Urbinati, Luca (second author)
2024
Abstract
To support quantized neural networks on low-end CPUs, we propose STAR MAC, a reconfigurable multiply-and-accumulate unit based on a modified Baugh-Wooley architecture that operates at a variable reduced precision. We integrated it into Ibex, a small RISC-V processor, obtaining a speedup of up to 5.8× in Fully-Connected (FC) layers, 3.7× in 2D-Convolution (2DConv) layers, and 2.8× in Depth-Wise Convolution (DWConv) layers with respect to the original Ibex core (Orig.), and of up to 4.5× in FC layers, 3.0× in 2DConv layers, and 2.3× in DWConv layers against a modified Ibex core supporting standard 32-bit MAC operations (Orig.+MAC). In a 28-nm technology with 200 and 600 MHz target clock frequencies, area and power are 0.015 and 0.017 mm², and 1.5 and 4.3 mW, respectively, with a limited overhead within 10% and 3% with respect to Orig., and within 3% and 3% against Orig.+MAC.
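As a rough illustration of what a variable reduced-precision MAC computes, the following Python sketch models a sub-word dot product in software. It is not the STAR MAC design: the packing of operands into 32-bit words, the supported widths (4, 8, 16), and the names `pack_signed`, `unpack_signed`, and `packed_mac` are all assumptions made for this example, and the accumulator is an unbounded Python integer rather than a fixed-width register.

```python
WORD_BITS = 32  # assumed register width for this sketch


def pack_signed(values, bits):
    """Pack signed integers into one word, least significant field first (hypothetical layout)."""
    if len(values) > WORD_BITS // bits:
        raise ValueError("too many values for one word")
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)  # two's-complement truncation to `bits`
    return word


def unpack_signed(word, bits):
    """Split a word into WORD_BITS // bits two's-complement fields."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    fields = []
    for i in range(WORD_BITS // bits):
        raw = (word >> (i * bits)) & mask
        fields.append(raw - (1 << bits) if raw & sign else raw)
    return fields


def packed_mac(acc, a_word, b_word, bits):
    """Return acc plus the sum of products of corresponding sub-word fields."""
    if bits not in (4, 8, 16):  # widths assumed for illustration only
        raise ValueError("unsupported precision")
    a = unpack_signed(a_word, bits)
    b = unpack_signed(b_word, bits)
    return acc + sum(x * y for x, y in zip(a, b))


# Four 8-bit products accumulated in a single call.
a = pack_signed([1, -2, 3, -4], 8)
b = pack_signed([5, 6, 7, -8], 8)
result = packed_mac(0, a, b, 8)
```

In this model, one call at 8-bit precision performs four multiply-accumulates on operands that a standard 32-bit MAC would process one pair at a time, which gives an intuition for why lower precisions can shorten the inner loops of FC and convolution layers.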


