Recently, SRAM-embedded compute-in-memory (CIM) hardware has emerged as a promising solution to mitigate the von Neumann bottleneck, demonstrating notable improvements in energy efficiency and throughput for matrix-vector multiplication, which constitutes a significant portion of neural-network workloads. While process, voltage, and temperature (PVT) variations significantly impact traditional analog/mixed-signal (AMS) macros, digital CIM (DCIM) macros are more robust. This article proposes a DCIM macro built on an 8-transistor (8T) SRAM bitcell that performs 1-bit multiplication and addresses the bit-flip issue arising from the simultaneous activation of multiple array rows. The macro also includes a 2D interleaved adder tree constructed from novel 7T-based ripple-carry adders (RCAs), significantly reducing the adder tree's area. The proposed 16-Kb DCIM macro computes 64 parallel products in a single clock cycle and demonstrates 2× higher energy efficiency than recent state-of-the-art works in 65-nm CMOS. The macro is validated at 250 MHz and, with the LeNet-5 architecture, achieves classification accuracies of 98.7% and 98.8% at 1-bit-activation/4-bit-weight (1A4W) precision, and 99.1% and 97.8% at 4A4W precision, on the MNIST and A-Z alphabet datasets, respectively.
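The core operation can be illustrated behaviorally: in a DCIM column, each 1-bit multiplication reduces to a logical AND of an activation bit with a stored weight bit, and the 64 partial products are summed by the adder tree in one cycle. The following Python sketch models this dataflow only; the function name and list-based interface are illustrative assumptions, not the macro's hardware description.

```python
def dcim_column_mac(activation_bits, weight_bits):
    """Behavioral model of one DCIM column: 64 parallel 1-bit
    multiplications followed by an adder-tree reduction (assumed
    interface; not the authors' RTL)."""
    assert len(activation_bits) == len(weight_bits) == 64
    # 1-bit multiplication is a logical AND of activation and weight bits.
    partial_products = [a & w for a, w in zip(activation_bits, weight_bits)]
    # The hardware reduces these with a 2D interleaved adder tree built
    # from 7T-based RCAs; a plain sum models the same reduction result.
    return sum(partial_products)
```

Multi-bit precisions such as 4A4W are then obtained by bit-serial shifting and accumulation of these 1-bit column results over activation and weight bit positions.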