The quantization of synaptic weights is a powerful technique for reducing the memory footprint and computational complexity of neural networks (NNs), particularly in energy-constrained edge and on-chip learning systems. By discretizing continuous weights into low-bit representations, quantization not only minimizes storage but also enables simplified computations, reducing energy and latency during both on-chip training and inference. However, the practical implementation of quantized synaptic weights is constrained by the non-ideal characteristics of memory devices, including a limited number of discrete quantized states, substantial intrinsic device variations, and stochasticity in the synaptic state writing process. This paper presents a quantized triple-level spin-orbit torque magnetic random access memory (SOT-MRAM) based synaptic architecture, specifically tailored for on-chip training and inference of NNs. A detailed device model capturing the non-idealities of the triple-level cell (TLC), such as stochasticity and process variations, is developed and integrated into system-level simulations. To support training with low-resolution weights, we develop a custom quantization-aware learning algorithm that dynamically adapts to device constraints. A VGG-8 network for CIFAR-10 image classification was simulated using the extracted synaptic device characteristics. The proposed quantized neural network achieves accuracies of 93.92% during training and 92.12% during inference. The simulation results show 28.60×, 8.13×, and 3.18× reductions in energy, latency, and area, respectively, compared to complementary metal-oxide-semiconductor (CMOS)-based floating-point precision synaptic weights.
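To make the triple-level quantization concrete, the following is a minimal sketch of mapping continuous weights onto three device states with optional Gaussian write noise to mimic device variation. The threshold choice, scaling rule, and noise model are illustrative assumptions for a generic three-level scheme, not the paper's exact algorithm or device model.

```python
import numpy as np

def quantize_tlc(w, noise_std=0.0, rng=None):
    """Map continuous weights onto three levels {-1, 0, +1}, scaled by the
    mean absolute weight. noise_std adds Gaussian perturbation to the
    written value, a simple stand-in for stochastic device writes
    (illustrative, not the paper's device model)."""
    rng = rng or np.random.default_rng()
    scale = np.mean(np.abs(w)) + 1e-12
    # Threshold at half the scale, as in common ternary-weight schemes.
    q = np.where(w > 0.5 * scale, 1.0,
                 np.where(w < -0.5 * scale, -1.0, 0.0))
    ideal = q * scale
    written = ideal + noise_std * rng.standard_normal(w.shape)
    return written, ideal  # noisy device value, ideal quantized value

# In quantization-aware training, the forward pass would use the written
# (device) weights while gradients update a continuous shadow copy
# (straight-through estimator); here we only show the mapping itself.
w = np.array([0.9, -0.7, 0.1, -0.05, 0.4])
w_dev, w_ideal = quantize_tlc(w, noise_std=0.0)
```

With `noise_std=0.0` the written and ideal values coincide; a nonzero value models the variation that the quantization-aware training must tolerate.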