In recent years, convolutional neural networks (CNNs) have become central to many artificial intelligence applications, particularly image recognition and speech recognition. Deploying a CNN in hardware rather than software can increase inference speed and reduce power consumption. In this article, we propose an FPGA-based CNN acceleration system. The system derives a lightweight CNN model from LeNet-5 by replacing the traditional convolutions with depthwise separable convolutions and reducing the number of fully connected layers. We design a parallel processing scheme for the model's computation, implement a CNN acceleration IP core in Verilog, and apply it to real-time handwritten digit recognition. The system recognizes one image frame in 326.24 μs, approximately 1 s faster than recognition on a CPU. The total power consumption of the entire system is 1.947 W, meeting the requirements of high real-time performance and low power consumption.
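
To illustrate why depthwise separable convolution makes the model lighter, the following minimal sketch compares weight counts for a single layer. The layer shape (5x5 kernel, 6 input channels, 16 output channels) is an assumption modeled on the second convolutional layer of LeNet-5 as it is commonly implemented, not necessarily the configuration used in this system; the function names are hypothetical and bias terms are omitted.

```python
def standard_conv_weights(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_weights(k, c_in, c_out):
    """Weights in a depthwise separable convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed layer shape: 5x5 kernel, 6 input channels, 16 output channels.
k, c_in, c_out = 5, 6, 16
standard = standard_conv_weights(k, c_in, c_out)
separable = depthwise_separable_weights(k, c_in, c_out)

print(f"standard conv weights:       {standard}")
print(f"depthwise separable weights: {separable}")
print(f"ratio:                       {separable / standard:.4f}")
```

Under these assumed dimensions the ratio equals 1/c_out + 1/k^2, so the saving grows with the number of output channels and with the kernel size.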