
Hardware Acceleration of Number Theoretic Transform for zk-SNARK
  • Haixu Zhao, Jiangnan University
  • Dong Ding, Jiangnan University
  • Feng Wang, Jiangnan University
  • Pengcheng Hua, Jiangnan University
  • Ning Wang, Jiangnan University
  • Qin Wu, Jiangnan University
  • Zhilei Chai, Jiangnan University (corresponding author: zlchai@jiangnan.edu.cn)

Abstract

zk-SNARKs unlock the great potential of zero-knowledge proofs (ZKPs) in blockchain, distributed storage, and other applications. However, proof generation in zk-SNARKs is extremely time-consuming, which makes it challenging to deploy a high-performance zk-SNARK in most real applications. The NTT (Number Theoretic Transform), one of the most time-consuming parts of proof generation, therefore needs to be accelerated significantly. To address this issue, we propose a novel and efficient “data reordering” technique that enables a highly pipelined architecture, on which we design an FPGA-based hardware accelerator that supports the large-bitwidth, large-scale NTT tasks in zk-SNARKs. Our architecture achieves a two-level pipeline: 1) the top-level pipeline runs across smaller NTT sub-tasks decomposed from a large-scale NTT task; 2) the bottom-level pipeline runs within each sub-task, across butterfly operations with different step sizes. This architecture effectively reduces data dependencies and memory-access requirements, and it can be flexibly scaled to FPGAs of different sizes. To balance computing efficiency and flexibility, we implement the heterogeneous acceleration system with OpenCL and HLS (high-level synthesis). We prototype the accelerator on the AMD-Xilinx Alveo U50 card (UltraScale+ XCU50 FPGA). The evaluation results show that 1) our accelerator scales well across FPGAs of different sizes, with stable performance improvements; 2) it is 1.95× faster than the NTT implementation in PipeZK; and 3) when integrated into Bellman, a well-known open-source ZKP project, it achieves 27.98× and 1.74× speedups and 6.9× and 6× energy-efficiency improvements over a single core and all 12 cores of an AMD Ryzen 9 5900X, respectively.
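
The two pipeline levels described above rest on two standard facts about the NTT: a large transform of size N = n1·n2 can be split into many smaller, mutually independent sub-NTTs, and each small radix-2 NTT is a sequence of butterfly stages whose step size doubles from stage to stage. The Python sketch below illustrates both. It is a minimal software model under stated assumptions, not the authors' accelerator: it uses the textbook iterative Cooley-Tukey NTT and the classic four-step decomposition (which may differ from the paper's “data reordering” technique), the toy 30-bit prime 998244353 stands in for the roughly 255-bit scalar fields used in zk-SNARKs, and all function names are hypothetical.

```python
# Minimal sketch, not the paper's design. Assumptions (hypothetical choices):
# toy prime P = 998244353 = 119 * 2**23 + 1 with primitive root 3; all
# function names below are illustrative. Python ints are unbounded, so the
# same code also runs with a ~255-bit NTT-friendly prime and its root.
P = 998244353
G = 3  # primitive root modulo P


def root_of_unity(n):
    """Primitive n-th root of unity mod P (n must divide P - 1)."""
    assert (P - 1) % n == 0
    return pow(G, (P - 1) // n, P)


def naive_ntt(a, w):
    """Reference O(n^2) transform: A[k] = sum_j a[j] * w^(j*k) mod P."""
    n = len(a)
    return [sum(a[j] * pow(w, j * k, P) for j in range(n)) % P
            for k in range(n)]


def radix2_ntt(a, w):
    """Iterative Cooley-Tukey NTT; len(a) must be a power of two.

    Each pass of the outer while-loop is one butterfly stage, and the step
    size (distance between the two inputs of a butterfly) doubles from 1
    up to n/2.
    """
    a = list(a)
    n = len(a)
    # Bit-reversal permutation so the outputs come out in natural order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    step = 1
    while step < n:
        w_stage = pow(w, n // (2 * step), P)  # primitive (2*step)-th root
        for start in range(0, n, 2 * step):
            t = 1
            for k in range(start, start + step):
                u, v = a[k], a[k + step] * t % P
                a[k], a[k + step] = (u + v) % P, (u - v) % P  # butterfly
                t = t * w_stage % P
        step *= 2
    return a


def four_step_ntt(a, w, n1, n2):
    """Size-(n1*n2) NTT built from n2 size-n1 and n1 size-n2 sub-NTTs.

    Index maps: input j = n2*j1 + j2, output k = k1 + n1*k2.
    """
    n = n1 * n2
    assert len(a) == n
    w1 = pow(w, n2, P)  # primitive n1-th root
    w2 = pow(w, n1, P)  # primitive n2-th root
    # Step 1: n2 independent size-n1 NTTs over strided inputs a[n2*j1 + j2].
    cols = [radix2_ntt(a[j2::n2], w1) for j2 in range(n2)]  # cols[j2][k1]
    # Step 2: multiply by the twiddle factors w^(j2*k1).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] = cols[j2][k1] * pow(w, j2 * k1, P) % P
    # Step 3: n1 independent size-n2 NTTs over j2, one for each k1.
    rows = [radix2_ntt([cols[j2][k1] for j2 in range(n2)], w2)
            for k1 in range(n1)]
    # Step 4: reorder (transpose) so that out[k1 + n1*k2] = rows[k1][k2].
    out = [0] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out


if __name__ == "__main__":
    n1, n2 = 4, 8
    w = root_of_unity(n1 * n2)
    data = [(7 * i + 3) % P for i in range(n1 * n2)]
    assert radix2_ntt(data, w) == naive_ntt(data, w)
    assert four_step_ntt(data, w, n1, n2) == naive_ntt(data, w)
    print("both NTTs match the naive transform")
```

In this model, steps 1 and 3 of `four_step_ntt` each consist of independent sub-NTTs, which is the kind of independence a top-level pipeline across sub-tasks can exploit, while the outer loop of `radix2_ntt` corresponds to butterfly stages with different step sizes. The strided reads in step 1 and the transpose in step 4 show the irregular memory accesses that a reordering scheme has to handle; how the paper's technique actually arranges them is not captured here.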

Submission history
  • 29 Nov 2022: Submitted to Engineering Reports
  • 30 Nov 2022: Submission Checks Completed
  • 30 Nov 2022: Assigned to Editor
  • 30 Nov 2022: Review(s) Completed, Editorial Evaluation Pending
  • 02 Dec 2022: Reviewer(s) Assigned
  • 27 Dec 2022: Editorial Decision: Revise Major
  • 01 Jan 2023: 1st Revision Received
  • 03 Jan 2023: Submission Checks Completed
  • 03 Jan 2023: Assigned to Editor
  • 03 Jan 2023: Review(s) Completed, Editorial Evaluation Pending
  • 05 Jan 2023: Reviewer(s) Assigned
  • 30 Jan 2023: Editorial Decision: Accept