Zero Knowledge (ZK) technology is a fundamental building block for decentralized computing. Its two main applications are privacy-preserving computation and verifiable computation.
For specific types of ZK such as SNARK and STARK-based systems, additional properties include public verifiability, smaller proof sizes, and fast verification, making these kinds of ZK perfect for use in blockchains for scalability and privacy purposes.
At present, ZK tech is being developed and used by leading blockchains (e.g. Filecoin, Aleo), L2s (e.g. Starknet, zkSync) and decentralized applications (e.g. Dark Forest, ZKAttestor).
Unfortunately, nothing great ever comes easily. The Prover, responsible for generating the proof, must run a computationally intensive algorithm with significant data blowup during the computation. Recent estimates suggest a factor of up to 10 million in prover overhead while producing the proof, compared to directly running the computation.
Today, prover overhead is considered the main computational bottleneck for applied ZK. Without exception, every project built on ZK technology is facing or will face this bottleneck, which shows up as a penalty in latency, throughput, memory, power consumption, or cost.
A distinctive property of Zero Knowledge computation is that it runs modular arithmetic under the hood, over enormous finite fields. Experience running this workload on CPUs leads to the conclusion that modern CPU architecture is simply not built to handle this form of computation efficiently.
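To make the cost of this arithmetic concrete, here is a minimal Python sketch of one modular multiplication on a wide modulus. The 381-bit size, the stand-in modulus (not a real field prime), and the helper names are our assumptions for illustration. The reduction shown is textbook Barrett reduction, which replaces the division by the modulus with multiplications and shifts; it is not the specific variant used in any particular prover.

```python
# Stand-in 381-bit modulus, chosen only for its size (NOT a real curve parameter).
MODULUS = (1 << 381) - 1
WORD_BITS = 64

# A 64-bit CPU has to split each field element into this many machine words.
LIMBS = -(-MODULUS.bit_length() // WORD_BITS)  # ceiling division


def barrett_setup(modulus):
    """Precompute k and mu = floor(4^k / modulus), where k is the bit length."""
    k = modulus.bit_length()
    mu = (1 << (2 * k)) // modulus
    return k, mu


def barrett_reduce(x, modulus, k, mu):
    """Return x mod modulus for 0 <= x < modulus**2, without dividing by modulus."""
    q = (x * mu) >> (2 * k)   # estimate of x // modulus, never too large
    r = x - q * modulus       # r is congruent to x and only slightly too big
    while r >= modulus:       # a small number of corrective subtractions
        r -= modulus
    return r


K, MU = barrett_setup(MODULUS)


def mul_mod(a, b):
    """Multiply two field elements and reduce the double-width product."""
    return barrett_reduce(a * b, MODULUS, K, MU)
```

Python hides the word-level work behind its big integers. On a 64-bit CPU, each 6-limb multiplication above expands into dozens of word multiplications plus carry handling, and a prover performs a very large number of them.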
As a result, the need for specialized hardware ZK prover acceleration is clear.
Until now, the majority of hardware experimentation for accelerating ZK has been done with GPUs. In our recent paper, PipeMSM, we explored bringing functioning and operational ZK for the first time to FPGAs. We believe this to be a superior approach for accelerating ZK computation on the path to developing Zero Knowledge for Application Specific Integrated Circuits (ASICs).
We believe FPGAs are a better fit for ZK in terms of energy efficiency, because their logic can be configured for a specific function, whereas a GPU's architecture is fixed. With FPGAs at the base, we take a holistic approach to ZK optimization, with designs based on a novel algorithmic approach and hardware-specific optimizations.
In the paper, we use the Parallel Bucket Method and low-latency complete elliptic curve addition formulae, along with Domb-Barrett reduction, to compute Multi-Scalar Multiplications (MSMs), a major bottleneck for many ZK provers, with improved energy efficiency and speeds comparable to GPUs.
We implemented and tested our design on FPGA, and highlight the promise of optimized hardware over state-of-the-art GPU-based MSM solvers, in terms of both speed and energy expenditure.
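For readers new to MSM, the following Python sketch shows the basic (serial) bucket method on a toy curve. It is an illustration under our own assumptions, not the PipeMSM design: the curve y^2 = x^3 + 2x + 3 over F_97, the 4-bit window, and all function names are hypothetical choices, and the point addition uses simple affine formulas rather than the complete formulae mentioned above.

```python
P_MOD = 97   # toy field size, for illustration only
A = 2        # toy curve: y^2 = x^3 + 2x + 3 over F_97
INF = None   # point at infinity (group identity)


def ec_add(p, q):
    """Affine point addition on the toy curve, with special cases handled."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                                   # p + (-p)
    if x1 == x2:                                     # doubling
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_mul(k, p):
    """Double-and-add scalar multiplication, used as the naive baseline."""
    acc = INF
    while k:
        if k & 1:
            acc = ec_add(acc, p)
        p = ec_add(p, p)
        k >>= 1
    return acc


def msm_naive(scalars, points):
    """Compute sum(k_i * P_i) with one full scalar multiplication per point."""
    acc = INF
    for k, pt in zip(scalars, points):
        acc = ec_add(acc, ec_mul(k, pt))
    return acc


def msm_bucket(scalars, points, c=4, bits=16):
    """Bucket method: split each scalar into c-bit windows, sort points into buckets."""
    windows = (bits + c - 1) // c
    result = INF
    for w in reversed(range(windows)):               # most significant window first
        for _ in range(c):                           # shift the accumulated result by c bits
            result = ec_add(result, result)
        buckets = [INF] * (1 << c)
        for k, pt in zip(scalars, points):
            idx = (k >> (w * c)) & ((1 << c) - 1)    # this window's digit of k
            if idx:
                buckets[idx] = ec_add(buckets[idx], pt)
        running = INF
        window_sum = INF
        for b in reversed(buckets[1:]):              # running sum yields sum(j * bucket[j])
            running = ec_add(running, b)
            window_sum = ec_add(window_sum, running)
        result = ec_add(result, window_sum)
    return result
```

The work inside each window consists only of point additions, and the windows do not depend on one another until the final combination. That is the property a parallel or pipelined variant can exploit in hardware.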
A disadvantage of FPGAs compared to GPUs is their lack of accessibility, while GPUs have effectively become commodity hardware. This means there is a high barrier for ZK users simply getting an FPGA into their hands.
One promising approach to overcoming this barrier is cloud computing. Cloud providers lowered the barrier to entry for small companies by enabling easy and immediate access to massive data center resources. In the same way, clouds offering FPGA resources, such as AWS, Alibaba Cloud and others, have made FPGAs available with the same kind of “on-tap” accessibility.
This is a GAME-CHANGER in the world of hardware acceleration.
With the Cloud-ZK toolkit, which we have made open source, the benefits of ZK hardware acceleration on FPGAs are accessible to anyone, anywhere, anytime, at an hourly cost comparable to that of a standard CPU instance.
As part of the initial release we have included:
Our FPGA code achieves 4x the baseline of the ZPrize FPGA MSM competition, where the maximum prize criterion is 2x :)
As an example application, our AWS F1 instance can be used to accelerate the Aleo prover. The accelerator can generate up to ~5x as many proofs per second as running on CPU alone.
In the future, we plan to improve the toolkit in the following ways:
PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication