Product Update: ICICLE V1.2.0

Published on: 
Feb 9, 2024

ICICLE is a library for ZK acceleration using CUDA-enabled GPUs.

Learn more about how to use ICICLE on our documentation website.

Highlights:

  • Support for Poseidon hash accompanied by a dedicated tree builder.
  • Support for a new NTT algorithm
  • Improved MSM design with support for G2 and batched MSM
  • Rust bindings now supports Windows
  • Fixed overflow in large coset NTTs

Poseidon hash

Poseidon is the most popular zk-friendly hash function. Our Poseidon hash implementation is based on Filecoin’s optimized instantiation (Neptune).

Key features:

  • GPU-Optimized Parallelization — Our Poseidon hash supports “hash-many” mode which computes multiple hashes in parallel, this is more parallel friendly and produces best results on GPU.
  • Flexible Arities — We support arities of 2, 4, 8, and 11, covering the most common use cases, while still being adaptable to any arity greater than one.
  • Optimized Constants — We currently only support optimized constants for all officially supported Curves
  • User generated constants — A user can generate his/her own constants for other curves and configurations (this may not be so simple for every curve, more details in the documentation).

Merkle Tree Builder

Complementing Poseidon, we have also implemented a Merkle tree builder, optimized for GPU. The tree builder supports only Poseidon hash and is optimized for use with Poseidon hash. The implementation allows a user to specify how many top rows of digests he wants to store in memory. This is very useful for tall trees.

The tree builder will first determine the height of the subtree that will fit 1GB of device memory. Then a number of streams will be created to match YourDeviceMemory / 1GB. All the subtrees will be built by this stream pool. This allows copying the leaves from the host and building the subtrees in parallel — using the maximum of the GPUs and host RAM+CPU capabilities.

Performance

We ran the Poseidon hash tree builder on the following specs:

CPU: 12th Gen Intel(R) Core(TM) i9–12900K

GPU: RTX 3090 Ti

Tree height: 30 (2²⁹ elements)

The benchmarks include copying data to and from the device.

We invite you to read our updated documentation regarding this feature.

New NTT Algorithm — Mixed Radix

Until now we only supported the Radix-2 algorithm which excels at smaller NTT sizes. Mixed Radix NTT becomes particularly advantageous for larger sequence lengths or when the sequence length is a product of small primes.

Key features:

  • New algorithm — Users can now select between Radix-2 and Mixed Radix, based on NTT size and number of batches. This allows users to fine tune performance for specific use cases.
  • Algorithm autoselect — ICICLE NTT will attempt to automatically select the best algorithm for your inputs. This of course can be overridden.

We currently define small NTTs as logn < 16, however we will be adding a more dynamic and heuristic approach for auto selecting the most favorable algorithm for inputs.

If you wish to override “autoselection” of the algorithm, ICICLE allows you to explicitly define which algorithm you wish to use.

Performance

We invite you to read our updated documentation regarding this feature.

Improved MSM

MSM support for extension curves (G2 Curves) was broken in ICICLE V1.0. We fixed extension curve support in this release. The fix details can be found in the pull request.

We merged MSM and Batched MSM into a single method. Batched MSM now also supports bucket accumulation in addition to large triangle accumulation, bringing its performance to on par with our latest algorithm. The new bucket accumulation method is more parallel friendly and should be used for most use cases. Reference the pull request to understand the implementation better.

We invite you to read our updated documentation regarding this feature.

Bug fixes

Resolved overflow error occurring in NTT

When performing the NTT on a subset of size 2¹⁶ derived from a larger set with 2¹⁷ elements, the process fails due to the index variable in the BatchMulKernel component exceeding its maximum value. This overflow error occurs because the index variable is declared as an int type, which cannot accommodate the large index values needed for this operation. The issue is resolved by changing the index variable to a long type, which has a larger capacity, preventing overflow and ensuring the operation completes successfully.

For more details please refer to the pull request.

Rust bindings now support windows

We fixed the Rust build on Windows by conditionally managing curve configurations and streamlining complex type handling within Rust, rather than through external C functions. For a detailed explanation of the fix please refer to the pull request.

Wrapping up

For a full list of changes view our change log. Upcoming features (to be released in the next few days):

  • Polynomial arithmetic
  • Redesigned Golang support
  • Multi GPU support
  • Small fields support

If you are interested in testing these features pre-release, or have some thoughts about design considerations, please reach out to Immanuel

light

Written by

Table of Contents

Want to discuss further?

Ingonyama is commited to developing hardware for a private future using Zero Knowledge Proofs.

Get in touch
Get our RSS feed