1. Introduction
In everyday life, individuals often need to prove statements to others. The simplest method is to plainly state, explain, or show evidence that can be verified. For instance, when purchasing age-restricted goods, a customer might show an identity document to prove their age to a cashier. However, this process can expose more information than necessary, such as the customer’s exact birth date and other personal details. In digital environments, the risk is even higher as servers can store copies of sensitive information. Zero-knowledge proofs (ZKPs), first introduced in a work by Goldwasser et al. [1], are a technology that could solve these problems. ZKPs allow a prover to prove a given statement such that a verifier can check the proof without obtaining any knowledge apart from the facts implied by the correctness of the statement itself. However, traditional ZKPs are interactive, meaning that they require multiple interactions between the prover and verifier before the verifier can accept or reject the statement. Additionally, other parties cannot verify the same proof afterward since this would require additional interactions. This limits the practicality of standard ZKPs. To this end, Blum et al. proposed non-interactive zero-knowledge proofs (NIZKPs) [2]. NIZKPs enable a verifier to verify a claim in a single interaction while also allowing other verifiers to verify the truth of the proven statement at another point in time.
Notably, ZKPs, especially the non-interactive variants, have gained prominence in cryptocurrencies like Zcash [3] and Ethereum [4]. In these contexts, they facilitate transaction verification without disclosing sensitive transaction details, thereby preserving privacy. Although cryptocurrencies have been the main source of interest in ZKPs due to their surge in popularity next to other blockchain technologies, the utility of ZKPs extends far beyond this domain. In our previous systematic literature review (SLR) [5], a summary of which we detail later, we collected applications of the three main NIZKP protocols relating to privacy-preserving authentication. Notably, we investigated applications and the performance of the zk-SNARK (zero-knowledge succinct non-interactive argument of knowledge) [6,7], zk-STARK (zero-knowledge scalable transparent argument of knowledge) [8], and Bulletproof [9] protocols. In the SLR, we examined a total of 41 works that applied NIZKP protocols in a diverse set of applications. However, we found high variability in protocol performance metrics between the applications, which we believed to be attributable in large part to the differences in applications and benchmarking procedures. This result indicated that a research gap exists for a comparison of the three main NIZKP protocols benchmarked in an equivalent, real-world use case.
Our aim in this work is to address the observed research gap by benchmarking the three main NIZKP protocols, each implemented in an equivalent, real-world, privacy-preserving application. The relevance of this lies mostly with researchers and application designers obtaining a meaningful overview of the main NIZKP protocols, the situations in which they excel, and their implied performance characteristics. Insights from this work can furthermore guide researchers to the main aspects of concern when applying NIZKP protocols to real-world applications. This, in turn, can incite research into mathematical improvements and newly designed NIZKP protocols that reduce the deficiencies of existing protocols.
To define our aims and objectives for this research, we first outline the key research questions that we intend to address as a result of this research work. These questions serve to guide the main direction of this research investigating the differences between the zk-SNARK, zk-STARK, and Bulletproof protocols:
What are the performance differences between the three included NIZKP protocols, as observed from a real-world implementation of each protocol in an application that is as equal as possible, expressed in efficiency and security level?
What use case contexts are the most beneficial for each NIZKP protocol, given the unique combination of its features and performance metrics?
In our previous SLR [5], the applications described in the included research works were each implemented with a single protocol. This meant that the research works were hard to compare on common grounds because of the dissimilar applications, benchmark procedures, and results. Therefore, the objective of this research is to implement a single application for the three protocols in a manner that is as similar as possible, with the direct purpose of making comparisons between the three protocols more straightforward. As a result, the comparison outcomes should be more informative. This objective is deeply embedded in the previously stated research questions, meaning that these questions will guide us toward a deep exploration of the three NIZKP protocols in a manner that aims to expose and clarify their associated differences.
We now reflect on the objectives we set for our overall research, specifying those we were unable to fully meet as outlined in the SLR. These objectives included filling the research gap by comparing the three most used NIZKP protocols and providing recommendations on the settings in which each protocol is most advantageous. The goals we aim to achieve in this research are as follows:
To implement and evaluate the protocols in a practical setting, using a common benchmark for a real-world use case.
To compare the efficiency and security of these three protocols, including their trade-offs between efficiency and security.
To provide recommendations for the use of these protocols in different applications, based on their strengths and weaknesses.
While we made advances on these objectives in our previous SLR, we intend to further progress in the development of understanding related to these aims. Therefore, this specific research work aims to more comprehensively achieve the stated objectives to determine conclusive answers to the research questions from the previous section. To conclude, our aims and objectives for this research are to further detail the performance characteristics of the three most prevalent NIZKP protocols. We aim to do so by more comprehensively comparing those protocols in a benchmark, where we implemented each protocol in an application that is as equal as possible between the three implementations. We can then thoroughly answer which aspects of each NIZKP protocol should be considered when choosing a protocol to be applied in a particular environment.
The scope of our research is twofold. First, we briefly describe the mathematical and cryptographic primitives underlying each of the three main NIZKP protocols, the intention of which is to provide a concise understanding of the fundamental techniques that differentiate them. We do not, however, aim to accomplish a comprehensive mathematical and cryptographic manual that can be used as the basis for implementing the protocol itself in code or to create a new protocol from scratch. Furthermore, we describe the security model of each protocol, next to some vulnerabilities that have surfaced in at least some of the NIZKPs included in this work. The intention is, again, not to be comprehensive; instead, the information should serve as a general overview of security aspects and security vulnerabilities to consider when choosing a NIZKP protocol. Second, this work designs and performs a benchmark comparing the three NIZKP protocols zk-SNARK, zk-STARK, and Bulletproofs on their performance and security level. In the benchmark, each protocol is implemented in a privacy-preserving, authentication-related application using general-purpose programming libraries designed for each protocol. There are several limitations to this part of our scope. First, we intend to implement each protocol in an application to enable straightforwardly comparing their performance. For this, the application should be as equal as possible. The application, however, does not need to consider and implement every aspect that a production-ready real-world application would, as long as the benchmark results are representative. Secondly, we implement each protocol within a single application. We do not create multiple application benchmarks, nor will we implement the benchmark application across an exhaustive selection of programming languages and NIZKP protocol libraries. Provided that our benchmark implements the application using at least each of the NIZKP protocols, we have achieved this scope. 
Finally, while we aspire to benchmark the security level of each protocol, we will not allocate time for an in-depth attempt to breach the security of each protocol. We leave this to other researchers, as it is more meaningful to perform such tests in the context of an actual production-ready application rather than in our representative benchmark application.
As mentioned before, the relevance of this work lies mostly in providing other researchers and application designers with a meaningful overview of the three most prevalent NIZKP protocols and the situations in which they excel. The description of their mathematical and cryptographic primitives, as well as their security aspects and trade-offs, should provide researchers with a concise reference for understanding each protocol. Next, the benchmark results should provide researchers and application designers with a novel comparison of the three NIZKP protocols in an equal setting. This, in turn, should help them make informed decisions about which protocols to apply in which real-world applications, given the performance characteristics we detailed. While our previous SLR was a first step in achieving this, this research takes it a step further, helping researchers and application designers to choose the best-fitting NIZKP protocol for their requirements.
Therefore, we believe that our work benefits multiple entities. First, it serves as additional work for researchers just entering the field of NIZKPs next to our previous SLR [5]. Second, it should help individuals and organizations interested in applying NIZKP protocols to real-world applications by providing them with insights into each protocol’s performance and suitability in privacy-preserving related applications. Ultimately, we believe that our work will benefit academia, industry, and society as a whole by advancing the understanding and application of NIZKP protocols.
We organized this work as follows. First, we summarize our previous SLR, detailing its findings and the rationale for this follow-up research. Second, we describe our methodology for performing a benchmark comparison of NIZKP protocols, including the design and approach used for analyzing our results. Third, we provide a brief overview of the mathematical and cryptographic primitives for each of the three NIZKP protocols. Fourth, we detail the setup used for the benchmark, including the software, hardware, and specifics of our implementation. Fifth, we present the results from our benchmark and analyze them. Sixth, we discuss our results by answering our research questions and detailing the strengths and limitations of this research. Finally, we conclude this research with the main findings and recommendations, as well as a description of potential future research directions.
2. Related Work
In our previous SLR, we analyzed a broad spectrum of research works that described diverse use cases related to authentication. All included works were related because of our requirement that the use case applied at least one of the three NIZKP protocols, zk-SNARK, zk-STARK, or Bulletproofs, for privacy-preserving use within the application context. Ultimately, we examined 41 research works that surfaced from our collection and filtering criteria, discussing their implementation of the NIZKP protocol, and comparing these implementations on their use case. Furthermore, we discussed the performance and security of the NIZKP in the application when a work included benchmarked figures for these. For anyone interested in a more detailed description of our SLR intentions, collection and filtering process, results, and discussion, amongst other things, we recommend consulting the full research document [5]. We limit the remainder of this section to highlight the key findings from the SLR.
To start, 31 of the 41 works included in our SLR employed the zk-SNARK protocol in their described application, whereas the other 10 works utilized the Bulletproof protocol. This means that our SLR did not end up including any works that based their application on the zk-STARK protocol. While this prevented us from drawing definitive conclusions on the proportionate use of the zk-STARK protocol compared to the other protocols, we did remark that this finding signifies the zk-STARK protocol was not commonly deployed in privacy-preserving authentication-related applications. More specifically, applications adhering to the search and filtering criteria from the SLR do not seem to utilize the zk-STARK protocol. We are confident that the reason for this will be more evident by the end of this work.
We also want to reiterate the observation that all but two works did not mention the quantum resistance of their implementation. We find this interesting, especially since none of the 41 included works applied the only quantum-resistant protocol of the three, zk-STARK. This emphasizes a lack of consideration regarding this security aspect, despite quantum computing and quantum-resistant cryptographic protocols having been important topics for the past few years [10].
Of the 41 works included in the SLR, 30 works included some form of performance analysis of the implementation. Among those, 22 employed the zk-SNARK protocol, with the remaining 8 works utilizing Bulletproofs. In the SLR, we discussed the performance results in several categories, although here we will only review the overall performance differences between all works. We observed highly varying measures in multiple categories of performance metrics, including proof size, proof generation time, and proof verification times. These variations were significant, with several orders of magnitude performance differences between the same protocol applied in different works. Considering this extreme variance in observed metrics, we concluded that it was impossible to draw any definitive conclusions from comparing the performance between applications. The research works would have to specifically perform their benchmarks in a related way to another research work for us to draw any revealing conclusions from the comparison.
We had to draw a similar conclusion for the security comparison, which proved even more difficult to perform in a reasonable way. The main reason for this difficulty involved the diverse ways researchers used to describe the security of each implementation. Some works described the security by proving mathematical theorems in either natural language or as mathematical statements, whereas others described the security requirements of their application and mentioned either how they were achieved or how attacks were mitigated through implemented security measures, just to name a few of the encountered possibilities. Altogether, our SLR had a particularly challenging time inferring any reliable security comparison outcomes from the 31 works that included some form of security analysis.
5. Proposed Solution
In this section, we describe the proposed solution according to the methodology as described in Section 3. First, in Section 5.1, we restate our implementation for the proposed solution and link this to the research gap observed in our SLR. In Section 5.2, we then describe in detail the software and hardware that were used to perform the benchmark, while in Section 5.3 we comprehensively describe the implementation of the benchmark design as outlined in Section 3.2. After that, we detail the benchmark procedure that we followed to obtain the actual results from our implementation in Section 5.4. Finally, in Section 5.5, we justify our proposed solution by briefly stating how it will address our research questions, and in Section 5.6 we present a schematic overview of our proposed solution.
5.3. Implementation
Now that we have determined which software and dependencies we want to use to implement the benchmark, we will describe the actual implementation of the benchmark using the chosen NIZKP libraries.
Our initial idea for the implementation, as described in Section 3.2, consisted of a zero-knowledge proof that a given public elliptic curve digital signature algorithm (ECDSA) key verifies a signature and is included on a list of trusted keys. The intention for such a proof was to prove that the user utilized a hardware security key from a trusted manufacturer to sign a message, without leaking the manufacturer details or batch information of the hardware security key. Our benchmark application would have implemented such a proof for each of the three NIZKP protocols, albeit without communicating with a real hardware security key, generating the public keys in code instead. Our first step in creating the implementation was to create a proof of concept using the gnark zk-SNARK library. We implemented the proof of concept in gnark because of the great documentation, familiarity with the language, and numerous existing cryptographic primitives that the codebase contained. We started with an implementation using the Edwards-curve digital signature algorithm (EdDSA) to become familiar with the gnark library, since creating a gnark circuit for proving the verification of an EdDSA signature was explained in a tutorial [21]. We expanded this proof to additionally verify that the used public key was included in a provided list of trusted public keys. We defined the public key as a secret input to the circuit, while we set the message, signature, and trusted key list as public inputs. The code for this implementation can be found in the Git repository for this research [22]. With a working implementation for EdDSA, we re-implemented the same approach in gnark for ECDSA. This process was more involved because we had to use more primitive cryptographic building blocks, yet eventually we obtained an ECDSA proof circuit working identically to the EdDSA circuit. We should note though that, since we ended up not using this implementation, we did not fully implement some aspects of the proof that did not impact functionality but would have impacted security in any real use cases. The corresponding code can be found in our Git repository [22].
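As a rough illustration of the statement described above, the following Python sketch spells out the relation that such a circuit would enforce: the secret public key must verify the public signature on the public message, and it must appear in the public list of trusted keys. The sketch is not taken from the repository for this research and is not itself zero-knowledge. A toy Schnorr signature over a tiny group stands in for EdDSA or ECDSA, and the names `sign`, `verify`, `relation`, and the parameters `P`, `Q`, and `G` are assumptions made for this sketch only.

```python
import hashlib

# Toy Schnorr group (assumption for illustration; far too small to be secure):
# P = 2*Q + 1 with P and Q prime, and G generating the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def challenge(r: int, message: bytes) -> int:
    """Hash the commitment and the message into a challenge modulo Q."""
    digest = hashlib.sha256(r.to_bytes(2, "big") + message).digest()
    return int.from_bytes(digest, "big") % Q


def sign(secret_key: int, message: bytes, nonce: int) -> tuple[int, int]:
    """Toy Schnorr signature (e, s); the nonce is passed in to keep the sketch deterministic."""
    r = pow(G, nonce, P)
    e = challenge(r, message)
    return e, (nonce + e * secret_key) % Q


def verify(public_key: int, message: bytes, signature: tuple[int, int]) -> bool:
    """Recompute the commitment r = G^s * pk^(-e) and compare challenges."""
    e, s = signature
    r = pow(G, s, P) * pow(public_key, -e, P) % P
    return challenge(r, message) == e


def relation(message, signature, trusted_keys, public_key) -> bool:
    """Public inputs: message, signature, trusted_keys. Secret witness: public_key."""
    return public_key in trusted_keys and verify(public_key, message, signature)
```

In a NIZKP for this relation, the prover would convince the verifier that `relation` evaluates to true without revealing `public_key`, which is what hides the manufacturer and batch information.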
Since we had a working zk-SNARK implementation using the gnark library, we knew that the idea would technically be possible to implement. With that said, we had to implement the same application for each of the three ZKP protocol libraries in Rust, which is where we hit some difficulties. First, while we implemented the proof-of-concept idea in gnark because it provided a tutorial, documentation, and many cryptographic primitives, this was not the case for the Rust ZKP libraries. This meant that we would have had to implement these primitives ourselves, leading to more opportunities for security issues. More importantly, we expected that this would take more time than we had available for the research. Even more critically, their creators geared the zk-STARK library toward succinctly proving computations, as opposed to knowledge like the zk-SNARK and Bulletproof libraries. This meant that the application would require a completely different approach in the STARK implementation compared to the other two protocols. On top of this, at the time of implementation, the STARK library did not provide perfect zero-knowledge. This meant that there was no option for us to provide the used public key to the circuit, as required in our proof of concept since the proof would not keep this key private. While it sounds strange to have to keep a public key secret, we reiterate that openly providing this key would reveal some privacy-sensitive information about the used hardware security key. As a result, doing so would invalidate the entire reason for utilizing a NIZKP in the application in the first place. For these reasons, we decided to abandon this idea for our benchmark application. Instead, we opted to use a more rudimentary application.
For the basic ZKP application idea that we could implement more equally for all three protocols, we implemented a hash function. Our application would ensure this hash either had a variable number of rounds or would use the hash as part of a hash chain, to enable some way to increase the required amount of work in the proof. After some deliberation between the MiMC [23], Poseidon [24], and Rescue [25] hashes, we eventually chose the MiMC hash function. This hash function is well-optimized for zero-knowledge proofs [26] and has a simple algorithm that is easy to implement in proof circuits; moreover, example implementations we could adapt and build on were available for the SNARK and Bulletproof Rust ZKP libraries. The number of rounds used in the MiMC hash can be varied in our benchmark, where each round requires a different round constant for security. This enabled us to implement the hash for all three protocols since, at least for our purposes, proving knowledge of the pre-image of a public hash is the same as proving the computation of the required hash from a pre-image provided by the prover. However, in the latter case, applicable to the STARK implementation, the pre-image would not necessarily remain private. Therefore, for equality reasons, we did not focus on these variables remaining private in the other protocols either. This is a limitation of our benchmark, for which we decided that the most important aim was to keep the proof as similar as possible. Since this limitation is important to consider for real-world implementations using ZKPs, we further discuss this limitation in Section 7.4.
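To make the round structure concrete, the sketch below evaluates a basic, keyless MiMC-style permutation in plain Python, with one cubing round per round constant. It is a minimal sketch under stated assumptions rather than the benchmarked code: the modulus `P`, the SHA-256-based `round_constants` derivation, and the function names are hypothetical choices for illustration, and the Rust libraries used in the benchmark operate over their own fields and may use a different MiMC variant.

```python
import hashlib

# Assumed field modulus for illustration: the largest prime below 2**64.
# It satisfies gcd(3, P - 1) == 1, so cubing is a permutation of the field.
P = 2**64 - 59


def round_constants(rounds: int, seed: bytes = b"mimc-sketch") -> list[int]:
    """Derive one constant per round from a seed (hypothetical derivation)."""
    return [
        int.from_bytes(hashlib.sha256(seed + i.to_bytes(4, "big")).digest(), "big") % P
        for i in range(rounds)
    ]


def mimc(x: int, constants: list[int], p: int = P) -> int:
    """Apply one cubing round per constant: x <- (x + c)^3 mod p."""
    for c in constants:
        x = pow(x + c, 3, p)
    return x
```

Increasing `rounds` lengthens the chain of cubings, which is the knob that raises the amount of work the proof has to cover.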
The MiMC hash, named for its minimal multiplicative complexity, is well suited for use in zero-knowledge proofs due to its simplicity and small number of multiplications. While this simplicity limits the complexity of the proof, potentially making results obtained with MiMC less directly applicable to more sophisticated cryptographic hash functions or complex computational problems, it is crucial to note that our benchmark intends to assess the core performance characteristics of the underlying protocols rather than specific applications.
The benchmark’s equivalence to more complex applications is ensured by adjusting the number of rounds in the MiMC hash, simulating increased computational effort akin to more sophisticated use cases. The approach is representative of complex applications since all statements, regardless of their complexity, are transformed into simple proof circuits with a varying number of gates before being processed into the proof. This method provides a foundational understanding of protocol behavior under varied computational loads, irrespective of specific hash functions or applications.
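One way to see how the round count scales the circuit is to write out the multiplication gates explicitly. The sketch below, which assumes a cubing round of the form (x + c)^3 over an illustrative prime field, records two multiplication constraints per round in the style of a rank-1 constraint system and checks them. The names `mimc_witness` and `satisfied` and the modulus `P` are hypothetical, and the check covers only the multiplication gates, not the linear wiring between them.

```python
P = 2**64 - 59  # assumed illustrative prime field, not a library's field


def mimc_witness(x: int, constants: list[int], p: int = P):
    """Return the hash image and one (a, b, c) triple per multiplication gate."""
    gates = []
    for k in constants:
        s = (x + k) % p      # linear combination: adds no multiplication gate
        sq = s * s % p       # gate 1: s * s = sq
        cube = sq * s % p    # gate 2: sq * s = cube
        gates += [(s, s, sq), (sq, s, cube)]
        x = cube
    return x, gates


def satisfied(gates, p: int = P) -> bool:
    """Check every multiplication constraint a * b = c (mod p)."""
    return all(a * b % p == c for a, b, c in gates)
```

Under these assumptions, a hash with r rounds yields 2r multiplication gates, so doubling the rounds doubles the size of the statement handed to the proof system.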
Although the MiMC application may not generalize directly to all scenarios, its purpose here is to offer a controlled environment to evaluate the protocol implementations. The focus is on the protocols’ handling of computational complexity, with the MiMC hash serving as a scalable proxy. The differences observed in performance metrics are primarily attributed to the NIZKP library’s implementation intricacies rather than the inherent limitations of the MiMC hash itself. Thus, while specific hash functions may yield different absolute performance results, the relative performance insights provided by our benchmark remain robust and informative.
To summarize, our actual implementation consisted of a proof that verifies that the prover knows a pre-image to a certain MiMC hash image. The MiMC hash had a variable number of rounds, and we provided the round constants as input to the circuit. We implemented this application in each of the three chosen Rust protocol libraries. Our implementation adapted and built upon example implementations for both the Rust SNARK library [27] and Bulletproof library [28], while we created the Winterfell STARK library implementation from scratch. Moreover, we implemented the application in the Go gnark zk-SNARK library as well, for comparison reasons described in Section 5.2. We conjecture that this implementation provided the best possible comparison between the three protocols. Where significant for such real-world implementations, we provide additional protocol-specific context in Section 6 and Section 7. We also present additional justification for our implementation idea in Section 3.2. The code for all implementations can be found in the Git repository for this research [22].
An important consideration for the Bulletproof implementation was that we did not apply any form of batch verification, even though this is one of the beneficial aspects of the Bulletproof protocol that the Bulletproof library implements. While such batch verification could reduce the total verification time compared to performing each proof verification separately, it requires an application where such batching is viable. In this work, we benchmarked the process of generating and verifying a single proof, which means that batching did not apply to our benchmark. We will discuss the implications of this in Section 7.
Finally, when inspecting our implementation, one should consider that we used seeded randomness for our benchmark. This means that the randomness we used in our implementation is not secure. Any real-world implementation should at minimum replace the seeded randomness with a cryptographically secure randomness source.
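The difference between the two kinds of randomness can be sketched in a few lines of Python. The example contrasts a seeded generator, which reproduces the same value on every run and is therefore only suitable for benchmarking, with the standard library's `secrets` module, which draws from the operating system's cryptographically secure source. The function names and the modulus `P` are assumptions made for illustration and do not mirror the Rust code of the benchmark.

```python
import random
import secrets

P = 2**64 - 59  # assumed illustrative field modulus


def benchmark_preimage(seed: int) -> int:
    """Reproducible input for benchmarking only: anyone who knows the seed can recompute it."""
    return random.Random(seed).randrange(P)


def production_preimage() -> int:
    """Unpredictable input drawn from the operating system's CSPRNG."""
    return secrets.randbelow(P)
```

Reproducibility is what makes the seeded variant convenient for repeated benchmark runs, and it is also exactly why it must not be used for secret values.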
8. Conclusions
In this section, we conclude our research in which we performed a benchmark for the zk-SNARK, zk-STARK, and Bulletproof ZKP protocols. First, in Section 8.1, we recollect the results from Section 6 and reiterate our key findings. Following our key findings, we provide some recommendations on the utilization of NIZKPs that follow from our benchmark in Section 8.2. Subsequently, we provide some promising future research directions on NIZKP aspects that we would like to see realized in Section 8.3. Finally, we close our work with a conclusion and some final remarks in Section 8.5.
8.2. Recommendations
Reflecting on the obtained results from Section 6, and the discussion that subsequently ensued in Section 7, in this section we strive to provide some recommendations on the application contexts in which we would recommend utilizing each protocol.
We start with the zk-SNARK protocol. The two implementations for this protocol showed the smallest proof sizes, in addition to the proof size itself being constant. The small proof size makes this protocol a great contender for applications where either storage space is limited, or where the network connection has a restricted capacity or transfer speed. An example of a situation where storage space is limited is in blockchain systems, for which we can see the zk-SNARK protocol already in use, e.g., in Zcash [43]. Limited network connections, on the other hand, are a reality for Low Power Wide Area Networks (LPWANs), often used in Internet of Things (IoT) applications and sensor networks where the devices are in a remote location and have low power requirements [44]. The small and constant size of the SNARK proofs, especially those created by the Rust implementation, makes the zk-SNARK protocol a good protocol to consider for these kinds of applications. Furthermore, as the benchmark showed, creating a SNARK proof is not much more compute-intensive than creating a STARK proof, which is beneficial for IoT applications where devices and sensors often have little computing power. The most important consideration to make before applying the zk-SNARK protocol, even for these applications, is whether the requirement for a trusted setup is acceptable. There is some hope for applying the zk-SNARK protocol in situations where a trusted setup is unacceptable. Researchers have recently created new SNARK backend techniques, including Supersonic [45] and Halo [46], which do not require a trusted setup in certain situations. Zcash currently uses a Halo 2 zk-SNARK backend [46] in their network, which according to them eliminates the trusted setup requirement. As it currently stands, however, the trusted setup is a definite requirement in the Groth16 backend implementation used by both the Rust and Go zk-SNARK protocol libraries benchmarked in this work. Therefore, we recommend investigating the use of the zk-SNARK protocol for applications where the proof size is a key factor, including blockchain and IoT applications, while ensuring that the trusted setup required to obtain a common reference string (CRS) is not a hindrance in said application.
For applications in which a trusted setup is not an option, the Bulletproof protocol offers a viable alternative. Bulletproof proofs are not considerably larger than SNARK proofs, especially when compared to STARK proofs. Unlike the SNARK proofs, though, the size of the Bulletproof proofs is not constant. A further downside for the applicability of the Bulletproof protocol is the much larger proof creation and verification times than in the two other protocols, which furthermore increase more rapidly with the size of the computation. At present, this makes the Bulletproof protocol less suitable for low-compute IoT environments. In applications where proof aggregation and batch verification, as discussed in Section 7.4, are possible, the proof size and especially the verification times can however be significantly reduced. This is beneficial in situations where a single prover must create the proof, but many verifiers need to verify that proof. This applies, for example, when proving and verifying transactions in blockchains, for which the Monero network [47] already applies the Bulletproof protocol. The Bulletproof protocol has yet another benefit, not visible in our benchmark since we use R1CS proofs, in that it specializes in range proofs. This allows the Bulletproof protocol to be especially beneficial and performant in applications that use ZKPs to prove that a certain value lies within a pre-determined interval. In general, applications that benefit from such range proofs include financial transactions, income checks, and age verification. There are, however, many more specialized uses for range proofs, including genomic range queries [48]. In brief, we recommend the Bulletproof protocol as a viable alternative to the SNARK protocol in situations where a trusted setup is undesirable, where the proof creation cost is not a limiting factor, or where a proof is verified frequently after it is created once. Furthermore, we recommend investigating the use of the Bulletproof protocol specifically where the proof must show that a value is inside a pre-determined range, a use case for which Bulletproof range proofs are particularly well suited.
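The statement behind a range proof can be illustrated without any cryptography. The Python sketch below checks the two constraint families of a bit-decomposition range statement, namely that every witness bit is 0 or 1 and that the bits recompose to the value, and applies them to an age check. It shows only the plain constraints; an actual Bulletproof range proof additionally hides the value in a commitment and proves the constraints in zero knowledge. All function names and the 8-bit width are assumptions made for this sketch.

```python
def bit_decomposition(value: int, n_bits: int) -> list[int]:
    """Witness: the little-endian bits of value, which exist only if 0 <= value < 2**n_bits."""
    if not 0 <= value < 2**n_bits:
        raise ValueError("value is outside [0, 2**n_bits)")
    return [(value >> i) & 1 for i in range(n_bits)]


def range_constraints_hold(value: int, bits: list[int]) -> bool:
    """Check that every bit is binary and that the bits recompose to value."""
    all_bits_binary = all(b * (b - 1) == 0 for b in bits)
    recomposes = sum(b << i for i, b in enumerate(bits)) == value
    return all_bits_binary and recomposes


def age_at_least(age: int, minimum: int, n_bits: int = 8) -> bool:
    """Age check as a range statement: age - minimum must lie in [0, 2**n_bits)."""
    try:
        bits = bit_decomposition(age - minimum, n_bits)
    except ValueError:
        return False
    return range_constraints_hold(age - minimum, bits)
```

For example, `age_at_least(21, 18)` holds because 3 fits in 8 bits, whereas `age_at_least(17, 18)` fails because -1 has no valid bit decomposition in the range.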
Finally, there is the zk-STARK protocol. Given the proof size which, in our benchmark, was at least an order of magnitude larger than that for the other two protocols, we can only recommend the use of the STARK protocol for applications where the proof size is not important. An example where the proof size is unlikely to be important is in the context of cloud computing, data centers, or machine learning. In that application context, ample storage space and network capacity are available, and datasets used as input to calculations can be extremely large to begin with. In return for the large proof sizes, we observed a low proof creation time and especially short proof verification time compared to the other protocols in our benchmark. These short proof creation and verification times become especially useful when applied to large computations as performed in data centers and machine learning. This applicability holds for the zk-STARK protocol in general, and to an even greater degree for the Winterfell library used in our benchmark. Currently, this library does not implement perfect zero-knowledge; instead, the library aims to enable succinctly proving computations. This makes it hard to securely implement applications where the proof proves a statement on confidential data, as the generated proof could leak the data. This is a significant distinction from the Bulletproof and zk-SNARK protocol implementations, which do intend to guard against the verifier obtaining confidential information. For the reasons listed above, we recommend considering the zk-STARK protocol, and specifically the Winterfell library, in situations where the application uses ZKPs to ensure the correct execution of a computation in a succinct manner. This includes, but is not limited to, machine learning, distributed or multi-party computations, and verifiable computing applications, e.g., in the cloud.
This brings us to our final advice on choosing an NIZKP protocol and library for a given application context. We recommend, where possible, creating a proof of concept for the desired application using multiple libraries that implement the same protocol. When in doubt between multiple protocols, try them all in a way that is representative yet not time-consuming. This recommendation stems from two observations. First, we faced challenges in applying the three protocols to a single, equivalent application. Second, the Rust and Go libraries both implement the same Groth16 SNARK protocol [
7], yet exhibit different performance metrics, particularly regarding the size of the proving and verifying keys in the CRS. Furthermore, we recommend not only trying out multiple protocols and multiple libraries for the same protocol, but also attempting different methods of utilizing ZKPs in the application. Specifically, when using the STARK protocol, we recommend evaluating the performance for several configurations to see which best achieves a pre-determined set of objectives for the application. All these tests can lead to vastly different performance metrics, which could make or break the usability of NIZKPs in an application context. While we understand that this recommendation requires a considerable time investment, we hope that our work can reduce it by serving as a knowledge base that limits the amount of experimentation required to find the NIZKP protocol that best fits the application needs.
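The kind of side-by-side trial recommended above can be organized with a small timing harness. The following is a minimal sketch, not the benchmark code used in this work: the names `benchmark_config` and `compare` are hypothetical, and the `prove` and `verify` callables are assumed to be thin wrappers that the reader writes around whichever library or configuration is being evaluated.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical wrapper types: `Prover` returns a serialized proof,
# `Verifier` checks it. Both wrap the library under test.
Prover = Callable[[], bytes]
Verifier = Callable[[bytes], bool]


def benchmark_config(prove: Prover, verify: Verifier, runs: int = 5) -> Dict[str, float]:
    """Time proof creation and verification and record the proof size."""
    prove_times: List[float] = []
    verify_times: List[float] = []
    proof = b""
    for _ in range(runs):
        start = time.perf_counter()
        proof = prove()
        prove_times.append(time.perf_counter() - start)

        start = time.perf_counter()
        accepted = verify(proof)
        verify_times.append(time.perf_counter() - start)
        if not accepted:
            # Timings of a rejected proof say nothing about the configuration.
            raise ValueError("verification failed")
    return {
        # The median is less sensitive to a single slow run than the mean.
        "prove_s": statistics.median(prove_times),
        "verify_s": statistics.median(verify_times),
        "proof_bytes": float(len(proof)),
    }


def compare(configs: Dict[str, Tuple[Prover, Verifier]], runs: int = 5) -> Dict[str, Dict[str, float]]:
    """Run the same measurement for every named configuration."""
    return {name: benchmark_config(p, v, runs) for name, (p, v) in configs.items()}
```

Each entry passed to `compare` could be a different library for the same protocol or a different parameter set for the same library, so that the resulting metrics can be checked against the objectives set for the application.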
Table 12 provides protocol-specific recommendations, while
Table 13 summarizes which protocols we would consider optimal for several applications. In addition to protocol-specific recommendations, we provide general recommendations in
Table 14.
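As an illustration of how the textual recommendations in this section could be turned into a first shortlist, consider the following minimal sketch. The `Requirements` fields and the `suggest_protocols` function are hypothetical names introduced here, and the rules are a simplified reading of the prose above rather than a reproduction of the tables; in particular, listing Groth16 whenever a trusted setup is acceptable is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    """Hypothetical application requirements used to shortlist protocols."""
    trusted_setup_acceptable: bool
    proof_size_matters: bool
    needs_range_proof: bool = False
    needs_zero_knowledge: bool = True


def suggest_protocols(req: Requirements) -> List[str]:
    """Return candidate protocols to prototype, most specific first."""
    suggestions: List[str] = []
    # Assumption: Groth16 stays on the list whenever its trusted setup is acceptable.
    if req.trusted_setup_acceptable:
        suggestions.append("zk-SNARK (Groth16)")
    # Bulletproofs: range proofs, or cases where a trusted setup is undesirable.
    if req.needs_range_proof:
        suggestions.insert(0, "Bulletproofs")
    elif not req.trusted_setup_acceptable:
        suggestions.append("Bulletproofs")
    # zk-STARK: only where the large proof size is not a concern.
    if not req.proof_size_matters:
        if req.needs_zero_knowledge:
            # Some libraries do not implement perfect zero-knowledge.
            suggestions.append("zk-STARK (check the library's zero-knowledge guarantees)")
        else:
            suggestions.append("zk-STARK")
    return suggestions
```

The function returns a list rather than a single answer because the shortlist is meant to be the starting point for the proof-of-concept comparison recommended above, not a replacement for it.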
8.3. Future Directions
Having covered the results, discussion, strengths, limitations, and recommendations, we now suggest directions for future research.
First, we suggest research that compares many different programming libraries implementing the same NIZKP protocol. These libraries could be written in different programming languages, as long as the implemented protocol is the same. Since comparing libraries was not our main goal, such a study would indicate the differences between libraries better than our comparison did, and it would provide a useful overview for anyone wanting to implement a given protocol in an application using a library. The comparison could cover not only the performance of the libraries but also the features that each implementation includes. In addition, a comparison of different libraries implementing an identical protocol would make it easier to implement a more detailed and interesting benchmark application. As a direct result, such a benchmark would show the specialization of the protocol more clearly than our benchmark did. We believe that this research would be valuable to anyone who aims to utilize that specific NIZKP protocol in an application.
Second, we think it would be interesting for future researchers to examine whether our initial benchmark application idea of implementing ZKAttest, as introduced by Faz-Hernández et al. [
13], for all three NIZKP protocols, would be feasible after all. We did not have the capacity to implement this application, yet future work could readily extend our current benchmark with the results of a benchmark for such an application. Such an addition would provide an even better idea of the real-world performance to expect from each protocol and its matching libraries.
Third, we believe there is room for more research into new and improved NIZKP protocols. NIZKP protocols have been studied extensively in the past few years, with the Bulletproof protocol [
9] and FRI underlying the STARK protocol [
49] originating only in 2017. Work on the zk-SNARK protocol has not been dormant either, with the introduction of Sonic [
50], Supersonic [
45], Halo [
46], and Halo 2. Zcash currently uses a Halo 2 zk-SNARK backend in their network, which, according to them, eliminates the trusted setup requirement [
46]. Even the Groth16 SNARK scheme [
7], which originated in 2016 and is widely implemented in SNARK libraries, is continuously improved upon; for example, see
Section 7.
Section 7.4 mentions work by Gailly et al. [
51] from 2021, which introduced aggregation for Groth16 proofs. As we found in this research, however, implementations in practice understandably lag behind research. Furthermore, there is still a large number of limitations and performance implications that anyone utilizing NIZKPs to prove knowledge or computations in their application must deal with. We expect that future research can resolve more of these limitations, which would open opportunities to benefit from using ZKP protocols in applications without the current downsides. For this reason, we argue that more research on NIZKP protocol improvements would benefit the ZKP ecosystem.
Fourth, as mentioned in the limitations to our work in
Section 7, our work was unable to compare in detail the actual security level of most of the benchmarked protocol implementations. This leaves open the question of which of the three protocols is most secure. Therefore, we suggest that this aspect be researched in depth in future work.
Fifth and last, we recommend research into the establishment of benchmarking standards for ZKP applications. We anticipate that such a standard would make it easier to compare research on applications implementing ZKPs, provided that the authors of these works follow the standard when benchmarking their application. Furthermore, we anticipate that an established benchmarking standard would encourage library authors to implement functionality for obtaining the metrics defined in the standard, which would make it even easier for researchers who implement an application using such a library to include the standardized ZKP metrics for comparison. While we do not expect a standard to be all-encompassing, nor do we expect every researcher to embrace it, we would still consider it an improvement over the current situation, in which comparing the performance of ZKP protocols in applications is a complex endeavor.
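To make the idea of standardized metrics more concrete, the following minimal sketch shows what a shared, machine-readable benchmark record could look like. It is purely illustrative: no such standard exists in this work, and the class name `ZkpBenchmarkRecord` and its field names are hypothetical, chosen to mirror the metrics discussed in this paper (proof size, proof creation time, verification time, and key sizes).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ZkpBenchmarkRecord:
    """Hypothetical record for one benchmark run; field names are illustrative."""
    protocol: str
    library: str
    proof_size_bytes: int
    proving_time_ms: float
    verification_time_ms: float
    # Key sizes only apply to protocols with a setup phase; default to 0 otherwise.
    proving_key_bytes: int = 0
    verifying_key_bytes: int = 0

    def to_json(self) -> str:
        """Serialize with sorted keys so that records are easy to diff."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "ZkpBenchmarkRecord":
        """Rebuild a record from its JSON form."""
        return ZkpBenchmarkRecord(**json.loads(text))
```

If libraries could emit such records directly, results from different works could be collected and compared without manual transcription.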
Alternative Zero-Knowledge Proof Protocols
In addition to zk-SNARK, zk-STARK, and Bulletproofs, other non-interactive zero-knowledge proof implementations offer various advantages depending on application requirements. All proof systems, except for zk-SNARG, are considered to be alternative zk-SNARK constructions to the Groth16 implementation benchmarked in this work. This section provides a detailed comparison of these alternatives, including their strengths, weaknesses, quantum resistance, and preferred applications.
Table 15 summarizes the key characteristics of these systems.
Table 15 provides a comprehensive comparison of various zero-knowledge proof protocols, integrating quantum resistance to give a complete overview of each protocol’s characteristics and suitability for different applications. Future research should consider these aspects to identify and develop more resilient cryptographic solutions.