Zero-knowledge rollups are getting increasingly popular thanks to their innovative approach to scaling Ethereum and are expected to overshadow optimistic ones in a few years. As Tenderly strives to support Web3 teams and individuals in their development journeys, providing support for ZK rollups is an essential part of this effort.
The first step in the integration process was finding an implementation of the EVM with support for zero-knowledge proofs. We decided to start with Polygon’s EVM-compatible zkEVM, specifically, its implementation based on the Erigon Ethereum client.
Integrating the vanilla zkEVM implementation would have been much more challenging because its core is written almost entirely in C++, while our platform is written almost entirely in Go. For this reason, integrating zkEVM-Erigon seemed more straightforward.
However, the Erigon-based implementation also proved to be more challenging than expected. The implementation is still in its early phases, which introduced several issues of its own.
The Polygon zkEVM network
The Polygon zkEVM network, formerly known as Polygon Hermez, is a rollup solution based on the eponymous zero-knowledge EVM variant. It’s considered to be in its beta stage and is among the few EVM-compatible networks with significant traffic.
Compatibility with the Ethereum Virtual Machine
Apart from executing an EVM program, zero-knowledge EVMs are required to output cryptographic zero-knowledge proofs of validity for every step in the execution. Due to this, designing a zero-knowledge EVM variant that behaves exactly like the canonical EVM is a difficult task. Vitalik has written an article classifying the various types of zero-knowledge EVM variants according to how closely they emulate the EVM.
In the case of the zkEVM, it’s considered to be a type 3 zero-knowledge EVM, meaning that it implements nearly all of the EVM functionalities, apart from some precompiled contracts. To achieve this near-EVM-equivalence, the zkEVM team relied on two custom intermediary languages: the zkASM and the Polynomial Identity Language. In tandem, they served as the base layer on top of which the EVM would be interpreted while being able to generate the required zero-knowledge proofs.
The problems of integrating a network in its infancy
As we have already mentioned, ZK rollup solutions are a wonderful piece of technology. However, they’re still very young. Even though the Polygon zkEVM network is considered to be among the top ZK solutions and has substantial total value locked, it’s still very much in development and testing.
Additionally, we did not use the original zkEVM node developed by Polygon for our integration. Instead, we used the zkEVM node implementation based on Erigon. Polygon officially added the Erigon zkEVM repository to their organization the day before we started our integration work. It was still in the 0.0.1 alpha release when we started the integration and it caused many problems down the line.
Our component which re-executes transactions and enriches them with additional execution data is called an agent. Agent testing features rely heavily on the debug_traceTransaction RPC call. This call essentially lists all of the opcodes executed in sequence by the EVM, alongside some accompanying data such as the contents of the stack and memory at any given point in the execution. We use this data provided by the node and compare it with our agent execution data to see if there are any gas or execution differences.
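The comparison described above can be sketched roughly as follows. This is a minimal illustration, not our agent's actual code: the Step type, its fields, and the FirstDivergence function are hypothetical names chosen for this example, and a real trace carries more data (memory, storage, gas cost) than is compared here.

```go
package main

import "fmt"

// Step is a simplified view of one entry in an opcode trace.
// The field selection is an assumption made for this sketch.
type Step struct {
	PC    uint64
	Op    string
	Gas   uint64
	Stack []string
}

// sameStep reports whether two trace entries agree on every compared field.
func sameStep(a, b Step) bool {
	if a.PC != b.PC || a.Op != b.Op || a.Gas != b.Gas || len(a.Stack) != len(b.Stack) {
		return false
	}
	for i := range a.Stack {
		if a.Stack[i] != b.Stack[i] {
			return false
		}
	}
	return true
}

// FirstDivergence returns the index of the first step at which the node trace
// and the agent trace differ, or -1 when the traces are identical. A trace
// that is a strict prefix of the other diverges at the shorter length.
func FirstDivergence(node, agent []Step) int {
	n := len(node)
	if len(agent) < n {
		n = len(agent)
	}
	for i := 0; i < n; i++ {
		if !sameStep(node[i], agent[i]) {
			return i
		}
	}
	if len(node) != len(agent) {
		return n
	}
	return -1
}

func main() {
	// Made-up traces: both sides agree on opcodes and gas, but not on the stack.
	node := []Step{
		{PC: 0, Op: "PUSH1", Gas: 100},
		{PC: 2, Op: "SLOAD", Gas: 97, Stack: []string{"0x1"}},
	}
	agent := []Step{
		{PC: 0, Op: "PUSH1", Gas: 100},
		{PC: 2, Op: "SLOAD", Gas: 97, Stack: []string{"0x0"}},
	}
	fmt.Println("first divergence at step", FirstDivergence(node, agent))
}
```

Reporting only the first differing step keeps the output short, since every step after the first divergence usually differs as well.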
Node inconsistencies
Initially, using this RPC call helped us solve multiple problems. Eventually, however, it led to some unnecessary confusion. We received many reported differences in execution while having the same gas usage as the node. It took us some time to realize that the node’s execution trace was incorrect.
Furthermore, some gas differences just kept appearing, seemingly out of nowhere. After a deeper investigation, we realized that the discrepancy was once again caused by the node. Apparently, eth_getBlockByHash was not behaving consistently. It turned out that zkEVM Erigon was not calculating block hashes properly in some instances and served those incorrect values when those particular blocks were queried.
Deployment issues
There was also a deployment problem on Alpine Linux. zkEVM Erigon uses a cryptography library written in C++ (vectorized poseidon gold) which, when combined with its dependencies, isn’t well suited for building on Alpine Linux.
In early development, we used an Ubuntu-based deployment, until we figured out a way to replace the problematic library with its Alpine-friendly, but less efficient, Go implementation.
Our execution environment doesn’t even compute the hash during block processing, so poseidon gold is not used and we should have no performance penalties when using the slower Go implementation. In the meantime, their team figured out a way to run the original library on Alpine Linux, so we will be able to switch to it in the future.
All of the mentioned problems were reported to the zkEVM team and, as a result, several issues were raised and resolved. We hopefully helped to make the zkEVM Erigon-based node more stable in the process.
3 Integration problems and how we solved them
We tackled three major problems during the integration work:
- The NUMBER opcode returning the correct value, decreased by one
- The SLOAD opcode returning zero instead of the correct value when reading from a specific contract
- effectiveGasPricePercentage not being taken into account when calculating gas price
1. The NUMBER opcode problem
The zkEVM node uses the magic 0x5ca1ab1e address to store transaction (block) numbers. A difference between EVM and zkEVM is that the NUMBER opcode returns the transaction number instead of the block number, which is essentially the same thing, as the zkEVM network contains one transaction per block.
A difference in execution between the zkEVM node and our transaction applier showed that our applier was returning a transaction number that was one less than expected. Our initial fix just incremented the returned transaction number, which solved the problem.
Afterward, we examined the zkEVM Erigon code more closely and realized that there is a state presetting process in which the transaction number gets incremented before block execution. We mimicked that functionality in our applier and by doing so created a cleaner fix.
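To illustrate the difference between the two fixes, here is a minimal sketch of the cleaner one, in which the stored counter is incremented before execution and NUMBER simply reads it. Everything apart from the 0x5ca1ab1e address is an assumption made for this example: the State type, the txCountSlot index, and the function names are hypothetical and do not mirror the zkEVM Erigon code.

```go
package main

import "fmt"

// systemAddr is the address named in the text.
const systemAddr = "0x5ca1ab1e"

// txCountSlot is a hypothetical storage slot index chosen for this sketch.
const txCountSlot uint64 = 0

// State is a toy stand-in for contract storage: address -> slot -> value.
type State map[string]map[uint64]uint64

// PresetTxNumber mimics the pre-execution step: it increments the stored
// transaction number before the block is executed.
func (s State) PresetTxNumber() {
	if s[systemAddr] == nil {
		s[systemAddr] = map[uint64]uint64{}
	}
	s[systemAddr][txCountSlot]++
}

// OpNumber models what NUMBER returns in this sketch: the stored counter,
// read as-is, with no adjustment at read time.
func (s State) OpNumber() uint64 {
	return s[systemAddr][txCountSlot]
}

// ExecuteBlock presets the state and then "executes" a block whose only
// instruction is NUMBER, returning the value that instruction would see.
func ExecuteBlock(s State) uint64 {
	s.PresetTxNumber()
	return s.OpNumber()
}

func main() {
	// Made-up starting state: 41 transactions have been processed so far.
	s := State{systemAddr: map[uint64]uint64{txCountSlot: 41}}
	fmt.Println("NUMBER during execution:", ExecuteBlock(s))
}
```

Keeping the increment in the preset step means every later read of the counter during execution sees the same value, which an increment applied only to the NUMBER result would not guarantee.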
2. The SLOAD opcode problem
The SLOAD problem manifested only when reading from the PolygonZkEVMGlobalExitRootL2 contract, or more precisely, its proxy contract. It stores a mapping between global exit roots and their timestamps. However, this mapping can’t be edited through the contract. Instead, it can only be updated programmatically by the zkEVM node.
We found a part of the code that is (like in the case of the NUMBER opcode) executed before block execution and realized that it updates the global exit root mapping. This was problematic because our applier had no notion of this L2 batch data and the updates which are usually done before executing a transaction were not being applied on our end.
We created a fix by fetching the L2 batch data from the zkEVM node using the zkevm_getBatchByNumber RPC call. Because the batch number isn’t available to the applier by default, we first had to fetch the batch number by block number using the zkevm_batchNumberByBlockNumber RPC call.
So, the block number is used to get the batch number. The batch number is then used to get batch data, which is used to update the global exit root mapping. The update is done similarly to the zkEVM node, and it presets the states in the PolygonZkEVMGlobalExitRootL2 contract.
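The lookup chain above can be sketched as follows. This is a simplified illustration under stated assumptions rather than our applier's code: the BatchSource interface, the Batch fields, the in-memory ExitRootMap, and the fakeSource test double are all hypothetical, and the real update writes to contract storage and may apply conditions that are not modeled here.

```go
package main

import (
	"errors"
	"fmt"
)

// Batch holds the two batch fields this sketch needs.
// The field names are assumptions, not the node's response schema.
type Batch struct {
	GlobalExitRoot string
	Timestamp      uint64
}

// BatchSource abstracts the two RPC lookups named in the text.
type BatchSource interface {
	BatchNumberByBlockNumber(block uint64) (uint64, error) // zkevm_batchNumberByBlockNumber
	BatchByNumber(batch uint64) (Batch, error)             // zkevm_getBatchByNumber
}

// ExitRootMap is an in-memory stand-in for the contract's mapping between
// global exit roots and their timestamps.
type ExitRootMap map[string]uint64

// PresetGlobalExitRoot resolves block -> batch number -> batch data and
// records the batch's global exit root with its timestamp.
func PresetGlobalExitRoot(src BatchSource, roots ExitRootMap, block uint64) error {
	num, err := src.BatchNumberByBlockNumber(block)
	if err != nil {
		return fmt.Errorf("batch number for block %d: %w", block, err)
	}
	batch, err := src.BatchByNumber(num)
	if err != nil {
		return fmt.Errorf("batch %d: %w", num, err)
	}
	roots[batch.GlobalExitRoot] = batch.Timestamp
	return nil
}

// fakeSource is a test double that answers from maps instead of a node.
type fakeSource struct {
	blockToBatch map[uint64]uint64
	batches      map[uint64]Batch
}

func (f fakeSource) BatchNumberByBlockNumber(block uint64) (uint64, error) {
	n, ok := f.blockToBatch[block]
	if !ok {
		return 0, errors.New("unknown block")
	}
	return n, nil
}

func (f fakeSource) BatchByNumber(batch uint64) (Batch, error) {
	b, ok := f.batches[batch]
	if !ok {
		return Batch{}, errors.New("unknown batch")
	}
	return b, nil
}

func main() {
	// Made-up data: block 7 belongs to batch 3.
	src := fakeSource{
		blockToBatch: map[uint64]uint64{7: 3},
		batches:      map[uint64]Batch{3: {GlobalExitRoot: "0xabc", Timestamp: 1700000000}},
	}
	roots := ExitRootMap{}
	if err := PresetGlobalExitRoot(src, roots, 7); err != nil {
		fmt.Println("preset failed:", err)
		return
	}
	fmt.Println("timestamp for 0xabc:", roots["0xabc"])
}
```

Hiding the two RPC calls behind an interface lets the preset logic be exercised without a running node, which is useful when the node itself is the component under suspicion.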
3. Effective gas price percentage
The effective gas price percentage represents the percentage of the proposed gas price which should be used when calculating fees. It caused some gas differences between the node’s and the agent’s executions.
As this data is not returned as part of any standard RPC call, we had to improvise to fetch it. The effectiveGasPrice (EGP) field contained in the transaction receipt can be used to calculate the effectiveGasPricePercentage (EGPP), so we decided to take this path.
However, the zkEVM Erigon node had a bug and returned gasPrice instead of the effectiveGasPrice field. We reported the bug and, in discussion with the gateway.fm team, they even decided to add the effectiveGasPricePercentage field as a part of the transaction.
At the time of the integration, we were still waiting for the effectiveGasPrice bug to be solved or for effectiveGasPricePercentage to be added to the transaction data. This is why we implemented some logic to handle this in the meantime. It looks for the EGPP field and, if it isn’t present, falls back to calculating EGPP by using EGP. Therefore, as soon as EGPP is added or the EGP bug is fixed, our gas differences would disappear automatically.
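The fallback described above might look roughly like this. It is a sketch under explicit assumptions, not our production logic: the TxData type and ResolveEGPP function are hypothetical, gas prices are held in uint64 for brevity, and the conversion assumes EGPP is a one-byte value with EGP = gasPrice * (EGPP + 1) / 256, which should be checked against the node's documentation before relying on it.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// TxData carries the fields this sketch needs. EGPP is nil when the node
// does not return the effectiveGasPricePercentage field.
type TxData struct {
	GasPrice          uint64 // gas price proposed in the transaction
	EffectiveGasPrice uint64 // effectiveGasPrice (EGP) from the receipt
	EGPP              *uint8
}

// ResolveEGPP prefers the EGPP field and falls back to deriving it from EGP.
// Assumed scaling for the fallback: EGP = GasPrice * (EGPP + 1) / 256.
func ResolveEGPP(tx TxData) (uint8, error) {
	if tx.EGPP != nil {
		return *tx.EGPP, nil
	}
	if tx.GasPrice == 0 {
		return 0, errors.New("gas price is zero")
	}
	if tx.EffectiveGasPrice == 0 || tx.EffectiveGasPrice > tx.GasPrice {
		return 0, errors.New("effective gas price out of range")
	}
	// Compute ceil(EGP * 256 / GasPrice) - 1 with big integers so the
	// multiplication cannot overflow.
	num := new(big.Int).Mul(new(big.Int).SetUint64(tx.EffectiveGasPrice), big.NewInt(256))
	den := new(big.Int).SetUint64(tx.GasPrice)
	q, r := new(big.Int).QuoRem(num, den, new(big.Int))
	if r.Sign() != 0 {
		q.Add(q, big.NewInt(1))
	}
	// The range checks above guarantee 1 <= q <= 256, so q-1 fits in a byte.
	return uint8(q.Uint64() - 1), nil
}

func main() {
	// Made-up values: the receipt reports half of the proposed gas price.
	egpp, err := ResolveEGPP(TxData{GasPrice: 1000, EffectiveGasPrice: 500})
	if err != nil {
		fmt.Println("cannot resolve EGPP:", err)
		return
	}
	fmt.Println("derived EGPP:", egpp)
}
```

Note that while the receipt bug is present, EGP equals the gas price, so the fallback yields the maximum value. That matches the behavior described above: the differences persist until either the field is added or the bug is fixed, and then disappear without further changes.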
Working with cutting-edge technology
It turns out that integrating zkEVM was not an easy task, even though we used the zkEVM-Erigon implementation mostly written in Go. In fact, a decent number of problems we encountered were tied to the fact that the node was still in its early phases of heavy development (version 0.0.1 alpha) and was not yet production-ready at the time.
In conclusion, working with cutting-edge technology which is constantly being updated and not able to provide consistent data makes it harder to pinpoint the problems while debugging. In the end, we integrated zkEVM Erigon and we hope that our ride (however bumpy) was fun to read about and provided you with some insights.