Historical Data Caching
All historical data on `node-server` is read from our two databases; as a consequence, tail latencies for fetching historical data are much higher than for mutable / near-latest data. The internship project covers:
* Getting a better understanding of how historical data is being interacted with
* Creating a test suite for thorough historical node benchmarking
* Researching the viability of different caching systems for historical data
* Different heuristics for predictive state caching
* Predictive log and trace caching (optional)
* Implementing production-grade caching
Ideally, the project should be open-ended and up to the intern to research and explore. That said, the following basic solutions should be covered during research and benchmarking.
For states:
* No caching (control for benchmarks)
* Caching only the requested state keys
* Caching the requested state keys, as well as their values in a certain block-number neighborhood (e.g. when fetching the balance of account A at block number N, also fetch it for the previous and next X blocks); see the sketch after this list
* Caching the requested state keys, as well as other values on the same block number
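As a rough illustration of the block-number neighborhood heuristic, here is a minimal sketch. The class name, the `fetch` callback, and the `capacity` / `neighborhood` parameters are illustrative assumptions, not part of `node-server`.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class NeighborhoodStateCache:
    """LRU cache keyed by (state_key, block_number).

    On a miss it fetches the requested value plus the same key at the
    surrounding `neighborhood` blocks, betting that callers who ask for
    account state at block N will soon ask for nearby blocks too.
    """

    def __init__(self, fetch: Callable[[Hashable, int], object],
                 capacity: int = 100_000, neighborhood: int = 4):
        self.fetch = fetch            # e.g. a database lookup: (key, block) -> value
        self.capacity = capacity
        self.neighborhood = neighborhood
        self._entries: OrderedDict = OrderedDict()

    def get(self, state_key: Hashable, block_number: int) -> object:
        cache_key = (state_key, block_number)
        if cache_key in self._entries:
            self._entries.move_to_end(cache_key)   # refresh LRU position
            return self._entries[cache_key]

        # Miss: fetch the requested value, then prefetch its neighborhood.
        value = self.fetch(state_key, block_number)
        self._put(cache_key, value)
        lo = max(0, block_number - self.neighborhood)
        for block in range(lo, block_number + self.neighborhood + 1):
            if block != block_number and (state_key, block) not in self._entries:
                self._put((state_key, block), self.fetch(state_key, block))
        return value

    def _put(self, cache_key: tuple, value: object) -> None:
        self._entries[cache_key] = value
        self._entries.move_to_end(cache_key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)      # evict least recently used
```

Tuning the `neighborhood` and `capacity` values is exactly the kind of hyper-parameter work described in the benchmarking milestones below.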
For logs:
* No caching (control for benchmarks)
* Caching only the requested logs
* Caching the requested logs, as well as logs from the adjacent blocks (see the sketch after this list)
* Caching the requested logs as well as the adjacent addresses/topics
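A corresponding sketch for the adjacent-block log heuristic, again under assumed names (`fetch_logs`, `radius`, `capacity`) rather than any actual `node-server` interface:

```python
from collections import OrderedDict
from typing import Callable


class AdjacentBlockLogCache:
    """LRU cache of logs keyed by (block_number, address).

    When a block's logs for an address are missing, the logs of the
    `radius` adjacent blocks are fetched and cached alongside it.
    """

    def __init__(self, fetch_logs: Callable[[int, str], list],
                 capacity: int = 50_000, radius: int = 2):
        self.fetch_logs = fetch_logs       # (block_number, address) -> list of logs
        self.capacity = capacity
        self.radius = radius
        self._entries: OrderedDict = OrderedDict()

    def get_logs(self, from_block: int, to_block: int, address: str) -> list:
        result: list = []
        for block in range(from_block, to_block + 1):
            key = (block, address)
            if key in self._entries:
                self._entries.move_to_end(key)        # refresh LRU position
                result.extend(self._entries[key])
            else:
                result.extend(self._prefetch_around(block, address))
        return result

    def _prefetch_around(self, block: int, address: str) -> list:
        """Fetch the requested block's logs plus `radius` adjacent blocks."""
        requested = self.fetch_logs(block, address)
        for b in range(max(0, block - self.radius), block + self.radius + 1):
            if (b, address) not in self._entries:
                self._entries[(b, address)] = (
                    requested if b == block else self.fetch_logs(b, address))
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)          # evict least recently used
        return requested
```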
None of the implemented caching systems beat the control group (no cache) on benchmarks.
1. Onboarding (2 weeks)
1.1. Ethereum blockchain, Ethereum state and EVM
1.2. Node architecture (components overview)
1.3. Storage layer (databases, key layouts, etc)
2. Collecting data and creating benchmarks (3 weeks)
2.1. Collect timestamped requests (historical `eth_call` and `eth_getLogs` requests) and analyse / filter them for benchmarking purposes
2.2. Implement a set of scripts which can be configured and run to benchmark historical node requests (a sketch of such a replay script follows the timeline)
2.3. Decide on prioritization of `eth_call` vs `eth_getLogs` caching based on the performed benchmarks
3. Implementation and benchmarking of basic solutions (3 weeks)
3.1. Depending on the results of the previous milestone, implement the respective basic solutions for the types of requests where caching is likely to have a greater impact
3.2. Benchmark these solutions and fine-tune hyper-parameters (such as block ranges, cache size, etc) based on results
3.3. Generate a simple report going over the results, including but not limited to impacts on latencies, CPU and memory usage, cache hit rate, and database CPU usage
4. Further data analysis; implementation and benchmarking of additional solutions (3 weeks)
4.1. Understand the data more deeply through exploratory data analysis, going over concrete data samples, plotting graphs and looking for usage patterns which might inspire more nuanced caching heuristics (see the plotting sketch after the timeline)
4.2. Come up with and implement additional systems (either for the same or different type of requests) based on the conclusions drawn from the previous analysis
4.3. Benchmark these solutions and generate a report similar to the one in the previous milestone
5. Project report (1 week)
5.1. Generate a final report going over the project timeline, solutions, successes, failures, and detailed results
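To make milestone 2.2 concrete, here is a minimal sketch of a replay-style benchmark script. The file name `historical_requests.jsonl`, the request format (one JSON-RPC body per line), and the local node URL are assumptions for illustration, not an existing tool.

```python
import json
import time
import urllib.request
from statistics import quantiles

# Hypothetical inputs: a JSON-lines file of recorded requests, one JSON-RPC
# body per line (e.g. {"method": "eth_call", "params": [...]}), and the URL
# of the node under test.
REQUESTS_FILE = "historical_requests.jsonl"
NODE_URL = "http://localhost:8545"


def replay(path: str, url: str) -> list:
    """Replay recorded JSON-RPC requests sequentially, returning latencies in ms."""
    latencies = []
    with open(path) as f:
        for i, line in enumerate(f):
            body = json.loads(line)
            body.setdefault("jsonrpc", "2.0")
            body.setdefault("id", i)
            data = json.dumps(body).encode()
            req = urllib.request.Request(
                url, data=data, headers={"Content-Type": "application/json"})
            start = time.perf_counter()
            with urllib.request.urlopen(req) as resp:
                resp.read()
            latencies.append((time.perf_counter() - start) * 1000)
    return latencies


if __name__ == "__main__":
    lat = replay(REQUESTS_FILE, NODE_URL)
    cuts = quantiles(lat, n=100)                # 99 cut points -> percentiles
    p50, p90, p99 = cuts[49], cuts[89], cuts[98]
    print(f"requests={len(lat)}  p50={p50:.1f}ms  p90={p90:.1f}ms  p99={p99:.1f}ms")
```

Running the same request trace against the control (no cache) and against each caching variant gives directly comparable latency percentiles.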
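For milestone 4.1, a simple starting point is to plot how the recorded requests are distributed across block numbers. The sketch below reuses the same hypothetical capture file, assumes hex block tags in `eth_call` params, and uses matplotlib for the plot.

```python
import json

import matplotlib.pyplot as plt

# Hypothetical input: the JSON-lines capture used for benchmarking.
REQUESTS_FILE = "historical_requests.jsonl"


def requested_blocks(path: str) -> list:
    """Extract the block numbers targeted by recorded eth_call requests."""
    blocks = []
    with open(path) as f:
        for line in f:
            body = json.loads(line)
            if body.get("method") != "eth_call":
                continue
            tag = body["params"][-1]               # block tag is the last param
            if isinstance(tag, str) and tag.startswith("0x"):
                blocks.append(int(tag, 16))
    return blocks


blocks = requested_blocks(REQUESTS_FILE)
plt.hist(blocks, bins=200)
plt.xlabel("block number")
plt.ylabel("number of eth_call requests")
plt.title("Distribution of historical eth_call requests by block")
plt.savefig("eth_call_block_histogram.png")
```

Clusters in such a histogram (e.g. hot contracts queried over narrow block ranges) are the kind of usage pattern that could motivate more nuanced caching heuristics.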