add caching in hashTreeRoot #62
Conversation
gballet left a comment
After reviewing, I think that there is a fundamental problem with how the cached hashing has been designed:

- There is no reason to make it Sha256-specific; it should be easy to make it agnostic.
- It should be possible not to use caching, which isn't the case here.
- imo, the cached hashing should be a wrapper around the regular root-hashing function, e.g. by introducing a `TreeHasher` interface, with a `hashTreeRoot` function, which is `Hasher`-agnostic (and passed as a parameter). If caching is active, then the wrapper is used; otherwise the regular object is used.
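Concretely, the suggested wrapper could look something like the sketch below. All names here are hypothetical (`plainHashTreeRoot`, `cachedHashTreeRoot`, and `MerkleCache` are illustrative placeholders, not the actual API) — the point is only the shape: both variants take the `Hasher` as a comptime parameter and expose the same `hashTreeRoot` signature, so callers don't care whether caching is active.

```zig
// Sketch only: a Hasher-agnostic tree-hashing interface.
// The plain variant always recomputes; the caching variant wraps it.
fn TreeHasher(comptime Hasher: type) type {
    return struct {
        const Self = @This();

        // Cache-free hashing: recomputes the full tree every call.
        pub fn hashTreeRoot(self: *Self, value: anytype, out: *[32]u8) !void {
            _ = self;
            try plainHashTreeRoot(Hasher, value, out); // hypothetical helper
        }
    };
}

fn CachedTreeHasher(comptime Hasher: type) type {
    return struct {
        const Self = @This();
        cache: MerkleCache, // hypothetical cache type, not Sha256-specific

        // Same signature as the plain hasher; only dirty subtrees
        // are rehashed, the rest is served from the cache.
        pub fn hashTreeRoot(self: *Self, value: anytype, out: *[32]u8) !void {
            try cachedHashTreeRoot(Hasher, &self.cache, value, out);
        }
    };
}
```

The caller then picks at comptime: `CachedTreeHasher(Sha256)` when caching is enabled, `TreeHasher(Sha256)` otherwise — and swapping in a different hasher is a one-line change.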
```zig
// Tests
const Sha256 = std.crypto.hash.sha2.Sha256;
const lib = @import("./lib.zig");
```
put this at the start of the file, or just import it inside the tests that need it
```zig
/// Hasher-agnostic cache type. We use SHA256 as the default since that's the
/// standard SSZ hasher, but the cache is only used when hashTreeRoot is called
/// with a matching hasher.
```
This comment doesn't look correct. Either it's Sha256-specific, or it's hasher-agnostic. Looks like slop, I suggest removing it.
```zig
const std = @import("std");
const zeros = @import("./zeros.zig");

const BYTES_PER_CHUNK = 32;
```
this is already defined in src/utils.zig
```zig
const BYTES_PER_CHUNK = 32;
const chunk = [BYTES_PER_CHUNK]u8;
const zero_chunk: chunk = [_]u8{0} ** BYTES_PER_CHUNK;
```
```zig
/// Node 1 is the root. Node i has children 2i and 2i+1.
/// Leaves occupy indices [capacity .. 2*capacity).
```
stop the slop, it is obvious from reading the code.
```zig
.int => blk: {
    const bytes_per_item = @sizeOf(Item);
    const items_per_chunk = BYTES_PER_CHUNK / bytes_per_item;
    break :blk element_index / items_per_chunk;
}
```
```zig
/// Free the Merkle cache without freeing the list itself.
/// Called by lib.hashTreeRoot to clean up caches on value copies.
```
```zig
    }
    cache.nodes[cache.capacity + chunk_idx] = leaf;
}
// Zero out chunks beyond data
```
```zig
const end_item = @min(start_item + items_per_chunk, items.len);
for (start_item..end_item) |item_i| {
    const pos = (item_i % items_per_chunk) * bytes_per_item;
    // SSZ requires little-endian encoding
```
```zig
// Composite types: each item is its own chunk (hash tree root of item)
// Composite items may contain pointers to mutable data that
// can change without going through set(), so we must always
// recompute all item hashes. We compare against cached leaves
// to only mark actually-changed nodes dirty for tree rehashing.
```
Is the cache bringing any performance benefit, then? Most of the types we want are composite types, so if they are recomputed no matter what, that's a lot of performance left on the table.
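For reference, the scheme the code comment describes amounts to something like the sketch below (names hypothetical, e.g. `markDirty`): the per-item root is recomputed on every call, and the cache only saves the upper-tree rehashing when the recomputed leaf matches the cached one — which is exactly the limitation this question is about.

```zig
// Sketch: recompute each composite item's root unconditionally,
// but only mark the leaf dirty (forcing parent rehashing) when
// the recomputed hash differs from the cached one.
for (items, 0..) |item, i| {
    var leaf: [32]u8 = undefined;
    try hashTreeRoot(Hasher, item, &leaf); // always recomputed
    const node = cache.capacity + i;
    if (!std.mem.eql(u8, &cache.nodes[node], &leaf)) {
        cache.nodes[node] = leaf;
        cache.markDirty(node); // hypothetical helper
    }
}
```

So for lists of composite items, the cache skips only the O(n) internal-node hashing above unchanged leaves, not the per-item hashing itself.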
Adds caching to the hashing path for List and BitList.