blockchaindev
☰ Lessons

Build a blockchain in Rust · Lesson 02 of 07

Hashing and Merkle trees in Rust

TL;DR — Hash data with a real crate (sha2 or blake3), then build a Merkle tree whose root commits to every leaf — and verify membership with a logarithmic-size proof.

What hashing buys a blockchain

A cryptographic hash function maps arbitrary bytes to a fixed-size digest such that it is infeasible to (a) find an input for a given output (preimage resistance) or (b) find two inputs with the same output (collision resistance). Those two properties are the entire load-bearing structure of a blockchain: a block “contains” the previous block’s hash, so altering any historical byte changes every subsequent hash and breaks the chain. Identifiers, content-addressing, and commitments all reduce to hashing.

In this lesson we compute digests with two production crates and then build the data structure that lets a block commit to thousands of transactions with a single 32-byte value: the Merkle tree.

We’ll reuse the Hash newtype from the previous lesson:

hash.rs
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Hash(pub [u8; 32]);

Hashing with sha2 (RustCrypto)

The sha2 crate from the RustCrypto project implements the SHA-2 family. Its API follows the Digest trait: construct a hasher, feed it bytes with update (repeatable, so you can hash a stream without concatenating), then call finalize to consume it and produce the digest.

Cargo.toml
[dependencies]
sha2 = "0.10"
sha256.rs
use sha2::{Sha256, Digest};
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Hash(pub [u8; 32]);

/// SHA-256 of an arbitrary byte slice.
pub fn sha256(data: &[u8]) -> Hash {
    let mut hasher = Sha256::new();
    hasher.update(data);
    let digest = hasher.finalize(); // GenericArray<u8, U32>
    Hash(digest.into())             // into a plain [u8; 32]
}

finalize returns a GenericArray<u8, U32> — a const-generic-style fixed array. It implements Into<[u8; 32]>, so digest.into() gives us the plain array our Hash newtype wants. The update/finalize split exists so you can hash large or streamed inputs incrementally; for a one-shot, Sha256::digest(data) is the shorthand.

Hashing with blake3

BLAKE3 is a newer hash that is substantially faster than SHA-256, internally tree-structured (so it parallelizes), and still 256-bit. Its one-shot API is a single function call.

Cargo.toml
[dependencies]
blake3 = "1"
blake3_hash.rs
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Hash(pub [u8; 32]);

pub fn blake3_hash(data: &[u8]) -> Hash {
    let digest = blake3::hash(data);    // returns blake3::Hash
    Hash(*digest.as_bytes())            // as_bytes() -> &[u8; 32]
}

blake3::hash returns a blake3::Hash whose as_bytes() borrows the underlying &[u8; 32], which we copy into our own newtype. For the Merkle code below the choice of function is irrelevant — we depend only on “bytes in, 32 bytes out.” We’ll standardize on a single hash_leaf/hash_nodes interface so the tree is agnostic.

What a Merkle tree is, precisely

A Merkle tree is a binary tree built over a list of leaves. Each leaf is the hash of one item (say, a transaction). Each internal node is the hash of its two children’s hashes concatenated. The single node at the top — the root — is a commitment to the entire ordered list: change any leaf, reorder leaves, or add/remove one, and the root changes.

This gives a block a constant-size summary of an arbitrarily large transaction set, and — the part that makes it powerful — it lets a verifier confirm that a specific item is in the set without seeing the whole set. They need only the item, the root (which they already trust), and a logarithmic-size proof.

Two details matter for correctness:

  1. Domain separation. A second-preimage attack exists if an attacker can present an internal node’s hash as if it were a leaf. The standard defense is to prefix a distinct byte before hashing leaves (0x00) versus internal nodes (0x01), so the two can never collide. We do this.
  2. Odd counts. When a level has an odd number of nodes, we duplicate the last node (hash it with itself). This is simple and widely used; document it, because it has known edge cases an alternative scheme (promoting the lone node unchanged) avoids.

Building the tree

We build level by level from the leaves up, keeping every level so we can later generate proofs.

merkle.rs
use sha2::{Sha256, Digest};
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Hash(pub [u8; 32]);

/// Leaf hash with a 0x00 domain-separation prefix.
fn hash_leaf(data: &[u8]) -> Hash {
    let mut h = Sha256::new();
    h.update([0x00]);
    h.update(data);
    Hash(h.finalize().into())
}

/// Internal-node hash with a 0x01 prefix over the two child hashes.
fn hash_nodes(left: &Hash, right: &Hash) -> Hash {
    let mut h = Sha256::new();
    h.update([0x01]);
    h.update(left.0);
    h.update(right.0);
    Hash(h.finalize().into())
}

pub struct MerkleTree {
    /// levels[0] = leaves, levels[last] = [root]
    levels: Vec<Vec<Hash>>,
}

impl MerkleTree {
    pub fn build(items: &[Vec<u8>]) -> Option<MerkleTree> {
        if items.is_empty() {
            return None; // an empty tree has no well-defined root
        }
        let leaves: Vec<Hash> = items.iter().map(|d| hash_leaf(d)).collect();
        let mut levels = vec![leaves];

        while levels.last().unwrap().len() > 1 {
            let cur = levels.last().unwrap();
            let mut next = Vec::with_capacity((cur.len() + 1) / 2);
            let mut i = 0;
            while i < cur.len() {
                let left = &cur[i];
                // Duplicate the last node when the count is odd.
                let right = if i + 1 < cur.len() { &cur[i + 1] } else { left };
                next.push(hash_nodes(left, right));
                i += 2;
            }
            levels.push(next);
        }
        Some(MerkleTree { levels })
    }

    pub fn root(&self) -> Hash {
        *self.levels.last().unwrap().last().unwrap()
    }
}

The loop is the whole algorithm: take the current level, pair adjacent nodes (duplicating the tail if odd), hash each pair into the next level, repeat until one node remains.

Generating an inclusion proof

A proof for leaf i is the sequence of sibling hashes you encounter walking from the leaf to the root, plus, at each step, whether the sibling is on the left or the right (you need this to concatenate in the correct order). With those siblings, a verifier can recompute the root.

proof.rs
/// One step: a sibling hash and which side it sits on.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Hash(pub [u8; 32]);
pub struct MerkleTree { levels: Vec<Vec<Hash>> }
pub struct ProofStep {
    pub sibling: Hash,
    pub sibling_is_left: bool,
}

impl MerkleTree {
    pub fn proof(&self, mut index: usize) -> Option<Vec<ProofStep>> {
        if index >= self.levels[0].len() {
            return None;
        }
        let mut steps = Vec::new();
        // Walk every level except the root level.
        for level in &self.levels[..self.levels.len() - 1] {
            let sibling_idx = if index % 2 == 0 {
                // Right sibling; if it doesn't exist, the node was paired
                // with itself (odd duplication), so the sibling is itself.
                (index + 1).min(level.len() - 1)
            } else {
                index - 1 // left sibling
            };
            steps.push(ProofStep {
                sibling: level[sibling_idx],
                sibling_is_left: index % 2 == 1,
            });
            index /= 2; // move to the parent's index in the next level
        }
        Some(steps)
    }
}

The index arithmetic mirrors construction: a node at even index pairs with the node to its right, an odd index with the one to its left, and index / 2 is the parent’s position one level up. The odd-duplication case is handled by clamping the right sibling to the last node — consistent with how build reused the lone node.

Verifying a proof

Verification is stateless and needs only the leaf data, the proof, and the trusted root — never the tree. We re-hash the leaf, fold in each sibling on the correct side, and check the result equals the root. This runs in O(log n) and is what a light client does.

verify.rs
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Hash(pub [u8; 32]);
pub struct ProofStep { pub sibling: Hash, pub sibling_is_left: bool }
fn hash_leaf(_: &[u8]) -> Hash { unimplemented!() }
fn hash_nodes(_: &Hash, _: &Hash) -> Hash { unimplemented!() }
pub fn verify(root: &Hash, leaf_data: &[u8], proof: &[ProofStep]) -> bool {
    let mut acc = hash_leaf(leaf_data);
    for step in proof {
        acc = if step.sibling_is_left {
            hash_nodes(&step.sibling, &acc) // sibling on the left
        } else {
            hash_nodes(&acc, &step.sibling) // sibling on the right
        };
    }
    &acc == root
}

The sibling_is_left flag is load-bearing: hashing (a, b) and (b, a) give different results, so getting the order wrong would reject valid proofs.

End-to-end test

This proves the round trip: build a tree, prove a leaf, verify it against the root, and confirm that tampering with the leaf is rejected.

tests.rs
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn merkle_roundtrip() {
        let items: Vec<Vec<u8>> =
            (0u8..5).map(|i| vec![i, i + 1, i + 2]).collect(); // odd count: 5
        let tree = MerkleTree::build(&items).unwrap();
        let root = tree.root();

        for (i, item) in items.iter().enumerate() {
            let proof = tree.proof(i).unwrap();
            assert!(verify(&root, item, &proof), "leaf {i} should verify");

            // A tampered leaf must NOT verify against the same proof.
            let mut bad = item.clone();
            bad[0] ^= 0xff;
            assert!(!verify(&root, &bad, &proof), "tampered leaf {i} must fail");
        }
    }
}

We deliberately use five leaves to exercise the odd-duplication path on two different levels. Each leaf verifies; each tampered leaf fails.

Takeaways

  • A cryptographic hash gives you preimage and collision resistance; everything a blockchain “links” or “commits to” is built on those guarantees.
  • Use a real crate — sha2 for SHA-256 via the Digest trait, or blake3 for a faster tree-hash — and keep the tree code generic over a plain “bytes → [u8; 32]” interface.
  • A Merkle tree compresses an ordered list of items into one root hash, and lets anyone prove membership with an O(log n) proof and just the trusted root.
  • Apply domain separation (distinct prefixes for leaves vs. internal nodes) to block second-preimage attacks, and document your odd-count rule — these are the two places naive implementations go wrong.
  • Verification must track which side each sibling is on; concatenation order is part of the commitment.

With blocks summarized by a Merkle root, the next lesson assembles headers and blocks into a validated chain.

Sources