Implementing a mempool

Why a mempool exists

Between “a transaction is submitted” and “a transaction is in a block” there’s a waiting room. That waiting room is the mempool: an in-memory set of transactions that have passed cheap validation but haven’t yet been included in a block. A block producer reaches into it, picks the most valuable transactions that fit, and seals them into the next block.

A mempool has four jobs, and we’ll implement each:

Validated insert — reject obviously bad transactions on the way in, not at block-build time.
Dedup — never store the same transaction twice, and never store two transactions that conflict (same sender + same nonce).
Ordering — surface the highest-fee transactions first, because that’s what a rational producer includes.
Eviction — when the pool hits its cap, drop the least valuable transaction to make room.

We’ll use std only, plus a tiny error enum. The model transaction below is deliberately minimal: a sender, a nonce (the per-sender sequence number that prevents replay), a fee, and a precomputed hash that uniquely identifies it.

mempool.rs

use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Transaction {
    pub hash: [u8; 32],      // unique id (from the hashing lesson)
    pub sender: [u8; 20],    // account address
    pub nonce: u64,          // per-sender sequence number
    pub fee: u64,            // total fee offered; higher = more attractive
    pub payload: Vec<u8>,    // opaque body
}

#[derive(Debug, PartialEq, Eq)]
pub enum InsertError {
    Duplicate,        // same hash already pooled
    NonceConflict,    // same (sender, nonce) already pooled with >= fee
    ZeroFee,          // we refuse free transactions in this pool
    TooLarge,         // payload exceeds the per-tx limit
}

The pool’s indexes

The naive mempool is a Vec<Transaction> you scan linearly. That’s O(n) for every dedup check and every conflict check, which collapses under load. Instead we maintain two indexes that make the hot-path checks O(1):

by_hash: HashMap<[u8; 32], Transaction> — the source of truth and the dedup index.
by_sender_nonce: HashMap<([u8; 20], u64), [u8; 32]> — maps a (sender, nonce) pair to the hash of the tx currently occupying that slot, so we can detect and replace conflicts.

Keeping these two maps in sync is the core invariant of the whole type. Every mutation touches both.

mempool.rs

pub struct Mempool {
    by_hash: HashMap<[u8; 32], Transaction>,
    by_sender_nonce: HashMap<([u8; 20], u64), [u8; 32]>,
    max_txs: usize,
    max_payload: usize,
}

impl Mempool {
    pub fn new(max_txs: usize, max_payload: usize) -> Self {
        Mempool {
            by_hash: HashMap::new(),
            by_sender_nonce: HashMap::new(),
            max_txs,
            max_payload,
        }
    }

    pub fn len(&self) -> usize {
        self.by_hash.len()
    }

    pub fn is_empty(&self) -> bool {
        self.by_hash.is_empty()
    }
}

Insert with validation and replace-by-fee

insert is where the rules live. It runs cheap validation first, then handles the (sender, nonce) conflict. The standard policy — borrowed from real chains — is replace-by-fee: if a new transaction occupies the same (sender, nonce) slot as an existing one, it only replaces it if it offers a strictly higher fee. Otherwise we reject the newcomer. This lets a sender bump a stuck transaction without letting an attacker churn the slot for free.

After a successful insert, if the pool is over capacity we evict — but we never evict the transaction we just accepted unless it’s genuinely the worst one. We handle that by inserting first, then evicting the global minimum.

mempool.rs

impl Mempool {
    pub fn insert(&mut self, tx: Transaction) -> Result<(), InsertError> {
        if tx.fee == 0 {
            return Err(InsertError::ZeroFee);
        }
        if tx.payload.len() > self.max_payload {
            return Err(InsertError::TooLarge);
        }
        if self.by_hash.contains_key(&tx.hash) {
            return Err(InsertError::Duplicate);
        }

        let slot = (tx.sender, tx.nonce);
        if let Some(existing_hash) = self.by_sender_nonce.get(&slot) {
            let existing = &self.by_hash[existing_hash];
            if tx.fee <= existing.fee {
                return Err(InsertError::NonceConflict);
            }
            // Replace-by-fee: evict the incumbent for this slot.
            let evicted = *existing_hash;
            self.by_hash.remove(&evicted);
        }

        self.by_sender_nonce.insert(slot, tx.hash);
        self.by_hash.insert(tx.hash, tx);

        // Enforce the global cap.
        if self.by_hash.len() > self.max_txs {
            self.evict_lowest_fee();
        }
        Ok(())
    }

    /// Drop the single lowest-fee transaction, keeping both indexes consistent.
    fn evict_lowest_fee(&mut self) {
        if let Some((&victim_hash, victim)) =
            self.by_hash.iter().min_by_key(|(_, tx)| tx.fee)
        {
            let slot = (victim.sender, victim.nonce);
            self.by_hash.remove(&victim_hash);
            self.by_sender_nonce.remove(&slot);
        }
    }
}

Notice evict_lowest_fee removes from both maps using the victim’s own (sender, nonce) — that’s the sync invariant in action. The replace-by-fee branch also removes only from by_hash because the line right after overwrites the by_sender_nonce slot with the new hash, so the old entry can’t leak.

Selecting transactions for a block

The producer wants the highest-fee transactions that fit in a block. The right tool is a max-heap keyed by fee. Rust’s BinaryHeap is a max-heap by default, so popping yields the largest element first — exactly the order we want. We build a heap of (fee, hash) and drain up to limit items.

We pull (fee, hash) rather than whole transactions into the heap to keep comparisons cheap and avoid cloning bodies until we’ve decided to include them. A subtle point: BinaryHeap orders by the tuple, so ties on fee fall back to comparing hash, giving a fully deterministic order — important if you want every node to build identical blocks from identical pools.

mempool.rs

use std::collections::BinaryHeap;

impl Mempool {
    /// Return up to `limit` transactions, highest fee first.
    /// Does not remove them from the pool — that happens when the block
    /// is finalized and you call `remove_included`.
    pub fn select_for_block(&self, limit: usize) -> Vec<Transaction> {
        // (fee, hash) tuples; BinaryHeap is a max-heap so highest fee pops first.
        let mut heap: BinaryHeap<(u64, [u8; 32])> = self
            .by_hash
            .values()
            .map(|tx| (tx.fee, tx.hash))
            .collect();

        let mut chosen = Vec::with_capacity(limit.min(heap.len()));
        while chosen.len() < limit {
            match heap.pop() {
                Some((_, hash)) => chosen.push(self.by_hash[&hash].clone()),
                None => break,
            }
        }
        chosen
    }

    /// Called after a block is committed: drop everything it included.
    pub fn remove_included(&mut self, txs: &[Transaction]) {
        for tx in txs {
            if self.by_hash.remove(&tx.hash).is_some() {
                self.by_sender_nonce.remove(&(tx.sender, tx.nonce));
            }
        }
    }
}

If you instead needed the lowest-fee item efficiently (for incremental eviction on a hot path), you’d wrap the key in std::cmp::Reverse to turn the max-heap into a min-heap: BinaryHeap<Reverse<(u64, [u8; 32])>>. For this lesson the linear min_by_key eviction is fine because eviction is rare relative to inserts.

Proving it works

This test exercises every path: a validated insert, fee rejection, exact dedup, replace-by-fee, fee-ordered selection, and post-commit removal.

mempool.rs

#[cfg(test)]
mod tests {
    use super::*;

    fn tx(id: u8, sender: u8, nonce: u64, fee: u64) -> Transaction {
        Transaction {
            hash: [id; 32],
            sender: [sender; 20],
            nonce,
            fee,
            payload: vec![],
        }
    }

    #[test]
    fn insert_dedup_order_and_evict() {
        let mut pool = Mempool::new(10, 1024);

        assert_eq!(pool.insert(tx(1, 0xAA, 0, 5)), Ok(()));
        assert_eq!(pool.insert(tx(2, 0xBB, 0, 50)), Ok(()));
        assert_eq!(pool.insert(tx(3, 0xCC, 0, 0)), Err(InsertError::ZeroFee));
        assert_eq!(pool.insert(tx(1, 0xAA, 0, 5)), Err(InsertError::Duplicate));

        // Same (sender, nonce) as tx 1 but lower fee => rejected.
        assert_eq!(pool.insert(tx(9, 0xAA, 0, 3)), Err(InsertError::NonceConflict));
        // Higher fee => replace-by-fee succeeds, pool size unchanged.
        assert_eq!(pool.insert(tx(9, 0xAA, 0, 9)), Ok(()));
        assert_eq!(pool.len(), 2);

        // Highest fee first: tx 2 (fee 50) before tx 9 (fee 9).
        let block = pool.select_for_block(10);
        assert_eq!(block[0].fee, 50);
        assert_eq!(block[1].fee, 9);

        pool.remove_included(&block);
        assert!(pool.is_empty());
    }
}

Run cargo test. The zero-fee and duplicate transactions bounce, the low-fee nonce conflict is rejected while the high-fee one replaces the incumbent, selection comes out fee-ordered, and removal empties the pool — with both indexes staying consistent throughout.

A real node has many producers and consumers touching the mempool at once: the RPC layer inserts incoming transactions while the block-production loop selects and removes them. A bare Mempool isn’t safe to mutate from multiple threads, so you wrap it. Because reads (selection) and writes (insert/remove) are both common, an RwLock is a good default — many readers can select concurrently, and writers get exclusive access. Share it with Arc:

shared.rs

use std::sync::{Arc, RwLock};

pub type SharedMempool = Arc<RwLock<Mempool>>;

pub fn shared(max_txs: usize, max_payload: usize) -> SharedMempool {
    Arc::new(RwLock::new(Mempool::new(max_txs, max_payload)))
}

pub fn submit(pool: &SharedMempool, tx: Transaction) -> Result<(), InsertError> {
    // Acquire the write lock only for the duration of the insert.
    pool.write().expect("mempool lock poisoned").insert(tx)
}

pub fn build_block(pool: &SharedMempool, limit: usize) -> Vec<Transaction> {
    pool.read().expect("mempool lock poisoned").select_for_block(limit)
}

Keep the locked sections short — grab the guard, do the operation, drop it — so you never hold the lock across I/O or block production. The Rust Book’s shared-state concurrency chapter covers Arc<RwLock<T>> and lock poisoning in depth; under heavy contention you’d reach for a sharded pool or a lock-free structure, but Arc<RwLock<Mempool>> is the correct, simple starting point.

With a validated, ordered, capacity-bounded mempool feeding the block producer from the previous lesson, you now have the full submit → pool → select → seal pipeline. Next we decide who gets to seal a block: proof of stake from scratch.

Why a mempool exists

The pool’s indexes

Insert with validation and replace-by-fee

Selecting transactions for a block

Proving it works

Sharing the pool across threads

Sources