Rust: Sorted Keys In BTreeMap

Created 2025-09-12

BTreeMap is Rust’s equivalent of C++’s std::map: the former is backed by a B-tree, while the latter is typically implemented as a self-balancing binary search tree (in practice, a red-black tree).

Ordering Limitation

BTreeMap always orders keys in ascending order according to their Ord implementation. Unlike C++’s std::map, it does not take a comparator parameter, so there is no built-in way to ask for descending order.

std::map<int, std::string> asc;  // ascending
std::map<int, std::string, std::greater<int>> desc;  // descending

That said, it is possible to obtain a reverse iterator that yields entries in descending order. Consider this example.

use std::collections::BTreeMap;

let mut book_side = BTreeMap::<u32, u32>::new();
book_side.insert(1, 10);
book_side.insert(2, 20);
book_side.insert(3, 30);

let fwd_iter = book_side.iter(); // Iter<u32, u32>
let rev_iter = book_side.iter().rev(); // Rev<Iter<u32, u32>>

However, limitations arise when you want to generalise and bind either iterator to a single variable depending on some condition. The following causes a compile error, because the two match arms have different static types.

let iter = match descending {
    true => book_side.iter().rev(),
    false => book_side.iter(),
};

When building order books, you want an easy way to get the opposite side of the book based on the side of the order currently being matched. There are a few different approaches to accomplish that.

Dynamic Dispatch With Box<dyn ...>, i.e. Trait Objects

The types returned by iter() and iter().rev() both implement the Iterator trait. You can box the iterator into a Box<dyn Iterator<Item = (&u32, &u32)>> to unify the two types behind a single trait object.

let opposite_side: Box<dyn Iterator<Item = (&u32, &u32)>> = match descending {
    true => Box::new(book_side.iter().rev()),
    false => Box::new(book_side.iter()),
};

In most scenarios where performance is not a priority, this works just fine. However, boxing causes both dynamic dispatch for every iteration and an additional allocation during this assignment, which is not ideal.
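For instance, this pattern can be wrapped in a small helper. A minimal sketch, with side_iter as a hypothetical name:

use std::collections::BTreeMap;

// Unify both directions behind one trait object type. The Box
// allocation and the virtual calls happen regardless of direction.
fn side_iter<'a>(
    book_side: &'a BTreeMap<u32, u32>,
    descending: bool,
) -> Box<dyn Iterator<Item = (&'a u32, &'a u32)> + 'a> {
    if descending {
        Box::new(book_side.iter().rev())
    } else {
        Box::new(book_side.iter())
    }
}

Callers can then write for (price, qty) in side_iter(&book_side, true) { ... } without caring about the direction at the call site.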

Sorted Keys

When performance is critical, another approach is to use a composite key type instead of a bare type, where the ordering information is encoded into the key itself.

// PartialEq/Eq are required by the manual Ord impl below.
#[derive(PartialEq, Eq)]
pub enum Sort {
    ASC,
    DESC,
}

#[derive(PartialEq, Eq)]
pub struct SortedKey {
    pub price: u32,
    pub sort: Sort,
}

impl SortedKey {
    pub fn new(price: u32, sort: Sort) -> Self {
        Self { price, sort }
    }
}

Combined with custom implementations of Ord and PartialOrd, you can force BTreeMap to lay out the keys in a different order than the bare type’s natural ordering.

impl PartialOrd for SortedKey {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for SortedKey {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        if self.sort == Sort::ASC {
            // Ascending: natural price order.
            self.price.cmp(&other.price)
        } else {
            // Descending: flip the operands to reverse the order.
            other.price.cmp(&self.price)
        }
    }
}

This lets you operate on the same iterator type for maps sorted in either direction, which is very convenient when traversing price levels on both sides of the book.

There are some tradeoffs with this approach:

- Padding and alignment change compared to a bare key type, so each key requires more space.
- Writes invoke the custom Ord/PartialOrd implementations.
- Reads, however, incur no extra allocations, use static dispatch, and are as performant as with the native type itself.
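To make the space cost concrete, a quick check; the exact sizes below assume the SortedKey layout shown above, where Sort occupies a single byte:

use std::mem::size_of;

// A bare u32 key is 4 bytes; SortedKey adds a one-byte Sort
// discriminant that alignment pads out to a total of 8 bytes.
assert_eq!(size_of::<u32>(), 4);
assert_eq!(size_of::<SortedKey>(), 8);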

The sorted-key approach allows us to do something like this:

let mut asks = BTreeMap::<SortedKey, u32>::new(); // u32 stands in for a PriceLevel type
let mut bids = BTreeMap::<SortedKey, u32>::new();

asks.insert(SortedKey::new(1, Sort::ASC), 10);
bids.insert(SortedKey::new(1, Sort::DESC), 10);


// `side` is the side of the incoming order being matched.
let opposite_side = match side {
    Side::Buy => &asks,
    Side::Sell => &bids,
};

// let trades = opposite_side.match_order(order.clone());
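Pulling it together, here is a minimal sketch (reusing the Sort, SortedKey, and trait impls from above) showing that a plain forward iterator over a DESC-keyed map yields the highest price first:

use std::collections::BTreeMap;

fn main() {
    // Keys embed Sort::DESC, so the map's natural ascending order is
    // highest-price-first from the order book's point of view.
    let mut bids = BTreeMap::<SortedKey, u32>::new();
    bids.insert(SortedKey::new(99, Sort::DESC), 10);
    bids.insert(SortedKey::new(101, Sort::DESC), 20);
    bids.insert(SortedKey::new(100, Sort::DESC), 30);

    // Plain iter(), no rev() and no Box: prints 101, 100, 99.
    for (key, qty) in bids.iter() {
        println!("{} -> {}", key.price, qty);
    }
}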

Using Macros

Macros are another way to make the same matching logic work for both the ascending and descending branches. However, while the written code stays roughly the same size, the generated code does not: the macro expands into a separate copy of the logic for each direction.
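For illustration, a minimal sketch of what this looks like; walk_side! is a hypothetical macro name, and the loop body stands in for real matching logic:

use std::collections::BTreeMap;

// The macro body is stamped out once per call site, so the matching
// logic is duplicated in the generated code: once per direction.
macro_rules! walk_side {
    ($iter:expr) => {
        for (price, qty) in $iter {
            println!("level {price} -> {qty}");
        }
    };
}

fn main() {
    let mut book_side = BTreeMap::<u32, u32>::new();
    book_side.insert(1, 10);
    book_side.insert(2, 20);

    let descending = true;
    if descending {
        walk_side!(book_side.iter().rev());
    } else {
        walk_side!(book_side.iter());
    }
}

Unlike the earlier match, this compiles, because each branch instantiates the loop with its own concrete iterator type.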

In a performance-critical area such as order matching, this doubles the machine code generated for the matching logic, which exacerbates instruction cache misses and the cost of branch mispredictions. IMO, it is a worse approach than the other two we just discussed.

In a future post, I will compile some performance benchmarks for various approaches.