arena-allocator
A bump arena memory allocator in C11.
A bump arena is the simplest correct memory allocator there is.
The whole design fits on an index card: one contiguous region
of memory, one pointer that walks forward through it, and
one operation — arena_reset — that walks the
pointer back to the start.
Why a bump arena
Most allocation workloads have a property that malloc
ignores: a clear phase boundary. A request handler allocates
things for the duration of one HTTP request, then throws it all
away. A compiler allocates things for one translation unit, then
throws it all away. A parser allocates things for one
translation, then throws it all away.
A bump arena is built for that pattern. Every allocation is the
same shape: bump the pointer, return the old position. The
allocator never has to find free space, never has to coalesce,
never has to track per-allocation metadata. The cost of
arena_alloc is roughly four instructions: an align
up, a bounds check, the pointer bump, and a return.
The benchmark below shows what that buys you.
The API
Six functions cover the whole surface. The public header is short enough to read in one sitting:
typedef struct arena arena;
arena *arena_create (size_t capacity);
void *arena_alloc (arena *a, size_t size);
void arena_reset (arena *a);
void arena_destroy (arena *a);
size_t arena_in_use (const arena *a);
size_t arena_high_wat (const arena *a);
size_t arena_cap (const arena *a);
Note the deliberate asymmetry: there is no arena_free.
Individual allocations cannot be released. The whole arena is
freed at once by arena_destroy, or rewound to empty
by arena_reset. This is the price of the speed.
It is also the design.
Try it
The playground below is a JavaScript port of the same API. The horizontal bar is the mmap'd region. The yellow tick is the bump pointer. Click an alloc button to move the pointer; click reset to send it home.
Playground
How it works
The arena is a single mmap'd region sized to capacity
bytes, rounded up to the page size. The first few bytes hold a
small arena struct that records the base pointer,
the current offset, the high-water mark, and the total capacity.
Everything after the struct is the bump region.
arena_alloc aligns the requested size up to 16
bytes, checks that the bump pointer plus the aligned size still
fits, then advances the pointer. The pointer to the old position
is returned. That's the whole allocator.
arena_reset is a single pointer write back to the
size of the struct. The cost is one memory access. This is what
makes the design useful for phases: the per-allocation cost is
zero, the per-phase cost is one pointer write.
Benchmarks
Apple M-series, single thread, -O2:
| Operation | Rate |
|---|---|
| arena_alloc (1×int) | ~455 M ops/s |
| malloc / free (1×int) | ~33 M ops/s |
| arena vs malloc | ~14x faster |
| arena_reset | ~0.5 ns |
| 1k alloc + 1k reset cycle | ~480 M ops/s |
The 14x number is real, not synthetic. It is the cost of all the
bookkeeping that malloc has to do to support
per-allocation free: thread-local caches, size
class buckets, free lists, coalescing. The bump arena does none
of that, and the result is that the hot path is the size of the
function prologue.
When to use a bump arena
Use it for any workload with a clear phase: build a tree of allocations, use it, throw it all away, repeat. Compilers, parsers, request handlers, frame allocators in a renderer, simulation loops — all fit the pattern. The cost of memory management per phase collapses from N frees to one reset.
Do not use it when individual allocations have lifetimes that
don't share a common destruction point. That is what
malloc is for. The arena does not replace the
general allocator — it replaces the patterns where the general
allocator is the wrong tool.
Tests
143 assertions across 9 test cases, all passing. The suite covers: null safety, basic alloc, alignment, OOM behaviour, high-watermark tracking across resets, reset semantics, multi- region allocs, alignment edge cases, and a randomized fuzz test that runs 10k random alloc/reset cycles and reconciles against a reference Python implementation.
Source
The repository lives at github.com/404Piyush/arena-allocator. The public header, the implementation, and the test suite are the three files worth reading. The header is the contract. The implementation is about 100 lines. The test suite is about 250 lines and reads like a spec.
To build and test locally:
git clone https://github.com/404Piyush/arena-allocator
cd arena-allocator
make test
make bench
One of the gpu-engineering curriculum, a three-year systems and hardware roadmap. More projects incoming.