arena-allocator

A bump arena memory allocator in C11.

Week 8 project · 14 June 2026 · 4 min read

A bump arena is the simplest correct memory allocator there is. The whole design fits on an index card: one contiguous region of memory, one pointer that walks forward through it, and one operation — arena_reset — that walks the pointer back to the start.

Why a bump arena

Most allocation workloads have a property that malloc ignores: a clear phase boundary. A request handler allocates things for the duration of one HTTP request, then throws it all away. A compiler allocates things for one translation unit, then throws it all away. A parser allocates things for one translation, then throws it all away.

A bump arena is built for that pattern. Every allocation is the same shape: bump the pointer, return the old position. The allocator never has to find free space, never has to coalesce, never has to track per-allocation metadata. The cost of arena_alloc is roughly four instructions: an align up, a bounds check, the pointer bump, and a return.

The benchmark below shows what that buys you.

The API

Six functions cover the whole surface. The public header is short enough to read in one sitting:

typedef struct arena arena;

arena *arena_create  (size_t capacity);
void  *arena_alloc    (arena *a, size_t size);
void   arena_reset    (arena *a);
void   arena_destroy  (arena *a);

size_t arena_in_use   (const arena *a);
size_t arena_high_wat (const arena *a);
size_t arena_cap      (const arena *a);

Note the deliberate asymmetry: there is no arena_free. Individual allocations cannot be released. The whole arena is freed at once by arena_destroy, or rewound to empty by arena_reset. This is the price of the speed. It is also the design.

Try it

The playground below is a JavaScript port of the same API. The horizontal bar is the mmap'd region. The yellow tick is the bump pointer. Click an alloc button to move the pointer; click reset to send it home.

Playground

0 / 65536 bytes

in_use: 0 high_wat: 0 capacity: 65536

How it works

The arena is a single mmap'd region sized to capacity bytes, rounded up to the page size. The first few bytes hold a small arena struct that records the base pointer, the current offset, the high-water mark, and the total capacity. Everything after the struct is the bump region.

arena_alloc aligns the requested size up to 16 bytes, checks that the bump pointer plus the aligned size still fits, then advances the pointer. The pointer to the old position is returned. That's the whole allocator.

arena_reset is a single pointer write back to the size of the struct. The cost is one memory access. This is what makes the design useful for phases: the per-allocation cost is zero, the per-phase cost is one pointer write.

Benchmarks

Apple M-series, single thread, -O2:

Operation	Rate
arena_alloc (1×int)	~450 M ops/s
arena_reset	~0.5 ns (one pointer write)
malloc / free (1×int)	~75 M ops/s
arena vs malloc	~5–7x faster
1k alloc + 1k reset cycle	~450 M ops/s

The speedup is real, not synthetic. It is the cost of all the bookkeeping that malloc has to do to support per-allocation free: thread-local caches, size class buckets, free lists, coalescing. The bump arena does none of that, and the result is that the hot path is roughly four instructions: align up, bounds check, pointer bump, return.

When to use a bump arena

Use it for any workload with a clear phase: build a tree of allocations, use it, throw it all away, repeat. Compilers, parsers, request handlers, frame allocators in a renderer, simulation loops — all fit the pattern. The cost of memory management per phase collapses from N frees to one reset.

Do not use it when individual allocations have lifetimes that don't share a common destruction point. That is what malloc is for. The arena does not replace the general allocator — it replaces the patterns where the general allocator is the wrong tool.

Tests

143 assertions across 9 test cases, all passing. The suite covers: null safety, basic alloc, alignment, OOM behaviour, high-watermark tracking across resets, reset semantics, multi- region allocs, alignment edge cases, and a randomized fuzz test that runs 10k random alloc/reset cycles and reconciles against a reference Python implementation.

Source

The repository lives at github.com/404Piyush/arena-allocator. The public header, the implementation, and the test suite are the three files worth reading. The header is the contract. The implementation is about 100 lines. The test suite is about 250 lines and reads like a spec.

To build and test locally:

git clone https://github.com/404Piyush/arena-allocator
cd arena-allocator
make test
make bench

One of the gpu-engineering curriculum, a three-year systems and hardware roadmap. More projects incoming.