Programming Gems: The Arena

Author: Kevin Nygaard

The arena is a fast and simple memory allocator that avoids memory micromanagement by freeing allocations in bulk. This deceptively powerful data structure is easy to understand, easy to implement, and easy to integrate into existing systems.

Contents

1 The Arena
2 Alternate Out of Memory Behaviors
  2.1 Reallocation
  2.2 Chain Allocation
3 Usage Techniques
  3.1 Partial Resets and Stack Allocators
  3.2 Multiple Arenas
4 Closing Thoughts

1 The Arena

The arena, also known as a zone or a linear allocator, is a fast and simple memory allocator and memory management data structure. Allocations incrementally slice off pieces of a large contiguous chunk of memory under the arena's control.1 They remain active until the arena is reset, which deallocates all allocations simultaneously, avoiding both tedious micromanagement of individual allocations and slow, complicated, error-prone garbage collectors. Below is an example implementation:

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
typedef struct { void *Data; size_t Used, Size; } arena;
void *Alloc(arena *Arena, size_t Size) {
    assert(Arena);
    if (!Arena->Data) { /* auto-initialize on first use */
        if (!Arena->Size) { Arena->Size = 8 * 1024; } /* default size */
        Arena->Data = malloc(Arena->Size);
    }
    assert(Arena->Data); /* malloc() succeeded */
    assert(Arena->Used + Size <= Arena->Size); /* enough free space */
    void *Ret = (char *)Arena->Data + Arena->Used;
    Arena->Used += Size;
    return Ret;
}

The arena structure contains a pointer to a chunk of memory, called Data, and two counters describing it: Used records the number of allocated bytes, and Size records the overall size of the memory. The first half of Alloc() verifies or performs arena initialization; the second half accounts for the new allocation and returns a pointer to it (provided the arena has enough free space). The program raises an assertion on over-allocation, and setting Used to 0 immediately frees all allocations. Arenas never free their backing memory – they reuse it, and so avoid unnecessary interactions with the operating system, increasing performance.2

This design auto-initializes itself and integrates cleanly with existing systems. Eschewing explicit constructors, zero-initialized arenas are valid and work "out of the box." The implementation allocates memory by malloc(3) using a default size, or if non-zero, the Size specified in the structure. Specifying Data bypasses auto-initialization, allowing memory to be pre-allocated or derived from some other source, such as static allocations, other memory managers, or even inside other data structures.

Removing the auto-initialization and level of indirection improves performance, but forms an unfriendly and inflexible interface. Such features allow the arena to "just work" with minimal fuss, but also give the user absolute control over the memory.3

2 Alternate Out of Memory Behaviors

In this implementation, running out of memory raises an assertion. Efficient programs prudently budget their memory, and an assertion alerts programmers to incorrect approximations or inefficient memory usages.4

Oversized auto-initializations also raise assertions. If an allocation is greater than the default or specified arena Size, and the arena is uninitialized, the implementation asserts instead of adjusting the arena size. This prevents unintentional creation of large arenas holding only a single allocation.

However, there are other strategies for handling out of memory scenarios, including reallocation and chained allocation – the "best" method depends on the application.

2.1 Reallocation

With a reallocation strategy, out of memory arenas increase their size with realloc(3). Leaning on the external allocator, they grow in size indefinitely until the operating system runs out of memory; otherwise, the theory of operation and implementation are identical to the assert strategy.

However, realloc() invalidates all pointers to allocations within the arena – reallocation allocates a larger block at a different address, copies the original data, and frees the old block. This relocation is harmless to offset-centric code but dangerous to pointer-heavy code: the pointers must be individually tracked and updated, or shielded from the instability by a layer of indirection. Both solutions are costly and error-prone. The next strategy, however, expands an arena while keeping its pointers stable.
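A minimal sketch of the reallocation strategy (AllocGrow is a hypothetical name; the arena structure is unchanged from above) replaces the out-of-memory assertion with a doubling realloc(). Since realloc(NULL, n) behaves like malloc(n), auto-initialization folds into the growth path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { void *Data; size_t Used, Size; } arena;

void *AllocGrow(arena *A, size_t Size) {
    assert(A);
    if (!A->Size) { A->Size = 8 * 1024; }
    if (!A->Data || A->Used + Size > A->Size) {
        while (A->Used + Size > A->Size) { A->Size *= 2; }
        /* May move the block: existing pointers into the arena are
           invalidated, though offsets from Data remain valid. */
        A->Data = realloc(A->Data, A->Size); /* NULL acts like malloc() */
        assert(A->Data);
    }
    void *Ret = (char *)A->Data + A->Used;
    A->Used += Size;
    return Ret;
}
```

Reading back through Data plus an offset, rather than through a saved pointer, sidesteps the invalidation – which is exactly what offset-centric code does.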

2.2 Chain Allocation

With a chain allocation strategy, out of memory arenas increase their size by allocating a new block and adding it to their list of active blocks for future reclamation. Thus, like the reallocation policy, they grow in size indefinitely until the operating system runs out of memory. But unlike the reallocation strategy, pointers remain stable: allocations within old blocks are still valid, and only new allocations use the new blocks.

However, resetting the arena is more complicated. The previous strategies directly set Used to 0 because they only managed a single block. Now managing many blocks, the arena must move the "in use" blocks to a "ready to use" list to prevent memory leaks. This costs additional time, space, and complexity.

Future allocations search this list before allocating new blocks. If all blocks are identically sized, the search is simple: just take the first one.5 If the blocks are mixed sizes, the search is more nuanced: the block must be large enough for the allocation, and ideally, the exact size. However, searching for an exact sized block takes time. Different search strategies (first fit, best fit, etc) trade off between speed and memory utilization. Prudence is advised, as such problems start leaning into the realm of general purpose memory allocators, bloating and undermining the fast and simple arena.
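One way to sketch chain allocation with identically sized blocks (block, chain_arena, and the function names are illustrative, not from the text above; the free-list search is the simple "take the first one" case):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct block block;
struct block { block *Next; size_t Used, Size; char Data[]; };
typedef struct { block *Active; block *Free; size_t BlockSize; } chain_arena;

static block *NewBlock(size_t Size) {
    block *B = malloc(sizeof *B + Size);
    assert(B);
    B->Next = NULL; B->Used = 0; B->Size = Size;
    return B;
}

void *ChainAlloc(chain_arena *Arena, size_t Size) {
    assert(Arena);
    if (!Arena->BlockSize) { Arena->BlockSize = 8 * 1024; }
    assert(Size <= Arena->BlockSize);
    if (!Arena->Active || Arena->Active->Used + Size > Arena->Active->Size) {
        block *B = Arena->Free;           /* reuse a reset block if any */
        if (B) { Arena->Free = B->Next; B->Used = 0; }
        else   { B = NewBlock(Arena->BlockSize); }
        B->Next = Arena->Active;          /* old blocks stay valid */
        Arena->Active = B;
    }
    void *Ret = Arena->Active->Data + Arena->Active->Used;
    Arena->Active->Used += Size;
    return Ret;
}

/* Reset moves every "in use" block onto the "ready to use" list. */
void ChainReset(chain_arena *Arena) {
    while (Arena->Active) {
        block *B = Arena->Active;
        Arena->Active = B->Next;
        B->Next = Arena->Free;
        Arena->Free = B;
    }
}
```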

3 Usage Techniques

Periodic, allocation-heavy workflows benefit from the arena's fast allocation and deallocation features. For example, consider an HTML generator utility which parses an input document into a tree, and prints the output as HTML. Freeing a tree under traditional memory management recursively walks the structure, freeing each individual node – a pedantic and laborious process, especially if the memory will soon be reused to build a new tree. Freeing a tree in an arena is a simple reset, skipping the lengthy tree walk and immediately reclaiming all allocated memory. If parsing multiple documents individually, the same memory is reused for each individual document, and the overall memory usage never exceeds the largest tree size.

However, unlike traditional memory management, arbitrary allocations cannot be freed. If an early allocation is no longer needed, its memory is wasted until an arena reset. Well-organized allocations counter this limitation with finer-grained resets, either by treating the arena as a stack, or by distributing allocations between multiple arenas.

3.1 Partial Resets and Stack Allocators

In general, an arena reset is "all or nothing." However, when viewed as a stack, arenas are partially resettable. Allocations are placed on top of one another, and Used describes the "top of stack." Setting Used to a previous watermark level, rather than 0, "pops" the stack to this previous state:

size_t Watermark = Arena->Used;
...
Alloc(...);
...
Arena->Used = Watermark;

For example, consider a parser that uses temporary memory while processing a single document. It may parse equations from the document, constructing a temporary expression tree and reducing it into a single node in the final output tree. The intermediate tree is no longer needed after evaluating the expression, but it is not reclaimed until the end of the document when the arena is reset.

By treating the arena as a stack, saving and restoring the arena's watermark around the offending code frees these temporary allocations, allowing their memory to be repurposed before the main arena reset.

This is also the key idea behind stack allocators, which store stack restoration information either in separate memory, as shown above, or within the memory itself as activation records or stack frames.

In this light, an arena is also a "long lived" stack. Normal function variables are dynamically allocated on the call stack and are only valid during function execution. Completing the function restores the stack to its prior state, invalidating the allocated variables. As such, a function cannot return pointers to its local variables. However, a function can allocate variables in an arena, using it as a secondary stack, and validly pass their pointers to the caller. By inverting the standard call convention, the caller can inspect and use these variables after the function has returned; however, the caller is now responsible for cleaning up this stack.

If allocations are not stack friendly, another technique enables finer-grained resets: using multiple arenas.

3.2 Multiple Arenas

Wasted space accumulates in an arena when "dead" allocations cannot be freed because "live" ones are preventing a reset. The obvious solution is to not mix dead and live allocations: put them into separate arenas. This technique effectively groups allocations by temperature: hot data is frequently accessed and short-lived, and cold data is infrequently accessed and long-lived.6

For example, the document parser may generate a global table of contents containing the headings from each document. The generated tree contains the headings, but resetting the arena after the document is processed discards this information. Preventing the reset preserves the headings, but wastes the majority of memory on a now unused tree. Instead, allocate the headings using a separate arena which is preserved across documents. The primary arena is safely reset for the next document, and the secondary arena safely stores the headings for later collation.

Allocations with unknown temperatures can be copied into cold arenas at runtime, assuming the allocations are relatively few and small. (Constantly recopying large chunks of memory points to a design issue: poor access patterns, poor data layout, or both.) Data should be assumed hot until proven otherwise: a cold allocation prevents an entire hot arena from being reset, but a hot allocation in a cold arena only wastes a negligible amount of space (by definition, cold data dominates a cold arena's memory usage). Therefore, picking mislabeled cold data out of a hot arena is the wisest approach.

4 Closing Thoughts

The arena is a simple yet powerful data structure for managing memory. Coupled with prudent design, the arena eliminates entire classes of problems, resulting in fast, efficient, and easy to maintain programs. The easiest problem to solve is the one you don't have.7

I hope you enjoyed this short exposition on memory arenas and gained something useful. Feedback and corrections are welcome via email. ✚

Footnotes

  1. This is unsurprising to Real Programmers (like Ed Post), since the only useful data structure is the array.
  2. System calls are expensive and should be avoided when possible; communicating with the kernel has non-negligible overheads. (Dynamic library calls are also expensive – not as expensive as system calls, but still worth mentioning. Because they are linked at runtime, the compiler cannot optimize across library calls. They are also slower than normal functions since they bounce through trampolines.) Furthermore, freeing memory is a waste of time (except for long running applications with infrequent, temporary, very large allocations). The operating system reclaims all memory automatically when a program terminates, so if a program exits shortly after using an allocation, explicitly freeing it is unnecessary and wasteful. If a program doesn't exit after using an allocation, but will request more memory soon, just reuse that same allocation and ask for more if you need to. The allocation you have is already in cache, and free()ing it causes the allocator to perform bookkeeping tasks, like recombining neighboring free blocks, which will soon be disturbed.
  3. The application ultimately decides the best interface for any data structure. If it doesn't need or use certain features, remove them. Don't keep features because they were used in the past, might be used in the future, or suggested by someone. Truly remove them from the source code, don't just comment them out. Overeager excising is reversible with version control – slogging through unused and irrelevant lines of code is not.
  4. If you don't know how much memory your program will use, do a quick back-of-the-envelope calculation. If you can't, give yourself 4 kB and revisit the problem later.
  5. If the arena only holds one type of object, it is very similar to an object pool.
  6. This terminology also describes a piece of data's expected location within a system. Frequently accessed data moves to faster media, like caches and registers, whereas rarely accessed data moves to slower media, like RAM and disk.
  7. This is an adaptation of an old optimization adage, and also a key philosophy behind Forth programming. Closely related is the well known quote attributed to Jamie Zawinski: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
