Phyllotaxis and Fibonacci

mpool

An overhead free aligned memory allocator

(Well, almost overhead free)
 
  
I designed this memory allocator for a Linux device driver I wrote 20 years ago.  It has the property that if you ask for 64 bytes you get a 64 byte aligned block; if you ask for 128, you get a 128 byte aligned block; and so on.  The other nice thing about it is that it automatically defragments, which is to say that if, for example, a 128 byte aligned 128 byte block is free then it can be allocated as a 128 byte block, even if it one or both halves have previously been allocated as 64 byte blocks and later freed.

The key to the design is the data structure.  Each free block has a 64 byte header consisting of 16 pointers (I wrote this for a 32-bit architecture, although it could easily be ported).  For this reason 64 bytes is the smallest block size that can be allocated.  The pointers are all either NULL or point to another free block of the same size, forming a 16-ary tree of equal-sized free blocks.  At the base of the tree is a single pointer, and so the only overhead is an array of pointers, one for each block size: 64, 128, 256, 512, and so on.

When a block is freed mpool_free searches down the relevant tree for a partner that could be combined with it to form a block of the next size up.  For example, if a previously allocated 64 byte block at 0x0001ffc0 were freed and 0x0001ff80 was already in the tree of free blocks, the two blocks could be combined to form a free block of 128 bytes.  In general if the top 32-7 bits match then two 64 byte blocks can be combined.  Likewise, for 128 byte blocks, if the top 32-8 bits match, they can be combined into a 256 byte block.  The part of the address used for matching is called the match field.

Freeing a block involves recursing down the tree.  The first 4 bits of match are used as an index into the first block, the next 4 bits are used as an index into the next block and so on, until a leaf block is found.  If no matching block was found in that process the block is added as a new leaf by setting a pointer in the header of the current leaf.  If alternatively a matching block was found then this is removed from the tree and the two blocks are freed as a single block of twice the size.  (But if the matching block wasn't a leaf it must be replaced by one of the leaves under it!)

When an aligned memory allocation is requested mpool_alloc simply frees the first block in the corresponding tree, unless that tree is empty in which case it just tries the next block size up.  If there is unneeded memory at the end of the block, mpool_alloc frees that before returning:
 
    // Now release the parts of the block not needed
    free_sz = (64<<idx) - sz;   // The amount not required by caller
    free_addr = retval + sz;    // The start address of the memory not required
    
    for (; sz <= free_sz; sz <<= 1)
    {
        mpool_free(mp, free_addr, sz);
        free_addr += sz;
        free_sz -= sz;
    }

This is similar to how the entire structure is initialized in the first place by mpool_init.   This function simply breaks the memory space provided into aligned blocks and frees them sequentially:

    // Round up addr to 64 bytes
    waste = (((unsigned long) addr + 63) & ~63) - (unsigned long) addr;
    addr += waste;
    sz -= waste;

    // Free the blocks to make them available for allocation later
    for (i = 64; i <= sz; i <<= 1)
    {
        if ((unsigned long) addr & i)
        {
            mpool_free(mp, addr, i);
            addr += i;
            sz -= i;
        }
    }

    i >>= 1;

    for (; i >= 64; i >>=1)
    {
        if (sz >= i)
        {
            mpool_free(mp, addr, i);
            addr += i;
            sz -= i;
        }
    }

    // keep track of wasted bytes
    waste += sz;

The resulting memory allocator is extremely low overhead, guarantees memory aligned allocations, and $\mathcal{O}(\log{n})$ operations, where $n$ is the number of address bits.

POSTSCRIPT

There are a couple of potential modifications that could be made to the current code.  The existing implementation rounds up requested block sizes, for example if 1080 bytes is requested this is rounded up to 2048, wasting 968 bytes.  This could be made more memory efficient by adding a dispatch layer above mpool_alloc to free unneeded parts at the end.  In this case 64 bytes at 1088 would be freed, 128 bytes at 1152, and so on.  A matching dispatch layer would be required above mpool_free.  Also, since mpool requires the user to keep track of the buffer length it cannot be used in place of malloc/free.  There are a number of potential solutions to this but all come at the price of introducing a memory overhead.
 
 

Comments