Starkiller
finding a path through the programming world
Posted on 03 May 2013

Redis bills itself as a data structure server directly on its homepage; indeed, the project provides a myriad of data structures accessible in a key-value interface. In fact, Redis provides many key components that one might desire in a remote data structure server: a fast asynchronous networking layer, a reference-counting based memory management layer, multiple built-in persistence strategies, and immediate client support for new commands (for some clients, anyway).

However, what Redis lacks is an easy way to add new data structures to the core. Salvatore has explained why he is against a plugin system, and it seems as if such a system will not make its way to Redis. This means we'll have to add a new data structure directly to the Redis core.

There are five areas that we'll address as we add a custom datatype:

For this exercise, I am working off of the 2.6.13 codeline. A working example of the code in this exercise can be found on GitHub.

Interval Trees

A quick interlude about the datatype we are going to be adding...

A coworker (Bill Mill) and I decided to extend Redis with a custom datatype last year when attempting to determine the set of globally announced CIDR addresses a given IPv4 address falls in. This type of problem is a traditional stabbing problem, one that is commonly solved by interval trees.

To implement interval trees, I decided to build an augmented AVL tree as described in the wonderful CLRS. In such a tree, the nodes are sorted by the minimum value of the range the node represents. Each node carries an additional attribute that represents the largest maximum range value of all of the node's subtrees. With this information, stabbing queries can easily be written that perform at O(log(n)) complexity.

We represented interval trees in Redis as a "Interval Set" datatype; in such a set, a value is added to the set along with a range. The operations added to Redis act as a somewhat inverse to sorted sets; whereas with sorted sets you can ask for all members and their respective scores that fall within a range, with interval sets, you can ask for all members that have ranges that include a given score.

We can model intervals using standard Redis types and achieve functionality similar to what we will be adding here, but at the cost of two to three orders of magnitude in performance.

Setting the Stage: Declaring the Custom Type

So let's get started!

We can first add a couple of constants that will be used throughout the process to identify our interval set and its internal encoding. We can add the following to the object types defined starting at line 125 of src/redis.h:

#define REDIS_ISET 5

And we can add the following to the object encoding definitions at line 132 of src/redis.h:

#define REDIS_ENCODING_AVLTREE 8

We also need to add a declaration/definition for the dictType used for our custom datatype. This structure is used to properly insert, retrieve, and destroy our custom datatype from Redis' internal dict type. We add the following to the extern declarations at line 761 of src/redis.h:

extern dictType isetDictType;

And we add the definition where the other dictType definitions are made at line 480 of src/redis.c:

dictType isetDictType = {
    dictEncObjHash,            /* hash function */
    NULL,                      /* key dup */
    NULL,                      /* val dup */
    dictEncObjKeyCompare,      /* key compare */
    dictRedisObjectDestructor, /* key destructor */
    NULL                       /* val destructor */
};

The last Redis-specific declaration we need to make is the declaration of a function responsible for allocating memory for our new datatype. The function responsible for deallocation does not need a declaration, as it will not be accessible outside of the memory management module we'll edit shortly. We'll add the creation function to the object implementation section at line 864 of src/redis.h:

robj *createIsetObject(void);

With the Redis-specific declarations done, we should now declare/define the structs needed to represent our interval trees as well as declare the functions required to manage the structures. We can add the structs after the sorted set ("zset") declarations around line 465 of src/redis.h:

/* ISET specialized AVL structures */
typedef struct avlNode {
       robj *obj;
       double scores[2];
       double subLeftMax, subRightMax;
       char balance;
       struct avlNode *left, *right, *parent, *next;
} avlNode;

typedef struct avl {
       struct avlNode *root;
    dict *dict;
       unsigned long size;
} avl;

typedef struct iset {
       avl *avltree;
       dict *dict;
} iset;

The management functions can be placed after the object implementation section, around line 931 of src/redis.h:

avl *avlCreate(void);
avlNode *avlCreateNode(double lscore, double rscore, robj *obj);
void avlFreeNode(avlNode *node, int removeList);
void avlFree(avl *tree);
int avlNodeCmp(avlNode *a, avlNode *b);
void avlLeftRotation(avl * tree, avlNode *locNode);
void avlRightRotation(avl * tree, avlNode *locNode);
void avlResetBalance(avlNode *locNode);
int avlInsertNode(avl * tree, avlNode *locNode, avlNode *insertNode);
avlNode *avlInsert(avl *tree, double lscore, double rscore, robj *obj);

Finally, we should declare the functions that will handle the external Redis commands exposed for our custom datatype. We'll add the following declarations at the end of the commands prototypes section at line 1063 of src/redis.h:

void iaddCommand(redisClient *c);
void iremCommand(redisClient *c);
void irembystabCommand(redisClient *c);
void istabCommand(redisClient *c);

As you can see, we'll be defining four separate operations for managing our custom datatype.

Defining the Memory Management Functions

With most of the declarations out of the way, we can flesh out the object allocation and deallocation routines. Redis represents internal objects as robj types, which include a pointer to the actual data, the type of the data, the encoding of the data, and a reference count for automated deallocation.

We will define our previously declared createIsetObject() function in the src/object.c file, which is responsible for the memory management of objects in Redis. We'll also define the freeIsetObject() for deallocating the structure.

robj *createIsetObject(void) {
    avl *a = avlCreate();
    robj *o = createObject(REDIS_ISET,a);
    o->encoding = REDIS_ENCODING_AVLTREE;
    return o;
}

void freeIsetObject(robj *o) {
    avlFree(o->ptr);
}

Creating the object is pretty straightforward; we create the augmented AVL structure we will define later, create a new Redis object based off of the tree, and set the encoding.

All we have left to do is tie the deallocation to the reference counter. We'll modify the decrRefCount() at line 196 of src/object.c and add the following as a case to the switch statement:

case REDIS_ISET: freeIsetObject(o); break;

When the object is no longer needed, it will be automatically freed.

Defining the Interval Set

With the majority of the supporting tie-in code written, we can turn our focus on the actual data structure and the functions that respond to the external commands. First, we should add our command functions to the command table at line 115 of src/redis.c.

{"iadd",iaddCommand,-5,"wm",0,NULL,1,1,1,0,0},
{"irem",iremCommand,-3,"w",0,NULL,1,1,1,0,0},
{"irembystab",irembystabCommand,3,"w",0,NULL,1,1,1,0,0},
{"istab",istabCommand,-3,"r",0,NULL,1,1,1,0,0},

We'll define that AVL management functions as well as the four above functions in src/t_iset.c, a new module we create to hold just our custom datatype.

I won't be going through every aspect of the datatype, as the implementation is almost 1,000 lines long (view the full source). However, I do want to touch on a couple of points.

  • The avlCreate() function creates a structure for holding an interval tree as well as an internal Redis dict using dictCreate(). We use this dict to maintain the set of values that have already been added to the interval sets, guaranteeing constant member lookup and deletion.
  • The avlFree() function calls decrRefCount() with the robj at the node being freed. This is required to ensure Redis properly removes the object when no longer referenced (it may still be referenced in the dict mentioned above).
  • The iaddCommand() function is variadic, and thus can be used to add multiple items to the set with a single call. This is a good example method to reference for implementing similar functionality.

Persistence

Redis provides two forms of disk persistence: RDB and AOF. RDB is a binary storage format that represents snapshots of the entire Redis database at any point in time. AOF (append-only file) is an append-only log of all Redis commands that have been run to build the database in its current format.

For our custom datatype, we will build in support for both types of persistence. If you can be sure that you will only use one type of persistence, you can implement support for only that format. Be forewarned, however, that the Redis server will throw a panic if you attempt to use a persistence strategy that is not supported by your custom type.

RDB

We'll start off by adding RDB support for interval sets. First, we need to define a constant that represents our interval type for RDB at the datatype definitions starting at line 72 of src/rdb.h.

#define REDIS_RDB_TYPE_ISET   5

Next, we need to add the code responsible for saving the interval set. First we need to add the interval set case to the switch statement in rdbSaveObjectType() at line 430 of src/rdb.c.

case REDIS_ISET:
    return rdbSaveType(rdb,REDIS_RDB_TYPE_ISET);

Second, we will add the actual save logic to rdbSaveObject() at line 478 of src/rdb.c.

else if (o->type == REDIS_ISET) {
    avl * tree = o->ptr;
    dictIterator *di = dictGetIterator(tree->dict);
    dictEntry *de;

    if ((n = rdbSaveLen(rdb,dictSize(tree->dict))) == -1) return -1;
    nwritten += n;

    while((de = dictNext(di)) != NULL) {
        robj *eleobj = dictGetKey(de);
        double *scores = dictGetVal(de);

        if ((n = rdbSaveStringObject(rdb,eleobj)) == -1) return -1;
        nwritten += n;
        if ((n = rdbSaveDoubleValue(rdb,scores[0])) == -1) return -1;
        nwritten += n;
        if ((n = rdbSaveDoubleValue(rdb,scores[1])) == -1) return -1;
        nwritten += n;
    }
    dictReleaseIterator(di);
}

This code creates an iterator on the dict internal to the interval set and proceeds to walk the dict. For each element encountered, the element's value is saved along with the minimum and maximum scores associated with the interval.

After finishing with the save logic, we need to implement load logic in the rdbLoadObject() function at line 771 of src/rdb.c.

else if (rdbtype == REDIS_RDB_TYPE_ISET) {
    size_t isetlen;
    avl * tree;
    size_t maxelelen = 0;

    if ((isetlen = rdbLoadLen(rdb,NULL)) == REDIS_RDB_LENERR) return NULL;
    o = createIsetObject();
    tree = o->ptr;

    while(isetlen--) {
        robj *ele;
        double score1;
        double score2;

        avlNode *inode;

        if ((ele = rdbLoadEncodedStringObject(rdb)) == NULL) return NULL;
        ele = tryObjectEncoding(ele);
        if (rdbLoadDoubleValue(rdb,&score1) == -1) return NULL;
        if (rdbLoadDoubleValue(rdb,&score2) == -1) return NULL;

        /* Don't care about integer-encoded strings. */
        if (ele->encoding == REDIS_ENCODING_RAW &&
            sdslen(ele->ptr) > maxelelen)
            maxelelen = sdslen(ele->ptr);

        inode = avlInsert(tree, score1, score2, ele);
        dictAdd(tree->dict,ele,&inode->scores);
        incrRefCount(ele); /* Added to dictionary. */
    }
}

We start the load code by reading the number of elements to expect during the load; this was automatically written by rdbSaveObject(). We then simply create a new interval set with createIsetObject() and then read in each node one by one, adding each to the tree with avlInsert() and dictAdd.

With that, the RDB code is complete.

Append-Only File

AOF support is much simpler; by default, each command sent to the Redis server is appended to the log, and thus support is already baked in. However, Redis does intelligently rewrite the append-only file when the file gets too big to eliminate redundancy. For instance, if a specific element has been added and deleted 500 times, we shouldn't have 500 copies of the same add and delete commands. Thus we need to add a method that will iterate the interval set and write out the IADD commands to regenerate the set.

We can add the function after all of the other datatype output functions at line 773 of src/aof.c.

int rewriteIntervalSetObject(rio *r, robj *key, robj *o) {
    long long count = 0, items = isetLength(o);

    if (o->encoding == REDIS_ENCODING_AVLTREE) {
        dictIterator *di = dictGetIterator(((avl *) o->ptr)->dict);
        dictEntry *de;

        while((de = dictNext(di)) != NULL) {
            robj *eleobj = dictGetKey(de);
            double **scores = dictGetVal(de);

            if (count == 0) {
                int cmd_items = (items > REDIS_AOF_REWRITE_ITEMS_PER_CMD) ?
                    REDIS_AOF_REWRITE_ITEMS_PER_CMD : items;

                if (rioWriteBulkCount(r,'*',2+cmd_items*3) == 0) return 0;
                if (rioWriteBulkString(r,"IADD",4) == 0) return 0;
                if (rioWriteBulkObject(r,key) == 0) return 0;
            }
            if (rioWriteBulkDouble(r,(*scores)[0]) == 0) return 0;
            if (rioWriteBulkDouble(r,(*scores)[1]) == 0) return 0;
            if (rioWriteBulkObject(r,eleobj) == 0) return 0;
            if (++count == REDIS_AOF_REWRITE_ITEMS_PER_CMD) count = 0;
            items--;
        }
        dictReleaseIterator(di);
    } else {
        redisPanic("Unknown interval set encoding");
    }
    return 1;
}

This code is quite similar to the RDB save logic; we create an interval associated with the interval set, step through each node, and write the node to the persistence log.

We tie this method into the AOF process by adding a call to the method in rewriteAppendOnlyFile() at line 906 of src/aof.c.

else if (o->type == REDIS_ISET) {
    if (rewriteIntervalSetObject(&aof,&key,o) == 0) goto werr;
}

That should do it for disk persistence.

Tie into the Build System and Try it Out

With the code implemented, all we need to do is add t_iset.o to line 102 of src/Makefile.

With that done, we can compile and run the Redis server:

> make
> src/redis-server

Now we can run a basic test:

redis 127.0.0.1:6379> iadd test 0 10 test1
(integer) 1
redis 127.0.0.1:6379> iadd test 0 20 test2
(integer) 1
redis 127.0.0.1:6379> iadd test 10 30 test3
(integer) 1
redis 127.0.0.1:6379> istab test 5
1) "test1"
2) "test2"
redis 127.0.0.1:6379> istab test 15
1) "test3"
2) "test2"
redis 127.0.0.1:6379> istab test 25
1) "test3"

Hooray! We should now have successfully added interval sets to Redis!

While we had to go through a large number of steps to get us here, adding a datatype and supporting commands to Redis is not very difficult. The supporting infrastructure for Redis truly makes it a great system for storing data structures.

See More in the Hacking Redis Series