dallaylaen ([personal profile] dallaylaen) wrote2012-11-06 10:27 pm
Entry tags:

Redis: locking vs transactions

Suppose we need to load data from Redis, change it, and save it back. Looks easy. Here's the pseudocode.

data = redis.get(id);
do_stuff(data);
redis.set(id, data);


Now, we notice that several threads (processes, execution contexts, whatever) may access the same data, and overwrite it! So, let's use transactions.

redis.watch(id);
data = redis.get(id);
do_stuff(data);
redis.multi();
redis.set(id, data);
redis.exec();
redis.unwatch(id);


Oops, we've just lost some data, let's repeat the transaction:

redis.watch(id);
while 1:
    do_stuff(data);
    redis.multi();
    redis.set(id, data);
    if (redis.exec()):
        break;
redis.unwatch(id);


So it's 6+4*n operations where n is the number of collisions we had to survive.

On to locking. There's a great post with actual python code here.

However, I'll lock via incr instead of setnx, as incr is atomic enough (because atomic is atomic enough) and lets us guess how many contenders there are for the lock. A returned value of 1 (and only 1) means the lock is ours.

while 1:
    n = redis.incr(lock(id));
    if (n != 1):
         sleep (delay * n);
         continue;
    data = redis.get(id);
    do_stuff(data);
    redis.set(id, data);
    redis.del(lock(id));
    break;


Oops. The application crashed midway and left id locked forever. Let's fix it.

while 1:
    n = redis.incr(lock(id));
    if (n != 1):
         sleep (delay * n);
         continue;
    redis.expire(lock(id), lock_ttl);
    data = redis.get(id);
    do_stuff(data);
    redis.set(id, data);
    redis.del(lock(id));
    break;


How much work is it? Looks like 5 + n operations. A little better than with transactions, and we don't have to redo stuff (which may or may not be expensive).

What drawbacks I see here:
  • locks are arbitrary external something;

  • do_stuff() must finish before lock expires;


Still, locking looks preferable to me.

P.S. What about using real IPC instead? It's limited to the host, but then we can spread the data based on id between hosts (i.e. use some kind of sharding).