A Simple Yet Powerful Cache Tagging Strategy for Redis

When using Redis for caching data related to specific entities, a basic requirement is being able to invalidate all the data related to any entity we update, so that any subsequent request related to that entity will refresh its cache. This is usually called “tagging”, as the basic idea is to tag the data you’re caching with the identifiers of the entities related to that data, so that when one of your entities changes, you (somehow) invalidate all the data tagged with that entity’s identifier.


Active Data Invalidation

The usual advice for implementing this on Redis is based on an active data invalidation strategy, where we actively invalidate (read: delete) all of the cache entries for any tag at the point where we’re updating its data.

Writing to the Cache

  1. Insert a record with the data you’re caching, usually as a string, with some identifier of your data as the key and your serialized data as the value.
    SET mydata1 "Serialized data for mydata1"
    SET mydata2 "Serialized data for mydata2"
    SET mydata3 "Serialized data for mydata3"
  2. Take the data identifier key you used in (1) and add it to a set or list corresponding to each associated tag record.
    SADD tag:mytag1 mydata1 mydata2
    SADD tag:mytag2 mydata3
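As a rough sketch, the two write steps above could be wrapped in one helper. The function name `cache_with_tags` and the `tag:` prefix convention are illustrative assumptions, and `client` is assumed to be any object exposing redis-py-style `set` and `sadd` methods (for example a `redis.Redis` instance).

```python
from typing import Iterable


def cache_with_tags(client, key: str, serialized: str, tags: Iterable[str]) -> None:
    """Store one cache entry and register its key under each of its tags.

    `client` is assumed to behave like a redis-py `redis.Redis` object;
    this helper is a hypothetical wrapper, not part of any library.
    """
    # Step 1: SET mydata1 "Serialized data for mydata1"
    client.set(key, serialized)
    # Step 2: SADD tag:mytag1 mydata1 (one set per tag)
    for tag in tags:
        client.sadd(f"tag:{tag}", key)
```

Calling `cache_with_tags(client, "mydata1", payload, ["mytag1"])` would issue the same commands as the manual steps for that one entry.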

Reading from the Cache

  1. Just get the cached data by key, using its identifier, and deserialize it.
    GET mydata1
  2. If there’s no data in the cache: retrieve it from persistent storage, generate it, etc. and then write it to cache.
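The read path above is the familiar read-through pattern. A minimal sketch follows, assuming a redis-py-style `client` and a caller-supplied `load` function (both hypothetical names) that produces the serialized data on a miss.

```python
from typing import Callable, Iterable


def get_or_load(client, key: str, load: Callable[[], str],
                tags: Iterable[str] = ()) -> str:
    """Return the cached value for `key`, or load it and cache it on a miss.

    `client` is assumed to expose redis-py-style `get`, `set` and `sadd`;
    `load` is a hypothetical callback that hits persistent storage.
    """
    cached = client.get(key)          # GET mydata1
    if cached is not None:
        return cached                 # cache hit
    fresh = load()                    # cache miss: regenerate the data
    client.set(key, fresh)            # write it back to the cache
    for tag in tags:
        client.sadd(f"tag:{tag}", key)  # re-register the key under its tags
    return fresh
```

Re-registering the key under its tags on every miss matters because invalidation deletes the tag set along with the cached entries.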

Invalidating Tagged Data in the Cache

  1. Get the record corresponding to the tag you want to invalidate.
    SMEMBERS tag:mytag1
  2. For each data identifier key in that record, delete the record by key.
    DEL mydata1 mydata2
  3. Delete the record for the tag, effectively invalidating it.
    DEL tag:mytag1
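The three invalidation steps can be sketched as a single helper. This assumes a redis-py-style `client` whose `smembers` returns a set and whose `delete` accepts several keys; the function name is made up for illustration. Note that the read and the delete are still two separate round trips, which is where the window for stale reads comes from.

```python
def invalidate_tag(client, tag: str) -> int:
    """Delete every cache entry registered under `tag`, then the tag set itself.

    Returns the number of keys removed, as reported by DEL. `client` is
    assumed to behave like a redis-py `redis.Redis` object.
    """
    tag_key = f"tag:{tag}"
    keys = client.smembers(tag_key)   # SMEMBERS tag:mytag1
    # DEL mydata1 mydata2 tag:mytag1 (entries and tag record in one call)
    return client.delete(*keys, tag_key)
```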

Pros and Cons

Advantages

  1. Relatively simple to implement

Disadvantages

  1. Slow invalidation, as it first requires retrieving the list of keys associated with that tag, then deleting them as well as the tag record. This slows down any write involving updating a tag’s associated data.
  2. Greater risk of inconsistent data reads, as another client might read stale cache data while keys are being invalidated.
  3. If any of your tag records gets evicted or deleted by mistake, you lose the list of keys it pointed to, so there’s no easy way to invalidate the cached data for that tag.

Variants

There are slightly more elaborate variants of this basic idea for specific use cases. For example, when you need to easily look up the tags corresponding to each cached data record, you can maintain primary and secondary indices.


The Alternative: Passive Data Invalidation

This tagging strategy is based on keeping numeric “versions” of each tag. Invalidation becomes much easier and quicker, as the tag’s version number simply has to be increased, at the cost of making cache retrieval slightly more complex, since you need to check whether the version number stored with the data matches the tag’s current version.

For this we can use Redis hashes, which are a very handy data structure that works as a collection of field-value pairs.

Writing to the Cache

All the tag versions will be saved in a hash record with an easily remembered key, for example tags, @tags, etc. We’ll use @tags in the rest of this example.

  1. Read your @tags record to get the current versions of your tags. If it hasn’t been created yet it will just be empty.
    HGETALL @tags
  2. Insert a record for the data you’re caching as another hash, with some identifier of your data as the key. The serialized data you’re caching goes into a specific field in that hash; let’s call it @data. Then, for each tag associated with this data, add a field named after the tag, with the current version of that tag (as read in step 1) as the value. For tags not present in @tags, use 0 as the version.
    HSET mydata1 @data "Serialized data for mydata1" mytag1 1 mytag2 0
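A minimal sketch of this write path, assuming a redis-py-style `client` created with `decode_responses=True` so that hash fields and values come back as strings. The function and constant names are illustrative, not part of any library.

```python
from typing import Iterable

TAGS_KEY = "@tags"    # hash holding the current version of every tag
DATA_FIELD = "@data"  # field holding the serialized payload


def write_tagged(client, key: str, serialized: str, tags: Iterable[str]) -> None:
    """Store `serialized` under `key` together with the current tag versions.

    `client` is assumed to expose redis-py-style `hgetall` and
    `hset(name, mapping=...)`, returning str values.
    """
    current = client.hgetall(TAGS_KEY)   # step 1: {} if @tags does not exist yet
    fields = {DATA_FIELD: serialized}
    for tag in tags:
        # Tags that were never invalidated have no field yet: use version 0.
        fields[tag] = current.get(tag, "0")
    client.hset(key, mapping=fields)     # step 2: one hash per cached record
```

If the set of tags for a key can change between writes, old tag fields would linger in the hash; deleting the key before writing is one way to avoid that.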

Reading from the Cache

  1. Retrieve the current tag versions from @tags
    HGETALL @tags
  2. Get the cached data hash by key, using its identifier. If it’s not in cache: retrieve it from persistent storage, generate it, etc., and then write it to cache.
    HGETALL mydata1
  3. Compare the version of each tag in your cached data hash with its current version in @tags, treating a tag that is missing from @tags as version 0. If the versions differ for any of the tags, ignore the cached data and instead: retrieve it from persistent storage, generate it, etc., and then write it to cache.
  4. If all the tag versions in your cached data are current, then deserialize the @data field.
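The read steps above could look roughly like the following. It assumes a redis-py-style `client` with `decode_responses=True`, and it returns `None` on a miss or a stale entry so the caller can regenerate and rewrite the data. The function name is hypothetical.

```python
from typing import Optional


def read_tagged(client, key: str) -> Optional[str]:
    """Return the cached payload for `key` if all its tag versions are current.

    Returns None when the entry is missing or any tag version is out of date.
    `client` is assumed to expose a redis-py-style `hgetall` returning str values.
    """
    current = client.hgetall("@tags")    # step 1: current tag versions
    cached = client.hgetall(key)         # step 2: {} when the key is absent
    if "@data" not in cached:
        return None                      # cache miss
    for field, version in cached.items():
        if field == "@data":
            continue
        # Step 3: a tag missing from @tags counts as version 0.
        if current.get(field, "0") != version:
            return None                  # stale: some tag was invalidated
    return cached["@data"]               # step 4: caller deserializes this
```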

Invalidating Tagged Data in the Cache

  1. Increase by 1 the value of the field named after your tag inside your @tags record. Helpfully, Redis creates this field automatically if it doesn’t exist, so it will end up with the value 1 after the first increment.
    HINCRBY @tags mytag1 1
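When a single update touches several tags, the same command can simply be repeated per tag. A small sketch, assuming a redis-py-style `client` whose `hincrby` returns the new integer value; the helper name is made up.

```python
from typing import Dict, Iterable


def invalidate_tags(client, tags: Iterable[str]) -> Dict[str, int]:
    """Bump the version of each tag in @tags and return the new versions.

    `client` is assumed to expose redis-py-style `hincrby(name, key, amount)`.
    """
    # HINCRBY @tags <tag> 1 creates the field at 1 if it did not exist.
    return {tag: client.hincrby("@tags", tag, 1) for tag in tags}
```

Any cached hash that stored an older version for one of these tags will now fail the version comparison on its next read.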

Pros and Cons

Advantages

  1. Invalidating tags is super quick and easy, as you just increment a value for each tag you updated.
  2. Less risk of inconsistent data reads, as each tag is invalidated by a single atomic increment.
  3. If you’re evicting by least recently or least frequently used, your @tags record is unlikely to be evicted, as you will be accessing it all the time.

Disadvantages

  1. It’s a bit more complex to implement, especially the reading part.
  2. Reading data from cache also requires reading the @tags record. This can be mitigated by reading that record right after connecting to Redis and keeping it in memory for any subsequent cache retrievals during the same connection. Doing that also helps to keep the data consistent, as it’s less probable that you will mix stale and updated data in the same connection.
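The mitigation described in the last point can be sketched as a small reader object that loads @tags once and reuses that snapshot. The class name is hypothetical, and `client` is assumed to be a redis-py-style object created with `decode_responses=True`.

```python
from typing import Dict, Optional


class SnapshotTagReader:
    """Reads tagged cache entries against a one-time snapshot of @tags."""

    def __init__(self, client) -> None:
        self.client = client
        # Read @tags once, right after connecting, and keep it in memory.
        self.tag_versions: Dict[str, str] = client.hgetall("@tags")

    def get(self, key: str) -> Optional[str]:
        """Return the payload if its tag versions match the snapshot, else None."""
        cached = self.client.hgetall(key)   # only one round trip per read
        if "@data" not in cached:
            return None
        for field, version in cached.items():
            if field != "@data" and self.tag_versions.get(field, "0") != version:
                return None
        return cached["@data"]
```

The trade-off in this sketch is that invalidations made by other clients are not seen until a new snapshot is taken, so it suits short-lived connections such as a single web request.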

Conclusion

I have successfully used this approach for caching DB results while tagging them based on the tables the data was retrieved from. When writing to any table I’m also increasing its tag version, and all subsequent reads of the cached results based on statements such as SELECT or JOIN involving that table will automatically refresh that cached data. Systems that have a more robust entity tagging system can use the same idea for tagging individual entities inside the DB, like a particular user, blog post, etc.

If there’s interest, I might add an example of a PhpRedis implementation for setting, getting, and invalidating tagged data using this strategy. In the meantime, if you end up implementing this strategy, please let me know how you implemented and/or modified it for your particular use case, and what cool things you learned in the process!