A Simple Yet Powerful Cache Tagging Strategy for Redis

When using Redis for caching data related to specific entities, a basic requirement is being able to invalidate all the data related to any entity we update, so that any subsequent request related to that entity will refresh its cache. This is usually called “tagging”, as the basic idea is to tag the data you’re caching with the identifiers of the entities related to that data, so that when one of your entities changes, you (somehow) invalidate all the data tagged with that entity’s identifier.


Active Data Invalidation

The usual advice for implementing this on Redis is based on an active data invalidation strategy, where we actively invalidate (read: delete) all of the cache entries for any tag at the point where we’re updating its data.

Writing to the Cache

  1. Insert a record with the data you’re caching, usually as a string, with some identifier of your data as the key and your serialized data as the value.
    SET mydata1 "Serialized data for mydata1"
    SET mydata2 "Serialized data for mydata2"
    SET mydata3 "Serialized data for mydata3"
  2. Take the data identifier key you used in (1) and add it to the set (or list) record of each tag associated with that data.
    SADD tag:mytag1 mydata1 mydata2
    SADD tag:mytag2 mydata3
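The write steps above could be sketched in Python roughly as follows. This assumes a client exposing redis-py's method names (`set`, `sadd`); to keep the sketch runnable without a server, `InMemoryRedis` is a hypothetical stand-in that implements only those two methods. With a real server you would pass something like `redis.Redis(decode_responses=True)` instead. The `cache_write` helper name is also just for illustration.

```python
class InMemoryRedis:
    """Hypothetical stand-in for the few redis-py methods used below."""

    def __init__(self):
        self.data = {}

    def set(self, name, value):
        self.data[name] = str(value)
        return True

    def sadd(self, name, *values):
        members = self.data.setdefault(name, set())
        added = len(set(values) - members)
        members.update(values)
        return added  # like SADD: number of members actually added


def cache_write(r, key, serialized, tags):
    """Store the serialized data, then register its key in each tag's set."""
    r.set(key, serialized)              # step 1: SET key "<serialized data>"
    for tag in tags:
        r.sadd(f"tag:{tag}", key)       # step 2: SADD tag:<tag> key


r = InMemoryRedis()
cache_write(r, "mydata1", "Serialized data for mydata1", ["mytag1"])
cache_write(r, "mydata2", "Serialized data for mydata2", ["mytag1"])
cache_write(r, "mydata3", "Serialized data for mydata3", ["mytag2"])
print(sorted(r.data["tag:mytag1"]))  # ['mydata1', 'mydata2']
```

With redis-py you could also queue both commands on a `pipeline()`, which is transactional by default, so the data and its tag memberships are written together.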

Reading from the Cache

  1. Just get the cached data by key, using its identifier, and deserialize it.
    GET mydata1
  2. If there’s no data in the cache: retrieve it from persistent storage, generate it, etc. and then write it to cache.
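A minimal sketch of this read path (the usual cache-aside pattern), under the same assumptions: redis-py style method names, a hypothetical `InMemoryRedis` stand-in, JSON as the serialization format, and a hypothetical `load` callable standing in for your persistent storage.

```python
import json


class InMemoryRedis:
    """Hypothetical stand-in for the few redis-py methods used below."""

    def __init__(self):
        self.data = {}

    def get(self, name):
        return self.data.get(name)

    def set(self, name, value):
        self.data[name] = str(value)
        return True

    def sadd(self, name, *values):
        members = self.data.setdefault(name, set())
        added = len(set(values) - members)
        members.update(values)
        return added


def cache_read(r, key, tags, load):
    """Return the cached value for key, loading and caching it on a miss."""
    cached = r.get(key)                  # step 1: GET key
    if cached is not None:
        return json.loads(cached)        # hit: deserialize and return
    value = load()                       # step 2: miss, hypothetical loader
    r.set(key, json.dumps(value))        # write it back to the cache...
    for tag in tags:
        r.sadd(f"tag:{tag}", key)        # ...and register it under its tags
    return value


loads = []


def load_mydata1():
    loads.append("mydata1")
    return {"id": 1, "title": "Hello"}


r = InMemoryRedis()
first = cache_read(r, "mydata1", ["mytag1"], load_mydata1)   # miss
second = cache_read(r, "mydata1", ["mytag1"], load_mydata1)  # hit
print(first == second, len(loads))  # True 1
```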

Invalidating Tagged Data in the Cache

  1. Get the record corresponding to the tag you want to invalidate.
    SMEMBERS tag:mytag1
  2. For each data identifier key in that record, delete the cached record with that key.
    DEL mydata1 mydata2
  3. Delete the record for the tag, effectively invalidating it.
    DEL tag:mytag1
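The invalidation steps above might look like this in Python, again assuming redis-py method names (`smembers`, `delete`) and a hypothetical `InMemoryRedis` stand-in. As a small design choice, this sketch merges steps 2 and 3 into a single DEL call that removes the tagged keys and the tag set together.

```python
class InMemoryRedis:
    """Hypothetical stand-in for the few redis-py methods used below."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def smembers(self, name):
        return set(self.data.get(name, set()))

    def delete(self, *names):
        removed = 0
        for name in names:
            if name in self.data:
                del self.data[name]
                removed += 1
        return removed  # like DEL: number of keys actually removed


def invalidate_tag(r, tag):
    """Delete every key tagged with tag, then the tag set itself."""
    tag_key = f"tag:{tag}"
    keys = r.smembers(tag_key)          # step 1: SMEMBERS tag:<tag>
    # Steps 2 and 3 combined: one DEL for the tagged keys plus the tag set.
    return r.delete(*keys, tag_key)


r = InMemoryRedis({
    "mydata1": "Serialized data for mydata1",
    "mydata2": "Serialized data for mydata2",
    "mydata3": "Serialized data for mydata3",
    "tag:mytag1": {"mydata1", "mydata2"},
    "tag:mytag2": {"mydata3"},
})
print(invalidate_tag(r, "mytag1"))  # 3
print(sorted(r.data))               # ['mydata3', 'tag:mytag2']
```

Note that SMEMBERS and DEL are still two separate round trips, so another client can add a key to the tag in between; that key would then be dropped from the tag set without its data being deleted. Wrapping both commands in a Lua script (EVAL), which Redis runs atomically, is one way to close that gap.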

Pros and Cons

Advantages

  1. Relatively simple to implement

Disadvantages

  1. Slow invalidation, as it first requires retrieving the list of keys associated with that tag, then deleting them as well as the tag record. This slows down any write involving updating a tag’s associated data.
  2. Greater risk of inconsistent data reads, as another client might read stale cache data while keys are being invalidated.
  3. If any of your tag records gets evicted or deleted by mistake, there’s no easy way to invalidate the data records that were tagged with it.

Variants

There are slightly more elaborate variations of this basic idea for specific use cases. For example, when you need easy access to the tags of each cached data record, you can maintain primary and secondary indices.


The Alternative: Passive Data Invalidation

This tagging strategy is based on keeping a numeric “version” for each tag. Invalidation becomes much easier and quicker, as the tag’s version number simply has to be incremented. The cost is that cache retrieval becomes slightly more complex, since you need to check whether the tag versions stored with the data match the tags’ current versions.

For this we can use Redis hashes, which are a very handy data structure that works as a collection of field-value pairs.

Writing to the Cache

All the tag versions will be saved in a hash record with an easily remembered key, for example tags, @tags, etc. We’ll use @tags in the rest of this example.

  1. Read your @tags record to get the current versions of your tags. If it hasn’t been created yet it will just be empty.
    HGETALL @tags
  2. Insert a record for the data you’re caching as another hash, with some identifier of your data as the key.  The serialized data you’re caching would go into a specific field in that hash, for example let’s call it @data. Also, for each tag associated with this data we add a field named after the tag, with the current version of that tag, as read in step (1), as the value. For tags not present in @tags we use 0 as the version.
    HMSET mydata1 @data "Serialized data for mydata1" mytag1 1 mytag2 0
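The two write steps could be sketched in Python like this, assuming redis-py method names and a hypothetical `InMemoryRedis` stand-in. The sketch uses `hset(..., mapping=...)`, since on Redis 4.0 and later HSET accepts multiple field-value pairs and HMSET is deprecated. Note that Redis hands hash values back as strings, which the stand-in mimics.

```python
TAGS_KEY = "@tags"    # hash holding the current version of every tag
DATA_FIELD = "@data"  # field holding the serialized payload


class InMemoryRedis:
    """Hypothetical stand-in for the few redis-py methods used below."""

    def __init__(self):
        self.data = {}

    def hgetall(self, name):
        return dict(self.data.get(name, {}))

    def hset(self, name, mapping=None):
        fields = self.data.setdefault(name, {})
        added = sum(1 for field in mapping if field not in fields)
        fields.update({field: str(value) for field, value in mapping.items()})
        return added


def cache_write(r, key, serialized, tags):
    """Store the payload plus the version each tag had when it was read."""
    current = r.hgetall(TAGS_KEY)                  # step 1: HGETALL @tags
    fields = {DATA_FIELD: serialized}
    for tag in tags:
        fields[tag] = int(current.get(tag, 0))     # tags not in @tags are version 0
    r.hset(key, mapping=fields)                    # step 2: HSET key @data ... tag version ...


r = InMemoryRedis()
r.data[TAGS_KEY] = {"mytag1": "1"}  # as if HINCRBY @tags mytag1 1 had run once
cache_write(r, "mydata1", "Serialized data for mydata1", ["mytag1", "mytag2"])
print(r.hgetall("mydata1"))
```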

Reading from the Cache

  1. Retrieve the current tag versions from @tags
    HGETALL @tags
  2. Get the cached data hash by key, using its identifier. If it’s not in cache: retrieve it from persistent storage, generate it, etc., and then write it to cache.
    HGETALL mydata1
  3. Compare the version of each tag in your cached data hash with its current version in @tags, treating tags missing from @tags as version 0. If it differs for any of the tags, ignore the cached data and instead retrieve it from persistent storage, generate it, etc., and then write it to cache.
  4. If all the tag versions in your cached data are current, then deserialize the @data field.
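A sketch of the full read path in Python, under the same assumptions (redis-py method names, hypothetical `InMemoryRedis` stand-in, JSON serialization, hypothetical `load` callable). It treats every field other than @data as a tag name, and writes back the versions read in step 1 so that an invalidation landing while data is being loaded leaves the new entry marked as stale.

```python
import json

TAGS_KEY = "@tags"
DATA_FIELD = "@data"


class InMemoryRedis:
    """Hypothetical stand-in for the few redis-py methods used below."""

    def __init__(self):
        self.data = {}

    def hgetall(self, name):
        return dict(self.data.get(name, {}))

    def hset(self, name, mapping=None):
        fields = self.data.setdefault(name, {})
        added = sum(1 for field in mapping if field not in fields)
        fields.update({field: str(value) for field, value in mapping.items()})
        return added


def is_fresh(cached, current):
    """True if every tag version stored in cached equals its current version."""
    for field, version in cached.items():
        if field == DATA_FIELD:
            continue  # every other field is a tag name
        if int(version) != int(current.get(field, 0)):
            return False
    return True


def cache_read(r, key, tags, load):
    current = r.hgetall(TAGS_KEY)                        # step 1: HGETALL @tags
    cached = r.hgetall(key)                              # step 2: HGETALL key
    if DATA_FIELD in cached and is_fresh(cached, current):  # step 3
        return json.loads(cached[DATA_FIELD])            # step 4: deserialize @data
    value = load()                                       # miss or stale: reload
    fields = {DATA_FIELD: json.dumps(value)}
    fields.update({tag: int(current.get(tag, 0)) for tag in tags})
    r.hset(key, mapping=fields)
    return value


loads = []


def load_mydata1():
    loads.append(1)
    return {"load_count": len(loads)}


r = InMemoryRedis()
print(cache_read(r, "mydata1", ["mytag1"], load_mydata1))  # {'load_count': 1} (miss)
print(cache_read(r, "mydata1", ["mytag1"], load_mydata1))  # {'load_count': 1} (hit)
r.data[TAGS_KEY] = {"mytag1": "1"}  # as if HINCRBY @tags mytag1 1 had run
print(cache_read(r, "mydata1", ["mytag1"], load_mydata1))  # {'load_count': 2} (stale)
```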

Invalidating Tagged Data in the Cache

  1. Increment by 1 the field named after your tag inside your @tags record. Helpfully, HINCRBY creates the field (and the @tags hash itself) if it doesn’t exist yet, treating its previous value as 0, so it will end up with the value 1 after the increment.
    HINCRBY @tags mytag1 1
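In Python the whole invalidation step is a single call per tag. This sketch assumes redis-py's `hincrby(name, key, amount=1)` and uses a hypothetical `InMemoryRedis` stand-in that mimics HINCRBY creating missing fields at 0; `invalidate_tags` is an illustrative helper name.

```python
class InMemoryRedis:
    """Hypothetical stand-in for the redis-py method used below."""

    def __init__(self):
        self.data = {}

    def hincrby(self, name, key, amount=1):
        fields = self.data.setdefault(name, {})      # hash created if missing
        fields[key] = str(int(fields.get(key, 0)) + amount)  # field starts at 0
        return int(fields[key])


def invalidate_tags(r, *tags):
    """Bump each tag's version; entries cached with older versions become stale."""
    return {tag: r.hincrby("@tags", tag, 1) for tag in tags}


r = InMemoryRedis()
print(invalidate_tags(r, "mytag1"))            # {'mytag1': 1}
print(invalidate_tags(r, "mytag1", "mytag2"))  # {'mytag1': 2, 'mytag2': 1}
```

When one write touches several tags, the HINCRBY calls could also be sent in a single redis-py `pipeline()` so they are applied together.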

Pros and Cons

Advantages

  1. Invalidating tags is super quick and easy, as you just increment a value for each tag you updated.
  2. Less risk of inconsistent data reads, as tag invalidations are atomic.
  3. If your eviction policy is based on least recently or least frequently used keys (LRU or LFU), your @tags record is unlikely to be evicted, as you will be accessing it all the time.

Disadvantages

  1. It’s a bit more complex to implement, especially the reading part.
  2. Reading data from the cache also requires reading the @tags record. This can be mitigated by reading that record right after connecting to Redis and keeping it in memory for any subsequent cache retrievals during the same connection. Doing that also helps keep the data consistent, as it’s less likely that you will mix stale and updated data in the same connection.
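The mitigation described in the last point could be sketched as a small per-connection helper. `TagVersionSnapshot` is a hypothetical name, the client is assumed to follow redis-py's `hgetall`/`hincrby` signatures, and `InMemoryRedis` is a stand-in that also counts HGETALL round trips. One detail worth handling: when the same connection invalidates a tag, the local copy is bumped too, so that connection doesn’t keep trusting entries it has just invalidated.

```python
class InMemoryRedis:
    """Hypothetical stand-in that also counts HGETALL round trips."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self.hgetall_calls = 0

    def hgetall(self, name):
        self.hgetall_calls += 1
        return dict(self.data.get(name, {}))

    def hincrby(self, name, key, amount=1):
        fields = self.data.setdefault(name, {})
        fields[key] = str(int(fields.get(key, 0)) + amount)
        return int(fields[key])


class TagVersionSnapshot:
    """Reads @tags once per connection and reuses it for later cache reads."""

    def __init__(self, r, tags_key="@tags"):
        self._r = r
        self._tags_key = tags_key
        self._versions = None  # loaded lazily on first use

    def versions(self):
        if self._versions is None:
            self._versions = self._r.hgetall(self._tags_key)  # one HGETALL
        return self._versions

    def invalidate(self, tag):
        # Bump the tag in Redis and in the local copy.
        new_version = self._r.hincrby(self._tags_key, tag, 1)
        if self._versions is not None:
            self._versions[tag] = str(new_version)
        return new_version

    def refresh(self):
        self._versions = None  # the next versions() call re-reads @tags


r = InMemoryRedis({"@tags": {"mytag1": "3"}})
tags = TagVersionSnapshot(r)
tags.versions()
tags.versions()
print(r.hgetall_calls)            # 1
tags.invalidate("mytag1")
print(tags.versions()["mytag1"])  # 4
```

The trade-off is that invalidations made by other clients after the snapshot was taken aren’t seen until `refresh()` is called, which is why it fits short-lived connections best.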

Conclusion

I have successfully used this approach for caching DB results, tagging them with the names of the tables the data was retrieved from. Whenever I write to a table I also increment its tag version, so all subsequent reads of cached results from queries involving that table (whether a simple SELECT or one with JOINs) will automatically refresh that cached data. Systems with a more robust entity model can use the same idea to tag individual entities inside the DB, like a particular user, blog post, etc.

If there’s interest, I might add an example of a PhpRedis implementation for setting, getting, and invalidating tagged data using this strategy. In the meantime, if you end up implementing this strategy, please let me know how you implemented and/or modified it for your particular use case, and what cool things you learned in the process!

MySQL Function for Calculating the Working Days between 2 Dates

Lately I needed a MySQL stored function to calculate the working days (or business days) between 2 dates; however, all the solutions I found online were either not configurable in terms of which week days count as working days, or really hard to read and understand. So I rolled my own, and decided to post it here in case anyone else finds it useful.

The function declaration code, as well as a usage example, is hosted on GitHub at https://gist.github.com/kazeno/8bad9453d1e4d2aed33e6af14d1aa7a1.

The function accepts 2 dates, as well as a string that specifies which week days should count as working days. The week days are input as the integers corresponding to their WEEKDAY function representation, i.e.:

0 = Monday
1 = Tuesday
2 = Wednesday
3 = Thursday
4 = Friday
5 = Saturday
6 = Sunday

Thus if the working days are Monday to Friday, the workdays argument would be ‘01234’, and if, for a more abstract example, the working days are Tuesday to Thursday plus Saturday, the workdays argument would be ‘1235’.

The function itself determines the start and end dates from the first 2 arguments (so the earlier and later dates can be passed in either order), counts the whole weeks (Monday to Sunday) between the 2 dates, each of which contributes one day per working week day, and then loops through the remaining days that don’t belong to a whole week, counting each one whose week day is contained in the 3rd argument.
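For readers who want to check the logic outside MySQL, here is a rough Python sketch of the same approach. It is not a translation of the gist, and it makes two assumptions: both endpoint dates are counted (the MySQL function may handle endpoints differently), and whole weeks are taken as blocks of 7 consecutive days from the start date rather than calendar Monday to Sunday weeks, which yields the same count. Python’s `date.weekday()` uses the same numbering as MySQL’s WEEKDAY() (0 = Monday … 6 = Sunday).

```python
from datetime import date, timedelta


def working_days(d1, d2, workdays="01234"):
    """Count dates from min(d1, d2) to max(d1, d2), both inclusive (assumption),
    whose weekday() digit appears in workdays."""
    start, end = min(d1, d2), max(d1, d2)  # accept the dates in either order
    total = (end - start).days + 1
    whole_weeks, leftover = divmod(total, 7)
    # Any 7 consecutive days contain each week day exactly once.
    count = whole_weeks * len(set(workdays))
    # Check only the leftover days that do not complete a week.
    day = start + timedelta(days=whole_weeks * 7)
    for _ in range(leftover):
        if str(day.weekday()) in workdays:
            count += 1
        day += timedelta(days=1)
    return count


print(working_days(date(2024, 1, 1), date(2024, 1, 14)))          # 10
print(working_days(date(2024, 1, 14), date(2024, 1, 1), "1235"))  # 8
```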

Hope you found it useful!

Display Custom Equation Numbers in Wolfram Mathematica

If you would like to use the EquationNumbered style for typesetting, but with your own equation numbers or designations instead of the automatically generated ones, you can simply edit the cell expression via the menu option Cell > Show Expression. If you do this on a blank EquationNumbered cell, you should get code like the following:

Cell[BoxData[ FormBox["", TraditionalForm]], "EquationNumbered"]

There, you need to override the CellFrameLabels option of the parent Cell, with the following code:

{{None, Cell[ TextData[{"(YOUR_DESIGNATION_HERE)"}], "EquationNumbered"]}, {None, None}}

Substitute the YOUR_DESIGNATION_HERE text with the number or designation you want to assign to your equation. For example if you want to call it A1, your cell should end up like this:

Cell[BoxData[ FormBox["", TraditionalForm]], "EquationNumbered", CellFrameLabels->{{None, Cell[ TextData[{"(A1)"}], "EquationNumbered"]}, {None, None}}]

Now you can toggle Cell > Show Expression off again, and your cell is ready for you to type your equation in!

[Example image: Laplace's equation in polar coordinates]

Clear the Field of a Non-Editable AngularUI Typeahead

When working with non-editable AngularUI Typeaheads (i.e. ones that have the typeahead-editable attribute set to false), a logical thing to do would be to erase the typeahead input field’s view value if the user doesn’t select a valid typeahead option. However, there’s no built-in option to do just that, so here’s a workaround.

The main idea here is to use the ng-blur attribute to set a function that will clear the typeahead field if no valid option was selected. To do that, we first need to access the form controller containing the typeahead by name. If the typeahead is not inside a form element, we can declare one of its parent elements as a form controller using ng-form, and give names to both of them:

<div ng-form name="myForm">
    <input type="text" name="myField" ng-model="myField" ng-blur="clearUnselected()"
           typeahead="option for option in myFields" typeahead-editable="false" />
</div>

Now we can access the typeahead’s form properties from the AngularJS controller, and reset its $viewValue property if the user didn’t select a valid option. We wrap this in an AngularJS $timeout because there’s a delay between clicking on an option and the corresponding model being updated. Don’t forget to inject the $timeout service into your controller!

//inside your controller
$scope.clearUnselected = function () {
    $timeout(function () {
        if (!$scope.myField) { //the model was not set by the typeahead
            $scope.myForm.myField.$setViewValue('');
            $scope.myForm.myField.$render();
        }
    }, 250); //a 250 ms delay should be safe enough
};

PrestaShop Module Development by Fabien Serny – a book review

This book, written by one of the original members of the PrestaShop developer team, is an invaluable resource for anyone whose income depends at least in part on developing PrestaShop modules. The time savings alone, compared to browsing through the PrestaShop classes and native modules just to find an example of functionality similar to what you need, are definitely worth the price of admission.

The book outlines PrestaShop best practices, goes into detail on the different types of modules, and describes the usage of the helper form and list classes, the context object, overrides, and how to handle module updates. Part of this information can also be found in PrestaShop’s developer guide, but the book goes into much more detail and lays out clear, well-explained code samples. However, the most useful part of it all might be the Native Hooks Appendix, which lists all 145 native hooks (as of PrestaShop 1.6) along with their descriptions, parameters, and the files they are called from.

As for what’s missing from the book, I believe a section on functional testing of modules would have been really useful, since the PrestaShop developer community would surely benefit from incorporating automated testing into its development standards. Apart from this, the book is a real time saver if you’re a PrestaShop beginner, and even if you’ve got quite a few modules under your belt, it should at least help you improve the quality of your code.

Get data from a Jet DB (Microsoft Access) file in Windows using LibreOffice

If you need to access data from a Jet DB file, there’s a relatively simple way to do it in Windows without Microsoft Office, by using LibreOffice and creating an ODBC data source.

First, open the Windows\SysWOW64 folder in File Explorer and run odbcad32.exe as Administrator (on 64-bit Windows, this is the 32-bit version of the ODBC Data Source Administrator). Once it opens, stay on the User DSN tab and click the Add button. Select the Microsoft Access Driver and click Finish. In the new window that opens, write the name of your file (or anything you like) in the Data Source Name field, then click the Select button in the Database section and select your file. Click OK to close both windows.

Now that the data source is set up, open LibreOffice Base, select Connect to an existing database, and choose ODBC in the dropdown menu. Click Browse, choose the data source you just created, and click OK. Click Finish and save the configuration as a new database file. You should now have access to all the database tables in the file.

This method has the disadvantage of requiring you to create a new ODBC data source for each file you need to open, but it sure beats all other ways I’ve found so far that don’t require using a suitable version of Microsoft Access.