My new BluRayFriend site (still under construction) requires some caching. The server is Windows 2003, so AppFabric isn’t an option.
One of the things I need to store is Ratings. For it, I created a little API that uses System.Runtime.Caching. I can create new single-index caches simply by inheriting from a base class and filling in some blanks.
I started to use that same approach for the user cache, but it didn’t last very long. I have to be able to retrieve users two ways: By Id and by FacebookId. Logging in uses the FacebookId, pretty much everything else uses my own system user id.
This provided me the opportunity to do something that I’ve wanted to do for a long time: Create a multi-index cache interface. The multi-index part means that you can retrieve objects via different criteria.
The trick is the key field. If I need to look up FacebookId=12345, how do I know what the UserId is to query the cache? You could use Linq to scan the dictionary for the right object, assuming it's there, but then you lose the advantages of a dictionary.
The solution I came up with is to have multiple dictionaries, one for the primary index, and one for each secondary index. The secondary dictionaries map one key to another. So, the secondary index for FacebookId maps “FacebookId aaaaa = Id 12345”. When you query a secondary index, it looks up the mapping, then goes to the primary index.
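The two-dictionary layout can be sketched like this (a sketch only; the variable names are mine, not the actual API):

```csharp
using System.Collections.Concurrent;

// Primary index: primary key -> cached object.
var primary = new ConcurrentDictionary<string, BluRayFriendUser>();

// Secondary index: FacebookId key -> primary (UserId) key.
var byFacebookId = new ConcurrentDictionary<string, string>();

// "FacebookId aaaaa = Id 12345"
byFacebookId["aaaaa"] = "12345";

// A secondary lookup is two dictionary hits instead of a Linq scan:
string primaryKey;
BluRayFriendUser user = null;
if (byFacebookId.TryGetValue("aaaaa", out primaryKey))
{
    primary.TryGetValue(primaryKey, out user);
}
```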
What if the object doesn’t exist in the secondary index? No problem. The index has a delegate that knows how to go get the object by FacebookId. The primary index and other secondary indexes all know how to extract their own key from the object. Once the index pulls back the object, it gives it to all of the other indexes. The primary index inserts or updates itself, and the other secondary indexes extract the key and update/insert themselves too.
If you hit the primary key directly (ie: for the user id, in this example), then it works pretty much the same way. If the object isn’t found, it fires the delegate that looks it up. It then passes the object to all of the secondary indexes, and they each figure out their own keys and update/insert themselves.
If you remove an object from the cache, it too is passed to all of the indexes so that they can update themselves.
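Roughly, the base-class flow described above looks like this (a hedged sketch, not the actual 120-line implementation; method names like TryFind, Store, Evict, and AllIndexes are my guesses, and it assumes T is a reference type):

```csharp
public T GetItem(string indexName, string key)
{
    Index<T> index = GetIndex(indexName);   // primary or a secondary
    T item = index.TryFind(key);            // dictionary hit?
    if (item == null)
    {
        item = index.Getter(key);           // the index's loader delegate
        if (item != null)
        {
            // Hand the freshly loaded object to every index; each one
            // extracts its own key via GetKey and inserts/updates itself.
            foreach (Index<T> idx in AllIndexes())
                idx.Store(item);
        }
    }
    return item;
}

public void Remove(T item)
{
    // Removal is broadcast the same way: each index computes its own
    // key from the object and evicts its entry.
    foreach (Index<T> idx in AllIndexes())
        idx.Evict(item);
}
```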
This top-of-the-line Excel snapshot shows the single primary index and two secondary indexes. See how the right columns of the secondary indexes correlate to the left column of the primary. That's the mapping. Also note that one of the indexes shown is by last name; that one is fictional. I don't have such an index, but thought another one was needed to help illustrate.
Why Multiple Indexes Instead of Multiple Caches?
In my example, you could imagine that there's a UserId cache and a FacebookId cache. The problem is the duplication and the synchronization. If I update a user, do I really want to have to know to flush or update two caches? And if so, how do I know the ID to flush in each cache? And why would I want two copies of the same object in memory anyway? And why do zebras have stripes?
In essence, a multi-index cache is multiple caches, but they're all wrapped up nice and neat.
The code for the API is less than 120 lines. I wrote it pretty quickly, and have spent more time talking about it since than actually doing anything with it. I'm sure it could use a little more work, but not much; it's pretty solid as is. The small line count can be credited to the ConcurrentDictionary in .NET 4, which has the delegation and thread safety built in. Prior to .NET 4, I would still have based it on a dictionary, but would have had to code the thread safety and delegation myself.
To make it accessible, I created two base classes.
For each cache, you implement MultiIndexCache<T>, and for each Index within the cache, you implement Index<T>. In my case, T is BluRayFriendUser.
Through a funny coincidence, the code for the cache and the index comes up to about 120 lines. The implementation of the user cache is also 120 lines. Neat.
Anyway, here's the full-blown implementation. The following code contains:
- UserCache, which is the cache set. The indexes are private classes within UserCache
- Private class: UserCacheIndex : Index<BluRayFriendUser>. This is a convenient base class for each of the two indexes. Both indexes require an IUserProvider.
- ByUserId : UserCacheIndex – the primary index
- ByFacebookId : UserCacheIndex – the secondary index
The UserCache class implements GetPrimaryIndex and GetSecondaryIndexes. Those methods simply provide all of the indexes to the base class.
UserCache has two of its own methods that make the caches accessible: GetUser(int userId) and GetUserByFacebookId(long facebookId). Each of those methods makes a call to the base method GetItem, passing it the name of the index to query and the key.
The indexes each have a name. The name is defined as a constructor parameter. To use an index, you have to refer to it by name.
The indexes implement two methods:
- GetKey – this method is responsible for looking at an object and creating a key for it. The UserId index returns UserId.ToString(), and the FacebookId index returns FacebookId.ToString()
- Getter(string key) – if the item isn’t found on the index, then the getter method is responsible for retrieving and returning the item
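Since the full listing is long, here's a condensed sketch of the two indexes and the two public methods (the constructor signatures and the IUserProvider method names are my guesses, not the actual code):

```csharp
private class ByUserId : UserCacheIndex
{
    public ByUserId(IUserProvider provider) : base("ByUserId", provider) { }

    // Extract this index's key from the object.
    protected override string GetKey(BluRayFriendUser user)
    {
        return user.UserId.ToString();
    }

    // Cache miss: go get the object by user id.
    protected override BluRayFriendUser Getter(string key)
    {
        return Provider.GetUser(int.Parse(key));
    }
}

private class ByFacebookId : UserCacheIndex
{
    public ByFacebookId(IUserProvider provider) : base("ByFacebookId", provider) { }

    protected override string GetKey(BluRayFriendUser user)
    {
        return user.FacebookId.ToString();
    }

    protected override BluRayFriendUser Getter(string key)
    {
        return Provider.GetUserByFacebookId(long.Parse(key));
    }
}

// The public methods just name the index and hand over the key:
public BluRayFriendUser GetUser(int userId)
{
    return GetItem("ByUserId", userId.ToString());
}

public BluRayFriendUser GetUserByFacebookId(long facebookId)
{
    return GetItem("ByFacebookId", facebookId.ToString());
}
```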
And that’s pretty much it for coding one of these things. A couple of subclasses, and a few overrides.
I didn't want to require that the keys be strings, but it ended up being the most practical choice. This implementation uses a ConcurrentDictionary, but if I want to swap out the store for System.Runtime.Caching (or even AppFabric), for example, then the keys would eventually have to become strings anyway.
It's using concurrent dictionaries, which handle the concurrency and delegation automatically.
If two threads request the same user via different indexes at the same time, then it will load the same object twice. One thread will get it by user id, and the other will get it by facebook id. All that means is that the object is going to get inserted then updated very quickly. The updated object will be identical to the object that was inserted, but that’s ok. All subsequent requests will get the one from the cache.
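With ConcurrentDictionary, that insert-then-update collapses into a single call (a sketch; primary, key, and user are my names from the earlier layout, not the actual API):

```csharp
// Whichever thread arrives second simply overwrites the entry with an
// identical object, so the race is benign.
primary.AddOrUpdate(key, user, (k, existing) => user);
```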
As for keeping the dictionaries in sync: it's self-maintaining, especially since there isn't an expiration policy in this version. But even if that weren't the case, it would still work just fine. Each index has a delegate that knows how to retrieve the object. So even if two dictionaries are out of sync for the nanosecond that you make a query, the delegate will fill in the blank.
Things to Watch Out For
The objects in the cache should be immutable. It is a hashtable, so if one of the properties that comprises a key changes, then the object might no longer be found. Furthermore, you usually don't want someone pulling an item from the cache and altering it; they may not understand that they're altering the object that everyone else is using.
The most obvious way to deal with this is to clone the object before returning it. The base class can do this automatically if the object implements ICloneable, or the cache developer can do it himself by transforming the result of this.GetItem().
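A minimal sketch of that cloning hook, assuming a GetItemCloned helper of my own naming (not part of the actual base class):

```csharp
protected T GetItemCloned(string indexName, string key)
{
    T item = GetItem(indexName, key);
    // If T implements ICloneable, return a copy so callers can't
    // mutate the shared cached instance.
    ICloneable cloneable = item as ICloneable;
    return cloneable != null ? (T)cloneable.Clone() : item;
}
```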
The only other thing that has to be considered is nulls, an issue which I have momentarily chosen to ignore. The values that comprise a primary index should always exist; otherwise it's just not a valid primary index. In my case, everyone has a user id. But what if a user doesn't have a Facebook id? Currently, the Getter for the Facebook index will return a null, and then the cache will try to store the null. Instead, it needs to recognize that the user is not retrievable by Facebook id and not try to update anything. To get fancy, it should also remember that it previously attempted to get a particular user by Facebook id, and not try a second time (unless a flush occurred).
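One way to get both behaviors is a sentinel object for "known missing" (a sketch; NotFound and this wrapper are my invention, not the actual code):

```csharp
// A shared marker cached in place of null, so the miss itself is
// remembered and the lookup isn't retried until the next flush.
private static readonly BluRayFriendUser NotFound = new BluRayFriendUser();

public BluRayFriendUser GetUserByFacebookId(long facebookId)
{
    // The index's Getter would return NotFound instead of null.
    BluRayFriendUser user = GetItem("ByFacebookId", facebookId.ToString());
    return ReferenceEquals(user, NotFound) ? null : user;
}
```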
System.Runtime.Caching already has a bunch of expiration policies and the API to support them. I wouldn't try reinventing that. Rather, I'd replace the concurrent dictionaries with the System.Runtime.Caching objects and take advantage of it there. My single-index cache uses System.Runtime.Caching and takes advantage of the expiration policies. It's a nice feature. Previously, that API was only available in ASP.NET; it's nice to have it out and about.
Live Cache Update
I've covered this in previous posts, but it's really neat so I'll repeat it. In the case of the rating cache, the average rating needs to be recalculated each time someone saves a rating for a movie. I want to do that asynchronously via messaging so that the database doesn't get assaulted, yadda yadda. And when the average recalc is complete, I want all web servers to be notified so that they can update their rating cache. This is done via Simple Bus: http://someguysoftware.wikidot.com/simple-bus
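A sketch of what those two subscriptions might look like (Bus.Subscribe, RatingSaved, and the message properties are my guesses at the Simple Bus API, not its actual signatures):

```csharp
protected void Application_Start()
{
    // 1) Recalculate the average when a rating is saved
    //    (temporary: this really belongs outside the web process)
    Bus.Subscribe<RatingSaved>(msg => CalculateAverage(msg.MovieId));

    // 2) Every web server updates its local rating cache when the
    //    AverageRatingChanged message arrives.
    Bus.Subscribe<AverageRatingChanged>(msg =>
        RatingCache.Update(msg.MovieId, msg.NewAverage));
}
```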
The preceding code is in the Global.asax. The first subscription is responsible for the recalculation. Really, that shouldn't be done in the web process; it's temporary until I have somewhere else to put it.
The CalculateAverage method publishes a message when it's done with the recalc. The message type is called AverageRatingChanged, and it too is caught in the Global.asax. The second subscription updates the rating cache. If there were 10 servers, all 10 would receive the same message and update their caches.
There are tons of caching solutions out there, a lot of them working in memory. In these days of AppFabric and alternatives, a distributed cache should be strongly considered. But, if you just need a stupid little in-memory cache, and you need multiple indexes, then this works easily and very well. By swapping out the store, I could use multiple indexes against any of the caching APIs including AppFabric.
And, more importantly, it was fun. I’ve been tasked with dealing with some caching issues in another capacity and have often wondered “what-if?” and “why-not…”, and this was a good opportunity to exercise those questions in 120 lines of code.