A caching strategy I have used to great effect is to load tables of static or at least seldom-changed data into memory rather than look things up in the database on-demand. On modern hardware, preloading even tens of thousands of rows in memory can be a great performance tradeoff. Obviously, though, you want to minimize the memory footprint.
Any time cached data has a lot of repeating string values, a simple strategy that can save a LOT of memory is to only load each unique string value once. This is easily accomplished via string interning.
String interning is a means of keeping just one instance of each unique string value in memory. An easily understood example is names and addresses. The country and region/state/province codes, and to a slightly lesser extent postal codes and city names, will have few unique values relative to any non-trivial number of records you might load into memory. For example, if you are only loading US data, there will be just over 50 unique state/territory codes (50 states plus Puerto Rico, the US Virgin Islands, Guam, etc.). If you have 20,000 names and addresses in memory, interning these repetitive values saves a LOT of memory.
And doing it is drop-dead simple.
// Assuming all rows have a 2-character state code, and the following code is
// called for each row, this stores a separate string instance per row: at least
// 4 bytes of character data each (.NET strings are UTF-16, 2 bytes per char),
// plus per-string object overhead.
rules.Region = dbRegionValue;
// .... but this stores one instance per unique value, so whenever the Region
// property of any instance of the Rules class holds a particular value, say,
// "NY", it points to the SAME string in memory as every other instance with
// the value "NY".
rules.Region = string.Intern(dbRegionValue); // 99% memory savings!
The trade-off? It will load a tad slower, because each value is tracked in the intern pool, which is basically a hash table, and there's a lookup to see whether the string already exists. The runtime takes care of making sure each value is a singleton: if the string is already in the pool, the existing instance is returned.
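You can see the singleton behavior for yourself with a quick sketch (the variable names here are my own, purely illustrative):

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        // Build two equal-but-distinct string instances at runtime.
        // Concatenation involving a variable is NOT interned automatically,
        // so ny1 and ny2 would otherwise be separate objects.
        string suffix = "Y";
        string ny1 = string.Intern("N" + suffix);
        string ny2 = string.Intern("N" + suffix);

        // Both calls returned the single pooled instance.
        Console.WriteLine(object.ReferenceEquals(ny1, ny2)); // True
    }
}
```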
In practice, I haven't noticed a subjective difference in speed with or without interning, even though the native interning implementation isn't particularly fast unless you're running as a native AOT app. However, if this turns out to be an issue, you can accomplish the same thing as native interning with a [Concurrent]Dictionary<string,string> implementation … here's a URL discussing the relevant benchmarks:
However, I think this would be premature optimization for my use case. I note a hesitation on startup of 2 or 3 seconds … I could cut that down to a fraction of a second, but these are system processes and console apps; it’s not worth the extra effort.
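For the curious, a minimal sketch of that dictionary-based approach (the `StringPool` class is hypothetical, my own name for it, not a framework type):

```csharp
using System;
using System.Collections.Concurrent;

// An app-level intern pool: same idea as string.Intern, but scoped to this
// dictionary, so pooled strings become collectable when the pool is dropped
// (unlike the runtime's pool, which lives for the life of the process).
static class StringPool
{
    private static readonly ConcurrentDictionary<string, string> _pool = new();

    // Returns the pooled instance for this value, adding it on first sight.
    public static string Intern(string value) =>
        value is null ? null : _pool.GetOrAdd(value, value);
}

class Demo
{
    static void Main()
    {
        string suffix = "Y";
        string a = StringPool.Intern("N" + suffix);
        string b = StringPool.Intern("N" + suffix);
        Console.WriteLine(object.ReferenceEquals(a, b)); // True
    }
}
```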
Note that the .NET runtime normally interns the string constants in your program, as a way to minimize its memory footprint. This is a particularly effective use of interning because it's all upside: the compiler stores each unique literal just once in the assembly's metadata, the runtime interns those literals as the assembly is loaded, and your code pays no per-use lookup cost.
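You can observe this literal interning directly; no explicit call to string.Intern is needed:

```csharp
using System;

class LiteralDemo
{
    static void Main()
    {
        string a = "NY";
        string b = "NY"; // the same literal appearing elsewhere in the code

        // Both references point at the single interned instance the runtime
        // created for the "NY" literal when the assembly was loaded.
        Console.WriteLine(object.ReferenceEquals(a, b)); // True
    }
}
```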
You would not intern dynamically built strings, such as those produced by concatenation or a StringBuilder, unless the resulting values repeat. Interning a value that occurs only once buys nothing: you pay for the hashtable lookup without any memory savings.