Thursday, October 04, 2007

HashTable of HashTables

Today I discovered that the effective limit of a HashTable with random hashes is around 65,000 items. This is because the hash-key is a 4-byte integer (32 bits). The way the stats work out, you should expect collisions from about 2^(32/2) = 65536 items. In many scenarios (mine included), that risk is too high!

It is not hard to come up with a unique enough random hash-key. The problem is that the HashTable will only allow that key to be an integer. So my workaround is to create *two* keys instead. This will increase the limit to 2^(64/2) = 4294967296 items. If I have that many items in memory, I will have other problems!

Once you have two keys, use the first to key into HashTable-1. Then make each element of HashTable-1 a new HashTable, and use the 2nd key in that one. (In a totally random scenario, each 2nd-level HashTable will only have a single element. So it would be best to initialize it with that in mind, so as not to use too much memory).

My particular scenario (an identity map) uses .NET types as the first key, and database IDs as the second. This is not the best-case for randomness, but it is sufficient for my purposes (actually, guaranteed unique).

0 comments: