> > *Optimal* hash table sizes, or functions, only exist in theory... > > However, there are a few things to think about when choosing the size. > > > > As a general rule, it should be a prime number far from any power of 2. > > THis is what I really want to understand. I can write all the code just > fine, and I can use primes for hash keys, but what I want to know is: > why? I've seen one other person recommemd a prime far from 2^n, but I've > seen many more people recommend using a prime 2^n plus or minus 1, as in > 7, 17, 31, etc. > > I haven't seen any explanation for the difference in methods, and I can't > give it a decent test until I have a signifigant number of spells (I have > 0 at the moment). A good hash function is one that each key is equally likely to hash to any of the slots. When using a division method hash function, this means trying to find the set size as unrelated to the probability distribution of the set of items you will be hashing. Without knowing the set of items to be hashed, a prime number for the size of the hash table is a good choice, since the chances of the probability distribution having a similar "factor" is the lowest (there is only one factor in a prime number). A power of two is bad for some distributions, because it has the affect of only caring about the lowest order bits of the key. For example, if h(k) = k mod 16, only the 4 lowest order bits make a difference in where the key is hashed. All other bits are irrelevant. Simiarly, a power of 10 is a bad choice if the application deals with decimal digits. The higher order digits are ignored in such hash functions. A set size of 2^p - 1 is concidered a poor choice when the key is a character string interpreted as a radix 2^p, since two strings hash to the same bucket when the strings are the same except for two adjacent characters are transposed. The general rule of "a prime number not too close to powers of two" will usually work well when the distribution of the set of keys is unknown, because it avoids the above problems. If you do know something about the distribution, violating this rule may improve your hash functions. One reference i own which goes into some details about choosing a good hash function is "Introduction to Algorithms", but Cormen, Leiserson, and Rivest. If you are looking for furthur information on hashing, check for this or similar titles in your library. Eric thrytis@imaxx.net +-----------------------------------------------------------+ | Ensure that you have read the CircleMUD Mailing List FAQ: | | http://cspo.queensu.ca/~fletcher/Circle/list_faq.html | | Or send 'info circle' to majordomo@cspo.queensu.ca | +-----------------------------------------------------------+
This archive was generated by hypermail 2b30 : 12/18/00 PST