Hash Table

Hash TableA Hash Table (or Hash Map) maps unique keys to associated values. A hash table is usually implemented as an array-based data structure which uses a hash function to convert the key into the index of an array element, where the associated value can be found. The diagram (from Wikimedia Commons) shows the use of a hash table to store telephone numbers.

The hash function is crucial in hash table design. A good hash function should provide uniformly distributed hash values. Poor hash functions can cause collisions, leading to poor hash table performance. Collisions occur when the hash function returns the same value for different keys.

The ratio between the number of stored items and the table size is known as the load factor. Hash tables can be of a constant size or they can be dynamically resized when the load factor exceeds a specified threshold. This is done before the table becomes full to minimise the number of collisions and prevent performance degradation.

As we noted above, collisions occur when the hash function returns the same hash value for different keys. Collisions are virtually unavoidable, so they should always be taken into consideration when implementing a hash table. There are various strategies for collision resolution, the principal ones being Closed Addressing and Open Addressing.

In closed addressing (open hashing), each slot of the hash table contains a link to another data structure (eg: a linked list) which stores the key-value pairs that share the same hash. When a collision occurs, this data structure is searched for the key-value pair which matches the key.

In open addressing (closed hashing) each slot contains a key-value pair. When a collision occurs, the open addressing algorithm calculates another location to locate a free slot. The performance of hash tables based on an open addressing strategy can decrease dramatically when the table is tightly filled, ie: when the load factor is 0.7 or more.

You can view Java and C++ implementations of a simple hash table here.

Next: Memory Allocation