Hashing: The Greatest Idea in Programming

What is Hashing?

We want a scheme for storing, adding, deleting, and searching for elements of a certain data type, and we want these operations to be fast. For example, we would like searching to take (or at least approach) constant time, O(1), instead of O(n) or O(log n). Previously, we discussed storing zip codes in an unsorted array, a sorted array, and a binary search tree, and we determined that these data structures did not meet our efficiency needs. Likewise, stacks and queues are good at accessing the first or last element but are not well suited to searching for an arbitrary element. Let us start our discussion of hashing by looking at an introductory video on this most important concept:
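The core idea can be sketched in a few lines. The following is a minimal illustration (my own names and table size, not code from the source): a hash function maps each zip code directly to an array slot, so insert and search each touch a single slot, giving O(1) behavior on average. Collisions are ignored here; they are the subject of the sections below.

```python
# Minimal hashing sketch: map 5-digit zip codes to slots in a small table.
# TABLE_SIZE and the function names are illustrative assumptions.

TABLE_SIZE = 11  # a small prime, a common choice for hash table sizes

def hash_zip(zip_code: int) -> int:
    """Map a zip code to a slot index in constant time."""
    return zip_code % TABLE_SIZE

table = [None] * TABLE_SIZE

def insert(zip_code: int) -> None:
    # Note: this naive version simply overwrites on a collision.
    table[hash_zip(zip_code)] = zip_code

def search(zip_code: int) -> bool:
    # One array access, regardless of how many items are stored.
    return table[hash_zip(zip_code)] == zip_code

insert(90210)
print(search(90210))  # True
print(search(10001))  # False: never inserted
```

Contrast this with an unsorted array (O(n) search) or a binary search tree (O(log n) search): here, the cost of a lookup does not grow with the number of stored elements.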

Problems with Linear Probing: Deleting Elements Becomes Complicated

The video above described how linear probing can cause clustering. But there is a bigger problem (with both linear and quadratic probing): what happens when elements are deleted from the hash table.
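To make the clustering problem concrete, here is a hedged sketch of linear probing (the names and table size are my own): on a collision, we step forward one slot at a time until an empty slot is found, so keys that hash to the same slot pile up in a contiguous run.

```python
# Linear probing sketch: colliding keys occupy consecutive slots,
# forming a cluster. SIZE and function names are illustrative.

SIZE = 7
slots = [None] * SIZE

def probe_insert(key: int) -> int:
    """Insert key with linear probing; return the slot finally used."""
    i = key % SIZE
    while slots[i] is not None:
        i = (i + 1) % SIZE  # linear probe: try the next slot
    slots[i] = key
    return i

# Keys 7, 14, and 21 all hash to slot 0, so they cluster in slots 0, 1, 2:
print([probe_insert(k) for k in (7, 14, 21)])  # [0, 1, 2]
```

Notice that searching for 21 now requires walking through the whole cluster, and, as discussed next, deleting 7 outright would make 14 and 21 unreachable by the same probe sequence.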

The following is taken from Carnegie Mellon's website:


How do you delete an item from a hash table?

First you perform a lookup to find the item. If you are resolving collisions with chaining, the item can then simply be removed from its linked list. But what if you used linear or quadratic probing to resolve collisions? Removing items then causes serious problems: if you remove an item from a hash table the way you would from a binary tree, you can no longer find later items that collided with it, and you end up with a mess.

So you don't literally delete items. You just mark that position in the array. These marked items are sometimes called tombstones. Future inserts can overwrite these markers, but lookups treat them as collisions. Without these markers, we might insert two elements with the same hash value, then remove the first one and leave the second item unreachable.

Problems with Chaining: Loss of Search Efficiency