To illustrate the idea, suppose we are trying to solve the element distinctness problem on a restricted universe. In this problem we are given a list of integers and we want to determine if there are any repeated elements. The input is an array A[1..n] of integers, where each integer is between (say) 1 and M, where M is not so large compared to n — maybe M is about 10n or something like that.
The usual way to do this is to create an array B[1..M] such that B[j] = 1 if j is an element of A, and 0 otherwise. We start by initializing all the entries of B to 0. Then we loop through the elements of A. For each i, 1 ≤ i ≤ n, we first check B[A[i]]. If it equals 1, then the value A[i] already occurred in A. Otherwise, we set B[A[i]] := 1, to signify that we've seen A[i].
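As a concrete sketch, here is the straightforward method in Python (0-based indexing rather than the 1-based indexing used above; the function name is my own for illustration). Note the Θ(M) initialization of B that happens on every call:

```python
def find_duplicates_with_init(A, M):
    """Report duplicated values in A, whose elements lie in 0..M-1.

    B must be re-initialized to all zeros on every call --
    the Theta(M) cost that the trick below avoids.
    """
    B = [0] * M            # Theta(M) initialization each call
    dups = []
    for x in A:
        if B[x] == 1:      # x was seen before
            dups.append(x)
        else:
            B[x] = 1       # mark x as seen
    return dups
```

For example, `find_duplicates_with_init([3, 1, 4, 1, 5], 10)` reports the repeated value 1.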
If M = O(n), this gives a linear-time algorithm for element distinctness. It even lets us list the repeated elements, if there are any.
Now here's the deal. Suppose we are solving element distinctness many times in a program, as a subroutine. Then it is conceivable that initializing all the entries of B to 0 could actually be a significant time drain. Although we have to allocate the memory for B once at the start, could there be a way to avoid the time-consuming Θ(M) initialization of B on each call to the subroutine? The difficulty is that the entries of B could then be arbitrary junk that we have no control over.
The answer is yes, using the following trick: instead of having B[j] contain 1 or 0, we set things up so that B[j] contains the position p in A where the value j was found. The key point is that we never trust B[j] blindly: we always verify it by looking at A[p], so we can never go wrong, even if B[j] is junk. More precisely, we treat an entry B[j] = p as meaningful only when p is a valid earlier position and A[p] = j.
Now, for each i, we check whether the value d = A[i] has already been seen. To do this, we look at B[d]; say it equals c. If c is not between 1 and i−1, then c must be uninitialized junk (no earlier iteration could have stored such a value), so we can confidently ignore it and set B[d] := i.
If c is between 1 and i−1, then either it is a true pointer back to the place in A where d was found earlier, or it just happens to be junk that falls in the right range. In either case, we look at A[c]. If A[c] = d, then we have found an earlier occurrence of d, at position c. Otherwise A[c] is something else, and we set B[d] := i. This works whether B[d] is a true pointer or not!
Here's the code:
for i := 1 to n do
    d := A[i]
    c := B[d]
    if (c < 1) or (c ≥ i) then
        B[d] := i
    else if A[c] = d then
        print(d, " occurred at positions ", c, " and ", i)
    else
        B[d] := i;
And that's it! This code fragment prints out the duplicates. Of course, if you'd rather terminate as soon as a repeated element is found, you can do that instead. Or if you'd rather save the repeated elements in a linked list, you can do that, too. And the same trick can be used in other settings, such as large sparse matrices, where you want to avoid initialization costs.
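To make the reuse point concrete, here is one way the subroutine might be packaged in Python (0-based indexing; the class name is my own for illustration). Python has no genuinely uninitialized memory, so we simulate junk by filling B with arbitrary values once at allocation time and never clearing it between calls:

```python
import random

class DistinctnessChecker:
    """Reusable element-distinctness checker for values in 0..M-1.

    B is allocated once and never re-initialized between calls;
    here its contents start out as simulated junk.
    """
    def __init__(self, M):
        self.M = M
        # Simulated uninitialized memory: arbitrary junk values.
        self.B = [random.randrange(-M, M) for _ in range(M)]

    def find_duplicates(self, A):
        """Return (value, earlier_pos, later_pos) for each repeat found."""
        dups = []
        for i, d in enumerate(A):
            c = self.B[d]
            # Trust c only if it points at an earlier position
            # of A that actually holds d.
            if 0 <= c < i and A[c] == d:
                dups.append((d, c, i))
            else:
                self.B[d] = i    # record where we saw d
        return dups
```

The verification step `A[c] == d` is what makes stale entries from earlier calls harmless: a leftover pointer either happens to point at a genuine earlier occurrence of d in the current array (a correct answer) or fails the check and is overwritten.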