Monday, March 07, 2011

 

Binary Search: A Cautionary Tale!

Practically every developer knows what binary search is: a simple (indeed fundamental) searching algorithm which is an example of a natural divide-and-conquer strategy. Basically: starting with an already sorted array, compare the middle element of the array with the value we want to find. If the values are the same we are done, else if the array element is larger, repeat in the remaining lower half of the array, else if the array element is smaller, repeat in the remaining upper half of the array.

Unsurprisingly, binary search is an often used interview question (although perhaps less so, due to increasing complexity and the proliferation of programming frameworks; I must admit I’ve never personally been asked about it in an interview). It is a basic technique that you can reasonably expect every reasonable candidate to know and can be implemented in just a few lines of code.

Despite binary search’s simplicity, it is easy to implement it incorrectly!

Jon Bentley, in his book Programming Pearls (a programming classic), wrote that in a course he ran for professional programmers, he asked the participants to code binary search and found that 90% failed to implement it correctly. If you asked 100 programmers to write an implementation of binary search a large number of them would be incorrect and many of those those that apparently worked would actually contain subtle flaws. Indeed, many published implementations of binary search are wrong:

“a study reported that it is correctly implemented in only five out of twenty textbooks.”

Indeed, Jon Bentley’s own implementation of binary search, published in the ‘Writing Correct Programs’ chapter of Programming Pearls, contained a subtle bug that remained undetected for over twenty years. Here is an iterative C# implementation, adapted from Bentley’s pseudocode, that contains the bug (can you spot it?):

public static int BinarySearch(int[] sortedArray, int valueToFind)

{

int lower = 0;

int upper = sortedArray.Length - 1;

int m;

while (lower <= upper)

{

m = (lower + upper) / 2;

if (sortedArray[m] < valueToFind)

{

lower = m + 1;

}

else if (sortedArray[m] == valueToFind)

{

return m;

}

else

{

upper = m - 1;

}

}

return -1;

}

.

.

…Scroll down…

.

.

The bug is in the mid-point assignment

m = (lower + upper)/2 


which performed as a direct sum could lead to an integer overflow. It should be replaced by the identically equivalent safe expression

m = lower + (upper – lower)/2

This bug was highlighted in Chapter 1 of Algorithms for Interviews, IMO, a slightly misnamed book which could perhaps be better described as ‘Algorithms to make programmers think’, where the emphasis is on describing problems and getting the reader to attempt to solve them.


[Side Note: A quick check with Reflector shows that this is implemented correctly without the possible overflow flaw in .NET 4, in fact the developers have gone put of their way to make sure someone doesn’t introduce this bug by encapsulating the mid-point division in its own Method GetMedian()]



    

Powered by Blogger