Eccentric Developments


Back to basics - Binary search

When building programs, you manipulate data; sometimes, you want to sort it; and others, you want to find something.

Searching for something in an array is very straightforward. Look at each item and compare it with the one you are looking for until you find it. It is acceptable until you start doing it many times or have to look for an item in a large amount of data.

Naïve implementation

The easiest way to find something in an array is to look over all the items. When you do this, you might get lucky and find it at the first position you look at, or if you are not, you might not find it at all, in which case you end up looking at all the data.

numbers = {86,60,45,34,88,48,64,27,98,52,26,66,55,13,69,68,20,70,56,5,1,77,58,15,93,87,16,39,75,94,63,47,18,91,3,54,81,9,19,6,7,65,24,37,46,82,32,21,89,71,84,51,38,2,76,90,4,67,23,50,14,22,72,79,95,59,44,43,36,25,53,49,30,57,97,85,61,96,31,41,33,40,78,12,92,10,83,42,73,74,80,8,29,28,11,35,99,17,100,62}
count = #numbers

-- Bubble sort starts here
function find(number)
  for n = 1, count do
     if numbers[n] == number then
        return n  --found at position n
      end
  end
  return 0 --not found
end
-- Bubble sort ends here

number_1 = find(1)
number_101 = find(101)
return "1 is at position " .. number_1 .. " and 101 is at position " .. number_101

Doing it this way is not a problem when you have a small number of items in the array. But if you have a lot of them, the work becomes slow and inefficient.

Even finding a number in 100 elements just by looking is cumbersome.

What if we use a sorted array

But let's say we have all the information already sorted. Then, finding a number by looking is much easier for a person. Sadly computers cannot look at several items simultaneously as people do, but we can program the computer to use some hints while searching in this particular case.

Enter binary search; for this algorithm, we look at the middle of the data and determine how it compares to the item we want to find.

  • If the middle item is equal, we stop looking as no more work is needed.
  • If the item we are looking for is smaller than what is in the middle, we continue the search in the smaller values subarray (left subarray: from the first position to the mid).
  • And if the item we are looking for is bigger than the one in the middle, we continue searching in the bigger values subarray (right subarray: from middle position to the last).

What is happening is that every time you compare the item with the value at the midpoint in the subarray, you decide where to continue looking, effectively discarding half of the data.

Binary search is much faster than looking at all the data when looking for a single item, especially when it is not there. Look at the following implementation:

numbers = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100}
count = #numbers

-- Bubble sort starts here
function find(number, first, last)
  if first >= last then
    return 0 --not found
  end
  local middle = (first + last) // 2
  local middle_number = numbers[middle]
  if middle_number == number then
    return middle
  elseif number < middle_number then
    return find(number, first, middle - 1)
  else
    return find(number, middle + 1, last)
  end
end
-- Bubble sort ends here

number_1 = find(1, 1, count)
number_101 = find(101, 1, count)
return "1 is at position " .. number_1 .. " and 101 is at position " .. number_101

The find function keeps track of the subarray that is the probable place for the item and updates the first and last indexes that delimit it whenever it is needed. The result of using binary search is that instead of executing 252 instructions, only 80 got to run.

Ending notes

Binary search is an example of how using hints, plus applying common intuition, helps when writing algorithms. Every time the find function executes, it discards half of the data, quickly reducing the amount of work. Granted, you need to start with a sorted array of data, or this algorithm won't work.

This algorithm belongs to a broader classification named "Divide and Conquer" If you think about it, it is a very fitting name.

Enrique CR - 2022-05-02