Search Quest VII: The Search for Search (A Python Story)

Wanting a programming problem to work on but reluctant to tackle a big project, I returned to an old college textbook to look for something challenging but within my pay-grade, so to speak:

Implement sequential search and binary search algorithms on your computer. Run timings for each algorithm on arrays of size n = 10i for i ranging from 1 to as large a value as your computer’s memory and compiler will allow. For both algorithms, store the values 0 through n-1 in order in the array, and use a variety of random search values in the range 0 to n -1 on each size n. Graph the resulting times. When is sequential search faster than binary search for a sorted array? 

In short, for lists of data values in numerical order of varying size, compare on a graph the speeds of two search algorithms, sequential search and binary search.

For the benefit of readers who are not familiar with the said search methods, I’ll use an analogy I once heard during my time as a student at teachers’ college. Imagine you have to find the contact information of someone called “Daryl Daryl” in a telephone directory. First, you search for the name sequentially, that is, starting at the very first name in the phonebook you scan each name one by one until you land on “Daryl Daryl.” This, of course, is not how we would look up someone in real life, but let’s suspend our disbelief for explanation’s sake.

You then try another way to find “Daryl Daryl.” By eye, you open the phonebook at the middle, landing on a page, let’s say, with surnames starting with J. Since D comes before J, you safely ignore the last half of the book. You then repeat the process: you open the book at the halfway point between A and J, which lands you on E. Again, the last half isn’t of interest, so you go next to the middle of A and E and land on names starting with C. This time you ignore the first half. Going to the halfway mark between C and E, you finally land on D. Given that all the D names seem to fit on the two open pages before you, you zero-in on “Daryl Daryl” using the subsequent letters in his name: between “Daber” and “Dazlu” is “Danzig”; between “Danzig” and “Dazlu” is “Dapple”; between “Dapple” and “Dazlu” is “Daryl”! Success! you cry, and then blush slightly for doing so.

The search described immediately above is roughly what happens during a binary search. However, readers may be asking at this point, “So what?” As humans, some kind of “optimized” binary search (e.g. using an educated guess to find the “D” section) would be preferable to sequential search because it is slow-going for us to look at each name in order. Not necessarily for today’s light-speed computers though. Wouldn’t a modern-day machine find a value just as fast using sequential search as it would with binary search or any other method for that matter?

output

Well, it actually turns out that not all search algorithms are made equal. One thing that matters is how much data you’re searching. For relatively small amounts of data, the speed differences between a sequential and binary search may be insignificant, but for bigger amounts, binary search typically wins. Without getting into the details of algorithm analysis theory, it can be said that if you double the size of your data you’ll, on average, double the time it takes sequential search to do its job. With binary search, you would need to square the amount of data before doubling the search time. Suppose it takes 0.001 seconds to do a sequential and binary search on 1000 values. In 0.002 seconds, you could theoretically search 2000 values with sequential search, but 1000000 values with binary search!

This may make binary search look relatively more powerful than sequential search, but remember that binary search only works with data that is in order. Imagine trying to use it if somebody gave you a phonebook’s pages as a jumbled pile of torn sheets or the task of finding a Queen of Hearts in a shuffled deck of cards. Sequential search, while slower, would at least work in both cases. And even when the data is in order, sequential search may still come out on top if the target value is near the beginning of the data set (e.g. you’re not looking up “Daryl Daryl” but “Andrea Andrea”; the first card in the deck, lo and behold, is the Queen of Hearts). Data distribution matters.

So, without any more ado, here’s my shortish Python program that compares sequential and binary search. Because I had to start and end somewhere, binseqcomp.py (click expand source below) only tests lists/arrays of 100, 1000, 10000, and 1000000 values. For each size, it runs three trials, in each trial seeking a different randomly-generated number. I used the timeit module to return the average time of ten searches so as to avoid getting runtimes that were either too short (times so small they’re hard to interpret) or too long (you’re waiting forever for the program to finish). The runtimes for each search algorithm and data sizes appear in the Terminal window.

plot

After all the searches have been run,  the program then graphs the times as a scatter plot. I was happy to implement this feature. In addition to avoiding having to copy and paste the results into some third-party graphing software, I finally had my excuse to explore SciPy a bit. It turned out that part of the SciPy family of libraries was matplotlib, a Python module that can generate 2D graphs. One of the cool things about matplotlib graphs is the ability to zoom-in on data points using the Magnifying Glass icon. I found this particularly useful in analyzing the binary search times, which practically appear level with each other on the graph despite different data sizes. When I zoomed in, however, I could see that the times did increase, as expected.

So when is sequential search faster than binary search for sorted data? It seems that for data sets less than a 1000 (102) values in size, sequential search worked almost as fast as binary search. For larger data sets though, forget about it.  In one trial I ran for a data set of 100000 values (106) looking for the number “854144”, it took sequential search approximately 2 seconds to finish whereas binary search took only 0.0002 seconds, thus making it faster by a factor of 10000! As the number of values to be searched increases, binary search is where you’ll want to bet your money.

#!/usr/bin/env python

#*******************************************************************************
#Mathwoods - Sequential Search vs. Binary Search
#A short program that compares and graphs the speed of two search algorithms
#searching ordered array of increasing size.
#Created by Mathwoods on February 13, 2017
#Tested on Python 2.7.10
#Last modified on May 2, 2017
#*******************************************************************************
from __future__ import print_function #make print work as Python 3.x function
import random
import timeit
import matplotlib.pyplot as plt

#sequential search
def seq_search(arr=None, x=None):
	i = 0
	while(i < len(arr)):
		if arr[i] == x:
			return i
		i = i + 1
	return -1

#binary search
def bin_search(arr=None, x=None):
	l, r = 0, len(arr)-1
	while l <= r: 		m = int((l+r)/2) 		if x > arr[m]:
			l = m+1
		if x < arr[m]: 			r = m-1 		if x == arr[m]: 			return m 	return -1 #wrapper function to time the search algorithms def wrapper(func, *args, **kwargs): 	def wrapped(): 		return func(*args, **kwargs) 	return wrapped #main #ANSI escape sequences used to format terminal text #\x1b[... => ASCII hex for ESC, bracket added at end
#\x1b[a;b;..;..m; Selected Graphics Rendition Parameter (SGR)
#a=1 => bold or increased intensity
#b=7 => reverse video (switch background/foreground colors)
#0m  => reset all text atrributes to terminal default
print("\n" + '\x1B[1;7m'+ "SEQUENTIAL SEARCH VS BINARY SEARCH" + '\x1B[0m')

i, trial = 2, 1						#initial exponent and trial number
list_results = []					#list of maps
timeit_loops = 10					#run each search t times for each trial

#run search trials
while i <= 6:

	n = 10**i 						#10 to power of i
	tup_num = tuple(range(0, n))	#populate tuple with integers 0 to n-1

	print("\n" + '\x1B[4m'+ "ARRAY SIZE = %s (10^i where i = %s)"%(n,i) \
	 + '\x1B[0m')

	while trial <= 3:

		r = random.randint(0, n-1)
		print("TRIAL %s" % trial)
		print("Random generated value: %s" % r)

		print("Running SEQ SEARCH...", end="")

		#Calculate time
		wrapped = wrapper(seq_search, tup_num, r)
		seq_time = timeit.timeit(wrapped, number=timeit_loops)

		print("DONE!")

		print("Total time for %s searches: " \
		      "%s seconds." % (timeit_loops, seq_time))

		print("Running BIN SEARCH...", end="")

		wrapped = wrapper(bin_search, tup_num, r)
		bin_time = timeit.timeit(wrapped, number=timeit_loops)

		print("DONE!")

		print("Total time for %s searches: %s seconds."
				% (timeit_loops, bin_time))

		list_results.append({ 'power': i, 'trial': trial, 'tup_size': n,
				'rand': r, 'seq_time': seq_time, 'bin_time': bin_time,
				'timeit_loops': timeit_loops})

		trial = trial + 1

	i = i + 1
	trial = 1

print("\n" + '\x1B[1;7m'+ "To end program close graph window." + '\x1B[0m'\
 	+ "\n")

#make graph
lst_xcoordseq = []
lst_ycoordseq = []
lst_xcoordbin = []
lst_ycoordbin = []

#populate coordinate lists with data from list_results
for entry in list_results:
	lst_xcoordseq.append(entry['power'])
	lst_xcoordbin.append(entry['power'])
	lst_ycoordseq.append(entry['seq_time'])
	lst_ycoordbin.append(entry['bin_time'])

#add points to plot
#red diamonds for seq search; blue dots for bin search
plt.plot(lst_xcoordseq, lst_ycoordseq, 'rD', label='Seq Search')
plt.plot(lst_xcoordbin, lst_ycoordbin, 'bo', label='Bin Search')

#Set lower/upper bounds of axes
ymax = max(lst_ycoordseq)
plt.axis([0,7,-0.1,ymax+0.1])

plt.xlabel('Size of array (powers of 10)')
plt.ylabel('Search time (sec)')

#Set figure title
fig = plt.gcf()
fig.canvas.set_window_title('Math Woods - Sequential vs. Binary Search')

#Put grid lines at every tick on x,y axes
plt.grid(True)

#place legend
#numpoints-1 so that only one dot appears in legend
plt.legend(loc='upper left', numpoints=1)

plt.show()

One thought on “Search Quest VII: The Search for Search (A Python Story)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s