It is often necessary in computer programming to put a collection of items in an order. To do this, a computer must rearrange values in the order specified by the programmer. Although this seems easy enough, sorting arrays can become time-consuming if your array happens to be large. There are many algorithms created to sort arrays both quickly and efficiently, and the algorithm covered today is the merge sort algorithm.

16348257

Suppose we have the above array, and we would like to sort it in ascending numerical order. The end result of the sorted array would look like this:

12345678

When I sorted this array in my head, I first found the smallest number, put it at the beginning of the array, and found the next smallest number, and put that after the beginning. I repeated this process until I had found all the elements and they were all in order.

This method of looking over the whole array works fairly quickly for short arrays, but it can start taking time for large arrays because of all the comparisons we have to make.

Merge Sort

Merge sort is an efficient, comparison-based sorting algorithm. It functions on the concept of divide and conquer. Divide and Conquer is an algorithmic paradigm where recursion is used to solve a problem divided into smaller problems. There are three steps to solving a problem using Divide and Conquer:

  1. Divide the problem into subproblems of the same type
  2. Recursively solve these sub problems, breaking down the problem until the problem becomes simple enough to solve directly
  3. Combine the solutions to the sub-problems to generate a solution for the original problem

Does Divide and Conquer make sense to you yet? If you don’t grasp the concept fully yet, don’t worry. You’ll understand once we start diving into merge sort.

Implementing Merge Sort

Using the Divide and Conquer method, there are now two steps to implement merge sort:

  1. Divide input array into sub-arrays, each containing one element
  2. Repeatedly merge these sub-lists to provide new sorted sub-lists until there is only one sub-list remaining

Dividing the Arrays

Let’s sort this array of 6 elements to understand merge sort:

735981

First, divide the array into individual elements:

7
3
5
9
8
1

Merging the Divided Arrays

Then, merge these elements into arrays of two sorted elements:

37
59
18

To merge these elements, we only had to ask one question for each merged array: which of the two numbers is the smallest? By answering this question, we now have three arrays that contain two sorted elements each.

Let’s merge these arrays again. Notice how there are an odd number of objects to merge (three arrays). In this case, we will choose to have an extra object on one side

3579
18

To end up with these two sorted arrays, we asked one question every time we iterated over an element in the array. For example, to end up with the sorted array (3, 5, 7, 8), we first started at the beginning element of each array and asked ourselves which element is smallest. Since both arrays are already sorted, we know that the smallest value we choose here will be the smallest value out of both arrays.

37
59

When comparing the first element of each array (3 and 5), we can determine that 3 is smaller than 5. So, we place 3 at the beginning of the to-be merged array.

3

Then, since we have already placed the 3 from the first array in our new merged array, we move on to the next element in the first element.

37
59

Now we are comparing 7 and 5. Again, we can determine that 5 is less than 7, therefore we, again, place 5 in the new array.

35

This process repeats for the remaining two elements.

37
59

Here, since 7 is smaller than 9, we place 7 in the new array.

357

Filling the Remaining Elements

When one of the arrays has been completely iterated through, we can go ahead and populate the rest of the new array’s spots with elements from the array that has elements remaining. For example, in this case, the first array (3, 7) has been completely iterated through (Both the 3 and the 7 have been placed in the new array), so we can go ahead and fill the last spot with the remaining element in the second array.

3579

We perform this same merge operation over and over again until we are left with one big sorted array.

Below is a simple but intuitive diagram of how the merge sort process works.

Show Me The Code

Now that you understand the concept of merge sort, let’s explore how to implement merge sort using the Python programming language. Again, this is an algorithm that does implement recursion, which can be hard to wrap your head around. I will do my best to explain it to you.

Before further explanation and breakdown though, I present you with the full code.

def split(array):
    midpoint = len(array) // 2
    return array[:midpoint], array[midpoint:]

def merge(left_array, right_array):
    if len(left_array) == 0: return right_array
    if len(right_array) == 0: return left_array

    left_index = right_index = 0
    merged_array = []
    for i in range(0, len(left_array) + len(right_array)):
        if left_array[left_index] <= right_array[right_index]:
            merged_array.append(left_array[left_index])
            left_index += 1
        else:
            merged_array.append(right_array[right_index])
            right_index += 1

        if right_index == len(right_array):
            merged_array += left_array[left_index:]
            break
        elif left_index == len(left_array):
            merged_array += right_array[right_index:]
            break

    return merged_array

def merge_sort(array):
    if len(array) <= 1:
        return array
    else:
        left, right = split(array)
        return merge(merge_sort(left), merge_sort(right))

Note: I would like to point out that in the Python programming language, an array is not the same thing as a list. However, the above program will work with both an input array and an input list, as it does not use any array specific functions. The parameters and variable nomenclature in the above program assumes that the input data structure is an array, mostly because I have been referring to the data types being sorted as arrays throughout the entire article.

Let’s dive into the code and explore how it works.

Split the Arrays

def split(array):
    midpoint = len(array) // 2
    return array[:midpoint], array[midpoint:]

The first function in our script is the split function. This function splits a given array in half and returns a tuple containing both halves. Notice how we are using integer division to find the midpoint which causes one half of the array to have an extra element if there is an odd number of elements.

Merge the Arrays

def merge(left_array, right_array):
    if len(left_array) == 0: return right_array
    if len(right_array) == 0: return left_array

    left_index = right_index = 0
    merged_array = []
    for i in range(0, len(left_array) + len(right_array)):
        if left_array[left_index] <= right_array[right_index]:
            merged_array.append(left_array[left_index])
            left_index += 1
        else:
            merged_array.append(right_array[right_index])
            right_index += 1

        if right_index == len(right_array):
            merged_array += left_array[left_index:]
            break
        elif left_index == len(left_array):
            merged_array += right_array[right_index:]
            break

    return merged_array

This second function merges two sorted arrays into one big sorted array. This implements the merge part of the merge sort.

if len(left_array) == 0: return right_array
if len(right_array) == 0: return left_array

First, we check if either of the arrays presented have no elements. If the left array has no elements, we return the right array as the sorted array, and vice versa. This is because there is no need to merge an array with an empty array.

left_index = right_index = 0
merged_array = []
for i in range(0, len(left_array) + len(right_array)):

Next, we set variables for the left index and the right index for the arrays. Remember, every time we use an element from an array in the final merged array we will increment the respective array’s index value.

We also create an empty array for the final merged array. Then, the program loops through every element in both arrays by summing the lengths of both arrays and using it as a range in a for-loop.

if left_array[left_index] <= right_array[right_index]:
  merged_array.append(left_array[left_index])
  left_index += 1
else:
  merged_array.append(right_array[right_index])
  right_index += 1

This is the first actual comparison the algorithm makes. It checks if the first element in the first array is less than or equal to the first element in the other array. If we were sorting the array in descending order, we would make the comparison backwards. If the first array’s element is indeed less than or greater to the second array’s element, we append that element to the empty array we created earlier and increase the left_index value by 1. Otherwise, we append the second array’s element and increment the right_index.

if right_index == len(right_array):
  merged_array += left_array[left_index:]
  break
elif left_index == len(left_array):
  merged_array += right_array[right_index:]
  break

After making the comparisons, we check if we have depleted all of the elements in either one of the arrays. If one array no longer has elements, we add the elements from the other array into the merged array.

Time for Recursion

def merge_sort(array):
    if len(array) <= 1:
        return array
    else:
        left, right = split(array)
        return merge(merge_sort(left), merge_sort(right))

This is the final and most important function of our program. Although it is considerably shorter than our last function, it is the fundamental concept underlying within this function that makes merge sort be able to operate.

This function is a recursive function, so we start out with a base case. Remember that we would like to divide the arrays into the smallest possible size, which is 1 element. A 1 element array is technically already sorted, so if we receive an array with only one element in the function, we go ahead and return it right back out.

Otherwise, we split the array and unpack the tuple value into two variables, both holding an array. Then, the function recursively calls merge with itself passing the left and right array as parameters.

If you don’t understand how this recursive function call works, I suggest you go ahead and read my article on recursion. Everything will make more sense after you’ve had time to grasp the concept of recursion and gain the ability to comprehend it in your head.

Once everything has been implemented, we have a successful merge sort program. You can try this program below:

If you feel like this article has helped you, I would appreciate it if you could share this article with your friends. Any feedback is much appreciated, and if you have any questions about merge sort, leave them in the comments below. Subscribe to my newsletter below if you learned something new today!