Correctness, performance, and stability of basic sorting algorithm

This chapter deals with the following fundamental problem: Permute the elements of an array v[0..n-1] to put them in increasing order. In other words, rearrange the array so that v[0] ≤ v[1] ≤ . . . ≤ v[n-1] .

We shall discuss here two simple algorithms for the problem. Other chapters will describe more sophisticated and much faster algorithms.

Though the sorting problem is presented here in terms of an array, it makes sense for any linear structure, such as a linked list, for example.

Exercise 1

Important. Write a function to check whether an array v[0..n-1] is in increasing order. (This exercise exemplifies the strategy of writing the tests before inventing an algorithm for the problem.)

Criticize the code of the following function, which promises to decide whether an array v[0..n-1] is in increasing order.

int check (int v[], int n) {
   if (n > 1) 
      for (int i = 1; i < n; i++) 
         if (v[i-1] > v[i]) return 0; 
   return 1; }

Criticize the code of the following function, which promises to decide whether an array v[0..n-1] is in increasing order.

int check (int v[], int n) {
   int yes;
   for (int i = 1; i < n; i++) 
      if (v[i-1] <= v[i]) yes = 1;
      else {
         yes = 0;
         break; }
   return yes; }

Write a function to rearrange an array v[0..n-1] so that it is in strictly increasing order.
Write a function to receive an array v[0..n-1] of integers in increasing order and an integer x , and insert x into the array so as to maintain the increasing order.

Sorting by insertion

The algorithm of sorting by insertion, or Insertionsort, is often used to sort a deck of playing cards. Here is a function that implements the algorithm:

(Compare the inner for with the insertion algorithm discussed in the Arrays chapter.) To understand the workings of the algorithm, it is sufficient to observe that at the beginning of each repetition of the outer for, immediately before the comparison j < n,

These invariant conditions are trivially true at the beginning of the first iteration, when j == 1. At the beginning of the last iteration, j is n and therefore the array v[0..n-1] is in the desired order. (Note that the last iteration is aborted as soon as it begins, since the condition j < n is false.)

Performance of the algorithm. How many times does the function insertionsort compare x with an element of the array? More precisely, how many times insertionsort executes the comparison v[i] > x in the worst case? This number is directly related to the sequence of values of i along the execution of the function. For each value of j, in the worst case, the variable i takes the values j-1, . . . , 0 . The following table shows these values explicitly:

The third column of the table gives the number of different values of i in the second column of the same line. Therefore, the number of evaluations of v[i] > x is, in the worst case, equal to the sum of the third column. This sum is n(n-1)/2, that is, a little less than half of

Now consider the time that function insertionsort consumes to do its job. This time is proportional to the number of executions of the comparison v[i] > x (think!). Hence, the time consumption of the function grows, in the worst case, as the square of the size of the array. If an array of size N consumes T seconds then an array of size 2N will consume 4T seconds and an array of size 10N will consume 100T seconds.

This analysis shows that the Insertionsort algorithm is too slow for sorting big arrays. Due to its simplicity, however, the algorithm is very useful when the array is small. Besides, in the best case (for example, if the array is already almost sorted), the performance of the algorithm is very good: the number of comparisons is on the order of n and therefore the time consumption is proportional to n.

Animations. The animation on the right (copied from Wikimedia) shows an array v[0..79] of positive numbers being sorted by insertion. (See a slower version of the animation.) Each element v[i] of the array is represented by the point (i, v[i]).

There are many other interesting animations of Insertionsort on the Web. See, for example,

Exercises 2

Write a recursive version of the Insertionsort algorithm.
In the function insertionsort, change the comparison v[i] > x into v[i] >= x. Is the new function correct?
What happens if we replace for (j = 1 by for (j = 0 in the code of insertionsort? What happens if we replace v[i+1] = x by v[i] = x?

Criticize the following implementation of the Insertionsort algorithm:

for (int j = 1; j < n; ++j) {
   int x = v[j];
   for (int i = j-1; i >= 0 && v[i] > x; --i) {
      v[i+1] = v[i];
      v[i] = x; } }

Criticize the following implementation of the Insertionsort algorithm:

int i, j, x;
for (j = 1; j < n; ++j) {
   for (i = j-1; i >= 0 && v[i] > v[i+1]; --i) {
      x = v[i]; v[i] = v[i+1]; v[i+1] = x; } }

Is the following implementation of the Insertionsort algorithm correct?

for (int j = 1; j < n; ++j) {
   int x = v[j];
   int i = j - 1;
   while (i >= 0 && v[i] > x) { 
      v[i+1] = v[i];
      --i; }
   v[i+1] = x; }

Correctness check. Write a program to test, experimentally, the correctness of your implementation of the Insertionsort algorithm. Run tests on random integer arrays of various sizes. After each test, your program should check whether the output is in increasing order.
Important. Write a version of the Insertionsort algorithm to rearrange an array v[p..r] in increasing order. Your version must satisfy the following invariant: at the beginning of each iteration, a terminal segment v[k+1..r] of the array is in increasing order.
Binary search. The inner for of the function insertionsort has the mission of finding the point in v[0..j-1] where v[j] must be inserted. Consider doing this with a binary search. Analyse the result.
Worst case. Describe and analyse a worst case instance for the Insertionsort algorithm, i.e., an array v[0..n-1] that will force the algorithm to execute the greatest possible number of comparisons.
Best case. Describe and analyse a best case instance for the Insertionsort algorithm, i.e., an array v[0..n-1] that will lead the algorithm to execute the least possible number of comparisons.
Let T be a function defined on positive integers by the formula T (N) = c N², where c is a constant. Prove that T (2N) = 4 T (N).
★ Data movement. How many times, in the worst case, does the Insertionsort algorithm move an element of the array from one place to another? How many times does this occur in the best case?
Write a function with prototype insertionsort1 (int v[], int n) to rearrange an array v[1..n] (notice the indices!) in increasing order. (This can be done by making an appropriate call to insertionsort.)
Decreasing order. Write a function that will permute the elements of an array v[0..n-1] in decreasing order. Use the Insertionsort idea.
★ Performance test. Write a program to time your implementation of the Insertionsort algorithm. Here are a few hints on how you can do it. For a fixed value of n, an experiment consists of the following: generate a random array of n integers, submit it to your implementation, and use the clock function from the time library to measure the running time of the implementation. For each value of n, do a few (a dozen, say) experiments and take the average of the time measurements. Repeat the whole process for a sequence of values of n. (You may take the sequence 2⁸, 2⁹, … , 2¹⁹, 2²⁰, for example.) Present the results in a table where each row corresponds to a value of n. The table could have three columns: in column 1, the value of n; in column 2, the average running time of your implementation for that value of n; in column 3, the average running time from column 2 divided by n². Such a table will make it easy to compare the practice with the theory.

Sorting by selection

This section discusses another well-known sorting algorithm. It uses the following idea: select the smallest element of the array, then the second smallest, then the third smallest, and so on.

To understand why the algorithm is correct, it is sufficient to observe that at the beginning of each repetition of the outer for, immediately before comparing i to n-1, the following invariants hold:

The third invariant can be translated into English by saying that v[0..i-1] contains all the small elements of the original array and v[i..n-1] contains all the large elements.

The three invariants make sure that at the beginning of each iteration the elements v[0], . . . , v[i-1] are in their final positions. At the beginning of the last iteration, the array is in increasing order, as desired.

Performance of the algorithm. The Selectionsort algorithm does about n²/2 comparisons between elements of the array in the worst case. This is just like with Insertionsort. Unlike Insertionsort, however, the Selectionsort algorithm also does about n²/2 comparisons in the best case (for example, if the array is already sorted or almost sorted). Hence, the time consumption of the algorithm is always proportional a n². In view of this observation (and of others to be suggested in the exercises) the Insertionsort algorithm is preferred to Selectionsort.

Animations. The animation at right (copied from Wikipedia) shows the sorting of an array v[0..79] of positive numbers by Selectionsort. (The speed of the animation is unrelated to the speed of the algorithm. See a slower version of the animation.) Each element v[i] of the array is represented by the point (i,v[i]).

There are many other interesting animations of Selectionsort on the Web. See, for example,

Exercises 3

Write a recursive version of the Selectionsort algorithm.
In function selectionsort, what happens if we replace for (i = 0 by for (i = 1? What happens if we replace for (i = 0; i < n-1 by for (i = 0; i < n ?
In the function selectionsort, replace the comparison v[j] < v[min] by v[j] <= v[min]. Is the new function correct?
Correctness check. Write a program to test, experimentally, the correctness of your implementation of Selectionsort. (See the analogous exercise for Insertionsort.)
Worst case. Describe and analyse a worst case instance for the Selectionsort algorithm, i.e., an array v[0..n-1] that forces the algorithm to execute the greatest possible number of comparisons.
Best case. Describe and analyse a best case instance for the Selectionsort algorithm, i.e., an array v[0..n-1] for which the algorithm executes the smallest possible number of comparisons.
★ Data movement. How many times, in the worst case, does the Selectionsort algorithm move an element of the array from one place to another? How many times does this occur in the best case?
Decreasing sort. Write a function to permute the elements of an integer array v[0..n-1] in decreasing order. Use the same idea as the Selectionsort algorithm.
★ Performance test. Write a program to test the performance of your implementation of Selectionsort. (See the analogous exercise for Insertionsort.)

Exercises 4

Sorting of strings. Write a function to put an array of ASCII strings in lexicographic order. Use the Insertionsort algorithm.
Sorting lines of files. Write a function to rearrange the lines of an ASCII file in lexicographic order. (See the description of the sort utility.)
Sorting of structs. Suppose each element of an array is a struct with two fields: one is an integer and the other is an ASCII string:
```
   struct record {int aa; char *bb;};
```
(Field aa could be the number of an item in the inventory of a store and bb the name of the item.) Write a function to rearrange the array so that the fields aa are in increasing order. Write another function to rearrange the array so that the fields bb are in lexicographic order.
Invent a sorting algorithm that is faster than Insertionsort and Selectionsort.
Random shuffle? Consider the following antithesis of the sorting problem: do a random shuffle of the elements of an array v[0..n-1], i.e., rearrange the elements of the array in such a way that all the permutations are equally probable. Discuss and criticize the following solution (it was used in the European distribution of Microsoft's Windows 7):
```
void r_insertionsort (int n, int v[]) {
   for (int j = 1; j < n; ++j) {
      int x = v[j];
      int i;
      for (i = j-1; i >= 0 ; --i) {
         if (rand() > RAND_MAX/2) 
            v[i+1] = v[i];
         else break; }
      v[i+1] = x; } }
```
Sorting of linked lists. Write a function to sort a linked list. Use an idea similar to Insertionsort. (Will your function return something?)
Sorting of linked lists. Write a function to sort a linked list. Use an idea similar to Selectionsort. (Will your function return something?)

Exercises 5: applications

Many problems can be reduced to sorting an array, i.e., many problems can be solved with the help of some sorting algorithm.

Anagrams. A word is an anagram of another if the sequence of letters of one is a permutation of the sequence of letters of the other. For example, silent is an anagram of listen. Let's say that two words are equivalent if one is an anagram of the other. An equivalence class of words is a set of pairwise equivalent words. Write a program that will extract a largest equivalence class from a given ASCII file containing words, one per line. Run your program on a file of common English words (see the file 1-1000.txt copied from https://gist.github.com/deekayen/4148741#file-1-1000-txt). To stress-test your program, run it on a larger file of English words. (See, for example, the file of 466K words in English.)
Distinct values. Given an array v[0..n-1] of integer numbers, find how many distinct numbers there are in the array (that is, find the size of the set of elements of the array).
Median. Let v[0 . . n−1] be an array of pairwise distinct integers. A median of the array is an element of the array which is larger than half of the elements of the array and smaller than the other half. More precisely, a median of v[0 . . n−1] is a number m equal to some element of the array and strictly greater than exactly ⌊(n−1)/2⌋ elements of the array. (A median is very different from the mean. For example, the mean of 1 2 99 is 51, while the median is 2.) Write an algorithm to find a median of an array v[0 . . n−1] of pairwise distinct integers.
Job scheduling. Suppose that n jobs must be processed on a single processor. Given the durations t₁, … , t_n of the jobs, in what order should they be processed to minimize the average finish time of a job?

Stability of sorting

A sorting algorithm is stable if it does not change the relative position of same-value elements. Here is an example. Suppose that an array of numbers of type double is sorted by an algorithm that looks only at the integer part of the numbers. If the sorting algorithm is stable, the numbers having the same integer part will remain in the same relative order as before. In the following figure, the first line shows the original array and the second shows the result of sorting the array by a stable algorithm.

Here is another example. Let's say that each element of the array is a struct with two fields: the first contains a person's name and the second contains the year that person was born. Suppose the original array has two John Doe, the first born in 2013 and the second in 2011. If the array is sorted by a stable algorithm which looks only at the name-fields, the two John Doe will remain in the same relative order: first the younger one and then the older one.

The stability of a sorting algorithm is a useful property that we shall exploit in another chapter.

`0`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`10`
111	999	222	999	333	888	444	777	555	666	555

`0`	increasing			`j-1`	`j`					`n-1`
444	555	555	666	777	222	999	222	999	222	999

`0`	increasing			`i-1`	`i`					`n-1`
110	120	120	130	140	999	666	999	666	999	666
small					large

55.1	44.0	55.2	33.9	22.9	11.0	22.5	22.6
11.0	22.9	22.5	22.6	33.9	44.0	55.1	55.2

Basic sorting algorithms