First, sort the numbers in increasing order.
Then decide what to do next.
— an algorithmic strategy
This chapter deals with the following fundamental problem: Permute the elements of an array v[0..n-1] to put them in increasing order. In other words, rearrange the array so that v[0] ≤ v[1] ≤ . . . ≤ v[n-1] .
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
111 | 999 | 222 | 999 | 333 | 888 | 444 | 777 | 555 | 666 | 555 |
111 | 222 | 333 | 444 | 555 | 555 | 666 | 777 | 888 | 999 | 999 |
We shall discuss here two simple algorithms for the problem. Other chapters will describe more sophisticated and much faster algorithms.
Though the sorting problem is presented here in terms of an array, it makes sense for any linear structure, such as a linked list, for example.
Table of contents:
int check (int v[], int n) { if (n > 1) for (int i = 1; i < n; i++) if (v[i-1] > v[i]) return 0; return 1; }
int check (int v[], int n) { int yes; for (int i = 1; i < n; i++) if (v[i-1] <= v[i]) yes = 1; else { yes = 0; break; } return yes; }
The algorithm of sorting by insertion, or Insertionsort, is often used to sort a deck of playing cards. Here is a function that implements the algorithm:
// This function rearranges the array v[0..n-1]
// in increasing order.
void
insertionsort (int n, int v[])
{
for (int j = 1; j < n; ++j) {
int x = v[j];
int i;
for (i = j-1; i >= 0 && v[i] > x; --i)
v[i+1] = v[i];
v[i+1] = x;
}
}
(Compare the inner for
with the insertion algorithm discussed in the Arrays chapter.)
To understand the workings of the algorithm,
it is sufficient to observe that at the beginning of each repetition
of the outer for
, immediately before the comparison
j < n
,
These invariant conditions are trivially true at the beginning of the
first iteration, when j == 1.
At the beginning of the last iteration,
j is n and therefore
the array v[0..n-1] is in the desired order.
(Note that the last iteration is aborted as soon as it begins,
since the condition j < n
is false.)
0 | increasing | j-1 | j | n-1 | ||||||
444 | 555 | 555 | 666 | 777 | 222 | 999 | 222 | 999 | 222 | 999 |
Performance of the algorithm.
How many times does the function insertionsort
compare x with an element of the array?
More precisely,
how many times insertionsort
executes the comparison v[i] > x
in the worst case?
This number is directly related
to the sequence of values of i
along the execution of the function.
For each value of j,
in the worst case,
the variable i takes
the values j-1, . . . , 0 .
The following table shows these values explicitly:
j i
1 0 1
2 1 0 2
3 2 1 0 3
. . . . .
n-1 n-2 n-3 .. 1 0 n-1
The third column of the table gives
the number of different values of i
in the second column of the same line.
Therefore,
the number of evaluations of
v[i] > x
is, in the worst case, equal to the sum of the third column.
This sum is n(n-1)/2,
that is, a little less than half of
n2 .
Now consider the time
that function insertionsort consumes
to do its job.
This time is proportional to the number of executions
of the comparison v[i] > x
(think!).
Hence, the time consumption
of the function grows, in the worst case,
as the square of the size of the array.
If an array of size N consumes
T seconds
then an array of size 2N will consume
4T seconds
and an array of size 10N will consume
100T seconds.
This analysis shows that the Insertionsort algorithm is too slow for sorting big arrays. Due to its simplicity, however, the algorithm is very useful when the array is small. Besides, in the best case (for example, if the array is already almost sorted), the performance of the algorithm is very good: the number of comparisons is on the order of n and therefore the time consumption is proportional to n.
Animations. The animation on the right (copied from Wikimedia) shows an array v[0..79] of positive numbers being sorted by insertion. (See a slower version of the animation.) Each element v[i] of the array is represented by the point (i, v[i]).
There are many other interesting animations of Insertionsort on the Web. See, for example,
v[i] > xinto
v[i] >= x. Is the new function correct?
for (j = 1by
for (j = 0in the code of insertionsort? What happens if we replace
v[i+1] = xby
v[i] = x?
for (int j = 1; j < n; ++j) { int x = v[j]; for (int i = j-1; i >= 0 && v[i] > x; --i) { v[i+1] = v[i]; v[i] = x; } }
int i, j, x; for (j = 1; j < n; ++j) { for (i = j-1; i >= 0 && v[i] > v[i+1]; --i) { x = v[i]; v[i] = v[i+1]; v[i+1] = x; } }
for (int j = 1; j < n; ++j) { int x = v[j]; int i = j - 1; while (i >= 0 && v[i] > x) { v[i+1] = v[i]; --i; } v[i+1] = x; }
forof the function insertionsort has the mission of finding the point in v[0..j-1] where v[j] must be inserted. Consider doing this with a binary search. Analyse the result.
This section discusses another well-known sorting algorithm. It uses the following idea: select the smallest element of the array, then the second smallest, then the third smallest, and so on.
// This function rearranges the array v[0..n-1]
// in increasing order.
void
selectionsort (int n, int v[])
{
for (int i = 0; i < n-1; ++i) {
int min = i;
for (int j = i+1; j < n; ++j)
if (v[j] < v[min]) min = j;
int x = v[i]; v[i] = v[min]; v[min] = x;
}
}
To understand
why the algorithm is correct,
it is sufficient to observe that at the beginning of each
repetition of the outer for
,
immediately before comparing i to n-1,
the following invariants hold:
The third invariant can be translated into English by saying
that v[0..i-1] contains all the small
elements
of the original array and v[i..n-1] contains all the
large
elements.
0 | increasing | i-1 | i | n-1 | ||||||
110 | 120 | 120 | 130 | 140 | 999 | 666 | 999 | 666 | 999 | 666 |
small | large |
The three invariants make sure that at the beginning of each iteration the elements v[0], . . . , v[i-1] are in their final positions. At the beginning of the last iteration, the array is in increasing order, as desired.
Performance of the algorithm. The Selectionsort algorithm does about n2/2 comparisons between elements of the array in the worst case. This is just like with Insertionsort. Unlike Insertionsort, however, the Selectionsort algorithm also does about n2/2 comparisons in the best case (for example, if the array is already sorted or almost sorted). Hence, the time consumption of the algorithm is always proportional a n2. In view of this observation (and of others to be suggested in the exercises) the Insertionsort algorithm is preferred to Selectionsort.
Animations. The animation at right (copied from Wikipedia) shows the sorting of an array v[0..79] of positive numbers by Selectionsort. (The speed of the animation is unrelated to the speed of the algorithm. See a slower version of the animation.) Each element v[i] of the array is represented by the point (i,v[i]).
There are many other interesting animations of Selectionsort on the Web. See, for example,
for (i = 0by
for (i = 1? What happens if we replace
for (i = 0; i < n-1by
for (i = 0; i < n?
v[j] < v[min]by
v[j] <= v[min]. Is the new function correct?
struct record {int aa; char *bb;};
(Field aa could be the number of an item in the inventory of a store and bb the name of the item.) Write a function to rearrange the array so that the fields aa are in increasing order. Write another function to rearrange the array so that the fields bb are in lexicographic order.
void r_insertionsort (int n, int v[]) { for (int j = 1; j < n; ++j) { int x = v[j]; int i; for (i = j-1; i >= 0 ; --i) { if (rand() > RAND_MAX/2) v[i+1] = v[i]; else break; } v[i+1] = x; } }
Many problems can be reduced to sorting an array, i.e., many problems can be solved with the help of some sorting algorithm.
silentis an anagram of
listen. Let's say that two words are equivalent if one is an anagram of the other. An equivalence class of words is a set of pairwise equivalent words. Write a program that will extract a largest equivalence class from a given ASCII file containing words, one per line. Run your program on a file of common English words (see the file 1-1000.txt copied from https://
A sorting algorithm is stable if it does not change the relative position of same-value elements. Here is an example. Suppose that an array of numbers of type double is sorted by an algorithm that looks only at the integer part of the numbers. If the sorting algorithm is stable, the numbers having the same integer part will remain in the same relative order as before. In the following figure, the first line shows the original array and the second shows the result of sorting the array by a stable algorithm.
55.1 | 44.0 | 55.2 | 33.9 | 22.9 | 11.0 | 22.5 | 22.6 |
11.0 | 22.9 | 22.5 | 22.6 | 33.9 | 44.0 | 55.1 | 55.2 |
Here is another example.
Let's say that each element of the array is a struct
with two fields:
the first contains a person's name
and the second contains the year that person was born.
Suppose the original array has two John Doe
,
the first born in 2013 and the second in 2011.
If the array is sorted by a stable algorithm
which looks only at the name-fields,
the two John Doe
will remain in the same relative order:
first the younger one and then the older one.
The stability of a sorting algorithm is a useful property that we shall exploit in another chapter.
v[i] > xby
v[i] >= xin the insertionsort function. Does the modified function implement a stable sort?