Consider the sorting problem discussed in another chapter. Namely, consider the problem of permuting the elements of an array v[0..n-1] to put them in increasing order, i.e., rearrange the elements so that v[0] ≤ ... ≤ v[n-1]. The previous chapter analysed some basic algorithms for the problem. Those algorithms are quadratic, i.e., they consume an amount of time proportional to n².
The present chapter looks at a more sophisticated and much faster algorithm that uses the divide-and-conquer strategy. The basic idea is simple: if the first half of the array is already increasing and the second half is also increasing, then the two halves can be quickly merged so that the whole array is increasing.
Before we can deal with our sorting problem, we must consider the following auxiliary merging problem: given increasing arrays v[p .. q-1] and v[q .. r-1], rearrange the array v[p .. r-1] in increasing order.
p                             q-1  q              r-1
111  333  555  555  777  999  999  222  444  777  888
It would be easy to solve the problem in an amount of time proportional to the square of the size of the whole array v[p..r-1]: just ignore that the two halves are already sorted and use one of the basic sorting algorithms. But we can do much better. To do so, we need a workspace, say w[0..r-p-1], of the same type and size as the array v[p..r-1].
// This function receives increasing arrays
// v[p..q-1] and v[q..r-1] and rearranges
// v[p..r-1] in increasing order.
static void
merge (int p, int q, int r, int v[])
{
   int *w;                                  // 1
   w = malloc ((r-p) * sizeof (int));       // 2
   int i = p, j = q;                        // 3
   int k = 0;                               // 4
   while (i < q && j < r) {                 // 5
      if (v[i] <= v[j]) w[k++] = v[i++];    // 6
      else w[k++] = v[j++];                 // 7
   }                                        // 8
   while (i < q) w[k++] = v[i++];           // 9
   while (j < r) w[k++] = v[j++];           // 10
   for (i = p; i < r; ++i) v[i] = w[i-p];   // 11
   free (w);                                // 12
}
The keyword static indicates that the function merge has an auxiliary nature and will not be called directly by the user of the sorting algorithm.
Performance. Function merge consists essentially of moving the elements of array v from one place to another (first from v to w and then back from w to v). The function executes 2n of these moves, where n is the size of the array v[p..r-1], i.e., n = r-p. The time spent by merge is proportional to the number of moves. Therefore, the time consumption of the function is proportional to n. In other words, merge is linear.
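To see merge in action on the arrays from the figure, one can exercise it in isolation with a small driver like the sketch below (not part of the text's code; it assumes the merge function above is defined earlier in the same source file):

#include <stdio.h>
#include <stdlib.h>

/* ... definition of merge as above ... */

int main (void)
{
   // v[0..6] and v[7..10] are increasing, as in the figure
   int v[] = {111, 333, 555, 555, 777, 999, 999, 222, 444, 777, 888};
   merge (0, 7, 11, v);
   for (int i = 0; i < 11; ++i)
      printf ("%d ", v[i]);
   printf ("\n");   // prints 111 222 333 444 555 555 777 777 888 999 999
   return 0;
}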
while (i < q) w[k++] = v[i++];
for (i = p; i < j; ++i) v[i] = w[i-p];
while (i < q && j < r) {
   if (v[i] <= v[j]) w[k++] = v[i++];
   if (v[i] > v[j]) w[k++] = v[j++];
}
i = p; j = q;
for (k = 0; k < r-p; ++k) {
   if (j >= r || (i < q && v[i] <= v[j]))
      w[k] = v[i++];
   else
      w[k] = v[j++];
}
while (k < r-p) {
   while (i < q && v[i] <= v[j]) w[k++] = v[i++];
   while (j < r && v[j] <= v[i]) w[k++] = v[j++];
}
What happens if we replace the first while (lines 5 to 8) in the merge function by the following code?
while (q < r) {
   int x = v[q], i;
   for (i = q-1; i >= p && v[i] > x; --i)
      v[i+1] = v[i];
   v[i+1] = x;
   q++;
}
(Notice that the function below operates in-place, that is, without an auxiliary array.) How much time does the function consume?
int i, k, x;
i = p;
while (i < q && q < r) {
   if (v[i] >= v[q]) {
      x = v[q];
      for (k = q - 1; k >= i; --k) v[k+1] = v[k];
      v[i] = x;
      ++q;
   }
   ++i;
}
What happens if the comparison v[i] <= v[j] in line 6 of merge is replaced by v[i] < v[j]?
Sedgewick writes the merging algorithm in the following clever way. First, copy the array v[p..q-1] to the workspace w[0..q-p-1]; then, copy v[q..r-1] to the workspace w[q-p..r-p-1] in reverse order. Now, the left half of w serves as a sentinel for the right half, and vice-versa, during the merging process. As a result, there is no need to check the boundary conditions i < q-p and j ≥ q-p at every iteration.
// This function receives increasing arrays
// v[p..q-1] and v[q..r-1] and rearranges
// v[p..r-1] in increasing order.
static void
s_merge (int p, int q, int r, int v[])
{
   int i, j, *w;
   w = malloc ((r-p) * sizeof (int));
   for (i = p; i < q; ++i) w[i-p] = v[i];
   for (j = q; j < r; ++j) w[r-p+q-j-1] = v[j];
   i = 0; j = r-p-1;
   for (int k = p; k < r; ++k)
      if (w[i] <= w[j]) v[k] = w[i++];
      else v[k] = w[j--];
   free (w);
}
Like the previous version, this one takes time proportional to the size of the array v[p..r-1].
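For example, with the arrays from the figure at the start of the chapter, the two copy loops of s_merge would fill the workspace as follows (first half copied as is, second half reversed), after which i and j scan toward the middle:

w[0..10]:  111  333  555  555  777  999  999  888  777  444  222
           i starts at the left end        j starts at the right end

The largest elements of the two halves (999 and 888) end up side by side in the middle of w, which is what lets the boundary tests be dropped.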
for (i = 0, k = p; k < q; ++i, ++k) w[i] = v[k];
for (j = r-p-1, k = q; k < r; --j, ++k) w[j] = v[k];
i = 0; j = r-p-1;
for (k = p; k < r; ++k)
   if (w[i] <= w[j]) v[k] = w[i++];
   else v[k] = w[j--];
The Mergesort algorithm uses a divide-and-conquer strategy to sort the given array. The divide phase is simple: just break the array in half. The conquer phase is implemented by the merge function discussed above.
The recursive function shown next rearranges the array v[p..r-1] in increasing order. The basis of the recursion is the set of instances where p ≥ r-1; for these instances, the array has at most 1 element and therefore nothing needs to be done to put it in increasing order.
// The function mergesort rearranges the
// array v[p..r-1] in increasing order.
void
mergesort (int p, int r, int v[])
{
   if (p < r-1) {              // 1
      int q = (p + r)/2;       // 2
      mergesort (p, q, v);     // 3
      mergesort (q, r, v);     // 4
      merge (p, q, r, v);      // 5
   }
}
(Notes: 1. You can replace the call to merge in line 5 by a call to s_merge, since these two functions are equivalent. 2. The result of the division by 2 in the expression (p+r)/2 is automatically truncated. For example, (3+6)/2 is 4.)
If p < r-1, the instance v[p..r-1] of the problem is reduced to the pair of instances v[p..q-1] and v[q..r-1]. These two instances are strictly smaller than the original instance, since q < r and q > p (check it!) at the end of line 2. Hence, by induction hypothesis, the array v[p..q-1] will be in increasing order at the end of line 3, and the array v[q..r-1] will be in increasing order at the end of line 4. Now, at the end of line 5, the array v[p..r-1] will be in increasing order thanks to the merge operation. This discussion proves that the function mergesort is correct.
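One way to do the check asked for above: since p < r-1, we have 2p+2 ≤ p+r ≤ 2r-2, and dividing by 2 (even with truncation) preserves these bounds, so

p+1 = (2p+2)/2 ≤ (p+r)/2 = q ≤ (2r-2)/2 = r-1 ,

that is, q > p and q < r.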
  0    1    2    3    4    5    6    7    8    9   10
111  999  222  999  333  888  444  777  555  666  555
  ...
111  999  222  333  999  444  777  888  555  555  666
111  222  333  999  999  444  555  555  666  777  888
111  222  333  444  555  555  666  777  888  999  999
To rearrange an array v[0..n-1] in increasing order, as the original formulation of the problem requires, all we need to do is execute mergesort (0, n, v).
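For instance, a minimal driver along the following lines (a sketch, assuming merge and mergesort above appear earlier in the same source file) sorts the array used in the trace above:

#include <stdio.h>
#include <stdlib.h>

/* ... definitions of merge and mergesort as above ... */

int main (void)
{
   int v[] = {111, 999, 222, 999, 333, 888, 444, 777, 555, 666, 555};
   int n = 11;
   mergesort (0, n, v);
   for (int i = 0; i < n; ++i)
      printf ("%d ", v[i]);
   printf ("\n");   // prints 111 222 333 444 555 555 666 777 888 999 999
   return 0;
}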
What happens if we replace (p+r)/2 by (p+r-1)/2 in the code of the mergesort function? What happens if we replace (p+r)/2 by (p+r+1)/2?
mergesort (1,5,v)
   mergesort (1,3,v)
      mergesort (1,2,v)
      mergesort (2,3,v)
   mergesort (3,5,v)
      mergesort (3,4,v)
      mergesort (4,5,v)
Repeat this exercise with an array indexed by 1..5.
The animation at right (copied from Wikipedia) shows the sorting of an array v[0..99] that contains a random permutation of 0..99. (See a slower version of the animation.) Each element v[i] is represented by the point with coordinates (i, v[i]).
There are many other animations of Mergesort on the Web.
void mergesort1 (int p, int r, int v[])
{
   if (p < r-1) {
      int q = (p + r) / 2;
      mergesort1 (p, q, v);
      mergesort1 (q, r, v);
      merge (p, q+1, r, v);
   }
}
void mergesort2 (int p, int r, int v[])
{
   if (p < r) {
      int q = (p + r) / 2;
      mergesort2 (p, q, v);
      mergesort2 (q, r, v);
      merge (p, q, r, v);
   }
}
void mergesort3 (int p, int r, int v[])
{
   if (p < r-1) {
      int q = (p + r - 1) / 2;
      mergesort3 (p, q, v);
      mergesort3 (q, r, v);
      merge (p, q, r, v);
   }
}
What happens if we replace (p+r)/2 by (p+r+1)/2?
void mergesort4 (int p, int r, int v[])
{
   if (p < r-1) {
      int q = (p + r) / 2;
      mergesort4 (p, q-1, v);
      mergesort4 (q-1, r, v);
      merge (p, q-1, r, v);
   }
}
void mergesort5 (int p, int r, int v[])
{
   if (p < r-1) {
      int q = r - 2;
      mergesort5 (p, q, v);
      if (v[r-2] > v[r-1]) {
         int t = v[r-2]; v[r-2] = v[r-1]; v[r-1] = t;
      }
      merge (p, q, r, v);
   }
}
void mergesort6 (int p, int r, int v[])
{
   if (p < r-1) {
      int q = r - 1;
      mergesort6 (p, q, v);
      merge (p, q, r, v);
   }
}
Submit an array v[0..n-1] to the mergesort function. The size of the array is reduced by half in each step of the recursion. In the first round, the original instance of the problem is reduced to the instances v[0..n/2-1] and v[n/2..n-1]. In the second round, we have four instances:
v[0..n/4-1], v[n/4..n/2-1], v[n/2..3n/4-1] and v[3n/4..n-1].
And so on, until, in the last round, each instance has at most 1 element. The total number of rounds is approximately lg n, that is, the logarithm of n to the base 2.
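Here is a quick way to check that count: after k rounds, each instance has roughly

n / 2^k

elements, and this reaches 1 when 2^k = n, that is, when k = lg n.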
In each round, the function merge moves 2n elements of the array v[0..n-1] (why?). Hence, the total number of moves executed to sort v[0..n-1] is approximately
2n log n .
It is easy to see that the time consumed by mergesort is proportional to the total number of moves, and therefore proportional to
n log n .
Hence, the algorithm is linearithmic. The number n log n grows much slower than n² and only a little faster than n. If an array of size N requires T units of time, an array of size 2N will require less than 2.2 T units of time, provided N is greater than 2¹⁰. Likewise, an array of size 4N will require less than 4.4 T units of time, provided N is greater than 2²⁰. (Check the math!)
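One way to do that check, assuming the time is proportional to N lg N: going from size N to size 2N multiplies the time by

2N lg(2N) / (N lg N) = 2 (lg N + 1) / lg N = 2 + 2/lg N ,

which is at most 2.2 as soon as lg N ≥ 10, i.e., N ≥ 2¹⁰. Likewise, going from N to 4N multiplies the time by 4 + 8/lg N, which is at most 4.4 as soon as N ≥ 2²⁰.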
The time consumption of Mergesort is proportional to n log n while that of the basic algorithms is proportional to n². But the factor of proportionality is larger in the case of Mergesort, since the code is more complex. Hence, Mergesort only becomes really faster than the basic algorithms when n is sufficiently large. (This is a very common phenomenon: sophisticated algorithms are typically slower than simple algorithms when the amount of data is small.)
int c = 1;
for (int i = 1; i < n; i *= 2)
   for (int j = 1; j < n; ++j)
      c += 1;
Write a version of Mergesort that switches to one of the basic sorting algorithms (instead of pure Mergesort) when the array is small.
int max (int p, int r, int v[])
{
   if (p == r) return v[r];
   else {
      int q = (p + r)/2;
      int x = max (p, q, v);
      int y = max (q+1, r, v);
      if (x >= y) return x;
      else return y;
   }
}
Is the function correct? Is it faster than the obvious iterative function? How many times does the function call itself? Suppose that p and r are 0 and 6 respectively and show the (duly indented) sequence of calls of max.
The Mergesort algorithm can be implemented in iterative style. In each iteration, we merge two consecutive blocks of b elements each: the first block with the second, the third with the fourth, and so on. Variable b assumes the values 1, 2, 4, 8, . . .
// This function rearranges the array
// v[0..n-1] in increasing order.
void
imergesort (int n, int v[])
{
   int b = 1;
   while (b < n) {
      int p = 0;
      while (p + b < n) {
         int r = p + 2*b;
         if (r > n) r = n;
         merge (p, p+b, r, v);
         p = p + 2*b;
      }
      b = 2*b;
   }
}
The figure illustrates the iteration in which b is 2:
0                   p         p+b       p+2b      n-1
111  999  222  999  333  888  444  777  555  666  555
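For concreteness, here is a hand trace of the calls to merge that imergesort issues when n = 11 (the array size used in the figures above):

b = 1:   merge(0,1,2)   merge(2,3,4)   merge(4,5,6)   merge(6,7,8)   merge(8,9,10)
b = 2:   merge(0,2,4)   merge(4,6,8)   merge(8,10,11)
b = 4:   merge(0,4,8)
b = 8:   merge(0,8,11)

Note that v[10] is not touched in the first round (since p + b reaches n there), and that the last merge of a round may involve a block shorter than b; that is what the test r > n handles.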
There are many interesting animations and visualizations of the iterative version of Mergesort. In one of them, the array looks like a broom: each hair of the broom is an element of the array, and the inclination of the hair is the value of the element.
What are the invariants of the outer while in imergesort? What are the invariants of the inner while?