Many array sorting algorithms (like Insertionsort, Selectionsort, Mergesort, Quicksort, and Heapsort) are based on comparisons between the elements of the array. This chapter introduces a sorting algorithm based on counting the elements of the array. This algorithm, in turn, is the mainspring of the Radixsort method for sorting short strings. (This chapter was adapted from section 5.1 of the book by Sedgewick and Wayne.)
Table of contents:
Suppose we want to rearrange in increasing order an array v[0 . . n−1] whose elements are integers in the range
0 . . R−1
where R is a small number. This range is the universe of the array. The sorted array will have a run of elements equal to 0, followed by a run of elements equal to 1, etc., and finally a run of elements equal to R−1. The figure shows an array over the universe 0 . . 4 before and after sorting:
2 | 3 | 3 | 4 | 1 | 3 | 0 | 3 | 1 | 2 | 2 | 1 | 2 | 4 | 3 | 4 | 4 | 2 | 3 | 4 |
0 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 |
To sort an array v[0..n-1] whose universe is known in advance, we can resort to a simple counting procedure which does not compare one element to another. To begin, we need the idea of frequency.
The frequency of an element r of the universe 0..R-1 is the number of occurrences of r in the array v[0..n-1]. This number will be denoted by f[r]. In the example above, where the universe is 0..4, the frequencies are given by the following table f[0..4]:
0 | 1 | 2 | 3 | 4 | |
f | 1 | 3 | 5 | 6 | 5 |
The frequency of predecessors of an element r of 0..R is the sum f[0] + ... + f[r-1], denoted by fp[r]. Note that we added R to the universe and that the sum that defines fp[r] does not include f[r]. Of course fp[0] is zero since the sum is empty. In the example above, the frequency of predecessors is given by the following table:
0 | 1 | 2 | 3 | 4 | 5 | |
fp | 0 | 1 | 4 | 9 | 15 | 20 |
Observe that fp[r] is the starting index of the run of elements equal to r in the sorted array. In the example above, the run of 1's begins at index 1, the run of 2's begins at index 4, the run of 3's begins at index 9, etc.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
0 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 |
Now, the array v[0..n-1]
can be sorted in two steps:
first,
a table fp is used
to copy the elements of v[0..n-1]
to the correct
positions of an auxiliary array
aux[0..n-1];
then,
the array aux is copied back to the
space occupied by v.
The following code implements this algorithm:
// This function rearranges v[0..n-1] in // increasing order assuming that the // elements of the array belong to the // universe 0..R-1. void countingsort (int *v, int n, int R) { int r; int *fp, *aux; fp = malloc ((R+1) * sizeof (int)); aux = malloc (n * sizeof (int)); for (r = 0; r <= R; ++r) fp[r] = 0; for (int i = 0; i < n; ++i) { r = v[i]; fp[r+1] += 1; } // A for (r = 1; r <= R; ++r) fp[r] += fp[r-1]; // B for (int i = 0; i < n; ++i) { r = v[i]; aux[fp[r]] = v[i]; fp[r]++; // C } // D for (int i = 0; i < n; ++i) v[i] = aux[i]; free (fp); free (aux); }
On line A, fp[r] is the frequency of r-1, i.e., fp[r] == f[r-1] for every r in 0..R-1. On line B, fp[r] is the frequency of predecessors of r; hence, fp[r] will be the starting index of the run of elements whose value is r. Line C increments fp[r] in preparation for the next element of value r in the array. On line D, aux[0..n-1] is in increasing order.
Time consumption. The function countingsort consumes n + R units of time. If R is small (no more than some multiple of n), this is better than the n log n time bound of algorithms like Mergesort, Quicksort, and Heapsort.
Stability. The function countingsort produces a stable sort (i.e., it does not change the relative positions of equal-value elements). This stability is the basis for the use of the function in the digital sorting algorithm to be examined in the next section.
int *f = malloc (R * sizeof (int)); for (int r = 0; r < R; ++r) f[r] = 0; for (int i = 0; i < n; ++i) f[v[i]] += 1; int i = 0; for (int r = 0; r < R; ++r) for (int k = 0; k < f[r]; ++k) v[i++] = r; free (f);
Let v[0..n-1] be an array of strings, all of the same length. (You may assume that all the strings are ASCII, though this is not essential.) Now consider the problem of
rearranging the array in lexicographic order.
In the context of this problem, the bytes of the strings are traditionally called digits, even if they do not belong to the set 0..9.
Example. Suppose that the strings are car licence plates. (We could restrict ouselves to the forty three characters from '0' to 'Z'.) Here is an example with 10 plates, each 7 digits long:
FON1723 EAD3312 CDA7891 FAJ4021 DOG1125 BAT7271 GIZ1234 BAT7328 BIG8733 CAT9955
(This example is from the book by Sedgewick and Wayne.) To put this array in lexicographic order, we can call countingsort several times: first, to sort by the last digit, then to sort by the one-but-last digit, and so on:
CDA7891 EAD3312 FAJ4021 DOG1125 CDA7891 EAD3312 BAT7271 FAJ4021 FAJ4021 DOG1125 GIZ1234 EAD3312 FAJ4021 BAT7328 BAT7271 FON1723 GIZ1234 FON1723 DOG1125 BAT7271 BIG8733 EAD3312 DOG1125 BAT7271 EAD3312 BIG8733 BAT7328 CAT9955 FON1723 BAT7328 EAD3312 FAJ4021 FAJ4021 CAT9955 CDA7891 BIG8733 BIG8733 BAT7328 BAT7271 FON1723 CDA7891 DOG1125 GIZ1234 GIZ1234 FON1723 BAT7328 BAT7271 BIG8733 EAD3312 DOG1125 CAT9955 BIG8733 CDA7891 BAT7328 GIZ1234 FAJ4021 CAT9955 BAT7271 CDA7891 BIG8733 CAT9955 DOG1125 FON1723 BAT7328 CDA7891 CAT9955 CAT9955 GIZ1234 FON1723 GIZ1234
Since countingsort is stable,
the array of strings will be eventually in lexicographic order.
Notice that it is essential to apply countingsort
from right to left
,
i.e., starting from the last digit.
Algorithm. The following code implements the algorithm used in the example. The parameter W represents the common length of all the strings (7 in the example above) and the parameter R (smaller than 256) is the size of the universe of digits:
typedef unsigned char byte;
// Rearranges in lexicographic order the
// array v[0..n-1] of strings. Each v[i]
// is a string v[i][0..W-1] whose elements
// belong to the set 0..R-1.
void
digitalsort (byte **v, int n, int W, int R)
{
int *fp;
byte **aux;
fp = malloc ((R+1) * sizeof (int));
aux = malloc (n * sizeof (byte *));
for (int d = W-1; d >= 0; --d) {
int r;
for (r = 0; r <= R; ++r)
fp[r] = 0;
for (int i = 0; i < n; ++i) {
r = v[i][d];
fp[r+1] += 1;
}
for (r = 1; r <= R; ++r)
fp[r] += fp[r-1];
for (int i = 0; i < n; ++i) {
r = v[i][d];
aux[fp[r]] = v[i];
fp[r]++;
}
for (int i = 0; i < n; ++i)
v[i] = aux[i];
}
free (fp);
free (aux);
}
The code ignores the null byte, \0, at the end of each string and therefore can be applied to any array of arrays of bytes (provided the value of each byte is in the range 0..R-1).
This sorting algorithm is digital because it sorts the strings digit-by-digit. The algorithm is also known as
Time consumption. The digitalsort function consumes (n+R)W units of time. If R is small (at worst n), the time consumption is proportional to Wn, that is, to the total number of digits in the input.
Animations. There are several animations of the Radixsort algorithm on YouTube. Here is a sample:
no is th ti fo al go pe to co to th ai of th pa