. . . there is a huge difference
between working programs and good programs.
A good program not only works,
but is easy to read and maintain.
— P. A. Darnell, Ph. E. Margolis,
Software Engineering in C
Programs must be written for people to read,
and incidentally for machines to execute.
— H. Abelson, G. Sussman,
The Structure and Interpretation of Computer Programs
Computing walks on three legs:
correctness,
efficiency, and
elegance.
—
Imre Simon
An array is a data structure that stores a sequence de objects, all of the same type, in consecutive positions of the RAM (= random access) memory of the computer. An array allows random access: any element of the array can be reached directly, without going through the previous elements (the tenth element, for example, can be reached without going first through the first, the second, etc., the ninth elements).
How does one search for a certain object in an array? How does one insert a new object in an array? How does one delete an element from the array? These problems will be used as a pretext to show examples of algorithms and to illustrate the concepts of correctness, efficiency, and elegance. The three problems — search, insertion, and deletion — will reappear, in other contexts, in many chapters of this site.
Imagine that we wish to store n integer numbers in an array v. The space occupied by the array in memory may have been created by the declaration
int v[N];
where N is a constant (possibly defined by a #define directive). Assuming that the integers are stored in positions 0, 1, … , n-1, we shall say that
v[0..n-1] is an integer array
or an array of integers. Of course we must have 0 ≤ n ≤ N. If n == 0, the array v[0..n-1] is empty. If n == N, the array is full.
Table of contents:
int v[99]; for (i = 0; i < 99; ++i) v[i] = 98 - i; for (i = 0; i < 99; ++i) v[i] = v[v[i]];
The search problem consists of the following: Given an integer x and an array v[0..n-1] of integers, find x in v, that is, find an index k such that v[k] == x.
0 | 0 | n-1 | ||||||||||
443 | 222 | 555 | 111 | 333 | 444 | 555 | 666 | 888 | 777 | 888 | 999 | |
x | v |
A solution of this problem is an algorithm that receives the values of x, n, v and returns an index k. We are immediately faced with a design decision: what shall our algorithm return if there is no k such that v[k] == x? In other words, what shall our algorithm do if the instance of the problem has no solution?
Our design decision: return -1
if the instance has no solution.
This is a good convention since
-1 does not belong to the set 0..n-1
of valid
indices.
To implement this convention,
it is best to scan the array fom the end to the beginning:
// The function receives x, n >= 0 and v and
// returns an index k in 0..n-1 such that
// x == v[k]. If such k does not exist,
// returns -1.
int
find (int x, int n, int v[])
{
int k;
k = n-1;
while (k >= 0 && v[k] != x)
k -= 1;
return k;
}
It is easy to check that the function is correct. The function is also efficient and elegant: it does not waste time, it has no unnecessary instructions and no unnecessary variables, and it does not handle special cases (like the case n == 0, for example) in separate.
An equally correct, efficient, and elegant code
could be written with a for
in place of the while
:
for (int k = n-1; k >= 0; --k) if (v[k] == x) return k; return -1;
Bad examples.
To make a contrast with the function above,
here are some inelegant,
inefficient, or incorrect functions.
The first, very popular,
uses a
transit
, or signal
,
boolean variable:
int found = 0, k = n-1; while (k >= 0 && found == 0) { // inelegant if (v[k] == x) found = 1; // inelegant else k -= 1; } return k;
The second, also quite popular, unnecessarily handles the empty array in separate:
if (n == 0) return -1; // inelegant
else {
int k = n-1;
while (k >= 0 && v[k] != x) k -= 1;
return k;
}
The next function is inefficient (because it continues running after a solution has been found) and inelegant (because it unnecessarily initializes a variable):
int k = 0; // inelegant int sol = -1; for (k = n-1; k >= 0; --k) // inefficient if (v[k] == x) sol = k; // inelegant return sol;
Some functions seem reasonable at first sight, but are simply wrong. In the following example, the last iteration makes the mistake of accessing v[-1]:
int k = n-1;
while (v[k] != x && k >= 0) // wrong!
k -= 1;
return k;
(The comparisons should have been done in the correct order: k >= 0 && v[k] != x.)
Sentinel. We can make a different design decision and return n, instead of -1, in case x is not in v[0..n-1]. In this case, it is better to scan the array from the beginning to the end:
int k = 0; while (k < n && v[k] != x) k += 1; return k;
The function is even better if we use a sentinel to avoid the repeated comparisons of k with n:
v[n] = x; // sentinel
int k = 0;
while (v[k] != x)
k += 1;
return k;
(We are assuming here that the array is not full and therefore there is room for the sentinel.)
int k = 0; while (k < n && v[k] != x) k += 1; if (v[k] == x) return k; else return -1;
int sol; for (int k = 0; k < n; ++k) { if (v[k] == x) sol = k; else sol = -1; } return sol;
int maxi (int n, int v[]) { int m = v[0]; for (int j = 1; j < n; ++j) if (v[j-1] < v[j]) m = v[j]; return m; }
int maximum (int n, int v[]) { int x = v[0]; for (int j = 1; j < n; j += 1) if (x < v[j]) x = v[j]; return x; }
Does it make sense to subsitute x = 0
for x = v[0]
,
as some careless programmers do?
Does it make sense to substitute
x = INT_MIN
for x = v[0]
?
Does it make sense to substitute
x <= v[j]
for x < v[j]
?
[Partial solution: ./solutions/array10.html]
run) of zeros in an array of integers. Try to examine the least possible number of elements of the array.
Here is a recursive solution (see the Recursion and recursive algorithms chapter) of the search problem:
// Receives x, n >= 0 and v and returns k
// such that 0 <= k < n and v[k] == x.
// If such k does not exist, returns -1.
int rfind (int x, int n, int v[]) {
if (n == 0) return -1;
if (x == v[n-1]) return n-1;
return rfind (x, n-1, v);
}
How does this work? The number of relevant elements of v is n. If n is 0 then v[0..n-1] is empty and therefore x is not in v[0..n-1]. Now assume that n > 0; in this case, x is in v[0..n-1] if and only if
x is equal to v[n-1] or x is in v[0..n-2].
In short: the code reduces the instance that looks for x in v[0..n-1] to the instance that looks for x in v[0..n-2].
Bad example. The following alternative to the function rfind is very inelegant. It starts the recursion at n == 1, a case that demands some work to solve. (As a side-effect, the function is unable to handle the n == 0 instance of the problem.)
int ugly (int x, int n, int v[]) { if (n == 1) { // inelegant if (x == v[0]) return 0; // inelegant else return -1; } if (x == v[n-1]) return n-1; return ugly (x, n-1, v); }
int rfnd (int x, int n, int v[]) { if (n == 0) return -1; int k = rfnd (x, n-1, v); if (k != -1) return k; if (x == v[n-1]) return n-1; return -1; }
int rfnd (int x, int n, int v[]) { if (v[n-1] == x) return n-1; else return rfnd (x, n-1, v); }
int rfnd (int x, int n, int v[]) { if (v[n-1] == x || n == 0) return n-1; else return rfnd (x, n-1, v); }
Let's say that we wish to delete the element of the array v[0..n-1] whose index is k. To do this, we shall shift the segment v[k+1..n-1] of the array to the range k..n-2. For example, the result of the deletion of the element at index 3 in the array 0 11 22 33 44 55 will be the array 0 11 22 44 55 . Of course the problem makes sense if and only if 0 ≤ k < n.
The following function solves the problem and returns the value of the deleted element:
// This function receives an array v[0..n-1]
// and an index k such that 0 <= k < n.
// It deletes the element whose index is k
// and returns the value of that element.
int
delete (int k, int n, int v[])
{
int x = v[k];
for (int j = k+1; j < n; ++j)
v[j-1] = v[j];
return x;
}
Everything works well even when k is n-1 and when k is 0. Of course the deletion will take more time when k is small compared to n.
How to use this function? For example, to delete the element at index 51 of v[0..n-1] (assuming that 51 < n) it is enough to say
x = delete (51, n, v);
n -= 1; // updates the value of n
Recursive version. It is a good exercise to write a recursive version of the delete function. The size of an instance of the problem is measured by the number n-k and the instance is considered small if n-k is 1. Therefore, the base case of the recursion is the one where k is n-1.
// This function receives an array v[0..n-1]
// and an index k such that 0 <= k < n.
// It deletes the element whose index is k
// and returns the value of that element.
int rdelete (int k, int n, int v[]) {
int x = v[k];
if (k < n-1) {
int y = rdelete (k+1, n, v);
v[k] = y;
}
return x;
}
v[j-1] = v[j]into
v[j] = v[j+1]in the code of function delete?
for (int j = n-1; j > k; --j) v[j-1] = v[j];
for (int j = k+1; j < n; ++j) v[j-1] = v[j]; v[n-1] = 0;
if (k < n - 1) for (int j = k+1; j < n; ++j) v[j-1] = v[j];
int rdelete2 (int k, int n, int v[]) { int x = v[k]; if (k < n-1) { rdelete2 (k, n-1, v); v[n-2] = v[n-1]; } return x; }
Given an array v[0..n-1] of numbers, we wish to insert a number x between the elements whose indices are k-1 and k. This makes sense not only when 1 ≤ k ≤ n-1 but also when k is 0 (insert at the beginning) and when k is n (insert at the end). That is, it makes sense for any k in the set 0..n.
Of course we must not do any insertion if the array is full. Therefore, make sure that n+1 ≤ N before calling the function.
// This function inserts x between the
// positions k-1 and k of the array v[0..n-1]
// assuming that 0 <= k <= n.
void
insert (int k, int x, int n, int v[])
{
for (int j = n; j > k; --j)
v[j] = v[j-1];
v[k] = x;
}
The function works well even when we wish to insert at the beginning of the array and when we wish to insert at the end.
For exemple, to insert a new element having value 999 between the positions 50 and 51 (assuming 51 ≤ n) all we have to do is
insert (51, 999, n, v);
n++; // updates n
Recursive version. It is a good exercise to write a recursive version of insert:
// This function inserts x between the
// positions k-1 and k of the array v[0..n-1]
// assuming that 0 <= k <= n.
void rinsert (int k, int x, int n, int v[]) {
if (k == n) v[n] = x;
else {
v[n] = v[n-1];
rinsert (k, x, n - 1, v);
}
}
int rinsert2 (int k, int x, int n, int v[]) { if (k == n) { v[n] = x; return n + 1; } else { int y = v[k]; v[k] = x; return rinsert2 (k+1, y, n, v); } }
Now consider a combined search and delete problem. Suppose we wish to delete all the zero elements from the array v[0 .. n-1]. Of course the problem makes sense with any n ≥ 0 (the case where n == 0 is trivial). For exemple, if n is 7 and v[0..6] is
11 22 0 0 33 0 44
then the resulting array must be 11 22 33 44 . The function must return the number of elements of the array after the deletion of all zeros.
Here is an elegant and efficient solution of the problem:
// This function deletes all the zero elements
// of v[0..n-1]. It assumes only that n >= 0.
// The function leaves the result in v[0..i-1]
// and returns i.
int
deletezeros (int n, int v[])
{
int i = 0;
for (int j = 0; j < n; ++j)
if (v[j] != 0)
v[i++] = v[j];
return i;
}
(The instruction
v[i++] = v[j]
has the same effect as
v[i] = v[j]; i += 1
.)
At the beginning of each iteration, the following invariant holds:
v[0..i-1]
is the array that results from the deletion of all zeros from
v[0..j-1];
of course i ≤ j.
0 | i | j | n | |||||||||
999 | 999 | 999 | 999 | 999 | 000 | 999 | 000 | 999 | 000 | 999 | 000 | 999 |
no-zeros version of original v[0..j-1] | garbage |
Everything works well even when n is 0. It also works well when the array has no zeros. It also works well when the array has nothing but zeros.
To delete all zeros from an array v[0..n-1], just say
n = deletezeros (n, v);
Bad examples. The following version is worse than the one above because it wastes space (by using an auxiliary array of n elements):
int vv[1000], i = 0; for (int j = 0; j < n; ++j) if (v[j] != 0) vv[i++] = v[j]; for (int j = 0; j < i; ++j) v[j] = vv[j]; return i;
Now a version that is simply wrong (why?):
for (int i = 0; i < n; ++i) if (v[i] == 0) { for (int j = i; j+1 < n; ++j) v[j] = v[j+1]; n -= 1; } return n;
Next, one more inelegant and inefficient version.
It is inefficient
because the instruction v[j] = v[j+1]
does not copy v[j+1] to its final place
and therefore has to be repeated many times:
int i = 0; while (i < n) { if (v[i] != 0) i += 1; else { for (int j = i; j+1 < n; ++j) v[j] = v[j+1]; --n; } } return n;
To better feel the inefficiency of this version, consider the following example. Suppose that n is 100 and that all the elements of v except v[99] are zeros. The version above will copy elements of v from one place to another 4950 times, while the first version of deletezeros will copy elements only once!
Next, and even more inelegant and inefficient version.
Changing the value of i within
the for
controlled by i
is a very bad way to achieve the desired effect.
Moreover,
variable j is unnecessarily initialized
and the part i < n-1
of the if
condition
is supefluous.
int j = 0;
for (int i = 0; i < n; ++i)
if (i < n-1 && v[i] == 0) {
for (j = i; j+1 < n; ++j)
v[j] = v[j+1];
--n;
--i; // bad...
}
return n;
The following version is no more efficient than the previous one, but at least it is not so crooked:
for (int i = n-1; i >= 0; --i) if (v[i] == 0) { for (int j = i; j < n-1; ++j) v[j] = v[j+1]; --n; } return n;
Recursive version.
Finally, here is a recursive version
of deletezeros.
Notice how the instruction v[m] = v[n-1]
puts v[n-1]
in its final place.
// The function rdeletezeros deletes all the
// zero elements of v[0..n-1]. It leaves the
// result in v[0..i-1] and returns i.
int rdeletezeros (int n, int v[]) {
if (n == 0) return 0;
int m = rdeletezeros (n - 1, v);
if (v[n-1] == 0) return m;
v[m] = v[n-1];
return m + 1;
}
Next, a crooked and inefficient version. Unlike the previous version, it does not immediately put each element of the array in its final place. Observe also that there are two recursive calls and they have different forms; a bad sign!
// Receives 0 <= k <= n and deletes the zeros
// from v[k..n-1]. The array without the zeros
// will have the form v[k..m-1]. The function
// returns the value of m.
int rbad (int k, int n, int v[]) {
if (k == n) return n;
if (v[k] != 0)
return rbad (k+1, n, v);
for (int i = k; i < n-1; ++i) v[i] = v[i+1];
return rbad (k, n-1, v);
}
Sometimes, a recursive version of a function requires additional parameters, as we see here with the parameter k. It is essential to explain to the user, as we did above, what is the role of the additional parameter. (But, of course, not even the best of explanations will fix an ill-conceived function.)
int delete0 (int n, int v[]) { int z = 0; for (int i = 0; i < n; ++i) { if (v[i] == 0) z += 1; v[i-z] = v[i]; } return n - z; }
int delete0 (int n, int v[]) { int z = 0; for (int i = 0; i < n - z; ++i) { if (v[i] == 0) { z += 1; for (int k = i; k < n - z; ++k) v[k] = v[k+1]; --i; } } return n - z; }
i 222
(followed by ↵) the number 222 is inserted into the collection. If the user types
d 333
one copy of the number 333 is deleted from the collection. (If there is no copy, the command is ignored.) If the user types any character other than 'i' or 'd', the execution of the program terminates. After each insertion or deletion the program must display the collection. The collection should be kept in increasing order throughout the process. [Solution: ./solutions/array2.html]
Answer: No. The additional parentheses are superfluos since the operators >= and != take precedence over &&.
Answer:
In an expression like function(arg1, arg2)
,
the name of the function almost overlaps the first argument,
making things difficult to read.
See layout tips in the Code layout chapter.
int v[n];
where the value of n only becomes known during the execution of the program?
Answer: No. Better avoid doing this.