An instance of a computational problem is a particular case of the problem, with specific values of the parameters (i.e., input data) of the problem. Each set of values of the parameters defines an instance.
Consider, for example, the following problem: given numbers a, b, and c, find a number x such that ax2 + bx + c = 0. (The numbers a, b, and c are the parameters of the problem.) One instance of this problem consists of finding x such that 1x2 + 1x − 2 = 0. Another instance consists of finding x such that 1x2 − 2x + 1 = 0. Yet another instance asks for x such that 123x2 − 456x + 789 = 0.
A solution of an instance of a computational problem is any object that satisfies the statement of the problem instance. Take, for example, the problem in the previous note; then
A solution of a problem is an algorithm that solves the problem. In other words, for any values of the parameters, the algorithm produces a solution of the corresponding instance (or signals that the instance has no solution). For the problem stated above, an algorithm can be summarized by the formula (−b ± √. ) / 2a
An algorithm is correct if it solves the problem that it is supposed to solve.
Elegant is more or less the same as simple, clean, beautiful, economical. An elegant algorithm (or program) has no superfluous code and no unnecessary variables. An elegant algorithm does not treat special instances of the problem in separate.
A very clever, intricate, convoluted, tangled, ugly algorithm is inelegant.
Experience shows that elegante algorithms tend to be efficient, and efficient algorithms are usually elegant. In addition, elegant iterative processes have simpler invariants.
An algorithm is efficient if does not waste time. Roughly speaking, an algorithm is efficient if it is faster than others algorithms for the same problem.
Though this definition
is adequate for informal use,
it hides some difficulties:
faster than othersshould include not only the already known algorithms for the problem, but also the ones not yet discovered.
We could say, very roughly, that
to be efficient is to do more with less
.
The following definitions refer to the time consumption
(i.e., the response time
)
of algorithms.
In all the definitions,
n is the size of an instance of the problem that the algorithm solves,
i.e.,
n is the size
of the input to the algorithm.
(If the input is a graph, for example,
then n is the
the number of vertices plus the number of edges of the graph.)
An algorithm is
These terms refer, usually, to the time consumption in the worst case.
A data type is a set of values equipped with a set of operations. Some operations transform values into other values; others associate numbers to values or sets of values. Here are some basic data types:
The operations on these values are the comparisons ( <, <=, ==, >, >= ) and the arithmetic operations ( +, -, *, / ). Each arithmetic operation receives two ints and returns an int. The definitions of these operations are a little different from the ones used in mathematics, since we can have overflow in some cases and truncation in others. For example, the value of 9 / 2 is 4 (not 4.5). Another example: if INT_MIN and INT_MAX are −32768 and +32767 then 32767 + 1 is −32768, not 32768.
The usual operations on these values are indicated by <, <=, ==, >, >=, + and -. In an ideal implementation of the data type, the value of the expression 127 + 2 would be −127 rather than 129. In C, however, the value of the expression x + y below is 129, since the expression is evaluated in int arithmetic.
char x, y, z; x = 127; y = 2; z = x + y; // the value of z is -127
real numbers) is more difficult to describe.
You can define your own data types using typedef. The following data type, for example, is a useful representation of boolean values:
typedef enum {FALSE, TRUE} boolean;
(An alternative data type is defined in the stdbool.h interface.) Another useful data type is string (a sequence of bytes):
typedef char *string;
Here is a more specialized example: the data type point represents the coordinates of a point on the plane:
typedef struct { double x; double y; } point;
The operations on values of this type are derived from the operations on doubles. You can also add your own operations, like the operation that produces the Euclidean distance between two points, for example.
realnumbers
Computers are not aware of real numbers in the mathematical sense. Computers only know a small set of rational numbers.
The world of computing calls reals
the numbers of type
float and
double
(such as 0.31415926 × 101),
represented in
floating point notation.
All these numbers are rational.
A priority queue is any data type equipped with two operations:
It is easy to implement a priority queue in such a way than one of the operations is fast. It is more challenging to come up with an implementation in which both operations are fast.
The definition just given is that of a max
priority queue.
It is not difficult to adapt the definition
to that of a min
priority queue.
Do not mistake the heap data structure with the area of the computer memory used for dynamic memory allocation. The two concepts are independent.
Natural numbers can be written in binary (base 2) notation, octal (base 8) notation, decimal (base 10) notation, and hexadecimal (base 16) notation.
In binary notation, the digits are 0 and 1. In octal notation, the digits are 0 1 .. 7. In decimal, the digits are 0 1 .. 9. And in hexadecimal, the digits are 0 1 .. 9 A B .. F.
For example, the number seventy-nine
is represented by
79
in decimal notation
since seventy-nine is
7×10 + 9.
The same number is represented by
1001111
in binary notation,
since seventy-nine is equal to 1×26 + 1×23 + 1×22 + 1×21 + 1×20.
The same number is represented by
117
in octal notation
(since seventy-nine is equal to 1×8² + 1×8¹ + 7),
and by 4F
in hexadecimal notation
(since seventy-nine is equal to 4×16¹ + F).
The C language uses the following convention:
For example, 0117 is the octal representation and 0x4F is the hexadecimal representation of seventy-nine.
Diacritics are the marks over letters (like á, ã, é, ñ, etc.) and the cedilla under a c.
The C99 standard of the C language allows letters with diacritics in names of variables and functions. But the letters with diacritics must be represented (in the source file of your program) by means of its Unicode numbers. For example, if you wish the name of a variable to be piñacolada, you must type pi\u00F1acolada. This complication makes the resource somewhat useless in practice…
In the case of character constants, however, the things are more friendly: letters with diacritics can be typed in directly (provided your languages environment is appropriate). For example,
int pinacolada = 0; // piñacolada char message[14] = "values in €"; printf( "%s\n", message);
produces
values in €
(Note that the string "values in €" occupies 14 bytes since € uses 3 bytes in UTF-8 encoding.)
Until about 2004, the bytes 10000000 to 11111111 were used to represent letters with diacritics and some others characters. This correspondence is defined by the ISO-LATIN-1 table, also known as ISO-8859-1 table.
With the popularization of the UTF-8 code, the ISO-LATIN-1 table was deprecated. Hence, the bytes whose first bit is 1 (unsigned chars 128 to 255) no longer represent characters.
The UTF-8 code represents each character by a sequence of 1 to 4 bytes. The coding scheme was carefully designed to allow for the unambiguous decoding of a sequence of bytes and to be compatible with the ASCII code.
The first 128 characters on the Unicode list (numbers U+0000 to U+007F) are represented by 1 byte each. The 1920 next characters (numbers U+0080 to U+07FF) are encoded in 2 bytes. And so on.
The table below shows the UTF-8 code structure. In the left column, we have the intervals of Unicode numbers, in hexadecimal notation. In the right column, in binary notation, the intervals of the corresponding code bytes:
Unicode numbers byte 1 byte 2 byte 3 byte 4 00000000 .. 0000007F 0xxxxxxx 00000080 .. 000007FF 110xxxxx 10xxxxxx 00000800 .. 0000FFFF 1110xxxx 10xxxxxx 10xxxxxx 00010000 .. 0010FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Now, the same table written in decimal notation:
0 .. 127 000..127 128 .. 2047 192..223 128..191 2048 .. 65535 224..239 128..191 128..191 65536 .. 1114111 240..247 128..191 128..191 128..191
Finally, the same table written in hexadecimal notation:
0 .. 7F 00..7F 80 .. 7FF C0..DF 80..BF 800 .. FFFF E0..EF 80..BF 80..BF 10000 .. 10FFFF F0..F7 80..BF 80..BF 80..BF
First, a very rough draft of the algorithm:
// This function decides whether a string s contains a // well-formed sequence of parentheses and brackets. wellformed (string s) { for each element c of s do { if c ≡ ')' then if top of stack has '(' then pop stack else if c ≡ ']' then if top of stack has '[' then pop stack else push c on stack } if stack empty then returnyeselse returnno}
Now, a draft that is closer to actual code:
wellformed (string s) { allocate space for a stack of bytes for each byte c in s do { if c ≡ ')' then if stack nonempty and top of stack is '(' then pop else returnnoelse if c ≡ ']' then if stack nonempty and top of stack is '[' then pop else returnnoelse push c } if stack empty then returnyeselse returnno}
Here is a rough pseudocode draft in of the infixToPostfix function:
// This function receives an infix expression inf // and returns the corresponding postfix expression. infixToPostfix (string inf) { for each character c of inf do { depending on the value of c push c onto stack or add c to the postfix expression or transfer characters from stack to postfix } add '\0' to the postfix expression return postfix }
Now, a more precise and complete draft:
infixToPostfix (string inf) { allocate space for postfix string post allocate space for stack of bytes // first byte of inf is a '(' push first byte of inf onto stack for each byte c of inf do { if c ≡ '(' then push c else if c ≡ ')' then while top of stack is different from '(' pop x put x in post pop x else if c is '+' or '-' then while top of stack is different from '(' pop x put x in post push c else if c is '*' or '/' then while top of stack is neither '(' nor '+' nor '-' pop x put x in post push c else put c in post } put '\0' in post return post }
Here is um draft of the distances algorithm:
// Receives matrix A of interconnections // between 6 cities. Returns array d such that // d[i] is the distance from city c to city i. distances (matrix A[0..5][0..5], int c) { d[c] = 0 put c into the queue while queue is not empty { let i be the first city in the queue for each neighbor j of i do if d[j] not yet defined then d[j] = d[i] + 1 put j into queue remove i from queue } return d }
Here is a draft, in pseudocode, of the sieve function:
sieve (array v[1..m]) {
p = 1
while 2p ≤ m do
let c be the most valuable child of p
if v[p] ≥ v[c] then stop
else interchange v[p] and v[c]
else p = c // p moves down the heap
}
Here is a draft, in pseudocode, of the heapsort function:
// Rearranges v[1..n] in increasing order. heapsort (v[1..n]) { transform v[1..n] into a heap for m = n, n−1, ... , 2 do interchange v[1] and v[m] sieve (v[1..m-1]) }
Problem-solving skills are almost unanimously
the most important qualification
that employers look for […]
more than programming languages proficiency,
debugging, and system design.
What should you do when you need to invent an algorithm for a new computational problem?
To properly understand the problem, write its statement on a piece of papel, draw a diagram, explain the problem to a colleague. Try to solve some simple instances of the problem. Try to solve extreme instances, like the very small and the ones that have no solution.
(This note was inspired in the article How to think like a programmer — lessons in problem solving, by Richard Reis.)