The trees in computer science
have the curious tendency to grow downwards…
A binary tree is a data structure more general than a linked list. This chapter introduces some basics operations on binary trees. The next chapter, Binary search trees, will discuss a fundamental application.
Table of contents:
A binary tree is a set of records that satisfy certain conditions. The conditions will not be given explicitly, but they will become clear in the context. The records will be called nodes (but could also be called cells). Each node has an address. We shall assume, for now, that each node has only three fields: an integer number and two pointers to nodes. The nodes can, then, be defined as follows:
typedef struct record { int contents; node *lft; node *rght; } node;
The contents field
is the payload
of the node;
the other two fields define the structure of the tree.
The lft field of each node contains NULL
or the address of another node.
An analogous assumption holds for the rght field.
If the lft field of a node P is the address of a node L, we shall say that L is the left child of P. Similarly, if P.rght is equal to &R, we shall say that R is the right child of P. If a node C is a child (left or right) of P, we shall say that P is the parent of C. A leaf is a node that has no children.
When talking about binary trees,
it is very convenient to
blurr the distinction between a node and its address.
So, if x is a pointer to a node
(that is, if x is of type *node),
we say consider the node x
instead of consider the node whose address is x
.
The figure below shows two examples of binary trees. On the left, a curious binary tree in nature. On the right, a schematic representation of a binary tree whose nodes contain the numbers 2, 7, 5, etc.
Suppose that x is a node. A descendant of x is any node that can be reached by iterating the instructions x = x->lft and x = x->rght in any order. (Of course these instructions can only be iterated while x is not NULL. We are assuming that NULL is eventually reached.)
A node x together with all its descendants is a binary tree. We say that x is the root of the tree. If x has a parent, this tree is a subtree of some larger tree. If x is NULL, the tree is empty.
For any node x, the node x->lft is the root of the left subtree of x and x->rght is the root of the right subtree of x.
The address of a binary tree
is the address of its root.
In verbal communication,
it is convenient to
blurr the distinction between trees and their addresses:
we say consider a tree r
instead of
consider a tree whose root has address r
.
This convention suggests the introduction of the alternative name
tree
for the data type pointer-to-node:
typedef node *tree;
Given this convention, we can give a recursive definition of binary tree: a pointer-to-node r is a binary tree if
Many algorithms on trees become simpler when written in recursive style.
A binary tree can be traversed (i.e., scanned) in many different ways. One particularly important order of traversal is left-root-right, also known as inorder traversal, or infix traversal. The left-root-right traversal visits
In the following figure, the nodes are numbered in the order in which they are visited by the left-root-right traversal.
Here is a recursive function that does the left-root-right traversal of a binary tree:
// Receives the root r of a binary tree
// and prints the contents of its nodes
// in left-root-right order.
void leftrootright (tree r) {
if (r != NULL) {
leftrootright (r->lft);
printf ("%d\n", r->contents);
leftrootright (r->rght);
}
}
Writing an iterative version of this function
is an excellent exercise.
The version below uses a stack of nodes.
The sequence of nodes on the stack
is a kind of to do
list:
each node x on the stack is a reminder that we must still
print the contents of x
and the contents of the right subtree of x.
The element at the top of the stack can be a NULL,
but all the other elements are non NULL.
void i_leftrootright (tree r) {
createstack (); // stack of nodes
push (r);
while (true) {
x = pop ();
if (x != NULL) {
push (x);
push (x->lft);
}
else {
if (stackisempty ()) break;
x = pop ();
printf ("%d\n", x->contents);
push (x->rght);
}
}
freestack ();
}
(The code would be slightly simpler —
but a little less readable —
if it were to manipulate the stack directly.)
The figure below is the trace of execution of the i_leftrootright
function.
Each line of the table gives the state of things
at the beginning of an iteration:
on the left is the stack and
on the right the printed nodes.
The value NULL is indicated by -
.
5 / \ 3 8 / \ / \ 1 4 6 9 / \ \ 0 2 7 |
stack print 5 5 3 5 3 1 5 3 1 0 5 3 1 0 - 5 3 1 - 0 5 3 2 1 5 3 2 - 5 3 - 2 5 4 3 5 4 - 5 - 4 8 5 8 6 8 6 - 8 7 6 8 - 7 9 8 9 - - 9 |
The exercises below discuss two other orders of traversal of a binary tree: the root-left-right traversal and the left-right-root traversal. The first one is also known as preorder traversal, or prefix traversal. The second one is also known as postorder traversal, or postfix traversal. (By the way, see the exercise about infix, postfix, and prefix notations below.)
void i_leftrootright (tree r) { node *x = r; createstack (); while (true) { while (x != NULL) { push (x); x = x->lft; } if (stackisempty ()) break; x = pop (); printf ("%d\n", x->contents); x = x->rght; } freestack (); }
The height of a node x in a binary tree is the distance between x and its most distant descendant. More precisely, the height of x is the number of steps in the longest path that leads from x to a leaf. The paths that this definition refers to are those obtained by iterating the instructions x = x->lft and x = x->rght, in any order.
The height of a tree is the height of the root of the tree. A tree with only one node has height 0. The figure shows a tree of height 3.
Here is how the height of a tree with root r may be computed:
// Returns the height of the binary tree r. int height (tree r) { if (r == NULL) return -1; // height of empty tree else { int he = height (r->lft); int hd = height (r->rght); if (he < hd) return hd + 1; else return he + 1; } }
What is the relation between the height, say h, and the number of nodes of a binary tree? If the tree has n nodes then
lg (n) ≤ h ≤ n−1 ,
where lg (n)
denotes ⌊log n⌋,
i.e., the floor of log n .
A binary tree of height n−1
is a trunk without branches
:
each node has at most one child.
At the other extreme,
a tree of height lg (n)
is quasi complete
:
all the levels
are full except perhaps the last one.
n lg(n)
4 2
5 2
6 2
10 3
64 6
100 6
128 7
1000 9
1024 10
1000000 19
A binary tree is balanced if, for each of its nodes, the left and right subtrees have nearly the same height. The height of a balanced binary tree with n nodes is close to log n. (The tree in the example above is balanced). We should work with trees that are balanced whenever possible. But this is not easy if the tree grows and contracts during the execution of your program.
Depth. The depth of a node s in a binary tree having root r is the distance from r to s. More precisely, the depth of s is the length of the only path that goes from r to s. For example, the depth of r is 0 and the depth of r->lft is 1.
A tree is balanced if all its leaves have nearly the same depth.
555 555 / \ 333 111 333 888 - / \ \ - 111 444 999 444 - - 888 - 999 - -
from left to right. Write a function that will print the contents of the nodes as it traverses the tree in level order. (Hint: Use a queue of nodes. See the calculation of distances in a graph.)
In some applications (see the next section) it is convenient to have direct access to the parent of each node. To achieve this, we must add a prnt field to each node:
typedef struct record { int contents; struct record *prnt; struct record *lft, *rght; } node;
Since the root of a tree has no parent, we need to take a design decision about the value of the prnt field in this case. If r is the root, we could set r->prnt = NULL. But it is better to set
r->prnt = r
when r is the root. Of course the root will be the only node of the tree to have this property.
Consider the following problem: find the first node of a tree in the left-root-right order. Here is a function that solves the problem:
// Receives a nonempty binary tree r and
// returns the first node of the tree in
// the left-root-right traversal.
node *first (tree r) {
while (r->lft != NULL)
r = r->lft;
return r;
}
It is not difficult to write an analogous function to find the last node in the left-root-right traversal.
Suppose x is a node of a binary tree. Our problem: find the successor of x, i.e., the next node after x in the left-root-right order. We assume, of course, that x is not NULL.
The following function solves the problem assuming that the nodes have prnt fields. The function returns the successor of x or returns NULL if there is no sucessor.
// Receives a node x. Returns the successor of // x in the left-root-right traversal or NULL // if x is the last node. The function assumes // that x != NULL. node *nextnode (node *x) { if (x->rght != NULL) { node *y = x->rght; while (y->lft != NULL) y = y->lft; return y; // A } while (x->prnt != NULL && x->prnt->rght == x) // B x = x->prnt; // B return x->prnt; }
(Note that the function does not need to know where the root of the tree is.) On line A, y is the first node (in the left-root-right order) of the subtree whose root is x->rght. The lines B move x up the tree while it is someone's right child.