Just as binary trees are a generalization of linked lists, binary search trees (or BSTs) are a generalization of increasing linked lists.
Table of contents:
Consider a binary tree whose nodes have a key field of a type (such as int, char, string, etc.) that allows comparisons. In this chapter, we assume that the keys are of type int. We shall also assume that the nodes of the tree have the following structure:
typedef struct record { int key; int contents; struct record *lft, *rght; } node;
A binary tree of this kind is a search tree if every node p has the following property: ⚠ the key of p is (1) greater than or equal to the key of every node of the left subtree of p and (2) smaller than or equal to the key of every node of the right subtree of p. In other words, if p is any node then
o->key ≤ p->key ≤ q->key
for every node o in the left subtree of p and every node q in the right subtree of p.
Here is another way to describe the defining property of a search tree: the left-root-right order of the keys is increasing.
Consider the following problem: Given a search tree r and a number k, find a node in the tree whose key is k. The following recursive function solves the problem:
// Receives a search tree r and a
// number k. Returns a node whose
// key is k; if such node does not
// exist, returns NULL.
node
*search (tree r, int k) {
if (r == NULL || r->key == k)
return r;
if (r->key > k)
return search (r->lft, k);
else
return search (r->rght, k);
}
Here is an iterative version of the same function:
while (r != NULL && r->key != k) { if (r->key > k) r = r->lft; else r = r->rght; } return r;
The animation below
(copied from
https://
In the worst case, the search consumes time proportional to the height of the tree. If the tree is balanced, the time consumption will be proportional to log n , where n is the number of nodes. This time is of the same order as that of a binary search in a sorted array.
Consider the problem of inserting a new node, with key k, into a search tree. The resulting tree must also be a search tree. First, define the new node, which will be a leaf of the resulting tree:
node *new; new = malloc (sizeof (node)); new->key = k; new->lft = new->rght = NULL;
The function that solves the problem must, of course, return the root of the new tree. The function is particularly simple and elegant when written in recursive style:
// The function receives a search tree r and
// a loose new leaf. It inserts the leaf
// into the tree so that the resulting tree
// is a search tree. The function returns
// the root of the resulting tree.
tree
insert (tree r, node *new) {
if (r == NULL) return new;
if (r->key > new->key)
r->lft = insert (r->lft, new);
else
r->rght = insert (r->rght, new);
return r;
}
The root of the resulting tree is the same as that of the original tree unless the original tree was empty.
Consider the problem of deleting a node from a search tree in such way that the tree remains a search tree. This problem is trickier than search and insertion.
We begin by dealing with the instances in which the node to be deleted, say r, is the root of the tree. If r has only one child, just make that child assume the role of root. Else, let q be the predecessor of r in the left-root-right order and let q assume the role of root.
The figure illustrates the before-and-after of the deletion of node r. The nodes are numbered in left-root-right order. The predecessor of r is q. The node q becomes the root, the children of r become the children of q, and f becomes the (right) child of p.
// Receives a nonempty search tree r. Deletes // the root of the tree and rearranges the // tree so that it remains a search tree. // Returns the new root. tree deleteroot (tree r) { node *p, *q; if (r->lft == NULL) { q = r->rght; free (r); return q; } p = r; q = r->lft; while (q->rght != NULL) { p = q; q = q->rght; } // q is the predecessor of r // p is the parent of q if (p != r) { p->rght = q->lft; q->lft = r->lft; } q->rght = r->rght; free (r); return q; }
It is easy now to delete a node other than the root of the tree. In order to delete the left child of a node x, do
x->lft = deleteroot (x->lft);
and in order to delete the right child of x, do
x->rght = deleteroot (x->rght);
How much time do the search, insertion, and deletion algorithms consume? Of course this time is bounded by (some multiple of) the number of nodes, say n, of the tree (since no node is visited more than 3 times). But this is a rather rough answer. Here is a finer answer: in the worst case, any of the algorithms
consumes an amount of time proportional to the height of the tree.
Hence, we should work with trees that are as balanced as possible, i.e., trees whose height is as close to log n as possible. In other words, we should work with trees in which all the leaves have approximately the same depth.
Unfortunately this is not easy to do. The algorithms of insertion and deletion described above do not produce balanced trees. If the insert function is repeatedly applied to a balanced tree, the result can be quite unbalanced. Something analogous can happen after a sequence of calls to the deleteroot function.
To deal with this situation,
we must invent more sophisticated and complex
insertion and deletion algorithms.
These algorithms must perform a rebalancing of the tree
after each insertion and each deletion.
I will only mention two quaint
technical terms:
there is a package of insertion and deletion algorithms that produces
red-black trees
;
there is another package that produces
AVL trees
.
typedef struct record { int key, contents; struct record *lft, *rght; } node;
If x does not belong to the tree, the function must return NULL. The time consumption of your function must be bounded by (some multiple of) the depth of x. (Give a solution that is better than the answers to the question How can we find the parent of a node in a given binary tree if it does not have a pointer for the parent? on Quora.)