Binary search trees

Just as binary trees are a generalization of linked lists, binary search trees (or BSTs) are a generalization of increasing linked lists.

Table of contents:

Definition
Search
Insertion
Deletion
Performance of the algorithms

Definition

Consider a binary tree whose nodes have a key field of a type (such as int, char, string, etc.) that allows comparisons. In this chapter, we assume that the keys are of type int. We shall also assume that the nodes of the tree have the following structure:

typedef struct record {
   int            key;
   int            contents;
   struct record *lft, *rght; 
} node;

A binary tree of this kind is a search tree if every node p has the following property: ⚠ the key of p is (1) greater than or equal to the key of every node of the left subtree of p and (2) smaller than or equal to the key of every node of the right subtree of p. In other words, if p is any node then

o->key ≤ p->key ≤ q->key

for every node o in the left subtree of p and every node q in the right subtree of p.

[Binary_search_tree.svg.png]

Here is another way to describe the defining property of a search tree: the left-root-right order of the keys is increasing.

Exercises 1

★ Suppose that x->lft->key ≤ x->key for every node x that has a left child and x->key ≤ x->rght->key for every node x that has a right child. Is this a search tree?
★ Important! Write a function to decide whether a given binary tree is a search tree.

Search

Consider the following problem: Given a search tree r and a number k, find a node in the tree whose key is k. The following recursive function solves the problem:

// Receives a search tree r and a
// number k. Returns a node whose
// key is k; if such node does not
// exist, returns NULL.

node 
*search (tree r, int k) {
    if (r == NULL || r->key == k)
       return r;
    if (r->key > k)
       return search (r->lft, k);
    else
       return search (r->rght, k);
}

Here is an iterative version of the same function:

    while (r != NULL && r->key != k) {
       if (r->key > k) 
          r = r->lft;
       else
          r = r->rght;
    }
    return r;

The animation below (copied from https://plus.google.com/+Geekboots/posts/FU9LXq9tKpE) shows the execution of search (r, 27):

In the worst case, the search consumes time proportional to the height of the tree. If the tree is balanced, the time consumption will be proportional to log n , where n is the number of nodes. This time is of the same order as that of a binary search in a sorted array.

Exercises 2

Write a function min to find a smallest key in a search tree. Write a function max to find a largest key.
Suppose that the keys of our search tree are pairwise distinct. Write a function that will receive a number k and return the first key larger than k in the increasing order of the keys.
Write a function to transform an increasing array into a balanced binary search tree.
Write a function to transform a binary search tree into an increasing array.
There is a very intimate relation between search trees and the binary search algorithm for sorted arrays. What is this relation, exactly?

Insertion

Consider the problem of inserting a new node, with key k, into a search tree. The resulting tree must also be a search tree. First, define the new node, which will be a leaf of the resulting tree:

    node *new;
    new = malloc (sizeof (node));
    new->key = k;
    new->lft = new->rght = NULL;

The function that solves the problem must, of course, return the root of the new tree. The function is particularly simple and elegant when written in recursive style:

// The function receives a search tree r and
// a loose new leaf. It inserts the leaf
// into the tree so that the resulting tree
// is a search tree. The function returns
// the root of the resulting tree.

tree 
insert (tree r, node *new) {  
    if (r == NULL) return new;
    if (r->key > new->key) 
       r->lft = insert (r->lft, new);
    else 
       r->rght = insert (r->rght, new);
    return r;
}

The root of the resulting tree is the same as that of the original tree unless the original tree was empty.

Exercises 3

★ Write a nonrecursive version of insert.
Suppose that the keys 50 30 70 20 40 60 80 15 25 35 45 36 are inserted, in this order, into an initially empty search tree. Draw a picture of the resulting tree.
Suppose that the nodes of a binary search tree have a parent field. Write a function to insert a new node into the tree. Write a recursive and an iterative versions.

Deletion

Consider the problem of deleting a node from a search tree in such way that the tree remains a search tree. This problem is trickier than search and insertion.

We begin by dealing with the instances in which the node to be deleted, say r, is the root of the tree. If r has only one child, just make that child assume the role of root. Else, let q be the predecessor of r in the left-root-right order and let q assume the role of root.

$[remoção: antes (p.xxx)]$

$[after (p.xxx)]$

The figure illustrates the before-and-after of the deletion of node r. The nodes are numbered in left-root-right order. The predecessor of r is q. The node q becomes the root, the children of r become the children of q, and f becomes the (right) child of p.

// Receives a nonempty search tree r. Deletes
// the root of the tree and rearranges the
// tree so that it remains a search tree.
// Returns the new root.

tree 
deleteroot (tree r) {  
    node *p, *q;
    if (r->lft == NULL) {
       q = r->rght;
       free (r);
       return q;
    }
    p = r; q = r->lft;
    while (q->rght != NULL) {
       p = q; q = q->rght;
    }
    // q is the predecessor of r
    // p is the parent of q
    if (p != r) {
       p->rght = q->lft;
       q->lft = r->lft;
    }
    q->rght = r->rght;
    free (r);
    return q;
}

It is easy now to delete a node other than the root of the tree. In order to delete the left child of a node x, do

   x->lft = deleteroot (x->lft);

and in order to delete the right child of x, do

   x->rght = deleteroot (x->rght);

Exercises 4

Check the correctness of the deleteroot function.
Suppose that the keys 50 30 70 20 40 60 80 15 25 35 45 36 are inserted, in this order, into an initially empty search tree. Draw a picture of the resulting tree. Next, delete the node that contains 30.
Write a recursive version of the function that deletes a node from a search tree.
Suppose that the nodes of a search tree have a parent field. Write a function that will delete a node of the tree. Write a recursive and an iterative versions.

Performance of the algorithms

How much time do the search, insertion, and deletion algorithms consume? Of course this time is bounded by (some multiple of) the number of nodes, say n, of the tree (since no node is visited more than 3 times). But this is a rather rough answer. Here is a finer answer: in the worst case, any of the algorithms

consumes an amount of time proportional to the height of the tree.

Hence, we should work with trees that are as balanced as possible, i.e., trees whose height is as close to log n as possible. In other words, we should work with trees in which all the leaves have approximately the same depth.

Unfortunately this is not easy to do. The algorithms of insertion and deletion described above do not produce balanced trees. If the insert function is repeatedly applied to a balanced tree, the result can be quite unbalanced. Something analogous can happen after a sequence of calls to the deleteroot function.

To deal with this situation, we must invent more sophisticated and complex insertion and deletion algorithms. These algorithms must perform a rebalancing of the tree after each insertion and each deletion. I will only mention two quaint technical terms: there is a package of insertion and deletion algorithms that produces red-black trees; there is another package that produces AVL trees.

Exercises 5

Consider search trees whose nodes have the structure below. Write a function that receives the root of such a tree and a node x and returns the parent of x.
```
typedef struct record {
   int            key, contents;
   struct record *lft, *rght; 
} node;
```
If x does not belong to the tree, the function must return NULL. The time consumption of your function must be bounded by (some multiple of) the depth of x. (Give a solution that is better than the answers to the question How can we find the parent of a node in a given binary tree if it does not have a pointer for the parent? on Quora.)