How do bits and bytes represent numbers and characters?

The memory of any computer is a sequence of bytes. Each byte is a sequence of 8 bits (binary digits) and therefore has 2⁸ = 256 possible values, from 00000000 to 11111111.

A sequence of consecutive bytes in memory can be interpreted in three different ways: as a natural number, as an integer (positive, negative, or zero), or as a sequence of characters.

Natural numbers and binary notation

Every sequence of bits can be seen as a natural number in binary notation: the number is the sum of the powers of 2 that correspond to the 1 bits. For example, the sequence 1101 represents the number 2³ + 2² + 2⁰, which is equal to 13. The sequence 1111 represents 2³ + 2² + 2¹ + 2⁰, which is equal to 15.

Every sequence of s bytes (that is, 8s bits) represents a natural number in the closed interval 0 . . 2^(8s)−1.

If s = 1, for example, the interval goes from 0 to 2⁸−1, i.e., from 0 to 255. If s = 2, the interval goes up to 2¹⁶−1, i.e., 65535. If s = 4, the interval goes up to 2³²−1, i.e., 4294967295.

Example. In order to make the example fit on the page, we take s = 1 and pretend that each byte has only 4 bits. A sequence of 4 bits represents, in binary notation, a number in the interval 0 . . 2⁴−1:

byte	number

`0000`	0
`0001`	1
`0010`	2
`0011`	3
`0100`	4
`0101`	5
`0110`	6
`0111`	7
`1000`	8
`1001`	9
`1010`	10
`1011`	11
`1100`	12
`1101`	13
`1110`	14
`1111`	15

Exercises 1

Show that every natural number can be written in binary notation.
Show that 2^k + 2^(k−1) + … + 2¹ + 2⁰ = 2^(k+1)−1 for every natural number k.
Write the numbers 2⁸, 2⁸−1, 2¹⁶, 2¹⁶−1, 2³², and 2³²−1 in hexadecimal notation.

Integers and two's complement notation

Let s be a nonzero natural number. Every sequence of s bytes (that is, 8s bits) can be interpreted as an integer in the closed interval −2^(8s−1) . . 2^(8s−1)−1.

If s = 1, for example, this interval goes from −2⁷ to 2⁷−1, i.e., from −128 to 127. If s = 2, the interval goes from −2¹⁵ to 2¹⁵−1, i.e., from −32768 to 32767. If s = 4, the interval goes from −2³¹ to 2³¹−1, i.e., from −2147483648 to 2147483647.

What integer does a given sequence of 8s bits represent? Begin by interpreting the sequence as a natural number in binary notation; call this number n. If the first bit of the sequence is 0, the sequence represents the nonnegative integer n. If the first bit is 1, the sequence represents the strictly negative integer n − 2^(8s). For example, with s = 1 the byte 11111111 is the natural number 255 and therefore represents the integer 255 − 256 = −1. This way of representing integers is known as two's complement notation.

Example. For the example to fit on the page, we take s = 1 and pretend that each byte has only 4 bits. Any such sequence of 4 bits represents an integer in the interval −2³ . . 2³−1:

byte	integer

`0000`	+0
`0001`	+1
`0010`	+2
`0011`	+3
`0100`	+4
`0101`	+5
`0110`	+6
`0111`	+7
`1000`	−8
`1001`	−7
`1010`	−6
`1011`	−5
`1100`	−4
`1101`	−3
`1110`	−2
`1111`	−1

Exercises 2

Complement of n. We have shown above how two's complement notation transforms any sequence of s bytes whose first bit is 1 into a negative integer. Now consider the converse operation. Given an integer n in the interval −2^(8s−1) . . −1, show that the sequence of s bytes that represents n in two's complement notation is the same as the sequence of bytes that represents the natural number n + 2^(8s) in binary notation.
Two's complement. Consider again the converse of the two's complement interpretation. Suppose that n is an integer in the interval −2^(8s−1) . . −1. Take the sequence of 8s bits that represents the absolute value of n in binary notation (with leading 0s as needed); complement all the bits (that is, change 0s into 1s and vice versa) and add 1, in binary, to the result. Show that this operation produces the sequence of s bytes that represents n in two's complement notation.
An alternative to two's complement? Suppose, as in the example above, that we have only 4 bits to represent integers. Now consider the following interpretation of such a sequence of 4 bits. Let n be the natural number represented by the last three bits in binary notation. If the first bit is 0, the whole sequence represents the integer n; if the first bit is 1, it represents the integer −n. (For example, the sequence 1101 represents −5.) Discuss the disadvantages of representing integers in this way.
Write the numbers 2⁷, 2⁷−1, 2¹⁵, 2¹⁵−1, 2³¹, and 2³¹−1 in hexadecimal notation.

Characters and the ASCII table

A character is any typographic symbol (letter, digit, punctuation mark, and so on). Examples of characters: @, A, B, C, a, b, c, +, -, *, /, =, £, À, ñ, ó, ≤, ≠ . (Do not confuse the idea of character with the char type of the C language.)

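In this chapter, we consider only the small set of 128 characters known as the ASCII alphabet. Its ninety-five printable characters are the space, the ten digits 0 through 9, the 26 uppercase letters A through Z, the 26 lowercase letters a through z, and 32 punctuation marks and other symbols: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ { | } ~ and the grave accent (backquote).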

Every byte whose first bit is 0 represents a character in the ASCII alphabet. The correspondence between bytes and characters is defined by the ASCII table. Here is a small sample of that table:

byte	character

`00111111`	?
`01000000`	@
`01000001`	A
`01000010`	B
`01000011`	C
`01100001`	a
`01100010`	b
`01100011`	c
`01111110`	~

We use verbal shortcuts to refer to ASCII characters. For example, rather than saying the character A we can say the character 65, since the byte that corresponds to A in the ASCII table is the representation of 65 in binary notation.

Control characters. Besides the ninety-five printable characters, the ASCII alphabet contains thirty-three control characters (the bytes 00000000 through 00011111, and 01111111). These characters are not typographic symbols, so some of them are written in a special notation: a backslash followed by a digit or a letter. Here are the most commonly used control characters:

byte	character	name

`00000000`	`\0`	null character
`00001001`	`\t`	horizontal tabulation (tab)
`00001010`	`\n`	end of line (newline)
`00001011`	`\v`	vertical tabulation
`00001100`	`\f`	end of page (new page)
`00001101`	`\r`	carriage return

The character \0 is used to mark the end of a string and takes no space when displayed; the character \n signals the end of a line of text and produces a jump to a new line when displayed; the character \f signals the end of a page; and so on. Though the space (character 32) is not a control character, it can be indicated by \ (backslash followed by a space).

The characters \ , \t, \n, \v, \f, and \r are collectively known as white-space characters. Many functions of the C standard library, such as scanf, treat all the white-space characters as if they were spaces.

Non-ASCII characters. If you only use English, the ASCII alphabet is likely all you need. However, you should be aware that the ASCII alphabet lacks many letters from other languages, for example letters with diacritics such as À, ñ, ó, etc., and special symbols such as £, ≤, ≠, etc. Each of these characters is represented by two or more consecutive bytes in a coding scheme known as UTF-8. More about this will be said in chapter Strings and character chains and in chapter Unicode and UTF-8.

Exercises 3

What bytes represent the characters O, o, 0, and \0?
Write the bytes 01000001, 01000010, and 01000011 in hexadecimal notation.
Write, in decimal notation, the sequence of bytes that represents the text `A byte has 8 bits.`
Consider the bytes that represent the natural numbers 39 65 39 32 105 115 54 53 10 39 97 39 32 105 115 57 55 in binary notation. What is the sequence of characters represented by this sequence of bytes?
The epigraph at the top of this page is a sequence of ASCII characters. Decode the epigraph.
Byte inspection. Study the documentation of the programs od and hexdump (the names are abbreviations of octal dump and hexadecimal dump). These utilities display the contents of any file byte by byte.

Questions and answers

Question: The ASCII alphabet uses only the bytes whose first bit is 0. Why not use the bytes whose first bit is 1 to represent letters with diacritics and some special symbols?
Answer: The ISO-LATIN-1 table (also known as ISO-8859-1) does exactly this. But that table is not much used nowadays.