01000010 01111001 01110100 01100101 01110011 00101100 00100000 01101110 01110101 01101101 01100010 01100101 01110010 01110011 00101100 00100000 01100011 01101000 01100001 01110010 01100001 01100011 01110100 01100101 01110010 01110011 00001010
The memory of any computer is a sequence of bytes. Each byte is a sequence of 8 bits (binary digits) and therefore has $2^8 = 256$ possible values:
00000000 00000001 00000010 ⋮ 11111110 11111111
A sequence of consecutive bytes in memory can be interpreted in three different ways: as a natural number, as an integer (negative, zero, or positive), or as a sequence of characters.
This chapter discusses these three interpretations.
Every sequence of bits can be seen as a natural number in binary notation: the number is the sum of the powers of 2 that correspond to the 1 bits. For example, the sequence 1101 represents the number $2^3 + 2^2 + 2^0$, which is equal to 13. The sequence 1111 represents $2^3 + 2^2 + 2^1 + 2^0$, which is equal to 15.
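As an illustration, the following sketch computes the natural number represented by a sequence of bits given as a string of '0' and '1' characters. The function name `bits_to_natural` is hypothetical, and the code assumes sequences of at most 32 bits. Instead of adding the powers of 2 one by one, it doubles the running value and adds each new bit, which produces the same sum.

```c
#include <assert.h>
#include <stddef.h>

/* Interprets a string of '0' and '1' characters as a natural number
   in binary notation. Hypothetical helper; assumes at most 32 bits,
   which always fit in an unsigned long. */
unsigned long bits_to_natural(const char *bits) {
    unsigned long n = 0;
    for (size_t i = 0; bits[i] != '\0'; ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        /* doubling n shifts the bits read so far one position to the
           left; the new bit then occupies the position of 2^0 */
        n = 2 * n + (unsigned long)(bits[i] - '0');
    }
    return n;
}
```

With this sketch, `bits_to_natural("1101")` returns 13 and `bits_to_natural("1111")` returns 15.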
Every sequence of s bytes (that is, 8s bits) represents a natural number in the closed interval $0 \,..\, 2^{8s}-1$.
If s = 1, for example, the interval goes from 0 to $2^8-1$, i.e., from 0 to 255. If s = 2, the interval goes up to $2^{16}-1$, i.e., 65535. If s = 4, the interval goes up to $2^{32}-1$, i.e., 4294967295.
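The upper ends of these intervals can be checked with a few lines of C. This is only a sketch: the helper name `max_natural` is an assumption, and the number of bytes is restricted to the range 1 to 8 so that the result fits in a 64-bit unsigned type.

```c
#include <assert.h>
#include <stdint.h>

/* Largest natural number representable in s bytes, i.e. 2^(8s) - 1.
   Hypothetical helper; restricted to 1 <= s <= 8. */
uint64_t max_natural(unsigned s) {
    assert(s >= 1 && s <= 8);
    if (s == 8)
        return UINT64_MAX;                 /* shifting by 64 bits would be undefined */
    return (UINT64_C(1) << (8 * s)) - 1;   /* 2^(8s) - 1 */
}
```

For s equal to 1, 2, and 4 the function returns 255, 65535, and 4294967295 respectively.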
Example. In order to make the example fit on the page, we take s = 1 and pretend that each byte has only 4 bits. A sequence of 4 bits represents, in binary notation, a number in the interval $0 \,..\, 2^4-1$:
byte | number |
---|---|
0000 | 0 |
0001 | 1 |
0010 | 2 |
0011 | 3 |
0100 | 4 |
0101 | 5 |
0110 | 6 |
0111 | 7 |
1000 | 8 |
1001 | 9 |
1010 | 10 |
1011 | 11 |
1100 | 12 |
1101 | 13 |
1110 | 14 |
1111 | 15 |
Let s be a nonzero natural number. Every sequence of s bytes (that is, 8s bits) can be interpreted as an integer in the closed interval
$-2^{8s-1} \,..\, 2^{8s-1}-1$.
If s = 1, for example, this interval goes from $-2^7$ to $2^7-1$, i.e., from −128 to 127. If s = 2, the interval goes from $-2^{15}$ to $2^{15}-1$, i.e., from −32768 to 32767. If s = 4, the interval goes from $-2^{31}$ to $2^{31}-1$, i.e., from −2147483648 to 2147483647.
What integer does a given sequence of 8s bits represent? Begin by interpreting the sequence as a natural number in binary notation. Let's say that this number is n. If the first bit of the sequence is 0, the sequence represents the nonnegative integer n. If the first bit is 1, the sequence represents the strictly negative integer $n - 2^{8s}$. This way of representing integers is known as two's complement notation.
Example. For the example to fit on the page, we take s = 1 and pretend that each byte has only 4 bits. Any such sequence of bits represents an integer in the interval $-2^3 \,..\, 2^3-1$:
byte | integer |
---|---|
0000 | +0 |
0001 | +1 |
0010 | +2 |
0011 | +3 |
0100 | +4 |
0101 | +5 |
0110 | +6 |
0111 | +7 |
1000 | −8 |
1001 | −7 |
1010 | −6 |
1011 | −5 |
1100 | −4 |
1101 | −3 |
1110 | −2 |
1111 | −1 |
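The two's complement rule can be sketched in C as follows. The function name `twos_complement_value` and its parameter `nbits` (the number of bits in the sequence) are assumptions made for this illustration; the code handles sequences of 1 to 32 bits and does the arithmetic in 64-bit types so that no intermediate value overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Interprets the nbits low-order bits of pattern as an integer in
   two's complement notation. Hypothetical helper; 1 <= nbits <= 32. */
int64_t twos_complement_value(uint32_t pattern, unsigned nbits) {
    assert(nbits >= 1 && nbits <= 32);
    uint64_t modulus = UINT64_C(1) << nbits;          /* 2^nbits */
    uint64_t n = (uint64_t)pattern & (modulus - 1);   /* natural-number reading */
    int first_bit = (int)((n >> (nbits - 1)) & 1);
    /* first bit 0: the value is n; first bit 1: the value is n - 2^nbits */
    return first_bit ? (int64_t)n - (int64_t)modulus : (int64_t)n;
}
```

With nbits equal to 4, the pattern 1000 (that is, 0x8) yields −8 and the pattern 1111 (0xF) yields −1, in agreement with the table above. With nbits equal to 8, the pattern 0x80 yields −128.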
A character is any typographic symbol (letter, digit, punctuation mark, and so on). Examples of characters: @, A, B, C, a, b, c, +, -, *, /, =, £, À, ñ, ó, ≤, ≠ . (Do not confuse the idea of character with the char type of the C language.)
In this chapter, we consider only the small set of 128 characters known as the ASCII alphabet. This set includes the characters
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
(the first character is a blank space) and a few others.
Every byte whose first bit is 0 represents a character in the ASCII alphabet. The correspondence between bytes and characters is defined by the ASCII table. Here is a small sample of that table:
byte | character |
---|---|
00111111 | ? |
01000000 | @ |
01000001 | A |
01000010 | B |
01000011 | C |
01100001 | a |
01100010 | b |
01100011 | c |
01111110 | ~ |
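The rows of this table can be reproduced with a short C sketch. Both function names below are hypothetical. The first writes the 8 bits of a byte as a string; the second tests whether the first bit of a byte is 0, which is the condition for the byte to represent an ASCII character.

```c
#include <stdbool.h>

/* Writes the 8 bits of byte, most significant bit first, into out
   as a string of '0' and '1' terminated by \0. */
void byte_to_bits(unsigned char byte, char out[9]) {
    for (int i = 0; i < 8; ++i)
        out[i] = ((byte >> (7 - i)) & 1) ? '1' : '0';
    out[8] = '\0';
}

/* True if the first (most significant) bit of byte is 0. */
bool is_ascii_byte(unsigned char byte) {
    return (byte & 0x80) == 0;
}
```

For instance, `byte_to_bits(65, buf)` fills `buf` with the string 01000001, the byte that the table associates with A.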
We use verbal shortcuts to refer to ASCII characters. For example, rather than saying "the character A" we can say "the character 65", since the byte that corresponds to A in the ASCII table is the representation of 65 in binary notation.
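In C this shortcut is more than a manner of speaking, because a character constant is just a small number. The sketch below assumes a system whose character table is ASCII (true of virtually all current systems, but not guaranteed by the C standard); the function names are hypothetical.

```c
/* Number that corresponds to character c in the character table.
   Assumption: the table is ASCII, so char_code('A') is 65. */
int char_code(char c) {
    return (unsigned char)c;
}

/* Character that corresponds to the number code
   (meaningful for 0 <= code <= 127 under the ASCII assumption). */
char code_char(int code) {
    return (char)code;
}
```

Under that assumption, `char_code('A')` returns 65 and `code_char(65)` returns the character A.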
Control characters. Besides the ninety-five normal characters, the ASCII alphabet contains thirty-three control characters. These characters are not typographic symbols like the others and therefore are indicated by a special notation: a backslash followed by a digit or a letter. Here are the most used control characters:
byte | character | name |
---|---|---|
00000000 | \0 | null character |
00001001 | \t | horizontal tabulation (tab) |
00001010 | \n | end of line (newline) |
00001011 | \v | vertical tabulation |
00001100 | \f | end of page (new page) |
00001101 | \r | carriage return |
The character \0 is used to mark the end of a string and takes no space when displayed; the character \n signals the end of a line of text and produces a jump to a new line when displayed; the character \f signals the end of a page; and so on. Though the space (character 32) is not a control character, it can be indicated by \ (backslash followed by a space).
The characters \ , \t, \n, \v, \f, and \r are collectively known as white-spaces. Many functions of the standard libraries treat all the white-space characters as if they were spaces.
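One of the standard library functions that deals with white-space is `isspace`, declared in `<ctype.h>`. In the default "C" locale it is true exactly for the six characters listed above. The following sketch uses it to count the white-space characters of a string; the name `count_white_space` is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the white-space characters of the string s. */
size_t count_white_space(const char *s) {
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; ++i) {
        /* the cast is needed because isspace is undefined
           for negative arguments other than EOF */
        if (isspace((unsigned char)s[i]))
            ++count;
    }
    return count;
}
```

For example, `count_white_space("a b\tc\n")` returns 3: one space, one tab, and one newline.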
Non-ASCII characters. If you only use English, the ASCII alphabet is likely all you need. However, you should be aware that the ASCII alphabet lacks many letters from other languages, for example letters with diacritics such as À, ñ, ó, etc., and special symbols such as £, ≤, ≠, etc. Each of these characters is represented by two or more consecutive bytes in a coding scheme known as UTF-8. More about this will be said in chapter Strings and character chains and in chapter Unicode and UTF-8.
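To see the effect of this in a program, recall that a byte represents an ASCII character only if its first bit is 0. In UTF-8, the character ñ is the pair of bytes 11000011 10110001 (C3 B1 in hexadecimal), and both have first bit 1. The sketch below counts the bytes of a string that are not ASCII codes; the function name is hypothetical.

```c
#include <stddef.h>

/* Counts the bytes of the string s whose first bit is 1,
   that is, the bytes that are not ASCII codes. */
size_t count_non_ascii_bytes(const char *s) {
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; ++i) {
        if (((unsigned char)s[i] & 0x80) != 0)
            ++count;
    }
    return count;
}
```

Writing ñ as the explicit bytes `"\xC3\xB1"` keeps the example independent of the encoding of the source file: `count_non_ascii_bytes("\xC3\xB1")` returns 2, while `count_non_ascii_bytes("abc")` returns 0.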
A byte has 8 bits..
Answer: The ISO-LATIN-1 table (also known as ISO-8859-1) does exactly this. But that table is not much used nowadays.