1.1 Compilation Steps
A C++ source file undergoes many transformations on its way to becoming
an executable program. The initial steps process all the #include and
conditional preprocessing directives to produce what the standard calls
a translation unit. Translation units are important because they have
no dependencies on other files: every included file has already been
merged in. Nonetheless, programmers still speak in terms of source
files even when they mean translation units, so this book uses the
phrase source file because it is familiar to most readers. The term
"translation" encompasses compilation and interpretation, although most
C++ translators are compilers. This section discusses how C++ reads and
compiles (translates) source files (translation units).
A C++ program can be made from many source files, and each file can
be compiled separately. Conceptually, the compilation process has
several steps (although a compiler can merge or otherwise modify
steps if it can do so without affecting the observable results):
1. Read physical characters from the source file and translate the
characters to the source character set (described in Section 1.4 later
in this chapter). The source "file" is not necessarily a physical file;
an implementation might, for example, retrieve the source from a
database. Trigraph sequences are replaced by their equivalent
characters (see Section 1.6 later in this chapter); note that C++17
removed trigraphs from the language. Each native end-of-line character
or character sequence is replaced by a newline character.
2. If a backslash character is followed immediately by a newline
character, delete both the backslash and the newline. The
backslash/newline combination must not fall in the middle of a
universal character name (e.g., \u1234) and must not be at the end of a
file. It can be used in a character or string literal, or to continue a
preprocessor directive or a one-line comment across multiple lines. A
non-empty file must end with a newline. (Since C++11, a file that breaks
either of the last two rules is processed as if a final newline were
appended.)
3. Partition the source into preprocessor tokens separated by whitespace
and comments. A preprocessor token is slightly different from a
compiler token (see the next section, Section 1.2). A preprocessor token
can be a header name, identifier, number, character literal, string
literal, symbol, or miscellaneous character. Each preprocessor token is
the longest sequence of characters that can make up a legal token,
regardless of what comes after the token.
4. Perform preprocessing and expand macros. Each #include file is
processed, recursively, in the manner described in steps 1-4. For more
information about preprocessing, see Chapter 11.
5. Convert character and string literals to the execution character set.
6. Concatenate adjacent string literals. Narrow string literals are
concatenated with narrow string literals, and wide string literals with
wide string literals. In C++98 and C++03, placing a narrow string
literal next to a wide one results in undefined behavior; since C++11,
an unprefixed literal adjacent to a wide literal is treated as wide.
7. Perform the main compilation: convert each preprocessor token into a
compiler token, then analyze and translate the resulting tokens as a
translation unit.
8. Combine compiled files. For each file, all required template
instantiations (see Chapter 7) are identified, and the necessary
template definitions are located and compiled.
9. Resolve external references. The compiled files are linked to
produce an executable image.