2 \defgroup module_ansi-c ANSI-C Language Front-end
6 \section preprocessing Preprocessing & Parsing
8 In the \ref ansi-c and \ref java_bytecode directories
11 * \ref languaget and its subclasses
18 1 [shape=none, label=""];
19 2 [label="preprocessing & parsing"];
20 3 [shape=none, label=""];
21 1 -> 2 [label="Command line options, file names"];
22 2 -> 3 [label="Parse tree"];
29 \section type-checking Type-checking
31 In the \ref ansi-c and \ref java_bytecode directories.
34 * \ref languaget and its subclasses
44 1 [shape=none, label=""];
45 2 [label="type checking"];
46 3 [shape=none, label=""];
47 1 -> 2 [label="Parse tree"];
48 2 -> 3 [label="Symbol table"];
52 This stage generates a symbol table, mapping identifiers to symbols;
53 \ref symbolt "symbols" are tuples of (value, type, location, flags).
55 This is a good point to introduce the \ref irept ("internal
56 representation") class---the base type of many of CBMC's hierarchical
57 data structures. In particular, \ref exprt "expressions",
58 \ref typet "types" and \ref codet "statements" are all subtypes of
60 An irep is a tree of ireps. A subtlety is that an irep is actually the
61 root of _three_ (possibly empty) trees, i.e. it has three disjoint sets
62 of children: \ref irept::get_sub() returns a list of children, and
63 \ref irept::get_named_sub() and \ref irept::get_comments() each return an
64 association from names to children. **Most clients never use these
65 functions directly**, as subtypes of irept generally provide more
66 descriptive functions. For example, the operands of an
67 \ref exprt "expression" (\ref exprt::op0() "op0", op1 etc) are
68 really that expression's children; the
69 \ref code_assignt::lhs() "left-hand" and right-hand side of an
70 \ref code_assignt "assignment" are the children of that assignment.
71 The \ref irept::pretty() function provides a descriptive string
72 representation of an irep.
74 \ref irep_idt "irep_idts" ("identifiers") are strings that use sharing
75 to improve memory consumption. A common pattern is a map from irep_idts
76 to ireps. A goto-program contains a single symbol table (with a single
77 scope), meaning that the names of identifiers in the target program are
78 lightly mangled in order to make them globally unique. If there is an
79 identifier `foo` in the target program, the `name` field of `foo`'s
80 \ref symbolt "symbol" in the goto-program will be
81 * `foo` if it is global;
82 * <code>bar\::foo</code> if it is a parameter to a function `bar()`;
83 * <code>bar\::3\::foo</code> if it is a local variable in a function
84 `bar()`, where `3` is a counter that is incremented every time a
85 newly-scoped `foo` is encountered in that function.
87 The use of *sharing* to save memory is a pervasive design decision in
88 the implementation of ireps and identifiers. Sharing makes equality
89 comparisons fast (as there is no need to traverse entire trees), and
90 this is especially important given the large number of map lookups
91 throughout the codebase. More importantly, the use of sharing saves vast
92 amounts of memory, as there is plenty of duplication within the
93 goto-program data structures. For example, every statement, and every
94 sub-expression of a statement, contains a \ref source_locationt
95 that indicates the source file and location that it came from. Every
96 symbol in every expression has a field indicating its type and location;
97 etc. Although each of these are constructed as separate objects, the
98 values that they eventually point to are shared throughout the codebase,
99 decreasing memory consumption dramatically.
101 The Type Checking stage turns a parse tree into a
102 \ref symbol_tablet "symbol table". In this context, the 'symbols'
103 consist of code statements as well as what might more traditionally be
104 called symbols. Thus, for example:
105 * The statement `int foo = 11;` is converted into a symbol whose type is
106 integer_typet and value is the \ref constant_exprt
107 "constant expression" `11`; that symbol is stored in the symbol table
108 using the mangled name of `foo` as the key;
109 * The function definition `void foo(){ int x = 11; bar(); }` is
110 converted into a symbol whose type is \ref code_typet (not to be
111 confused with \ref typet or \ref codet!); the code_typet contains the
112 parameter and return types of the function. The value of the symbol is
113 the function's body (a \ref codet), and the symbol is stored in the
114 symbol table with `foo` as the key.