cprover
ansi-c/module.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \defgroup module_ansi-c ANSI-C Language Front-end
3 
4 \author Kareem Khazem
5 
6 \section preprocessing Preprocessing & Parsing
7 
8 In the \ref ansi-c and \ref java_bytecode directories
9 
10 **Key classes:**
11 * \ref languaget and its subclasses
12 * ansi_c_parse_treet
13 
14 \dot
15 digraph G {
16  node [shape=box];
17  rankdir="LR";
18  1 [shape=none, label=""];
19  2 [label="preprocessing & parsing"];
20  3 [shape=none, label=""];
21  1 -> 2 [label="Command line options, file names"];
22  2 -> 3 [label="Parse tree"];
23 }
24 \enddot
25 
26 
27 
28 ---
29 \section type-checking Type-checking
30 
31 In the \ref ansi-c and \ref java_bytecode directories.
32 
33 **Key classes:**
34 * \ref languaget and its subclasses
35 * \ref irept
36 * \ref irep_idt
37 * \ref symbolt
38 * symbol_tablet
39 
40 \dot
41 digraph G {
42  node [shape=box];
43  rankdir="LR";
44  1 [shape=none, label=""];
45  2 [label="type checking"];
46  3 [shape=none, label=""];
47  1 -> 2 [label="Parse tree"];
48  2 -> 3 [label="Symbol table"];
49 }
50 \enddot
51 
52 This stage generates a symbol table, mapping identifiers to symbols;
53 \ref symbolt "symbols" are tuples of (value, type, location, flags).
54 
55 This is a good point to introduce the \ref irept ("internal
56 representation") class---the base type of many of CBMC's hierarchical
57 data structures. In particular, \ref exprt "expressions",
58 \ref typet "types" and \ref codet "statements" are all subtypes of
59 \ref irept.
60 An irep is a tree of ireps. A subtlety is that an irep is actually the
61 root of _three_ (possibly empty) trees, i.e. it has three disjoint sets
62 of children: \ref irept::get_sub() returns a list of children, and
63 \ref irept::get_named_sub() and \ref irept::get_comments() each return an
64 association from names to children. **Most clients never use these
65 functions directly**, as subtypes of irept generally provide more
66 descriptive functions. For example, the operands of an
67 \ref exprt "expression" (\ref exprt::op0() "op0", op1 etc) are
68 really that expression's children; the
69 \ref code_assignt::lhs() "left-hand" and right-hand side of an
70 \ref code_assignt "assignment" are the children of that assignment.
71 The \ref irept::pretty() function provides a descriptive string
72 representation of an irep.
73 
74 \ref irep_idt "irep_idts" ("identifiers") are strings that use sharing
75 to improve memory consumption. A common pattern is a map from irep_idts
76 to ireps. A goto-program contains a single symbol table (with a single
77 scope), meaning that the names of identifiers in the target program are
78 lightly mangled in order to make them globally unique. If there is an
79 identifier `foo` in the target program, the `name` field of `foo`'s
80 \ref symbolt "symbol" in the goto-program will be
81 * `foo` if it is global;
82 * <code>bar\::foo</code> if it is a parameter to a function `bar()`;
83 * <code>bar\::3\::foo</code> if it is a local variable in a function
84  `bar()`, where `3` is a counter that is incremented every time a
85  newly-scoped `foo` is encountered in that function.
86 
87 The use of *sharing* to save memory is a pervasive design decision in
88 the implementation of ireps and identifiers. Sharing makes equality
89 comparisons fast (as there is no need to traverse entire trees), and
90 this is especially important given the large number of map lookups
91 throughout the codebase. More importantly, the use of sharing saves vast
92 amounts of memory, as there is plenty of duplication within the
93 goto-program data structures. For example, every statement, and every
94 sub-expression of a statement, contains a \ref source_locationt
95 that indicates the source file and location that it came from. Every
96 symbol in every expression has a field indicating its type and location;
97 etc. Although each of these are constructed as separate objects, the
98 values that they eventually point to are shared throughout the codebase,
99 decreasing memory consumption dramatically.
100 
101 The Type Checking stage turns a parse tree into a
102 \ref symbol_tablet "symbol table". In this context, the 'symbols'
103 consist of code statements as well as what might more traditionally be
104 called symbols. Thus, for example:
105 * The statement `int foo = 11;` is converted into a symbol whose type is
106  integer_typet and value is the \ref constant_exprt
107  "constant expression" `11`; that symbol is stored in the symbol table
108  using the mangled name of `foo` as the key;
109 * The function definition `void foo(){ int x = 11; bar(); }` is
110  converted into a symbol whose type is \ref code_typet (not to be
111  confused with \ref typet or \ref codet!); the code_typet contains the
112  parameter and return types of the function. The value of the symbol is
113  the function's body (a \ref codet), and the symbol is stored in the
114  symbol table with `foo` as the key.