2 \defgroup goto-programs goto-programs
6 \author Kareem Khazem, Martin Brain
8 \section overview Overview
9 Goto programs are the intermediate representation of the CPROVER tool
10 chain. They are language independent and similar to many of the compiler
11 intermediate languages. Section \ref goto-programs describes the
12 `goto_programt` and `goto_functionst` data structures in detail. However
13 it useful to understand some of the basic concepts. Each function is a
14 list of instructions, each of which has a type (one of 18 kinds of
15 instruction), a code expression, a guard expression and potentially some
16 targets for the next instruction. They are not natively in static
17 single-assign (SSA) form. Transitions are nondeterministic (although in
18 practise the guards on the transitions normally cover form a disjoint
19 cover of all possibilities). Local variables have non-deterministic
20 values if they are not initialised. Variables and data within the
21 program is commonly one of three types (parameterised by width):
22 `unsignedbv_typet`, `signedbv_typet` and `floatbv_typet`, see
23 `util/std_types.h` for more information. Goto programs can be serialised
24 in a binary (wrapped in ELF headers) format or in XML (see the various
25 `_serialization` files).
27 The `cbmc` option `–show-goto-programs` is often a good starting point
28 as it outputs goto-programs in a human readable form. However there are
29 a few things to be aware of. Functions have an internal name (for
30 example `c::f00`) and a ‘pretty name’ (for example `f00`) and which is
31 used depends on whether it is internal or being presented to the user.
32 The `main` method is the ‘logical’ main which is not necessarily the
33 main method from the code. In the output `NONDET` is use to represent a
34 nondeterministic assignment to a variable. Likewise `IF` as a beautified
35 `GOTO` instruction where the guard expression is used as the condition.
36 `RETURN` instructions may be dropped if they precede an `END_FUNCTION`
37 instruction. The comment lines are generated from the `locationt` field
38 of the `instructiont` structure.
40 `goto-programs/` is one of the few places in the CPROVER codebase that
41 templates are used. The intention is to allow the general architecture
42 of program and functions to be used for other formalisms. At the moment
43 most of the templates have a single instantiation; for example
44 `goto_functionst` and `goto_function_templatet` and `goto_programt` and
45 `goto_program_templatet`.
47 \section data_structures Data Structures
49 FIXME: This text is partially outdated.
51 The common starting point for working with goto-programs is the
52 `read_goto_binary` function which populates an object of
53 `goto_functionst` type. This is defined in `goto_functions.h` and is an
54 instantiation of the template `goto_functions_templatet` which is
55 contained in `goto_functions_template.h`. They are wrappers around a map
56 from strings to `goto_programt`’s and iteration macros are provided.
57 Note that `goto_function_templatet` (no `s`) is defined in the same
58 header as `goto_functions_templatet` and is gives the C type for the
59 function and Boolean which indicates whether the body is available
60 (before linking this might not always be true). Also note the slightly
61 counter-intuitive naming; `goto_functionst` instances are the top level
62 structure representing the program and contain `goto_programt` instances
63 which represent the individual functions. At the time of writing
64 `goto_functionst` is the only instantiation of the template
65 `goto_functions_templatet` but other could be produced if a different
66 data-structures / kinds of models were needed for functions.
68 `goto_programt` is also an instantiation of a template. In a similar
69 fashion it is `goto_program_templatet` and allows the types of the guard
70 and expression used in instructions to be parameterised. Again, this is
71 currently the only use of the template. As such there are only really
72 helper functions in `goto_program.h` and thus `goto_program_template.h`
73 is probably the key file that describes the representation of (C)
74 functions in the goto-program format. It is reasonably stable and
75 reasonably documented and thus is a good place to start looking at the
78 An instance of `goto_program_templatet` is effectively a list of
79 instructions (and inner template called `instructiont`). It is important
80 to use the copy and insertion functions that are provided as iterators
81 are used to link instructions to their predecessors and targets and
82 careless manipulation of the list could break these. Likewise there are
83 helper macros for iterating over the instructions in an instance of
84 `goto_program_templatet` and the use of these is good style and strongly
87 Individual instructions are instances of type `instructiont`. They
88 represent one step in the function. Each has a type, an instance of
89 `goto_program_instruction_typet` which denotes what kind of instruction
90 it is. They can be computational (such as `ASSIGN` or `FUNCTION_CALL`),
91 logical (such as `ASSUME` and `ASSERT`) or informational (such as
92 `LOCATION` and `DEAD`). At the time of writing there are 18 possible
93 values for `goto_program_instruction_typet` / kinds of instruction.
94 Instructions also have a guard field (the condition under which it is
95 executed) and a code field (what the instruction does). These may be
96 empty depending on the kind of instruction. In the default
97 instantiations these are of type `exprt` and `codet` respectively and
98 thus covered by the previous discussion of `irept` and its descendents.
99 The next instructions (remembering that transitions are guarded by
100 non-deterministic) are given by the list `targets` (with the
101 corresponding list of labels `labels`) and the corresponding set of
102 previous instructions is get by `incoming_edges`. Finally `instructiont`
103 have informational `function` and `location` fields that indicate where
104 they are in the code.
106 \section goto-conversion Goto Conversion
108 In the \ref goto-programs directory.
113 * \ref goto_programt::instructiont
119 1 [shape=none, label=""];
120 2 [label="goto conversion"];
121 3 [shape=none, label=""];
122 1 -> 2 [label="Symbol table"];
123 2 -> 3 [label="goto-programs, goto-functions, symbol table"];
127 At this stage, CBMC constructs a goto-program from a symbol table. It
128 does not use the parse tree or the source file at all for this step. This
129 may seem surprising, because the symbols are stored in a hash table and
130 therefore have no intrinsic order; nevertheless, every \ref symbolt is
131 associated with a \ref source_locationt, allowing CBMC to figure out the
134 The structure of what is informally called a goto-program follows. The
135 entire target program is converted to a single \ref goto_functionst
136 object. The goto functions contains a set of \ref goto_programt objects;
137 each of these correspond to a "function" or "method" in the target
138 program. Each goto_program contains a list of
139 \ref goto_programt::instructiont "instructions"; each
140 instruction contains an associated statement---these are subtypes of
141 \ref codet. Each instruction also contains a "target", which will be
146 graph [nojustify=true];
150 subgraph cluster_src {
151 1 [shape="none", label="source files"];
152 2 [label="file1.c\n-----------\nint main(){
158 1 -> 2 [color=white];
160 100 [label="file2.c\n--------\nchar bar(){ }"];
161 2 -> 100 [color=white];
164 1 -> 3 [label="corresponds to", lhead=cluster_goto,
167 subgraph cluster_goto {
168 3 [label="a\ngoto_functionst", URL="\ref goto_functionst", shape=none];
169 4 [label="function_map\n(a map from irep_idt\nto goto_programt)"];
171 4 -> 5 [lhead=cluster_funmap, label="value:"];
173 subgraph cluster_funmap {
174 9 [label="bar\n(an irep_idt)", URL="\ref irep_idt"];
175 10 [label="a goto_programt", URL="\ref goto_programt"];
176 9->10 [label="maps to"];
178 5 [label="main\n(an irep_idt)", URL="\ref irep_idt"];
180 7 [label="foo\n(an irep_idt)", URL="\ref irep_idt"];
181 8 [label="a goto_programt", URL="\ref goto_programt"];
182 7->8 [label="maps to"];
184 subgraph cluster_goto_program {
185 11 [shape=none, label="a\ngoto_programt", URL="\ref goto_programt"];
186 12 [label="instructions\n(a list of instructiont)"];
188 5 -> 11 [lhead=cluster_goto_program, label="maps to:"];
192 12 -> target1 [lhead=cluster_instructions];
194 subgraph cluster_instructions {
195 subgraph cluster_ins1{
196 code1 [label="code", URL="\ref codet"];
197 target1 [label="target"];
200 target1 -> target2 [color=white,lhead=cluster_ins2,
203 subgraph cluster_ins2{
204 code2 [label="code", URL="\ref codet"];
205 target2 [label="target"];
208 target2 -> target3 [color=white,lhead=cluster_ins3,
211 subgraph cluster_ins3{
212 code3 [label="code", URL="\ref codet"];
213 target3 [label="target"];
216 target3 -> target4 [color=white,lhead=cluster_ins4,
219 subgraph cluster_ins4{
220 code4 [label="code", URL="\ref codet"];
221 target4 [label="target"];
226 subgraph cluster_decl {
227 decl1 [label="type:\ncode_declt", URL="\ref code_declt",
229 subgraph cluster_decl_in{
230 cluster_decl_in_1 [label="symbol()", shape=none];
231 cluster_decl_in_2 [label="x"];
232 cluster_decl_in_1 -> cluster_decl_in_2 [color=white];
234 decl1 -> cluster_decl_in_1 [lhead="cluster_decl_in", color=white];
236 code1 -> decl1 [lhead=cluster_decl];
238 subgraph cluster_assign1 {
239 assign1 [label="type:\ncode_assignt", URL="\ref code_assignt",
241 subgraph cluster_assign1_in{
242 cluster_assign1_in_1 [label="lhs()", shape=none];
243 cluster_assign1_in_2 [label="x"];
244 cluster_assign1_in_1 -> cluster_assign1_in_2 [color=white];
246 cluster_assign1_in_3 [label="rhs()", shape=none];
247 cluster_assign1_in_4 [label="5"];
248 cluster_assign1_in_3 -> cluster_assign1_in_4 [color=white];
250 assign1 -> cluster_assign1_in_1 [lhead="cluster_assign1_in", color=white];
252 code2 -> assign1 [lhead=cluster_assign1];
254 subgraph cluster_if {
255 if [label="type:\ncode_ifthenelset", URL="\ref code_ifthenelset",
257 if_body [label="..."];
258 if -> if_body [color=white];
260 code3 -> if [lhead=cluster_if];
262 subgraph cluster_assign2 {
263 assign2 [label="type:\ncode_assignt", URL="\ref code_assignt",
265 subgraph cluster_assign2_in{
266 cluster_assign2_in_1 [label="lhs()", shape=none];
267 cluster_assign2_in_2 [label="x"];
268 cluster_assign2_in_1 -> cluster_assign2_in_2 [color=white];
270 cluster_assign2_in_3 [label="rhs()", shape=none];
271 cluster_assign2_in_4 [label="9"];
272 cluster_assign2_in_3 -> cluster_assign2_in_4 [color=white];
274 assign2 -> cluster_assign2_in_1 [lhead="cluster_assign2_in", color=white];
276 code4 -> assign2 [lhead=cluster_assign2];
281 This is not the final form of the goto-functions, since the lists of
282 instructions will be 'normalized' in the next step (Instrumentation),
283 which removes some instructions and adds targets to others.
285 Note that goto_programt and goto_functionst are each template
286 instantiations; they are currently the *only* specialization of
287 goto_program_templatet and goto_functions_templatet, respectively. This
288 means that the generated Doxygen documentation can be somewhat obtuse
289 about the actual types of things, and is unable to generate links to the
290 correct classes. Note that the
291 \ref goto_programt::instructiont::code "code" member of a
292 goto_programt's instruction has type \ref codet (its type in the
293 goto_program_templatet documentation is given as "codeT", as this is the
294 name of the template's type parameter); similarly, the type of a guard
295 of an instruction is \ref guardt.
298 \section instrumentation Instrumentation
300 In the \ref goto-programs directory.
305 * \ref goto_programt::instructiont
311 1 [shape=none, label=""];
312 2 [label="goto conversion"];
313 3 [shape=none, label=""];
314 1 -> 2 [label="goto-programs, goto-functions, symbol table"];
315 2 -> 3 [label="transformed goto-programs"];
319 This stage applies several transformations to the goto-programs from the
322 * The diagram in the previous stage showed a goto_programt with four
323 instructions, but real programs usually yield instruction lists that
324 are littered with \ref code_skipt "skip" statements. The
325 instrumentation stage removes the majority of these.
327 * Function pointers are removed. They are turned into switch statements
328 (but see the next point; switch statements are further transformed).
330 * Compound blocks are eliminated. There are several subclasses of
331 \ref codet that count as 'compound blocks;' therefore, later stages in
332 the CBMC pipeline that switch over the codet subtype of a particular
333 instruction should not need to consider these types. In particular:
335 * code_ifthenelset is turned into GOTOs. In particular, the bodies of
336 the conditionals are flattened into lists of instructions, inline
337 with the rest of the instruction in the goto_programt. The guard of
338 the conditional is placed onto the
339 \ref goto_programt::instructiont::guard "guard" member of
340 an instruction whose code member is of type \ref code_gotot. The
341 \ref goto_programt::instructiont::targets "targets" member
342 of that instruction points to the appropriate branch of the
343 conditional. (Note that although instructions have a list of
344 targets, in practice an instruction should only ever have at most
345 one target; you should check this invariant with an assertion if you
348 The order of instructions in a list of instructions---as well as the
349 targets of GOTOs---are both displayed as arrows when viewing a
350 goto-program as a Graphviz DOT file with `goto-instrument --dot`.
351 The semantics of a goto-program is: the next instruction is the next
352 instruction in the list, unless the current instruction has a
353 target; in that case, check the guard of the instruction, and jump
354 to the target if the guard evaluates to true.
356 * switch statements, for and while loops, and try-catches are also
357 transformed into lists of instructions guarded by GOTOs.
359 * \ref code_blockt "code blocks" are transformed into lists of
362 * \ref code_returnt "return statements" are transformed into
363 (unconditional) GOTOs whose target is the \ref END_FUNCTION
364 instruction. Each goto_programt should have precisely one such
365 instruction. Note the presence of \ref code_deadt, which has a
366 \ref code_deadt::symbol() "symbol()" member. Deads mark symbols that
367 have just gone out of scope; typically, a GOTO that jumps to an
368 END_FUNCTION instruction is preceded by a series of deads. Deads also
369 follow sequences of instructions that were part of the body of a
370 block (loop, conditional etc.) if there were symbols declared in that
373 This stage concludes the *analysis-independent* program transformations.