cprover
goto-symex/README.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \defgroup goto-symex goto-symex
3 
4 # Folder goto-symex
5 
6 \author Kareem Khazem, Martin Brain
7 
8 This directory contains a symbolic evaluation system for goto-programs.
9 This takes a goto-program and translates it to an equation system by
10 traversing the program, branching and merging and unwinding loops and recursion
11 as needed.
12 
13 The output of symbolic execution is a system of equations, asserations and
14 assumptions; an object of type `symex_target_equationt`, containing a list of
15 `symex_target_equationt::SSA_stept`, each of which are equalities between
16 `exprt` expressions. This list is constructed incrementally as the symbolic
17 execution engine walks through the goto-program using the `symex_targett`
18 interface. This interface (implemented by `symex_target_equationt`) exposes
19 functions that append SSA steps into the aforementioned list while transforming
20 expressions into Static Single Assignment (SSA) form. For more details on this
21 process see `symex_target_equation.h`, for an overview of SSA see \ref
22 static-single-assignment.
23 
24 At a later stage, BMC will convert the generated SSA steps into an
25 equation that can be passed to the solver.
26 
27 ---
28 \section symbolic-execution Symbolic Execution
29 
30 In the \ref goto-symex directory.
31 
32 **Key classes:**
33 * goto_symex_statet
34 * goto_symext
35 
36 \dot
37 digraph G {
38  node [shape=box];
39  rankdir="LR";
40  1 [shape=none, label=""];
41  2 [label="goto conversion"];
42  3 [shape=none, label=""];
43  1 -> 2 [label="goto-programs, goto-functions, symbol table"];
44  2 -> 3 [label="equations"];
45 }
46 \enddot
47 
48 \subsection symex-overview Overview
49 
50 The \ref bmct class gets a reference to an \ref symex_target_equationt
51 "equation" (initially, an empty list of \ref symex_target_equationt::SSA_stept
52 "single-static assignment steps") and a goto-program from the frontend.
53 The \ref bmct creates a goto_symext to symbolically execute the
54 goto-program, thereby filling the equation, which can then be passed
55 along to the SAT solver.
56 
57 The class \ref goto_symext holds the global state of the symbol executor, while
58 \ref goto_symex_statet holds the program state at a particular point in
59 symbolic execution. In straight-line code a single \ref goto_symex_statet is
60 maintained, being transformed by each instruction in turn, but these state
61 objects are cloned and merged at control-flow forks and joins respectively.
62 \ref goto_symex_statet contains an instance of \ref value_sett, used to track
63 what objects pointers may refer to, and a constant propagator domain used to
64 try to simplify assignments and (more usefully) resolve branch instructions.
65 
66 This is a memory-heavy data structure, so goto_symext creates it on-demand and
67 lets it go out-of-scope as soon as possible.
68 
69 The process of
70 symbolic execution generates an additional \ref symbol_tablet
71 "symbol table" of dynamically-created objects; this symbol table is
72 needed when solving the equation. This symbol table must thus be
73 exported out of the state before it is torn down; this is done through
74 the parameter "`new_symbol_table`" passed as a reference to the various
75 functions in goto_symext.
76 
77 The main symbolic execution loop code is \ref goto_symext::symex_step. This
78 function case-switches over the type of the instruction that we're
79 currently executing, and calls various other functions appropriate to
80 the instruction type, i.e. goto_symext::symex_function_call() if the
81 current instruction is a function call, goto_symext::symex_goto() if the
82 current instruction is a goto, etc.
83 
84 \subsection symex-loop-and-recursion-unwinding Loop and recursion unwinding
85 
86 Each backwards goto and recursive call has a separate counter
87 (handled by \ref cbmc or another driver program, see the `--unwind` and
88 `--unwind-set` options). The symbolic execution includes constant
89 folding so loops that have a constant / bounded number of iterations will often
90 be handled completely (assuming the unwinding limit is sufficient).
91 When an unwind or recursion limit is reached, an assertion can be added to
92 explicitly show when analysis is incomplete.
93 
94 \subsection goto-symext-clean-expr goto_symext::clean_expr
95 
96 Before any expression is incorporated into the output equation set it is passed
97 through \ref goto_symext::clean_expr and thus \ref goto_symext::dereference,
98 whose primary purpose is to remove dereference operations. It achieves this
99 using the \ref value_sett stored in \ref goto_symex_statet, replacing `*x` with
100 a construct such as
101 `x == &candidate1 ? candidate1 : x == &candidate2 ? candidate2 : x$object`
102 
103 Note the `x$object` fallback candidate, which is known as a *failed symbol* or
104 *failed object*, and represents the unknown object pointed to by `x` when
105 neither of the candidates (`candidate1` and `candidate2` here) matched as
106 expected. This is of course unsound, since `x$object` and `y$object` are always
107 distinct, even if `x` and `y` might actually alias, but at least it ensures
108 writes to and subsequent reads from `x` are related.
109 
110 \ref goto_symext::dereference function also passes its argument to
111 \ref goto_symex_statet::rename, which is responsible for both SSA
112 renaming symbols and for applying constant propagation where possible. Renaming
113 is also performed elsewhere without calling \ref goto_symext::dereference when
114 an expression is already known to be pointer-free.
115 
116 \subsection symex-path Path exploration
117 
118 By default, CBMC symbolically executes the entire program, building up
119 an equation representing all instructions, which it then passes to the
120 solver. Notably, it _merges_ paths that diverge due to a goto
121 instruction, forming a constraint that represents having taken either of
122 the two paths; the code for doing this is \ref goto_symext::merge_goto.
123 
124 Goto-symex can operate in a different mode when the `--paths` flag is passed
125 in the \ref optionst object passed to its constructor (\ref cbmc passes this
126 from the corresponding command-line option).
127 This disables path merging; instead, symex executes
128 a single path at a time, returning each one to its caller, which in the case of
129 cbmc then calls its solver with the equation
130 representing that path, then continues to execute other paths.
131 The 'other paths' that would have been taken when the program branches
132 are saved onto a workqueue so that the driver program can continue to execute
133 the current path, and then later retrieve saved paths from the workqueue.
134 Implementation-wise, \ref bmct passes a worklist to goto_symext in
135 \ref bmct::do_language_agnostic_bmc. If path exploration is enabled,
136 goto_symext will fill up the worklist whenever it encounters a branch,
137 instead of merging the paths on the branch. After the initial symbolic
138 execution (i.e. the first call to \ref bmct::run in
139 \ref bmct::do_language_agnostic_bmc), \ref bmct continues popping the
140 worklist and executing untaken paths until the worklist empties. Note
141 that this means that the default model-checking behaviour is a special
142 case of path exploration: when model-checking, the initial symbolic
143 execution run does not add any paths to the workqueue but rather merges
144 all the paths together, so the additional path-exploration loop is
145 skipped over.
146 
147 ---
148 \section static-single-assignment Static Single Assignment (SSA) Form
149 
150 **Key classes:**
151 * \ref symex_targett
152 * \ref symex_target_equationt
153 
154 *Static Single Assignment (SSA)* form is an intermediate
155 representation that satisfies the following properties:
156 
157 1. Every variable is *assigned exactly once*.
158 2. Every variable must be *defined* before use.
159 
160 In-order to convert a goto-program to SSA form all variables are
161 indexed (renamed) through the addition of a _suffix_.
162 
163 There are three “levels” of indexing:
164 
165 **Level 2 (L2):** variables are indexed every time they are
166 encountered in a function.
167 
168 **Level 1 (L1):** variables are indexed every time the functions that
169 contain those variables are called.
170 
171 **Level 0 (L0):** variables are indexed every time a new thread
172 containing those variables are spawned.
173 
174 We can inspect the indexed variable names with the **--show-vcc** or
175 **--program-only** flags. Variables in SSA form are printed in the
176 following format: `%%s!%%d@%%d#%%d`. Where the string field is the
177 variable name, and the numbers after the !, @, and # are the L0, L1,
178 and L2 suffixes respectively.
179 
180 > Note: **--program-only** prints all the SSA steps in-order. In
181 > contrast, **--show-vcc** will for each assertion print the SSA steps
182 > (assumes, assignments and constraints only) that synthetically
183 > precede the assertion. In the presence of multiple threads it will
184 > also print SSA steps that succeed the assertion.
185 
186 \subsection L1-L2 Level 1 and level 2 indexing
187 
188 The following examples illustrate Level 1 and 2 indexing.
189 
190  $ cat l1.c
191  int main()
192  {
193  int x=7;
194  x=8;
195  assert(0);
196  }
197 
198  $ cbmc --show-vcc l1.c
199  ...
200  {-12} x!0@1#2 == 7
201  {-13} x!0@1#3 == 8
202 
203 That is, the L0 names for both x's are 0; the L1 names for both x's
204 are 1; but each occurrence of x within main() gets a fresh L2 suffix
205 (2 and 3, respectively).
206 
207  $ cat l2.c
208  void foo()
209  {
210  int x=7;
211  x=8;
212  x=9;
213  }
214  int main()
215  {
216  foo();
217  foo();
218  assert(0);
219  }
220 
221  $ cbmc --show-vcc l2.c
222  ...
223  {-12} x!0@1#2 == 7
224  {-13} x!0@1#3 == 8
225  {-14} x!0@1#4 == 9
226  {-15} x!0@2#2 == 7
227  {-16} x!0@2#3 == 8
228  {-17} x!0@2#4 == 9
229 
230 That is, every time we enter function foo, the L1 counter of x is
231 incremented (from 1 to 2) and the L2 counter is reset (back to 2,
232 after having been incremented up to 4). The L2 counter then increases
233 every time we access x as we walk through the function.
234 
235 \subsection L0 Level 0 indexing (threads only)
236 
237 TODO: describe and give a concrete example
238 
239 \subsection PL Relevant Primary Literature
240 
241 Thread indexing is based on the following paper:
242 
243 > Lee, J., Midkiff, S.P. and Padua, D.A., 1997, August. Concurrent
244 > static single assignment form and constant propagation for
245 > explicitly parallel programs. In International Workshop on Languages
246 > and Compilers for Parallel Computing (pp. 114-130). Springer,
247 > Berlin, Heidelberg.
248 
249 Seminal paper on SSA:
250 
251 > Rosen, B.K., Wegman, M.N. and Zadeck, F.K., 1988, January. Global
252 > value numbers and redundant computations. In Proceedings of the 15th
253 > ACM SIGPLAN-SIGACT symposium on Principles of programming languages
254 > (pp. 12-27). ACM.
255 
256 ---
257 \section counter-example-production Counter Example Production
258 
259 In the \ref goto-symex directory.
260 
261 **Key classes:**
262 * \ref symex_target_equationt
263 * \ref prop_convt
264 * \ref bmct
265 * \ref fault_localizationt
266 * \ref counterexample_beautificationt
267 
268 \dot
269 digraph G {
270  node [shape=box];
271  rankdir="LR";
272  1 [shape=none, label=""];
273  2 [label="goto conversion"];
274  3 [shape=none, label=""];
275  1 -> 2 [label="solutions"];
276  2 -> 3 [label="counter-examples"];
277 }
278 \enddot