2 \page tutorial Tutorials
4 \section cbmc_tutorial CBMC Developer Tutorial
10 This is an introduction to hacking on the `cprover` codebase. It is not
11 intended as a user guide to `CBMC` or related tools. It is structured
12 as a series of programming exercises that aim to acclimatise the reader
13 to the basic data structures and workflow needed for contributing to
19 Clone the [CBMC repository][cbmc-repo] and build it:
21 git clone https://github.com/diffblue/cbmc.git
23 make minisat2-download
26 Ensure that [graphviz][graphviz] is installed on your system (in
27 particular, you should be able to run a program called `dot`). Install
28 [Doxygen][doxygen] and generate doxygen documentation:
30 # In the src directory
32 # View the documentation in a web browser
33 firefox doxy/html/index.html
35 If you've never used doxygen documentation before, get familiar with the
36 layout. Open the generated HTML page in a web browser; search for the
37 class `goto_programt` in the search bar, and jump to the documentation
38 for that class; and read through the copious documentation.
40 The build writes executable programs into several of the source
41 directories. In this tutorial, we'll be using binaries inside the
42 `cbmc`, `goto-instrument`, and `goto-cc` directories. Add these
43 directories to your `$PATH`:
45 # Assuming you cloned CBMC into ~/code
46 export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH
47 # Add to your shell's startup configuration file so that you don't have to run that command every time.
48 echo 'export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH' >> .bashrc
50 Optional: install an image viewer that can read images on stdin.
53 [cbmc-repo]: https://github.com/diffblue/cbmc/
54 [doxygen]: http://www.stack.nl/~dimitri/doxygen/
55 [graphviz]: http://www.graphviz.org/
56 [feh]: https://feh.finalrewind.org/
60 ## Whirlwind tour of the tools
62 CBMC's code is located under the `cbmc` directory. Even if you plan to
63 contribute only to CBMC, it is important to be familiar with several
64 other of cprover's auxiliary tools.
67 ### Compiling with `goto-cc`
69 There should be an executable file called `goto-cc` in the `goto-cc`
70 directory; make a symbolic link to it called `goto-gcc`:
73 ln -s "$(pwd)/goto-cc" goto-gcc
75 Find or write a moderately-interesting C program; we'll call it `main.c`.
76 Run the following commands:
78 goto-gcc -o main.goto main.c
81 Invoke `./main.goto` and `./main.exe` and observe that they run identically.
82 The version that was compiled with `goto-gcc` is larger, though:
86 Programs compiled with `goto-gcc` are mostly identical to their `clang`-
87 or `gcc`-compiled counterparts, but contain additional object code in
88 cprover's intermediate representation. The intermediate representation
89 is (informally) called a *goto-program*.
92 ### Viewing goto-programs
94 `goto-instrument` is a Swiss army knife for viewing goto-programs and
95 performing single program analyses on them. Run the following command:
97 goto-instrument --show-goto-functions main.goto
99 Many of the instructions in the goto-program intermediate representation
100 are similar to their C counterparts. `if` and `goto` statements replace
101 structured programming constructs.
103 Find or write a small C program (2 or 3 functions, each containing a few
104 varied statements). Compile it using `goto-gcc` as above into an object
105 file called `main`. If you installed `feh`, try the following command
106 to dump a control-flow graph:
108 goto-instrument --dot main | tail -n +2 | dot -Tpng | feh -
110 If you didn't install `feh`, you can write the diagram to the file and
113 goto-instrument --dot main | tail -n +2 | dot -Tpng > main.png
114 Now open main.png with an image viewer
116 (the invocation of `tail` is used to filter out the first line of
117 `goto-instrument` output. If `goto-instrument` writes more or less
118 debug output by the time you read this, read the output of
119 `goto-instrument --dot main` and change the invocation of `tail`
122 There are a few other views of goto-programs. Run `goto-instrument -h`
123 and try the various switches under the "Diagnosis" section.
127 ## Learning about goto-programs
129 In this section, you will learn about the basic goto-program data
130 structures. Reading from and manipulating these data structures form
131 the core of writing an analysis for CBMC.
134 ### First steps with `goto-instrument`
137 **Task:** Write a simple C program with a few functions, each containing
138 a few statements. Compile the program with `goto-gcc` into a binary
143 The entry point of `goto-instrument` is in `goto_instrument_main.cpp`.
144 Follow the control flow into `goto_instrument_parse_optionst::doit()`, located in `goto_instrument_parse_options.cpp`.
145 At some point in that function, there will be a long sequence of `if` statements.
148 **Task:** Add a `--greet` switch to `goto-instrument`, taking an optional
149 argument, with the following behaviour:
151 $ goto-instrument --greet main
153 $ goto-instrument --greet Leperina main
156 You will also need to add the `greet` option to the
157 `goto_instrument_parse_options.h` file in order for this to work.
158 Notice that in the `.h` file, options that take an argument are followed
159 by a colon (like `(property):`), while simple switches have no colon.
160 Make sure that you `return 0;` after printing the message.
163 The idea behind `goto-instrument` is that it parses a goto-program and
164 then performs one single analysis on that goto-program, and then
165 returns. Each of the switches in `doit` function of
166 `goto_instrument_parse_options` does something different with the
167 goto-program that was supplied on the command line.
170 ### Goto-program basics
172 At this point in `goto-instrument_parse_options` (where the `if`
173 statements are), the goto-program will have been loaded into the object
174 `goto_functions`, of type `goto_functionst`. This has a field called
175 `function_map`, a map from function names to functions.
179 **Task:** Add a `--print-function-names` switch to `goto-instrument`
180 that prints out the name of every function in the goto-binary. Are
181 there any functions that you didn't expect to see?
184 The following is quite difficult to follow from doxygen, but: the value
185 type of `function_map` is `goto_function_templatet<goto_programt>`.
188 **Task:** Read the documentation for `goto_function_templatet<bodyT>`
192 Each \ref goto_programt object contains a list of
193 \ref goto_programt::instructiont called
194 `instructions`. Each instruction has a field called `code`, which has
198 **Task:** Add a `--pretty-program` switch to `goto-instrument`. This
199 switch should use the `codet::pretty()` function to pretty-print every
200 \ref codet in the entire program. The strings that `pretty()` generates
201 for a codet look like this:
215 * working_directory: /some/dir
222 The sub-nodes of a particular node in the pretty representation are
223 numbered, starting from 0. They can be accessed through the `op0()`,
224 `op1()` and `op2()` methods in the `exprt` class.
226 Every node in the pretty representation has an identifier, accessed
227 through the `id()` function. The file `util/irep_ids.def` lists the
228 possible values of these identifiers; have a quick scan through that
229 file. In the pretty representation above, the following facts are true
230 of that particular node:
232 - `node.id() == ID_index`
233 - `node.type().id() == ID_unsignedbv`
234 - `node.op0().id() == ID_symbol`
235 - `node.op0().type().id() == ID_array`
237 The fact that the `op0()` child has a `symbol` ID menas that you could
238 cast it to a `symbol_exprt` (which is a subtype of `exprt`) using the
239 function `to_symbol_expr`.
242 **Task:** Add flags to `goto-instrument` to print out the following information:
243 * the name of every function that is *called* in the program;
244 * the value of every constant in the program;
245 * the value of every symbol in the program.