2 \page cbmc-hacking CBMC Hacking HOWTO
6 This is an introduction to hacking on the `cprover` codebase. It is not
7 intended as a user guide to `CBMC` or related tools. It is structured
8 as a series of programming exercises that aim to acclimatise the reader
9 to the basic data structures and workflow needed for contributing to
15 Clone the [CBMC repository][cbmc-repo] and build it:
17 git clone https://github.com/diffblue/cbmc.git
19 make minisat2-download
22 Ensure that [graphviz][graphviz] is installed on your system (in
23 particular, you should be able to run a program called `dot`). Install
24 [Doxygen][doxygen] and generate doxygen documentation:
26 # In the src directory
28 # View the documentation in a web browser
29 firefox doxy/html/index.html
31 If you've never used doxygen documentation before, get familiar with the
32 layout. Open the generated HTML page in a web browser; search for the
33 class `goto_programt` in the search bar, and jump to the documentation
34 for that class; and read through the copious documentation.
36 The build writes executable programs into several of the source
37 directories. In this tutorial, we'll be using binaries inside the
38 `cbmc`, `goto-instrument`, and `goto-cc` directories. Add these
39 directories to your `$PATH`:
41 # Assuming you cloned CBMC into ~/code
42 export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH
43 # Add to your shell's startup configuration file so that you don't have to run that command every time.
44 echo 'export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH' >> .bashrc
46 Optional: install an image viewer that can read images on stdin.
49 [cbmc-repo]: https://github.com/diffblue/cbmc/
50 [doxygen]: http://www.stack.nl/~dimitri/doxygen/
51 [graphviz]: http://www.graphviz.org/
52 [feh]: https://feh.finalrewind.org/
56 ## Whirlwind tour of the tools
58 CBMC's code is located under the `cbmc` directory. Even if you plan to
59 contribute only to CBMC, it is important to be familiar with several
60 other of cprover's auxiliary tools.
63 ### Compiling with `goto-cc`
65 There should be an executable file called `goto-cc` in the `goto-cc`
66 directory; make a symbolic link to it called `goto-gcc`:
69 ln -s "$(pwd)/goto-cc" goto-gcc
71 Find or write a moderately-interesting C program; we'll call it `main.c`.
72 Run the following commands:
74 goto-gcc -o main.goto main.c
77 Invoke `./main.goto` and `./main.exe` and observe that they run identically.
78 The version that was compiled with `goto-gcc` is larger, though:
82 Programs compiled with `goto-gcc` are mostly identical to their `clang`-
83 or `gcc`-compiled counterparts, but contain additional object code in
84 cprover's intermediate representation. The intermediate representation
85 is (informally) called a *goto-program*.
88 ### Viewing goto-programs
90 `goto-instrument` is a Swiss army knife for viewing goto-programs and
91 performing single program analyses on them. Run the following command:
93 goto-instrument --show-goto-functions main.goto
95 Many of the instructions in the goto-program intermediate representation
96 are similar to their C counterparts. `if` and `goto` statements replace
97 structured programming constructs.
99 Find or write a small C program (2 or 3 functions, each containing a few
100 varied statements). Compile it using `goto-gcc` as above into an object
101 file called `main`. If you installed `feh`, try the following command
102 to dump a control-flow graph:
104 goto-instrument --dot main | tail -n +2 | dot -Tpng | feh -
106 If you didn't install `feh`, you can write the diagram to the file and
109 goto-instrument --dot main | tail -n +2 | dot -Tpng > main.png
110 Now open main.png with an image viewer
112 (the invocation of `tail` is used to filter out the first line of
113 `goto-instrument` output. If `goto-instrument` writes more or less
114 debug output by the time you read this, read the output of
115 `goto-instrument --dot main` and change the invocation of `tail`
118 There are a few other views of goto-programs. Run `goto-instrument -h`
119 and try the various switches under the "Diagnosis" section.
123 ## Learning about goto-programs
125 In this section, you will learn about the basic goto-program data
126 structures. Reading from and manipulating these data structures form
127 the core of writing an analysis for CBMC.
130 ### First steps with `goto-instrument`
133 **Task:** Write a simple C program with a few functions, each containing
134 a few statements. Compile the program with `goto-gcc` into a binary
139 The entry point of `goto-instrument` is in `goto_instrument_main.cpp`.
140 Follow the control flow into `goto_instrument_parse_optionst::doit()`, located in `goto_instrument_parse_options.cpp`.
141 At some point in that function, there will be a long sequence of `if` statements.
144 **Task:** Add a `--greet` switch to `goto-instrument`, taking an optional
145 argument, with the following behaviour:
147 $ goto-instrument --greet main
149 $ goto-instrument --greet Leperina main
152 You will also need to add the `greet` option to the
153 `goto_instrument_parse_options.h` file in order for this to work.
154 Notice that in the `.h` file, options that take an argument are followed
155 by a colon (like `(property):`), while simple switches have no colon.
156 Make sure that you `return 0;` after printing the message.
159 The idea behind `goto-instrument` is that it parses a goto-program and
160 then performs one single analysis on that goto-program, and then
161 returns. Each of the switches in `doit` function of
162 `goto_instrument_parse_options` does something different with the
163 goto-program that was supplied on the command line.
166 ### Goto-program basics
168 At this point in `goto-instrument_parse_options` (where the `if`
169 statements are), the goto-program will have been loaded into the object
170 `goto_functions`, of type `goto_functionst`. This has a field called
171 `function_map`, a map from function names to functions.
175 **Task:** Add a `--print-function-names` switch to `goto-instrument`
176 that prints out the name of every function in the goto-binary. Are
177 there any functions that you didn't expect to see?
180 The following is quite difficult to follow from doxygen, but: the value
181 type of `function_map` is `goto_function_templatet<goto_programt>`.
185 **Task:** Read the documentation for `goto_function_templatet<bodyT>`
189 Each goto_programt object contains a list of
190 \ref goto_program_templatet::instructiont called
191 `instructions`. Each instruction has a field called `code`, which has
195 **Task:** Add a `--pretty-program` switch to `goto-instrument`. This
196 switch should use the `codet::pretty()` function to pretty-print every
197 \ref codet in the entire program. The strings that `pretty()` generates
198 for a codet look like this:
212 * working_directory: /some/dir
219 The sub-nodes of a particular node in the pretty representation are
220 numbered, starting from 0. They can be accessed through the `op0()`,
221 `op1()` and `op2()` methods in the `exprt` class.
223 Every node in the pretty representation has an identifier, accessed
224 through the `id()` function. The file `util/irep_ids.def` lists the
225 possible values of these identifiers; have a quick scan through that
226 file. In the pretty representation above, the following facts are true
227 of that particular node:
229 - `node.id() == ID_index`
230 - `node.type().id() == ID_unsignedbv`
231 - `node.op0().id() == ID_symbol`
232 - `node.op0().type().id() == ID_array`
234 The fact that the `op0()` child has a `symbol` ID menas that you could
235 cast it to a `symbol_exprt` (which is a subtype of `exprt`) using the
236 function `to_symbol_expr`.
239 **Task:** Add flags to `goto-instrument` to print out the following information:
240 * the name of every function that is *called* in the program;
241 * the value of every constant in the program;
242 * the value of every symbol in the program.