cprover
/builddir/build/BUILD/cbmc-cbmc-5.8/doc/architectural/howto.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \page cbmc-hacking CBMC Hacking HOWTO
3 
4 \author Kareem Khazem
5 
6 This is an introduction to hacking on the `cprover` codebase. It is not
7 intended as a user guide to `CBMC` or related tools. It is structured
8 as a series of programming exercises that aim to acclimatise the reader
9 to the basic data structures and workflow needed for contributing to
10 `CBMC`.
11 
12 
13 ## Initial setup
14 
15 Clone the [CBMC repository][cbmc-repo] and build it:
16 
17  git clone https://github.com/diffblue/cbmc.git
18  cd cbmc/src
19  make minisat2-download
20  make
21 
22 Ensure that [graphviz][graphviz] is installed on your system (in
23 particular, you should be able to run a program called `dot`). Install
24 [Doxygen][doxygen] and generate doxygen documentation:
25 
26  # In the src directory
27  doxygen doxyfile
28  # View the documentation in a web browser
29  firefox doxy/html/index.html
30 
31 If you've never used doxygen documentation before, get familiar with the
32 layout. Open the generated HTML page in a web browser; search for the
33 class `goto_programt` in the search bar, and jump to the documentation
34 for that class; and read through the copious documentation.
35 
36 The build writes executable programs into several of the source
37 directories. In this tutorial, we'll be using binaries inside the
38 `cbmc`, `goto-instrument`, and `goto-cc` directories. Add these
39 directories to your `$PATH`:
40 
41  # Assuming you cloned CBMC into ~/code
42  export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH
43  # Add to your shell's startup configuration file so that you don't have to run that command every time.
44  echo 'export PATH=~/code/cbmc/src/goto-instrument:~/code/cbmc/src/goto-cc:~/code/cbmc/src/cbmc:$PATH' >> .bashrc
45 
46 Optional: install an image viewer that can read images on stdin.
47 I use [feh][feh].
48 
49 [cbmc-repo]: https://github.com/diffblue/cbmc/
50 [doxygen]: http://www.stack.nl/~dimitri/doxygen/
51 [graphviz]: http://www.graphviz.org/
52 [feh]: https://feh.finalrewind.org/
53 
54 
55 
56 ## Whirlwind tour of the tools
57 
58 CBMC's code is located under the `cbmc` directory. Even if you plan to
59 contribute only to CBMC, it is important to be familiar with several
60 other of cprover's auxiliary tools.
61 
62 
63 ### Compiling with `goto-cc`
64 
65 There should be an executable file called `goto-cc` in the `goto-cc`
66 directory; make a symbolic link to it called `goto-gcc`:
67 
68  cd cbmc/src/goto-cc
69  ln -s "$(pwd)/goto-cc" goto-gcc
70 
71 Find or write a moderately-interesting C program; we'll call it `main.c`.
72 Run the following commands:
73 
74  goto-gcc -o main.goto main.c
75  cc -o main.exe main.c
76 
77 Invoke `./main.goto` and `./main.exe` and observe that they run identically.
78 The version that was compiled with `goto-gcc` is larger, though:
79 
80  du -hs *.{goto,exe}
81 
82 Programs compiled with `goto-gcc` are mostly identical to their `clang`-
83 or `gcc`-compiled counterparts, but contain additional object code in
84 cprover's intermediate representation. The intermediate representation
85 is (informally) called a *goto-program*.
86 
87 
88 ### Viewing goto-programs
89 
90 `goto-instrument` is a Swiss army knife for viewing goto-programs and
91 performing single program analyses on them. Run the following command:
92 
93  goto-instrument --show-goto-functions main.goto
94 
95 Many of the instructions in the goto-program intermediate representation
96 are similar to their C counterparts. `if` and `goto` statements replace
97 structured programming constructs.
98 
99 Find or write a small C program (2 or 3 functions, each containing a few
100 varied statements). Compile it using `goto-gcc` as above into an object
101 file called `main`. If you installed `feh`, try the following command
102 to dump a control-flow graph:
103 
104  goto-instrument --dot main | tail -n +2 | dot -Tpng | feh -
105 
106 If you didn't install `feh`, you can write the diagram to the file and
107 then view it:
108 
109  goto-instrument --dot main | tail -n +2 | dot -Tpng > main.png
110  Now open main.png with an image viewer
111 
112 (the invocation of `tail` is used to filter out the first line of
113 `goto-instrument` output. If `goto-instrument` writes more or less
114 debug output by the time you read this, read the output of
115 `goto-instrument --dot main` and change the invocation of `tail`
116 accordingly.)
117 
118 There are a few other views of goto-programs. Run `goto-instrument -h`
119 and try the various switches under the "Diagnosis" section.
120 
121 
122 
123 ## Learning about goto-programs
124 
125 In this section, you will learn about the basic goto-program data
126 structures. Reading from and manipulating these data structures form
127 the core of writing an analysis for CBMC.
128 
129 
130 ### First steps with `goto-instrument`
131 
132 <div class=memdoc>
133 **Task:** Write a simple C program with a few functions, each containing
134 a few statements. Compile the program with `goto-gcc` into a binary
135 called `main`.
136 </div>
137 
138 
139 The entry point of `goto-instrument` is in `goto_instrument_main.cpp`.
140 Follow the control flow into `goto_instrument_parse_optionst::doit()`, located in `goto_instrument_parse_options.cpp`.
141 At some point in that function, there will be a long sequence of `if` statements.
142 
143 <div class=memdoc>
144 **Task:** Add a `--greet` switch to `goto-instrument`, taking an optional
145 argument, with the following behaviour:
146 
147  $ goto-instrument --greet main
148  hello, world!
149  $ goto-instrument --greet Leperina main
150  hello, Leperina!
151 
152 You will also need to add the `greet` option to the
153 `goto_instrument_parse_options.h` file in order for this to work.
154 Notice that in the `.h` file, options that take an argument are followed
155 by a colon (like `(property):`), while simple switches have no colon.
156 Make sure that you `return 0;` after printing the message.
157 </div>
158 
159 The idea behind `goto-instrument` is that it parses a goto-program and
160 then performs one single analysis on that goto-program, and then
161 returns. Each of the switches in `doit` function of
162 `goto_instrument_parse_options` does something different with the
163 goto-program that was supplied on the command line.
164 
165 
166 ### Goto-program basics
167 
168 At this point in `goto-instrument_parse_options` (where the `if`
169 statements are), the goto-program will have been loaded into the object
170 `goto_functions`, of type `goto_functionst`. This has a field called
171 `function_map`, a map from function names to functions.
172 
173 
174 <div class="memdoc">
175 **Task:** Add a `--print-function-names` switch to `goto-instrument`
176 that prints out the name of every function in the goto-binary. Are
177 there any functions that you didn't expect to see?
178 </div>
179 
180 The following is quite difficult to follow from doxygen, but: the value
181 type of `function_map` is `goto_function_templatet<goto_programt>`.
182 
183 
184 <div class=memdoc>
185 **Task:** Read the documentation for `goto_function_templatet<bodyT>`
186 and `goto_programt`.
187 </div>
188 
189 Each goto_programt object contains a list of
190 \ref goto_program_templatet::instructiont called
191 `instructions`. Each instruction has a field called `code`, which has
192 type \ref codet.
193 
194 <div class=memdoc>
195 **Task:** Add a `--pretty-program` switch to `goto-instrument`. This
196 switch should use the `codet::pretty()` function to pretty-print every
197 \ref codet in the entire program. The strings that `pretty()` generates
198 for a codet look like this:
199 
200  index
201  * type: unsignedbv
202  * width: 8
203  * #c_type: char
204  0: symbol
205  * type: array
206  * size: nil
207  * type:
208  * #source_location:
209  * file: src/main.c
210  * line: 18
211  * function:
212  * working_directory: /some/dir
213  0: unsignedbv
214  * width: 8
215  * #c_type: char
216  ...
217 </div>
218 
219 The sub-nodes of a particular node in the pretty representation are
220 numbered, starting from 0. They can be accessed through the `op0()`,
221 `op1()` and `op2()` methods in the `exprt` class.
222 
223 Every node in the pretty representation has an identifier, accessed
224 through the `id()` function. The file `util/irep_ids.def` lists the
225 possible values of these identifiers; have a quick scan through that
226 file. In the pretty representation above, the following facts are true
227 of that particular node:
228 
229  - `node.id() == ID_index`
230  - `node.type().id() == ID_unsignedbv`
231  - `node.op0().id() == ID_symbol`
232  - `node.op0().type().id() == ID_array`
233 
234 The fact that the `op0()` child has a `symbol` ID menas that you could
235 cast it to a `symbol_exprt` (which is a subtype of `exprt`) using the
236 function `to_symbol_expr`.
237 
238 <div class=memdoc>
239 **Task:** Add flags to `goto-instrument` to print out the following information:
240 * the name of every function that is *called* in the program;
241 * the value of every constant in the program;
242 * the value of every symbol in the program.
243 </div>