cprover
/builddir/build/BUILD/cbmc-cbmc-5.11/jbmc/src/java_bytecode/README.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \defgroup java_bytecode java_bytecode
3 
4 This module provides a front end for Java.
5 
6 \section java-bytecode-conversion-section Overview of conversion from bytecode to codet
7 
8 To be documented.
9 \subsection java-bytecode-lowering-to-goto Lowering to GOTO
10 
11 The Java language contains high-level programming concepts like virtual
12 functions and throw/catch semantics. These need to be rewritten in terms of
13 other, more fundamental operations in order to analyse the Java program. This
14 operation is referred to as "lowering". See Background Concepts for more
15 information.
16 
17 The following lowering operations are done on Java bytecode after converting
18 into a basic `codet` representation.
19 
20 - \ref java-bytecode-runtime-exceptions "Add runtime exceptions"
21 - \ref java-bytecode-remove-java-new "Remove `new` calls"
22 - \ref java-bytecode-remove-exceptions "Remove thrown exceptions"
23 - \ref java-bytecode-remove-instanceof
24 - As well as other non-Java specific transformations (see \ref goto-programs for
25  details on these)
26 
27 These are performed in `process_goto_function` for example
28 \ref jbmc_parse_optionst::process_goto_function
29 
30 Once these lowerings have been completed, you have a GOTO model that can be handled by \ref goto-symex.
31 
32 \section java-bytecode-array-representation How are Java arrays represented in GOTO
33 
34 To be documented.
35 
36 \section java-bytecode-object-factory Object Factory
37 
38 To be documented.
39 
40 \subsection java-bytecode-pointer-type-selection Pointer type selection
41 
42 In Java, all variables that are of a non-primitive type are pointers to
43 an object. When initializing such variables using \ref java_object_factoryt,
44 e.g., as input arguments for the method under test, we may need to select the
45 correct pointer type. For example, abstract classes can be replaced with their
46 concrete implementations and type parameters in generic types can be replaced
47 with their specialized types.
48 
49 The class \ref select_pointer_typet offers the basic interface for this
50 functionality, in particular see
51 \ref select_pointer_typet::convert_pointer_type. Note that this class only
52 implements generic specialization (see \ref
53 java-bytecode-generic-specialization), derived classes can override this
54 behavior to provide more sophisticated type selection.
55 
56 \subsection java-bytecode-generic-specialization Generic specialization
57 
58 In Java, generics are checked at compile-time for type-correctness. The generic
59 type information is then removed in a process called type erasure. However,
60 throwing away all this type information would be very inconvenient for our
61 analysis. Therefore, when we initialize variables that have generic type we
62 dynamically replace the generic parameters with their specialized types rather
63 than using only the raw type. For example, consider a generic class
64 `MyGeneric<T>` with a field `T myField`. When initializing a variable
65 `MyGeneric<Integer> mg` we change the type of its field `mg.myField` from
66 `MyGeneric::T` to `Integer`. Generic specialization is applied during pointer
67 type selection (see \ref java-bytecode-pointer-type-selection).
68 
69 The generic specialization relies on a map that stores the concrete types of all
70 generic parameters in the current scope. The map is maintained in \ref
71 java_object_factoryt::generic_parameter_specialization_map. Note that not all
72 generic parameters need to have a concrete type, e.g., when the method under
73 test is generic. The types are removed from the map when the scope changes. In
74 different depths of the scope the parameters can be specialized with different
75 types so we keep a stack of types for each parameter.
76 
77 We use the map in \ref select_pointer_typet::specialize_generics to
78 retrieve the concrete type of generic parameters such as `MyGeneric::T` and of
79 arrays of generic parameters such as `MyGeneric::T[]`. Special attention
80 is paid to breaking infinite recursion when searching the map, e.g.,
81 `MyGeneric::T` being specialized with `MyGeneric::U` and vice versa, for an
82 example of such a recursion see
83 \ref select_pointer_typet::get_recursively_instantiated_type. More complicated
84 generic types such as `MyGeneric<T>` are specialized indirectly within \ref
85 java_object_factoryt. Their concrete types are already stored in the map and are
86 retrieved when needed, e.g., to initialize their fields.
87 
88 \section java-bytecode-parsing Java bytecode parsing (parser, convert_class, convert_method)
89 
90 To be documented.
91 
92 \subsection java-class-section How a java program / class is represented in a .class
93 
94 Every Java class is compiled into a .class file. Inner classes, anonymous
95 classes or classes for tableswitches are also compiled into their own .class
96 files.
97 
98 There exists an [official
99 specification](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html)
100 
101 Each class files contains information about its version, the constant pool,
102 information about the contained class, its super-class, its implemented
103 interfaces, its fields, methods and finally additional attributes, such as
104 information about lambda functions used in the methods of the class or inner
105 classes the it contains.
106 
107 The content of a .class file can be inspected via the `javap` tool which is part
108 of the JDK installation. Useful options are `-c` for code, `-p` to display
109 protected / private methods `-l` for line numbers and local variable
110 information, as well as `-verbose` which prints a lot of additional information.
111 
112 In general, all variable length entries in a class file are represented by first
113 having an integer `n` that specifies the number of entries and then an array of
114 `n` such entries in the file. All variable length attribute also contain
115 information about their length in bytes.
116 
117 The integer format used in class files are unsigned integers of size 8/16/32
118 bit, which are named as `u1`/`u2`/`u4`.
119 
120 \subsection java-class-access-flags Access Flags
121 
122 The JVM specification defines different access flags, e.g., `final`, `static`,
123 `protected`, `private` etc. where different ones are applicable to the class
124 itself, its fields or methods. All access flags are represented as bits, the set
125 of bits that are defined for one entity is represented as disjunction of those
126 values. Each of these values is defined as a constant with a name prefixed with
127 `ACC_` in JBMC, e.g., as
128 @code
129 #define ACC_PUBLIC 0x0001
130 #define ACC_ENUM 0x4000
131 @endcode
132 
133 \subsection java-class-constant-pool Constant Pool
134 
135 The constant pool contains all strings and referred values that are used in the
136 .class. This includes the names of the class itself and its super-class, as well
137 as the names and signatures of all fields and methods. All strings in the
138 constant pool are in UTF-16 format.
139 
140 \subsection java-class-fields Fields
141 
142 Each member variable of a class has a field entry with a corresponding field
143 information structure. This contains the name of the field, its raw JVM type
144 (called the descriptor) and an optional signature.
145 
146 A signature is an extension to the Java raw types and contains information about
147 the generic type of an object if applicable.
148 
149 The name of the field, the descriptor and the signature are all represented as
150 indices into the constant pool of the class file.
151 
152 \subsection java-class-methods Methods
153 
154 Methods are represented in a similar way as fields. Each method has an
155 associated name, descriptor and optional signature entry in the constant pool
156 table.
157 
158 An implemented method also has several attributes. One is the `Code` attribute
159 that stores the actual bytecode instructions. There is also an optional
160 `LocalVariableTable` which contains the names of local variables and
161 parameters. In addition to this there is also an optional
162 `LocalVariableTypeTable` that contains information about generic local variables
163 and parameters. Finally the exception table is defined as entries in the
164 `Exceptions` attribute.
165 
166 Note: most information about generic types is optional and exists mainly as
167 debugger information. This is because the Java compiler ensures that typing is
168 correct and creates code accordingly. The same holds true for information on
169 local variables. It is therefore advisable to compile Java projects with the
170 `-g` option that adds debugging information in all cases.
171 
172 \subsection java-class-attributes Attributes
173 
174 The last section contains additional attributes, e.g., `SourceFile` which
175 specified from which source file the .class was compiled, `BootstrapMethods`
176 which is used for lambda functions or `InnerClasses` which refers to inner
177 classes of the class in question.
178 
179 
180 \section java-bytecode-runtime-exceptions Adding runtime exceptions (java_bytecode_instrument)
181 
182 To be documented.
183 
184 \section java-bytecode-concurrency-instrumentation Concurrency instrumentation
185 
186 Relevant code:
187 
188  - \ref java_bytecode_concurrency_instrumentation.h
189  - \ref java_bytecode_concurrency_instrumentation.cpp
190  - \ref java_static_initializers.h
191  - \ref java_static_initializers.cpp
192  - \ref java_bytecode_convert_method.cpp
193 
194 Concurrency instrumentation for Java programs is disabled by default. It can
195 be enabled by specifying the `--java-threading` command line flag.
196 
197 Thread-blocks have special semantics, in the sense that unlike basic blocks
198 they can be executed independently. JBMC treats a sequence of codet's
199 surrounded with calls to `CProver.startThread:(I)V` and `CProver.endThread:(I)V` as
200 a thread-block. These functions take one argument, an integer, that is used to
201 associate the start of a thread (i.e: calls to `CProver.startThread:(I)V`) to the
202 end of the thread (i.e: calls to `CProver.endThread:(I)V`). JBMC assumes that
203 calls to the these functions are well-formed, more specifically, each call to
204 `CProver.startThread:(I)V` must have an associated call to `CProver.endThread:(I)V`.
205 
206 The instrumentation process (described here) will transform the aforementioned
207 function calls, synchronized blocks/methods and calls to
208 `java.lang.Thread.getCurrentThreadID:()I` into appropriate codet. As part of
209 this process a new variable per thread is created along with a single global
210 variable that is used keep track of thread identifiers. These
211 variables are needed to handle calls to
212 `java.lang.Thread.getCurrentThreadID:()I`.
213 
214 Hold on, how do we go from Java threads to `CProver.startThread:(I)V` and
215 `CProver.endThread:(I)V`? Well, two ways
216 
217  - You are free to use the CProver API and manually and insert calls to the
218  `CProver.startThread:(I)V` and `CProver.endThread:(I)V`.
219  - Models! Which provide a mock implementation of the `java.lang.Thread`
220  class.
221 
222 Note: instrumentation of Java static initializers changes when the
223 `--java-threading` flag is specified, this is because the static initializer of
224 every class needs to be carefully synchronized to prevent superfluous
225 interleavings.
226 
227 Note': conversation of synchronized methods (`void synchronized test(){ ... }`) and
228 synchronized blocks (`synchronized(lock) { ... }`) are dealt with differently as
229 they are represented differently in bytecode. One is a flag, while the other
230 has explicit instructions.
231 
232 \subsection converting-thread-blocks Converting Thread Blocks
233 
234 JBMC will iterate through the symbol table to find and instrument
235 thread-blocks. Specifically, function calls to `CProver.startThread:(I)V` are
236 transformed into a `codet(id=ID_start_thread, destination=I)`, which carries a target label in
237 the field `destination`. Similarly `CProver.endThread(I)V` is transformed into
238 a `codet(id=ID_end_thread)`.
239 
240 For each new thread a thread local variable `__CPROVER__thread_id` is created.
241 The new id is obtained by incrementing a global variable
242 `__CPROVER__next_thread_id`.
243 
244 The semantics of `codet(id=ID_start_thread, destination=I)` roughly corresponds to: spawn the
245 current thread, continue the execution of the current thread in the next line,
246 and continue the execution of the new thread at the destination.
247 
248 Example, the following Java code:
249 
250 ```java
251  CProver.startThread(333);
252  ... // thread body
253  CProver.endThread(333);
254 ```
255 
256 is transformed into the following codet:
257 
258 ```C
259  codet(id=ID_start_thread, destination=__CPROVER_THREAD_ENTRY_333)
260  codet(id=ID_goto, destination=__CPROVER_THREAD_EXIT_333)
261  codet(id=ID_label, label=__CPROVER_THREAD_ENTRY_333)
262  codet(id=ID_atomic_begin)
263  __CPROVER__next_thread_id += 1;
264  __CPROVER__thread_id = __CPROVER__next_thread_id;
265  ... // thread body
266  codet(id=ID_end_thread)
267  codet(id=ID_label, label=__CPROVER_THREAD_EXIT_333)
268 ```
269 
270 The ID of the thread is made accessible to the Java program by having calls
271 to the function `CProver.getCurrentThreadID()I` replaced by the variable
272 `__CPROVER__thread_id`.
273 
274 Example, the following Java code:
275 
276 ```java
277  int g = java.lang.thead.getCurrentThreadID();
278 ```
279 
280 is transformed into the following codet:
281 
282 ```C
283  code_assignt(lhs=g, rhs=__CPROVER__thread_id)
284 ```
285 
286 \subsection converting-synchronized-blocks Converting Synchronized Blocks
287 
288 Synchronized blocks make it impossible for two synchronized blocks on the same
289 object to interleave.
290 
291 Converting synchronized blocks is rather straightforward as there is a specific
292 bytecode instruction to indicate the start (`monitorenter`) and end (`monitorexit`)
293 of a synchronized block.
294 
295 `monitorenter` is converted to a call to
296 `java::java.lang.Object.monitorenter:(Ljava/lang/Object;)V`.
297 
298 `monitorexit` is converted to a call to
299 `java::java.lang.Object.monitorexit:(Ljava/lang/Object;)V`.
300 
301 \subsection converting-synchronized-methods Converting Synchronized Methods
302 
303 Synchronized methods make it impossible for two invocations of the same method
304 on the same object to interleave.
305 
306 A method is considered to be synchronized and thus subject to the following
307 process, if the symbol that represents it has the irep_id `is_synchronized`
308 specified. In Java this corresponds to the `synchronized` keyword in the
309 declaration of a method.
310 
311 Unlike synchronized blocks Java synchronized methods do not have explicit calls
312 to the instructions `monitorenter` and `monitorexit` (or equivalents), instead
313 the JVM interprets the synchronized flag and internally implements
314 locking/unlocking on the object.
315 
316 To emulate this behaviour JBMC will iterate through the symbol table to find
317 synchronized methods. It will then insert a call to our model of `monitorenter`
318 (`java::java.lang.Object.monitorenter:(Ljava/lang/Object;)V`) at the beginning
319 of the method and calls to our model of `monitorexit`
320 (`java::java.lang.Object.monitorexit:(Ljava/lang/Object;)V`) is instrumented at
321 each return instruction. We wrap the entire body of the method with an
322 exception handler that will call our model of `monitorexit` if the method returns
323 exceptionally.
324 
325 Example, the following Java code:
326 
327 ```java
328 synchronized int amethod(int p)
329  {
330  int x = 0;
331  if(p == 0)
332  return -1;
333  x = p / 10
334  return x
335  }
336 ```
337 
338 The codet version of the above is too verbose be shown here. Instead the following code depicts
339 our equivalent Java code interpretation of the above:
340 
341 ```java
342 int amethod(int p)
343 {
344  try
345  {
346  java::java.lang.Object.monitorenter(this);
347  int x = 0;
348  if(p == 0)
349  {
350  java::java.lang.Object.monitorexit(this);
351  return -1;
352  }
353  java::java.lang.Object.monitorexit(this);
354  return x
355  }
356  catch(java::java.lang.Throwable e)
357  {
358  // catch every exception, including errors!
359  java::java.lang.Object.monitorexit(this);
360  throw e;
361  }
362 }
363 ```
364 
365 \section java-bytecode-remove-java-new Remove `new`, `newarray` and `multianewarray` bytecode operators
366 
367 \ref remove_java_new.h is responsible for converting the `new`, `newarray` and
368 `multianewarray` Java bytecode operation into \ref codet. Specifically it
369 converts the bytecode instruction into: - An ALLOCATE with the size of the
370 object being created - An assignment to the value zeroing its contents - If an
371 array, initializing the size and data components - If a multi-dimensional
372 array, recursively calling `java_new` on each sub array
373 
374 An ALLOCATE is a \ref side_effect_exprt that is interpreted by \ref
375 goto_symext::symex_allocate
376 
377 _Note: it does not call the constructor as this is done by a separate
378 java_bytecode operation._
379 
380 \subsection java_objects Java Objects (`new`)
381 
382 The basic `new` operation is represented in Java bytecode by the `new` op
383 
384 These are converted by \ref remove_java_newt::lower_java_new
385 
386 For example, the following Java code:
387 
388 ```java
389 TestClass f = new TestClass();
390 ```
391 
392 Which is represented as the following Java bytecode:
393 
394 ```
395 0: new #2 // class TestClass
396 3: dup
397 4: invokespecial #3 // Method "<init>":()V
398 ```
399 
400 The first instruction only is translated into the following `codet`s:
401 
402 ```cpp
403 tmp_object1 = side_effect_exprt(ALLOCATE, sizeof(TestClass))
404 *tmp_object1 = struct_exprt{.component = 0, ... }
405 ```
406 
407 For more details about the zero expression see \ref expr_initializert
408 
409 \subsection oned_arrays Single Dimensional Arrays (`newarray`)
410 
411 The new Java array operation is represented in Java bytecode by the `newarray`
412 operation.
413 
414 These are converted by \ref remove_java_newt::lower_java_new_array
415 
416 See \ref java-bytecode-array-representation for details on how arrays are
417 represented in codet.
418 
419 A `newarray` is represented as:
420  - an allocation of the array object (the same as with a regular Java object).
421  - Initialize the size component
422  - Initialize the data component
423 
424 For example the following Java:
425 
426 ```java
427 TestClass[] tArray = new TestClass[5];
428 ```
429 
430 Which is compiled into the following Java bytecode:
431 
432 ```
433 8: iconst_5
434 9: anewarray #2 // class TestClass
435 ```
436 
437 Is translated into the following `codet`s:
438 
439 ```cpp
440 tmp_object1 = side_effect_exprt(ALLOCATE, sizeof(java::array[referenence]))
441 *tmp_object1 = struct_exprt{.size = 0, .data = null}
442 tmp_object1->length = length
443 tmp_new_data_array1 = side_effect_exprt(java_new_array_data, TestClass)
444 tmp_object1->data = tmp_new_data_array1
445 ARRAY_SET(tmp_new_data_array1, NULL)
446 ```
447 
448 The `ARRAY_SET` `codet` sets all the values to null.
449 
450 \subsection multidarrays Multi Dimensional Arrays (`newmultiarray`)
451 
452 The new Java multi dimensional array operation is represented in bytecode by
453 `multianewarray`
454 
455 These are also by \ref remove_java_newt::lower_java_new_array
456 
457 They work the same as single dimensional arrays but create a for loop for each
458 element in the array since these start initialized.
459 
460 ```cpp
461 for(tmp_index = 0; tmp_index < dim_size; ++tmp_index)
462 {
463  struct java::array[reference] *subarray_init;
464  subarray_init = java_new_array
465  newarray_tmp1->data[tmp_index] = (void *)subarray_init;
466 }
467 ```
468 
469 `remove_java_new` is then recursively applied to the new `subarray`.
470 
471 \section java-bytecode-remove-exceptions Remove exceptions
472 
473 When \ref remove_exceptions is called on the \ref goto_modelt, the
474 goto model contains complex instructions (\ref goto_program_instruction_typet)
475 such as `CATCH-POP`, `CATCH-PUSH` and `THROW`. In order to analyze the goto
476 model, the instructions must be simplified to use more basic instructions - this
477 is called "lowering". This class lowers the `CATCH` and `THROW` instructions.
478 
479 `THROW` instructions are replaced by assigning to `@inflight_exception` and a
480 goto to end of the function. `CATCH` instructions are replaced by a check of
481 the `@inflight_exception` and setting it to null.
482 
483 \subsection throw THROW
484 
485 Consider a simple method `testException(I)V`:
486 
487 ```java
488 public class TestExceptions {
489  int field;
490  public void testException(int i) throws Exception {
491  if (i < 0) {
492  throw new Exception();
493  }
494  field = i;
495  }
496 }
497 ```
498 
499 The goto for `testException(I)V` before `remove_exceptions` (removing comments
500 and replacing irrelevant parts with `...`) is:
501 
502 ```
503 TestExceptions.testException(int) /* java::TestExceptions.testException:(I)V */
504  IF i >= 0 THEN GOTO 3
505  struct java.lang.Exception *new_tmp0;
506  new_tmp0 = new struct java.lang.Exception;
507  IF !(new_tmp0 == null) THEN GOTO 1
508 
509 ...
510 
511 1: SKIP
512  new_tmp0 . java.lang.Exception.<init>:()V();
513 
514 ...
515 
516 2: SKIP
517  THROW: throw(new_tmp0)
518  dead new_tmp0;
519 
520 ...
521 
522 ```
523 where there is a `THROW` instruction to be replaced.
524 
525 After passing the goto model through `remove_exceptions`, it is:
526 
527 ```
528 TestExceptions.testException(int) /* java::TestExceptions.testException:(I)V */
529  IF i >= 0 THEN GOTO 4
530  struct java.lang.Exception *new_tmp0;
531  new_tmp0 = new struct java.lang.Exception;
532  IF !(new_tmp0 == null) THEN GOTO 1
533 
534 ...
535 
536 1: new_tmp0 . java.lang.Exception.<init>:()V();
537  IF @inflight_exception == null THEN GOTO 2 // it is because we've not used it yet
538 
539 ...
540 
541 2: IF !(new_tmp0 == null) THEN GOTO 3
542 
543 ...
544 
545 3: @inflight_exception = (void *)new_tmp0;
546  dead new_tmp0;
547 
548 ...
549 
550 ```
551 where now instead of the instruction `THROW`, the global variable
552 `@inflight_exception` holds the thrown exception in a separate goto statement.
553 
554 \subsection catch CATCH-PUSH, CATCH-POP and EXCEPTION LANDING PAD
555 
556 Consider the method `catchSomething(I)V` that tries the above method
557 `testException(I)V` and catches the exception:
558 
559 ```java
560 public class TestExceptions {
561  int field;
562  public void testException(int i) throws Exception {
563  if (i < 0) {
564  throw new Exception();
565  }
566  field = i;
567  }
568  public void catchSomething(int i) {
569  try {
570  testException(i);
571  } catch (Exception e) {
572  }
573  }
574 }
575 ```
576 
577 The goto model before `remove_exceptions` is:
578 ```
579 TestExceptions.catchSomething(int) /* java::TestExceptions.catchSomething:(I)V */
580 
581 ...
582 
583  CATCH-PUSH ->2
584 
585 ...
586 
587 1: SKIP
588  this . com.diffblue.regression.TestExceptions.testException:(I)V(i);
589  CATCH-POP
590  GOTO 3
591 2: void *anonlocal::2a;
592  struct java.lang.Exception *caught_exception_tmp0;
593  EXCEPTION LANDING PAD (struct java.lang.Exception * caught_exception_tmp0)
594  anonlocal::2a = (void *)caught_exception_tmp0;
595  dead caught_exception_tmp0;
596  dead anonlocal::2a;
597 
598 ...
599 
600 ```
601 The `CATCH-PUSH` instruction signifies the start of the try block, the
602 `CATCH-POP` instruction signifies the end of the try block, and the `EXCEPTION
603 LANDING PAD` signifies beginning of the catch block.
604 
605 After `remove_exceptions` the goto model is:
606 
607 ```
608 TestExceptions.catchSomething(int) /* java::TestExceptions.catchSomething:(I)V */
609 
610 ...
611 
612  this . com.diffblue.regression.TestExceptions.testException:(I)V(i);
613  IF @inflight_exception == null THEN GOTO 3 // true if testException does not throw, method terminates normally
614  IF !(@inflight_exception == null) THEN GOTO 1 // true if testException throws, enters catch block
615 
616 ...
617 
618 1: __CPROVER_string class_identifier_tmp;
619  class_identifier_tmp = ((struct java.lang.Object *)@inflight_exception)->@class_identifier;
620  instanceof_result_tmp = class_identifier_tmp == "java::java.lang.Exception" || ... ; // TRUE
621  dead class_identifier_tmp;
622 2: IF instanceof_result_tmp THEN GOTO 4
623 
624 ...
625 
626 3: ASSERT false // block 3
627  GOTO 5
628 4: void *anonlocal::2a; // The catch block
629  struct java.lang.Exception *caught_exception_tmp0;
630  ASSERT false // block 4
631  caught_exception_tmp0 = (struct java.lang.Exception *)@inflight_exception;
632  @inflight_exception = null;
633  anonlocal::2a = (void *)caught_exception_tmp0;
634  dead caught_exception_tmp0;
635  dead anonlocal::2a;
636 5: ASSERT false // block 5
637 6: END_FUNCTION
638 ```
639 where the `CATCH-PUSH` has been replaced by a check on the `@inflight_exception`
640 variable and goto statements, the `CATCH-POP` replaced by a check on the class
641 of the exception and a goto statement, and the `EXCEPTION LANDING PAD` replaced
642 by a section that assigns the exception to a local variable and sets the
643 `@inflight_exception` back to null.
644 
645 \section java-bytecode-remove-instanceof Remove instanceof
646 
647 \ref remove_instanceof.h removes the bytecode instruction `instanceof ` and replaces it with two instructions:
648  - check whether the pointer is null
649  - if not null, does the class identifier match the type of any of its subtypes
650 
651 \section java-bytecode-method-stubbing Method stubbing
652 
653 To be documented.
654 
655 \section loading-strategies Loading strategies
656 
657 Loading strategies are policies that determine what classes and method bodies
658 are loaded. On an initial pass, the symbols for all classes in all paths on
659 the Java classpath are loaded. Eager loading is a policy that then loads
660 all the method bodies for every one of these classes. Lazy loading
661 strategies are policies that only load class symbols and/or method bodies
662 when they are in some way requested.
663 
664 The mechanism used to achieve this is initially common to both eager and
665 context-insensitive lazy loading. \ref java_bytecode_languaget::typecheck calls
666 [java_bytecode_convert_class](\ref java_bytecode_languaget::java_bytecode_convert_class)
667 (for each class loaded by the class loader) to create symbols for all classes
668 and the methods in them. The methods, along with their parsed representation
669 (including bytecode) are also added to a map called
670 \ref java_bytecode_languaget::method_bytecode via a reference held in
671 \ref java_bytecode_convert_classt::method_bytecode. typecheck then performs a
672 switch over the value of
673 [lazy_methods_mode](\ref java_bytecode_languaget::lazy_methods_mode) to
674 determine which loading strategy to use.
675 
676 \subsection eager-loading Eager loading
677 
678 If [lazy_methods_mode](\ref java_bytecode_languaget::lazy_methods_mode) is
679 \ref java_bytecode_languaget::LAZY_METHODS_MODE_EAGER then eager loading is
680 used. Under eager loading
681 \ref java_bytecode_languaget::convert_single_method(const irep_idt &, symbol_table_baset &)
682 is called once for each method listed in method_bytecode (described above). This
683 then calls
684 \ref java_bytecode_languaget::convert_single_method(const irep_idt &, symbol_table_baset &, optionalt<ci_lazy_methods_neededt>)
685 without a ci_lazy_methods_neededt object, which calls
686 \ref java_bytecode_convert_method, passing in the method parse tree. This in
687 turn uses \ref java_bytecode_convert_methodt to turn the bytecode into symbols
688 for the parameters and local variables and finish populating the symbol for the
689 method, including converting the instructions to a codet.
690 
691 \subsection java-bytecode-lazy-methods-v1 Context-Insensitive lazy methods (aka lazy methods v1)
692 
693 Context-insensitive lazy loading is an alternative method body loading strategy
694 to eager loading that has been used in Deeptest for a while.
695 Context-insensitive lazy loading starts at the method given by the `--function`
696 argument and follows type references (e.g. in the definitions of fields and
697 method parameters) and function references (at function call sites) to
698 locate further classes and methods to load. The following paragraph describes
699 the mechanism used.
700 
701 If [lazy_methods_mode](\ref java_bytecode_languaget::lazy_methods_mode) is
702 \ref lazy_methods_modet::LAZY_METHODS_MODE_CONTEXT_INSENSITIVE then
703 context-insensitive lazy loading is used. Under this stragegy
704 \ref java_bytecode_languaget::do_ci_lazy_method_conversion is called to do all
705 conversion. This calls
706 \ref ci_lazy_methodst::operator()(symbol_tablet &, method_bytecodet &, const method_convertert &),
707 which creates a work list of methods to check, starting with the entry point,
708 and classes, starting with the types of any class-typed parameters to the entry
709 point. For each method in the work list it calls
710 \ref ci_lazy_methodst::convert_and_analyze_method, which calls the same
711 java_bytecode_languaget::convert_single_method used by eager loading to do the
712 conversion (via a `std::function` object passed in via parameter
713 method_converter) and then calls
714 \ref ci_lazy_methodst::gather_virtual_callsites to locate virtual calls. Any
715 classes that may implement an override of the virtual function called are added
716 to the work list. Finally the symbol table is iterated over and methods that
717 have been converted, their parameters and local variables, globals accessed
718 from these methods and classes are kept, everything else is thrown away. This
719 leaves a symbol table that contains reachable symbols fully populated,
720 including the instructions for methods converted to a \ref codet.
721 
722 \section java-bytecode-core-models-library Core Models Library
723 
724 To be documented.
725 
726 \section java-bytecode-java-types java_types
727 
728 To be documented.
729 
730 \section java-bytecode-string-library String library
731 
732 To be documented.
733 
734 See also \ref string-solver-interface.
735 
736 \section java-bytecode-conversion-example-section A worked example of converting java bytecode to codet
737 
738 To be documented.