If the function tracer is enabled, allow to set kprobes on the first
instruction of a function (which is the function trace caller):
If no kprobe is set handling of enabling and disabling function tracing
of a function simply patches the first instruction. Either it is a nop
(right now it's an unconditional branch, which skips the mcount block),
or it's a branch to the ftrace_caller() function.
If a kprobe is being placed on a function tracer calling instruction
we encode if we actually have a nop or branch in the remaining bytes
after the breakpoint instruction (illegal opcode).
This is possible, since the size of the instruction used for the nop
and branch is six bytes, while the size of the breakpoint is only
two bytes.
Therefore the first two bytes contain the illegal opcode and the last
four bytes contain either "0" for nop or "1" for branch. The kprobes
code will then execute/simulate the correct instruction.
Instruction patching for kprobes and function tracer is always done
with stop_machine(). Therefore we don't have any races where an
instruction is patched concurrently on a different cpu.
Besides that also the program check handler which executes the function
trace caller instruction won't be executed concurrently to any
stop_machine() execution.
This allows to keep full fault based kprobes handling which generates
correct pt_regs contents automatically.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
31 bit and 64 bit diverge more and more and it is rather painful
to keep both parts running.
To make things simpler just remove the 31 bit support which nobody
uses anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull kbuild misc updates from Michal Marek:
"This is the non-critical part of kbuild for v3.16-rc1:
- make deb-pkg can do s390x and arm64
- new patterns in scripts/tags.sh
- scripts/tags.sh skips userspace tools' sources (which sometimes
have copies of kernel structures) and symlinks
- improvements to the objdiff tool
- two new coccinelle patches
- other minor fixes"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts: objdiff: support directories for the augument of record command
scripts: objdiff: fix a comment
scripts: objdiff: change the extension of disassembly from .o to .dis
scripts: objdiff: improve path flexibility for record command
scripts: objdiff: remove unnecessary code
scripts: objdiff: direct error messages to stderr
scripts: objdiff: get the path to .tmp_objdiff more simply
deb-pkg: Add automatic support for s390x architecture
coccicheck: Add unneeded return variable test
kbuild: Fix a typo in documentation
kbuild: trivial - use tabs for code indent where possible
kbuild: trivial - remove trailing empty lines
coccinelle: Check for missing NULL terminators in of_device_id tables
scripts/tags.sh: ignore symlink'ed source files
scripts/tags.sh: add regular expression replacement pattern for memcg
builddeb: add arm64 in the supported architectures
builddeb: use $OBJCOPY variable instead of objcopy
scripts/tags.sh: ignore code of user space tools
scripts/tags.sh: add pattern for DEFINE_HASHTABLE
.gitignore: ignore Module.symvers in all directories
Recordmcount utility under scripts is run, after compiling each object,
to find out all the locations of calling _mcount() and put them into
specific seciton named __mcount_loc.
Then linker collects all such information into a table in the kernel image
(between __start_mcount_loc and __stop_mcount_loc) for later use by ftrace.
This patch adds arm64 specific definitions to identify such locations.
There are two types of implementation, C and Perl. On arm64, only C version
is used to build the kernel now that CONFIG_HAVE_C_RECORDMCOUNT is on.
But Perl version is also maintained.
This patch also contains a workaround just in case where a header file,
elf.h, on host machine doesn't have definitions of EM_AARCH64 nor
R_AARCH64_ABS64. Without them, compiling C version of recordmcount will
fail.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Do the mcount offset adjustment in the recordmcount.pl/recordmcount.[ch]
at compile time and not in ftrace_call_adjust at run time.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Do the mcount offset adjustment in the recordmcount.pl/recordmcount.[ch]
at compile time and not in ftrace_call_adjust at run time.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There's some sections that should not have mcount recorded and should not have
modifications to the that code. But currently they waste some time by calling
mcount anyway (which simply returns). As the real answer should be to
either whitelist the section or have gcc ignore it fully.
This change adds a option to recordmcount to warn when it finds a section
that is ignored by ftrace but still contains mcount callers. This is not on
by default as developers may not know if the section should be completely
ignored or added to the whitelist.
Cc: John Reiser <jreiser@bitwagon.com>
Link: http://lkml.kernel.org/r/20110421023738.476989377@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There are sections that are ignored by ftrace for the function tracing because
the text is in a section that can be removed without notice. The mcount calls
in these sections are ignored and ftrace never sees them. The downside of this
is that the functions in these sections still call mcount. Although the mcount
function is defined in assembly simply as a return, this added overhead is
unnecessary.
The solution is to convert these callers into nops at compile time.
A better solution is to add 'notrace' to the section markers, but as new sections
come up all the time, it would be nice that they are delt with when they
are created.
Later patches will deal with finding these sections and doing the proper solution.
Thanks to H. Peter Anvin for giving me the right nops to use for x86.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Reiser <jreiser@bitwagon.com>
Link: http://lkml.kernel.org/r/20110421023738.237101176@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The .kprobe.text section is safe to modify mcount to nop and tracing.
Add it to the whitelist in recordmcount.c and recordmcount.pl.
Cc: John Reiser <jreiser@bitwagon.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20110421023737.743350547@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The Linux style for switch statements is:
switch (var) {
case x:
[...]
break;
}
Not:
switch (var) {
case x: {
[...]
} break;
Cc: John Reiser <jreiser@bitwagon.com>
Link: http://lkml.kernel.org/r/20110421023737.523968644@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The Linux ftrace subsystem style for comparing is:
var == 1
var > 0
and not:
1 == var
0 < var
It is considered that Linux developers are smart enough not to do the
if (var = 1)
mistake.
Cc: John Reiser <jreiser@bitwagon.com>
Link: http://lkml.kernel.org/r/20110421023737.290712238@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The section .ref.text will not go away unexpectedly and is
safe to trace. Add it to the safe list of sections to allow
tracing.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Depending on the compiler version, ARM GCC calls the mcount function
either __gnu_mcount_nc or mcount.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
arch/arm/kernel/ftrace.c references mcount like kernel/tracing/ftrace.c,
so change the exclusion filter to match any ftrace.o.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Since MIPS modules' address space differs from the core kernel space, to access
the _mcount in the core kernel, the kernel functions in modules must use long
call (-mlong-calls): load the _mcount address into one register and jump to the
address stored by the register:
c: 3c030000 lui v1,0x0 <--------> b label
c: R_MIPS_HI16 _mcount
c: R_MIPS_NONE *ABS*
c: R_MIPS_NONE *ABS*
10: 64630000 daddiu v1,v1,0
10: R_MIPS_LO16 _mcount
10: R_MIPS_NONE *ABS*
10: R_MIPS_NONE *ABS*
14: 03e0082d move at,ra
18: 0060f809 jalr v1
label:
In the old Perl version of recordmcount, we only need to record the position of
the 1st R_MIPS_HI16 type of _mcount, and later, in ftrace_make_nop(), replace
the instruction in this position by a "b label" and in ftrace_make_call(),
replace it back.
But, the default C version of recordmcount records all of the _mcount symbols,
so, we must filter the 2nd _mcount like the Perl version of recordmcount does.
The C version of recordmcount copes with the symbols before they are linked, So
It doesn't know the type of the symbols and therefore can not filter the
symbols as the Perl version of recordmcount does. But as we can see above, the
2nd _mcount symbols of the long call alawys follows the 1st _mcount symbol of
the same long call, which means the offset from the 1st to the 2nd is fixed, it
is 0x10-0xc = 4 here, 4 is the length of the 1st load instruciton, for MIPS has
fixed length of instructions, this offset is always 4.
And as we know, the _mcount is inserted into the entry of every kernel
function, the offset between the other _mcount's is expected to be always
bigger than 4. So, to filter the 2ns _mcount symbol of the long call, we can
simply check the offset between two _mcount symbols, If it is 4, then, filter
the 2nd _mcount symbol.
To avoid touching too much code, an 'empty' function fn_is_fake_mcount() is
added for all of the archs, and the specific archs can override it via chaning
the function pointer: is_fake_mcount in do_file() with the e_machine. e.g. This
patch adds MIPS_is_fake_mcount() to override the default fn_is_fake_mcount()
pointed by is_fake_mcount.
This fn_is_fake_mcount() checks if the _mcount symbol is fake, e.g. the 2nd
_mcount symbol of the long call is fake, for there are 2 _mcount symbols mapped
to one real mcount call, so, one of them is fake and must be filtered.
This fn_is_fake_mcount() is called in sift_rel_mcount() after finding the
_mcount symbols and before adding the _mcount symbol into mrelp, so, it can
prevent the fake mcount symbol going into the last __mcount_loc table.
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
LKML-Reference: <b866f0138224340a132d31861fa3f9300dee30ac.1288176026.git.wuzhangjin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
MIPS64 has 'weird' Elf64_Rel.r_info[1,2], which must be used instead of
the generic Elf64_Rel.r_info, otherwise, the C version of recordmcount
will not work for "segmentation fault".
Usage of "union mips_r_info" and the functions MIPS64_r_sym() and
MIPS64_r_info() written by Maciej W. Rozycki <macro@linux-mips.org>
----
[1] http://techpubs.sgi.com/library/manuals/4000/007-4658-001/pdf/007-4658-001.pdf
[2] arch/mips/include/asm/module.h
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Signed-off-by: John Reiser <jreiser@BitWagon.com>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
LKML-Reference: <AANLkTinwXjLAYACUfhLYaocHD_vBbiErLN3NjwN8JqSy@mail.gmail.com>
LKML-Reference: <910dc2d5ae1ed042df4f96815fe4a433078d1c2a.1288176026.git.wuzhangjin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The file kernel/trace/ftrace.c references the mcount() call to
convert the mcount() callers to nops. But because it references
mcount(), the mcount() address is placed in the relocation table.
The C version of recordmcount reads the relocation table of all
object files, and it will add all references to mcount to the
__mcount_loc table that is used to find the places that call mcount()
and change the call to a nop. When recordmcount finds the mcount reference
in kernel/trace/ftrace.o, it saves that location even though the code
is not a call, but references mcount as data.
On boot up, when all calls are converted to nops, the code has a safety
check to determine what op code it is actually replacing before it
replaces it. If that op code at the address does not match, then
a warning is printed and the function tracer is disabled.
The reference to mcount in ftrace.c, causes this warning to trigger,
since the reference is not a call to mcount(). The ftrace.c file is
not compiled with the -pg flag, so no calls to mcount() should be
expected.
This patch simply makes recordmcount.c skip the kernel/trace/ftrace.c
file. This was the same solution used by the perl version of
recordmcount.
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The elf reader for recordmcount.c had duplicate functions for both
32 bit and 64 bit elf handling. This was due to the need of using
the 32 and 64 bit elf structures.
This patch consolidates the two by using macros to define the 32
and 64 bit names in a recordmcount.h file, and then by just defining
a RECORD_MCOUNT_64 macro and including recordmcount.h twice we
create the funtions for both the 32 bit version as well as the
64 bit version using one code source.
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently, the mcount callers are found with a perl script that does
an objdump on every file in the kernel. This is a C version of that
same code which should increase the performance time of compiling
the kernel with dynamic ftrace enabled.
Signed-off-by: John Reiser <jreiser@bitwagon.com>
[ Updated the code to include .text.unlikely section as well as
changing the format to follow Linux coding style. ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>