7a8e76a382
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.

The events recorded into the buffer have the following structure:

  struct ring_buffer_event {
	u32 type:2, len:3, time_delta:27;
	u32 array[];
  };

The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.

There are 4 types (all for internal use by the ring buffer; only
the data type is exported to the interface users).

 RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
	of a buffer page.

 RINGBUF_TYPE_TIME_EXTENT: this type is used when the time between events
	is greater than the 27 bit delta can hold. We add another
	32 bits, and record that in its own event (8 byte size).

 RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
	help keep the buffer timestamps in sync.

 RINGBUF_TYPE_DATA: the event actually holds user data.

The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes.

If the data load is greater than 28 bytes, the first array field
holds the full length of the data load and the len field is set
to zero.

Example, data size of 7 bytes:

	type = RINGBUF_TYPE_DATA
	len = 2
	time_delta: <time-stamp> - <prev_event-time-stamp>
	array[0..1]: <7 bytes of data> <1 byte empty>

This event is saved in 12 bytes of the buffer.

An event with 82 bytes of data:

	type = RINGBUF_TYPE_DATA
	len = 0
	time_delta: <time-stamp> - <prev_event-time-stamp>
	array[0]: 84 (Note the alignment)
	array[1..21]: <82 bytes of data> <2 bytes empty>

The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.

Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.

ring_buffer_event_length(event): get the length of the event.
	This is the size of the memory used to record this
	event, and not the size of the data payload.

ring_buffer_time_delta(event): get the time delta of the event.
	This returns the delta time stamp since the last event.
	Note: even though this is in the header, there should
		be no reason to access this directly, except
		for debugging.

ring_buffer_event_data(event): get the data from the event.
	This is the function to use to get the actual data
	from the event. Note, it is only a pointer to the
	data inside the buffer. This data must be copied to
	another location, otherwise you risk it being written
	over in the buffer.

ring_buffer_lock: a way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.

ring_buffer_alloc: create a new ring buffer. Can choose between
	overwrite or consumer/producer mode. Overwrite will
	overwrite old data, whereas consumer/producer will
	throw away new data if the consumer catches up with the
	producer. Consumer/producer is the default.

ring_buffer_free: free the ring buffer.

ring_buffer_resize: resize the buffer. Changes the size of each cpu
	buffer. Note, it is up to the caller to ensure that
	the buffer is not being used while this is happening.
	This requirement may go away but do not count on it.

ring_buffer_lock_reserve: locks the ring buffer and allocates an
	entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits the
	entry to the buffer.

ring_buffer_write: writes some data into the ring buffer.

ring_buffer_peek: look at the next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
	consume it. That is, this function increments the head
	pointer.

ring_buffer_read_start: start an iterator of a cpu buffer.
	For now, this disables the cpu buffer, until you issue
	a finish. This is just because we do not want the iterator
	to be overwritten. This restriction may change in the future.
	But note, this is used for static reading of a buffer, which
	is usually done "after" a trace. Live readings would want
	to use the ring_buffer_consume above, which will not
	disable the ring buffer.

ring_buffer_read_finish: finishes the read iterator and reenables
	the ring buffer.

ring_buffer_iter_peek: look at the next item in the cpu iterator.
ring_buffer_read: read the iterator and increment it.
ring_buffer_iter_reset: reset the iterator to point to the beginning
	of the cpu buffer.
ring_buffer_iter_empty: returns true if the iterator is at the end
	of the cpu buffer.

ring_buffer_size: returns the size in bytes of each cpu buffer.
	Note, the real size is this times the number of CPUs.

ring_buffer_reset_cpu: sets the cpu buffer to empty.
ring_buffer_reset: sets all cpu buffers to empty.

ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
	cpu buffer of another buffer. This is handy when you
	want to take a snapshot of a running trace on just one
	cpu. Having a backup buffer to swap with facilitates this.
	Ftrace max latencies use this.

ring_buffer_empty: returns true if the ring buffer is empty.
ring_buffer_empty_cpu: returns true if the cpu buffer is empty.

ring_buffer_record_disable: disable all cpu buffers (read only).
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only).
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enable_cpu: enable a single cpu buffer.

ring_buffer_entries: the number of entries in a ring buffer.
ring_buffer_overruns: the number of entries removed due to writing wrap.

ring_buffer_time_stamp: get the time stamp used by the ring buffer.
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
	into nanoseconds.

I still need to implement the GTOD feature. But we need support from
the cpu frequency infrastructure. But this can be done at a later
time without affecting the ring buffer interface.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
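
The length encoding described in the commit message can be checked with a small user-space sketch. This is only an illustration of the stated rules (4 byte header, 4 byte alignment, 3-bit len field shifted left by 2, array[0] holding the length when the payload exceeds 28 bytes). The names event_layout and layout_for are hypothetical and are not part of the ring buffer interface, and the sketch assumes a payload of at least one byte.

```c
#include <assert.h>
#include <stdint.h>

#define HDR_SIZE  4u   /* one u32 holding type:2, len:3, time_delta:27 */
#define ALIGNMENT 4u   /* events are 4 byte aligned */
#define MAX_SMALL 28u  /* largest length the 3-bit len field can encode: 7 << 2 */

/* Hypothetical helper type: how an event of a given payload size is laid out. */
struct event_layout {
	uint32_t len_field;   /* value stored in the 3-bit len field */
	uint32_t array0;      /* length stored in array[0], or 0 if unused */
	uint32_t total_size;  /* bytes the event occupies in the buffer */
};

/* Assumption: data_len >= 1, since the text does not describe empty payloads. */
static struct event_layout layout_for(uint32_t data_len)
{
	struct event_layout l;
	uint32_t aligned;

	assert(data_len > 0);

	/* Round the payload up to the next multiple of 4 bytes. */
	aligned = (data_len + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

	if (aligned <= MAX_SMALL) {
		/* Small event: the length fits in the header itself. */
		l.len_field = aligned >> 2;
		l.array0 = 0;
		l.total_size = HDR_SIZE + aligned;
	} else {
		/* Large event: len is 0 and array[0] carries the length. */
		l.len_field = 0;
		l.array0 = aligned;
		l.total_size = HDR_SIZE + 4u + aligned;
	}
	return l;
}
```

Under these assumptions, layout_for(7) gives len = 2 and a total of 12 bytes, and layout_for(82) gives len = 0, array[0] = 84 and a total of 92 bytes, which agrees with the two examples in the commit message.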
184 lines
5.0 KiB
Plaintext
#
# Architectures that offer an FTRACE implementation should select HAVE_FTRACE:
#

config NOP_TRACER
	bool

config HAVE_FTRACE
	bool
	select NOP_TRACER

config HAVE_DYNAMIC_FTRACE
	bool

config HAVE_FTRACE_MCOUNT_RECORD
	bool

config TRACER_MAX_TRACE
	bool

config RING_BUFFER
	bool

config TRACING
	bool
	select DEBUG_FS
	select RING_BUFFER
	select STACKTRACE
	select TRACEPOINTS

config FTRACE
	bool "Kernel Function Tracer"
	depends on HAVE_FTRACE
	depends on DEBUG_KERNEL
	select FRAME_POINTER
	select TRACING
	select CONTEXT_SWITCH_TRACER
	help
	  Enable the kernel to trace every kernel function. This is done
	  by using a compiler feature to insert a small, 5-byte No-Operation
	  instruction at the beginning of every kernel function. This NOP
	  sequence is then dynamically patched into a tracer call when
	  tracing is enabled by the administrator. If tracing is disabled at
	  runtime (the bootup default), then the overhead of the instructions
	  is very small and not measurable even in micro-benchmarks.

config IRQSOFF_TRACER
	bool "Interrupts-off Latency Tracer"
	default n
	depends on TRACE_IRQFLAGS_SUPPORT
	depends on GENERIC_TIME
	depends on HAVE_FTRACE
	depends on DEBUG_KERNEL
	select TRACE_IRQFLAGS
	select TRACING
	select TRACER_MAX_TRACE
	help
	  This option measures the time spent in irqs-off critical
	  sections, with microsecond accuracy.

	  The default measurement method is a maximum search, which is
	  disabled by default and can be (re-)started at runtime
	  via:

	      echo 0 > /debugfs/tracing/tracing_max_latency

	  (Note that kernel size and overhead increase with this option
	  enabled. This option and the preempt-off timing option can be
	  used together or separately.)

config PREEMPT_TRACER
	bool "Preemption-off Latency Tracer"
	default n
	depends on GENERIC_TIME
	depends on PREEMPT
	depends on HAVE_FTRACE
	depends on DEBUG_KERNEL
	select TRACING
	select TRACER_MAX_TRACE
	help
	  This option measures the time spent in preemption-off critical
	  sections, with microsecond accuracy.

	  The default measurement method is a maximum search, which is
	  disabled by default and can be (re-)started at runtime
	  via:

	      echo 0 > /debugfs/tracing/tracing_max_latency

	  (Note that kernel size and overhead increase with this option
	  enabled. This option and the irqs-off timing option can be
	  used together or separately.)

config SYSPROF_TRACER
	bool "Sysprof Tracer"
	depends on X86
	select TRACING
	help
	  This tracer provides the trace needed by the 'Sysprof' userspace
	  tool.

config SCHED_TRACER
	bool "Scheduling Latency Tracer"
	depends on HAVE_FTRACE
	depends on DEBUG_KERNEL
	select TRACING
	select CONTEXT_SWITCH_TRACER
	select TRACER_MAX_TRACE
	help
	  This tracer tracks the latency of the highest priority task
	  to be scheduled in, starting from the point it has woken up.

config CONTEXT_SWITCH_TRACER
	bool "Trace process context switches"
	depends on HAVE_FTRACE
	depends on DEBUG_KERNEL
	select TRACING
	select MARKERS
	help
	  This tracer gets called from the context switch and records
	  all switching of tasks.

config BOOT_TRACER
	bool "Trace boot initcalls"
	depends on HAVE_FTRACE
	depends on DEBUG_KERNEL
	select TRACING
	help
	  This tracer helps developers to optimize boot times: it records
	  the timings of the initcalls. Its aim is to be parsed by the
	  /scripts/bootgraph.pl tool to produce pretty graphics about
	  boot inefficiencies, giving a visual representation of the
	  delays during initcalls. Note that tracer self-tests can't
	  be enabled if this tracer is selected, since only one tracer
	  should touch the tracing buffer at a time.

config STACK_TRACER
	bool "Trace max stack"
	depends on HAVE_FTRACE
	depends on DEBUG_KERNEL
	select FTRACE
	select STACKTRACE
	help
	  This tracer records the max stack of the kernel, and displays
	  it in debugfs/tracing/stack_trace

config DYNAMIC_FTRACE
	bool "enable/disable ftrace tracepoints dynamically"
	depends on FTRACE
	depends on HAVE_DYNAMIC_FTRACE
	depends on DEBUG_KERNEL
	default y
	help
	  This option will modify all the calls to ftrace dynamically
	  (will patch them out of the binary image and replace them
	  with a No-Op instruction) as they are called. A table is
	  created to dynamically enable them again.

	  This way a CONFIG_FTRACE kernel is slightly larger, but otherwise
	  has native performance as long as no tracing is active.

	  The changes to the code are done by a kernel thread that
	  wakes up once a second and checks to see if any ftrace calls
	  were made. If so, it runs stop_machine (stops all CPUs)
	  and modifies the code to jump over the call to ftrace.

config FTRACE_MCOUNT_RECORD
	def_bool y
	depends on DYNAMIC_FTRACE
	depends on HAVE_FTRACE_MCOUNT_RECORD

config FTRACE_SELFTEST
	bool

config FTRACE_STARTUP_TEST
	bool "Perform a startup test on ftrace"
	depends on TRACING && DEBUG_KERNEL && !BOOT_TRACER
	select FTRACE_SELFTEST
	help
	  This option performs a series of startup tests on ftrace. On bootup
	  a series of tests are made to verify that the tracer is
	  functioning properly. It will do tests on all the configured
	  tracers of ftrace.