kernel-ark/mm
Li Zefan 55782138e4 tracing/events: convert block trace points to TRACE_EVENT()
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:

  - zero-copy and per-cpu splice() tracing
  - binary tracing without printf overhead
  - structured logging records exposed under /debug/tracing/events
  - trace events embedded in function tracer output and other plugins
  - user-defined, per tracepoint filter expressions
  ...

Cons:

  - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the deivce from a request queue.
    But this may change in the future.

  - A packet command is converted to a string in TP_assign, not TP_print.
    While blktrace do the convertion just before output.

    Since pc requests should be rather rare, this is not a big issue.

  - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

      dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s

So the overhead of tracing is very small, and no regression when using
those trace events vs blktrace.

And the binary output of TRACE_EVENT is much smaller than blktrace:

 # ls -l -h
 -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

Following are some comparisons between TRACE_EVENT and blktrace:

plug:
  kjournald-480   [000]   303.084981: block_plug: [kjournald]
  kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]

unplug_io:
  kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
  kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1

remap:
  kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
  kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384

bio_backmerge:
  kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
  kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]

getrq:
  kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]

  bash-2066  [001]  1072.953770:   8,0    G   N [bash]
  bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]

rq_complete:
  konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
  konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]

  ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
  ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]

rq_insert:
  kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]

Changelog from v2 -> v3:

- use the newly introduced __dynamic_array().

Changelog from v1 -> v2:

- use __string() instead of __array() to minimize the memory required
  to store hex dump of rq->cmd().

- support large pc requests.

- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

- some cleanups.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09 12:34:23 -04:00
..
allocpercpu.c percpu: __percpu_depopulate_mask can take a const mask 2009-04-06 13:44:15 -07:00
backing-dev.c block: change the request allocation/congestion logic to be sync/async based 2009-04-06 08:04:53 -07:00
bootmem.c bootmem, x86: further fixes for arch-specific bootmem wrapping 2009-03-01 16:06:56 +09:00
bounce.c tracing/events: convert block trace points to TRACE_EVENT() 2009-06-09 12:34:23 -04:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c [CVE-2009-0029] System call wrapper special cases 2009-01-14 14:15:18 +01:00
failslab.c kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c 2009-04-03 12:23:01 +02:00
filemap_xip.c mm: do_xip_mapping_read: fix length calculation 2009-04-02 19:04:49 -07:00
filemap.c Export filemap_write_and_wait_range 2009-04-16 07:47:49 -07:00
fremap.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
highmem.c mm: introduce debug_kmap_atomic 2009-04-01 08:59:14 -07:00
hugetlb.c hugetlb: chg cannot become less than 0 2009-04-01 08:59:13 -07:00
internal.h nommu: there is no mlock() for NOMMU, so don't provide the bits 2009-04-01 08:59:14 -07:00
Kconfig nommu: make the initial mmap allocation excess behaviour Kconfig configurable 2009-05-06 16:36:10 -07:00
Kconfig.debug generic debug pagealloc: build fix 2009-04-02 19:04:48 -07:00
maccess.c
madvise.c Revert "Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions" 2009-05-13 08:29:12 -07:00
Makefile generic debug pagealloc 2009-04-01 08:59:13 -07:00
memcontrol.c memcg: fix mem_cgroup_shrink_usage() 2009-05-02 15:36:09 -07:00
memory_hotplug.c mm: remove GFP_HIGHUSER_PAGECACHE 2009-01-06 15:59:01 -08:00
memory.c mm: close page_mkwrite races 2009-05-02 15:36:09 -07:00
mempolicy.c [CVE-2009-0029] System call wrappers part 28 2009-01-14 14:15:30 +01:00
mempool.c
migrate.c FS-Cache: Recruit a page flags for cache management 2009-04-03 16:42:36 +01:00
mincore.c [CVE-2009-0029] System call wrappers part 14 2009-01-14 14:15:24 +01:00
mlock.c x86, bts, mm: clean up buffer allocation 2009-04-24 10:18:52 +02:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: fix Committed_AS underflow on large NR_CPUS environment 2009-05-02 15:36:10 -07:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: mark the correct zone as full when scanning zonelists 2008-09-13 14:41:52 -07:00
mprotect.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
mremap.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c NOMMU: Don't check vm_region::vm_start is page aligned in add_nommu_region() 2009-05-07 12:03:41 -07:00
oom_kill.c oom: prevent livelock when oom_kill_allocating_task is set 2009-05-06 16:36:09 -07:00
page_alloc.c nommu: clamp zone_batchsize() to 0 under NOMMU conditions 2009-05-06 16:36:10 -07:00
page_cgroup.c memcg: remove redundant message at swapon 2009-04-02 19:04:56 -07:00
page_io.c block: fix bad definition of BIO_RW_SYNC 2009-02-18 10:32:00 +01:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c mm: fix proc_dointvec_userhz_jiffies "breakage" 2009-04-01 08:59:13 -07:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c Revert "mm: add /proc controls for pdflush threads" 2009-05-15 11:32:24 +02:00
percpu.c percpu: generalize embedding first chunk setup helper 2009-03-10 16:27:48 +09:00
prio_tree.c
quicklist.c cpumask: replace node_to_cpumask with cpumask_of_node. 2009-03-13 14:49:46 +10:30
readahead.c FS-Cache: Recruit a page flags for cache management 2009-04-03 16:42:36 +01:00
rmap.c mm: fix mlocked page counter mismatch 2009-02-11 14:25:35 -08:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
shmem.c memcg: fix mem_cgroup_shrink_usage() 2009-05-02 15:36:09 -07:00
slab.c tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part 2009-04-12 15:22:55 +02:00
slob.c tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part 2009-04-12 15:22:55 +02:00
slub.c tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part 2009-04-12 15:22:55 +02:00
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c mm: mminit_validate_memmodel_limits(): remove redundant test 2009-04-01 08:59:11 -07:00
swap_state.c memcg: mem+swap controller core 2009-01-08 08:31:05 -08:00
swap.c mm: fix Committed_AS underflow on large NR_CPUS environment 2009-05-02 15:36:10 -07:00
swapfile.c PM/hibernate: fix "swap breaks after hibernation failures" 2009-02-21 14:17:17 -08:00
thrash.c
truncate.c FS-Cache: Recruit a page flags for cache management 2009-04-03 16:42:36 +01:00
util.c Merge branch 'linus' into tracing/core 2009-05-07 11:17:34 +02:00
vmalloc.c alloc_vmap_area: fix memory leak 2009-05-06 16:36:10 -07:00
vmscan.c vmscan: avoid multiplication overflow in shrink_zone() 2009-05-02 15:36:10 -07:00
vmstat.c mm: align vmstat_work's timer 2009-04-02 19:04:48 -07:00