kernel-ark/mm
Jens Axboe 03ba3782e8 writeback: switch to per-bdi threads for flushing data
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1      0 608848   2652 375372    0    0     0 71024  604    24  1 10 48 42
 0  1      0 549644   2712 433736    0    0     0 60692  505    27  1  8 48 44
 1  0      0 476928   2784 505192    0    0     4 29540  553    24  0  9 53 37
 0  1      0 457972   2808 524008    0    0     0 54876  331    16  0  4 38 58
 0  1      0 366128   2928 614284    0    0     4 92168  710    58  0 13 53 34
 0  1      0 295092   3000 684140    0    0     0 62924  572    23  0  9 53 37
 0  1      0 236592   3064 741704    0    0     4 58256  523    17  0  8 48 44
 0  1      0 165608   3132 811464    0    0     0 57460  560    21  0  8 54 38
 0  1      0 102952   3200 873164    0    0     4 74748  540    29  1 10 48 41
 0  1      0  48604   3252 926472    0    0     0 53248  469    29  0  7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  1      0 678716   5792 303380    0    0     0 74064  565    50  1 11 52 36
 1  0      0 662488   5864 319396    0    0     4   352  302   329  0  2 47 51
 0  1      0 599312   5924 381468    0    0     0 78164  516    55  0  9 51 40
 0  1      0 519952   6008 459516    0    0     4 78156  622    56  1 11 52 37
 1  1      0 436640   6092 541632    0    0     0 82244  622    54  0 11 48 41
 0  1      0 436640   6092 541660    0    0     0     8  152    39  0  0 51 49
 0  1      0 332224   6200 644252    0    0     4 102800  728    46  1 13 49 36
 1  0      0 274492   6260 701056    0    0     4 12328  459    49  0  7 50 43
 0  1      0 211220   6324 763356    0    0     0 106940  515    37  1 10 51 39
 1  0      0 160412   6376 813468    0    0     0  8224  415    43  0  6 49 45
 1  1      0  85980   6452 886556    0    0     4 113516  575    39  1 11 54 34
 0  2      0  85968   6452 886620    0    0     0  1640  158   211  0  0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
..
allocpercpu.c percpu: __percpu_depopulate_mask can take a const mask 2009-04-06 13:44:15 -07:00
backing-dev.c writeback: switch to per-bdi threads for flushing data 2009-09-11 09:20:25 +02:00
bootmem.c kmemleak: Add callbacks to the bootmem allocator 2009-07-08 14:25:14 +01:00
bounce.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: move max_sane_readahead() calls into force_page_cache_readahead() 2009-06-16 19:47:28 -07:00
failslab.c kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c 2009-04-03 12:23:01 +02:00
filemap_xip.c mm: do_xip_mapping_read: fix length calculation 2009-04-02 19:04:49 -07:00
filemap.c mm: mark page accessed before we write_end() 2009-07-06 13:57:03 -07:00
fremap.c
highmem.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
hugetlb.c hugetlbfs: fix i_blocks accounting 2009-07-29 19:10:35 -07:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h vmscan: do not unconditionally treat zones that fail zone_reclaim() as full 2009-06-16 19:47:45 -07:00
Kconfig Security/SELinux: seperate lsm specific mmap_min_addr 2009-08-17 15:09:11 +10:00
Kconfig.debug kmemcheck: enable in the x86 Kconfig 2009-06-15 15:49:15 +02:00
kmemcheck.c kmemcheck: add hooks for the page allocator 2009-06-15 15:48:33 +02:00
kmemleak-test.c kmemleak: Simple testing module for kmemleak 2009-06-11 17:04:19 +01:00
kmemleak.c kmemleak: Protect the seq start/next/stop sequence by rcu_read_lock() 2009-07-29 12:34:58 -07:00
maccess.c [S390] maccess: add weak attribute to probe_kernel_write 2009-06-12 10:27:37 +02:00
madvise.c mm: madvise(): correct return code 2009-06-16 19:47:40 -07:00
Makefile Merge branch 'akpm' 2009-06-16 19:50:13 -07:00
memcontrol.c cgroup avoid permanent sleep at rmdir 2009-07-29 19:10:35 -07:00
memory_hotplug.c page-allocator: reset wmark_min and inactive ratio of zone when hotplug happens 2009-06-16 19:47:42 -07:00
memory.c mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() 2009-07-27 12:10:38 -07:00
mempolicy.c mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware 2009-08-07 10:39:55 -07:00
mempool.c mempool.c: clean up type-casting 2009-08-10 08:31:16 -07:00
migrate.c migration: only migrate_prep() once per move_pages() 2009-06-16 19:47:41 -07:00
mincore.c
mlock.c mm: remove CONFIG_UNEVICTABLE_LRU config option 2009-06-16 19:47:42 -07:00
mm_init.c
mmap.c Security/SELinux: seperate lsm specific mmap_min_addr 2009-08-17 15:09:11 +10:00
mmu_notifier.c
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c perf_counter: Add mmap event hooks to mprotect() 2009-06-08 23:10:43 +02:00
mremap.c
msync.c
nommu.c nommu: fix error handling in do_mmap_pgoff() 2009-09-05 11:30:42 -07:00
oom_kill.c mm: revert "oom: move oom_adj value" 2009-08-18 16:31:13 -07:00
page_alloc.c page-allocator: always change pageblock ownership when anti-fragmentation is disabled 2009-09-05 11:30:42 -07:00
page_cgroup.c memcg: remove some redundant checks 2009-06-18 13:03:47 -07:00
page_io.c mm: remove file argument from swap_readpage() 2009-06-16 19:47:44 -07:00
page_isolation.c
page-writeback.c writeback: switch to per-bdi threads for flushing data 2009-09-11 09:20:25 +02:00
pagewalk.c
pdflush.c Revert "mm: add /proc controls for pdflush threads" 2009-05-15 11:32:24 +02:00
percpu.c percpu: don't assume existence of cpu0 2009-09-01 21:23:18 +09:00
prio_tree.c
quicklist.c cpumask: replace node_to_cpumask with cpumask_of_node. 2009-03-13 14:49:46 +10:30
readahead.c readahead: introduce context readahead algorithm 2009-06-16 19:47:30 -07:00
rmap.c mm: fix for infinite churning of mlocked pages 2009-08-26 20:06:52 -07:00
shmem_acl.c shmfs: use 'check_acl' instead of 'permission' 2009-09-08 11:08:46 -07:00
shmem.c shmfs: use 'check_acl' instead of 'permission' 2009-09-08 11:08:46 -07:00
slab.c SLAB: Fix lockdep annotations 2009-06-29 09:57:10 +03:00
slob.c fix RCU-callback-after-kmem_cache_destroy problem in sl[aou]b 2009-06-26 12:10:47 +03:00
slub.c slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU 2009-09-03 22:38:59 +03:00
sparse-vmemmap.c
sparse.c mm: mminit_validate_memmodel_limits(): remove redundant test 2009-04-01 08:59:11 -07:00
swap_state.c mm: remove file argument from swap_readpage() 2009-06-16 19:47:44 -07:00
swap.c mm: fix Committed_AS underflow on large NR_CPUS environment 2009-05-02 15:36:10 -07:00
swapfile.c PM / Hibernate: Replace bdget call with simple atomic_inc of i_count 2009-07-29 21:07:55 +02:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c mm: remove __invalidate_mapping_pages variant 2009-06-16 19:47:43 -07:00
util.c Merge branches 'slab/documentation', 'slab/fixes', 'slob/cleanups' and 'slub/fixes' into for-linus 2009-06-17 08:30:15 +03:00
vmalloc.c Merge branch 'for-linus' of git://linux-arm.org/linux-2.6 2009-06-11 14:15:57 -07:00
vmscan.c writeback: switch to per-bdi threads for flushing data 2009-09-11 09:20:25 +02:00
vmstat.c vmscan: count the number of times zone_reclaim() scans and fails 2009-06-16 19:47:46 -07:00