kernel-ark/fs
Fengguang Wu 8bc3be2751 writeback: speed up writeback of big dirty files
After making dirty a 100M file, the normal behavior is to start the
writeback for all data after 30s delays.  But sometimes the following
happens instead:

	- after 30s:    ~4M
	- after 5s:     ~4M
	- after 5s:     all remaining 92M

Some analyze shows that the internal io dispatch queues goes like this:

		s_io            s_more_io
		-------------------------
	1)	100M,1K         0
	2)	1K              96M
	3)	0               96M
1) initial state with a 100M file and a 1K file

2) 4M written, nr_to_write <= 0, so write more

3) 1K written, nr_to_write > 0, no more writes(BUG)

nr_to_write > 0 in (3) fools the upper layer to think that data have all
been written out.  The big dirty file is actually still sitting in
s_more_io.  We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
may starve newly expired inodes in s_dirty.  It is also not an option to
draw inodes from both s_more_io and s_dirty, an let the loop go on: this
might lead to live locks, and might also starve other superblocks in sync
time(well kupdate may still starve some superblocks, that's another bug).

We have to return when a full scan of s_io completes.  So nr_to_write > 0
does not necessarily mean that "all data are written".  This patch
introduces a flag writeback_control.more_io to indicate that more io should
be done.  With it the big dirty file no longer has to wait for the next
kupdate invokation 5s later.

In sync_sb_inodes() we only set more_io on super_blocks we actually
visited.  This avoids the interaction between two pdflush deamons.

Also in __sync_single_inode() we don't blindly keep requeuing the io if the
filesystem cannot progress.  Failing to do so may lead to 100% iowait.

Tested-by: Mike Snitzer <snitzer@gmail.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:19 -08:00
..
9p
adfs
affs
afs vfs: Add 64 bit i_version support 2008-01-28 23:58:27 -05:00
autofs
autofs4
befs fs/: Spelling fixes 2008-02-03 17:33:42 +02:00
bfs
cifs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
coda
configfs configfs: file.c fix possible recursive locking 2008-01-25 15:05:47 -08:00
cramfs
debugfs Kobject: convert fs/* from kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
devpts
dlm dlm: static initialization improvements 2008-01-30 11:04:43 -06:00
ecryptfs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
efs
exportfs
ext2 ext2: Fix the max file size for ext2 file system. 2008-01-28 23:58:26 -05:00
ext3 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
ext4 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
fat
freevxfs fs/: Spelling fixes 2008-02-03 17:33:42 +02:00
fuse Kobject: convert fs/* from kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
gfs2 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
hfs
hfsplus
hostfs
hpfs
hppfs
hugetlbfs hugetlb: allow sticky directory mount option 2008-02-05 09:44:14 -08:00
isofs
jbd spinlock: lockbreak cleanup 2008-01-30 13:31:20 +01:00
jbd2 spinlock: lockbreak cleanup 2008-01-30 13:31:20 +01:00
jffs2 Typoes: "whith" -> "with" 2008-02-03 15:14:02 +02:00
jfs Spelling fixes: lenght->length 2008-02-03 15:42:53 +02:00
lockd NLM: tear down RPC clients in nlm_shutdown_hosts 2008-02-01 16:42:15 -05:00
minix
msdos
ncpfs vm audit: add VM_DONTEXPAND to mmap for drivers that need it 2008-02-04 07:55:38 -08:00
nfs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
nfs_common
nfsd nfsd: more careful input validation in nfsctl write methods 2008-02-01 16:42:15 -05:00
nls
ntfs is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
ocfs2 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
openpromfs
partitions Kobject: convert fs/* from kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
proc Fix /proc dcache deadlock in do_exit 2008-02-05 09:44:18 -08:00
qnx4
ramfs
reiserfs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
romfs
smbfs Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-02-01 11:45:47 +11:00
sysfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2008-01-25 17:19:08 -08:00
sysv
udf
ufs
vfat
xfs is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
aio.c core: remove last users of empty FASTCALL macro 2008-01-30 13:31:17 +01:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c fs/binfmt_elf.c: spello fix 2008-02-03 18:05:15 +02:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio.c __bio_clone: don't calculate hw/phys segment counts 2008-01-28 10:04:46 +01:00
block_dev.c
buffer.c bufferhead: revert constructor removal 2008-02-05 09:44:14 -08:00
char_dev.c Kobject: rename kobject_init_ng() to kobject_init() 2008-01-24 20:40:38 -08:00
compat_binfmt_elf.c x86: compat_binfmt_elf 2008-01-30 13:31:46 +01:00
compat_ioctl.c remove __attribute_used__ 2008-01-28 23:21:18 +01:00
compat.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
dcache.c
dcookies.c
direct-io.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
dnotify.c
dquot.c
drop_caches.c
eventfd.c
eventpoll.c lockdep: annotate epoll 2008-02-05 09:44:07 -08:00
exec.c exec: rework the group exit and fix the race with kill 2008-02-05 09:44:07 -08:00
fcntl.c
fifo.c
file_table.c
file.c
filesystems.c
fs-writeback.c writeback: speed up writeback of big dirty files 2008-02-05 09:44:19 -08:00
generic_acl.c
inode.c ext4: Add inode version support in ext4 2008-01-28 23:58:27 -05:00
inotify_user.c
inotify.c
internal.h
ioctl.c
ioprio.c cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions 2008-01-28 11:38:15 +01:00
Kconfig nfsd: select CONFIG_PROC_FS in nfsv4 and gss server cases 2008-02-01 16:42:04 -05:00
Kconfig.binfmt x86: compat_binfmt_elf Kconfig 2008-01-30 13:31:46 +01:00
libfs.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
locks.c pid-namespaces-vs-locks-interaction 2008-02-03 17:51:36 -05:00
Makefile x86: compat_binfmt_elf Kconfig 2008-01-30 13:31:46 +01:00
mbcache.c
mpage.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
namei.c
namespace.c
nfsctl.c
no-block.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
quota_v1.c
quota_v2.c
quota.c
read_write.c ext4: export iov_shorten from kernel for ext4's use 2008-01-28 23:58:27 -05:00
read_write.h
readdir.c
select.c
seq_file.c
signalfd.c Fix a small number of "memeber" typoes. 2008-02-03 15:12:15 +02:00
splice.c splice: always updated atime in direct splice 2008-02-01 09:26:32 +01:00
stack.c
stat.c
super.c
sync.c
timerfd.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
utimes.c
xattr_acl.c
xattr.c