kernel-ark/fs
Tetsuo Handa 78ebc2f714 mm,writeback: don't use memory reserves for wb_start_writeback
When a writeback operation cannot make forward progress because the memory
allocations needed for doing I/O cannot be satisfied (e.g. in an
OOM-livelock situation), we can observe a flood of order-0 page
allocation failure messages caused by complete depletion of the memory
reserves.

This is caused by unconditionally allocating "struct wb_writeback_work"
objects using GFP_ATOMIC from PF_MEMALLOC context.

__alloc_pages_nodemask() {
  __alloc_pages_slowpath() {
    __alloc_pages_direct_reclaim() {
      __perform_reclaim() {
        current->flags |= PF_MEMALLOC;
        try_to_free_pages() {
          do_try_to_free_pages() {
            wakeup_flusher_threads() {
              wb_start_writeback() {
                kzalloc(sizeof(*work), GFP_ATOMIC) {
                  /* ALLOC_NO_WATERMARKS via PF_MEMALLOC */
                }
              }
            }
          }
        }
        current->flags &= ~PF_MEMALLOC;
      }
    }
  }
}

Since I/O is stalled, repeatedly allocating writeback requests will
eventually deplete the memory reserves.  Fortunately, wb_start_writeback()
can fall back to wb_wakeup() when allocating a "struct wb_writeback_work"
fails, so we don't need to allow wb_start_writeback() to use memory
reserves.

  Mem-Info:
  active_anon:289393 inactive_anon:2093 isolated_anon:29
   active_file:10838 inactive_file:113013 isolated_file:859
   unevictable:0 dirty:108531 writeback:5308 unstable:0
   slab_reclaimable:5526 slab_unreclaimable:7077
   mapped:9970 shmem:2159 pagetables:2387 bounce:0
   free:3042 free_pcp:0 free_cma:0
  Node 0 DMA free:6968kB min:44kB low:52kB high:64kB active_anon:6056kB inactive_anon:176kB active_file:712kB inactive_file:744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:208kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9708 all_unreclaimable? yes
  lowmem_reserve[]: 0 1732 1732 1732
  Node 0 DMA32 free:5200kB min:5200kB low:6500kB high:7800kB active_anon:1151516kB inactive_anon:8196kB active_file:42640kB inactive_file:451076kB unevictable:0kB isolated(anon):116kB isolated(file):3564kB present:2080640kB managed:1775332kB mlocked:0kB dirty:433368kB writeback:21232kB mapped:39144kB shmem:8452kB slab_reclaimable:22056kB slab_unreclaimable:28100kB kernel_stack:20976kB pagetables:9404kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2701604 all_unreclaimable? no
  lowmem_reserve[]: 0 0 0 0
  Node 0 DMA: 25*4kB (UME) 16*8kB (UME) 3*16kB (UE) 5*32kB (UME) 2*64kB (UM) 2*128kB (ME) 2*256kB (ME) 1*512kB (E) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 6964kB
  Node 0 DMA32: 925*4kB (UME) 140*8kB (UME) 5*16kB (ME) 5*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5060kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  126847 total pagecache pages
  0 pages in swap cache
  Swap cache stats: add 0, delete 0, find 0/0
  Free swap  = 0kB
  Total swap = 0kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  76348 pages reserved
  0 pages hwpoisoned
  Out of memory: Kill process 4450 (file_io.00) score 998 or sacrifice child
  Killed process 4450 (file_io.00) total-vm:4308kB, anon-rss:100kB, file-rss:1184kB, shmem-rss:0kB
  kthreadd: page allocation failure: order:0, mode:0x2200020
  file_io.00: page allocation failure: order:0, mode:0x2200020
  CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
  Call Trace:
    warn_alloc_failed+0xf7/0x150
    __alloc_pages_nodemask+0x23f/0xa60
    alloc_pages_current+0x87/0x110
    new_slab+0x3a1/0x440
    ___slab_alloc+0x3cf/0x590
    __slab_alloc.isra.64+0x18/0x1d
    kmem_cache_alloc+0x11c/0x150
    wb_start_writeback+0x39/0x90
    wakeup_flusher_threads+0x7f/0xf0
    do_try_to_free_pages+0x1f9/0x410
    try_to_free_pages+0x94/0xc0
    __alloc_pages_nodemask+0x566/0xa60
    alloc_pages_current+0x87/0x110
    __page_cache_alloc+0xaf/0xc0
    pagecache_get_page+0x88/0x260
    grab_cache_page_write_begin+0x21/0x40
    xfs_vm_write_begin+0x2f/0xf0
    generic_perform_write+0xca/0x1c0
    xfs_file_buffered_aio_write+0xcc/0x1f0
    xfs_file_write_iter+0x84/0x140
    __vfs_write+0xc7/0x100
    vfs_write+0x9d/0x190
    SyS_write+0x50/0xc0
    entry_SYSCALL_64_fastpath+0x12/0x6a
  Mem-Info:
  active_anon:293335 inactive_anon:2093 isolated_anon:0
   active_file:10829 inactive_file:110045 isolated_file:32
   unevictable:0 dirty:109275 writeback:822 unstable:0
   slab_reclaimable:5489 slab_unreclaimable:10070
   mapped:9999 shmem:2159 pagetables:2420 bounce:0
   free:3 free_pcp:0 free_cma:0
  Node 0 DMA free:12kB min:44kB low:52kB high:64kB active_anon:6060kB inactive_anon:176kB active_file:708kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:7160kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9844 all_unreclaimable? yes
  lowmem_reserve[]: 0 1732 1732 1732
  Node 0 DMA32 free:0kB min:5200kB low:6500kB high:7800kB active_anon:1167280kB inactive_anon:8196kB active_file:42608kB inactive_file:439424kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2080640kB managed:1775332kB mlocked:0kB dirty:436344kB writeback:3288kB mapped:39260kB shmem:8452kB slab_reclaimable:21908kB slab_unreclaimable:33120kB kernel_stack:20976kB pagetables:9536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11073180 all_unreclaimable? yes
  lowmem_reserve[]: 0 0 0 0
  Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
  Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  123086 total pagecache pages
  0 pages in swap cache
  Swap cache stats: add 0, delete 0, find 0/0
  Free swap  = 0kB
  Total swap = 0kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  76348 pages reserved
  0 pages hwpoisoned
  SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
    cache: kmalloc-64, object size: 64, buffer size: 64, default order: 0, min order: 0
    node 0: slabs: 3218, objs: 205952, free: 0
  file_io.00: page allocation failure: order:0, mode:0x2200020
  CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45

Assuming that somebody will find a better solution, let's apply this
patch for now to stop the bleeding, because this problem frequently
prevents me from testing OOM livelock conditions.

Link: http://lkml.kernel.org/r/20160318131136.GE7152@quack.suse.cz
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
9p Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
adfs
affs Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-05-17 16:26:30 -07:00
autofs4 dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared 2016-05-02 19:49:32 -04:00
befs befs: switch to ->iterate_shared() 2016-05-10 14:24:57 -04:00
bfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
btrfs Merge branch 'work.lookups' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 10:28:45 -07:00
cachefiles mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 10:08:45 -07:00
cifs Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-05-19 09:21:36 -07:00
coda introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
configfs configfs_readdir(): make safe under shared lock 2016-05-09 11:41:13 -04:00
cramfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
crypto ext4/fscrypto: avoid RCU lookup in d_revalidate 2016-04-12 20:01:35 -07:00
debugfs debugfs: Make automount point inodes permanently empty 2016-04-12 15:01:53 -07:00
devpts devpts: more pty driver interface cleanups 2016-04-26 15:47:32 -07:00
dlm mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
ecryptfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:51:59 -07:00
efivarfs efivarfs: Make efivarfs_file_ioctl() static 2016-05-07 07:06:13 +02:00
efs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
exofs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2016-05-17 17:05:30 -07:00
exportfs introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
ext2 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
ext4 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
f2fs Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
fat Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
freevxfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
fscache mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
fuse Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 10:08:45 -07:00
hfs Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
hfsplus Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
hostfs hostfs: switch to ->iterate_shared() 2016-05-12 19:49:30 -04:00
hpfs hpfs: switch to ->iterate_shared() 2016-05-12 19:47:13 -04:00
hugetlbfs mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
isofs Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
jbd2 Merge branch 'master' into for-next 2016-04-18 11:18:55 +02:00
jffs2 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 10:08:45 -07:00
kernfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 11:01:31 -07:00
lockd
logfs logfs: no need to lock directory in lseek 2016-05-09 11:42:19 -04:00
minix simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
ncpfs mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
nfs Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-05-19 09:21:36 -07:00
nfs_common
nfsd Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
nilfs2 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
nls
notify fsnotify: avoid spurious EMFILE errors from inotify_init() 2016-05-19 19:12:14 -07:00
ntfs fs: simplify the generic_write_sync prototype 2016-05-01 19:58:39 -04:00
ocfs2 ocfs2: clean up an unneeded goto in ocfs2_put_slot() 2016-05-19 19:12:14 -07:00
omfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
openpromfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
orangefs orangefs: don't open-code inode_lock/inode_unlock 2016-05-02 19:47:23 -04:00
overlayfs Merge branch 'ovl-fixes' into for-linus 2016-05-17 02:17:59 -04:00
proc Merge branch 'akpm' (patches from Andrew) 2016-05-19 20:00:06 -07:00
pstore mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
qnx4 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
qnx6 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
quota fs/quota: use nla_put_u64_64bit() 2016-04-26 12:00:48 -04:00
ramfs tmpfs/ramfs: fix VM_MAYSHARE mappings for NOMMU 2016-05-20 17:58:30 -07:00
reiserfs Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
romfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
squashfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
sysfs
sysv simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
tracefs
ubifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 10:08:45 -07:00
udf Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
ufs simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
xfs Merge branch 'work.lookups' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 10:28:45 -07:00
aio.c aio: remove a pointless assignment 2016-04-03 19:51:33 -04:00
anon_inodes.c
attr.c
bad_inode.c ->getxattr(): pass dentry and inode as separate arguments 2016-04-11 00:48:00 -04:00
binfmt_aout.c
binfmt_elf_fdpic.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:51:59 -07:00
binfmt_elf.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:51:59 -07:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c fs: simplify the generic_write_sync prototype 2016-05-01 19:58:39 -04:00
buffer.c mm, page_alloc: avoid looking up the first zone in a zonelist twice 2016-05-19 19:12:14 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() 2016-05-02 19:49:28 -04:00
coredump.c coredump: only charge written data against RLIMIT_CORE 2016-05-12 16:55:50 -04:00
dax.c direct-io: eliminate the offset argument to ->direct_IO 2016-05-01 19:58:39 -04:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:51:59 -07:00
dcookies.c
direct-io.c fs: simplify the generic_write_sync prototype 2016-05-01 19:58:39 -04:00
drop_caches.c
eventfd.c eventfd: document lockless access in eventfd_poll 2016-03-22 15:36:02 -07:00
eventpoll.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
exec.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-05-19 09:21:36 -07:00
fcntl.c
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
file_table.c
file.c give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() 2016-05-02 19:49:28 -04:00
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c mm,writeback: don't use memory reserves for wb_start_writeback 2016-05-20 17:58:30 -07:00
inode.c parallel lookups: actual switch to rwsem 2016-05-02 19:49:28 -04:00
internal.h
ioctl.c
Kconfig Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux 2016-03-26 12:59:04 -07:00
Kconfig.binfmt
libfs.c more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
locks.c
Makefile Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux 2016-03-26 12:59:04 -07:00
mbcache.c
mount.h
mpage.c mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
namei.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-05-19 09:21:36 -07:00
namespace.c
no-block.c
nsfs.c
open.c Merge branch 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 14:41:03 -07:00
pipe.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
pnode.c propogate_mnt: Handle the first propogated copy being a slave 2016-05-05 09:54:45 -05:00
pnode.h
posix_acl.c xattr_handler: pass dentry and inode as separate arguments of ->get() 2016-04-10 20:48:24 -04:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-03-16 13:09:08 -04:00
read_write.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-18 11:46:23 -07:00
readdir.c introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
select.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
seq_file.c Make file credentials available to the seqfile interfaces 2016-04-14 12:56:09 -07:00
signalfd.c
splice.c Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
stack.c
stat.c
statfs.c
super.c Merge branch 'master' into for-next 2016-04-18 11:18:55 +02:00
sync.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
timerfd.c
userfaultfd.c
utimes.c
xattr.c ->getxattr(): pass dentry and inode as separate arguments 2016-04-11 00:48:00 -04:00