kernel-ark

David Rientjes 2ff754fa8f mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator

Commit `0e093d9976` ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") uncovered a livelock in the page allocator that resulted in tasks infinitely looping trying to find memory and kswapd running at 100% cpu.

The issue occurs because drain_all_pages() is called immediately following direct reclaim when no memory is freed and try_to_free_pages() returns non-zero because all zones in the zonelist do not have their all_unreclaimable flag set.

When draining the per-cpu pagesets back to the buddy allocator for each zone, the zone->pages_scanned counter is cleared to avoid erroneously setting zone->all_unreclaimable later. The problem is that no pages may actually be drained and, thus, the unreclaimable logic never fails direct reclaim so the oom killer may be invoked.

This apparently only manifested after wait_iff_congested() was introduced and the zone was full of anonymous memory that would not congest the backing store. The page allocator would infinitely loop if there were no other tasks waiting to be scheduled and clear zone->pages_scanned because of drain_all_pages() as the result of this change before kswapd could scan enough pages to trigger the reclaim logic. Additionally, with every loop of the page allocator and in the reclaim path, kswapd would be kicked and would end up running at 100% cpu. In this scenario, current and kswapd are all running continuously with kswapd incrementing zone->pages_scanned and current clearing it.

The problem is even more pronounced when current swaps some of its memory to swap cache and the reclaimable logic then considers all active anonymous memory in the all_unreclaimable logic, which requires a much higher zone->pages_scanned value for try_to_free_pages() to return zero that is never attainable in this scenario.

Before wait_iff_congested(), the page allocator would incur an unconditional timeout and allow kswapd to elevate zone->pages_scanned to a level that the oom killer would be called the next time it loops.

The fix is to only attempt to drain pcp pages if there is actually a quantity to be drained. The unconditional clearing of zone->pages_scanned in free_pcppages_bulk() need not be changed since other callers already ensure that draining will occur. This patch ensures that free_pcppages_bulk() will actually free memory before calling into it from drain_all_pages() so zone->pages_scanned is only cleared if appropriate.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2011-01-26 10:50:01 +10:00
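
The effect of the fix described in the commit message can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `zone_model`, `pcp_model`, `free_pcppages_bulk_model()`, `drain_unconditional()` and `drain_if_nonempty()` are hypothetical names invented here, and the structures carry only the counters the message mentions. It shows why draining an empty per-cpu list used to wipe `pages_scanned`, and how guarding the call on a non-zero count preserves it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; only the fields the
 * commit message talks about are modelled. */
struct zone_model {
    unsigned long pages_scanned; /* progress counter raised by reclaim scanning */
    unsigned long free_pages;    /* pages held by the buddy allocator */
};

struct pcp_model {
    int count;                   /* pages currently on the per-cpu list */
};

/* Models the behaviour described for free_pcppages_bulk(): pages_scanned
 * is cleared unconditionally, then `count` pages return to the zone. */
static void free_pcppages_bulk_model(struct zone_model *zone, int count,
                                     struct pcp_model *pcp)
{
    assert(count >= 0 && count <= pcp->count);
    zone->pages_scanned = 0;
    zone->free_pages += (unsigned long)count;
    pcp->count -= count;
}

/* Pre-fix behaviour as described: the drain path always calls in, so an
 * empty per-cpu list still resets pages_scanned. */
static void drain_unconditional(struct zone_model *zone, struct pcp_model *pcp)
{
    free_pcppages_bulk_model(zone, pcp->count, pcp);
}

/* Post-fix behaviour as described: call in only when there is something
 * to drain. Returns true if any pages were handed back. */
static bool drain_if_nonempty(struct zone_model *zone, struct pcp_model *pcp)
{
    if (pcp->count == 0)
        return false;
    free_pcppages_bulk_model(zone, pcp->count, pcp);
    return true;
}
```

With an empty per-cpu list, `drain_unconditional()` resets `pages_scanned` to zero even though no memory was freed, which in the scenario above keeps undoing kswapd's progress. `drain_if_nonempty()` leaves the counter alone in that case and clears it only when pages really return to the buddy allocator.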
..
backing-dev.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-10-26 17:58:44 -07:00
bootmem.c	x86, memblock: Replace e820_/_early string with memblock_	2010-08-27 11:13:47 -07:00
bounce.c	bounce: call flush_dcache_page() after bounce_copy_vec()	2010-09-09 18:57:25 -07:00
compaction.c	mm: compaction: prevent division-by-zero during user-requested compaction	2011-01-20 17:02:05 -08:00
debug-pagealloc.c
dmapool.c	mm/dmapool.c: use TASK_UNINTERRUPTIBLE in dma_pool_alloc()	2011-01-13 17:32:48 -08:00
fadvise.c	readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM	2010-03-06 11:26:25 -08:00
failslab.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
filemap_xip.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
filemap.c	mm: remove likely() from grab_cache_page_write_begin()	2011-01-13 17:32:36 -08:00
fremap.c	Avoid pgoff overflow in remap_file_pages	2010-09-25 09:34:58 -07:00
highmem.c	mm,x86: fix kmap_atomic_push vs ioremap_32.c	2010-10-27 18:03:05 -07:00
huge_memory.c	memcg: fix USED bit handling at uncharge in THP	2011-01-20 17:02:06 -08:00
hugetlb.c	hugetlb: fix handling of parse errors in sysfs	2011-01-13 17:32:49 -08:00
hwpoison-inject.c	HWPOISON, hugetlb: support hwpoison injection for hugepage	2010-08-11 09:23:11 +02:00
init-mm.c	mm: provide init_mm mm_context initializer	2010-08-09 20:44:54 -07:00
internal.h	Revert "mm: batch activate_page() to reduce lock contention"	2011-01-17 14:42:19 -08:00
Kconfig	thp: select CONFIG_COMPACTION if TRANSPARENT_HUGEPAGE enabled	2011-01-13 17:32:45 -08:00
Kconfig.debug	trivial: improve help text for mm debug config options	2009-09-21 15:14:57 +02:00
kmemcheck.c	kmemcheck: Fix build errors due to missing slab.h	2010-03-30 22:02:32 +09:00
kmemleak-test.c	percpu: clean up percpu variable definitions	2009-06-24 15:13:48 +09:00
kmemleak.c	kmemleak: Fix typo in the comment	2010-08-08 21:57:23 +01:00
ksm.c	ksm: drain pagevecs to lru	2011-01-13 17:32:49 -08:00
maccess.c	MN10300: Save frame pointer in thread_info struct rather than global var	2010-10-27 17:29:01 +01:00
madvise.c	thp: khugepaged: make khugepaged aware about madvise	2011-01-13 17:32:47 -08:00
Makefile	thp: transparent hugepage core	2011-01-13 17:32:42 -08:00
memblock.c	memblock: fix memblock_is_region_memory()	2011-01-20 17:02:05 -08:00
memcontrol.c	memcg: correctly order reading PCG_USED and pc->mem_cgroup	2011-01-20 17:02:06 -08:00
memory_hotplug.c	Merge branch 'slub/hotplug' into slab/urgent	2011-01-15 13:28:17 +02:00
memory-failure.c	thp: compound_trans_order	2011-01-13 17:32:47 -08:00
memory.c	thp: add debug checks for mapcount related invariants	2011-01-13 17:32:47 -08:00
mempolicy.c	thp: add numa awareness to hugepage allocations	2011-01-13 17:32:45 -08:00
mempool.c	mm: remove broken 'kzalloc' mempool	2009-09-22 07:17:35 -07:00
migrate.c	memcg: fix memory migration of shmem swapcache	2011-01-13 17:32:51 -08:00
mincore.c	thp: mincore transparent hugepage support	2011-01-13 17:32:44 -08:00
mlock.c	mlock: do not hold mmap_sem for extended periods of time	2011-01-13 17:32:36 -08:00
mm_init.c
mmap.c	brk: fix min_brk lower bound computation for COMPAT_BRK	2011-01-13 17:32:48 -08:00
mmu_context.c	exit: fix oops in sync_mm_rss	2010-03-24 16:31:21 -07:00
mmu_notifier.c	thp: mmu_notifier_test_young	2011-01-13 17:32:46 -08:00
mmzone.c	mm: page allocator: adjust the per-cpu counter threshold when memory is low	2011-01-13 17:32:31 -08:00
mprotect.c	thp: mprotect: transparent huge page support	2011-01-13 17:32:44 -08:00
mremap.c	thp: split_huge_page_mm/vma	2011-01-13 17:32:41 -08:00
msync.c	sanitize vfs_fsync calling conventions	2010-05-21 18:31:21 -04:00
nommu.c	mlock: do not hold mmap_sem for extended periods of time	2011-01-13 17:32:36 -08:00
oom_kill.c	oom: kill all threads sharing oom killed task's mm	2010-10-26 16:52:05 -07:00
page_alloc.c	mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator	2011-01-26 10:50:01 +10:00
page_cgroup.c	kmemleak: Annotate false positive in init_section_page_cgroup()	2010-07-19 11:54:14 +01:00
page_io.c	block: unify flags for struct bio and struct request	2010-08-07 18:20:39 +02:00
page_isolation.c	mm: page_isolation: codeclean fix comment and rm unneeded val init	2010-10-26 16:52:11 -07:00
page-writeback.c	writeback: avoid unnecessary determine_dirtyable_memory call	2011-01-13 17:32:38 -08:00
pagewalk.c	thp: split_huge_page_mm/vma	2011-01-13 17:32:41 -08:00
percpu-km.c	percpu: clear memory allocated with the km allocator	2010-10-02 10:28:42 +03:00
percpu-vm.c	mm: remove gfp mask from pcpu_get_vm_areas	2011-01-13 17:32:34 -08:00
percpu.c	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2011-01-13 10:05:56 -08:00
pgtable-generic.c	mm/pgtable-generic.c: fix CONFIG_SWAP=n build	2011-01-26 10:49:58 +10:00
prio_tree.c
quicklist.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
readahead.c	readahead.c: fix comment	2010-05-25 08:07:00 -07:00
rmap.c	memcg: create extensible page stat update routines	2011-01-13 17:32:50 -08:00
shmem.c	fs: icache RCU free inodes	2011-01-07 17:50:26 +11:00
slab.c	mm/slab.c: make local symbols static	2011-01-15 13:28:36 +02:00
slob.c	kernel: kmem_ptr_validate considered harmful	2011-01-07 17:50:16 +11:00
slub.c	Merge branch 'slub/hotplug' into slab/urgent	2011-01-15 13:28:17 +02:00
sparse-vmemmap.c	tree-wide: fix comment/printk typos	2010-11-01 15:38:34 -04:00
sparse.c	thp: remove PG_buddy	2011-01-13 17:32:43 -08:00
swap_state.c	thp: split_huge_page paging	2011-01-13 17:32:41 -08:00
swap.c	Revert "mm: simplify code of swap.c"	2011-01-17 14:42:34 -08:00
swapfile.c	thp: split_huge_page paging	2011-01-13 17:32:41 -08:00
thrash.c	mm: pass mm to grab_swap_token	2009-06-23 12:50:05 -07:00
truncate.c	mm: fix truncate_setsize() comment	2011-01-20 17:02:06 -08:00
util.c	kernel: kmem_ptr_validate considered harmful	2011-01-07 17:50:16 +11:00
vmalloc.c	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6	2011-01-13 20:15:35 -08:00
vmscan.c	mm: fix deferred congestion timeout if preferred zone is not allowed	2011-01-26 10:50:00 +10:00
vmstat.c	thp: transparent hugepage vmstat	2011-01-13 17:32:43 -08:00