kernel-ark/mm
Joonsoo Kim ad2c814441 topology: add support for node_to_mem_node() to determine the fallback node
Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
on ppc LPARs with memoryless nodes, a large amount of memory was consumed
by slabs and was marked unreclaimable.  He tracked it down to slab
deactivations in the SLUB core when we allocate remotely, leading to poor
efficiency always when memoryless nodes are present.

After much discussion, Joonsoo provided a few patches that help
significantly.  They don't resolve the problem altogether:

 - memory hotplug still needs testing, that is when a memoryless node
   becomes memory-ful, we want to dtrt
 - there are other reasons for going off-node than memoryless nodes,
   e.g., fully exhausted local nodes

Neither case is resolved with this series, but I don't think that should
block their acceptance, as they can be explored/resolved with follow-on
patches.

The series consists of:

[1/3] topology: add support for node_to_mem_node() to determine the
      fallback node

[2/3] slub: fallback to node_to_mem_node() node if allocating on
      memoryless node

      - Joonsoo's patches to cache the nearest node with memory for each
        NUMA node

[3/3] Partial revert of 81c98869fa (""kthread: ensure locality of
      task_struct allocations")

 - At Tejun's request, keep the knowledge of memoryless node fallback
   to the allocator core.

This patch (of 3):

We need to determine the fallback node in slub allocator if the allocation
target node is memoryless node.  Without it, the SLUB wrongly select the
node which has no memory and can't use a partial slab, because of node
mismatch.  Introduced function, node_to_mem_node(X), will return a node Y
with memory that has the nearest distance.  If X is memoryless node, it
will return nearest distance node, but, if X is normal node, it will
return itself.

We will use this function in following patch to determine the fallback
node.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:51 -04:00
..
backing-dev.c arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
balloon_compaction.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
bootmem.c mm/bootmem.c: remove unused local `map' 2013-11-13 12:09:09 +09:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
cma.c mm, CMA: clean-up log message 2014-08-06 18:01:16 -07:00
compaction.c mm, compaction: properly signal and act upon lock and need_sched() contention 2014-06-04 16:54:11 -07:00
debug-pagealloc.c
dmapool.c Fix unbalanced mutex in dma_pool_create(). 2014-09-18 10:39:16 -07:00
early_ioremap.c mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
fadvise.c
failslab.c
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
filemap.c NFS client updates for Linux 3.18 2014-10-08 12:49:23 -04:00
fremap.c mm: mark remap_file_pages() syscall as deprecated 2014-06-06 16:08:17 -07:00
frontswap.c swap: change swap_list_head to plist, add swap_avail_head 2014-06-04 16:54:07 -07:00
gup.c kvm: Faults which trigger IO release the mmap_sem 2014-09-24 14:07:54 +02:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c mm: numa: Do not mark PTEs pte_numa when splitting huge pages 2014-10-02 11:57:18 -07:00
hugetlb_cgroup.c hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked 2014-08-29 16:28:16 -07:00
hugetlb.c mm: fix potential infinite loop in dissolve_free_huge_pages() 2014-08-06 18:01:21 -07:00
hwpoison-inject.c mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive 2014-08-06 18:01:19 -07:00
init-mm.c
internal.h mm/internal.h: use nth_page 2014-08-06 18:01:16 -07:00
interval_tree.c
iov_iter.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
Kconfig mm/zpool: update zswap to use zpool 2014-08-06 18:01:23 -07:00
Kconfig.debug
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm: introduce kmemleak_update_trace() 2014-06-06 16:08:17 -07:00
ksm.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
list_lru.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
maccess.c
madvise.c mm: update the description for madvise_remove 2014-08-06 18:01:18 -07:00
Makefile mm: Support compiling out madvise and fadvise 2014-08-17 19:44:24 -05:00
memblock.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
memcontrol.c mm: memcontrol: do not iterate uninitialized memcgs 2014-10-02 16:28:44 -07:00
memory_hotplug.c memory-hotplug: add zone_for_memory() for selecting zone for new memory 2014-08-06 18:01:21 -07:00
memory-failure.c hwpoison: fix race with changing page during offlining 2014-08-06 18:01:19 -07:00
memory.c mm: softdirty: keep bit when zapping file pte 2014-09-26 08:10:35 -07:00
mempolicy.c Merge branch 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-07-10 11:38:23 -07:00
mempool.c mm/mempool.c: update the kmemleak stack trace for mempool allocations 2014-06-06 16:08:17 -07:00
migrate.c mm: migrate: Close race between migration completion and mprotect 2014-10-02 11:57:18 -07:00
mincore.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
mlock.c mm: describe mmap_sem rules for __lock_page_or_retry() and callers 2014-08-06 18:01:20 -07:00
mm_init.c mm: bring back /sys/kernel/mm 2014-01-27 21:02:39 -08:00
mmap.c mm/mmap.c: use pr_emerg when printing BUG related information 2014-09-10 15:42:12 -07:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: move mmu notifier call from change_protection to change_pmd_range 2014-04-07 16:35:50 -07:00
mremap.c mm, thp: close race between mremap() and split_huge_page() 2014-05-11 17:55:48 +09:00
msync.c msync: fix incorrect fstart calculation 2014-07-03 09:21:53 -07:00
nobootmem.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
nommu.c arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area 2014-08-08 15:57:27 -07:00
oom_kill.c mm, oom: remove unnecessary exit_state check 2014-08-06 18:01:21 -07:00
page_alloc.c topology: add support for node_to_mem_node() to determine the fallback node 2014-10-09 22:25:51 -04:00
page_cgroup.c mm/page_cgroup.c: mark functions as static 2014-04-03 16:21:02 -07:00
page_io.c fix __swap_writepage() compile failure on old gcc versions 2014-06-14 19:30:48 -05:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
page-writeback.c mm, writeback: prevent race when calculating dirty limits 2014-08-06 18:01:21 -07:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-08-15 16:06:10 -04:00
percpu.c percpu: free percpu allocation info for uniprocessor system 2014-08-16 08:59:02 -04:00
pgtable-generic.c mm: actually clear pmd_numa before invalidating 2014-08-29 16:28:15 -07:00
process_vm_access.c start adding the tag to iov_iter 2014-05-06 17:32:49 -04:00
quicklist.c
readahead.c mm/readahead.c: remove unused file_ra_state from count_history_pages 2014-08-06 18:01:15 -07:00
rmap.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
shmem.c shmem: fix nlink for rename overwrite directory 2014-09-26 21:16:42 -04:00
slab_common.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
slab.c mm/slab: factor out unlikely part of cache_free_alien() 2014-10-09 22:25:51 -04:00
slab.h mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
slob.c mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller() 2014-10-09 22:25:50 -04:00
slub.c slub: disable tracing and failslab for merged slabs 2014-10-09 22:25:51 -04:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap_state.c mm: allow drivers to prevent new writable mappings 2014-08-08 15:57:31 -07:00
swap.c mm: memcontrol: use page lists for uncharge batching 2014-08-08 15:57:18 -07:00
swapfile.c mm: memcontrol: rewrite uncharge API 2014-08-08 15:57:17 -07:00
truncate.c mm: memcontrol: rewrite uncharge API 2014-08-08 15:57:17 -07:00
util.c proc/maps: make vm_is_stack() logic namespace-friendly 2014-10-09 22:25:50 -04:00
vmacache.c mm,vmacache: optimize overflow system-wide flushing 2014-06-04 16:53:57 -07:00
vmalloc.c mm/vmalloc.c: clean up map_vm_area third argument 2014-08-06 18:01:19 -07:00
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c mm: memcontrol: use page lists for uncharge batching 2014-08-08 15:57:18 -07:00
vmstat.c mm: vmscan: only update per-cpu thresholds for online CPU 2014-08-06 18:01:20 -07:00
workingset.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
zbud.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zpool.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zsmalloc.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zswap.c mm/zswap.c: add __init to zswap_entry_cache_destroy() 2014-08-08 15:57:18 -07:00