kernel-ark/Documentation/vm
Petr Holasek 90bd6fd31c ksm: allow trees per NUMA node
Here's a KSM series, based on mmotm 2013-01-23-17-04: starting with
Petr's v7 "KSM: numa awareness sysfs knob"; then fixing the two issues
we had with that, fully enabling KSM page migration on the way.

(A different kind of KSM/NUMA issue which I've certainly not begun to
address here: when KSM pages are unmerged, there's usually no sense in
preferring to allocate the new pages local to the caller's node.)

This patch:

Introduces new sysfs boolean knob /sys/kernel/mm/ksm/merge_across_nodes
which control merging pages across different numa nodes.  When it is set
to zero only pages from the same node are merged, otherwise pages from
all nodes can be merged together (default behavior).

Typical use-case could be a lot of KVM guests on NUMA machine and cpus
from more distant nodes would have significant increase of access
latency to the merged ksm page.  Sysfs knob was choosen for higher
variability when some users still prefers higher amount of saved
physical memory regardless of access latency.

Every numa node has its own stable & unstable trees because of faster
searching and inserting.  Changing of merge_across_nodes value is
possible only when there are not any ksm shared pages in system.

I've tested this patch on numa machines with 2, 4 and 8 nodes and
measured speed of memory access inside of KVM guests with memory pinned
to one of nodes with this benchmark:

  http://pholasek.fedorapeople.org/alloc_pg.c

Population standard deviations of access times in percentage of average
were following:

merge_across_nodes=1
2 nodes 1.4%
4 nodes 1.6%
8 nodes	1.7%

merge_across_nodes=0
2 nodes	1%
4 nodes	0.32%
8 nodes	0.018%

RFC: https://lkml.org/lkml/2011/11/30/91
v1: https://lkml.org/lkml/2012/1/23/46
v2: https://lkml.org/lkml/2012/6/29/105
v3: https://lkml.org/lkml/2012/9/14/550
v4: https://lkml.org/lkml/2012/9/23/137
v5: https://lkml.org/lkml/2012/12/10/540
v6: https://lkml.org/lkml/2012/12/23/154
v7: https://lkml.org/lkml/2012/12/27/225

Hugh notes that this patch brings two problems, whose solution needs
further support in mm/ksm.c, which follows in subsequent patches:

1) switching merge_across_nodes after running KSM is liable to oops
   on stale nodes still left over from the previous stable tree;

2) memory hotremove may migrate KSM pages, but there is no provision
   here for !merge_across_nodes to migrate nodes to the proper tree.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:19 -08:00
..
.gitignore Documentation/vm/.gitignore: add page-types 2009-09-24 07:20:57 -07:00
00-INDEX slub: doc: update the slabinfo.c file path 2011-08-31 20:10:17 +03:00
active_mm.txt Fix common misspellings 2011-03-31 11:26:23 -03:00
balance page allocator: use allocation flags as an index to the zone watermark 2009-06-16 19:47:35 -07:00
cleancache.txt Cleanups: rename of flush to invalidate, moving reporting of statistics 2012-03-22 19:52:47 -07:00
frontswap.txt doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
highmem.txt mm: highmem documentation 2010-10-26 16:52:08 -07:00
hugetlbpage.txt hugetlb: update hugetlbpage.txt 2012-08-21 16:45:03 -07:00
hwpoison.txt Documentation: update cgroupfs mount point 2011-06-15 21:52:50 -07:00
ksm.txt ksm: allow trees per NUMA node 2013-02-23 17:50:19 -08:00
locking mm: Convert i_mmap_lock to a mutex 2011-05-25 08:39:18 -07:00
numa doc: fix broken references 2011-09-27 18:08:04 +02:00
numa_memory_policy.txt Doc: Fix typo s/packages/packaged 2010-09-21 17:03:27 +02:00
overcommit-accounting Fix common misspellings 2011-03-31 11:26:23 -03:00
page_migration trivial: fix where cgroup documentation is not correctly referred to 2009-03-30 15:22:02 +02:00
pagemap.txt proc: report file/anon bit in /proc/pid/pagemap 2012-05-31 17:49:29 -07:00
slub.txt Documentations: Fix slabinfo.c directory in vm/slub.txt 2012-05-10 11:45:23 +03:00
transhuge.txt thp: introduce sysfs knob to disable huge zero page 2012-12-12 17:38:32 -08:00
unevictable-lru.txt mm: remove vma arg from page_evictable 2012-10-09 16:22:55 +09:00