kernel-ark/fs
Aaron Lu 6fcb52a56f thp: reduce usage of huge zero page's atomic counter
The global zero page is used to satisfy an anonymous read fault.  If
THP(Transparent HugePage) is enabled then the global huge zero page is
used.  The global huge zero page uses an atomic counter for reference
counting and is allocated/freed dynamically according to its counter
value.

CPU time spent on that counter will greatly increase if there are a lot
of processes doing anonymous read faults.  This patch proposes a way to
reduce the access to the global counter so that the CPU load can be
reduced accordingly.

To do this, a new flag of the mm_struct is introduced:
MMF_USED_HUGE_ZERO_PAGE.  With this flag, the process only need to touch
the global counter in two cases:

 1 The first time it uses the global huge zero page;
 2 The time when mm_user of its mm_struct reaches zero.

Note that right now, the huge zero page is eligible to be freed as soon
as its last use goes away.  With this patch, the page will not be
eligible to be freed until the exit of the last process from which it
was ever used.

And with the use of mm_user, the kthread is not eligible to use huge
zero page either.  Since no kthread is using huge zero page today, there
is no difference after applying this patch.  But if that is not desired,
I can change it to when mm_count reaches zero.

Case used for test on Haswell EP:

  usemem -n 72 --readonly -j 0x200000 100G

Which spawns 72 processes and each will mmap 100G anonymous space and
then do read only access to that space sequentially with a step of 2MB.

  CPU cycles from perf report for base commit:
      54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
  CPU cycles from perf report for this commit:
       0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page

Performance(throughput) of the workload for base commit: 1784430792
Performance(throughput) of the workload for this commit: 4726928591
164% increase.

Runtime of the workload for base commit: 707592 us
Runtime of the workload for this commit: 303970 us
50% drop.

Link: http://lkml.kernel.org/r/fe51a88f-446a-4622-1363-ad1282d71385@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:28 -07:00
..
9p Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
adfs Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
affs get rid of 'parent' argument of ->d_compare() 2016-07-31 16:37:25 -04:00
afs rxrpc: Rewrite the data and ack handling code 2016-09-08 11:10:12 +01:00
autofs4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00
befs fs/befs/io.c:befs_bread(): remove unneeded initialization to NULL 2016-05-23 17:04:14 -07:00
bfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
btrfs Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-09-23 13:39:37 -07:00
cachefiles cachefiles: Fix race between inactivating and culling a cache object 2016-08-03 13:33:26 -04:00
ceph ceph: do not modify fi->frag in need_reset_readdir() 2016-09-05 14:30:35 +02:00
cifs Move check for prefix path to within cifs_get_root() 2016-09-09 23:58:07 -05:00
coda drop redundant ->owner initializations 2016-05-29 19:08:00 -04:00
configfs configfs: Return -EFBIG from configfs_write_bin_file. 2016-09-16 12:58:28 +02:00
cramfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
crypto fscrypto: require write access to mount to set encryption policy 2016-09-10 01:18:57 -04:00
debugfs debugfs: propagate release() call result 2016-09-27 12:45:57 +02:00
devpts devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts 2016-09-23 11:31:31 +02:00
dlm dlm: fix malfunction of dlm_tool caused by debugfs changes 2016-08-26 13:22:14 -05:00
ecryptfs ecryptfs: don't allow mmap when the lower fs doesn't support it 2016-07-08 10:35:28 -05:00
efivarfs fs/efivarfs: Fix double kfree() in error path 2016-09-09 16:08:48 +01:00
efs fs/efs/super.c: fix return value 2016-05-20 17:58:30 -07:00
exofs block, fs, mm, drivers: use bio set/get op accessors 2016-06-07 13:41:38 -06:00
exportfs introduce a parallel variant of ->iterate() 2016-05-02 19:49:29 -04:00
ext2 ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings 2016-10-07 18:46:28 -07:00
ext4 ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings 2016-10-07 18:46:28 -07:00
f2fs In this round, we've investigated how f2fs deals with errors given by our fault 2016-10-06 15:30:40 -07:00
fat Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
freevxfs freevxfs: update Kconfig information 2016-06-13 10:20:39 +02:00
fscache Merge branch 'd_real' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into work.misc 2016-06-30 23:34:49 -04:00
fuse fuse: limit xattr returned size 2016-10-03 11:06:05 +02:00
gfs2 We've only got six GFS2 patches for this merge window. In patch order: 2016-10-04 13:42:13 -07:00
hfs Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
hfsplus Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
hostfs hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() 2016-08-04 00:18:10 +02:00
hpfs get rid of 'parent' argument of ->d_compare() 2016-07-31 16:37:25 -04:00
hugetlbfs
isofs get rid of 'parent' argument of ->d_compare() 2016-07-31 16:37:25 -04:00
jbd2 The major change this cycle is deleting ext4's copy of the file system 2016-07-26 18:35:55 -07:00
jffs2 vfs: make the string hashes salt the hash 2016-06-10 20:21:46 -07:00
jfs jfs: Simplify code 2016-09-06 12:17:24 -05:00
kernfs kernfs: don't depend on d_find_any_alias() when generating notifications 2016-08-31 14:48:52 +02:00
lockd Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-07-28 12:59:05 -07:00
logfs Merge branch 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-06 09:49:02 -04:00
minix simple local filesystems: switch to ->iterate_shared() 2016-05-02 19:49:32 -04:00
ncpfs get rid of 'parent' argument of ->d_compare() 2016-07-31 16:37:25 -04:00
nfs NFSv4.1: Fix the CREATE_SESSION slot number accounting 2016-09-11 14:56:44 -04:00
nfs_common
nfsd nfsd: don't return an unhashed lock stateid after taking mutex 2016-08-12 16:10:25 -04:00
nilfs2 nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
nls
notify fsnotify: clean up spinlock assertions 2016-10-07 18:46:26 -07:00
ntfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-07-28 12:59:05 -07:00
ocfs2 ocfs2: fix undefined struct variable in inode.h 2016-10-07 18:46:26 -07:00
omfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
openpromfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
orangefs Revert "orangefs: bump minimum userspace version" 2016-10-03 15:07:36 -04:00
overlayfs Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-10-04 14:48:27 -07:00
proc fs/proc/task_mmu.c: make the task_mmu walk_page_range() limit in clear_refs_write() obvious 2016-10-07 18:46:28 -07:00
pstore ramoops: move spin_lock_init after kmalloc error checking 2016-09-08 15:01:13 -07:00
qnx4 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
qnx6 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
quota quota: fill in Q_XGETQSTAT inode information for inactive quotas 2016-08-15 17:43:31 +02:00
ramfs ipc/shm: fix crash if CONFIG_SHMEM is not set 2016-09-19 15:36:17 -07:00
reiserfs reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() 2016-09-16 17:20:59 +02:00
romfs romfs, squashfs: switch to ->iterate_shared() 2016-05-09 11:41:15 -04:00
squashfs fs: have ll_rw_block users pass in op and flags separately 2016-06-07 13:41:38 -06:00
sysfs sysfs print name of undiscoverable attribute group 2016-09-27 12:24:29 +02:00
sysv vfs: make the string hashes salt the hash 2016-06-10 20:21:46 -07:00
tracefs tracefs: ->d_parent is never NULL or negative... 2016-05-29 16:22:07 -04:00
ubifs ubifs: Fix xattr generic handler usage 2016-08-23 23:02:52 +02:00
udf udf: don't bother with full-page write optimisations in adinicb case 2016-09-19 10:47:01 +02:00
ufs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-07-28 12:59:05 -07:00
xfs ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings 2016-10-07 18:46:28 -07:00
aio.c aio: mark AIO pseudo-fs noexec 2016-09-15 15:49:28 -07:00
anon_inodes.c
attr.c vfs: Don't modify inodes with a uid or gid unknown to the vfs 2016-07-05 15:06:46 -05:00
bad_inode.c switch ->setxattr() to passing dentry and inode separately 2016-05-27 20:09:16 -04:00
binfmt_aout.c fs: fix binfmt_aout.c build error 2016-05-28 16:34:59 -07:00
binfmt_elf_fdpic.c elf_fdpic_transfer_args_to_stack(): make it generic 2016-07-25 16:51:49 +10:00
binfmt_elf.c x86/coredump: Use pr_reg size, rather that TIF_IA32 flag 2016-09-14 21:28:10 +02:00
binfmt_em86.c fs/binfmt_em86.c: fix incompatible pointer type 2016-08-02 19:35:15 -04:00
binfmt_flat.c binfmt_flat: allow compressed flat binary format to work on MMU systems 2016-07-28 13:29:12 +10:00
binfmt_misc.c binfmt_misc for-linus on 20160727 2016-08-07 10:13:14 -04:00
binfmt_script.c
block_dev.c fs/block_dev: fix potential NULL ptr deref in freeze_bdev() 2016-08-25 08:38:26 -06:00
buffer.c xfs: update for 4.8-rc1 2016-07-27 09:53:35 -07:00
char_dev.c chardev: add missing line break in pr_warn 2016-07-14 16:21:53 +09:00
compat_binfmt_elf.c
compat_ioctl.c [media] cec: add compat32 ioctl support 2016-06-28 10:00:13 -03:00
compat.c Fix a number of bugs, most notably a potential stale data exposure 2016-05-24 12:55:26 -07:00
coredump.c coredump: fix dumping through pipes 2016-06-07 22:07:09 -04:00
dax.c thp: reduce usage of huge zero page's atomic counter 2016-10-07 18:46:28 -07:00
dcache.c Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
dcookies.c
direct-io.c direct-io: use bio set/get op accessors 2016-06-07 13:41:38 -06:00
drop_caches.c
eventfd.c
eventpoll.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
exec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2016-08-04 18:04:44 -04:00
fcntl.c
fhandle.c
file_table.c
file.c give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() 2016-05-02 19:49:28 -04:00
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c mm, writeback: flush plugged IO in wakeup_flusher_threads() 2016-08-09 19:58:06 -06:00
inode.c Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
internal.h iomap: expose iomap_apply outside iomap.c 2016-09-19 11:24:49 +10:00
ioctl.c vfs: cap dedupe request structure size at PAGE_SIZE 2016-09-15 13:29:52 -07:00
iomap.c Merge branch 'iomap-4.9-dax' into for-next 2016-10-03 09:53:59 +11:00
Kconfig fs/locks: Replace lg_local with a per-cpu spinlock 2016-09-22 15:25:53 +02:00
Kconfig.binfmt ARM: 8594/1: enable binfmt_flat on systems with an MMU 2016-08-12 16:47:05 +01:00
libfs.c lockless next_positive() 2016-06-20 17:11:29 -04:00
locks.c File locking related changes for v4.9 2016-10-04 13:36:19 -07:00
Makefile fs: introduce iomap infrastructure 2016-06-21 09:23:11 +10:00
mbcache.c
mount.h mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
mpage.c block/mm: make bdev_ops->rw_page() take a bool for read/write 2016-08-07 14:41:02 -06:00
namei.c fs: return EPERM on immutable inode 2016-08-07 10:03:31 -04:00
namespace.c mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
no-block.c
nsfs.c nsfs: Simplify __ns_get_path 2016-09-22 20:06:20 -05:00
open.c binfmt_misc for-linus on 20160727 2016-08-07 10:13:14 -04:00
pipe.c mm: memcontrol: only mark charged pages with PageKmemcg 2016-08-09 10:14:10 -07:00
pnode.c mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
pnode.h mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
posix_acl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-07-29 15:54:19 -07:00
proc_namespace.c
read_write.c x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2 2016-07-15 10:30:26 +02:00
readdir.c restore killability of old mutex_lock_killable(&inode->i_mutex) users 2016-05-26 00:13:25 -04:00
select.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
seq_file.c fs/seq_file: fix out-of-bounds read 2016-08-26 17:39:35 -07:00
signalfd.c
splice.c Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
stack.c
stat.c
statfs.c
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-07-29 15:54:19 -07:00
sync.c
timerfd.c timerfd: Reject ALARM timerfds without CAP_WAKE_ALARM 2016-06-09 23:42:38 +02:00
userfaultfd.c mm: introduce fault_env 2016-07-26 16:19:19 -07:00
utimes.c fs: return EPERM on immutable inode 2016-08-07 10:03:31 -04:00
xattr.c vfs: Don't modify inodes with a uid or gid unknown to the vfs 2016-07-05 15:06:46 -05:00