kernel-ark/fs
David Gibson b45b5bd65f [PATCH] hugepage: Strict page reservation for hugepage inodes
These days, hugepages are demand-allocated at first fault time.  There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.

A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations.  In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available.  In particular, this defeats any
program that detects a mapping failure and adjusts its hugepage usage
downward accordingly.

The patch below addresses this problem by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped but not
instantiated.  MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation.)
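
As a rough illustration of the difference between the two checks, here is a
toy user-space model.  It is not the kernel code from this patch: struct
pool, map_heuristic(), map_reserved() and the two count_*() helpers are
hypothetical names invented for the sketch, and it ignores locking, faults
and MAP_PRIVATE entirely.  It only shows why checking each block against
the total free count overcommits, while keeping a running reservation does
not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a hugepage pool (hypothetical, not a kernel structure). */
struct pool {
	unsigned long free_pages;     /* hugepages currently in the pool */
	unsigned long reserved_pages; /* promised to mapped, untouched ranges */
};

/* Old heuristic: each mapping is compared against the whole free count. */
static bool map_heuristic(const struct pool *p, unsigned long npages)
{
	return npages <= p->free_pages;
}

/* Strict reservation: only pages that are not already promised count. */
static bool map_reserved(struct pool *p, unsigned long npages)
{
	assert(p->reserved_pages <= p->free_pages);
	if (npages > p->free_pages - p->reserved_pages)
		return false;
	p->reserved_pages += npages;
	return true;
}

/* Map nblocks blocks of `block` pages without touching any of them. */
unsigned long count_heuristic(struct pool p, unsigned long block,
			      unsigned long nblocks)
{
	unsigned long ok = 0;

	for (unsigned long i = 0; i < nblocks; i++)
		if (map_heuristic(&p, block))
			ok++;
	return ok;
}

/* Same sequence of mappings, but with a running reservation. */
unsigned long count_reserved(struct pool p, unsigned long block,
			     unsigned long nblocks)
{
	unsigned long ok = 0;

	for (unsigned long i = 0; i < nblocks; i++)
		if (map_reserved(&p, block))
			ok++;
	return ok;
}
```

With a pool of 10 hugepages and four 4-page blocks, the heuristic accepts
all four mappings (16 pages promised against 10 available), whereas the
reserving variant accepts two and fails the third at mapping time.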

This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.

This patch causes no regressions on the libhugetlbfs testsuite, and a test
designed to catch this problem, which previously failed, now passes (ppc64,
POWER5).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
9p [PATCH] v9fs: assign dentry ops to negative dentries 2006-03-22 07:53:55 -08:00
adfs
affs
afs
autofs
autofs4
befs
bfs
cifs [CIFS] Always match oplock break (cache notification) to the right tcp 2006-03-05 03:39:55 +00:00
coda
configfs
cramfs [PATCH] cramfs mounts provide corrupted content since 2.6.15 2006-03-06 18:40:43 -08:00
debugfs [PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data 2006-03-20 13:42:59 -08:00
devfs
devpts
efs
exportfs
ext2 [PATCH] Fix ext2 readdir f_pos re-validation logic 2006-03-15 16:31:51 -08:00
ext3 [PATCH] ext3: fix nobh mode for chattr +j inodes 2006-03-11 09:19:34 -08:00
fat
freevxfs
fuse
hfs
hfsplus
hostfs
hpfs
hppfs
hugetlbfs [PATCH] hugepage: Strict page reservation for hugepage inodes 2006-03-22 07:54:03 -08:00
isofs
jbd
jffs
jffs2 [PATCH] mtd: 64 bit fixes 2006-03-09 19:47:37 -08:00
jfs [PATCH] JFS: Take logsync lock before testing mp->lsn 2006-03-14 14:00:48 -08:00
lockd [PATCH] NLM: Ensure we do not Oops in the case of an unlock 2006-03-14 07:57:18 -08:00
minix
msdos
ncpfs
nfs [PATCH] NFSv4: fix mount segfault on errors returned that are < -1000 2006-03-14 07:57:18 -08:00
nfs_common
nfsd
nls
ntfs
ocfs2 [PATCH] slab: Remove SLAB_NO_REAP option 2006-03-22 07:53:59 -08:00
openpromfs
partitions [PATCH] s390: dasd partition detection 2006-03-08 14:14:01 -08:00
proc [PATCH] smaps: shared fix 2006-03-06 18:40:45 -08:00
qnx4
ramfs [PATCH] mm: nommu use compound pages 2006-03-22 07:54:01 -08:00
reiserfs [PATCH] reiserfs: fix unaligned bitmap usage 2006-03-02 10:37:59 -08:00
relayfs
romfs
smbfs
sysfs [PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path 2006-03-20 13:42:59 -08:00
sysv
udf [PATCH] udf: fix uid/gid options and add uid/gid=ignore and forget options 2006-03-08 14:14:00 -08:00
ufs
vfat
xfs
aio.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio.c
block_dev.c
buffer.c [PATCH] page migration: fail if page is in a vma flagged VM_LOCKED 2006-03-14 21:43:02 -08:00
char_dev.c [PATCH] kobj_map semaphore to mutex conversion 2006-03-20 13:42:58 -08:00
compat_ioctl.c [NET] compat ifconf: fix limits 2006-03-08 16:46:08 -08:00
compat.c
dcache.c [PATCH] fix file counting 2006-03-08 14:14:01 -08:00
dcookies.c
direct-io.c Fix a direct I/O locking issue revealed by the new mutex code. 2006-03-15 15:14:45 +11:00
dnotify.c
dquot.c
drop_caches.c
eventpoll.c
exec.c
fcntl.c
fifo.c Simplify fifo_open() locking logic 2006-03-07 09:16:35 -08:00
file_table.c [PATCH] fix file counting 2006-03-08 14:14:01 -08:00
file.c
filesystems.c
fs-writeback.c
inode.c
inotify.c
ioctl.c
ioprio.c
Kconfig
Kconfig.binfmt
libfs.c
locks.c
Makefile
mbcache.c
mpage.c
namei.c [PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside 2006-03-11 09:19:34 -08:00
namespace.c [PATCH] fs/namespace.c:dup_namespace(): fix a use after free 2006-03-15 09:37:34 -08:00
nfsctl.c [PATCH] nfsservctl(): remove user-triggerable printk 2006-03-17 07:51:25 -08:00
open.c
pipe.c Mark the pipe file operations static 2006-03-08 14:03:09 -08:00
pnode.c
pnode.h
posix_acl.c
quota_v1.c
quota_v2.c
quota.c
read_write.c
readdir.c
select.c
seq_file.c
stat.c
super.c
xattr_acl.c
xattr.c