kernel-ark/include/linux
Nick Piggin 557ed1fa26 remove ZERO_PAGE
The commit b5810039a5 contains the note

  A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
  (and thus mapcounted and count towards shared rss).  These writes to
  the struct page could cause excessive cacheline bouncing on big
  systems.  There are a number of ways this could be addressed if it is
  an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- mesuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
much use to them except on benchmarks.  All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
..
amba
byteorder
dvb
hdlc
isdn
lockd
mlx4
mmc
mtd
netfilter [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
netfilter_arp [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
netfilter_bridge [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
netfilter_ipv4 [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
netfilter_ipv6 [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
nfsd
raid
rtc
spi
ssb
sunrpc Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 2007-10-15 10:47:35 -07:00
tc_act
tc_ematch
usb docbook: fix usb content 2007-10-15 17:56:36 -07:00
8250_pci.h
a.out.h
ac97_codec.h
acct.h
acpi_pmtmr.h
acpi.h
adb.h
adfs_fs_i.h
adfs_fs_sb.h
adfs_fs.h
aer.h
affs_hardblocks.h
agp_backend.h
agpgart.h
aio_abi.h
aio.h
amifd.h
amifdreg.h
amigaffs.h
anon_inodes.h
apm_bios.h
apm-emulation.h
arcdevice.h
arcfb.h
async_tx.h
ata.h
atalk.h
atm_eni.h
atm_he.h
atm_idt77105.h
atm_nicstar.h
atm_suni.h
atm_tcp.h
atm_zatm.h
atm.h
atmapi.h
atmarp.h
atmbr2684.h
atmclip.h
atmdev.h
atmel_pdc.h
atmioc.h
atmlec.h
atmmpc.h
atmppp.h
atmsap.h
atmsvc.h
attribute_container.h
audit.h
auto_fs4.h
auto_fs.h
auxvec.h
ax25.h
b1lli.h
b1pcmcia.h
backing-dev.h
backlight.h
baycom.h
bcd.h
bfs_fs.h
binfmts.h
bio.h
bit_spinlock.h
bitmap.h
bitops.h
bitrev.h
blkdev.h
blkpg.h
blktrace_api.h
blockgroup_lock.h
bootmem.h
bottom_half.h
bpqether.h
bsg.h
buffer_head.h
bug.h
cache.h
calc64.h
capability.h
capi.h
cciss_ioctl.h
cd1400.h
cdev.h
cdk.h
cdrom.h
cfag12864b.h
chio.h
circ_buf.h
clk.h
clockchips.h
clocksource.h
cm4000_cs.h
cn_proc.h
coda_cache.h
coda_fs_i.h
coda_linux.h
coda_psdev.h
coda.h
coff.h
com20020.h
compat.h
compiler-gcc3.h
compiler-gcc4.h
compiler-gcc.h
compiler-intel.h
compiler.h
completion.h
comstats.h
concap.h
configfs.h
connector.h
console_struct.h
console.h
consolemap.h
const.h
cpu.h
cpufreq.h
cpumask.h
cpuset.h
cramfs_fs_sb.h
cramfs_fs.h
crash_dump.h
crc7.h
crc16.h
crc32.h
crc32c.h
crc-ccitt.h
crc-itu-t.h
crypto.h
cryptohash.h
ctype.h
cuda.h
cyclades.h
cyclomx.h
cycx_cfm.h
cycx_drv.h
cycx_x25.h
dcache.h
dccp.h
dcookies.h
debug_locks.h
debugfs.h
delay.h
delayacct.h
device-mapper.h
device.h
devpts_fs.h
dio.h
dirent.h
display.h
dlm_device.h
dlm_netlink.h
dlm.h
dm9000.h
dm-ioctl.h
dma-mapping.h introduce DMA_MASK_NONE as a signal for unable to do DMA 2007-10-16 09:42:50 -07:00
dmaengine.h
dmapool.h
dmi.h
dn.h
dnotify.h
dqblk_v1.h
dqblk_v2.h
dqblk_xfs.h
ds1wm.h
ds1286.h
ds17287rtc.h
dtlk.h
edac.h
edd.h
eeprom_93cx6.h
efi.h
efs_dir.h
efs_fs_i.h
efs_fs_sb.h
efs_fs.h
efs_vh.h
eisa.h
elevator.h
elf-em.h
elf-fdpic.h
elf.h
elfcore.h
elfnote.h
err.h
errno.h
errqueue.h
etherdevice.h
ethtool.h
eventfd.h
eventpoll.h
exportfs.h
ext2_fs_sb.h
ext2_fs.h
ext3_fs_i.h
ext3_fs_sb.h
ext3_fs.h
ext3_jbd.h
ext4_fs_extents.h
ext4_fs_i.h
ext4_fs_sb.h
ext4_fs.h
ext4_jbd2.h
fadvise.h
falloc.h
fault-inject.h
fb.h
fcdevice.h
fcntl.h
fd1772.h
fd.h
fddidevice.h
fdreg.h
fib_rules.h
file.h
filter.h
firewire-cdev.h
firewire-constants.h
firmware.h
flat.h
font.h
freezer.h
fs_enet_pd.h
fs_stack.h
fs_struct.h
fs_uart_pd.h
fs.h readahead: combine file_ra_state.prev_index/prev_offset into prev_pos 2007-10-16 09:42:52 -07:00
fsl_devices.h
fsnotify.h
fuse.h
futex.h
gameport.h
gen_stats.h
genalloc.h
generic_acl.h
generic_serial.h
genetlink.h
genhd.h
getcpu.h
gfp.h
gfs2_ondisk.h
gigaset_dev.h
gpio_keys.h
gpio_mouse.h
hardirq.h
harrier_defs.h
hash.h
hayesesp.h
hdlc.h
hdlcdrv.h
hdpu_features.h
hdreg.h
hdsmart.h
hid-debug.h
hid.h HID: fix HIDIOCGRDESC memory access in hidraw 2007-10-15 08:12:00 -07:00
hiddev.h
hidraw.h HID: fix HIDIOCGRDESC memory access in hidraw 2007-10-15 08:12:00 -07:00
highmem.h
highuid.h
hil_mlc.h
hil.h
hippidevice.h
hp_sdc.h
hpet.h
hrtimer.h
htirq.h
hugetlb.h
hw_random.h
hwmon-sysfs.h
hwmon-vid.h
hwmon.h
hysdn_if.h
i2c-algo-bit.h
i2c-algo-pca.h
i2c-algo-pcf.h
i2c-algo-sgi.h
i2c-dev.h
i2c-gpio.h
i2c-id.h
i2c-ocores.h
i2c-pnx.h
i2c-pxa.h
i2c.h
i2o-dev.h
i2o.h
i8k.h
ibmtr.h
icmp.h
icmpv6.h
ide.h
idr.h
ieee80211.h
if_addr.h
if_arcnet.h
if_arp.h
if_bonding.h
if_bridge.h [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
if_cablemodem.h
if_ec.h
if_eql.h
if_ether.h
if_fc.h
if_fddi.h
if_frad.h
if_hippi.h
if_infiniband.h
if_link.h
if_ltalk.h
if_macvlan.h
if_packet.h
if_plip.h
if_ppp.h
if_pppol2tp.h
if_pppox.h
if_shaper.h
if_slip.h
if_strip.h
if_tr.h
if_tun.h
if_tunnel.h
if_vlan.h
if_wanpipe.h
if.h
igmp.h
in6.h
in_route.h
in.h
inet_diag.h
inet_lro.h
inet.h
inetdevice.h
init_task.h
init.h Add assembler equivalents to __init{,date}_refok 2007-10-16 09:42:49 -07:00
initrd.h
inotify.h
input-polldev.h
input.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2007-10-15 13:41:39 -07:00
interrupt.h provide stubs for enable_irq_wake() and disable_irq_wake() 2007-10-16 09:42:50 -07:00
io.h
ioc3.h
ioc4.h
ioctl.h
ioport.h
ioprio.h
ip6_tunnel.h
ip.h
ipc.h
ipmi_msgdefs.h
ipmi_smi.h
ipmi.h
ipsec.h
ipv6_route.h
ipv6.h
ipx.h
irda.h
irq_cpustat.h
irq.h
irqflags.h
irqreturn.h
isa.h
isapnp.h
isdn_divertif.h
isdn_ppp.h
isdn.h [ISDN]: Remove local copy of device name to make sure renames work. 2007-10-15 12:26:37 -07:00
isdnif.h
isicom.h
iso_fs.h
istallion.h
ivtv.h
ivtvfb.h
ixjuser.h
jbd2.h
jbd.h docbook: fix filesystems content 2007-10-15 17:56:36 -07:00
jffs2.h
jhash.h
jiffies.h slow down printk during boot 2007-10-16 09:42:49 -07:00
journal-head.h
joystick.h
kallsyms.h
kbd_diacr.h
kbd_kern.h
Kbuild
kd.h
kdebug.h
kdev_t.h
kernel_stat.h sched: guest CPU accounting: add guest-CPU /proc/stat field 2007-10-15 17:00:19 +02:00
kernel.h
kernelcapi.h
kexec.h
key-ui.h
key.h
keyboard.h
keyctl.h
kfifo.h
klist.h
kmalloc_sizes.h
kmod.h
kobj_map.h
kobject.h
kprobes.h
kref.h
ks0108.h
kthread.h
ktime.h
kvm_para.h
kvm.h
lapb.h
latency.h
lcd.h
leds.h
lguest_bus.h
lguest_launcher.h
lguest.h
libata.h
libps2.h
license.h
limits.h
linkage.h
linux_logo.h
list.h
llc.h
lm_interface.h
lock_dlm_plock.h
lockdep.h
log2.h
loop.h
lp.h
lzo.h
m41t00.h
m48t86.h
magic.h
major.h
maple.h
matroxfb.h
mbcache.h
mc6821.h
mc146818rtc.h
mca-legacy.h
mca.h
mdio-bitbang.h
memory_hotplug.h Clean up duplicate includes in include/linux/memory_hotplug.h 2007-10-16 09:42:52 -07:00
memory.h
mempolicy.h
mempool.h
meye.h
migrate.h
mii.h
minix_fs.h
miscdevice.h
mm_inline.h
mm_types.h
mm.h remove ZERO_PAGE 2007-10-16 09:42:53 -07:00
mman.h
mmtimer.h
mmzone.h sparsemem: record when a section has a valid mem_map 2007-10-16 09:42:51 -07:00
mnt_namespace.h
mod_devicetable.h
module.h
moduleloader.h
moduleparam.h
mount.h
mpage.h
mqueue.h
mroute.h
msdos_fs.h
msg.h
msi.h
mtio.h
mutex-debug.h
mutex.h
mv643xx.h
n_r3964.h
namei.h
nbd.h
ncp_fs_i.h
ncp_fs_sb.h
ncp_fs.h
ncp_mount.h
ncp_no.h
ncp.h
neighbour.h
net.h
netdevice.h
netfilter_arp.h
netfilter_bridge.h
netfilter_decnet.h
netfilter_ipv4.h [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
netfilter_ipv6.h
netfilter.h [NETFILTER]: Replace sk_buff ** with sk_buff * 2007-10-15 12:26:29 -07:00
netlink.h
netpoll.h
netrom.h
nfs2.h
nfs3.h
nfs4_acl.h
nfs4_mount.h
nfs4.h
nfs_fs_i.h
nfs_fs_sb.h
nfs_fs.h
nfs_idmap.h
nfs_mount.h
nfs_page.h
nfs_xdr.h
nfs.h
nfsacl.h
nfsd_idmap.h
nl80211.h
nls.h
nmi.h
node.h
nodemask.h
notifier.h
nsc_gpio.h
nsproxy.h
nubus.h
numa.h
nvram.h
of_device.h
of_platform.h
of.h
oom.h
oprofile.h
page-flags.h
pagemap.h filemap: convert some unsigned long to pgoff_t 2007-10-16 09:42:53 -07:00
pagevec.h
param.h
parport_pc.h
parport.h
parser.h
pata_platform.h
patchkey.h
pci_hotplug.h
pci_ids.h 8250_pci: Autodetect mainpine cards 2007-10-16 09:42:50 -07:00
pci_regs.h
pci-acpi.h
pci.h
pcieport_if.h
pda_power.h
percpu_counter.h
percpu.h
personality.h
pfkeyv2.h
pfn.h
pg.h
phantom.h
phonedev.h
phy_fixed.h
phy.h
pid_namespace.h
pid.h
pipe_fs_i.h
pkt_cls.h
pkt_sched.h
pktcdvd.h
platform_device.h
plist.h
pm_legacy.h
pm.h
pmu.h
pnp.h
pnpbios.h
poison.h
poll.h
posix_acl_xattr.h
posix_acl.h
posix_types.h
posix-timers.h
power_supply.h
ppdev.h
ppp_channel.h
ppp_defs.h
ppp-comp.h
prctl.h
preempt.h
prefetch.h
prio_tree.h
proc_fs.h
profile.h
ps2esdi.h
ptrace.h
qnx4_fs.h
qnxtypes.h
quicklist.h
quota.h
quotaio_v1.h
quotaio_v2.h
quotaops.h
radeonfb.h
radix-tree.h radixtree: introduce radix_tree_next_hole() 2007-10-16 09:42:52 -07:00
raid_class.h
ramfs.h
random.h
raw.h
rbtree.h
rcupdate.h
reboot.h
reciprocal_div.h
reiserfs_acl.h
reiserfs_fs_i.h
reiserfs_fs_sb.h
reiserfs_fs.h
reiserfs_xattr.h
relay.h
resource.h
resume-trace.h
rfkill.h
rio_drv.h
rio_ids.h
rio_regs.h
rio.h
rmap.h
romfs_fs.h
root_dev.h
rose.h
route.h
rslib.h
rtc-v3020.h
rtc.h
rtmutex.h
rtnetlink.h
rwsem-spinlock.h
rwsem.h
rxrpc.h
sc26198.h
scatterlist.h
scc.h
sched.h sched: guest CPU accounting: maintain stats in account_system_time() 2007-10-15 17:00:19 +02:00
screen_info.h
sctp.h
scx200_gpio.h
scx200.h
sdla.h
seccomp.h
securebits.h
security.h
selection.h
selinux_netlink.h
selinux.h
sem.h
seq_file.h
seqlock.h
serial167.h
serial_8250.h
serial_core.h wake up from a serial port 2007-10-16 09:42:50 -07:00
serial_pnx8xxx.h
serial_reg.h
serial.h
serialP.h
serio.h
shm.h
shmem_fs.h
signal.h
signalfd.h
skbuff.h Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2007-10-15 14:06:58 -07:00
slab_def.h
slab.h
slob_def.h
slub_def.h SLUB: direct pass through of page size or higher kmalloc requests 2007-10-16 09:42:53 -07:00
sm501-regs.h
sm501.h
smb_fs_i.h
smb_fs_sb.h
smb_fs.h
smb_mount.h
smb.h
smbno.h
smp_lock.h
smp.h
snmp.h
socket.h
sockios.h
som.h
sonet.h
sony-laptop.h
sonypi.h
sort.h
sound.h
soundcard.h
spinlock_api_smp.h
spinlock_api_up.h
spinlock_types_up.h
spinlock_types.h
spinlock_up.h
spinlock.h
splice.h
srcu.h
stacktrace.h
stallion.h
start_kernel.h
stat.h
statfs.h
stddef.h
stop_machine.h
string.h
stringify.h
superhyway.h
suspend.h
svga.h
swap.h
swapops.h
synclink.h
sys.h
syscalls.h
sysctl.h
sysdev.h
sysfs.h
sysrq.h
sysv_fs.h
task_io_accounting_ops.h
task_io_accounting.h
taskstats_kern.h
taskstats.h
tc.h
tcp.h [TCP]: Make snd_cwnd_cnt 32-bit 2007-10-15 12:59:43 -07:00
telephony.h
termios.h
textsearch_fsm.h
textsearch.h
tfrc.h
thread_info.h
threads.h
ticable.h
tick.h
tifm.h
time.h
timer.h
timerfd.h
times.h
timex.h
tiocl.h
tipc_config.h
tipc.h
topology.h sched: enable wake-idle on CONFIG_SCHED_MC=y 2007-10-15 17:00:19 +02:00
toshiba.h
transport_class.h
trdevice.h
tsacct_kern.h
tty_driver.h
tty_flip.h
tty_ldisc.h
tty.h
types.h
uaccess.h
udf_fs_i.h
udf_fs_sb.h
udf_fs.h
udp.h
ufs_fs_i.h
ufs_fs_sb.h
ufs_fs.h
uinput.h
uio_driver.h
uio.h
ultrasound.h
un.h
unistd.h
unwind.h
usb_usual.h
usb.h
usbdevice_fs.h
user_namespace.h
user.h
utime.h
uts.h
utsname.h
vermagic.h
vfs.h
via.h
video_decoder.h
video_encoder.h
video_output.h
videodev2.h
videodev.h
videotext.h
vmalloc.h
vmstat.h
vt_buffer.h
vt_kern.h
vt.h
wait.h
wanrouter.h
watchdog.h
wireless.h
workqueue.h
writeback.h Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 2007-10-15 10:47:35 -07:00
x25.h
xattr.h
xfrm.h
xilinxfb.h
yam.h
zconf.h
zlib.h
zorro_ids.h
zorro.h
zutil.h