kernel-ark

Author	SHA1	Message	Date
Nick Piggin	da6052f7b3	[PATCH] update some mm/ comments Let's try to keep mm/ comments more useful and up to date. This is a start. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:49 -07:00
Ravikiran G Thirumalai	e5ac9c5aec	[PATCH] Add some comments to slab.c Also, checks if we get a valid slabp_cache for off slab slab-descriptors. We should always get this. If we don't, then in that case we will have to disable off-slab descriptors for this cache and do the calculations again. This is a rare case, so add a BUG_ON, for now, just in case. Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:49 -07:00
Heiko Carstens	dfd54cbcc0	[PATCH] bootmem: use MAX_DMA_ADDRESS instead of LOW32LIMIT Introduce ARCH_LOW_ADDRESS_LIMIT which can be set per architecture to override the 4GB default limit used by the bootmem allocator within __alloc_bootmem_low() and __alloc_bootmem_low_node(). E.g. s390 needs a 2GB limit instead of 4GB. Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:49 -07:00
Nick Piggin	b72f160443	[PATCH] oom: more printk Print the name of the task invoking the OOM killer. Could make debugging easier. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:49 -07:00
Nick Piggin	5081dde33f	[PATCH] oom: kthread infinite loop fix Skip kernel threads, rather than having them return 0 from badness. Theoretically, badness might truncate all results to 0, thus a kernel thread might be picked first, causing an infinite loop. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:49 -07:00
Nick Piggin	af5b912435	[PATCH] oom: swapoff tasks tweak PF_SWAPOFF processes currently cause select_bad_process to return straight away. Instead, give them high priority, so we will kill them first, however we also first ensure no parallel OOM kills are happening at the same time. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:49 -07:00
Nick Piggin	4a3ede107e	[PATCH] oom: handle oom_disable exiting Having the oomkilladj == OOM_DISABLE check before the releasing check means that oomkilladj == OOM_DISABLE tasks exiting will not stop the OOM killer. Moving the test down will give the desired behaviour. Also: it will allow them to "OOM-kill" themselves if they are exiting. As per the previous patch, this is required to prevent OOM killer deadlocks (and they don't actually get killed, because they're already exiting -- they're simply allowed access to memory reserves). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Nick Piggin	50ec3bbffb	[PATCH] oom: handle current exiting If current is exiting, it should actually be allowed to access reserved memory rather than OOM kill something else. Can't do this via a straight check in page_alloc.c because that would allow multiple tasks to use up reserves. Instead cause current to OOM-kill itself which will mark it as TIF_MEMDIE. The current procedure of simply aborting the OOM-kill if a task is exiting can lead to OOM deadlocks. In the case of killing a PF_EXITING task, don't make a lot of noise about it. This becomes more important in future patches, where we can "kill" OOM_DISABLE tasks. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Nick Piggin	7887a3da75	[PATCH] oom: cpuset hint cpuset_excl_nodes_overlap does not always indicate that killing a task will not free any memory for us. For example, we may be asking for an allocation from _anywhere_ in the machine, or the task in question may be pinning memory that is outside its cpuset. Fix this by just causing cpuset_excl_nodes_overlap to reduce the badness rather than disallow it. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Nick Piggin	4ff1ffb487	[PATCH] oom: reclaim_mapped on oom Potentially it takes several scans of the lru lists before we can even start reclaiming pages. Mapped pages with young ptes can take 2 passes on the active list + one on the inactive list. But reclaim_mapped may not always kick in instantly, so it could take even more than that. Raise the threshold for marking a zone as all_unreclaimable from a factor of 4 times the pages in the zone to 6. Introduce a mechanism to force reclaim_mapped if we've reached a factor 3 and still haven't made progress. Previously, a customer doing stress testing was able to easily OOM the box after using only a small fraction of its swap (~100MB). After the patches, it would only OOM after having used up all swap (~800MB). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Nick Piggin	408d85441c	[PATCH] oom: use unreclaimable info __alloc_pages currently starts shooting if page reclaim has failed to free up swap_cluster_max pages in one run through the priorities. This is not always a good indicator on its own, so make use of the all_unreclaimable logic as well: don't consider going OOM until all zones we're interested in are unreclaimable. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Peter Zijlstra	6ddab3b9eb	[PATCH] mm: swap write failure fixup Currently we can silently drop data if the write to swap failed. It usually doesn't result in data-corruption because on page-in the process will receive SIGBUS (assuming write-failure implies read-failure). This assumption might or might not be valid. This patch will avoid the page being discarded after a failed write. But will print a warning the sysadmin _should_ take to heart, if a lot of swap space becomes un-writeable, OOM is not far off. Tested by making the write fail 'randomly' once every 50 writes or so. [akpm@osdl.org: printk warning fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Pekka Enberg	ca5f9703df	[PATCH] slab: respect architecture and caller mandated alignment As explained by Heiko, on s390 (32-bit) ARCH_KMALLOC_MINALIGN is set to eight because their common I/O layer allocates data structures that need to have an eight byte alignment. This does not work when CONFIG_SLAB_DEBUG is enabled because kmem_cache_create will override alignment to BYTES_PER_WORD which is four. So change kmem_cache_create to ensure cache alignment is always at minimum what the architecture or caller mandates even if slab debugging is enabled. Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Nick Piggin	db37648cd6	[PATCH] mm: non syncing lock_page() lock_page needs the caller to have a reference on the page->mapping inode due to sync_page, ergo set_page_dirty_lock is obviously buggy according to its comments. Solve it by introducing a new lock_page_nosync which does not do a sync_page. akpm: unpleasant solution to an unpleasant problem. If it goes wrong it could cause great slowdowns while the lock_page() caller waits for kblockd to perform the unplug. And if a filesystem has special sync_page() requirements (none presently do), permanent hangs are possible. otoh, set_page_dirty_lock() is usually (always?) called against userspace pages. They are always up-to-date, so there shouldn't be any pending read I/O against these pages. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Nick Piggin	28e4d965e6	[PATCH] mm: remove_mapping() safeness Some users of remove_mapping had been unsafe. Modify the remove_mapping precondition to ensure the caller has locked the page and obtained the correct mapping. Modify callers to ensure the mapping is the correct one. [hugh@veritas.com: swapper_space fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:48 -07:00
Rolf Eike Beer	bfa5bf6d64	[PATCH] Add kerneldocs for some functions in mm/memory.c These functions are already documented quite well with long comments. Now add kerneldoc style header to make this turn up in everyone's favorite doc format. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:47 -07:00
Martin Peschke	7ff6f08295	[PATCH] CPU hotplug compatible alloc_percpu() This patch splits alloc_percpu() up into two phases. Likewise for free_percpu(). This allows clients to limit initial allocations to online cpu's, and to populate or depopulate per-cpu data at run time as needed: struct my_struct *obj; /* initial allocation for online cpu's */ obj = percpu_alloc(sizeof(struct my_struct), GFP_KERNEL); ... /* populate per-cpu data for cpu coming online */ ptr = percpu_populate(obj, sizeof(struct my_struct), GFP_KERNEL, cpu); ... /* access per-cpu object */ ptr = percpu_ptr(obj, smp_processor_id()); ... /* depopulate per-cpu data for cpu going offline */ percpu_depopulate(obj, cpu); ... /* final removal */ percpu_free(obj); Signed-off-by: Martin Peschke <mp3@de.ibm.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:47 -07:00
Martin Schwidefsky	8bc719d3ca	[PATCH] out of memory notifier Add a notifier chain to the out of memory killer. If one of the registered callbacks could release some memory, do not kill the process but return and retry the allocation that forced the oom killer to run. The purpose of the notifier is to add a safety net in the presence of memory ballooners. If the resource manager inflated the balloon to a size where memory allocations can not be satisfied anymore, it is better to deflate the balloon a bit instead of killing processes. The implementation for the s390 ballooner is included. [akpm@osdl.org: cleanups] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:47 -07:00
Christoph Lameter	19655d3487	[PATCH] linearly index zone->node_zonelists[] I wonder why we need this bitmask indexing into zone->node_zonelists[]? We always start with the highest zone and then include all lower zones if we build zonelists. Are there really cases where we need allocation from ZONE_DMA or ZONE_HIGHMEM but not ZONE_NORMAL? It seems that the current implementation of highest_zone() makes that already impossible. If we go linear on the index then gfp_zone() == highest_zone() and a lot of definitions fall by the wayside. We can now revert back to the use of gfp_zone() in mempolicy.c ;-) Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:47 -07:00
Christoph Lameter	2f6726e54a	[PATCH] Apply type enum zone_type After we have done this we can now do some typing cleanup. The memory policy layer keeps a policy_zone that specifies the zone that gets memory policies applied. This variable can now be of type enum zone_type. The check_highest_zone function and the build_zonelists function must then also take an enum zone_type parameter. Plus there are a number of loops over zones that also should use zone_type. We run into some troubles at some points with functions that need a zone_type variable to become -1. Fix that up. [pj@sgi.com: fix set_mempolicy() crash] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:47 -07:00
Christoph Lameter	4e4785bcf0	[PATCH] mempolicies: fix policy_zone check There is a check in zonelist_policy that compares pieces of the bitmap obtained from a gfp mask via GFP_ZONETYPES with a zone number in function zonelist_policy(). The bitmap is an ORed mask of __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM. The policy_zone is a zone number with the possible values of ZONE_DMA, ZONE_DMA32, ZONE_HIGHMEM and ZONE_NORMAL. These are two different domains of values. For some reason this seemed to work before the zone reduction patchset (It definitely works on SGI boxes since we just have one zone and the check cannot fail). With the zone reduction patchset this check definitely fails on systems with two zones if the system actually has memory in both zones. This is because ZONE_NORMAL is selected using no __GFP flag at all and thus gfp_zone(gfpmask) == 0. ZONE_DMA is selected when __GFP_DMA is set. __GFP_DMA is 0x01. So gfp_zone(gfpmask) == 1. policy_zone is set to ZONE_NORMAL (==1) if ZONE_NORMAL and ZONE_DMA are populated. For ZONE_NORMAL gfp_zone(<no _GFP_DMA>) yields 0 which is < policy_zone(ZONE_NORMAL) and so policy is not applied to regular memory allocations! Instead gfp_zone(__GFP_DMA) == 1 which results in policy being applied to DMA allocations! What we really want in that place is to establish the highest allowable zone for a given gfp_mask. If the highest zone is higher or equal to the policy_zone then memory policies need to be applied. We have such a highest_zone() function in page_alloc.c. So move the highest_zone() function from mm/page_alloc.c into include/linux/gfp.h. On the way we simplify the function and use the new zone_type that was also introduced with the zone reduction patchset plus we also specify the right type for the gfp flags parameter. 
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:47 -07:00
Christoph Lameter	27bf71c2a7	[PATCH] reduce MAX_NR_ZONES: remove display of counters for unconfigured zones eventcounters: Do not display counters for zones that are not available on an arch Do not define or display counters for the DMA32 and the HIGHMEM zone if such zones were not configured. [akpm@osdl.org: s390 fix] [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:47 -07:00
Christoph Lameter	e53ef38d05	[PATCH] reduce MAX_NR_ZONES: make ZONE_HIGHMEM optional Make ZONE_HIGHMEM optional - ifdef out code and definitions related to CONFIG_HIGHMEM - __GFP_HIGHMEM falls back to normal allocations if there is no ZONE_HIGHMEM - GFP_ZONEMASK becomes 0x01 if there is no DMA32 and no HIGHMEM zone. [jdike@addtoit.com: build fix] Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:46 -07:00
Christoph Lameter	fb0e7942bd	[PATCH] reduce MAX_NR_ZONES: make ZONE_DMA32 optional Make ZONE_DMA32 optional - Add #ifdefs around ZONE_DMA32 specific code and definitions. - Add CONFIG_ZONE_DMA32 config option and use that for x86_64 that alone needs this zone. - Remove the use of CONFIG_DMA_IS_DMA32 and CONFIG_DMA_IS_NORMAL for ia64 and fix up the way per node ZVCs are calculated. - Fall back to prior GFP_ZONEMASK of 0x03 if there is no DMA32 zone. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:46 -07:00
Christoph Lameter	2f1b624868	[PATCH] reduce MAX_NR_ZONES: use enum to define zones, reformat and comment Use enum for zones and reformat zones dependent information Add comments explaining the use of zones and add a zones_t type for zone numbers. Line up information that will be #ifdefd by the following patches. [akpm@osdl.org: comment cleanups] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:46 -07:00
Christoph Lameter	98d2b0ebda	[PATCH] reduce MAX_NR_ZONES: page allocator ZONE_HIGHMEM cleanup page allocator ZONE_HIGHMEM fixups 1. We do not need to do an #ifdef in si_meminfo since both counters in use are zero if !CONFIG_HIGHMEM. 2. Add #ifdef in si_meminfo_node instead to avoid referencing zone information for ZONE_HIGHMEM if we do not have HIGHMEM (may not be there after the following patches). 3. Replace the use of ZONE_HIGHMEM with MAX_NR_ZONES in build_zonelists_node 4. build_zonelists_node: Remove BUG_ON for ZONE_HIGHMEM. Zone will be optional soon and thus BUG_ON cannot be triggered anymore. 5. init_free_area_core: Replace a use of ZONE_HIGHMEM with NR_MAX_ZONES. [akpm@osdl.org: cleanups] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:46 -07:00
Christoph Lameter	c1f60a5a41	[PATCH] reduce MAX_NR_ZONES: move HIGHMEM counters into highmem.c/.h Move totalhigh_pages and nr_free_highpages() into highmem.c/.h Move the totalhigh_pages definition into highmem.c/.h. Move the nr_free_highpages function into highmem.c [yoichi_yuasa@tripeaks.co.jp: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:46 -07:00
Christoph Lameter	182e8e2373	[PATCH] reduce MAX_NR_ZONES: make display of highmem counters conditional on CONFIG_HIGHMEM Do not display HIGHMEM memory sizes if CONFIG_HIGHMEM is not set. Make HIGHMEM dependent texts and make display of highmem counters optional Some texts are depending on CONFIG_HIGHMEM. Remove those strings and remove the display of highmem counter values if CONFIG_HIGHMEM is not set. [akpm@osdl.org: remove some ifdefs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:46 -07:00
Franck Bui-Huu	f71bf0cac7	[PATCH] bootmem: miscellaneous coding style fixes It fixes various coding style issues, specially when spaces are useless. For example '*' go next to the function name. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:45 -07:00
Franck Bui-Huu	bbc7b92e33	[PATCH] bootmem: use pfn/page conversion macros It also creates get_mapsize() helper in order to make the code more readable when it calculates the boot bitmap size. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:45 -07:00
Franck Bui-Huu	e786e86a54	[PATCH] bootmem: remove useless headers inclusions Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:45 -07:00
Franck Bui-Huu	bb0923a668	[PATCH] bootmem: limit to 80 columns width Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:45 -07:00
Franck Bui-Huu	69d49e681d	[PATCH] bootmem: mark link_bootmem() as part of the __init section Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:45 -07:00
Adrian Bunk	b221385bc4	[PATCH] mm/: make functions static This patch makes the following needlessly global functions static: - slab.c: kmem_find_general_cachep() - swap.c: __page_cache_release() - vmalloc.c: __vmalloc_node() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:45 -07:00
Peter Zijlstra	204ec841fb	[PATCH] mm: msync() cleanup With the tracking of dirty pages properly done now, msync doesn't need to scan the PTEs anymore to determine the dirty status. From: Hugh Dickins <hugh@veritas.com> In looking to do that, I made some other tidyups: can remove several #includes, and sys_msync loop termination not quite right. Most of those points are criticisms of the existing sys_msync, not of your patch. In particular, the loop termination errors were introduced in 2.6.17: I did notice this shortly before it came out, but decided I was more likely to get it wrong myself, and make matters worse if I tried to rush a last-minute fix in. And it's not terribly likely to go wrong, nor disastrous if it does go wrong (may miss reporting an unmapped area; may also fsync file of a following vma). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:45 -07:00
Peter Zijlstra	ee6a645788	[PATCH] mm: fixup do_wp_page() Wrt. the recent modifications in do_wp_page() Hugh Dickins pointed out: "I now realize it's right to the first order (normal case) and to the second order (ptrace poke), but not to the third order (ptrace poke anon page here to be COWed - perhaps can't occur without intervening mprotects)." This patch restores the old COW behaviour for anonymous pages. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Peter Zijlstra	e88dd6c11c	[PATCH] mm: small cleanup of install_page() Smallish cleanup to install_page(), could save a memory read (haven't checked the asm output) and sure looks nicer. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Peter Zijlstra	c1e6098b23	[PATCH] mm: optimize the new mprotect() code a bit mprotect() resets the page protections, which could result in extra write faults for those pages whose dirty state we track using write faults and are dirty already. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Peter Zijlstra	edc79b2a46	[PATCH] mm: balance dirty pages Now that we can detect writers of shared mappings, throttle them. Avoids OOM by surprise. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Peter Zijlstra	d08b3851da	[PATCH] mm: tracking shared dirty pages Tracking of dirty pages in shared writeable mmap()s. The idea is simple: write protect clean shared writeable pages, catch the write-fault, make writeable and set dirty. On page write-back clean all the PTE dirty bits and write protect them once again. The implementation is a tad harder, mainly because the default backing_dev_info capabilities were too loosely maintained. Hence it is not enough to test the backing_dev_info for cap_account_dirty. The current heuristic is as follows, a VMA is eligible when: - it is shared writeable (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED) - it is not a 'special' mapping (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0 - the backing_dev_info is cap_account_dirty mapping_cap_account_dirty(vma->vm_file->f_mapping) - f_op->mmap() didn't change the default page protection Pages from remap_pfn_range() are explicitly excluded because their COW semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and because they don't have a backing store anyway. mprotect() is taught about the new behaviour as well. However it overrides the last condition. Cleaning the pages on write-back is done with page_mkclean() a new rmap call. It can be called on any page, but is currently only implemented for mapped pages, if the page is found to be of a VMA that accounts dirty pages it will also wrprotect the PTE. Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from under ->private_lock. This seems to be safe, since ->private_lock is used to serialize access to the buffers, not the page itself. This is needed because clear_page_dirty() will call into page_mkclean() and would thereby violate locking order. 
[dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Nick Piggin	725d704eca	[PATCH] mm: VM_BUG_ON Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM. Use this in the lightweight, inline refcounting functions; PageLRU and PageActive checks in vmscan, because they're pretty well confined to vmscan. And in page allocate/free fastpaths which can be the hottest parts of the kernel for kbuilds. Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with side-effects, and should not be used outside core mm code. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
David Rientjes	f3ef9ead31	[PATCH] do not free non slab allocated per_cpu_pageset Stops panic associated with attempting to free a non slab-allocated per_cpu_pageset. Signed-off-by: David Rientjes <rientjes@cs.washington.edu> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-25 17:38:36 -07:00
Linus Torvalds	9f261e0113	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: (74 commits) NFS: unmark NFS direct I/O as experimental NFS: add comments clarifying the use of nfs_post_op_update() NFSv4: rpc_mkpipe creating socket inodes w/out sk buffers NFS: Use SEEK_END instead of hardcoded value NFSv4: When mounting with a port=0 argument, substitute port=2049 NFSv4: Poll more aggressively when handling NFS4ERR_DELAY NFSv4: Handle the condition NFS4ERR_FILE_OPEN NFSv4: Retry lease recovery if it failed during a synchronous operation. NFS: Don't invalidate the symlink we just stuffed into the cache NFS: Make read() return an ESTALE if the file has been deleted NFSv4: It's perfectly legal for clp to be NULL here.... NFS: nfs_lookup - don't hash dentry when optimising away the lookup SUNRPC: Fix Oops in pmap_getport_done SUNRPC: Add refcounting to the struct rpc_xprt SUNRPC: Clean up soft task error handling SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status Fix a referral error Oops NFS: NFS_ROOT should use the new rpc_create API NFS: Fix up compiler warnings on 64-bit platforms in client.c ... Manually resolved conflict in net/sunrpc/xprtsock.c	2006-09-23 16:58:40 -07:00
Linus Torvalds	a4c12d6c5d	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (353 commits) [IPV6] ADDRCONF: Mobile IPv6 Home Address support. [IPV6] ADDRCONF: Allow non-DAD'able addresses. [IPV6] NDISC: Fix is_router flag setting. [IPV6] ADDRCONF: Convert addrconf_lock to RCU. [IPV6] NDISC: Add proxy_ndp sysctl. [IPV6] NDISC: Set per-entry is_router flag in Proxy NA. [IPV6] NDISC: Avoid updating neighbor cache for proxied address in receiving NA. [IPV6]: Don't forward packets to proxied link-local address. [IPV6] NDISC: Handle NDP messages to proxied addresses. [NETFILTER]: PPTP conntrack: fix another GRE keymap leak [NETFILTER]: PPTP conntrack: fix GRE keymap leak [NETFILTER]: PPTP conntrack: fix PPTP_IN_CALL message types [NETFILTER]: PPTP conntrack: check call ID before changing state [NETFILTER]: PPTP conntrack: clean up debugging cruft [NETFILTER]: PPTP conntrack: consolidate header parsing [NETFILTER]: PPTP conntrack: consolidate header size checks [NETFILTER]: PPTP conntrack: simplify expectation handling [NETFILTER]: PPTP conntrack: remove unnecessary cid/pcid header pointers [NETFILTER]: PPTP conntrack: fix header definitions [NETFILTER]: PPTP conntrack: remove more dead code ...	2006-09-23 16:49:31 -07:00
Trond Myklebust	275a082fe9	Add a real API for dealing with blk_congestion_wait() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
Linus Torvalds	6585b57240	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart * master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart: [AGPGART] Rework AGPv3 modesetting fallback. [AGPGART] Add suspend callback for i965 [AGPGART] Fix number of aperture sizes in 830 gart structs. [AGPGART] Intel 965 Express support. [AGPGART] agp.h: constify struct agp_bridge_data::version [AGPGART] const'ify VIA AGP PCI table. [AGPGART] CONFIG_PM=n slim: drivers/char/agp/intel-agp.c [AGPGART] CONFIG_PM=n slim: drivers/char/agp/efficeon-agp.c [AGPGART] Const'ify the agpgart driver version. [AGPGART] remove private page protection map	2006-09-22 17:50:50 -07:00
David S. Miller	f034b5d4ef	[XFRM]: Dynamic xfrm_state hash table sizing. The grow algorithm is simple, we grow if: 1) we see a hash chain collision at insert, and 2) we haven't hit the hash size limit (currently 1*1024*1024 slots), and 3) the number of xfrm_state objects is > the current hash mask All of this needs some tweaking. Remove __initdata from "hashdist" so we can use it safely at run time. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:08:41 -07:00
Andrew Morton	016eb4a0ed	[PATCH] invalidate_complete_page() race fix If a CPU faults this page into pagetables after invalidate_mapping_pages() checked page_mapped(), invalidate_complete_page() will still proceed to remove the page from pagecache. This leaves the page-faulting process with a detached page. If it was MAP_SHARED then file data loss will ensue. Fix that up by checking the page's refcount after taking tree_lock. Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Hugh Dickins <hugh@veritas.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-08 10:22:50 -07:00
Kirill Korotaev	3a45975681	[PATCH] IA64,sparc: local DoS with corrupted ELFs This prevents cross-region mappings on IA64 and SPARC which could lead to system crash. They were correctly trapped for normal mmap() calls, but not for the kernel internal calls generated by executable loading. This code just moves the architecture-specific cross-region checks into an arch-specific "arch_mmap_check()" macro, and defines that for the architectures that needed it (ia64, sparc and sparc64). Architectures that don't have any special requirements can just ignore the new cross-region check, since the mmap() code will just notice on its own when the macro isn't defined. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> [ Cleaned up to not affect architectures that don't need it ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-08 08:40:46 -07:00
Dave Jones	115b384cf8	Merge ../linus	2006-09-05 17:20:21 -04:00
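
The value-domain mismatch described in commit 4e4785bcf0 (mempolicies: fix policy_zone check) can be worked through with a small userspace model. This is only a sketch under assumptions: it uses the two-zone configuration and the numeric values quoted in the commit message, and the names `GFP_DMA_FLAG`, `policy_applies_old`, `policy_applies_new` and `highest_zone_model` are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <stdbool.h>

/* Two-zone model with the values quoted in the commit message. */
enum zone_type { ZONE_DMA = 0, ZONE_NORMAL = 1 };

/* Hypothetical stand-in for __GFP_DMA (0x01 per the commit message). */
#define GFP_DMA_FLAG 0x01u
/* Hypothetical stand-in for the zone bits of a gfp mask. */
#define GFP_ZONE_BITS 0x01u

/*
 * Old check (modeled): compares raw gfp zone bits against a zone number,
 * although the two come from different value domains.
 */
static bool policy_applies_old(unsigned int gfp, enum zone_type policy_zone)
{
    return (gfp & GFP_ZONE_BITS) >= (unsigned int)policy_zone;
}

/* Highest zone an allocation with this mask may use, in the model. */
static enum zone_type highest_zone_model(unsigned int gfp)
{
    return (gfp & GFP_DMA_FLAG) ? ZONE_DMA : ZONE_NORMAL;
}

/* New check (modeled): compares zone numbers with zone numbers. */
static bool policy_applies_new(unsigned int gfp, enum zone_type policy_zone)
{
    return highest_zone_model(gfp) >= policy_zone;
}
```

With `policy_zone == ZONE_NORMAL`, the old check skips policy for a regular allocation (mask 0) and applies it to a DMA allocation, which is the inversion the commit message complains about; the modeled new check gives the opposite, intended result in both cases.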

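
The three grow conditions listed in commit f034b5d4ef ([XFRM]: Dynamic xfrm_state hash table sizing) can be read as a single predicate. The following is a minimal sketch, assuming a power-of-two table whose slot count is `hmask + 1`; the function name `should_grow`, its parameters, and the `HASH_SIZE_LIMIT` macro are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>

/* Size limit quoted in the commit message: 1*1024*1024 slots. */
#define HASH_SIZE_LIMIT (1u * 1024u * 1024u)

/*
 * Hypothetical predicate combining the three conditions:
 *   1) a hash chain collision was seen at insert,
 *   2) the table has not yet reached the size limit,
 *   3) the number of objects exceeds the current hash mask.
 */
static bool should_grow(bool chain_collision, unsigned int hmask,
                        unsigned int num_states)
{
    return chain_collision &&
           (hmask + 1u) < HASH_SIZE_LIMIT &&
           num_states > hmask;
}
```

For example, an 8-slot table (`hmask == 7`) holding 8 objects would grow on the next colliding insert, while a table already at the limit would not, regardless of load.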

891 Commits
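
The VMA eligibility heuristic listed in commit d08b3851da (mm: tracking shared dirty pages) amounts to four conditions that must all hold. A minimal userspace sketch follows; the flag names come from the commit message, but the numeric values here are illustrative only, and the function `vma_wants_dirty_tracking` and its two boolean parameters are hypothetical simplifications of the backing_dev_info and page-protection checks.

```c
#include <stdbool.h>

/* Flag names from the commit message; values are illustrative only. */
#define VM_WRITE      0x0002u
#define VM_SHARED     0x0008u
#define VM_PFNMAP     0x0400u
#define VM_INSERTPAGE 0x0800u

/*
 * Hypothetical model of the heuristic:
 *   - the mapping is both shared and writeable,
 *   - it is not a 'special' mapping,
 *   - the backing device accounts dirty pages,
 *   - f_op->mmap() left the default page protection unchanged.
 */
static bool vma_wants_dirty_tracking(unsigned int vm_flags,
                                     bool cap_account_dirty,
                                     bool default_prot_unchanged)
{
    if ((vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return false;
    if ((vm_flags & (VM_PFNMAP | VM_INSERTPAGE)) != 0)
        return false;
    return cap_account_dirty && default_prot_unchanged;
}
```

Per the commit message, mprotect() overrides the last condition, so a fuller model would pass `true` for `default_prot_unchanged` on that path.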