Patch from Nicolas Pitre
This patch gets rid of the last C implementations of needed libgcc
functions for the kernel, replacing them with optimized assembly
versions.
Those functions are:
__ashldi3
__ashrdi3
__lshrdi3
__muldi3
__ucmpdi2
The first 3 were lifted from gcc, the other two were written from scratch.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Encapsulate pool data into dmabounce_pool. Only account successful
allocations. Use dma_mapping_error().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We know what pgprot we're going to use, so don't #define it. Also,
since we select the nonaliasing/aliasing copypage implementation at
run time, there's no point having it globally visible.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Richard Purdie
Add spitz irda platform support
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Richard Purdie
Add corgi irda platform support
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Richard Purdie
Add poodle irda platform support
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Richard Purdie
Update the PXA irda driver to match the recent platform device
suspend/resume level changes.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This fixes a bug where settimeofday would set the wrong parameters
in do_gtod, resulting in gettimeofday returning a value about 4
hours after the correct time. The bug was that we divided a
negative 64-bit value with do_div, which treated it as unsigned
and gave us a result that was approximately 1.8e10 too large
(since the divisor was 1e9).
Signed-off-by: Paul Mackerras <paulus@samba.org>
ata_pci_init_one() receives an array of struct ata_port_info. Recent
updates to the code had always obtained port information from
array element 0, rather than array element N.
Change to avoid hardcoding port_info[0], thereby restoring proper
hardware information to secondary legacy ports.
If a card doesn't support the "write block" command class then
any attempts to open the device should reflect this by denying
write access.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The second argument to ata_qc_complete() was being used for two
purposes: communicate the ATA Status register to the completion
function, and indicate an error. On legacy PCI IDE hardware, the latter
is often implicit in the former. On more modern hardware, the driver
often completely emulated a Status register value, passing ATA_ERR as an
indication that something went wrong.
Now that previous code changes have eliminated the need to use drv_stat
arg to communicate the ATA Status register value, we can convert it to a
mask of possible error classes.
This will lead to more flexible error handling in the future.
move EXPORT_SYMBOL(filemap_populate) to the proper place: just after
function itself: it's easy to miss that function is exported otherwise.
Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In 'mm' change the explicit use of a for-loop using NR_CPUS into the
general for_each_cpu() constructs. This widens the scope of potential
future optimizations of the general constructs, as well as takes advantage
of the existing optimizations of first_cpu() and next_cpu(), which is
advantageous when the true CPU count is much smaller than NR_CPUS.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Policy contextualization is only useful for task based policies and not for
vma based policies. It may be useful to define allowed nodes that are not
accessible from this thread because other threads may have access to these
nodes. Without this patch strange memory policy situations may cause an
application to fail with out of memory.
Example:
Let's say we have two threads A and B that share the same address space and
a huge array computational array X.
Thread A is restricted by its cpuset to nodes 0 and 1 and thread B is
restricted by its cpuset to nodes 2 and 3.
Thread A now wants to restrict allocations to the first node and thus
applies a BIND policy on X to node 0 and 2. The cpuset limits this to node
0. Thus pages for X must be allocated on node 0 now.
Thread B now touches a page that has never been used in X and faults in a
page. According to the BIND policy of the vma for X the page must be
allocated on page 0. However, the cpuset of B does not allow allocation on
0 and 1. Now the application fails in alloc_pages with out of memory.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Do a separation between do_xxx and sys_xxx functions. sys_xxx functions
take variable sized bitmaps from user space as arguments. do_xxx functions
take fixed sized nodemask_t as arguments and may be used from inside the
kernel. Doing so simplifies the initialization code. There is no
fs = kernel_ds assumption anymore.
- Split up get_nodes into get_nodes (which gets the node list) and
contextualize_policy which restricts the nodes to those accessible
to the task and updates cpusets.
- Add comments explaining limitations of bind policy
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a set of ppc64 specific patches that at least allow
compilation/booting with the following configurations:
FLATMEM
SPARSEMEN
SPARSEMEM + MEMORY_HOTPLUG
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds the necessary for non-NUMA hot-add of highmem to an existing zone on
i386.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
> I found the tests does not work well with Dave's patchset.
> I've found the followings:
>
> - setup_per_zone_pages_min() calls should be added in
> capture_page_range() and online_pages()
> - lru_add_drain() should be called before try_to_migrate_pages()
The following patch deals with the first item.
Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This basically keeps up from having to extern __kmalloc_section_memmap().
The vaddr_in_vmalloc_area() helper could go in a vmalloc header, but that
header gets hard to work with, because it needs some arch-specific macros.
Just stick it in here for now, instead of creating another header.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Lion Vollnhals <webmaster@schiggl.de>
Signed-off-by: Jiri Slaby <xslaby@fi.muni.cz>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds generic memory add/remove and supporting functions for memory
hotplug into a new file as well as a memory hotplug kernel config option.
Individual architecture patches will follow.
For now, disable memory hotplug when swsusp is enabled. There's a lot of
churn there right now. We'll fix it up properly once it calms down.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
See the "fixup bad_range()" patch for more information, but this actually
creates a the lock to protect things making assumptions about a zone's size
staying constant at runtime.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pgdat->node_size_lock is basically only neeeded in one place in the normal
code: show_mem(), which is the arch-specific sysrq-m printing function.
Strictly speaking, the architectures not doing memory hotplug do no need this
locking in show_mem(). However, they are all included for completeness. This
should also make any future consolidation of all of the implementations a
little more straightforward.
This lock is also held in the sparsemem code during a memory removal, as
sections are invalidated. This is the place there pfn_valid() is made false
for a memory area that's being removed. The lock is only required when doing
pfn_valid() operations on memory which the user does not already have a
reference on the page, such as in show_mem().
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When doing memory hotplug operations, the size of existing zones can obviously
change. This means that zone->zone_{start_pfn,spanned_pages} can change.
There are currently no locks that protect these structure members. However,
they are rarely accessed at runtime. Outside of swsusp, the only place that I
can find is bad_range().
So, split bad_range() up into two pieces: one that needs to be locked and
anther that doesn't.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A little helper that we use in the hotplug code.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a zone is empty at boot-time and then hot-added to later, it needs to run
the same init code that would have been run on it at boot.
This patch breaks out zone table and per-cpu-pages functions for use by the
hotplug code. You can almost see all of the free_area_init_core() function on
one page now. :)
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following series implements memory hot-add for ppc64 and i386. There are
x86_64 and ia64 implementations that will be submitted shortly as well,
through the normal maintainers.
This patch:
local_mapnr is unused, except for in an alpha header. Keep the alpha one,
kill the rest.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a problem on ppc64 where with more than 4 threads a large system
wouldn't scale well while faulting in the .text (most of the time was spent
in the kernel despite it was an userland compute intensive app). The
reason is the useless overwrite of the same pte from all cpu.
I fixed it this way (verified on an older kernel but the forward port is
almost identical). This will benefit all archs not just ppc64.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Basic overcommit checking for hugetlb_file_map() based on an implementation
used with demand faulting in SLES9.
Since demand faulting can't guarantee the availability of pages at mmap
time, this patch implements a basic sanity check to ensure that the number
of huge pages required to satisfy the mmap are currently available.
Despite the obvious race, I think it is a good start on doing proper
accounting. I'd like to work towards an accounting system that mimics the
semantics of normal pages (especially for the MAP_PRIVATE/COW case). That
work is underway and builds on what this patch starts.
Huge page shared memory segments are simpler and still maintain their
commit on shmget semantics.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Below is a patch to implement demand faulting for huge pages. The main
motivation for changing from prefaulting to demand faulting is so that huge
page memory areas can be allocated according to NUMA policy.
Thanks to consolidated hugetlb code, switching the behavior requires changing
only one fault handler. The bulk of the patch just moves the logic from
hugelb_prefault() to hugetlb_pte_fault() and find_get_huge_page().
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up some repeated code related to HugeTLB. hugetlb_zero_setup would
have already allocated the file->f_op.
Signed-off-by: Krishnakumar. R <rkrishnakumar@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reformat hugelbfs_forget_inode and add the missing but harmless
write_inode_now call. It looks the same as generic_forget_inode now except
for the call to truncate_hugepages instead of truncate_inode_pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
hugetlbfs_do_delete_inode is the same as generic_delete_inode now, so remove
it in favour of the latter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make hugetlbfs looks the same as generic_detelte_inode, fixing a bunch of
missing updates to it at the same time. Rename it to
hugetlbfs_do_delete_inode and add a real hugetlbfs_delete_inode that
implements ->delete_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move hugetlbfs accounting into ->alloc_inode / ->destroy_inode. This keeps
the code simpler, fixes a loeak where a failing inode allocation wouldn't
decrement the counter and moves hugetlbfs_delete_inode and
hugetlbfs_forget_inode closer to their generic counterparts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Updated several references to page_table_lock in common code comments.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of oddities were guarded by page_table_lock, no longer properly
guarded when that is split.
The mm_counters of file_rss and anon_rss: make those an atomic_t, or an
atomic64_t if the architecture supports it, in such a case. Definitions by
courtesy of Christoph Lameter: who spent considerable effort on more scalable
ways of counting, but found insufficient benefit in practice.
And adding an mm with swap to the mmlist for swapoff: the list is well-
guarded by its own lock, but the list_empty check now has to be repeated
inside it.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In worrying over the various pte operations in different architectures, I came
across some unused functions in UML: remove mprotect_kernel_vm,
protect_vm_page and addr_pte.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's usually a good reason when a pte is examined without the lock; but it
makes me nervous when the pointer is dereferenced more than once.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>