Commit Graph

287 Commits

Author SHA1 Message Date
Jeff Dike
f8aaeacec1 [PATCH] consolidate asm/futex.h
Most of the architectures have the same asm/futex.h.  This consolidates them
into asm-generic, with the arches including it from their own asm/futex.h.

In the case of UML, this reverts the old broken futex.h and goes back to using
the same one as almost everyone else.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:39 -08:00
Ravikiran G Thirumalai
1fd73c6b67 [PATCH] Kill L1_CACHE_SHIFT_MAX
Kill L1_CACHE_SHIFT from all arches.  Since L1_CACHE_SHIFT_MAX is not used
anymore with the introduction of INTERNODE_CACHE, kill L1_CACHE_SHIFT_MAX.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:39 -08:00
Christoph Lameter
39743889aa [PATCH] Swap Migration V5: sys_migrate_pages interface
sys_migrate_pages implementation using swap based page migration

This is the original API proposed by Ray Bryant in his posts during the first
half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.

The intent of sys_migrate is to migrate memory of a process.  A process may
have migrated to another node.  Memory was allocated optimally for the prior
context.  sys_migrate_pages allows to shift the memory to the new node.

sys_migrate_pages is also useful if the processes available memory nodes have
changed through cpuset operations to manually move the processes memory.  Paul
Jackson is working on an automated mechanism that will allow an automatic
migration if the cpuset of a process is changed.  However, a user may decide
to manually control the migration.

This implementation is put into the policy layer since it uses concepts and
functions that are also needed for mbind and friends.  The patch also provides
a do_migrate_pages function that may be useful for cpusets to automatically
move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
implementation.

The current code here is based on the swap based page migration capability and
thus is not able to preserve the physical layout relative to it containing
nodeset (which may be a cpuset).  When direct page migration becomes available
then the implementation needs to be changed to do a isomorphic move of pages
between different nodesets.  The current implementation simply evicts all
pages in source nodeset that are not in the target nodeset.

Patch supports ia64, i386 and x86_64.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:12:42 -08:00
Christoph Lameter
d3cb487149 [PATCH] atomic_long_t & include/asm-generic/atomic.h V2
Several counters already have the need to use 64 atomic variables on 64 bit
platforms (see mm_counter_t in sched.h).  We have to do ugly ifdefs to fall
back to 32 bit atomic on 32 bit platforms.

The VM statistics patch that I am working on will also make more extensive
use of atomic64.

This patch introduces a new type atomic_long_t by providing definitions in
asm-generic/atomic.h that works similar to the c "long" type.  Its 32 bits
on 32 bit platforms and 64 bits on 64 bit platforms.

Also cleans up the determination of the mm_counter_t in sched.h.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:29 -08:00
Andrew Morton
7756b9e4e3 [PATCH] kill last zone_reclaim() bits
Remove the last bits of Martin's ill-fated sys_set_zone_reclaim().

Cc: Martin Hicks <mort@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:26 -08:00
Badari Pulavarty
f6b3ec238d [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store
Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
given range of pages & its associated backing store.  Current
implementation supports only shmfs/tmpfs and other filesystems return
-ENOSYS.

"Some app allocates large tmpfs files, then when some task quits and some
client disconnect, some memory can be released.  However the only way to
release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli

Databases want to use this feature to drop a section of their bufferpool
(shared memory segments) - without writing back to disk/swap space.

This feature is also useful for supporting hot-plug memory on UML.

Concerns raised by Andrew Morton:

- "We have no plan for holepunching!  If we _do_ have such a plan (or
  might in the future) then what would the API look like?  I think
  sys_holepunch(fd, start, len), so we should start out with that."

- Using madvise is very weird, because people will ask "why do I need to
  mmap my file before I can stick a hole in it?"

- None of the other madvise operations call into the filesystem in this
  manner.  A broad question is: is this capability an MM operation or a
  filesytem operation?  truncate, for example, is a filesystem operation
  which sometimes has MM side-effects.  madvise is an mm operation and with
  this patch, it gains FS side-effects, only they're really, really
  significant ones."

Comments:

- Andrea suggested the fs operation too but then it's more efficient to
  have it as a mm operation with fs side effects, because they don't
  immediatly know fd and physical offset of the range.  It's possible to
  fixup in userland and to use the fs operation but it's more expensive,
  the vmas are already in the kernel and we can use them.

Short term plan &  Future Direction:

- We seem to need this interface only for shmfs/tmpfs files in the short
  term.  We have to add hooks into the filesystem for correctness and
  completeness.  This is what this patch does.

- In the future, plan is to support both fs and mmap apis also.  This
  also involves (other) filesystem specific functions to be implemented.

- Current patch doesn't support VM_NONLINEAR - which can be addressed in
  the future.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:22 -08:00
Stephen Hemminger
3821af2fe1 [FLS64]: generic version
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:06 -08:00
Ravikiran G Thirumalai
c660439ba9 [PATCH] x86_64/ia64 : Fix compilation error for node_to_first_cpu
Fixes a compiler error in node_to_first_cpu, __ffs expects unsigned long as
a parameter; instead cpumask_t was being passed.  The macro
node_to_first_cpu was not yet used in x86_64 and ia64 arches, and so we never
hit this.  This patch replaces __ffs with first_cpu macro, similar to other
arches.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-24 12:30:22 -08:00
John Hawkes
f5899b5d4f [IA64] disable preemption in udelay()
The udelay() inline for ia64 uses the ITC.  If CONFIG_PREEMPT is enabled
and the platform has unsynchronized ITCs and the calling task migrates
to another CPU while doing the udelay loop, then the effective delay may
be too short or very, very long.

This patch disables preemption around 100 usec chunks of the overall
desired udelay time.  This minimizes preemption-holdoffs.

udelay() is now too big to be inline, move it out of line and export it.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:00:24 -08:00
Tony Luck
eed66cfcbb [IA64] Split 16-bit severity field in sal_log_record_header
ERR_SEVERITY item is defined as a 8 bits item in SAL documentation
($B.2.1 rev december 2003), but as an u16 in sal.h.
This has the side effect that current code in mca.c may not call
ia64_sal_clear_state_info() upon receiving corrected platform errors
if there are bits set in the validation byte.  Reported by Xavier Bru.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-13 10:41:49 -08:00
Keith Owens
bf7ececa9b [IA64] Define an ia64 version of __raw_read_trylock
IA64 is using the generic version of __raw_read_trylock, which always
waits for the lock to be free instead of returning when the lock is in
use.  Define an ia64 version of __raw_read_trylock which behaves
correctly, and drop the generic one.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-12 08:54:18 -08:00
Christoph Lameter
f64fa6772a [IA64] Fix missing parameter for local_add/sub
Local add/sub macros need to have a parameter to specify
the addend/subtrahend respectively.

Signed-off-by: Christoph Lameter <clameter@sgi.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-07 11:30:11 -08:00
Robin Holt
bd1d6e2451 [IA64] Change SET_PERSONALITY to comply with comment in binfmt_elf.c.
We have a customer application which trips a bug.  The problem arises
when a driver attempts to call do_munmap on an area which is mapped, but
because current->thread.task_size has been set to 0xC0000000, the call
to do_munmap fails thinking it is an unmap beyond the user's address
space.

The comment in fs/binfmt_elf.c in load_elf_library() before the call
to SET_PERSONALITY() indicates that task_size must not be changed for
the running application until flush_thread, but is for ia64 executing
ia32 binaries.

This patch moves the setting of task_size from SET_PERSONALITY() to
flush_thread() as indicated.  The customer application no longer is able
to trip the bug.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-06 09:12:34 -08:00
John Keller
3ec829b689 [IA64-SGI] altix: pci_window fixup
Altix only patch to add fixup code that sets up
pci_controller->window. This code is a temporary
fix until ACPI support on Altix is added.

Also, corrects the usage of pci_dev->sysdata,
which had previously been used to reference
platform specific device info, to now point to
a pci_controller struct.

Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-06 09:09:23 -08:00
Dean Roe
b77dae5293 [IA64] - Make pfn_valid more precise for SGI Altix systems
A single SGI Altix system can be divided into multiple partitions,
each running their own instance of the Linux kernel.  pfn_valid()
is currently not optimal for any but the first partition, since it
does not compare the pfn with min_low_pfn before calling the more
costly ia64_pfn_valid().

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-29 09:24:10 -08:00
Jack Steiner
771388dc7d [IA64-SGI] support for older versions of PROM
Add support for old versions of the SN PROMs. Eventually this
support will be deleted but it is useful right now to continue
supporting older PROMs.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-21 14:17:28 -08:00
Mark Maule
48b1dcc5d8 [IA64] altix: fix copyright in tioce .h files
Fix up copyright in tioce header files

Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-18 13:06:03 -08:00
Nick Piggin
8426e1f6af [PATCH] atomic: inc_not_zero
Introduce an atomic_inc_not_zero operation.  Make this a special case of
atomic_add_unless because lockless pagecache actually wants
atomic_inc_not_negativeone due to its offset refcount.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 18:14:16 -08:00
Nick Piggin
4a6dae6d38 [PATCH] atomic: cmpxchg
Introduce an atomic_cmpxchg operation.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 18:14:16 -08:00
Robin Holt
837cd0bdf5 [IA64] 4-level page tables
This patch introduces 4-level page tables to ia64.  I have run
some benchmarks and found nothing interesting.  Performance has
consistently fallen within the noise range.

It also introduces a config option (setting the default to 3
levels).  The config option prevents having 4 level page
tables with 64k base page size.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-11 09:37:29 -08:00
Ashok Raj
b4033c1715 [PATCH] PCI: Change MSI to use physical delivery mode always
MSI hardcoded delivery mode to use logical delivery mode. Recently
x86_64 moved to use physical mode addressing to support physflat mode.
With this mode enabled noticed that my eth with MSI werent working.

msi_address_init()  was hardcoded to use logical mode for i386 and x86_64.
So when we switch to use physical mode, things stopped working.

Since anyway we dont use lowest priority delivery with MSI, its always
directed to just a single CPU. Its safe  and simpler to use
physical mode always, even when we use logical delivery mode for IPI's
or other ioapic RTE's.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-11-10 16:09:18 -08:00
Tony Luck
7669a22592 Pull context-bitmap into release branch 2005-11-10 10:39:49 -08:00
Tony Luck
cb8a55e4cd Pull extend-notify-die into release branch 2005-11-10 10:39:09 -08:00
Keith Owens
9138d581b0 [IA64] Extend notify_die() hooks for IA64
notify_die() added for MCA_{MONARCH,SLAVE,RENDEZVOUS}_{ENTER,PROCESS,LEAVE} and
INIT_{MONARCH,SLAVE}_{ENTER,PROCESS,LEAVE}.  We need multiple
notification points for these events because they can take many seconds
to run which has nasty effects on the behaviour of the rest of the
system.

DIE_SS replaced by a generic DIE_FAULT which checks the vector number,
to allow interception of faults other than SS.

DIE_MACHINE_{HALT,RESTART} added to allow last minute close down
processing, especially when the halt/restart routines are called from
error handlers.

DIE_OOPS added.

The check for kprobe's break numbers has been moved from traps.c to
kprobes.c, allowing DIE_BREAK to be used for any additional break
numbers, i.e. it is no longer kprobes specific.

Hooks for kernel debuggers and kernel dumpers added, ENTER and LEAVE.
Both of these disable the system for long periods which impact on
watchdogs and heartbeat systems in general.  More patches to come that
use these events to reset watchdogs and heartbeats.

unregister_die_notifier() added and both routines exported.  Requested
by Dean Nelson.

Lock removed from {un,}register_die_notifier.  notifier_chain_register()
already takes a lock.  Also the generic notifier chain locking is being
reworked to distinguish between callbacks that can block and those that
cannot, the lock in {un,}register_die_notifier would interfere with
that change.  http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

Leading white space removed from arch/ia64/kernel/kprobes.c.

Typo in mca.c in original version of this patch found & fixed by Dean
Nelson.

Signed-off-by: Keith Owens <kaos@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Acked-by: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-07 11:27:13 -08:00
Tony Luck
0ad3a96f8a Auto-update from upstream 2005-11-07 09:05:22 -08:00
Ananth N Mavinakayanahalli
8a5c4dc5e5 [PATCH] Kprobes: Track kprobe on a per_cpu basis - ia64 changes
IA64 changes to track kprobe execution on a per-cpu basis.  We now track the
kprobe state machine independently on each cpu using an arch specific kprobe
control block.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:45 -08:00
Christoph Hellwig
481bed4542 [PATCH] consolidate sys_ptrace()
The sys_ptrace boilerplate code (everything outside the big switch
statement for the arch-specific requests) is shared by most architectures.
This patch moves it to kernel/ptrace.c and leaves the arch-specific code as
arch_ptrace.

Some architectures have a too different ptrace so we have to exclude them.
They continue to keep their implementations.  For sh64 I had to add a
sh64_ptrace wrapper because it does some initialization on the first call.
For um I removed an ifdefed SUBARCH_PTRACE_SPECIAL block, but
SUBARCH_PTRACE_SPECIAL isn't defined anywhere in the tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-By: David Howells <dhowells@redhat.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:42 -08:00
Tim Schmielau
8c65b4a604 [PATCH] fix remaining missing includes
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch.  This should now allow not to include sched.h
from module.h, which is done by a followup patch.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:41 -08:00
John W. Linville
e1531b4218 [PATCH] ia64: re-implement dma_get_cache_alignment to avoid EXPORT_SYMBOL
The current ia64 implementation of dma_get_cache_alignment does not work
for modules because it relies on a symbol which is not exported.  Direct
access to a global is a little ugly anyway, so this patch re-implements
dma_get_cache_alignment in a manner similar to what is currently used for
x86_64.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:23 -08:00
Chen, Kenneth W
58cd908299 [IA64] make mmu_context.h and tlb.c 80-column friendly
wrap_mmu_context(), delayed_tlb_flush(), get_mmu_context() all
have an extra { } block which cause one extra indentation.
get_mmu_context() is particularly bad with 5 indentations to
the most inner "if".  It finally gets on my nerve that I can't
keep the code within 80 columns.  Remove the extra { } block
and while I'm at it, reformat all the comments to 80-column
friendly.  No functional change at all with this patch.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-03 14:43:50 -08:00
Peter Keilty
dcc17d1bae [IA64] Use bitmaps for efficient context allocation/free
Corrects the very inefficent method of finding free context_ids in
get_mmu_context().  Instead of walking the task_list of all processes,
2 bitmaps are used to efficently store and lookup state, inuse and
needs flushing. The entire rid address space is now used before calling
wrap_mmu_context and global tlb flushing.

Special thanks to Ken and Rohit for their review and modifications in
using a bit flushmap.

Signed-off-by: Peter Keilty <peter.keilty@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-31 14:36:05 -08:00
Bob Picco
631bb0e74e [IA64] Recent SPARSEMEM and DISCONTIG changes break some builds
My only objection to pfn_to_kaddr, which was introduced for HotPlug memory,
is that all arches have an identical implementation. I haven't had a chance
to pursue why yet.  There is probably some arch issue I'm unaware of.

Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-31 11:33:53 -08:00
Arthur Othieno
727a53bd53 [PATCH] semaphore: Remove __MUTEX_INITIALIZER()
__MUTEX_INITIALIZER() has no users, and equates to the more commonly used
DECLARE_MUTEX(), thus making it pretty much redundant.  Remove it for good.

Signed-off-by: Arthur Othieno <a.othieno@bluewin.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:27 -08:00
Tejun Heo
1426d7a81d [PATCH] vm: remove unused/broken page_pte[_prot] macros
This patch removes page_pte_prot and page_pte macros from all
architectures.  Some architectures define both, some only page_pte (broken)
and others none.  These macros are not used anywhere.

page_pte_prot(page, prot) is identical to mk_pte(page, prot) and
page_pte(page) is identical to page_pte_prot(page, __pgprot(0)).

* The following architectures define both page_pte_prot and page_pte

  arm, arm26, ia64, sh64, sparc, sparc64

* The following architectures define only page_pte (broken)

  frv, i386, m32r, mips, sh, x86-64

* All other architectures define neither

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:22 -08:00
Christoph Hellwig
dfb7dac3af [PATCH] unify sys_ptrace prototype
Make sure we always return, as all syscalls should.  Also move the common
prototype to <linux/syscalls.h>

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:20 -08:00
Hugh Dickins
fc2acab31b [PATCH] mm: tlb_finish_mmu forget rss
zap_pte_range has been counting the pages it frees in tlb->freed, then
tlb_finish_mmu has used that to update the mm's rss.  That got stranger when I
added anon_rss, yet updated it by a different route; and stranger when rss and
anon_rss became mm_counters with special access macros.  And it would no
longer be viable if we're relying on page_table_lock to stabilize the
mm_counter, but calling tlb_finish_mmu outside that lock.

Remove the mmu_gather's freed field, let tlb_finish_mmu stick to its own
business, just decrement the rss mm_counter in zap_pte_range (yes, there was
some point to batching the update, and a subsequent patch restores that).  And
forget the anal paranoia of first reading the counter to avoid going negative
- if rss does go negative, just fix that bug.

Remove the mmu_gather's flushes and avoided_flushes from arm and arm26: no use
was being made of them.  But arm26 alone was actually using the freed, in the
way some others use need_flush: give it a need_flush.  arm26 seems to prefer
spaces to tabs here: respect that.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:37 -07:00
Hugh Dickins
4d6ddfa924 [PATCH] mm: tlb_is_full_mm was obscure
tlb_is_full_mm?  What does that mean?  The TLB is full?  No, it means that the
mm's last user has gone and the whole mm is being torn down.  And it's an
inline function because sparc64 uses a different (slightly better)
"tlb_frozen" name for the flag others call "fullmm".

And now the ptep_get_and_clear_full macro used in zap_pte_range refers
directly to tlb->fullmm, which would be wrong for sparc64.  Rather than
correct that, I'd prefer to scrap tlb_is_full_mm altogether, and change
sparc64 to just use the same poor name as everyone else - is that okay?

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:37 -07:00
Hugh Dickins
15a23ffa2f [PATCH] mm: tlb_gather_mmu get_cpu_var
tlb_gather_mmu dates from before kernel preemption was allowed, and uses
smp_processor_id or __get_cpu_var to find its per-cpu mmu_gather.  That works
because it's currently only called after getting page_table_lock, which is not
dropped until after the matching tlb_finish_mmu.  But don't rely on that, it
will soon change: now disable preemption internally by proper get_cpu_var in
tlb_gather_mmu, put_cpu_var in tlb_finish_mmu.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:37 -07:00
Rik Van Riel
eb92f4ef32 [PATCH] add sem_is_read/write_locked()
Add sem_is_read/write_locked functions to the read/write semaphores, along the
same lines of the *_is_locked spinlock functions.  The swap token tuning patch
uses sem_is_read_locked; sem_is_write_locked is added for completeness.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:35 -07:00
Linus Torvalds
8a212ab6b8 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2005-10-28 21:09:26 -07:00
Tony Luck
8496f2a451 Pull fix-slow-tlb-purge into release branch 2005-10-28 15:27:36 -07:00
Tony Luck
fac84ef267 Pull xpc-disengage into release branch 2005-10-28 15:27:03 -07:00
Tony Luck
c87ff94333 Pull sparsemem-v5 into release branch 2005-10-28 14:32:56 -07:00
Tony Luck
556902cd2d Pull remove-sn-bist-lock into release branch 2005-10-28 14:32:44 -07:00
Tony Luck
5833f1420b Pull new-efi-memmap into release branch 2005-10-28 14:32:30 -07:00
Tony Luck
a1e78db3f5 Pull define-node-cleanup into release branch 2005-10-28 13:24:06 -07:00
Tony Luck
fbbb0bd1f6 Pull sn_pci_legacy_read-write into release branch 2005-10-28 13:23:50 -07:00
Tony Luck
c85749e6d1 Pull hp-machvec into release branch 2005-10-28 11:15:25 -07:00
Tony Luck
0d9136fdbc Pull altix-mmr into release branch 2005-10-28 11:15:08 -07:00
Tony Luck
9189674026 Pull altix-fpga-reset into release branch 2005-10-28 11:14:47 -07:00
Al Viro
06a544971f [PATCH] gfp_t: dma-mapping (ia64)
... and related annotations for amd64 - swiotlb code is shared, but
prototypes are not.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:47 -07:00
Dean Roe
c1902aae32 [IA64] - Avoid slow TLB purges on SGI Altix systems
flush_tlb_all() can be a scaling issue on large SGI Altix systems
since it uses the global call_lock and always executes on all cpus.
When a process enters flush_tlb_range() to purge TLBs for another
process, it is possible to avoid flush_tlb_all() and instead allow
sn2_global_tlb_purge() to purge TLBs only where necessary.

This patch modifies flush_tlb_range() so that this case can be handled
by platform TLB purge functions and updates ia64_global_tlb_purge()
accordingly.  sn2_global_tlb_purge() now calculates the region register
value from the mm argument introduced with this patch.

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-27 14:44:58 -07:00
Dean Nelson
e54af724c1 [IA64-SGI] fixes for XPC disengage and open/close protocol
This patch addresses a few issues with the open/close protocol that
were revealed by the newly added disengage functionality combined
with more extensive testing.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-25 16:27:37 -07:00
Bob Picco
1be7d9935b [PATCH] V5 ia64 SPARSEMEM - conditional changes for SPARSEMEM
This patch introduces the conditional changes required for the three
memory models.  With [patch 1/4] there are three memory models; FLATMEM,
DISCONTIG and SPARSEMEM.  Also a new arch include file sparemem.h is
introduced for defining SPARSEMEM parameters.

Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-04 13:21:13 -07:00
Dean Roe
3673555457 [IA64-SGI] Remove references to the SN bist_lock
Remove all references to the bist_lock in the SN code as it
is not used for anything.

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-04 09:28:00 -07:00
Al Viro
0cc13a5442 [PATCH] ia64 basic __user annotations
- document places where we pass kernel address to low-level primitive
   that deals with kernel/user addresses
 - uintptr_t is unsigned long, not long

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-29 08:46:27 -07:00
Jack Steiner
59c422358d [IA64-SGI] Increase max system size of SGI SN systems
Increase the maximum system size of SGI SN systems. Note that
this is not the maximum SSI size. The maximum system size is
the number of nodes in the numalink domain.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-28 14:32:57 -07:00
Mark Maule
61b9cf7c6c [IA64-SGI] fix sn_pci_legacy_read/fix sn_pci_legacy_write
This patch adds a #define for SN_SAL_IOIF_PCI_SAFE and makes that the
preferred method of implementing sn_pci_legacy_read() and
sn_pci_legacy_write().

This SAL call has been present in SGI proms since version 4.10.  If the
SN_SAL_IOIF_PCI_SAFE call fails, revert to the previous code for compatability
with older proms.

Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-23 11:21:59 -07:00
Keith Owens
20bb86852a [IA64] Wire in the MCA/INIT handler stacks
Wire the MCA/INIT handler stacks into DTR[2] and track them in
IA64_KR(CURRENT_STACK).  This gives the MCA/INIT handler stacks the
same TLB status as normal kernel stacks.  Reload the old CURRENT_STACK
data on return from OS to SAL.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-22 13:24:19 -07:00
Paolo 'Blaisorblade' Giarrusso
676067cfea [PATCH] Remove unused var from asm/futex.h
As recently done by Russell King for ARM, commit
4732efbeb9 introduces a generic asm/futex.h copied
along most arches, which includes a "-ENOSYS support" to be changed if needed.
However, it includes an unused var (taken from the "real" version) which GCC
warns about.

Remove it from all arches having that file version (i.e. same GIT id).
$ git-diff-tree -r HEAD
and
$ git-ls-tree  -r HEAD include/|grep 9feff4ce14

may be more interesting than looking at the patch itself, to make sure I've
just copied the arm header to all other archs having the original dummy version
of this file.

Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 16:16:29 -07:00
Jack Steiner
24ee0a6d7b [IA64] Cleanup use of various #defines related to nodes
Some of the SN code & #defines related to compact nodes & IO discovery
have gotten stale over the years. This patch attempts to clean them up.
Some of the various SN MAX_xxx #defines were also unclear & misused.

The primary changes are:

	- use MAX_NUMNODES. This is the generic linux #define for the number
	  of nodes that are known to the generic kernel. Arrays & loops
	  for constructs that are 1:1 with linux-defined nodes should
	  use the linux #define - not an SN equivalent.

	- use MAX_COMPACT_NODES for MAX_NUMNODES + NUM_TIOS. This is the
	  number of nodes in the SSI system. Compact nodes are a hack to
	  get around the IA64 architectural limit of 256 nodes. Large SGI
	  systems have more than 256 nodes. When we upgrade to ACPI3.0,
	  I _hope_ that all nodes will be real nodes that are known to
	  the generic kernel. That will allow us to delete the notion
	  of "compact nodes".

	- add MAX_NUMALINK_NODES for the total number of nodes that
	  are in the numalink domain - all partitions.

	- simplified (understandable) scan_for_ionodes()

	- small amount of cleanup related to cnodes

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-15 16:31:12 -07:00
Alex Williamson
0b9afede3d [IA64] more robust zx1/sx1000 machvec support
Machine vector selection has always been a bit of a hack given how
early in system boot it needs to be done.  Services like ACPI namespace
are not available and there are non-trivial problems to moving them to
early boot.  However, there's no reason we can't change to a different
machvec later in boot when the services we need are available.  By
adding a entry point for later initialization of the swiotlb, we can add
an error path for the hpzx1 machevec initialization and fall back to the
DIG machine vector if IOMMU hardware isn't found in the system.  Since
ia64 uses 4GB for zone DMA (no ISA support), it's trivial to allocate a
contiguous range from the slab for bounce buffer usage.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-14 16:22:11 -07:00
Tony Luck
deb75f3c29 Pull fix-offsets-h into release branch 2005-09-14 14:14:45 -07:00
Tony Luck
82f1b07b9a [IA64] fix circular dependency on generation of asm-offsets.h
Fix?  One ugly hack is replaced by a different ugly hack.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-13 08:50:39 -07:00
Randy Dunlap
33bf56106d [PATCH] feature removal of io_remap_page_range()
As written in Documentation/feature-removal-schedule.txt, remove the
io_remap_page_range() kernel API.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:33 -07:00
Tony Luck
d67eb16f5d Pull sn-features into release branch 2005-09-11 14:34:23 -07:00
Keith Owens
49a28cc8fd [IA64] MCA/INIT: remove obsolete unwind code
Delete the special case unwind code that was only used by the old
MCA/INIT handler.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-11 14:09:34 -07:00
Keith Owens
7f613c7d22 [PATCH] MCA/INIT: use per cpu stacks
The bulk of the change.  Use per cpu MCA/INIT stacks.  Change the SAL
to OS state (sos) to be per process.  Do all the assembler work on the
MCA/INIT stacks, leaving the original stack alone.  Pass per cpu state
data to the C handlers for MCA and INIT, which also means changing the
mca_drv interfaces slightly.  Lots of verification on whether the
original stack is usable before converting it to a sleeping process.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-11 14:08:41 -07:00
Keith Owens
e619ae0b96 [IA64] MCA/INIT: add an extra thread_info flag
Add an extra thread_info flag to indicate the special MCA/INIT stacks.
Mainly for debuggers.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-11 14:02:10 -07:00
Ingo Molnar
fb1c8f93d8 [PATCH] spinlock consolidation
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code.  It does the following
things:

 - consolidates and enhances the spinlock/rwlock debugging code

 - simplifies the asm/spinlock.h files

 - encapsulates the raw spinlock type and moves generic spinlock
   features (such as ->break_lock) into the generic code.

 - cleans up the spinlock code hierarchy to get rid of the spaghetti.

Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c.  (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)

Also, i've enhanced the rwlock debugging facility, it will now track
write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.

The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:

 include/asm-i386/spinlock_types.h       |   16
 include/asm-x86_64/spinlock_types.h     |   16

I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:

   SMP                         |  UP
   ----------------------------|-----------------------------------
   asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
   linux/spinlock_types.h      |  linux/spinlock_types.h
   asm/spinlock_smp.h          |  linux/spinlock_up.h
   linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
   linux/spinlock.h            |  linux/spinlock.h

/*
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *   (also included on UP-debug builds:)
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the __raw_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.

From: Grant Grundler <grundler@parisc-linux.org>

  Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
  Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
  non-SMP kernels.  That should be trivial to fix up later if necessary.

  I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
  some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
  are well tested and contained entirely inside arch specific code.  I do NOT
  expect any new issues to arise with them.

 If someone does ever need to use debug/metrics with them, then they will
  need to unravel this hairball between spinlocks, atomic ops, and bit ops
  that exist only because parisc has exactly one atomic instruction: LDCW
  (load and clear word).

From: "Luck, Tony" <tony.luck@intel.com>

   ia64 fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:21 -07:00
Linus Torvalds
486a153f0e Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild 2005-09-09 15:46:49 -07:00
Chen, Kenneth W
383f2835eb [PATCH] Prefetch kernel stacks to speed up context switch
For architecture like ia64, the switch stack structure is fairly large
(currently 528 bytes).  For context switch intensive application, we found
that significant amount of cache misses occurs in switch_to() function.
The following patch adds a hook in the schedule() function to prefetch
switch stack structure as soon as 'next' task is determined.  This allows
maximum overlap in prefetch cache lines for that structure.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:31 -07:00
Sam Ravnborg
0013a85454 kbuild: m68k,parisc,ppc,ppc64,s390,xtensa use generic asm-offsets.h support
Delete obsoleted parts form arch makefiles and rename to asm-offsets.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-09-09 20:57:26 +02:00
Linus Torvalds
4fe8a0f4c5 Merge branch 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6 2005-09-08 17:26:52 -07:00
David S. Miller
085ae41f66 [PATCH] Make sparc64 use setup-res.c
There were three changes necessary in order to allow
sparc64 to use setup-res.c:

1) Sparc64 roots the PCI I/O and MEM address space using
   parent resources contained in the PCI controller structure.
   I'm actually surprised no other platforms do this, especially
   ones like Alpha and PPC{,64}.  These resources get linked into the
   iomem/ioport tree when PCI controllers are probed.

   So the hierarchy looks like this:

   iomem --|
	   PCI controller 1 MEM space --|
				        device 1
					device 2
					etc.
	   PCI controller 2 MEM space --|
				        ...
   ioport --|
            PCI controller 1 IO space --|
					...
            PCI controller 2 IO space --|
					...

   You get the idea.  The drivers/pci/setup-res.c code allocates
   using plain iomem_space and ioport_space as the root, so that
   wouldn't work with the above setup.

   So I added a pcibios_select_root() that is used to handle this.
   It uses the PCI controller struct's io_space and mem_space on
   sparc64, and io{port,mem}_resource on every other platform to
   keep current behavior.

2) quirk_io_region() is buggy.  It takes in raw BUS view addresses
   and tries to use them as a PCI resource.

   pci_claim_resource() expects the resource to be fully formed when
   it gets called.  The sparc64 implementation would do the translation
   but that's absolutely wrong, because if the same resource gets
   released then re-claimed we'll adjust things twice.

   So I fixed up quirk_io_region() to do the proper pcibios_bus_to_resource()
   conversion before passing it on to pci_claim_resource().

3) I was mistakedly __init'ing the function methods the PCI controller
   drivers provide on sparc64 to implement some parts of these
   routines.  This was, of course, easy to fix.

So we end up with the following, and that nasty SPARC64 makefile
ifdef in drivers/pci/Makefile is finally zapped.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-09-08 14:57:25 -07:00
Tony Luck
344a076110 [IA64] Manual merge fix for 3 files
arch/ia64/Kconfig
	arch/ia64/kernel/acpi.c
	include/asm-ia64/irq.h

Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-08 14:27:13 -07:00
Tony Luck
d8c97d5f3a [IA64] simplified efi memory map parsing
New version leaves the original memory map unmodified.
Also saves any granule trimmings for use by the uncached
memory allocator.

Inspired by Khalid Aziz (various traces of his patch still
remain).  Fixes to uncached_build_memmap() and sn2 testing
by Martin Hicks.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-08 12:39:59 -07:00
Len Brown
64e47488c9 Merge linux-2.6 with linux-acpi-2.6 2005-09-08 01:45:47 -04:00
Keshavamurthy Anil S
deac66ae45 [PATCH] kprobes: fix bug when probed on task and isr functions
This patch fixes a race condition where in system used to hang or sometime
crash within minutes when kprobes are inserted on ISR routine and a task
routine.

The fix has been stress tested on i386, ia64, pp64 and on x86_64.  To
reproduce the problem insert kprobes on schedule() and do_IRQ() functions
and you should see hang or system crash.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:58:01 -07:00
John Hawkes
9c1cfda20a [PATCH] cpusets: Move the ia64 domain setup code to the generic code
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:40 -07:00
Stephen Rothwell
5ac353f9ba [PATCH] Clean up struct flock definitions
This patch just gathers together all the struct flock definitions except
xtensa into asm-generic/fcntl.h.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell
1abf62afb6 [PATCH] Clean up the fcntl operations
This patch puts the most popular of each fcntl operation/flag into
asm-generic/fcntl.h and cleans up the arch files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell
e64ca97fd8 [PATCH] Clean up the open flags
This patch puts the most popular of each open flag into asm-generic/fcntl.h
and cleans up the arch files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell
9317259ead [PATCH] Create asm-generic/fcntl.h
This set of patches creates asm-generic/fcntl.h and consolidates as much as
possible from the asm-*/fcntl.h files into it.

This patch just gathers all the identical bits of the asm-*/fcntl.h files into
asm-generic/fcntl.h.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:37 -07:00
Jesper Juhl
97de50c0ad [PATCH] remove verify_area(): remove verify_area() from various uaccess.h headers
Remove the deprecated (and unused) verify_area() from various uaccess.h
headers.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:35 -07:00
Christoph Hellwig
c8d127418d [PATCH] remove asm-*/hdreg.h
unused and useless..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:30 -07:00
Karsten Wiese
f26fdd5992 [PATCH] CHECK_IRQ_PER_CPU() to avoid dead code in __do_IRQ()
IRQ_PER_CPU is not used by all architectures.  This patch introduces the
macros ARCH_HAS_IRQ_PER_CPU and CHECK_IRQ_PER_CPU() to avoid the generation
of dead code in __do_IRQ().

ARCH_HAS_IRQ_PER_CPU is defined by architectures using IRQ_PER_CPU in their
include/asm_ARCH/irq.h file.

Through grepping the tree I found the following architectures currently use
IRQ_PER_CPU:

        cris, ia64, ppc, ppc64 and parisc.

Signed-off-by: Karsten Wiese <annabellesgarden@yahoo.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:29 -07:00
H. J. Lu
36d57ac4a8 [PATCH] auxiliary vector cleanups
The size of auxiliary vector is fixed at 42 in linux/sched.h.  But it isn't
very obvious when looking at linux/elf.h.  This patch adds AT_VECTOR_SIZE
so that we can change it if necessary when a new vector is added.

Because of include file ordering problems, doing this necessitated the
extraction of the AT_* symbols into a standalone header file.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:21 -07:00
Stephen Rothwell
202e5979af [PATCH] compat: be more consistent about [ug]id_t
When I first wrote the compat layer patches, I was somewhat cavalier about
the definition of compat_uid_t and compat_gid_t (or maybe I just
misunderstood :-)).  This patch makes the compat types much more consistent
with the types we are being compatible with and hopefully will fix a few
bugs along the way.

	compat type		type in compat arch
	__compat_[ug]id_t	__kernel_[ug]id_t
	__compat_[ug]id32_t	__kernel_[ug]id32_t
	compat_[ug]id_t		[ug]id_t

The difference is that compat_uid_t is always 32 bits (for the archs we
care about) but __compat_uid_t may be 16 bits on some.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:19 -07:00
Jakub Jelinek
4732efbeb9 [PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup
ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
(which at least on UP usually means an immediate context switch to one of
the waiter threads).  This waiter wakes up and after a few instructions it
attempts to acquire the cv internal lock, but that lock is still held by
the thread calling pthread_cond_signal.  So it goes to sleep and eventually
the signalling thread is scheduled in, unlocks the internal lock and wakes
the waiter again.

Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
to avoid this performance issue, but it was removed when locks were
redesigned to the 3 state scheme (unlocked, locked uncontended, locked
contended).

Following scenario shows why simply using FUTEX_REQUEUE in
pthread_cond_signal together with using lll_mutex_unlock_force in place of
lll_mutex_unlock is not enough and probably why it has been disabled at
that time:

The number is value in cv->__data.__lock.
        thr1            thr2            thr3
0       pthread_cond_wait
1       lll_mutex_lock (cv->__data.__lock)
0       lll_mutex_unlock (cv->__data.__lock)
0       lll_futex_wait (&cv->__data.__futex, futexval)
0                       pthread_cond_signal
1                       lll_mutex_lock (cv->__data.__lock)
1                                       pthread_cond_signal
2                                       lll_mutex_lock (cv->__data.__lock)
2                                         lll_futex_wait (&cv->__data.__lock, 2)
2                       lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                          # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
2                       lll_mutex_unlock_force (cv->__data.__lock)
0                         cv->__data.__lock = 0
0                         lll_futex_wake (&cv->__data.__lock, 1)
1       lll_mutex_lock (cv->__data.__lock)
0       lll_mutex_unlock (cv->__data.__lock)
          # Here, lll_mutex_unlock doesn't know there are threads waiting
          # on the internal cv's lock

Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
but it will cost us not one, but 2 extra syscalls and, what's worse, one of
these extra syscalls will be done for every single waiting loop in
pthread_cond_*wait.

We would need to use lll_mutex_unlock_force in pthread_cond_signal after
requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.

Another alternative is to do the unlocking pthread_cond_signal needs to do
(the lock can't be unlocked before lll_futex_wake, as that is racy) in the
kernel.

I have implemented both variants, futex-requeue-glibc.patch is the first
one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel.
 The kernel interface allows userland to specify how exactly an unlocking
operation should look like (some atomic arithmetic operation with optional
constant argument and comparison of the previous futex value with another
constant).

It has been implemented just for ppc*, x86_64 and i?86, for other
architectures I'm including just a stub header which can be used as a
starting point by maintainers to write support for their arches and ATM
will just return -ENOSYS for FUTEX_WAKE_OP.  The requeue patch has been
(lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.

With the following benchmark on UP x86-64 I get:

for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
time elf/ld.so --library-path .:nptl-orig /tmp/bench
real 0m0.655s user 0m0.253s sys 0m0.403s
real 0m0.657s user 0m0.269s sys 0m0.388s
time elf/ld.so --library-path .:nptl-requeue /tmp/bench
real 0m0.496s user 0m0.225s sys 0m0.271s
real 0m0.531s user 0m0.242s sys 0m0.288s
time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
real 0m0.380s user 0m0.176s sys 0m0.204s
real 0m0.382s user 0m0.175s sys 0m0.207s

The benchmark is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
Older futex-requeue-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
Older futex-wake_op-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt
Will post a new version (just x86-64 fixes so that the patch
applies against pthread_cond_signal.S) to libc-hacker ml soon.

Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
testcase that will not test the atomicity of the operation, but at least
check if the threads that should have been woken up are woken up and
whether the arithmetic operation in the kernel gave the expected results.

Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:17 -07:00
Ashok Raj
54d5d42404 [PATCH] x86/x86_64: deferred handling of writes to /proc/irqxx/smp_affinity
When handling writes to /proc/irq, current code is re-programming rte
entries directly. This is not recommended and could potentially cause
chipset's to lockup, or cause missing interrupts.

CONFIG_IRQ_BALANCE does this correctly, where it re-programs only when the
interrupt is pending. The same needs to be done for /proc/irq handling as well.
Otherwise user space irq balancers are really not doing the right thing.

- Changed pending_irq_balance_cpumask to pending_irq_migrate_cpumask for
  lack of a generic name.
- added move_irq out of IRQ_BALANCE, and added this same to X86_64
- Added new proc handler for write, so we can do deferred write at irq
  handling time.
- Display of /proc/irq/XX/smp_affinity used to display CPU_MASKALL, instead
  it now shows only active cpu masks, or exactly what was set.
- Provided a common move_irq implementation, instead of duplicating
  when using generic irq framework.

Tested on i386/x86_64 and ia64 with CONFIG_PCI_MSI turned on and off.
Tested UP builds as well.

MSI testing: tbd: I have cards, need to look for a x-over cable, although I
did test an earlier version of this patch.  Will test in a couple days.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Zwane Mwaikambo <zwane@holomorphy.com>
Grudgingly-acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:15 -07:00
Mark Maule
5fbcf9a5c6 [IA64-SGI] volatile semantics in places where it seems necessary
Resend using accessors instead of volatile qualifiers per hch comments, and
easier to understand convenience macros per rja comments.

Patch to apply volatile semantics when accessing MMR's in various SN files.

Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-07 16:23:41 -07:00
Kenji Kaneshige
9799e4d39a [IA64] Minor cleanups - remove unnecessary function prototype in iosapic.h
The function prototypes for iosapic_enable_intr() and
iosapic_pci_fixup() in include/asm-ia64/iosapic.h are no longer
needed. This patch removes them. The original patch has been posted by
Satoru Takeuchi.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-07 14:00:40 -07:00
Kenji Kaneshige
697eaad417 [IA64] Minor cleanups - remove CONFIG_ACPI_DEALLOCATE_IRQ
The config option 'CONFIG_ACPI_DEALLOCATE_IRQ' is no longer
needed. This patch removes it.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-07 14:00:08 -07:00
Kenji Kaneshige
a52ac87eb2 [IA64] Minor cleanups - remove unnecessary function prototype in irq.h
The function prototype for handl_IRQ_event() in include/asm-ia64/irq.h
is no longer needed. This patch removes it.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-07 13:59:40 -07:00
Dean Nelson
a607c38971 [IA64-SGI] get XPC to cleanly disengage from remote memory references
When XPC is being shutdown (i.e., rmmod, reboot) it doesn't ensure that
other partitions with whom it was connected have completely disengaged
from any attempt at cross-partition memory references. This can lead to
MCAs in any of these other partitions when the partition is reset.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-06 16:15:38 -07:00
Bruce Losure
25732ad493 [IA64] Altix patch for fpga reset
1) workaround a h/w reset issue
2) to improve the determination of FPGA-based h/w in
   the arch/ia64/sn/kernel/tiocx code.

Signed-off-by: Bruce Losure <blosure@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-06 14:16:01 -07:00
Kyle Moffett
fa5b08d5f8 [PATCH] sab: consolidate kmem_bufctl_t
This is used only in slab.c and each architecture gets to define whcih
underlying type is to be used.

Seems a bit silly - move it to slab.c and use the same type for all
architectures: unsigned int.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:48 -07:00
Len Brown
129521dcc9 Merge linux-2.6 into linux-acpi-2.6 test 2005-09-03 02:44:09 -04:00
Jack Steiner
a1cddb8892 [IA64-SGI] Add new vendor-specific SAL calls for:
- notifying the PROM of specific features that are supported by the OS.
  This is used to enable PROM feature if and only if the corresponding
  feature is implemented in the OS

- fetch feature sets that are supported by the current PROM. This allows
  the OS to selectively enable features when the PROM support is available.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-08-31 11:00:53 -07:00