Commit Graph

835 Commits

Author SHA1 Message Date
Keshavamurthy Anil S
89cb14c0dd [PATCH] Kprobes/IA64: check jprobe break before handling
Once the jprobe instrumented function returns, it executes a jprobe_break
which is a break instruction with __IA64_JPROBE_BREAK value.  The current
patch checks for this break value, before assuming that jprobe instrumented
function just completed.

The previous code was not checking for this value and that was a bug.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:24 -07:00
Anil S Keshavamurthy
708de8f11c [PATCH] Kprobes IA64: safe register kprobe
The current kprobes does not yet handle register kprobes on some of the
following kind of instruction which needs to be emulated in a special way.

1) mov r1=ip
2) chk -- Speculation check instruction

This patch attempts to fail register_kprobes() when user tries to insert
kprobes on the above kind of instruction.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:24 -07:00
Anil S Keshavamurthy
1674eafcbd [PATCH] Kprobes IA64: cmp ctype unc support
The current Kprobes when patching the original instruction with the break
instruction tries to retain the original qualifying predicate(qp), however
for cmp.crel.ctype where ctype == unc, which is a special instruction
always needs to be executed irrespective of qp.  Hence, if the instruction
we are patching is of this type, then we should not copy the original qp to
the break instruction, this is because we always want the break fault to
happen so that we can emulate the instruction.

This patch is based on the feedback given by David Mosberger

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:23 -07:00
Anil S Keshavamurthy
a5403183d8 [PATCH] Kprobes IA64: arch_prepare_kprobes() cleanup
arch_prepare_kprobes() was doing lots of functionality
in just one single function. This patch
attempts to clean up arch_prepare_kprobes() by moving
specific sub task to the following (new)functions
1)valid_kprobe_addr() -->> validate the given kprobe address
2)get_kprobe_inst(slot..)->> Retrives the instruction for a given slot from the bundle
3)prepare_break_inst() -->> Prepares break instruction within the bundle
	3a)update_kprobe_inst_flag()-->>Updates the internal flags, required
			for proper emulation of the instruction at later
			point in time.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:23 -07:00
Rusty Lynch
13608d6433 [PATCH] Kprobes ia64 qp fix
Fix a bug where a kprobe still fires when the instruction is predicated
off.  So given the p6=0, and we have an instruction like:

(p6) move loc1=0

we should not be triggering the kprobe.  This is handled by carrying over
the qp section of the original instruction into the break instruction.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:23 -07:00
Rusty Lynch
8bc76772ad [PATCH] Kprobes ia64 cleanup
A cleanup of the ia64 kprobes implementation such that all of the bundle
manipulation logic is concentrated in arch_prepare_kprobe().

With the current design for kprobes, the arch specific code only has a
chance to return failure inside the arch_prepare_kprobe() function.

This patch moves all of the work that was happening in arch_copy_kprobe()
and most of the work that was happening in arch_arm_kprobe() into
arch_prepare_kprobe().  By doing this we can add further robustness checks
in arch_arm_kprobe() and refuse to insert kprobes that will cause problems.

Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:23 -07:00
Anil S Keshavamurthy
cd2675bf65 [PATCH] Kprobes/IA64: support kprobe on branch/call instructions
This patch is required to support kprobe on branch/call instructions.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:23 -07:00
Anil S Keshavamurthy
b2761dc262 [PATCH] Kprobes/IA64: architecture specific JProbes support
This patch adds IA64 architecture specific JProbes support on top of Kprobes

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:22 -07:00
Anil S Keshavamurthy
fd7b231ff9 [PATCH] Kprobes/IA64: arch specific handling
This is an IA64 arch specific handling of Kprobes

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:22 -07:00
Anil S Keshavamurthy
7213b25218 [PATCH] Kprobes/IA64: kdebug die notification mechanism
As many of you know that kprobes exist in the main line kernel for various
architecture including i386, x86_64, ppc64 and sparc64.  Attached patches
following this mail are a port of Kprobes and Jprobes for IA64.

I have tesed this patches for kprobes and Jprobes and this seems to work fine.
 I have tested this patch by inserting kprobes on various slots and various
templates including various types of branch instructions.

I have also tested this patch using the tool
http://marc.theaimsgroup.com/?l=linux-kernel&m=111657358022586&w=2 and the
kprobes for IA64 works great.

Here is list of TODO things and pathes for the same will appear soon.

1) Support kprobes on "mov r1=ip" type of instruction
2) Support Kprobes and Jprobes to exist on the same address
3) Support Return probes
3) Architecture independent cleanup of kprobes

This patch adds the kdebug die notification mechanism needed by Kprobes.

For break instruction on Branch type slot, imm21 is ignored and value
zero is placed in IIM register, hence we need to handle kprobes
for switch case zero.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com>

From: Rusty Lynch <rusty.lynch@intel.com>

At the point in traps.c where we recieve a break with a zero value, we can
not say if the break was a result of a kprobe or some other debug facility.

This simple patch changes the informational string to a more correct "break
0" value, and applies to the 2.6.12-rc2-mm2 tree with all the kprobes
patches that were just recently included for the next mm cut.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:22 -07:00
Hien Nguyen
0aa55e4d7d [PATCH] kprobes: moves lock-unlock to non-arch kprobe_flush_task
This patch moves the lock/unlock of the arch specific kprobe_flush_task()
to the non-arch specific kprobe_flusk_task().

Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Rusty Lynch
7e1048b11c [PATCH] Move kprobe [dis]arming into arch specific code
The architecture independent code of the current kprobes implementation is
arming and disarming kprobes at registration time.  The problem is that the
code is assuming that arming and disarming is a just done by a simple write
of some magic value to an address.  This is problematic for ia64 where our
instructions look more like structures, and we can not insert break points
by just doing something like:

*p->addr = BREAKPOINT_INSTRUCTION;

The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
functions:

     * void arch_arm_kprobe(struct kprobe *p)
     * void arch_disarm_kprobe(struct kprobe *p)

and then adds the new functions for each of the architectures that already
implement kprobes (spar64/ppc64/i386/x86_64).

I thought arch_[dis]arm_kprobe was the most descriptive of what was really
happening, but each of the architectures already had a disarm_kprobe()
function that was really a "disarm and do some other clean-up items as
needed when you stumble across a recursive kprobe." So...  I took the
liberty of changing the code that was calling disarm_kprobe() to call
arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
with the recursive kprobe case.

So far this patch as been tested on i386, x86_64, and ppc64, but still
needs to be tested in sparc64.

Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Rusty Lynch
73649dab0f [PATCH] x86_64 specific function return probes
The following patch adds the x86_64 architecture specific implementation
for function return probes.

Function return probes is a mechanism built on top of kprobes that allows
a caller to register a handler to be called when a given function exits.
For example, to instrument the return path of sys_mkdir:

static int sys_mkdir_exit(struct kretprobe_instance *i, struct pt_regs *regs)
{
	printk("sys_mkdir exited\n");
	return 0;
}
static struct kretprobe return_probe = {
	.handler = sys_mkdir_exit,
};

<inside setup function>

return_probe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("sys_mkdir");
if (register_kretprobe(&return_probe)) {
	printk(KERN_DEBUG "Unable to register return probe!\n");
	/* do error path */
}

<inside cleanup function>
unregister_kretprobe(&return_probe);

The way this works is that:

* At system initialization time, kernel/kprobes.c installs a kprobe
  on a function called kretprobe_trampoline() that is implemented in
  the arch/x86_64/kernel/kprobes.c  (More on this later)

* When a return probe is registered using register_kretprobe(),
  kernel/kprobes.c will install a kprobe on the first instruction of the
  targeted function with the pre handler set to arch_prepare_kretprobe()
  which is implemented in arch/x86_64/kernel/kprobes.c.

* arch_prepare_kretprobe() will prepare a kretprobe instance that stores:
  - nodes for hanging this instance in an empty or free list
  - a pointer to the return probe
  - the original return address
  - a pointer to the stack address

  With all this stowed away, arch_prepare_kretprobe() then sets the return
  address for the targeted function to a special trampoline function called
  kretprobe_trampoline() implemented in arch/x86_64/kernel/kprobes.c

* The kprobe completes as normal, with control passing back to the target
  function that executes as normal, and eventually returns to our trampoline
  function.

* Since a kprobe was installed on kretprobe_trampoline() during system
  initialization, control passes back to kprobes via the architecture
  specific function trampoline_probe_handler() which will lookup the
  instance in an hlist maintained by kernel/kprobes.c, and then call
  the handler function.

* When trampoline_probe_handler() is done, the kprobes infrastructure
  single steps the original instruction (in this case just a top), and
  then calls trampoline_post_handler().  trampoline_post_handler() then
  looks up the instance again, puts the instance back on the free list,
  and then makes a long jump back to the original return instruction.

So to recap, to instrument the exit path of a function this implementation
will cause four interruptions:

  - A breakpoint at the very beginning of the function allowing us to
    switch out the return address
  - A single step interruption to execute the original instruction that
    we replaced with the break instruction (normal kprobe flow)
  - A breakpoint in the trampoline function where our instrumented function
    returned to
  - A single step interruption to execute the original instruction that
    we replaced with the break instruction (normal kprobe flow)

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Hien Nguyen
b94cce926b [PATCH] kprobes: function-return probes
This patch adds function-return probes to kprobes for the i386
architecture.  This enables you to establish a handler to be run when a
function returns.

1. API

Two new functions are added to kprobes:

	int register_kretprobe(struct kretprobe *rp);
	void unregister_kretprobe(struct kretprobe *rp);

2. Registration and unregistration

2.1 Register

  To register a function-return probe, the user populates the following
  fields in a kretprobe object and calls register_kretprobe() with the
  kretprobe address as an argument:

  kp.addr - the function's address

  handler - this function is run after the ret instruction executes, but
  before control returns to the return address in the caller.

  maxactive - The maximum number of instances of the probed function that
  can be active concurrently.  For example, if the function is non-
  recursive and is called with a spinlock or mutex held, maxactive = 1
  should be enough.  If the function is non-recursive and can never
  relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
  be enough.  maxactive is used to determine how many kretprobe_instance
  objects to allocate for this particular probed function.  If maxactive <=
  0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 *
  NR_CPUS) else maxactive=NR_CPUS)

  For example:

    struct kretprobe rp;
    rp.kp.addr = /* entrypoint address */
    rp.handler = /*return probe handler */
    rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */
    register_kretprobe(&rp);

  The following field may also be of interest:

  nmissed - Initialized to zero when the function-return probe is
  registered, and incremented every time the probed function is entered but
  there is no kretprobe_instance object available for establishing the
  function-return probe (i.e., because maxactive was set too low).

2.2 Unregister

  To unregiter a function-return probe, the user calls
  unregister_kretprobe() with the same kretprobe object as registered
  previously.  If a probed function is running when the return probe is
  unregistered, the function will return as expected, but the handler won't
  be run.

3. Limitations

3.1 This patch supports only the i386 architecture, but patches for
    x86_64 and ppc64 are anticipated soon.

3.2 Return probes operates by replacing the return address in the stack
    (or in a known register, such as the lr register for ppc).  This may
    cause __builtin_return_address(0), when invoked from the return-probed
    function, to return the address of the return-probes trampoline.

3.3 This implementation uses the "Multiprobes at an address" feature in
    2.6.12-rc3-mm3.

3.4 Due to a limitation in multi-probes, you cannot currently establish
    a return probe and a jprobe on the same function.  A patch to remove
    this limitation is being tested.

This feature is required by SystemTap (http://sourceware.org/systemtap),
and reflects ideas contributed by several SystemTap developers, including
Will Cohen and Ananth Mavinakayanahalli.

Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:21 -07:00
Robert Love
dfe52244e0 [PATCH] kstrdup: convert a few existing implementations
Convert a bunch of strdup() implementations and their callers to the new
kstrdup().  A few remain, for example see sound/core, and there are tons of
open coded strdup()'s around.  Sigh.  But this is a start.

Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:18 -07:00
Domen Puncer
15d20bfd60 [PATCH] ptrace_h8300: condition bugfix
Assignment doesn't make much sense here as condition would always be true.

Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:15 -07:00
Vincent Hanquez
76381fee7e [PATCH] xen: x86_64: use more usermode macro
Make use of the user_mode macro where it's possible.  This is useful for Xen
because it will need only to redefine only the macro to a hypervisor call.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
e9129e56e9 [PATCH] xen: x86_64: Add macro for debugreg
Add 2 macros to set and get debugreg on x86_64.  This is useful for Xen
because it will need only to redefine each macro to a hypervisor call.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
717b594a41 [PATCH] xen: x86: Use more usermode macro
Use the user_mode macro where it's possible.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
fa1e1bdf78 [PATCH] xen: x86: Rename usermode macro
Rename user_mode to user_mode_vm and add a user_mode macro similar to the
x86-64 one.

This is useful for Xen because the linux xen kernel does not runs on the same
priviledge that a vanilla linux kernel, and with this we just need to redefine
user_mode().

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:14 -07:00
Vincent Hanquez
1cc6f12e03 [PATCH] xen: x86: Use new macro for debugreg
Make use of the 2 new macro set_debugreg and get_debugreg.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:13 -07:00
Natalie Protasevich
701067c466 [PATCH] x86_64: avoid wasting IRQs
I suggest to change the way IRQs are handed out to PCI devices.

Currently, each I/O APIC pin gets associated with an IRQ, no matter if the
pin is used or not.  It is expected that each pin can potentually be
engaged by a device inserted into the corresponding PCI slot.  However,
this imposes severe limitation on systems that have designs that employ
many I/O APICs, only utilizing couple lines of each, such as P64H2 chipset.

It is used in ES7000, and currently, there is no way to boot the system
with more that 9 I/O APICs.

The simple change below allows to boot a system with say 64 (or more) I/O
APICs, each providing 1 slot, which otherwise impossible because of the IRQ
gaps created for unused lines on each I/O APIC.  It does not resolve the
problem with number of devices that exceeds number of possible IRQs, but
eases up a tension for IRQs on any large system with potentually large
number of devices.

I only implemented this for the ACPI boot, since if the system is this big
and using newer chipsets it is probably (better be!) an ACPI based system
:).  The change is completely "mechanical" and does not alter any internal
structures or interrupt model/implementation.  The patch works for both
i386 and x86_64 archs.  It works with MSIs just fine, and should not
intervene with implementations like shared vectors, when they get worked
out and incorporated.

To illustrate, below is the interrupt distribution for 2-cell ES7000 with
20 I/O APICs, and an Ethernet card in the last slot, which should be eth1
and which was not configured because its IRQ exceeded allowable number (it
actially turned out huge - 480!):

zorro-tb2:~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:      65716      30012      30007      30002      30009      30010      30010      30010    IO-APIC-edge  timer
  4:        373          0        725        280          0          0          0          0    IO-APIC-edge  serial
  8:          0          0          0          0          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0          0          0          0          0   IO-APIC-level  acpi
 14:         39          3          0          0          0          0          0          0    IO-APIC-edge  ide0
 16:        108         13          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb1
 18:          0          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb3
 19:         15          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
 23:          3          0          0          0          0          0          0          0   IO-APIC-level  ehci_hcd:usb4
 96:       4240        397         18          0          0          0          0          0   IO-APIC-level  aic7xxx
 97:         15          0          0          0          0          0          0          0   IO-APIC-level  aic7xxx
192:        847          0          0          0          0          0          0          0   IO-APIC-level  eth0
NMI:          0          0          0          0          0          0          0          0
LOC:     273423     274528     272829     274228     274092     273761     273827     273694
ERR:          7
MIS:          0

Even though the system doesn't have that many devices, some don't get
enabled only because of IRQ numbering model.

This is the IRQ picture after the patch was applied:

zorro-tb2:~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:      44169      10004      10004      10001      10004      10003      10004       6135    IO-APIC-edge  timer
  4:        345          0          0          0          0        244          0          0    IO-APIC-edge  serial
  8:          0          0          0          0          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0          0          0          0          0   IO-APIC-level  acpi
 14:         39          0          3          0          0          0          0          0    IO-APIC-edge  ide0
 17:       4425          0          9          0          0          0          0          0   IO-APIC-level  aic7xxx
 18:         15          0          0          0          0          0          0          0   IO-APIC-level  aic7xxx, uhci_hcd:usb3
 21:        231          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb1
 22:         26          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
 23:          3          0          0          0          0          0          0          0   IO-APIC-level  ehci_hcd:usb4
 24:        348          0          0          0          0          0          0          0   IO-APIC-level  eth0
 25:          6        192          0          0          0          0          0          0   IO-APIC-level  eth1
NMI:          0          0          0          0          0          0          0          0
LOC:     107981     107636     108899     108698     108489     108326     108331     108254
ERR:          7
MIS:          0

Not only we see the card in the last I/O APIC, but we are not even close to
using up available IRQs, since we didn't waste any.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:13 -07:00
Roland McGrath
0928d6ef7f [PATCH] x86_64: never block forced SIGSEGV
This is the x86_64 version of the signal fix I just posted for i386.

This problem was first noticed on PPC and has already been fixed there.
But the exact same issue applies to other platforms in the same way.  The
signal blocking for sa_mask and the handled signal takes place after the
handler setup.  When the stack is bogus, the handler setup forces a
SIGSEGV.  But then this will be blocked, and returning to user mode will
fault again and iterate.  This patch fixes the problem by checking whether
signal handler setup failed, and not doing the signal-blocking if so.  This
copies what was done in the ppc code.  I think all architectures' signal
handler setup code follows this pattern and needs the change.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:13 -07:00
john stultz
a3a00751ad [PATCH] x86_64: fix hpet for systems that don't support legacy replacement
Currently the x86-64 HPET code assumes the entire HPET implementation from
the spec is present.  This breaks on boxes that do not implement the
optional legacy timer replacement functionality portion of the spec.

This patch fixes this issue, allowing x86-64 systems that cannot use the
HPET for the timer interrupt and RTC to still use the HPET as a time
source.  I've tested this patch on a system systems without HPET, with HPET
but without legacy timer replacement, as well as HPET with legacy timer
replacement.

This version adds a minor check to cap the HPET counter value in
gettimeoffset_hpet to avoid possible time inconsistencies.  Please ignore
the A2 version I sent to you earlier.

Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:12 -07:00
Alexander Nyberg
c0a88c9878 [PATCH] x86_64: i8259.c iso99 structure initialization
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:12 -07:00
Andrew Morton
c92c6ffdb1 [PATCH] mtrr size-and-base debugging
Consolidate the mtrr sanity checking, add a dump_stack().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:12 -07:00
Andrew Morton
a3a255e744 [PATCH] x86: cpu_khz type fix
x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use
unsigned long.

So make cpu_khz unsigned int on x86 as well.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Alexey Dobriyan
129f69465b [PATCH] Remove i386_ksyms.c, almost.
* EXPORT_SYMBOL's moved to other files
* #include <linux/config.h>, <linux/module.h> where needed
* #include's in i386_ksyms.c cleaned up
* After copy-paste, redundant due to Makefiles rules preprocessor directives
  removed:

	#ifdef CONFIG_FOO
	EXPORT_SYMBOL(foo);
	#endif

	obj-$(CONFIG_FOO) += foo.o

* Tiny reformat to fit in 80 columns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Aleksey Gorelov
80bb82afea [PATCH] VIA 82C586B IRQ routing fix
According to the VIA 82C586B datasheet (still available from
http://gkernel.sourceforge.net/specs/via/586b.pdf.bz2) this chip need a
special PIRQ mapping.

Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:11 -07:00
Natalie Protasevich
c434b7a6ae [PATCH] x86: avoid wasting IRQs for PCI devices
I have submitted the patch for x86_64, this is submission for i386.

The patch changes the way IRQs are handed out to PCI devices.  Currently,
each I/O APIC pin gets associated with an IRQ, no matter if the pin is used
or not.  This imposes severe limitation on systems that have designs that
employ many I/O APICs, only utilizing couple lines of each, such as P64H2
chipset.  It is used in ES7000, and currently, there is no way to boot the
system with more that 9 I/O APICs.

The simple change below allows to boot a system with say 64 (or more) I/O
APICs, each providing 1 slot, which otherwise impossible because of the IRQ
gaps created for unused lines on each I/O APIC.  It does not resolve the
problem with number of devices that exceeds number of possible IRQs, but
eases up a tension for IRQs on any large system with potentually large
number of devices.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:10 -07:00
Christoph Lameter
b5d23e5b8c [PATCH] ia64: Selectable Timer Interrupt Frequency
It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ.
Reducing the timer frequency may have important performance benefits on
large systems.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:10 -07:00
Christoph Lameter
5912100372 [PATCH] i386: Selectable Frequency of the Timer Interrupt
Make the timer frequency selectable. The timer interrupt may cause bus
and memory contention in large NUMA systems since the interrupt occurs
on each processor HZ times per second.

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:10 -07:00
Jan Beulich
799d19f6ec [PATCH] allow early printk to use more than 25 lines
Allow early printk code to take advantage of the full size of the screen, not
just the first 25 lines.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:10 -07:00
Jan Beulich
7fbb4f6e68 [PATCH] adjust i386 watchdog tick calculation
Get the i386 watchdog tick calculation into a state where it can also be used
on CPUs with frequencies beyond 4GHz, and it consolidates the calculation into
a single place (for potential furture adjustments).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Natalie Protasevich
ca05fea6db [PATCH] Do not enforce unique IO_APIC_ID check for xAPIC systems (i386)
This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and
use heuristics to prevent unique I/O APIC ID check for systems that don't
need it.  The patch disables unique I/O APIC ID check for Xeon-based and
other platforms that don't use serial APIC bus for interrupt delivery.
Andi stated that AMD systems don't need unique IO_APIC_IDs either.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Roland McGrath
7c1def1652 [PATCH] i386: never block forced SIGSEGV
This problem was first noticed on PPC and has already been fixed there.
But the exact same issue applies to other platforms in the same way.  The
signal blocking for sa_mask and the handled signal takes place after the
handler setup.  When the stack is bogus, the handler setup forces a
SIGSEGV.  But then this will be blocked, and returning to user mode will
fault again and iterate.  This patch fixes the problem by checking whether
signal handler setup failed, and not doing the signal-blocking if so.  This
copies what was done in the ppc code.  I think all architectures' signal
handler setup code follows this pattern and needs the change.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:09 -07:00
Christoph Lameter
8c5a09082f [PATCH] x86/x86_64: pcibus_to_node
Define pcibus_to_node to be able to figure out which NUMA node contains a
given PCI device.  This defines pcibus_to_node(bus) in
include/linux/topology.h and adjusts the macros for i386 and x86_64 that
already provided a way to determine the cpumask of a pci device.

x86_64 was changed to not build an array of cpumasks anymore.  Instead an
array of nodes is build which can be used to generate the cpumask via
node_to_cpumask.

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:08 -07:00
Venkatesh Pallipadi
8a9e1b0f56 [PATCH] Platform SMIs and their interferance with tsc based delay calibration
Issue:
Current tsc based delay_calibration can result in significant errors in
loops_per_jiffy count when the platform events like SMIs
(System Management Interrupts that are non-maskable) are present. This could
lead to potential kernel panic(). This issue is becoming more visible with 2.6
kernel (as default HZ is 1000) and on platforms with higher SMI handling
latencies. During the boot time, SMIs are mostly used by BIOS (for things
like legacy keyboard emulation).

Description:
The psuedocode for current delay calibration with tsc based delay looks like
(0) Estimate a value for loops_per_jiffy
(1) While (loops_per_jiffy estimate is accurate enough)
(2)   wait for jiffy transition (jiffy1)
(3)   Note down current tsc (tsc1)
(4)   loop until tsc becomes tsc1 + loops_per_jiffy
(5)   check whether jiffy changed since jiffy1 or not and refine
loops_per_jiffy estimate

Consider the following cases
Case 1:
If SMIs happen between (2) and (3) above, we can end up with a
loops_per_jiffy value that is too low. This results in shorted delays and
kernel can panic () during boot (Mostly at IOAPIC timer initialization
timer_irq_works() as we don't have enough timer interrupts in a specified
interval).

Case 2:
If SMIs happen between (3) and (4) above, then we can end up with a
loops_per_jiffy value that is too high. And with current i386 code, too
high lpj value (greater than 17M) can result in a overflow in
delay.c:__const_udelay() again resulting in shorter delay and panic().

Solution:
The patch below makes the calibration routine aware of asynchronous events
like SMIs. We increase the delay calibration time and also identify any
significant errors (greater than 12.5%) in the calibration and notify it to
user.

Patch below changes both i386 and x86-64 architectures to use this
new and improved calibrate_delay_direct() routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:08 -07:00
Ian Campbell
0f8e2d62fa [PATCH] use ${CROSS_COMPILE}installkernel in arch/*/boot/install.sh
The attached patch causes the various arch specific install.sh scripts to
look for ${CROSS_COMPILE}installkernel rather than just installkernel (in
both /sbin/ and ~/bin/ where the script already did this).  This allows you
to have e.g.  arm-linux-installkernel as a handy way to install on your
cross target.  It also prevents the script picking up on the host
/sbin/installkernel which causes the script to fall through and do the
install itself (which is what I actually use myself, with $INSTALL_PATH
set).

I don't believe it causes back-compatibility problems since calling the
host installkernel was never likely to work or be what you wanted when
cross compiling anyway.  If $CROSS_COMPILE isn't set then nothing changes.

I only use ARM and i386 myself but I figured it couldn't hurt to do the
whole lot.  I've cc'd those who I hope are the arch maintainers for files
that I've touched.

Signed-off-by: Ian Campbell <icampbell@arcom.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:07 -07:00
H. Peter Anvin
d0e7feb03d [PATCH] biarch compiler support for i386
This allows the i386 architecture to be built on a system with a biarch
compiler that defaults to x86-64, merely by specifying ARCH=i386.

As previously discussed, this uses the equivalent logic to the ppc port.

Signed-Off-By: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:07 -07:00
Martin J. Bligh
6f4e1e5061 [PATCH] add page_state info to show_mem
This helps a lot when debugging out of memory stuff - useful especially to
see if all the memory is sucked into slab, etc.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:07 -07:00
Matt Tolentino
bbfceef47f [PATCH] add x86-64 specific support for sparsemem
This patch adds in the necessary support for sparsemem such that x86-64
kernels may use sparsemem as an alternative to discontigmem for NUMA
kernels.  Note that this does no preclude one from continuing to build NUMA
kernels using discontigmem, but merely allows the option to build NUMA
kernels with sparsemem.

Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels
results in reduced text size for otherwise equivalent kernels as shown in
the example builds below:

   text	   data	    bss	    dec	    hex	filename
2371036	 765884	1237108	4374028	 42be0c	vmlinux.discontig
2366549	 776484	1302772	4445805	 43d66d	vmlinux.sparse

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:07 -07:00
Matt Tolentino
2b97690f4c [PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options
In order to use the alternative sparsemem implmentation for NUMA kernels,
we need to reorganize the config options.  This patch effectively abstracts
out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases.  Thus,
the discontigmem implementation may be employed as always, but the
sparsemem implementation may be used alternatively.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:06 -07:00
Matt Tolentino
1035faf1b1 [PATCH] add x86-64 Kconfig options for sparsemem
Add the requisite arch specific Kconfig options to enable the use of the
sparsemem implementation for NUMA kernels on x86-64.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:06 -07:00
Matt Tolentino
073326634b [PATCH] remove direct ref to contig_page_data for x86-64
This patch pulls out all remaining direct references to contig_page_data
from arch/x86-64, thus saving an ifdef in one case.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:06 -07:00
Andy Whitcroft
145e664231 [PATCH] ppc64: sparsemem memory model
Provide the architecture specific implementation for SPARSEMEM for PPC64
systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part)
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:06 -07:00
Andy Whitcroft
74b30be2e1 [PATCH] ppc64: add memory present
Provide hooks for PPC64 to allow memory models to be informed of installed
memory areas.  This allows SPARSEMEM to instantiate mem_map for the populated
areas.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:05 -07:00
Andy Whitcroft
510f8fa7ba [PATCH] ppc64: add early_pfn_to_nid
Provide an implementation of early_pfn_to_nid for PPC64.  This is used by
memory models to determine the node from which to take allocations before the
memory allocators are fully initialised.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:05 -07:00
Andy Whitcroft
641c767389 [PATCH] sparsemem swiss cheese numa layouts
The part of the sparsemem patch which modifies memmap_init_zone() has recently
become a problem.  It changes behavior so that there is a call to
pfn_to_page() for each individual page inside of a node's range:
node_start_pfn through node_end_pfn.  It used to simply do this once, at the
beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
of a node made it necessary to change.

Mike Kravetz recently wrote a patch which made the NUMA code accept some new
kinds of layouts.  The system's memory was laid out like this, with node 0's
memory in two pieces: one before and one after node 1's memory:

	Node 0: +++++     +++++
	Node 1:      +++++

Previous behavior before Mike's patch was to assign nodes like this:

	Node 0: 00000     XXXXX
	Node 1:      11111

Where the 'X' areas were simply thrown away.  The new behavior was to make the
pg_data_t span node 0 across all of its areas, including areas that are really
node 1's: Node 0: 000000000000000 Node 1: 11111

This wastes a little bit of mem_map space, but ends up being OK, and more
fully utilizes the system's memory.  memmap_init_zone() initializes all of the
"struct page"s for node 0, even for the "hole", but those never get used,
because there is no pfn_to_page() that resolves to those pages.  However, only
calling pfn_to_page() once, memmap_init_zone() always uses the pages that were
allocated for node0->node_mem_map because:

	struct page *start = pfn_to_page(start_pfn);
	// effectively start = &node->node_mem_map[0]
	for (page = start; page < (start + size); page++) {
		init_page_here();...
		page++;
	}

Slow, and wasteful, but generally harmless.

But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
does):

	for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) {
		page = pfn_to_page(pfn);
	}

And you end up trying to initialize node 1's pages too early, along with bogus
data from node 0.  This patch checks for those weird layouts and declines to
touch the pages, making the more frequent pfn_to_page() calls OK to do.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:05 -07:00
Andy Whitcroft
05b79bdcb4 [PATCH] sparsemem memory model for i386
Provide the architecture specific implementation for SPARSEMEM for i386 SMP
and NUMA systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:05 -07:00
Andy Whitcroft
d41dee369b [PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.

A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA.  When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.

Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous.  It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.

Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
to be chopped up.

In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags.  Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations.  However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions.  Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.

One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags.  It might provide
speed increases on certain platforms and will be stored there if there is
room.  But, if out of room, an alternate (theoretically slower) mechanism is
used.

This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:04 -07:00
Andy Whitcroft
af705362ab [PATCH] generify memory present
Allow architectures to indicate that they will be providing hooks to indice
installed memory areas, memory_present().  Provide prototypes for the i386
implementation.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:04 -07:00
Andy Whitcroft
b159d43fbf [PATCH] generify early_pfn_to_nid
Provide a default implementation for early_pfn_to_nid returning node 0.  Allow
architectures to override this with their own implementation out of
asm/mmzone.h.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:04 -07:00
Mike Kravetz
368a0a3afa [PATCH] ppc64: Kconfig memory models
This patch changes some of the default behavior in the ppc64 Kconfig file
that was recently changed/added to 2.6.12-rc2-mm1 by Dave Hansen in
preparation for SPARSEMEM.  Patch allows the display of both FLAT and
DISCONTIG models on pseries.  As before, default is DISCONTIG for SMP and
PSERIES and FLAT for others.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:04 -07:00
Dave Hansen
074ccf8016 [PATCH] mm/Kconfig: kill unused ARCH_FLATMEM_DISABLE
This used to be used to disable FLATMEM selection, but I decided to change it
to be done generically when DISCONTIG is enabled.  The option is unused, so
this kills it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:03 -07:00
Dave Hansen
0e19243e9a [PATCH] update all defconfigs for ARCH_DISCONTIGMEM_ENABLE
This will at least suppress one prompt that users would have received the
first time they compile with the new DISCONTIG arch option.  They'll still
get the "Memory Model" prompt, but 99% of them will have the default work
there.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:02 -07:00
Dave Hansen
3f22ab276b [PATCH] make each arch use mm/Kconfig
For all architectures, this just means that you'll see a "Memory Model"
choice in your architecture menu.  For those that implement DISCONTIGMEM,
you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool
y" and make your users select DISCONTIGMEM right out of the new choice
menu.  The only disadvantage might be if you have some specific things that
you need in your help option to explain something about DISCONTIGMEM.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:02 -07:00
Dave Hansen
5b505b90b2 [PATCH] sparsemem base: teach discontig about sparse ranges
discontig.c has some assumptions that mem_map[]s inside of a node are
contiguous.  Teach it to make sure that each region that it's bringing online
is actually made up of valid ranges of ram.

Written-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:01 -07:00
Dave Hansen
6f167ec721 [PATCH] sparsemem base: simple NUMA remap space allocator
Introduce a simple allocator for the NUMA remap space.  This space is very
scarce, used for structures which are best allocated node local.

This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
the pgdat->node_mem_map initialized in a consistent place for all
architectures.

Issues:
o alloc_remap takes a node_id where we might expect a pgdat which was intended
  to allow us to allocate the pgdat's using this mechanism; which we do not yet
  do.  Could have alloc_remap_node() and alloc_remap_nid() for this purpose.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:01 -07:00
Dave Hansen
c2ebaa425e [PATCH] sparsemem base: early_pfn_to_nid() (works before sparse is initialized)
The following four patches provide the last needed changes before the
introduction of sparsemem.  For a more complete description of what this
will do, please see this patch:

http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch

or previous posts on the subject:
http://marc.theaimsgroup.com/?t=110868540700001&r=1&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=109897373315016&w=2

Three of these are i386-only, but one of them reorganizes the macros
used to manage the space in page->flags, and will affect all platforms.
There are analogous patches to the i386 ones for ppc64, ia64, and
x86_64, but those will be submitted by the normal arch maintainers.

The combination of the four patches has been test-booted on a variety of
i386 hardware, and compiled for ppc64, i386, and x86-64 with about 17
different .configs.  It's also been runtime-tested on ia64 configs (with
more patches on top).

This patch:

We _know_ which node pages in general belong to, at least at a very gross
level in node_{start,end}_pfn[].  Use those to target the allocations of
pages.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:00 -07:00
Dave Hansen
408fde81c1 [PATCH] remove non-DISCONTIG use of pgdat->node_mem_map
This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code.  On a flat memory system, these fields aren't
currently used, neither are they on a sparsemem system.

There was also a node_mem_map(nid) macro on many architectures.  Its use
along with the use of ->node_mem_map itself was not consistent.  It has
been removed in favor of two new, more explicit, arch-independent macros:

	pgdat_page_nr(pgdat, pagenr)
	nid_page_nr(nid, pagenr)

I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways.  I
believe the newer names are much clearer.

These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead.  We could
make this the only behavior if people want, but I don't want to change too
much at once.  One thing at a time.

This patch removes more code than it adds.

Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64.  Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/

Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:00 -07:00
David Gibson
d7152fe14c [PATCH] Maple powerdown patch
Currently reset and powerdown are not implemented on the Maple board,
and attempting to do so will (incorrectly return).  This implements
the proper communication with the service processor, allowing correct
reset and powerdown on the Maple board, by communicating with the
service processor.  If somehow it's unable to communicate with the
service processor it will loop forever instead.

Note that powerdown on the Maple will power down the CPUs, but not the
fans or other board components due to hardware and firmware
limitations.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Frank Rowand <frowand@mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 17:14:39 +10:00
John Rose
dad32bbf43 [PATCH] pSeries - read irqs dynamically
For I/O DLPAR to work properly, the kernel needs to allow for dynamic
assignment of the irq field of the pci_dev structure upon dynamic bus
addition.  This patch moves the assignment of that field from
pSeries_final_fixup() to pcibios_fixup_bus(), which enables dynamic
assignment for the children of a newly added bus.

Currently, pci_devs receive their irq numbers in one of two ways.  The
irq line is either read at boot for all pci_devs, or read by the rpaphp
module at slot enable time.  The latter is no longer sufficient for
DLPAR addition of slots that don't qualify as PCI-hotplug capable.
This solution handles the cases of boot and dynamic add.

Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 17:09:54 +10:00
Mike Strosaker
8f586b2243 [PATCH] correct printing to operator panel
This patch corrects the printing of progress indicators to the op
panel on p/iSeries ppc64 systems.  Each discrete reference code should
begin with a form feed char to clear the op panel, and the first and
second lines should be separated with a CR/LF sequence.  Padding with
spaces is not necessary.

Also, capitalize the hex value printed on the first line, to be
consistent with the values printed by firmware, service processor,
etc.

It turns out that there's an ibm,form-feed property; this patch uses
it in the pSeries-specific progress routine.  This patch also checks
the number of rows and the specific width of each row (the second row
on power5 systems can actually hold 80 characters).  If the displayed
text is too wide for the physical display, it can be viewed in the ASM
menus, or by selecting option 14 on the op panel.

Signed-off-by: Mike Strosaker <strosake@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 16:09:41 +10:00
Arnd Bergmann
ae209cf100 [PATCH] ppc64: Add driver for BPA iommu
Implementation of software load support for the BE iommu. This is very
different from other iommu code on ppc64, since we only do a static mapping.
The mapping is currently hardcoded but should really be read from the
firmware, but they don't set up the device nodes yet. There is a single
512MB DMA window for PCI, USB and ethernet at 0x20000000 for our RAM.

The Cell processor can put the I/O page table either in memory like
the hashed page table (hardware load) or have the operating system
write the entries into memory mapped CPU registers (software load).

I use the software load mechanism because I know that all I/O page
table entries for the amount of installed physical memory fit into
the IO TLB cache. At the point when we get machines with more than
4GB of installed memory, we can either use hardware I/O page table
access like the other platforms do or dynamically update the I/O
TLB entries when a page fault occurs in the I/O subsystem.

The software load can then use the macros that I have implemented
for the static mapping in order to do the TLB cache updates.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:54 +10:00
Arnd Bergmann
cebf589c82 [PATCH] ppc64: Add driver for BPA interrupt controllers
Add support for the integrated interrupt controller on BPA
CPUs. There is one of those for each SMT thread.

The mapping of interrupt numbers to HW interrupt sources
is described in arch/ppc64/kernel/bpa_iic.h.

This version hardcodes the 'Spider' chip as the secondary
interrupt controller. That is not really generic for the
architecture, but at the moment it is the only secondary
PIC that exists.

A little more work will be needed on this as soon as
we have boards with multiple external interrupt controllers.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:43 +10:00
Arnd Bergmann
fef1c772fa [PATCH] ppc64: add BPA platform type
This adds the basic support for running on BPA machines.
So far, this is only the IBM workstation, and it will
not run on others without a little more generalization.

It should be possible to configure a kernel for any
combination of CONFIG_PPC_BPA with any of the other
multiplatform targets.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:37 +10:00
Utz Bacher
5f5b4e669a [PATCH] ppc64: add a minimal nvram driver
The firmware provides the location and size of the nvram
in the device tree, so it does not really contain any
hardware specific bits and could be used on other
machines as well.
 
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:31 +10:00
Arnd Bergmann
6566c6f1f1 [PATCH] ppc64: pSeries_progress -> rtas_progress
The pSeries_progress function is called from some places in the rtas code,
which may also be used by non-pSeries platforms.
Though pSeries is currently the only platform type that implements
display-character, the code is actually generic enough to be part of
the rtas subsystem.

I hit a bug here because the generic rtas code tried calling ppc_md.progress,
which points to an __init function on most platforms.

We could also clear the ppc_md.progress pointer when freeing the init memory
to make it more explicit that ppc_md.progress must not be called after
bootup.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:28 +10:00
Arnd Bergmann
c5a3c2e52a [PATCH] ppc64: Split out generic rtas code from pSeries_pci.c.
BPA is using rtas for PCI but should not be confused by
pSeries code. This also avoids some #ifdefs. Other
platforms that want to use rtas_pci.c could create
their own platform_pci.c with platform specific fixups.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:23 +10:00
Arnd Bergmann
773bf9c469 [PATCH] ppc64: rename pSeries rtc functions into rtas_*
The rtc rtas functions are not pSeries specific but can
also be used by BPA and other SLOF based platforms

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:18 +10:00
Arnd Bergmann
10f7e7c15e [PATCH] ppc64: consolidate calibrate_decr implementations
pSeries and maple have almost the same code for calibrate_decr,
and BPA would need yet another copy. Instead, I'm moving the
code to arch/ppc64/kernel/time.c.

Some of the related declarations were missing from header
files, so I'm moving those as well.

It makes sense to merge this with the pmac function of the
same name, so we end up having just one implemetation for
iSeries and one for Open Firmware based machines.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-06-23 09:43:07 +10:00
Linus Torvalds
a493604400 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-06-22 14:51:06 -07:00
Russell King
92a8cbed29 [PATCH] ARM: Remove explicit page-alignments in memory init
Since meminfo.bank[] array contains page-aligned start/size, we
no longer need to explicitly round up/down the addresses when
converting to PFNs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-22 21:47:25 +01:00
Russell King
3a66941106 [PATCH] ARM: Ensure memory information is page aligned
Ensure that meminfo.bank[] array contains page-aligned start/size
information.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-22 21:43:10 +01:00
Russell King
b46a58fd4e [PATCH] ARM: Use list_for_each_entry() for dmabounce
Convert dmabounce.c to use list_for_each_entry() instead of
list_for_each() + list_entry().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-22 21:25:58 +01:00
Kumar Gala
f1b04770b0 [PATCH] ppc32: Fix building MPC8555 CDS
Adding support for MPC8548 w/o PCI support, broke building MPC8555 CDS
by trying to remove a loop variable that was used when PCI is enabled.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org)
2005-06-22 13:23:38 -07:00
Russell King
e00d349e77 [PATCH] ARM: Move signal return code into vector page
Move the signal return code into the vector page instead of placing
it on the user mode stack, which will allow us to avoid flushing
the instruction cache on signals, as well as eventually allowing
non-exec stack.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-22 20:26:05 +01:00
Linus Torvalds
fb7a0e3653 Merge kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git
Do arch/ia64/defconfig by hand.
2005-06-22 12:22:12 -07:00
Russell King
052162198b [PATCH] ARM: Allow clps7500 to build without parsing "acorn" tag
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-22 09:56:57 +01:00
Russell King
ebe2a9ffa1 [PATCH] ARM: Allow riscpc to parse "acorn" boot info tag
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-22 09:55:04 +01:00
Russell King
522c37b9d3 [PATCH] ARM: Fix sa1111.c build error caused by klist changes
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-06-22 09:52:26 +01:00
Randy Vinson
bdca3f0aed [PATCH] I2C: Add support for Maxim/Dallas DS1374 Real-Time Clock Chip (2/2)
This change provides support for the DS1374 Real-Time Clock chip present
on the MPC8349ADS board. It depends on a previous patch which adds I2C
support for the DS1374.

Signed-off-by: Randy Vinson <rvinson@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-21 21:52:06 -07:00
Heiko Carstens
2b07188617 [PATCH] s390: pending interrupt after ipl from reader
Wait for interrupt and clear status pending after resetting the reader.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:34 -07:00
Heiko Carstens
e9b9a04796 [PATCH] s390: memory detection > 32GB
The kernel takes a very long time to boot if the memory size is bigger then
32767 MB.  The memory size is contained in a structure created by an sclp
call.  The kernel accesses the field with a LH instrution which performs a
sign extension of a 16 bit word.  In the case of a memory size with bit 2^15
set this results in a very large value and the memory detection just loops for
a long time.  In addition if more then 64 GB are used on a 64 bit system the
memory size is read from an incorrect storage location.

Use zero-extention to read the 16 bit memory size and the correct offset to
read the 4 byte memory size on 64 bit.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:34 -07:00
Heiko Carstens
447570cfde [PATCH] s390: cmm sender parameter visibility
Make cmm module parameter "sender" visible in sysfs.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:33 -07:00
Heiko Carstens
77eb65cbc1 [PATCH] s390: kernel stack overflow panic
die() doesn't return, therefore print registers and then panic instead.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:33 -07:00
Martin Schwidefsky
14651c798a [PATCH] s390: #ifdefs in compat_ioctls
Remove superflous #if .. #endif pairs from compat_ioctl.c.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:33 -07:00
Paolo 'Blaisorblade' Giarrusso
60b2737de1 [PATCH] uml: fix linkage of tt mode against NPTL
With Al Viro <viro@parcelfarce.linux.theplanet.co.uk>

To make sure switcheroo() can execute when we remap all the executable
image, we used a trick to make it use a local copy of errno...  this trick
does not work with NPTL glibc, only with LinuxThreads, so use another
(simpler) one to make it work anyway.

Hopefully, a lot improved thanks to merging with the version of Al Viro
(which had his part of problems, though, i.e.  removing a fix to another
bug and not fixing the problem on i386).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:32 -07:00
Paolo 'Blaisorblade' Giarrusso
b77d6adc92 [PATCH] uml: make hw_controller_type->release exist only for archs needing it
With Chris Wedgwood <cw@f00f.org>

As suggested by Chris, we can make the "just added" method ->release
conditional to UML only (better: to archs requesting it, i.e.  only UML
currently), so that other archs don't get this unneeded crud, and if UML
won't need it any more we can kill this.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:32 -07:00
Paolo 'Blaisorblade' Giarrusso
faec1e99ba [PATCH] uml: complete hw_controller_type->release conversion
This occurrence of free_irq_by_irq_and_dev() was missed when converting UML
to the use of hw_controller_type->release.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:32 -07:00
Paolo 'Blaisorblade' Giarrusso
dbce706e25 [PATCH] uml: add and use generic hw_controller_type->release
With Chris Wedgwood <cw@f00f.org>

Currently UML must explicitly call the UML-specific
free_irq_by_irq_and_dev() for each free_irq call it's done.

This is needed because ->shutdown and/or ->disable are only called when the
last "action" for that irq is removed.

Instead, for UML shared IRQs (UML IRQs are very often, if not always,
shared), for each dev_id some setup is done, which must be cleared on the
release of that fd.  For instance, for each open console a new instance
(i.e.  new dev_id) of the same IRQ is requested().

Exactly, a fd is stored in an array (pollfds), which is after read by a
host thread and passed to poll().  Each event registered by poll() triggers
an interrupt.  So, for each free_irq() we must remove the corresponding
host fd from the table, which we do via this -release() method.

In this patch we add an appropriate hook for this, and remove all uses of
it by pointing the hook to the said procedure; this is safe to do since the
said procedure.

Also some cosmetic improvements are included.

This is heavily based on some work by Chris Wedgwood, which however didn't
get the patch merged for something I'd call a "misunderstanding" (the need
for this patch wasn't cleanly explained, thus adding the generic hook was
felt as undesirable).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:32 -07:00
Hirokazu Takata
960c2a89a0 [PATCH] m32r: Update defconfig files
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:31 -07:00
Hirokazu Takata
1cc1265e9a [PATCH] m32r: Cleanup arch/m32r/mm/extable.c
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:31 -07:00
Hirokazu Takata
6f973b001a [PATCH] m32r: Update setup_xxxxx.c
Change coding styles of hw_interrupt_type struct's initialization portions.

Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:30 -07:00
Hirokazu Takata
2368086344 [PATCH] m32r: Support M3A-2170(Mappi-III) platform
This patchset is for supporting a new m32r platform, M3A-2170(Mappi-III)
evaluation board.  An M32R chip multiprocessor is equipped on the board.
http://http://www.linux-m32r.org/eng/platform/platform.html

	* arch/m32r/Kconfig: Support Mappi-III platform.
	* arch/m32r/kernel/Makefile: ditto.
	* arch/m32r/kernel/io_mappi3.c: ditto.
	* arch/m32r/kernel/setup.c: ditto.
	* arch/m32r/kernel/setup_mappi3.c: ditto.
	* include/asm-m32r/m32102.h: ditto.
	* include/asm-m32r/m32r.h: ditto.
	* include/asm-m32r/mappi3/mappi3_pld.h: ditto.

	* include/asm-m32r/ide.h: CF support for Mappi-III.
	* arch/m32r/kernel/setup_mappi3.c: ditto.

	* arch/m32r/mappi3/defconfig.smp: A default config file for Mappi-III.
	* arch/m32r/mappi3/dot.gdbinit: A default .gdbinit file for Mappi-III.

	* arch/m32r/boot/compressed/m32r_sio.c: Modified for Mappi-III
	  - At boot time, m32r-g00ff bootloader makes MMU off for Mappi-III,
	    on the contrary it makes MMU on for Mappi-II.

	* arch/m32r/kernel/io_mappi2.c: Update comments.
	* arch/m32r/kernel/setup_mappi2.c: ditto.

Signed-off-by: Mamoru Sakugawa <sakugawa@linux-m32r.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:30 -07:00
Brent Casavant
e5d310b349 [PATCH] ioc4: CONFIG split
The SGI IOC4 I/O controller chip drivers are currently all configured by
CONFIG_BLK_DEV_SGIIOC4.  This is undesirable as not all IOC4 hardware features
are needed by all systems.

This patch adds two configuration variables, CONFIG_SGI_IOC4 for core IOC4
driver support (see patch 1/3 in this series for further explanation) and
CONFIG_SERIAL_SGI_IOC4 to independently enable serial port support.

Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com>
Acked-by: Jeremy Higdon <jeremy@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:32 -07:00
Anton Blanchard
9b843cda19 [PATCH] ppc64: set/clear SMT capable bit at boot
Allow the SMT bit to be set/reset at boot, like the ALTIVEC bit.  This
means we will enable SMT on unknown cpus that support it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:31 -07:00
Anton Blanchard
515bae9cdc [PATCH] ppc64: Mark kernel hptes dirty
We dont use the hardware referenced and changed bits and setting them early
avoids a store to memory.  We already do this for userspace hptes but not
kernel ones.  Do it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:31 -07:00
Stephen Rothwell
ac5b33c9bc [PATCH] ppc64: tidy up vio devices fake parent
Currently we dynamically allocate the fake parent device for all devices on
the vio bus.  This patch statically allocates it.  This also allows us to
reuse it for the iSeries "generic" vio device (that is used for passing to
dma routines when communicating with the hypervisor without a device
involved).  Also unexport vio_bus_type as it is never used in modules.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:31 -07:00
Stephen Rothwell
145d01e428 [PATCH] ppc64 iSeries: allow build with no PCI
This patch allows iSeries to build with CONFIG_PCI=n.  This is useful for
partitions that have only virtual I/O.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:31 -07:00
Stephen Rothwell
7f74e79fe7 [PATCH] ppc64 iSeries: tidy up irq code after merge
This patch just removes some dead code, fixes messages that referred to the
file this code used to be in and inserts XmPciLpEvent_init into its caller.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:30 -07:00
Stephen Rothwell
89ef68f0be [PATCH] ppc64 iSeries: remove XmPciLpEvent.c
This patch just merges XmPciLpEvent.c into iSeries_irq.c (the only caller of
its only external function).  XmPciLpEvent.c just contained the lowlevel
iSeries irq code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:30 -07:00
Stephen Rothwell
0c3b4f1a8e [PATCH] ppc64 iSeries: irq simple cleanups
This patch is just simple cleanups to the iSeries irq code.
	- whitespace and comments
	- rearrange some functions to avoid forward declarations
	- remove XmPciLpEvent.h as its functions were declared elsewhere
	- remove decaration of function that no longer exists
No semantic changes.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:30 -07:00
Stephen Rothwell
061c063efc [PATCH] ppc64 iSeries: remove some more members of iSeries_Device_Node
The AgentId, PhbId, FrameId, CardLocation and Location members of
iSeries_Device_Node are stored early in the boot process just so that a
message about the device can be printed later in the boot process.  Remove
them and construct the message by doing the VPD parsing at the time the
message is printed.

Also remove a few unused defines in iSeries_VpdInfo.c.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:30 -07:00
Stephen Rothwell
a2ebaf250f [PATCH] ppc64 iSeries: remove IoRetry from iSeries_Device_Node
The IoRetry member of iSeries_Devide_Node is really only used locally, so
remove it and replace it with a local variable.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:29 -07:00
Stephen Rothwell
aab41dea80 [PATCH] ppc64 iSeries: iSeries_pci.h cleanups
Remove no longer used things from iSeries_pci.h.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:29 -07:00
Stephen Rothwell
57ca86d4f0 [PATCH] ppc64 iSeries: iSeries_VpdInfo.c cleanups
Clean up iSeries_VpdInfo.c:
	- white space and comment fixes
	- make a function static
	- the functions here are only called from iSeries_pci.c, so
	  CONFIG_PCI will be set (so remove check)
	- only build when CONFIG_PCI is set
	- remove unneeded includes and cast

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:29 -07:00
Stephen Rothwell
ea7190d0af [PATCH] ppc64 iSeries: remove iSeries_pci_reset.c
The file arch/ppc64/kernel/iSeries_pci_reset contains only one function that
is not use anywhere (any more).  Remove it.  This function is the only user of
the ReturnCode member of iSeries_Device_Node, so remove that as well.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:29 -07:00
Stephen Rothwell
c670b1acd0 [PATCH] ppc64 iSeries: misc header cleanups
Last of this round of the iSeries header cleanups
	- don't have two defines for the same thing (HvMaxArchitectedLps
	  and HvMaxArchitectedVirtualLans)
	- HvCallSc.h only needs linux/types.h
	- remove unused struct definition
	- add "extern" to some more function declarations

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:28 -07:00
Stephen Rothwell
4a5304f5ba [PATCH] ppc64 iSeries: tidy up some includes and HvCall.h
This patch removes some unused bits from HvCall.h and some unneeded #includes
from other files.  Also includes ItLpQueue.h in paca.h in preference to a stub
declaration of struct ItLpQueue.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:28 -07:00
Stephen Rothwell
c92877e0a0 [PATCH] ppc64 iSeries: cleanup ItLpQueue.h
Just white space cleaups and move process_iSeries_events into its only caller.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:28 -07:00
Stephen Rothwell
dd61ce9227 [PATCH] ppc64 iSeries: eliminate some unused inline functions
This patch removes from the iSeries header files a large number of inline
functions that are not used.  It also changes the only caller of a HvCallCfg
function that is outside HvLpConfig.h to its equivalent HvLpConfig function
and no longer includes HvCallCfg.h where it is not needed.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:28 -07:00
Stephen Rothwell
0bc0ffd5f0 [PATCH] ppc64 iSeries: remove LparData.h
include/asm-ppc64/iSeries/LparData.h just included a whole lot of other files
to declare variables that would be better declared in those other files.  So,
remove it.  This will reduce that number of things needed to be included in
most cases to access the relevant variables.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:27 -07:00
Stephen Rothwell
0e3e4a1c4d [PATCH] ppc64 iSeries: remove iSeries_proc.h
include/asm-ppc64/iSeries/iSeries_proc.h just contains a declaration of a
function that no longer exists.  Remove it.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:26 -07:00
Sven Luther
723e2b35e4 [PATCH] ppc64: override command line AS/LD/CC variables when adding -m64 and co for biarch compilers
The following kind of calls currently fails :

  make ARCH=ppc64 CC="gcc-3.4"

Since the code for detecting a biarch compiler and adding the needed 64bit
magic argument fails if the AS/LD/CC commands are overriden in the command
line.

The attached patch fixes this by using the make override and += directive,
but i am not 100% sure this will work without gmake, as i am no Makefile
expert.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:26 -07:00
David Gibson
20cee16ced [PATCH] ppc64: Abolish ioremap_mm
Currently ppc64 has two mm_structs for the kernel, init_mm and also
ioremap_mm.  The latter really isn't necessary: this patch abolishes it,
instead restricting vmallocs to the lower 1TB of the init_mm's range and
placing io mappings in the upper 1TB.  This simplifies the code in a number
of places and eliminates an unecessary set of pagetables.  It also tweaks
the unmap/free path a little, allowing us to remove the unmap_im_area() set
of page table walkers, replacing them with unmap_vm_area().

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:26 -07:00
Benjamin Herrenschmidt
6879dc137e [PATCH] ppc32: Kill embedded system.map, use kallsyms
This patch kills the whole embedded System.map mecanism and the
bootloader-passed System.map that was used to provide symbol resolution in
xmon.  Instead, xmon now uses kallsyms like ppc64 does.

No hurry getting that in Linus tree, let it be tested in -mm for a while
first and make sure it doesn't break various embedded configs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:26 -07:00
Jakub Bogusz
a70d439345 [PATCH] ppc32: don't recursively crash in die() on CHRP/PReP machines
This patch avoids recursive crash (leading to kernel stack overflow) in
die() on CHRP/PReP machines when CONFIG_PMAC_BACKLIGHT=y.  set_backlight_*
functions are placed in pmac section, which is discarded when _machine !=
_MACH_Pmac.

Signed-off-by: Jakub Bogusz <qboosh@pld-linux.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:25 -07:00
Kumar Gala
ba8c6d534a [PATCH] ppc32: remove some unnecessary includes of prom.h
Fight the Good Fight: Limit prom.h header creep.

Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:25 -07:00
Kumar Gala
1492ec8069 [PATCH] ppc32: Factor out common exception code into macro's for 4xx/Book-E
4xx and Book-E PPC's have several exception levels.  The code to handle
each level is fairly regular.  Turning the code into macro's will ease the
handling of future exception levels (debug) in forth coming chips.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:25 -07:00
Kumar Gala
5be061eee9 [PATCH] ppc32: Clean up NUM_TLBCAMS usage for Freescale Book-E PPC's
Made the number of TLB CAM entries private and converted the board
consumers to use num_tlbcam_entries which is setup at boot time from
configuration registers.  This way the only consumers of the #define
NUM_TLBCAMS are the arrays used to manage the TLB.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:24 -07:00
Kumar Gala
65145e060b [PATCH] ppc32: Added support for all MPC8548 internal interrupts
The MPC8548 has 48 internal interrupts and 12 external interrupts.  The
previous generation PowerQUICC III devices only had 32 internal and 12
external interrupts on the primary interrupt controller.

Expanded the number of internal interrupts to 48 for all PowerQUICC III
processors and moved the interrupt numbers for the external after the 48
internal interrupt lines, rather than putting the 12 new internal
interrupts at the end and ifdef'ng the whole mess.  As parted of this
created a macro which represents the internal interrupt senses since they
are the same on all PQ3 processors.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:24 -07:00
Matt Porter
eee4146ab9 [PATCH] ppc32: remove orphaned ppc4xx_kgdb.c
Removes ppc4xx_kgdb.c which is no longer being used.  Pointed out by Andrei
Konovalov.

Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:24 -07:00
Kumar Gala
682afbbd14 [PATCH] ppc32: Add support for MPC8245 8250 serial ports on Sandpoint
Added platform device initialization for the two 8250 style UARTs that
exist on the MPC8245.  Additionally, updated the Sandpoint code to enable
one of these UARTs if an MPC8245 is connected to it.

Signed-off-by: Matt McClintock <msm@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:24 -07:00
Matt Porter
1e5aa8c865 [PATCH] ppc32: fix CONFIG_TASK_SIZE handling on 40x
This patch is virtually identical to my previous 44x one.  It removes
0x8000'0000 TASK_SIZE hardcoded assumption from head_4xx.S.

Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:24 -07:00
Kumar Gala
b264c35279 [PATCH] ppc32: Converted MPC10X bridge to use platform devices instead of OCP
Converted the MPC10x bridge support (used by MPC10x and 8240/1/5) to used
the standard platform device model.

Signed-off-by: Matt McClintock <msm@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:23 -07:00
Kumar Gala
c93fcff695 [PATCH] ppc32: Removed dependency on CONFIG_CPM2 for building mpc85xx_device.c
Previously we needed CONFIG_CPM2 enabled to get the proper IRQ ifdef's for
CPM interrupts.  Recent changes have caused that to be no longer necessary.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:23 -07:00
Kumar Gala
c91999bba3 [PATCH] ppc32: Added preliminary support for the MPC8548 CDS board
Adds support for using the MPC8548 processor on the CDS reference board.
Currently all the major busses (PCI, PCI-X, PCI-Express, sRIO) and eTSEC3
and eTSEC4 are not supported.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:23 -07:00
Kumar Gala
5b37b700f7 [PATCH] ppc32: Added support for new MPC8548 family of PowerQUICC III processors
Added descriptions of the new MPC8548 family processors, e500 core and
peripherals.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:23 -07:00
Coywolf Qi Hunt
2894801db1 [PATCH] kbuild: display compile version
I am always trying to make sure I've booted the right kernel after a new
install.  Too paranoid maybe.  But I guess there're other people like me.
So let's make kbuild display the compile version number at the end to give
us a hint.  I know we may be booting vmlinux someday, but don't care about
it for now.

Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:22 -07:00
Jes Sorensen
65ed0b337b [PATCH] SN2 XPC build patches
This patch contains the bits to make the XPC code use the uncached
allocator rather than calling into the mspec driver.  It also includes the
mspec.h header which is required to build the XPC modules.

Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:18 -07:00
Jes Sorensen
f14f75b811 [PATCH] ia64 uncached alloc
This patch contains the ia64 uncached page allocator and the generic
allocator (genalloc).  The uncached allocator was formerly part of the SN2
mspec driver but there are several other users of it so it has been split
off from the driver.

The generic allocator can be used by device driver to manage special memory
etc.  The generic allocator is based on the allocator from the sym53c8xx_2
driver.

Various users on ia64 needs uncached memory.  The SGI SN architecture requires
it for inter-partition communication between partitions within a large NUMA
cluster.  The specific user for this is the XPC code.  Another application is
large MPI style applications which use it for synchronization, on SN this can
be done using special 'fetchop' operations but it also benefits non SN
hardware which may use regular uncached memory for this purpose.  Performance
of doing this through uncached vs cached memory is pretty substantial.  This
is handled by the mspec driver which I will push out in a seperate patch.

Rather than creating a specific allocator for just uncached memory I came up
with genalloc which is a generic purpose allocator that can be used by device
drivers and other subsystems as they please.  For instance to handle onboard
device memory.  It was derived from the sym53c7xx_2 driver's allocator which
is also an example of a potential user (I am refraining from modifying sym2
right now as it seems to have been under fairly heavy development recently).

On ia64 memory has various properties within a granule, ie.  it isn't safe to
access memory as uncached within the same granule as currently has memory
accessed in cached mode.  The regular system therefore doesn't utilize memory
in the lower granules which is mixed in with device PAL code etc.  The
uncached driver walks the EFI memmap and pulls out the spill uncached pages
and sticks them into the uncached pool.  Only after these chunks have been
utilized, will it start converting regular cached memory into uncached memory.
Hence the reason for the EFI related code additions.

Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:18 -07:00
Badari Pulavarty
cbe37d0937 [PATCH] mm: remove PG_highmem
Remove PG_highmem, to save a page flag.  Use is_highmem() instead.  It'll
generate a little more code, but we don't use PageHigheMem() in many places.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:17 -07:00
Wolfgang Wander
1363c3cd86 [PATCH] Avoiding mmap fragmentation
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.

The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.

The problem is twofold:

  1) the free_area_cache is used to continue a search for memory where
     the last search ended.  Before the change new areas were always
     searched from the base address on.

     So now new small areas are cluttering holes of all sizes
     throughout the whole mmap-able region whereas before small holes
     tended to close holes near the base leaving holes far from the base
     large and available for larger requests.

  2) the free_area_cache also is set to the location of the last
     munmap-ed area so in scenarios where we allocate e.g.  five regions of
     1K each, then free regions 4 2 3 in this order the next request for 1K
     will be placed in the position of the old region 3, whereas before we
     appended it to the still active region 1, placing it at the location
     of the old region 2.  Before we had 1 free region of 2K, now we only
     get two free regions of 1K -> fragmentation.

The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache.  If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.

The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.

Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.

Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.

Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:16 -07:00
David Gibson
63551ae0fe [PATCH] Hugepage consolidation
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code.  It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.

Tested, at least a little, on ppc64, i386 and x86_64.

Notes:
	- this patch changes the meaning of set_huge_pte() to be more
	  analagous to set_pte()
	- does SH4 need s special huge_ptep_get_and_clear()??

Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:15 -07:00
Martin Hicks
753ee72896 [PATCH] VM: early zone reclaim
This is the core of the (much simplified) early reclaim.  The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.

One of the major uses of this is NUMA machines.  With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected on a
per-zone basis and is turned on/off via syscall.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

			wall  user   sys   %cpu  ctx sw.  sleeps
			----  ----   ---   ----   ------  ------
No patch		1009  1384   847   258   298170   504402
w/patch, no reclaim     880   1376   667   288   254064   396745
w/patch & reclaim       1079  1385   926   252   291625   548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Ingo Molnar
39c715b717 [PATCH] smp_processor_id() cleanup
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.

The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.

Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.

In the new code, there are two externally visible symbols:

 - smp_processor_id(): debug variant.

 - raw_smp_processor_id(): nondebug variant. Replaces all existing
   uses of _smp_processor_id() and __smp_processor_id(). Defined
   by every SMP architecture in include/asm-*/smp.h.

There is one new internal symbol, dependent on DEBUG_PREEMPT:

 - debug_smp_processor_id(): internal debug variant, mapped to
                             smp_processor_id().

Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file.  All related comments got updated and/or
clarified.

I have build/boot tested the following 8 .config combinations on x86:

 {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT.  (Other
architectures are untested, but should work just fine.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:13 -07:00
Suresh Siddha
84929801e1 [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes
Appended patch will setup compatibility mode TASK_SIZE properly.  This will
fix atleast three known bugs that can be encountered while running
compatibility mode apps.

a) A malicious 32bit app can have an elf section at 0xffffe000.  During
   exec of this app, we will have a memory leak as insert_vm_struct() is
   not checking for return value in syscall32_setup_pages() and thus not
   freeing the vma allocated for the vsyscall page.  And instead of exec
   failing (as it has addresses > TASK_SIZE), we were allowing it to
   succeed previously.

b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area
   may return addresses beyond 32bits, ultimately causing corruption
   because of wrap-around and resulting in SEGFAULT, instead of returning
   ENOMEM.

c) 32bit app doing this below mmap will now fail.

  mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE,
	MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0);

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:12 -07:00
Tony Luck
29516d75a0 Auto merge with /home/aegl/GIT/linus 2005-06-21 16:21:20 -07:00
Matthew Chapman
4ea78729b8 [IA64] ptrace and restore_sigcontext() allow ar.rsc.pl==0
This patch fixes handling of accesses to ar.rsc via ptrace & restore_sigcontext
[With Thanks to Chris Wright for noticing the restore_sigcontext path]

Signed-off-by: Matthew Chapman <matthewc@hp.com>
Acked-by: David Mosberger <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-21 16:19:20 -07:00
David S. Miller
8005aba69a [SPARC64]: Fix cmsg length checks in Solaris emulation layer.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 15:39:22 -07:00
Bjorn Helgaas
7b404b3459 [IA64] remove "pci=routeirq" option
Remove "pci=routeirq" option for ia64.  This was a workaround
after ACPI IRQ routing was changed from "all at boot for everything
in _PRT" to "do it when the device is enabled" in case there were
drivers that didn't use pci_enable_device().

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-21 15:04:39 -07:00
Ken Chen
0393eed5c3 [IA64] fix nested_dtlb_miss handler for hugetlb address
The nested_dtlb_miss handler currently does not handle fault from
hugetlb address correctly.  It walks the page table assuming PAGE_SIZE.
Thus when taking a fault triggered from hugetlb address, it would not
calculate the pgd/pmd/pte address correctly and thus result an incorrect
invocation of ia64_do_page_fault().  In there, kernel will signal SIGBUS
and application dies (The faulting address is perfectly legal and we
have a valid pte for the corresponding user hugetlb address as well).
This patch fix the described kernel bug.  Since nested_dtlb_miss is a
rare event and a slow path anyway, I'm making the change without #ifdef
CONFIG_HUGETLB_PAGE for code readability.  Tony, please apply.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-21 14:40:31 -07:00
Christophe Lucas
52a0de2cd2 [IA64] printk needs KERN_INFO arch/ia64/kernel/smp.c
printk() calls should include appropriate KERN_* constant.

Signed-off-by: Christophe Lucas <clucas@rotomalug.org>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-21 14:21:17 -07:00
Greg Edwards
a35f1e03b8 [IA64] enable SGI simulator for generic kernels
Allow the SGI simulator (medusa) to work on generic kernels.  There is
no inherent dependency on an sn2-specific kernel.

Boot tested on Altix, medusa and HP rx2600.

Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-21 14:17:43 -07:00
Tony Luck
bd91c4bb13 [IA64] Refresh tiger_defconfig
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-21 14:16:00 -07:00
Greg Edwards
95dccdfe29 [IA64] refresh arch/ia64/defconfig
Refresh arch/ia64/defconfig, as it was getting a bit stale.  The only
manual changes I made were:

CONFIG_SCSI_SATA=y		needed for some Altix base I/O cards
CONFIG_SCSI_SATA_VITESSE=y

CONFIG_DM_MULTIPATH=m		the rest are already modules

CONFIG_FUSION_SPI=y		new driver breakout
CONFIG_FUSION_FC=m

CONFIG_SGI_TIOCX=y		enable some other SGI drivers
CONFIG_SGI_MBCS=m
CONFIG_AGP_SGI_TIOCA=m

Boot tested on Altix, HP rx2600 and Intel Tiger

Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-21 14:01:05 -07:00
Linus Torvalds
1d345dac1f Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2005-06-20 16:00:33 -07:00
Yani Ioannou
ff381d2223 [PATCH] Driver Core: arch: update device attribute callbacks
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:32 -07:00