Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes
if CONFIG_NUMA_MIGRATE_IRQ_DESC is set:
- make irq_desc to go with affinity aka irq_desc moving etc
- call move_irq_desc in irq_complete_move()
- legacy irq_desc is not moved, because they are allocated via static array
for logical apic mode, need to add move_desc_in_progress_in_same_domain,
otherwise it will not be moved ==> also could need two phases to get
irq_desc moved.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new feature
Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.
To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.
When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).
This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 0c5d1eb77a (genirq: record trigger
type) caused powerpc platforms that had no set_type() function in their
struct irq_chip to spew out warnings about "No set_type function for
IRQ...". This warning isn't necessarily justified though because the
generic powerpc platform code calls set_irq_type() (which in turn calls
__irq_set_trigger) with information from the device tree to establish
the interrupt mappings, regardless of whether the PIC can actually set
a type.
A platform's irq_chip might not have a set_type function for a variety
of reasons, for example: the platform may have the type essentially
hard-coded, or as in the case for Cell interrupts are just messages
past around that have no real concept of type, or the platform
could even have a virtual PIC as on the PS3.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The affinity setting in setup irq is called before the NO_BALANCING
flag is checked and might therefore override affinity settings from the
calling code with the default setting.
Move the NO_BALANCING flag check before the call to the affinity
setting.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preserve user-modified affinities on interrupts
Kumar Galak noticed that commit
1840475676 (genirq: Expose default irq
affinity mask (take 3))
overrides an already set affinity setting across a free /
request_irq(). Happens e.g. with ifdown/ifup of a network device.
Change the logic to mark the affinities as set and keep them
intact. This also fixes the unlocked access to irq_desc in
irq_select_affinity() when called from irq_affinity_proc_write()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This variable is only used in the source file, so make it static.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the member 'name' of the irq_desc structure happens to point to a
character string that is resident within a kernel module, problems ensue
if that module is rmmod'd (at which time dynamic_irq_cleanup() is called)
and then later show_interrupts() is called by someone.
It is also not a good thing if the character string resided in kmalloc'd
space that has been kfree'd (after having called dynamic_irq_cleanup()).
dynamic_irq_cleanup() fails to NULL the 'name' member and
show_interrupts() references it on a few architectures (like h8300, sh and
x86).
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot hang on a G5
In set_irq_type() we want to pass the type rather than the current
interrupt state.
Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
probe_irq_off() is disfunctional as the local nr_irqs is referenced
instead of the global one for the for_each_irq_desc() iterator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This code is not ready, but we need to rip it out instead of rebasing
as we would lose the APIC/IO_APIC unification otherwise.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
when SPARSE_IRQ is not used, should still use irq_desc->lock
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Noonan reported a boot hang when using irqpoll and
CONFIG_HAVE_SPARSE_IRQ=y.
The irqpoll loop needs to be updated to not iterate from 1 to nr_irqs
but to iterate via for_each_irq_desc(). (in the former case desc can
be NULL which crashes the box)
Reported-by: Steven Noonan <steven@uplinklabs.net>
Tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the IRQ affinity in the process context when the IRQ is disabled.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extraneous call to irq_to_desc().
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha noticed that we should have a spinlock around it.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This has been deprecated for years, the user space irqbalanced utility
works better with numa, has configurable policies, etc...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmai.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
change names:
irq_desc() ==> irq_desc_alloc
__irq_desc() ==> irq_desc
Also split a few of the uses in lowlevel x86 code.
v2: need to check if desc is null in smp_irq_move_cleanup
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So we could remove some duplicated calling to irq_desc
v2: make sure irq_desc in init/main.c is not used without generic_hardirqs
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove irq limit checks - nr_irqs is dynamic and we expand anytime.
v2: fix checking about result irq_cfg_without_new, so could use msi again
v3: use irq_desc_without_new to check irq is valid
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are a handful of loops that go from 0 to nr_irqs and use
get_irq_desc() on them. These would allocate all the irq_desc
entries, regardless of the need for them.
Use the smarter for_each_irq_desc() iterator that will only iterate
over the present ones.
v2: make sure arch without GENERIC_HARDIRQS work too
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add an irq_desc accessor that will not allocate any sparse entry
but returns failure if there's no entry present.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
based on Eric's patch ...
together mold it with dyn_array for irq_desc, will allcate kstat_irqs for
nr_irq_desc alltogether if needed. -- at that point nr_cpus is known already.
v2: make sure system without generic_hardirqs works they don't have irq_desc
v3: fix merging
v4: [mingo@elte.hu] fix typo
[ mingo@elte.hu ] irq: build fix
fix:
arch/x86/xen/spinlock.c: In function 'xen_spin_lock_slow':
arch/x86/xen/spinlock.c:90: error: 'struct kernel_stat' has no member named 'irqs'
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add CONFIG_HAVE_SPARSE_IRQ to for use condensed array.
Get rid of irq_desc[] array assumptions.
Preallocate 32 irq_desc, and irq_desc() will try to get more.
( No change in functionality is expected anywhere, except the odd build
failure where we missed a code site or where a crossing commit itroduces
new irq_desc[] usage. )
v2: according to Eric, change get_irq_desc() to irq_desc()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
at this point nr_irqs is equal NR_IRQS
convert a few easy users from NR_IRQS to dynamic nr_irqs.
v2: according to Eric, we need to take care of arch without generic_hardirqs
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Genirq hasn't previously recorded the trigger type used by any given IRQ,
although some irq_chip support has done so. That data can be useful when
troubleshooting. This patch records it in the relevant irq_desc.status
bits, and improves consistency between the two driver-visible calls
affected:
- Make set_irq_type() usage match request_irq() usage:
* IRQ_TYPE_NONE should be a NOP; succeed, so irq_chip methods
won't have to handle that case any more (many do it wrong).
* IRQ_TYPE_PROBE is ignored; any buggy out-of-tree callers
might need to switch over to the real IRQ probing code.
* emit the same diagnostics (from shared utility code)
- Their kerneldoc now reflects usage:
* request_irq() flags include IRQF_TRIGGER_* to specify
active edge(s)/level ... docs previously omitted that
* set_irq_type() is declared in <linux/irq.h> so callers
should use the (bit-equivalent) IRQ_TYPE_* symbols there
Also: adds a warning about shared IRQs that don't end up using the
requested trigger mode; and fix an unrelated "sparse" warning.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch clarifies usage of irq_chip->startup() callback:
1. The "if (startup) startup(); else enabled();" code in setup_irq()
is unnecessary, as startup() falls back to enabled() via
default callbacks, set by irq_chip_set_defaults().
2. When using set_irq_chained_handler() the startup() was never called,
which is not good at all... Fixed. And again - when startup() is not
defined the call will fall back to enable() than to unmask() via
default callbacks.
Signed-off-by: Pawel Moll <pawel.moll@st.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When DEBUG_SHIRQ is selected, a spurious IRQ is issued before
the setup_irq() initializes the desc->depth. An IRQ handler may
call disable_irq_nosync(), but then setup_irq() will overwrite
desc->depth, and upon enable_irq() we'll catch this WARN:
------------[ cut here ]------------
Badness at kernel/irq/manage.c:180
NIP: c0061ab8 LR: c0061f10 CTR: 00000000
REGS: cf83be50 TRAP: 0700 Not tainted (2.6.27-rc3-23450-g74919b0)
MSR: 00021032 <ME,IR,DR> CR: 22042022 XER: 20000000
TASK = cf829100[5] 'events/0' THREAD: cf83a000
GPR00: c0061f10 cf83bf00 cf829100 c038e674 00000016 00000000 cf83bef8 00000038
GPR08: c0298910 00000000 c0310d28 cf83a000 00000c9c 1001a1a8 0fffe000 00800000
GPR16: ffffffff 00000000 007fff00 00000000 007ffeb0 c03320a0 c031095c c0310924
GPR24: cf8292ec cf807190 cf83a000 00009032 c038e6a4 c038e674 cf99b1cc c038e674
NIP [c0061ab8] __enable_irq+0x20/0x80
LR [c0061f10] enable_irq+0x50/0x70
Call Trace:
[cf83bf00] [c038e674] irq_desc+0x630/0x9000 (unreliable)
[cf83bf10] [c0061f10] enable_irq+0x50/0x70
[cf83bf30] [c01abe94] phy_change+0x68/0x108
[cf83bf50] [c0046394] run_workqueue+0xc4/0x16c
[cf83bf90] [c0046834] worker_thread+0x74/0xd4
[cf83bfd0] [c004ab7c] kthread+0x48/0x84
[cf83bff0] [c00135e0] kernel_thread+0x44/0x60
Instruction dump:
4e800020 3d20c031 38a94214 4bffffcc 9421fff0 7c0802a6 93e1000c 7c7f1b78
90010014 8123001c 2f890000 409e001c <0fe00000> 80010014 83e1000c 38210010
That trace corresponds to this line:
WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
The patch fixes the problem by moving the SHIRQ code below the
setup_irq().
Unfortunately we can't easily move the SHIRQ code inside the
setup_irq(), since it grabs a spinlock, so to prvent a 'real'
IRQ from interfere us we should disable that IRQ.
p.s. The driver in question is drivers/net/phy/phy.c.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch /proc/irq/*/smp_affinity , /proc/irq/default_smp_affinity to
seq_files.
cat(1) reads with 1024 chunks by default, with high enough NR_CPUS, there
will be -EINVAL.
As side effect, there are now two less users of the ->read_proc interface.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While I'm glad to finally see the hole fixed whereby passing an invalid
IRQ trigger type to request_irq() would be ignored, the current diagnostic
isn't quite useful. Fixed by also listing the trigger type which was
rejected.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace a printk+WARN_ON() by a WARN(); this increases the chance of the
string making it into the bugreport (ie: it goes inside the
---[ cut here ]--- section)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>