o kdump functionality reserves a per cpu area at boot time and exports the
physical address of that area to user space through sys interface. This
area stores some dump related information like cpu register states etc
at the time of crash.
o We were assuming that per cpu area always come from linearly mapped meory
region and using __pa() to determine physical address.
With percpu_alloc=page, per cpu area can come from vmalloc region also and
__pa() breaks.
o This patch implments a new function to convert per cpu address to
physical address.
Before the patch, crash_notes addresses looked as follows.
cpu0 60fffff49800
cpu1 60fffff60800
cpu2 60fffff77800
These are bogus phsyical addresses.
After the patch, address are following.
cpu0 13eb44000
cpu1 13eb43000
cpu2 13eb42000
cpu3 13eb41000
These look fine. I got 4G of memory and /proc/iomem tell me following.
100000000-13fffffff : System RAM
tj: * added missing asm/io.h include reported by Stephen Rothwell
* repositioned per_cpu_ptr_phys() in percpu.c and added comment.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
pcpu_extend_area_map() had the following two bugs.
* It should return 1 if pcpu_lock was dropped and reacquired but it
returned 0. This could lead to oops if free_percpu() races with
area map extension.
* pcpu_mem_free() was called under pcpu_lock. pcpu_mem_free() might
end up calling vfree() which isn't IRQ safe. This could lead to
deadlock through lock order inversion via IRQ.
In addition, Linus pointed out that the temporary lock dropping and
subtle three-way return value of pcpu_extend_area_map() was very ugly
and suggested to split the function into two - pcpu_need_to_extend()
and pcpu_extend_area_map().
This patch restructures pcpu_extend_area_map() as suggested and fixes
the two bugs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
pcpu_alloc() and pcpu_extend_area_map() perform a series of
spin_lock_irq()/spin_unlock_irq() calls, which make them unsafe
with respect to being called from contexts which have IRQs off.
This patch converts the code to perform save/restore of flags instead,
making pcpu_alloc() (or __alloc_percpu() respectively) to be called
from early kernel startup stage, where IRQs are off.
This is needed for proper initialization of per-cpu rq_weight data from
sched_init().
tj: added comment explaining why irqsave/restore is used in alloc path.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix the following two compile warnings which show up on i386.
mm/percpu.c:1873: warning: comparison of distinct pointer types lacks a cast
mm/percpu.c:1879: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'size_t'
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Warn and dump stack when percpu allocation fails. percpu allocator is
still young and unchecked NULL percpu pointer usage can result in
random memory corruption when combined with the pointer shifting in
access macros. Allocation failures should be rare and the warning
message will be disabled after certain times.
Signed-off-by: Tejun Heo <tj@kernel.org>
The parameters to pcpu_setup_first_chunk() come from different sources
depending on architecture and can be quite complex. The function runs
various sanity checks on the parameters and triggers BUG() if
something isn't right. However, this is very early during the boot
and not reporting exactly what the problem is makes debugging even
harder.
Add PCPU_SETUP_BUG() macro which prints out enough information about
the parameters. As the macro still puts separate BUG() for each
check, it won't lose any information even on the situations where only
the program counter can be retrieved.
While at it, also bump pcpu_dump_alloc_info() message to KERN_INFO so
that it's visible on the console if boot fails to complete.
Signed-off-by: Tejun Heo <tj@kernel.org>
Embedding first chunk allocator maintains the distances between units
in the vmalloc area and thus needs vmalloc space to be larger than the
maximum distances between units; otherwise, it wouldn't be able to
create any dynamic chunks. This patch makes the embedding first chunk
allocator check vmalloc space size and if the maximum distance between
units is larger than 75% of it, print warning and, if page mapping
allocator is available, fail initialization so that the system falls
back onto it.
This should work around percpu allocation failure problems on certain
sparc64 configurations where distances between NUMA nodes are larger
than the vmalloc area and makes percpu allocator more robust for
future configurations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Implement page mapping percpu first chunk allocator as a fallback to
the embedding allocator. The next patch will make the embedding
allocator check distances between units to determine whether it fits
within the vmalloc area so that this fallback can be used on such
cases.
sparc64 currently has relatively small vmalloc area which makes it
impossible to create any dynamic chunks on certain configurations
leading to percpu allocation failures. This and the next patch should
allow those configurations to keep working until proper solution is
found.
While at it, mark pcpu_cpu_distance() with __init.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
pcpu_build_alloc_info() may be called multiple times when percpu is
falling back to different first chunk allocator. Make it clear static
buffers so that they don't contain values from previous runs.
Signed-off-by: Tejun Heo <tj@kernel.org>
pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
unit number while unit number can equal and go over NR_CPUS with
sparse unit map. This triggers BUG_ON() spuriously on machines which
have non-power-of-two number of cpus. Use UINT_MAX instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Tony Vroon <tony@linx.net>
.. duplicated by merging the same fix twice, for details see commit
0d9df2515d ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following commit made console open fails while booting:
commit b50989dc44
Author: Alan Cox <alan@linux.intel.com>
Date: Sat Sep 19 13:13:22 2009 -0700
tty: make the kref destructor occur asynchronously
Due to tty release routines run in a workqueue now, error like the
following will be reported while booting:
INIT open /dev/console Input/output error
It also causes hibernation regression to appear as reported at
http://bugzilla.kernel.org/show_bug.cgi?id=14229
The reason is that now there's latency issue with closing, but when
we open a "closing not finished" tty, -EIO will be returned.
Fix it as per the following Alan's suggestion:
Fun but it's actually not a bug and the fix is wrong in itself as
the port may be closing but not yet being destructed, in which case
it seems to do the wrong thing. Opening a tty that is closing (and
could be closing for long periods) is supposed to return -EIO.
I suspect a better way to deal with this and keep the old console
timing is to split tty->shutdown into two functions.
tty->shutdown() - called synchronously just before we dump the tty
onto the waitqueue for destruction
tty->cleanup() - called when the destructor runs.
We would then do the shutdown part which can occur in IRQ context
fine, before queueing the rest of the release (from tty->magic = 0
... the end) to occur asynchronously
The USB update in -next would then need a call like
if (tty->cleanup)
tty->cleanup(tty);
at the top of the async function and the USB shutdown to be split
between shutdown and cleanup as the USB resource cleanup and final
tidy cannot occur synchronously as it needs to sleep.
In other words the logic becomes
final kref put
make object unfindable
async
clean it up
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
[ rjw: Rebased on top of 2.6.31-git, reworked the changelog. ]
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
[ Changed serial naming to match new rules, dropped tty_shutdown as per
comments from Alan Stern - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 3d5b6fb47a ("ACPI: Kill overly
verbose "power state" log messages") removed the actual use of this
variable, but didn't remove the variable itself, resulting in build
warnings like
drivers/acpi/processor_idle.c: In function ‘acpi_processor_power_init’:
drivers/acpi/processor_idle.c:1169: warning: unused variable ‘i’
Just get rid of the now unused variable.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix hwpoison code related build failure on 32-bit NUMAQ
ia64's sim_defconfig uses CONFIG_ACPI=n
which now #define's acpi_disabled in <linux/acpi.h>
So we shouldn't re-define it here in <asm/acpi.h>
Signed-off-by: Len Brown <len.brown@intel.com>
I was recently lucky enough to get a 64-CPU system, so my kernel log
ends up with 64 lines like:
ACPI: CPU0 (power states: C1[C1] C2[C3])
This is pretty useless clutter because this info is already available
after boot from both /sys/devices/system/cpu/cpu*/cpuidle/state?/ as
well as /proc/acpi/processor/CPU*/power.
So just delete the code that prints the C-states in processor_idle.c.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This build failure triggers:
In file included from include/linux/suspend.h:8,
from arch/x86/kernel/asm-offsets_32.c:11,
from arch/x86/kernel/asm-offsets.c:2:
include/linux/mm.h:503:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
Because due to the hwpoison page flag we ran out of page
flags on 32-bit.
Dont turn on hwpoison on 32-bit NUMA (it's rare in any
case).
Also clean up the Kconfig dependencies in the generic MM
code by introducing ARCH_SUPPORTS_MEMORY_FAILURE.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't disable ARB_DISABLE when the familary ID is 0x0F.
http://bugzilla.kernel.org/show_bug.cgi?id=14211
This was a 2.6.31 regression, and so this patch
needs to be applied to 2.6.31.stable
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The message "ACPI: Device needs an ACPI driver" is misleading. The
device _may_ need an ACPI driver, if the BIOS implemented a custom
API for the device in question (which, AFAIK, can't be checked.) If
not, then either a generic ACPI driver may be used (for example
"thermal"), or nothing can be done (other than a white list).
I propose to reword the message to:
ACPI: If an ACPI driver is available for this device, you should use
it instead of the native driver
which I think is more correct. Comments and suggestions welcome.
I also added a message warning about possible problems and system
instability when users pass acpi_enforce_resources=lax, as suggested
by Len.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Alan Jenkins <sourcejedi.lkml@googlemail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Fix this problem when CONFIG_THINKPAD_ACPI_HOTKEY_POLL is undefined:
CHECK drivers/platform/x86/thinkpad_acpi.c
drivers/platform/x86/thinkpad_acpi.c:1968:21: error: not an lvalue
CC [M] drivers/platform/x86/thinkpad_acpi.o
drivers/platform/x86/thinkpad_acpi.c: In function 'tpacpi_hotkey_driver_mask_set':
drivers/platform/x86/thinkpad_acpi.c:1968: error: lvalue required as left operand of assignment
Reported-by: Noah Dain <noahdain@gmail.com>
Reported-by: Audrius Kazukauskas <audrius@neutrino.lt>
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight:
backlight: new driver for ADP5520/ADP5501 MFD PMICs
backlight: extend event support to also support poll()
backlight/eeepc-laptop: Update the backlight state when we change brightness
backlight/acpi: Update the backlight state when we change brightness
backlight: Allow drivers to update the core, and generate events on changes
backlight: switch to da903x driver to dev_pm_ops
backlight: Add support for the Avionic Design Xanthos backlight device.
backlight: spi driver for LMS283GF05 LCD
backlight: move hp680-bl's probe function to .devinit.text
backlight: Add support for new Apple machines.
backlight: mbp_nvidia_bl: add support for MacBookAir 1,1
backlight: Add WM831x backlight driver
Trivial conflicts due to '#ifdef CONFIG_PM' differences in
drivers/video/backlight/da903x_bl.c
* remove asm/atomic.h inclusion from kref.h -- not needed, linux/types.h
is enough for atomic_t
* remove linux/kref.h inclusion from files which do not need it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: Resume clocksource without taking the clocksource mutex
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
modules, tracing: Remove stale struct marker signature from module_layout()
tracing/workqueue: Use %pf in workqueue trace events
tracing: Fix a comment and a trivial format issue in tracepoint.h
tracing: Fix failure path in ftrace_regex_open()
tracing: Fix failure path in ftrace_graph_write()
tracing: Check the return value of trace_get_user()
tracing: Fix off-by-one in trace_get_user()
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Remove redundant non-NUMA topology functions
x86: early_printk: Protect against using the same device twice
x86: Reduce verbosity of "PAT enabled" kernel message
x86: Reduce verbosity of "TSC is reliable" message
x86: mce: Use safer ways to access MCE registers
x86: mce, inject: Use real inject-msg in raise_local
x86: mce: Fix thermal throttling message storm
x86: mce: Clean up thermal throttling state tracking code
x86: split NX setup into separate file to limit unstack-protected code
xen: check EFER for NX before setting up GDT mapping
x86: Cleanup linker script using new linker script macros.
x86: Use section .data.page_aligned for the idt_table.
x86: convert to use __HEAD and HEAD_TEXT macros.
x86: convert compressed loader to use __HEAD and HEAD_TEXT macros.
x86: fix fragile computation of vsyscall address
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (32 commits)
ACPI: i2c-scmi: don't use acpi_device_uid()
ACPI: simplify building device HID/CID list
ACPI: remove acpi_device_uid() and related stuff
ACPI: remove acpi_device.flags.hardware_id
ACPI: remove acpi_device.flags.compatible_ids
ACPI: maintain a single list of _HID and _CID IDs
ACPI: make sure every acpi_device has an ID
ACPI: use acpi_device_hid() when possible
ACPI: fix synthetic HID for \_SB_
ACPI: handle re-enumeration, when acpi_devices might already exist
ACPI: factor out device type and status checking
ACPI: add acpi_bus_get_status_handle()
ACPI: use acpi_walk_namespace() to enumerate devices
ACPI: identify device tree root by null parent pointer, not ACPI_BUS_TYPE
ACPI: enumerate namespace before adding functional fixed hardware devices
ACPI: convert acpi_bus_scan() to operate on an acpi_handle
ACPI: add acpi_bus_get_parent() and remove "parent" arguments
ACPI: remove unnecessary argument checking
ACPI: remove redundant "type" arguments
ACPI: remove acpi_device_set_context() "type" argument
...
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: fix locking and list handling code in cifs_open and its helper
[CIFS] Remove build warning
cifs: fix problems with last two commits
[CIFS] Fix build break when keys support turned off
cifs: eliminate cifs_init_private
cifs: convert oplock breaks to use slow_work facility (try #4)
cifs: have cifsFileInfo hold an extra inode reference
cifs: take read lock on GlobalSMBSes_lock in is_valid_oplock_break
cifs: remove cifsInodeInfo.oplockPending flag
cifs: fix oplock request handling in posix codepath
[CIFS] Re-enable Lanman security
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
at91_can: Forgotten git 'add' of at91_can.c
TI Davinci EMAC: Fix in vector definition for EMAC_VERSION_2
ax25: Fix ax25_cb refcounting in ax25_ctl_ioctl
virtio_net: Check for room in the vq before adding buffer
virtio_net: avoid (most) NETDEV_TX_BUSY by stopping queue early.
virtio_net: formalize skb_vnet_hdr
virtio_net: don't free buffers in xmit ring
virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb.
virtio_net: skb_orphan() and nf_reset() in xmit path.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: vio: Kill BUILD_BUG_ON() in vio_dring_avail().
Trivial conflict in arch/sparc/include/asm/vio.h due to David removing
the whole messy BUG_ON that was confused.
Commit 200b812d00 "Clear the exclusive monitor when returning from an
exception" broke the vast majority of ARM systems in the wild which are
still pre ARMv6. The kernel is crashing on the first occurrence of an
exception due to the removal of the actual return instruction for them.
Let's add it back.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend the backlight event support to also allow the use of
poll()/select() on actual_brightness.
We already have the entire event hookup anyway, adding a single
function call in one line to get functionality like that is a really
good deal.
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
We recently removed the acpi_device_uid() interface because nobody
used it. I don't think it's essential here either.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.
This fixes a problem with commit 56a131dcf7
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
In the emac_poll function when looking for interrupt status masks
correct definition must be chosen based on EMAC_VERSION(the bit
mask has changed from version 1 to version 2).
Signed-off-by: Sriram <srk@ti.com>
Acked-by: Chaithrika U S <chaithrika@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ax25_cb_put after ax25_find_cb in ax25_ctl_ioctl.
Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Reviewed-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>