These build errors on some non-x86 platforms (PowerPC for example):
mm/page_alloc.c: In function '__alloc_memory_core_early':
mm/page_alloc.c:3468: error: implicit declaration of function 'find_early_area'
mm/page_alloc.c:3483: error: implicit declaration of function 'reserve_early_without_check'
The function is only needed on CONFIG_NO_BOOTMEM.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Mel Gorman <mel@csn.ul.ie>
LKML-Reference: <4B747239.4070907@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes the range reservation feature available to other
architectures.
-v2: add get_max_mapped, max_pfn_mapped only defined in x86...
to fix PPC compiling
-v3: according to hpa, add CONFIG_HAVE_EARLY_RES
-v4: fix typo about EARLY_RES in config
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B7B5723.4070009@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... so we can move early_res up.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-27-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... in preparation of moving early_res to kernel/.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-26-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Let's make 32bit consistent with 64bit.
-v2: Andrew pointed out for 32bit that we should use -1ULL
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-25-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... to make it always try to start from low at first.
This makes it less likely for early_memtest to reserve a bad range, in
particular it puts new early_res in a range that is already tested.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-24-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Makes early_res.c more clean, so later could move it to /kernel.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-23-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Prepare to move bck find_e820_area_size back to e820.c.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-22-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... to make e820.c smaller.
-v2: fix 32bit compiling with MAX_DMA32_PFN
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-21-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
So prepare to make one more clean of early_res.c.
-v2: don't need to reserve first page in early_res
because we already mark that in e820 as reserved already.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-20-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add vmemmap_alloc_block_buf for mem map only.
It will fallback to the old way if it cannot get a block that big.
Before this patch, when a node have 128g ram installed, memmap are
split into two parts or more.
[ 0.000000] [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1
[ 0.000000] [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1
[ 0.000000] [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0
[ 0.000000] [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0
[ 0.000000] [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0
[ 0.000000] [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2
[ 0.000000] [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2
[ 0.000000] [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2
[ 0.000000] [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3
[ 0.000000] [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3
[ 0.000000] [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4
[ 0.000000] [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4
[ 0.000000] [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5
[ 0.000000] [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5
[ 0.000000] [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5
[ 0.000000] [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6
[ 0.000000] [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6
[ 0.000000] [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6
[ 0.000000] [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7
[ 0.000000] [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7
after patch will get
[ 0.000000] [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0
[ 0.000000] [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1
[ 0.000000] [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2
[ 0.000000] [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3
[ 0.000000] [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4
[ 0.000000] [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5
[ 0.000000] [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6
[ 0.000000] [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7
-v2: change buf to vmemmap_buf instead according to Ingo
also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo
-v3: according to Andrew, use sizeof(name) instead of hard coded 15
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-19-git-send-email-yinghai@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Could save some buffer space instead of applying one by one.
Could help that system that is going to use early_res instead of bootmem
less entries in early_res make search more faster on system with more memory.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-18-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Finally we can use early_res to replace bootmem for x86_64 now.
Still can use CONFIG_NO_BOOTMEM to enable it or not.
-v2: fix 32bit compiling about MAX_DMA32_PFN
-v3: folded bug fix from LKML message below
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B747239.4070907@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
64bit NUMA already make enough space under 4G with new early_node_mem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-16-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Use early_res_count to track the num, and use find_e820 to get a new
buffer, then copy from the old to the new one.
Also, clear early_res to prevent later invalid usage.
-v2 _check_and_double_early_res should take new start
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-14-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
To prepare allocate early res array from fine_e820_area.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-13-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Simplify setup_node_mem: don't use bootmem from other node, instead
just find_e820_area in early_node_mem.
This keeps the boundary between early_res and boot mem more clear, and
lets us only call early_res_to_bootmem() one time instead of for all
nodes.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-12-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
So we can check that early in the bootlog.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-11-git-send-email-yinghai@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
So make interface more consistent with early_res.
Later we can share some code with early_res.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-10-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Should be good for 32bit too.
-v3: cast res->start
-v4: according to Linus, to use %pR instead of cast
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-9-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Prepare for 32bit pci root bus
-v2: hpa said we should compare with (resource_size_t)~0
-v3: according to Linus to use MAX_RESOURCE instead.
also need need to put related patches together
-v4: according to Andrew, use min in cap_resource()
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-8-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Prepare to enable it for 32bit.
-v2: remove not needed cast
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-7-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Found MSI amd k8 based laptops is hiding [0x70000000, 0x80000000) RAM
from e820.
enable amd one chain even for all.
-v2: use bool for found, according to Andrew
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-6-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Prepare to enable 32bit intel and amd bus.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-5-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
We have almost the same code for mtrr cleanup and amd_bus checkup, and
this code will also be used in replacing bootmem with early_res,
so try to move them together and reuse it from different parts.
Also rename update_range to subtract_range as that is what the
function is actually doing.
-v2: update comments as Christoph requested
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-4-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Rename add_range() to add_bus_range() to avoid conflict with the
naming of the generic range manipulation functions.
LKML-Reference: <1265793639-15071-4-git-send-email-yinghai@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PIT: control word is write-only
kvmclock: count total_sleep_time when updating guest clock
Export the symbol of getboottime and mmonotonic_to_bootbased
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: clean up memory allocation in at32_add_device_mci
arch/avr32: Fix build failure for avr32 caused by typo
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: fix dentry hash calculation for case-insensitive mounts
[CIFS] Don't cache timestamps on utimes due to coarse granularity
[CIFS] Maximum username length check in session setup does not match
cifs: fix length calculation for converted unicode readdir names
[CIFS] Add support for TCP_NODELAY
Commit f71dc176aa 'Make
hpte_need_flush() correctly mask for multiple page sizes' introduced
bug, which is triggered when a kernel with a 64k base page size is run
on a system whose hardware does not 64k hash PTEs. In this case, we
emulate 64k pages with multiple 4k hash PTEs, however in
hpte_need_flush() we incorrectly only mask the hardware page size from
the address, instead of the logical page size. This causes things to
go wrong when we later attempt to iterate through the hardware
subpages of the logical page.
This patch corrects the error. It has been tested on pSeries bare
metal by Michael Neuling.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
======
This fix is related to
http://bugzilla.kernel.org/show_bug.cgi?id=15142
but does not address that exact issue.
======
sysfs does like attributes being removed while they are being accessed
(i.e. read or written) and waits for the access to complete.
As accessing some md attributes takes the same lock that is held while
removing those attributes a deadlock can occur.
This patch addresses 3 issues in md that could lead to this deadlock.
Two relate to calling flush_scheduled_work while the lock is held.
This is probably a bad idea in general and as we use schedule_work to
delete various sysfs objects it is particularly bad.
In one case flush_scheduled_work is called from md_alloc (called by
md_probe) called from do_md_run which holds the lock. This call is
only present to ensure that ->gendisk is set. However we can be sure
that gendisk is always set (though possibly we couldn't when that code
was originally written. This is because do_md_run is called in three
different contexts:
1/ from md_ioctl. This requires that md_open has succeeded, and it
fails if ->gendisk is not set.
2/ from writing a sysfs attribute. This can only happen if the
mddev has been registered in sysfs which happens in md_alloc
after ->gendisk has been set.
3/ from autorun_array which is only called by autorun_devices, which
checks for ->gendisk to be set before calling autorun_array.
So the call to md_probe in do_md_run can be removed, and the check on
->gendisk can also go.
In the other case flush_scheduled_work is being called in do_md_stop,
purportedly to wait for all md_delayed_delete calls (which delete the
component rdevs) to complete. However there really isn't any need to
wait for them - they have already been disconnected in all important
ways.
The third issue is that raid5->stop() removes some attribute names
while the lock is held. There is already some infrastructure in place
to delay attribute removal until after the lock is released (using
schedule_work). So extend that infrastructure to remove the
raid5_attrs_group.
This does not address all lockdep issues related to the sysfs
"s_active" lock. The rest can be address by splitting that lockdep
context between symlinks and non-symlinks which hopefully will happen.
Signed-off-by: NeilBrown <neilb@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix p9_client_destroy unconditional calling v9fs_put_trans
9p: fix memory leak in v9fs_parse_options()
9p: Fix the kernel crash on a failed mount
9p: fix option parsing
9p: Include fsync support for 9p client
net/9p: fix statsize inside twstat
net/9p: fail when user specifies a transport which we can't find
net/9p: fix virtio transport to correctly update status on connect
Current kvm wallclock does not consider the total_sleep_time which could cause
wrong wallclock in guest after host suspend/resume. This patch solve
this issue by counting total_sleep_time to get the correct host boot time.
Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Export getboottime and monotonic_to_bootbased in order to let them
could be used by following patch.
Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If a CHPID is offline during a device shutdown the ccw_device_halt|clear
may fail and the qdio device stays in state STOPPED until the shutdown is
finished. If an interrupt occurs before the device is set to INACTIVE
the STOPPED state triggers a WARN_ON in the interrupt handler.
Prevent this WARN_ON by catching the STOPPED state in the interrupt
handler.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Inbound traffic handling may hang if next buffer to check is in
state ERROR, polling is stopped and the final check for further
available inbound buffers disregards buffers in state ERROR.
This patch includes state ERROR when checking availability of
more inbound buffers.
Cc: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Test the value that was just allocated rather than the previously tested one.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@
if (x == NULL || ...) {
... when forall
return ...; }
... when != goto l;
when != x = e
when != &x
*x == NULL
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the rc period, patches that are not bugfixes
should be done using the net-next tree.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The action modules have been prefixed with 'act_', but the Kconfig
description was not changed.
Signed-off-by: Jan Luebbe <jluebbe@debian.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>