The hugepage arch code provides a number of hook functions/macros
which mirror the functionality of various normal page pte access
functions. Various changes in the normal page accessors (in
particular BenH's recent changes to the handling of lazy icache
flushing and PAGE_EXEC) have caused the hugepage versions to get out
of sync with the originals. In some cases, this is a bug, at least on
some MMU types.
One of the reasons that some hooks were not identical to the normal
page versions, is that the fact we're dealing with a hugepage needed
to be passed down do use the correct dcache-icache flush function.
This patch makes the main flush_dcache_icache_page() function hugepage
aware (by checking for the PageCompound flag). That in turn means we
can make set_huge_pte_at() just a call to set_pte_at() bringing it
back into sync. As a bonus, this lets us remove the
hash_huge_page_do_lazy_icache() function, replacing it with a call to
the hash_page_do_lazy_icache() function it was based on.
Some other hugepage pte access hooks - huge_ptep_get_and_clear() and
huge_ptep_clear_flush() - are not so easily unified, but this patch at
least brings them back into sync with the current versions of the
corresponding normal page functions.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch separates the parts of hugetlbpage.c which are inherently
specific to the hash MMU into a new hugelbpage-hash64.c file.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch simplifies the logic used to initialize hugepages on
powerpc. The somewhat oddly named set_huge_psize() is renamed to
add_huge_page_size() and now does all necessary verification of
whether it's given a valid hugepage sizes (instead of just some) and
instantiates the generic hstate structure (but no more).
hugetlbpage_init() now steps through the available pagesizes, checks
if they're valid for hugepages by calling add_huge_page_size() and
initializes the kmem_caches for the hugepage pagetables. This means
we can now eliminate the mmu_huge_psizes array, since we no longer
need to pass the sizing information for the pagetable caches from
set_huge_psize() into hugetlbpage_init()
Determination of the default huge page size is also moved from the
hash code into the general hugepage code.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottem level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level. Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly. Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.
This patch, therefore reworks the handling of hugepage pagetables to
reduce this complexity. In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (eseentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask. This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.
This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out. All other (hugepage)
pagetable walking paths can just interpret the structure as they go.
There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug. We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping). This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.
While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we have a fair bit of rather fiddly code to manage the
various kmem_caches used to store page tables of various levels. We
generally have two caches holding some combination of PGD, PUD and PMD
tables, plus several more for the special hugepage pagetables.
This patch cleans this all up by taking a different approach. Rather
than the caches being designated as for PUDs or for hugeptes for 16M
pages, the caches are simply allocated to be a specific size. Thus
sharing of caches between different types/levels of pagetables happens
naturally. The pagetable size, where needed, is passed around encoded
in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
pagetable contains 2^n pointers.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, hpte_need_flush() only correctly flushes the given address
for normal pages. Callers for hugepages are required to mask the
address themselves.
But hpte_need_flush() already looks up the page sizes for its own
reasons, so this is a rather silly imposition on the callers. This
patch alters it to mask based on the pagesize it has looked up itself,
and removes the awkward masking code in the hugepage caller.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When running Active Memory Sharing, the Collaborative Memory Manager (CMM)
may mark some pages as "loaned" with the hypervisor. Periodically, the
CMM will query the hypervisor for a loan request, which is a single signed
value. When kexec'ing into a kdump kernel, the CMM driver in the kdump
kernel is not aware of the pages the previous kernel had marked as "loaned",
so the hypervisor and the CMM driver are out of sync. Fix the CMM driver
to handle this scenario by ignoring requests to decrease the number of loaned
pages if we don't think we have any pages loaned. Pages that are marked as
"loaned" which are not in the balloon will automatically get switched to "active"
the next time we touch the page. This also fixes the case where totalram_pages
is smaller than min_mem_mb, which can occur during kdump.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
get_irq_desc() is a powerpc-specific version of irq_to_desc(). That
is reason enough to remove it, but it also doesn't know about sparse
irq_desc support which irq_to_desc() does (when we enable it).
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rather than open-coding our own check, use irq_has_action()
to check if an irq has an action - ie. is "in use".
irq_has_action() doesn't take the descriptor lock, but it
shouldn't matter - we're just using it as an indicator
that the irq is in use. disable_irq_nosync() will take
the descriptor lock before doing anything also.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The irq_desc array consumes quite a lot of space, and for systems
that don't need or can't have 512 irqs it's just wasted space.
The first 16 are reserved for ISA, so the minimum of 32 is really
16 - and no one has asked for more than 512 so leave that as the
maximum.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The CHRP code has some fishy timer based code to scan the RTAS event
log, which uses a 1KB stack buffer and doesn't even use the results.
The pSeries code as a nicer daemon that allows userspace to read the
event log and basically uses the same RTAS interface
This patch moves rtasd.c out of platform/pseries and makes it usable
by CHRP, after removing the old crufty event log mechanism in there.
The nvram logging part of the daemon is still only available on 64-bit
since the underlying nvram management routines aren't currently shared.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some of the stuff in /proc/ppc64 such as the RTAS bits are actually
useful to some 32-bit platforms. Rename the file, and create a
symlink on 64-bit for backward compatibility
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Just as with kexec, hibernation may fail even on well-tested platforms:
some PCI device, a driver of which doesn't play well with hibernation,
is enough to break resuming.
Hibernation code is not much platform dependent, and hiding features only
because these were not verified on a particular hardware is
counterproductive: we just prevent the features from being widely tested.
For example, with this patch I just tested hibernation on a MPC83xx
board, and it works quite well, modulo a few drivers that need some
fixing.
So, let's make it possible to select hibernation support for all
PowerPCs, then let's wait for any possible bug reports, and actually fix
(or just collect ;-) the bugs instead of hiding them. If some platforms
really can't stand hibernation, we can make a blacklist, with proper
comments why exactly hibernation doesn't work, whether it is possible to
fix, and what needs to be done to fix it.
CONFIG_HIBERNATION is still =n by default, so the commit doesn't change
anything apart from ability to set it to =y.
I'm not sure if EXPERIMENTAL dependency is needed, I'd rather not add it
for a few reasons:
1) It doesn't matter much, for distro kernels user has no clue that some
feature is experimental. Majority of defconfigs enable EXPERIMENTAL
anyway (90 vs. 4, which, btw, means that EXPERIMENTAL is overused
in Kconfigs);
2) EXPERIMENTAL is a good thing for features that change default
behaviour of a kernel, while for hibernation user has to explicitly
issue 'echo disk > /sys/power/state' to trigger any hibernation bugs;
3) Per init/Kconfig, EXPERIMENTAL is a good thing to scare and discourage
users from 'widespread use of a feature', while we want to encourage
that use.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The non-debug case in ps3/mm.c uses pr_debug(), so that the compiler
still does type checks etc. and doesn't complain about unused
variables in the non-debug case.
However with DEBUG=n and CONFIG_DYNAMIC_DEBUG=y there's still code
generated for those pr_debugs().
size before:
text data bss dec hex filename
17553 4112 88 21753 54f9 arch/powerpc/platforms/ps3/mm.o
size after:
text data bss dec hex filename
7377 776 88 8241 2031 arch/powerpc/platforms/ps3/mm.o
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule
powerpc: Minor cleanup to lib/Kconfig.debug
powerpc: Minor cleanup to sound/ppc/Kconfig
powerpc: Minor cleanup to init/Kconfig
powerpc: Limit memory hotplug support to PPC64 Book-3S machines
powerpc: Limit hugetlbfs support to PPC64 Book-3S machines
powerpc: Fix compile errors found by new ppc64e_defconfig
powerpc: Add a Book-3E 64-bit defconfig
powerpc/booke: Fix xmon single step on PowerPC Book-E
powerpc: Align vDSO base address
powerpc: Fix segment mapping in vdso32
powerpc/iseries: Remove compiler version dependent hack
powerpc/perf_events: Fix priority of MSR HV vs PR bits
powerpc/5200: Update defconfigs
drivers/serial/mpc52xx_uart.c: Use UPIO_MEM rather than SERIAL_IO_MEM
powerpc/boot/dts: drop obsolete 'fsl5200-clocking'
of: Remove nested function
mpc5200: support for the MAN mpc5200 based board mucmc52
mpc5200: support for the MAN mpc5200 based board uc101
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
omap4: Fix UART4 platform data on omap4
omap4: Allow omap_serial_early_init() for OMAP4430 board
omap3: PM: enable UART3 module wakeups
omap2: Fix console serial port number for n8x0
omap2: Fix detection of n8x0
omap1: Fix DSP public peripherals support for ams-delta
omap1: Fix redundant UARTs pin muxing that can break other hardware support
omap: iommu: fix wrong condition check for SUPERSECTION
omap: SDMA: Fix omap_stop_dma() API for channel linking
omap: Fix omap-keypad by restoring old keypad.h without breaking omap2 boards that use matrix_keypad
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Do less agressive buddy clearing
sched: Disable SD_PREFER_LOCAL for MC/CPU domains
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, UV: Set DELIVERY_MODE=4 for vector=NMI_VECTOR in uv_hub_send_ipi()
x86, UV: Fix and clean up bau code to use uv_gpa_to_pnode()
x86: Don't print number of MCE banks for every CPU
x86, UV: Fix information in __uv_hub_info structure
x86: Document linker script ASSERT() quirk
xen_setup_stackprotector() ends up trying to set page protections,
so we need to have vm_mmu_ops set up before trying to do so.
Failing to do so causes an early boot crash.
[ Impact: Fix early crash under Xen. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Based on an original patch by Valentine Barshak <vbarshak@ru.mvista.com>
Use preempt_schedule_irq to prevent infinite irq-entry and
eventual stack overflow problems with fast-paced IRQ sources.
This kind of problems has been observed on the PASemi Electra IDE
controller. We have to make sure we are soft-disabled before calling
preempt_schedule_irq and hard disable interrupts after that
to avoid unrecoverable exceptions.
This patch also moves the "clrrdi r9,r1,THREAD_SHIFT" out of
the #ifdef CONFIG_PPC_BOOK3E scope, since r9 is clobbered
and has to be restored in both cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix the following 3 issues:
arch/powerpc/kernel/process.c: In function 'arch_randomize_brk':
arch/powerpc/kernel/process.c:1183: error: 'mmu_highuser_ssize' undeclared (first use in this function)
arch/powerpc/kernel/process.c:1183: error: (Each undeclared identifier is reported only once
arch/powerpc/kernel/process.c:1183: error: for each function it appears in.)
arch/powerpc/kernel/process.c:1183: error: 'MMU_SEGSIZE_1T' undeclared (first use in this function)
In file included from arch/powerpc/kernel/setup_64.c:60:
arch/powerpc/include/asm/mmu-hash64.h:132: error: redefinition of 'struct mmu_psize_def'
arch/powerpc/include/asm/mmu-hash64.h:159: error: expected identifier or '(' before numeric constant
arch/powerpc/include/asm/mmu-hash64.h:396: error: conflicting types for 'mm_context_t'
arch/powerpc/include/asm/mmu-book3e.h:184: error: previous declaration of 'mm_context_t' was here
cc1: warnings being treated as errors
arch/powerpc/kernel/pci_64.c: In function 'pcibios_unmap_io_space':
arch/powerpc/kernel/pci_64.c💯 error: unused variable 'res'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This defconfig's purpose at this time is to help catch compile errors
between Book-3S and Book-3E support in ppc64. It is based on the
ppc64_defconfig with some things disabled that we dont support on
Book-3E right now (hugetlbfs, slices, etc.)
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Prior to the arch/ppc -> arch/powerpc transition, xmon had support for single
stepping on 4xx boards. The functionality was lost when arch/ppc was removed.
This patch restores single step support for 44x boards, and Book-E in general.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ABI specifies a 64K alignment, we need to map the vDSO accordingly
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Due to missing segment assignments the .text section was put in the NOTES
segment (and marked as NOTE section), and the .got was put in the DYNAMIC
segment.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The creation of the flattened device tree depended on the compiler
putting the constant strings for an object in a section with a
particular name. This was changed with recent compilers. Do this
explicitly instead.
Without this patch, iseries kernels may silently not boot when built with
some compilers.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The architecture defines that if MSR PR is set we are in problem state
irrespective of the HV bit. This fixes perf events to reflect this.
Also, on bare metal systems, samples taken in Linux will now be reported
as kernel rather than hypervisor.
Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: paulus@samba.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The hugetlb dependencies presently depend on SUPERH && MMU while the
hugetlb page size definitions depend on CPU_SH4 or CPU_SH5. This
unfortunately allows SH-3 + MMU configurations to enable hugetlbfs
without a corresponding HPAGE_SHIFT definition, resulting in the build
blowing up.
As SH-3 doesn't support variable page sizes, we tighten up the
dependenies a bit to prevent hugetlbfs from being enabled. These days
we also have a shiny new SYS_SUPPORTS_HUGETLBFS, so switch to using
that rather than adding to the list of corner cases in fs/Kconfig.
Reported-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add an uImage.bin target to allow uncompressed uImages.
Useful for boards with busted u-boot decompression like
the rsk7203 on my desk.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
When CONFIG_FUNCTION_GRAPH_TRACER is enabled the function graph tracer
may patch return addresses on the stack with the address of
return_to_handler(). This really confuses the DWARF unwinder because it
will try find the caller of return_to_handler(), not the caller of the
real return address.
So teach the DWARF unwinder how to find the real return address whenever
it encounters return_to_handler().
This patch does not cope very well when multiple return addresses on the
stack have been patched. To make it work properly it would require state
to track how many return_to_handler()'s have been seen so that we'd know
where to look in current->curr_ret_stack[]. So for now, instead of
trying to handle this, just moan if more than one return address on the
stack has been patched.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This adds an __irq_entry annotation for do_IRQ() so that the IRQ
annotation in the function graph tracer works as advertized. We already
have the IRQENTRY section wired up, so this is just a trivial addition
to actually make use of it.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch removes the unnecessary UART4 platform which is under
data is wrong because of this
There is a separate platform structure for UART4
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-By: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This patch enables omap_serial_early_init() function for OMAP4430
SDP. Without this the bootup would throw oops in omap_serial_init().
Note that the ifndef CONFIG_ARCH_OMAP4 is split into two sections
to enable omap_serial_early_init(). This ifndef cannot be removed
until omap4 clock framework is implemented.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-By: Tony Lindgren <tony@atomide.com>
Reviewed-By: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
UART3 is in the PER powerdomain. If PER goes idle/inactive
independently of CORE, for UART3 to wakeup it must have its wakeup
enable bits setup in PM_WKEN_PER. This patch enables these bits.
The reason it works when PER and CORE work together is because when
CORE goes inactive/retention, the IOPAD wakeups are enabled and
trigger UART3 wakeup.
Without this patch, when the UART inactivity timer fires for UART3,
its clocks are disabled and it's unable to wakeup so will be unusable
until PER is awoken by another source.
Another way of testing is by keeping CORE on during suspend but
allowing PER to hit retention
# echo 3 > /debug/pm_debug/core_pwrdm/suspend
then enter suspend
# echo mem > /sys/power/state
Without this patch, UART3 will be unable to wakeup the system.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
With the recent changes omap serial ports match the physical
numbering like they should. Fix the kernel CMDLINE accordingly
so console works.
Signed-off-by: Tony Lindgren <tony@atomide.com>
DSP public peripherals used to work on OMAP1510 based (or all OMAP1 class?)
machines as long as old dspgateway code were present in the l-o tree. For
several months it is no longer included, breaking support for McBSP1 based
audio on Amstrad Delta, for example.
This patch, derived from the old dspgateway code, corrects the problem for the
board by simply taking the DSP out of reset state, I guess. That way, things
should not break when a new dsp code is added to the tree, and the change can
be reverted then.
If there are any reports on McBSP1 or other DSP public peripherals not working
for other OMAP1 machines (I've not heard of any for now), I can prepare a more
general patch providing an extra include file with a helper function defined.
Created and tested against linux-2.6.32-rc5
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 15ac408ee5 removed enabled_uart
and OMAP_TAG_UART. This works for mach-omap2, but causes issues on
mach-omap1 for some boards as the mach-omap1 serial.c was muxing
pins based on the enabled_uart flag for 15xx.
Fix this by muxing pins in board-*.c files for the 15xx boards for
the uart ports that had enabled_uart flag set before the commit
above.
Tested on Amsdtrad Delta only.
Note that in the future we should add support for powering down
the uarts with a timer like mach-omap2/serial.c does. Otherwise
the enabled uarts will be blocking retention-while-idle.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Signed-off-by: Tony Lindgren <tony@atomide.com>
A bit (2 << 0) is set both on SECTION and SUPERSECTION. To identify
SUPERSECTION correctly, other bits should be compared too.
Reported-by: "Srinivas Pulukuru" <srinivas.pulukuru@ti.com>
Signed-off-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
OMAP sDMA driver API omap_stop_dma() doesn't really stop the dma when used
in linking scenario.
The DMA channel needs to be disabled before resetting the chain.
Also fix clearing of the OMAP_DMA_ACTIVE status in the linked case.
Cc: Hari n <hari.zoom@gmail.com>
Cc: Jarkko Nikula <jhnikula@gmail.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Only mach-omap2 boards are currently using matrix_keypad. Allow
mach-omap1 boards to use the old style keypad.h without breaking.
Created against linux-2.6.32-rc5.
Compile tested with omap_3430sdp_defconfig and rx51_defconfig.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Signed-off-by: Tony Lindgren <tony@atomide.com>