Commit Graph

2997 Commits

Author SHA1 Message Date
Paul Gortmaker
b56eade55d powerpc: Fix up implicit stat.h users
They get it via module.h (via device.h) but we want to clean that up.
When we do, we'll get things like:

ibmebus.c:314: error: 'S_IWUSR' undeclared here (not in a function)
vio.c:972: error: 'S_IWUSR' undeclared here (not in a function)

so add in the stat header it is using explicitly in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:39 -04:00
Paul Gortmaker
9308794884 powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE
Fix failures in powerpc associated with the previously allowed
implicit module.h presence that now lead to things like this:

arch/powerpc/mm/mmu_context_hash32.c:76:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/powerpc/mm/tlb_hash32.c:48:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/kernel/pci_32.c:51:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/powerpc/kernel/iomap.c:36:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/platforms/44x/canyonlands.c:126:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/kvm/44x.c:168:59: error: 'THIS_MODULE' undeclared (first use in this function)

[with several contibutions from Stephen Rothwell <sfr@canb.auug.org.au>]

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:38 -04:00
Paul Gortmaker
66b15db69c powerpc: add export.h to files making use of EXPORT_SYMBOL
With module.h being implicitly everywhere via device.h, the absence
of explicitly including something for EXPORT_SYMBOL went unnoticed.
Since we are heading to fix things up and clean module.h from the
device.h file, we need to explicitly include these files now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:37 -04:00
Paul Gortmaker
333a151822 powerpc: io-workarounds.c was implicitly getting init_mm
It was coming in via device.h --> module.h etc. but we want to
clean that up.  So explicitly include the header where init_mm
is being declared.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:37 -04:00
Linus Torvalds
1bc87b0055 Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (75 commits)
  KVM: SVM: Keep intercepting task switching with NPT enabled
  KVM: s390: implement sigp external call
  KVM: s390: fix register setting
  KVM: s390: fix return value of kvm_arch_init_vm
  KVM: s390: check cpu_id prior to using it
  KVM: emulate lapic tsc deadline timer for guest
  x86: TSC deadline definitions
  KVM: Fix simultaneous NMIs
  KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode
  KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode
  KVM: x86 emulator: streamline decode of segment registers
  KVM: x86 emulator: simplify OpMem64 decode
  KVM: x86 emulator: switch src decode to decode_operand()
  KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack
  KVM: x86 emulator: switch OpImmUByte decode to decode_imm()
  KVM: x86 emulator: free up some flag bits near src, dst
  KVM: x86 emulator: switch src2 to generic decode_operand()
  KVM: x86 emulator: expand decode flags to 64 bits
  KVM: x86 emulator: split dst decode to a generic decode_operand()
  KVM: x86 emulator: move memop, memopp into emulation context
  ...
2011-10-30 15:36:45 -07:00
Paul Mackerras
19ccb76a19 KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code
With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel.  This is inefficient.

This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode.  When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle.  Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction.  When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.

This has some other ramifications.  First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle.  This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running.  In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.

This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.

Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.

Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.

Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-09-25 19:52:30 +03:00
Paul Mackerras
0214394760 KVM: PPC: book3s_pr: Simplify transitions between virtual and real mode
This simplifies the way that the book3s_pr makes the transition to
real mode when entering the guest.  We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.

kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.

This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt.  Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on.  The handler will then return to the virtual-mode code
(potentially in the module).

Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.

The net result is that we no longer need any code addresses to be
stored in vcpu->arch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-09-25 19:52:29 +03:00
Paul Mackerras
177339d7f7 KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately
This makes arch/powerpc/kvm/book3s_rmhandlers.S and
arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
separate compilation units rather than having them #included in
arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
conditional branches between the exception prologs in
exceptions-64s.S and the KVM handlers, so there is no need to
keep their contents close together in the vmlinux image.

In their current location, they are using up part of the limited
space between the first-level interrupt handlers and the firmware
NMI data area at offset 0x7000, and with some kernel configurations
this area will overflow (e.g. allyesconfig), leading to an
"attempt to .org backwards" error when compiling exceptions-64s.S.

Moving them out requires that we add some #includes that the
book3s_{,hv_}rmhandlers.S code was previously getting implicitly
via exceptions-64s.S.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-09-25 19:52:28 +03:00
Timur Tabi
dcd83aaff1 tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver
The ePAPR embedded hypervisor specification provides an API for "byte
channels", which are serial-like virtual devices for sending and receiving
streams of bytes.  This driver provides Linux kernel support for byte
channels via three distinct interfaces:

1) An early-console (udbg) driver.  This provides early console output
through a byte channel.  The byte channel handle must be specified in a
Kconfig option.

2) A normal console driver.  Output is sent to the byte channel designated
for stdout in the device tree.  The console driver is for handling kernel
printk calls.

3) A tty driver, which is used to handle user-space input and output.  The
byte channel used for the console is designated as the default tty.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-23 10:32:56 -07:00
Benjamin Herrenschmidt
a85fe3fce8 powerpc: Really fix build without CONFIG_PCI
Brown paper bag day, previous commit wouldn't work very well with modules
enabled. Move the exports into the ifdef.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-11 01:15:44 +10:00
Benjamin Herrenschmidt
81210c2062 powerpc: Fix build without CONFIG_PCI
Commit fea80311a9
"iomap: make IOPORT/PCI mapping functions conditional"

Broke powerpc build without CONFIG_PCI as we would still define
pci_iomap(), which overlaps with the new empty inline in the headers.

Make our implementation conditional on CONFIG_PCI

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 16:01:20 +10:00
Anton Blanchard
8aa6d35929 powerpc: Move kdump default base address to half RMO size on 64bit
We are seeing boot failures on some very large boxes even with
commit b5416ca9f8 (powerpc: Move kdump default base address to
64MB on 64bit).

This patch halves the RMO so both kernels get about the same
amount of RMO memory. On large machines this region will be
at least 256MB, so each kernel will get 128MB.

We cap it at 256MB (small SLB size) since some early allocations need
to be in the bolted SLB region. We could relax this on machines with
1TB SLBs in a future patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:56 +10:00
David Ahern
b59a1bfcc2 powerpc/perf: Disable pagefaults during callchain stack read
Panic observed on an older kernel when collecting call chains for
the context-switch software event:

 [<b0180e00>]rb_erase+0x1b4/0x3e8
 [<b00430f4>]__dequeue_entity+0x50/0xe8
 [<b0043304>]set_next_entity+0x178/0x1bc
 [<b0043440>]pick_next_task_fair+0xb0/0x118
 [<b02ada80>]schedule+0x500/0x614
 [<b02afaa8>]rwsem_down_failed_common+0xf0/0x264
 [<b02afca0>]rwsem_down_read_failed+0x34/0x54
 [<b02aed4c>]down_read+0x3c/0x54
 [<b0023b58>]do_page_fault+0x114/0x5e8
 [<b001e350>]handle_page_fault+0xc/0x80
 [<b0022dec>]perf_callchain+0x224/0x31c
 [<b009ba70>]perf_prepare_sample+0x240/0x2fc
 [<b009d760>]__perf_event_overflow+0x280/0x398
 [<b009d914>]perf_swevent_overflow+0x9c/0x10c
 [<b009db54>]perf_swevent_ctx_event+0x1d0/0x230
 [<b009dc38>]do_perf_sw_event+0x84/0xe4
 [<b009dde8>]perf_sw_event_context_switch+0x150/0x1b4
 [<b009de90>]perf_event_task_sched_out+0x44/0x2d4
 [<b02ad840>]schedule+0x2c0/0x614
 [<b0047dc0>]__cond_resched+0x34/0x90
 [<b02adcc8>]_cond_resched+0x4c/0x68
 [<b00bccf8>]move_page_tables+0xb0/0x418
 [<b00d7ee0>]setup_arg_pages+0x184/0x2a0
 [<b0110914>]load_elf_binary+0x394/0x1208
 [<b00d6e28>]search_binary_handler+0xe0/0x2c4
 [<b00d834c>]do_execve+0x1bc/0x268
 [<b0015394>]sys_execve+0x84/0xc8
 [<b001df10>]ret_from_syscall+0x0/0x3c

A page fault occurred walking the callchain while creating a perf
sample for the context-switch event. To handle the page fault the
mmap_sem is needed, but it is currently held by setup_arg_pages.
(setup_arg_pages calls shift_arg_pages with the mmap_sem held.
shift_arg_pages then calls move_page_tables which has a cond_resched
at the top of its for loop - hitting that cond_resched is what caused
the context switch.)

This is an extension of Anton's proposed patch:
https://lkml.org/lkml/2011/7/24/151
adding case for 32-bit ppc.

Tested on the system that first generated the panic and then again
with latest kernel using a PPC VM. I am not able to test the 64-bit
path - I do not have H/W for it and 64-bit PPC VMs (qemu on Intel)
is horribly slow.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:56 +10:00
Anton Blanchard
fbafd72815 powerpc: Clean up some panic messages in prom_init
Add a newline to the panic messages in make_room. Also fix a
comment that suggested our chunk size is 4Mb. It's 1MB.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:55 +10:00
Anton Blanchard
966728dd88 powerpc: Fix device tree claim code
I have a box that fails in OF during boot with:

DEFAULT CATCH!, exception-handler=fff00400
at   %SRR0: 49424d2c4c6f6768   %SRR1: 800000004000b002

ie "IBM,Logh". OF got corrupted with a device tree string.

Looking at make_room and alloc_up, we claim the first chunk (1 MB)
but we never claim any more. mem_end is always set to alloc_top
which is the top of our available address space, guaranteeing we will
never call alloc_up and claim more memory.

Also alloc_up wasn't setting alloc_bottom to the bottom of the
available address space.

This doesn't help the box to boot, but we at least fail with
an obvious error. We could relocate the device tree in a future
patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:54 +10:00
Scott Wood
26ee97672e powerpc: Return the_cpu_ spec from identify_cpu
Commit af9eef3c7b caused cpu_setup to see
the_cpu_spec, rather than the source struct.  However, on 32-bit, the
return value of identify_cpu was being used for feature fixups, and
identify_cpu was returning the source struct.  So if cpu_setup patches
the feature bits, the update won't affect the fixups.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:54 +10:00
Linus Torvalds
3960ef326a Merge branch 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc
* 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
  ARM: Consolidate the clkdev header files
  ARM: set vga memory base at run-time
  ARM: convert PCI defines to variables
  ARM: pci: make pcibios_assign_all_busses use pci_has_flag
  ARM: remove unnecessary mach/hardware.h includes
  pci: move microblaze and powerpc pci flag functions into asm-generic
  powerpc: rename ppc_pci_*_flags to pci_*_flags

Fix up conflicts in arch/microblaze/include/asm/pci-bridge.h
2011-07-26 17:12:10 -07:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Linus Torvalds
184475029a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits)
  drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c
  powerpc/85xx: fix mpic configuration in CAMP mode
  powerpc: Copy back TIF flags on return from softirq stack
  powerpc/64: Make server perfmon only built on ppc64 server devices
  powerpc/pseries: Fix hvc_vio.c build due to recent changes
  powerpc: Exporting boot_cpuid_phys
  powerpc: Add CFAR to oops output
  hvc_console: Add kdb support
  powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon
  powerpc/irq: Quieten irq mapping printks
  powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs
  powerpc: Add mpt2sas driver to pseries and ppc64 defconfig
  powerpc: Disable IRQs off tracer in ppc64 defconfig
  powerpc: Sync pseries and ppc64 defconfigs
  powerpc/pseries/hvconsole: Fix dropped console output
  hvc_console: Improve tty/console put_chars handling
  powerpc/kdump: Fix timeout in crash_kexec_wait_realmode
  powerpc/mm: Fix output of total_ram.
  powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards
  powerpc: Correct annotations of pmu registration functions
  ...

Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and
drivers/cpufreq
2011-07-25 22:59:39 -07:00
Linus Torvalds
45b583b10a Merge 'akpm' patch series
* Merge akpm patch series: (122 commits)
  drivers/connector/cn_proc.c: remove unused local
  Documentation/SubmitChecklist: add RCU debug config options
  reiserfs: use hweight_long()
  reiserfs: use proper little-endian bitops
  pnpacpi: register disabled resources
  drivers/rtc/rtc-tegra.c: properly initialize spinlock
  drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
  drivers/rtc: add support for Qualcomm PMIC8xxx RTC
  drivers/rtc/rtc-s3c.c: support clock gating
  drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
  init: skip calibration delay if previously done
  misc/eeprom: add eeprom access driver for digsy_mtc board
  misc/eeprom: add driver for microwire 93xx46 EEPROMs
  checkpatch.pl: update $logFunctions
  checkpatch: make utf-8 test --strict
  checkpatch.pl: add ability to ignore various messages
  checkpatch: add a "prefer __aligned" check
  checkpatch: validate signature styles and To: and Cc: lines
  checkpatch: add __rcu as a sparse modifier
  checkpatch: suggest using min_t or max_t
  ...

Did this as a merge because of (trivial) conflicts in
 - Documentation/feature-removal-schedule.txt
 - arch/xtensa/include/asm/uaccess.h
that were just easier to fix up in the merge than in the patch series.
2011-07-25 21:00:19 -07:00
Amerigo Wang
c5f41752fd notifiers: sys: move reboot notifiers into reboot.h
It is not necessary to share the same notifier.h.

This patch already moves register_reboot_notifier() and
unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.

[amwang@redhat.com: make allyesconfig succeed on ppc64]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:14 -07:00
Linus Torvalds
d3ec4844d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  fs: Merge split strings
  treewide: fix potentially dangerous trailing ';' in #defined values/expressions
  uwb: Fix misspelling of neighbourhood in comment
  net, netfilter: Remove redundant goto in ebt_ulog_packet
  trivial: don't touch files that are removed in the staging tree
  lib/vsprintf: replace link to Draft by final RFC number
  doc: Kconfig: `to be' -> `be'
  doc: Kconfig: Typo: square -> squared
  doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
  drivers/net: static should be at beginning of declaration
  drivers/media: static should be at beginning of declaration
  drivers/i2c: static should be at beginning of declaration
  XTENSA: static should be at beginning of declaration
  SH: static should be at beginning of declaration
  MIPS: static should be at beginning of declaration
  ARM: static should be at beginning of declaration
  rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
  Update my e-mail address
  PCIe ASPM: forcedly -> forcibly
  gma500: push through device driver tree
  ...

Fix up trivial conflicts:
 - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
 - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
 - drivers/net/r8169.c (just context changes)
2011-07-25 13:56:39 -07:00
Linus Torvalds
fcda12e7f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  modpost: Fix modpost's license checking V3
  module: add /sys/module/<name>/uevent files
  module: change attr callbacks to take struct module_kobject
  modules: make arch's use default loader hooks
  modules: add default loader hook implementations
  param: fix return value handling in param_set_*
2011-07-24 09:54:54 -07:00
Linus Torvalds
5fabc487c9 Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  KVM: IOMMU: Disable device assignment without interrupt remapping
  KVM: MMU: trace mmio page fault
  KVM: MMU: mmio page fault support
  KVM: MMU: reorganize struct kvm_shadow_walk_iterator
  KVM: MMU: lockless walking shadow page table
  KVM: MMU: do not need atomicly to set/clear spte
  KVM: MMU: introduce the rules to modify shadow page table
  KVM: MMU: abstract some functions to handle fault pfn
  KVM: MMU: filter out the mmio pfn from the fault pfn
  KVM: MMU: remove bypass_guest_pf
  KVM: MMU: split kvm_mmu_free_page
  KVM: MMU: count used shadow pages on prepareing path
  KVM: MMU: rename 'pt_write' to 'emulate'
  KVM: MMU: cleanup for FNAME(fetch)
  KVM: MMU: optimize to handle dirty bit
  KVM: MMU: cache mmio info on page fault path
  KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code
  KVM: MMU: do not update slot bitmap if spte is nonpresent
  KVM: MMU: fix walking shadow page table
  KVM guest: KVM Steal time registration
  ...
2011-07-24 09:07:03 -07:00
Jonas Bonn
66574cc054 modules: make arch's use default loader hooks
This patch removes all the module loader hook implementations in the
architecture specific code where the functionality is the same as that
now provided by the recently added default hooks.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-24 22:06:04 +09:30
Linus Torvalds
4d4abdcb1d Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
  perf: Remove the nmi parameter from the oprofile_perf backend
  x86, perf: Make copy_from_user_nmi() a library function
  perf: Remove perf_event_attr::type check
  x86, perf: P4 PMU - Fix typos in comments and style cleanup
  perf tools: Make test use the preset debugfs path
  perf tools: Add automated tests for events parsing
  perf tools: De-opt the parse_events function
  perf script: Fix display of IP address for non-callchain path
  perf tools: Fix endian conversion reading event attr from file header
  perf tools: Add missing 'node' alias to the hw_cache[] array
  perf probe: Support adding probes on offline kernel modules
  perf probe: Add probed module in front of function
  perf probe: Introduce debuginfo to encapsulate dwarf information
  perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
  perf probe: Remove redundant dwarf functions
  perf probe: Move strtailcmp to string.c
  perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
  tracing/kprobe: Update symbol reference when loading module
  tracing/kprobes: Support module init function probing
  kprobes: Return -ENOENT if probe point doesn't exist
  ...
2011-07-22 16:44:39 -07:00
Linus Torvalds
acb41c0f92 Merge branch 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  pci/of: Consolidate pci_bus_to_OF_node()
  pci/of: Consolidate pci_device_to_OF_node()
  x86/devicetree: Use generic PCI <-> OF matching
  microblaze/pci: Move the remains of pci_32.c to pci-common.c
  microblaze/pci: Remove powermac originated cruft
  pci/of: Match PCI devices to OF nodes dynamically
2011-07-22 14:54:02 -07:00
Benjamin Herrenschmidt
50d2a4223b powerpc: Copy back TIF flags on return from softirq stack
We already did it for hard IRQs but it looks like we forgot
to do it for softirqs. Without this, we would lose flags
such as TIF_NEED_RESCHED set using current_thread_info()
by something running of a softirq.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-22 13:38:58 +10:00
Benjamin Herrenschmidt
4b575f3e8a Merge remote-tracking branch 'jwb/next' into next 2011-07-22 13:16:41 +10:00
Andrew Gabbasov
9974eec2b8 powerpc: Exporting boot_cpuid_phys
Kernel loadable module can use hard_smp_processor_id() if building with SMP
kernel. In order to make it work for UP kernels too, boot_cpuid_phys
symbol (which is what hard_smp_processor_id() macro resolves to
in non-SMP configuration) must be exported.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:13:34 +10:00
Michael Neuling
5115a026ce powerpc: Add CFAR to oops output
Now we have the CFAR saved add it to the oops output.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:13:33 +10:00
Anton Blanchard
8896293422 powerpc/irq: Quieten irq mapping printks
HFI creates interrupts each time a window is setup. This results in
a lot of messages in the kernel log buffer:

    irq: irq 199007 on host null mapped to virtual irq 351

This box has over 3500 of them, causing more important kernel
messages to be overwritten. We can get at this information via
debugfs now so we may as well turn it into a pr_debug.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:13:07 +10:00
Michael Neuling
63f21a56f1 powerpc/kdump: Fix timeout in crash_kexec_wait_realmode
The existing code it pretty ugly.  How about we clean it up even more
like this?

From: Anton Blanchard <anton@samba.org>

We check for timeout expiry in the outer loop, but we also need to
check it in the inner loop or we can lock up forever waiting for a
CPU to hit real mode.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:13:05 +10:00
Dmitry Eremin-Solenikov
77c2342a57 powerpc: Correct annotations of pmu registration functions
This fixes the following warning:
WARNING: arch/powerpc/kernel/built-in.o(.text+0x29768): Section mismatch in reference from the function .register_power_pmu() to the function .cpuinit.text:.power_pmu_notifier()
The function .register_power_pmu() references
the function __cpuinit .power_pmu_notifier().
This is often because .register_power_pmu lacks a __cpuinit
annotation or the annotation of .power_pmu_notifier is wrong.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:12:40 +10:00
Mathias Krause
0e0ebdb9c2 powerpc: Remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19 15:12:40 +10:00
Dave Kleikamp
9661534d6a powerpc/47x: allow kernel to be loaded in higher physical memory
The 44x code (which is shared by 47x) assumes the available physical memory
begins at 0x00000000.  This is not necessarily the case in an AMP
environment.

Support CONFIG_RELOCATABLE for 476 in order to allow the kernel to be
loaded into a higher memory range.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2011-07-12 10:34:24 -04:00
Rob Herring
0e47ff1ce6 powerpc: rename ppc_pci_*_flags to pci_*_flags
This renames pci flags functions and enums in preparation for creating
generic version in asm-generic/pci-bridge.h. The following search and
replace is done:

s/ppc_pci_/pci_/
s/PPC_PCI_/PCI_/

Direct accesses to ppc_pci_flag variable are replaced with helper
functions.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
2011-07-12 09:28:04 -05:00
Dave Kleikamp
91b191c71e powerpc/44x: don't use tlbivax on AMP systems
Since other OS's may be running on the other cores don't use tlbivax

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2011-07-12 09:21:55 -04:00
Paul Mackerras
9e368f2915 KVM: PPC: book3s_hv: Add support for PPC970-family processors
This adds support for running KVM guests in supervisor mode on those
PPC970 processors that have a usable hypervisor mode.  Unfortunately,
Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
1), but the YDL PowerStation does have a usable hypervisor mode.

There are several differences between the PPC970 and POWER7 in how
guests are managed.  These differences are accommodated using the
CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
bits.  Notably, on PPC970:

* The LPCR, LPID or RMOR registers don't exist, and the functions of
  those registers are provided by bits in HID4 and one bit in HID0.

* External interrupts can be directed to the hypervisor, but unlike
  POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
  SRR0/1 not HSRR0/1.

* There is no virtual RMA (VRMA) mode; the guest must use an RMO
  (real mode offset) area.

* The TLB entries are not tagged with the LPID, so it is necessary to
  flush the whole TLB on partition switch.  Furthermore, when switching
  partitions we have to ensure that no other CPU is executing the tlbie
  or tlbsync instructions in either the old or the new partition,
  otherwise undefined behaviour can occur.

* The PMU has 8 counters (PMC registers) rather than 6.

* The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.

* The SLB has 64 entries rather than 32.

* There is no mediated external interrupt facility, so if we switch to
  a guest that has a virtual external interrupt pending but the guest
  has MSR[EE] = 0, we have to arrange to have an interrupt pending for
  it so that we can get control back once it re-enables interrupts.  We
  do that by sending ourselves an IPI with smp_send_reschedule after
  hard-disabling interrupts.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:59 +03:00
Paul Mackerras
969391c58a powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits
This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
indicate that we have a usable hypervisor mode, and another to indicate
that the processor conforms to PowerISA version 2.06.  We also add
another bit to indicate that the processor conforms to ISA version 2.01
and set that for PPC970 and derivatives.

Some PPC970 chips (specifically those in Apple machines) have a
hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
is not useful in the sense that there is no way to run any code in
supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
bits in HID4 are always 0, and we use that as a way of detecting that
hypervisor mode is not useful.

Where we have a feature section in assembly code around code that
only applies on POWER7 in hypervisor mode, we use a construct like

END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)

The definition of END_FTR_SECTION_IFSET is such that the code will
be enabled (not overwritten with nops) only if all bits in the
provided mask are set.

Note that the CPU feature check in __tlbie() only needs to check the
ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
if we are running bare-metal, i.e. in hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:58 +03:00
Paul Mackerras
aa04b4cc5b KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests
This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility.  These processors require a physically
contiguous, aligned area of memory for each guest.  When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access.  The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.

Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator.  The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.

KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs.  The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.

This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA.  It
also returns the size of the RMA in the argument structure.

Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
ioctl calls from userspace.  To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory.  Subsequently we will get rid of this
array and use memory associated with each memslot instead.

This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region.  Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB.  However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.

Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest.  This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:57 +03:00
Paul Mackerras
371fefd6f2 KVM: PPC: Allow book3s_hv guests to use SMT processor modes
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7.  The host still has to run single-threaded.

This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability.  The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.

To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode.  KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline).  To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c.  In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it.  Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.

When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host.  This number is exported
to userspace via the KVM_CAP_PPC_SMT capability.  If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.

We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host.  We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked.  This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.

When a vcore starts to run, it executes in the context of one of the
vcpu threads.  The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).

It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running.  In that case it can start to run immediately as long as
the none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest.  It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.

Note that there is no fixed relationship between the hardware thread
number and the vcpu number.  Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:57 +03:00
Paul Mackerras
a8606e20e4 KVM: PPC: Handle some PAPR hcalls in the kernel
This adds the infrastructure for handling PAPR hcalls in the kernel,
either early in the guest exit path while we are still in real mode,
or later once the MMU has been turned back on and we are in the full
kernel context.  The advantage of handling hcalls in real mode if
possible is that we avoid two partition switches -- and this will
become more important when we support SMT4 guests, since a partition
switch means we have to pull all of the threads in the core out of
the guest.  The disadvantage is that we can only access the kernel
linear mapping, not anything vmalloced or ioremapped, since the MMU
is off.

This also adds code to handle the following hcalls in real mode:

H_ENTER       Add an HPTE to the hashed page table
H_REMOVE      Remove an HPTE from the hashed page table
H_READ        Read HPTEs from the hashed page table
H_PROTECT     Change the protection bits in an HPTE
H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table
H_SET_DABR    Set the data address breakpoint register

Plus code to handle the following hcalls in the kernel:

H_CEDE        Idle the vcpu until an interrupt or H_PROD hcall arrives
H_PROD        Wake up a ceded vcpu
H_REGISTER_VPA Register a virtual processor area (VPA)

The code that runs in real mode has to be in the base kernel, not in
the module, if KVM is compiled as a module.  The real-mode code can
only access the kernel linear mapping, not vmalloc or ioremap space.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:55 +03:00
Paul Mackerras
de56a948b9 KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode.  Using hypervisor mode means
that the guest can use the processor's supervisor mode.  That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host.  This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.

This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses.  That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification.  In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.

Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.

This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.

With the guest running in supervisor mode, most exceptions go straight
to the guest.  We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.

We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.

In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount.  Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.

The POWER7 processor has a restriction that all threads in a core have
to be in the same partition.  MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest.  At present we require the host and guest to run
in single-thread mode because of this hardware restriction.

This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management.  This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.

This also adds a few new exports needed by the book3s_hv code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:54 +03:00
Paul Mackerras
3c42bf8a71 KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu
There are several fields in struct kvmppc_book3s_shadow_vcpu that
temporarily store bits of host state while a guest is running,
rather than anything relating to the particular guest or vcpu.
This splits them out into a new kvmppc_host_state structure and
modifies the definitions in asm-offsets.c to suit.

On 32-bit, we have a kvmppc_host_state structure inside the
kvmppc_book3s_shadow_vcpu since the assembly code needs to be able
to get to them both with one pointer.  On 64-bit they are separate
fields in the PACA.  This means that on 64-bit we don't need to
copy the kvmppc_host_state in and out on vcpu load/unload, and
in future will mean that the book3s_hv code doesn't need a
shadow_vcpu struct in the PACA at all.  That does mean that we
have to be careful not to rely on any values persisting in the
hstate field of the paca across any point where we could block
or get preempted.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:53 +03:00
Paul Mackerras
923c53caea powerpc: Set up LPCR for running guest partitions
In hypervisor mode, the LPCR controls several aspects of guest
partitions, including virtual partition memory mode, and also controls
whether the hypervisor decrementer interrupts are enabled.  This sets
up LPCR at boot time so that guest partitions will use a virtual real
memory area (VRMA) composed of 16MB large pages, and hypervisor
decrementer interrupts are disabled.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:52 +03:00
Paul Mackerras
b01c8b54a1 powerpc, KVM: Rework KVM checks in first-level interrupt handlers
Instead of branching out-of-line with the DO_KVM macro to check if we
are in a KVM guest at the time of an interrupt, this moves the KVM
check inline in the first-level interrupt handlers.  This speeds up
the non-KVM case and makes sure that none of the interrupt handlers
are missing the check.

Because the first-level interrupt handlers are now larger, some things
had to be move out of line in exceptions-64s.S.

This all necessitated some minor changes to the interrupt entry code
in KVM.  This also streamlines the book3s_32 KVM test.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:48 +03:00
Liu Yu
dd9ebf1f94 KVM: PPC: e500: Add shadow PID support
Dynamically assign host PIDs to guest PIDs, splitting each guest PID into
multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS].  Use
both PID0 and PID1 so that the shadow PIDs for the right mode can be
selected, that correspond both to guest TID = zero and guest TID = guest
PID.

This allows us to significantly reduce the frequency of needing to
invalidate the entire TLB.  When the guest mode or PID changes, we just
update the host PID0/PID1.  And since the allocation of shadow PIDs is
global, multiple guests can share the TLB without conflict.

Note that KVM does not yet support the guest setting PID1 or PID2 to
a value other than zero.  This will need to be fixed for nested KVM
to work.  Until then, we enforce the requirement for guest PID1/PID2
to stay zero by failing the emulation if the guest tries to set them
to something else.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:39 +03:00
Scott Wood
4cd35f675b KVM: PPC: e500: Save/restore SPE state
This is done lazily.  The SPE save will be done only if the guest has
used SPE since the last preemption or heavyweight exit.  Restore will be
done only on demand, when enabling MSR_SPE in the shadow MSR, in response
to an SPE fault or mtmsr emulation.

For SPEFSCR, Linux already switches it on context switch (non-lazily), so
the only remaining bit is to save it between qemu and the guest.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:32 +03:00
Scott Wood
ecee273fc4 KVM: PPC: booke: use shadow_msr
Keep the guest MSR and the guest-mode true MSR separate, rather than
modifying the guest MSR on each guest entry to produce a true MSR.

Any bits which should be modified based on guest MSR must be explicitly
propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in
kvmppc_set_msr().

While we're modifying the guest entry code, reorder a few instructions
to bury some load latencies.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:32 +03:00