Commit Graph

41049 Commits

Author SHA1 Message Date
Linus Torvalds
ef26b1691d Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
  x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
  x86/alternatives: Check replacementlen <= instrlen at build time
  x86, 64-bit: Set data segments to null after switching to 64-bit mode
  x86: Clean up the loadsegment() macro
  x86: Optimize loadsegment()
  x86: Add missing might_fault() checks to copy_{to,from}_user()
  x86-64: __copy_from_user_inatomic() adjustments
  x86: Remove unused thread_return label from switch_to()
  x86, 64-bit: Fix bstep_iret jump
  x86: Don't use the strict copy checks when branch profiling is in use
  x86, 64-bit: Move K8 B step iret fixup to fault entry asm
  x86: Generate cmpxchg build failures
  x86: Add a Kconfig option to turn the copy_from_user warnings into errors
  x86: Turn the copy_from_user check into an (optional) compile time warning
  x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
2009-12-05 15:32:03 -08:00
Linus Torvalds
a77d2e081b Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
  x86, apic: Enable lapic nmi watchdog on AMD Family 11h
  x86: Remove unnecessary mdelay() from cpu_disable_common()
  x86, ioapic: Document another case when level irq is seen as an edge
  x86, ioapic: Fix the EOI register detection mechanism
  x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq
  x86: SGI UV: Map low MMR ranges
  x86: apic: Print out SRAT table APIC id in hex
  x86: Re-get cfg_new in case reuse/move irq_desc
  x86: apic: Remove not needed #ifdef
  x86: io-apic: IO-APIC MMIO should not fail on resource insertion
  x86: Remove asm/apicnum.h
  x86: apic: Do not use stacked physid_mask_t
  x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit
  x86, ioapic: Use snrpintf while set names for IO-APIC resourses
  x86, apic: Use PAGE_SIZE instead of numbers
  x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs()
  x86: Use EOI register in io-apic on intel platforms
  x86: Force irq complete move during cpu offline
  x86: Remove move_cleanup_count from irq_cfg
  x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping
  ...
2009-12-05 15:31:25 -08:00
Linus Torvalds
c3fa27d136 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits)
  x86: Fix comments of register/stack access functions
  perf tools: Replace %m with %a in sscanf
  hw-breakpoints: Keep track of user disabled breakpoints
  tracing/syscalls: Make syscall events print callbacks static
  tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook
  perf: Don't free perf_mmap_data until work has been done
  perf_event: Fix compile error
  perf tools: Fix _GNU_SOURCE macro related strndup() build error
  trace_syscalls: Remove unused syscall_name_to_nr()
  trace_syscalls: Simplify syscall profile
  trace_syscalls: Remove duplicate init_enter_##sname()
  trace_syscalls: Add syscall_nr field to struct syscall_metadata
  trace_syscalls: Remove enter_id exit_id
  trace_syscalls: Set event_enter_##sname->data to its metadata
  trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
  perf_event: Initialize data.period in perf_swevent_hrtimer()
  perf probe: Simplify event naming
  perf probe: Add --list option for listing current probe events
  perf probe: Add argv_split() from lib/argv_split.c
  perf probe: Move probe event utility functions to probe-event.c
  ...
2009-12-05 15:30:21 -08:00
David S. Miller
28b4d5cc17 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/net/pcmcia/fmvj18x_cs.c
	drivers/net/pcmcia/nmclan_cs.c
	drivers/net/pcmcia/xirc2ps_cs.c
	drivers/net/wireless/ray_cs.c
2009-12-05 15:22:26 -08:00
Linus Torvalds
96fa2b508d Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
  tracing: Separate raw syscall from syscall tracer
  ring-buffer-benchmark: Add parameters to set produce/consumer priorities
  tracing, function tracer: Clean up strstrip() usage
  ring-buffer benchmark: Run producer/consumer threads at nice +19
  tracing: Remove the stale include/trace/power.h
  tracing: Only print objcopy version warning once from recordmcount
  tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
  ring-buffer: Move access to commit_page up into function used
  tracing: do not disable interrupts for trace_clock_local
  ring-buffer: Add multiple iterations between benchmark timestamps
  kprobes: Sanitize struct kretprobe_instance allocations
  tracing: Fix to use __always_unused attribute
  compiler: Introduce __always_unused
  tracing: Exit with error if a weak function is used in recordmcount.pl
  tracing: Move conditional into update_funcs() in recordmcount.pl
  tracing: Add regex for weak functions in recordmcount.pl
  tracing: Move mcount section search to front of loop in recordmcount.pl
  tracing: Fix objcopy revision check in recordmcount.pl
  tracing: Check absolute path of input file in recordmcount.pl
  tracing: Correct the check for number of arguments in recordmcount.pl
  ...
2009-12-05 09:53:36 -08:00
Linus Torvalds
3e72b810e3 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mutex: Fix missing conditions to build mutex_spin_on_owner()
  mutex: Better control mutex adaptive spinning config
  locking, task_struct: Reduce size on TRACE_IRQFLAGS and 64bit
  locking: Use __[SPIN|RW]_LOCK_UNLOCKED in [spin|rw]_lock_init()
  locking: Remove unused prototype
  locking: Reduce ifdefs in kernel/spinlock.c
  locking: Make inlining decision Kconfig based
2009-12-05 09:49:59 -08:00
Linus Torvalds
7b626acb8f Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
  x86/amd-iommu: Remove amd_iommu_pd_table
  x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
  x86/amd-iommu: Cleanup DTE flushing code
  x86/amd-iommu: Introduce iommu_flush_device() function
  x86/amd-iommu: Cleanup attach/detach_device code
  x86/amd-iommu: Keep devices per domain in a list
  x86/amd-iommu: Add device bind reference counting
  x86/amd-iommu: Use dev->arch->iommu to store iommu related information
  x86/amd-iommu: Remove support for domain sharing
  x86/amd-iommu: Rearrange dma_ops related functions
  x86/amd-iommu: Move some pte allocation functions in the right section
  x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
  x86/amd-iommu: Use get_device_id and check_device where appropriate
  x86/amd-iommu: Move find_protection_domain to helper functions
  x86/amd-iommu: Simplify get_device_resources()
  x86/amd-iommu: Let domain_for_device handle aliases
  x86/amd-iommu: Remove iommu specific handling from dma_ops path
  x86/amd-iommu: Remove iommu parameter from __(un)map_single
  x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
  ...
2009-12-05 09:49:07 -08:00
David Daney
27d16d0871 avr32: Convert BUG() to use unreachable()
Use the new unreachable() macro instead of for(;;);

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-05 09:10:12 -08:00
David Daney
5506e68975 s390: Convert BUG() to use unreachable()
Use the new unreachable() macro instead of for(;;);

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux390@de.ibm.com
CC: linux-s390@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-05 09:10:12 -08:00
David Daney
4ef5651e85 MIPS: Convert BUG() to use unreachable()
Use the new unreachable() macro instead of while(1);

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
CC: linux-mips@linux-mips.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-05 09:10:12 -08:00
David Daney
a5fc5eba4d x86: Convert BUG() to use unreachable()
Use the new unreachable() macro instead of for(;;);.  When
allyesconfig is built with a GCC-4.5 snapshot on i686 the size of the
text segment is reduced by 3987 bytes (from 6827019 to 6823032).

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-05 09:10:12 -08:00
Russell King
0719dc3413 Merge branch 'devel-stable' into devel 2009-12-05 10:35:33 +00:00
Russell King
e28edb723e Merge branches 'at91', 'ep93xx', 'etm', 'ks8695', 'nuc', 'u300' and 'u8500' into devel 2009-12-05 10:35:18 +00:00
Andreas Schwab
f195e2bff3 m68k: ptrace fixes
This fixes the following issues in ptrace:

- when single stepping into the signal handler stop at the first insn of
  the handler
- handle non-zero stkadj when accessing pc and sr in ptregs
- correctly handle PT_SR in PTRACE_POKEUSR
- report -EIO when trying to read unknown offset in PTRACE_PEEKUSR

Additionally, the handling of the special case that PT_SR accesses a 16
bit word instead of a 32 bit word has been moved into get_reg/put_reg.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2009-12-04 21:22:35 +01:00
Andreas Schwab
faa47b4669 m68k: use generic code for ptrace requests
Remove all but PTRACE_{PEEK,POKE}USR and PTRACE_{GET,SET}{REGS,FPREGS}
from arch_ptrace and let the rest be handled by generic code.  Define
PTRACE_SINGLEBLOCK to enable singleblock tracing.
[Geert] Not yet applicable for m68knommu

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2009-12-04 21:22:35 +01:00
Russell King
677f4f64e4 Merge branch 'for-rmk' of git://git.pengutronix.de/git/imx/linux-2.6 into devel-stable 2009-12-04 17:34:50 +00:00
Russell King
4567c4a896 Merge branch 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 into devel-stable 2009-12-04 17:34:16 +00:00
Russell King
602fd7c367 Merge branch 'for-rmk' of git://git.fluff.org/bjdooks/linux into devel-stable
Conflicts:
	arch/arm/Kconfig
2009-12-04 17:33:54 +00:00
Takashi Iwai
baf9226667 Merge branch 'topic/asoc' into for-linus 2009-12-04 16:22:41 +01:00
Russell King
2fc42814d8 Merge branch 'pending-dma-streaming' (early part) into devel 2009-12-04 15:00:11 +00:00
Russell King
c6baa1963c Merge branch 'pending-dma-coherent' into devel 2009-12-04 15:00:00 +00:00
Russell King
5cb2faa6ed Merge branch 'pending-misc' (early part) into devel 2009-12-04 14:59:47 +00:00
Russell King
6060e8df51 ARM: I-cache: flush executable mappings in flush_cache_range()
Dirk Behme reported instability on ARM11 SMP (VIPT non-aliasing cache)
caused by the dynamic linker changing protection on text pages to write
GOT entries.  The problem is due to an interaction between the write
faulting code providing new anonymous pages which are incoherent with
the I-cache due to write buffering, and the I-cache not having been
invalidated.

a4db94d plugs the hole with the data cache coherency.  This patch
provides the other half of the fix by flushing the I-cache in
flush_cache_range() for VM_EXEC VMAs (which is what we have when the
region is being made executable again.)  This ensures that the I-cache
will be up to date with the newly COW'd pages.

Note: if users are writing instructions, then they still need to use
the ARM sys_cacheflush API to ensure that the caches are correctly
synchronized.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:51 +00:00
Russell King
ea201dbb78 ARM: I-cache: avoid flushing in flush_cache_mm()
flush_cache_mm() is called in two cases:
1. when a process exits, just before the page tables are torn down.
   We can allow the stale lines to evict themselves over time without
   causing any harm.

2. when a process forks, and we've allocated a new ASID.
   The instruction cache issues are dealt with as pages are brought
   into the new process address space.  Flushing the I-cache here is
   therefore unnecessary.

However, we must keep the VIPT aliasing D-cache flush to ensure that
any dirty cache lines are not written back after the pages have been
reallocated for some other use - which would result in corruption.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:51 +00:00
Russell King
9e95922b10 ARM: I-cache: Add invalidation for VIVT ASID tagged caches
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:51 +00:00
Catalin Marinas
115b22474e ARM: 5794/1: Flush the D-cache during copy_user_highpage()
The I and D caches for copy-on-write pages on processors with
write-allocate caches become incoherent causing problems on application
relying on CoW for text pages (dynamic linker relocating symbols in a
text page). This patch flushes the D-cache for such pages.

Cc: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:50 +00:00
Russell King
f91fb05d82 ARM: Remove __flush_icache_all() from __flush_dcache_page()
Both call sites for __flush_dcache_page() end up calling
__flush_icache_all() themselves, so having __flush_dcache_page() do
this as well is wasteful.  Remove the duplicated icache flushing.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:50 +00:00
Russell King
2df341edf6 ARM: Move __flush_icache_all() out of flush_pfn_alias()
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:50 +00:00
Russell King
7b0a1003e7 ARM: Reduce __flush_dcache_page() visibility
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-04 14:58:50 +00:00
Robert Schwebel
fedea672a3 mx31moboard: fix typo
Currently, linux-next breaks due to a typo introduced in commit
33c4d91928

This patch fixes it.

Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Cc: Valentin Longchamp <valentin.longchamp@epfl.ch>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2009-12-04 13:23:41 +01:00
Mark Brown
43f0de8d02 S3C64XX: Staticise platform data for PCM devices
The symbols aren't declared and don't need to be exported, they go
along with the device structure.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Ben Dooks <ben-linux@fluff.org>
2009-12-04 10:46:08 +00:00
Mark Brown
88d27041cf ARM: S3C6410: Correct names of IISv4 data output pin definitions
The naming of the defines suggests that there are three IISv4 ports
with one data line each when in fact there is a single IISv4 port
with three data lines.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2009-12-03 21:58:10 +00:00
Ben Dooks
009f742bde ARM: Merge next-s3c64xx-updates
Merge branch 'next-s3c64xx-updates' into for-rmk

Conflicts:

	arch/arm/plat-s3c/dev-hsmmc2.c
	arch/arm/plat-s3c/include/plat/sdhci.h
2009-12-03 21:53:10 +00:00
Ben Dooks
f18ea8276b ARM: Merge next-s3c24xx-dev-rtp
Merge branch 'next-s3c24xx-dev-rtp' into for-rmk

Conflicts:

	arch/arm/mach-s3c2440/mach-anubis.c
2009-12-03 21:33:01 +00:00
Ben Dooks
3d4db84cee ARM: Merge next-s3c24xx-simtec
Merge branch 'next-s3c24xx-simtec' into for-rmk
2009-12-03 21:31:20 +00:00
Srinidhi Kasagar
48371cd3f4 ARM: 5845/1: l2x0: check whether l2x0 already enabled
If running in non-secure mode accessing
some registers of l2x0 will fault. So
check if l2x0 is already enabled, if so
do not access those secure registers.

Signed-off-by: srinidhi kasagar <srinidhi.kasagar@stericsson.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-03 19:42:30 +00:00
Leo Chen
1f739d7643 ARM: 5792/1: bcmring: clean up mach/io.h
removed old macro definition for io access, using
the generic macros defined in asm/io.h

Signed-off-by: Leo Hao Chen <leochen@broadcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-12-03 19:42:30 +00:00
Ingo Molnar
d103d01e4b Merge branch 'perf/probes' into perf/core
Merge reason: add these fixes to 'perf probe'.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 20:11:38 +01:00
Ingo Molnar
26fb20d008 Merge branch 'perf/mce' into perf/core
Merge reason: It's ready for v2.6.33.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 20:11:06 +01:00
Mikael Pettersson
7d1849aff6 x86, apic: Enable lapic nmi watchdog on AMD Family 11h
The x86 lapic nmi watchdog does not recognize AMD Family 11h,
resulting in:

  NMI watchdog: CPU not supported

As far as I can see from available documentation (the BKDM),
family 11h looks identical to family 10h as far as the PMU
is concerned.

Extending the check to accept family 11h results in:

  Testing NMI watchdog ... OK.

I've been running with this change on a Turion X2 Ultra ZM-82
laptop for a couple of weeks now without problems.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
LKML-Reference: <19223.53436.931768.278021@pilspetsen.it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 16:25:15 +01:00
Jens Axboe
220d0b1dbf Merge branch 'master' into for-2.6.33 2009-12-03 13:49:39 +01:00
Darrick J. Wong
4528752f49 x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
On a multi-node x3950M2 system, there's a slight oddity in the
PCI device tree for all secondary nodes:

 30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
     \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

...as compared to the primary node:

 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
 03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
  \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

In both nodes, the LSI RAID controller hangs off a CalIOC2
device, but on the secondary nodes, the BIOS hides the VGA
device and substitutes the device tree ending with the disk
controller.

It would seem that Calgary devices don't necessarily appear at
the top of the PCI tree, which means that the current code to
find the Calgary IOMMU that goes with a particular device is
buggy.

Rather than walk all the way to the top of the PCI
device tree and try to match bus number with Calgary descriptor,
the code needs to examine each parent of the particular device;
if it encounters a Calgary with a matching bus number, simply
use that.

Otherwise, we BUG() when the bus number of the Calgary doesn't
match the bus number of whatever's at the top of the device tree.

Extra note: This patch appears to work correctly for the x3950
that came before the x3950 M2.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Corinna Schultz <coschult@us.ibm.com>
Cc: <stable@kernel.org>
LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 11:44:05 +01:00
Avi Kivity
d5696725b2 KVM: VMX: Fix comparison of guest efer with stale host value
update_transition_efer() masks out some efer bits when deciding whether
to switch the msr during guest entry; for example, NX is emulated using the
mmu so we don't need to disable it, and LMA/LME are handled by the hardware.

However, with shared msrs, the comparison is made against a stale value;
at the time of the guest switch we may be running with another guest's efer.

Fix by deferring the mask/compare to the actual point of guest entry.

Noted by Marcelo.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:34:20 +02:00
Carsten Otte
f50146bd7b KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
This patch corrects the checking of the new address for the prefix register.
On s390, the prefix register is used to address the cpu's lowcore (address
0...8k). This check is supposed to verify that the memory is readable and
present.
copy_from_guest is a helper function, that can be used to read from guest
memory. It applies prefixing, adds the start address of the guest memory in
user, and then calls copy_from_user. Previous code was obviously broken for
two reasons:
- prefixing should not be applied here. The current prefix register is
  going to be updated soon, and the address we're looking for will be
  0..8k after we've updated the register
- we're adding the guest origin (gmsor) twice: once in subject code
  and once in copy_from_guest

With kuli, we did not hit this problem because (a) we were lucky with
previous prefix register content, and (b) our guest memory was mmaped
very low into user address space.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:26 +02:00
Avi Kivity
3548bab501 KVM: Drop user return notifier when disabling virtualization on a cpu
This way, we don't leave a dangling notifier on cpu hotunplug or module
unload.  In particular, module unload leaves the notifier pointing into
freed memory.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:26 +02:00
Sheng Yang
046d87103a KVM: VMX: Disable unrestricted guest when EPT disabled
Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
supported processors.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity
eb3c79e64a KVM: x86 emulator: limit instructions to 15 bytes
While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Carsten Otte
d7b0b5eb30 KVM: s390: Make psw available on all exits, not just a subset
This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.

The userspace ABI is broken by this, however there are no applications
in the wild using this.  A capability check is provided so users can
verify the updated API exists.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Jan Kiszka
3cfc3092f4 KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity
65ac726404 KVM: VMX: Report unexpected simultaneous exceptions as internal errors
These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults.  If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Avi Kivity
a9c7399d6c KVM: Allow internal errors reported to userspace to carry extra data
Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available.  Add extra data to internal error
reporting so we can expose it to the debugger.  Extra data is specific
to the suberror.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Jan Kiszka
4f926bf291 KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance,
avoid triggering the WARN_ON in kvm_queue_exception if there is already
an exception pending and reject such invalid requests.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:24 +02:00
Marcelo Tosatti
2204ae3c96 KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
Otherwise kvm might attempt to dereference a NULL pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti
3ddea128ad KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
Also serialize multiple accesses with kvm->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Avi Kivity
92c0d90015 KVM: VMX: Remove vmx->msr_offset_efer
This variable is used to communicate between a caller and a callee; switch
to a function argument instead.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti
5f5c35aad5 KVM: MMU: update invlpg handler comment
Large page translations are always synchronized (either in level 3
or level 2), so its not necessary to properly deal with them
in the invlpg handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti
7c93be44a4 KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.

Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Gleb Natapov
1655e3a3dc KVM: remove duplicated task_switch check
Probably introduced by a bad merge.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Hollis Blanchard
c0a187e12d KVM: powerpc: Fix BUILD_BUG_ON condition
The old BUILD_BUG_ON implementation didn't work with __builtin_constant_p().
Fixing that revealed this test had been inverted for a long time without
anybody noticing...

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Avi Kivity
26bb0981b3 KVM: VMX: Use shared msr infrastructure
Instead of reloading syscall MSRs on every preemption, use the new shared
msr infrastructure to reload them at the last possible minute (just before
exit to userspace).

Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be
reloaded as well).

[jan: fix slot index missing indirection]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Avi Kivity
18863bdd60 KVM: x86 shared msr infrastructure
The various syscall-related MSRs are fairly expensive to switch.  Currently
we switch them on every vcpu preemption, which is far too often:

- if we're switching to a kernel thread (idle task, threaded interrupt,
  kernel-mode virtio server (vhost-net), for example) and back, then
  there's no need to switch those MSRs since kernel threasd won't
  be exiting to userspace.

- if we're switching to another guest running an identical OS, most likely
  those MSRs will have the same value, so there's little point in reloading
  them.

- if we're running the same OS on the guest and host, the MSRs will have
  identical values and reloading is unnecessary.

This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Avi Kivity
44ea2b1758 KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area
Currently MSR_KERNEL_GS_BASE is saved and restored as part of the
guest/host msr reloading.  Since we wish to lazy-restore all the other
msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using
the common code.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Eduardo Habkost
3ce672d484 KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization
The svm_set_cr0() call will initialize save->cr0 properly even when npt is
enabled, clearing the NW and CD bits as expected, so we don't need to
initialize it manually for npt_enabled anymore.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Eduardo Habkost
18fa000ae4 KVM: SVM: Reset cr0 properly on vcpu reset
svm_vcpu_reset() was not properly resetting the contents of the guest-visible
cr0 register, causing the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=525699

Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
with paging enabled, making the vcpu get a pagefault exception while trying to
run it.

Instead of setting vmcb->save.cr0 directly, the new code just resets
kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().

kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Eduardo Habkost
fa40052ca0 KVM: VMX: Use macros instead of hex value on cr0 initialization
This should have no effect, it is just to make the code clearer.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Glauber Costa
afbcf7ab8d KVM: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:19 +02:00
Jan Kiszka
6be7d3062b KVM: SVM: Cleanup NMI singlestep
Push the NMI-related singlestep variable into vcpu_svm. It's dealing
with an AMD-specific deficit, nothing generic for x86.

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

 arch/x86/include/asm/kvm_host.h |    1 -
 arch/x86/kvm/svm.c              |   12 +++++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:19 +02:00
Jan Kiszka
94fe45da48 KVM: x86: Fix guest single-stepping while interruptible
Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.

Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:19 +02:00
Ed Swierk
ffde22ac53 KVM: Xen PV-on-HVM guest support
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:18 +02:00
Jan Kiszka
94c30d9ca6 KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check
This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Marcelo Tosatti
9fb41ba896 KVM: VMX: fix handle_pause declaration
There's no kvm_run argument anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Zachary Amsden
6b7d7e762b KVM: x86: Harden against cpufreq
If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fallback to the measured TSC khz.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Mark Langsdorf
565d0998ec KVM: SVM: Support Pause Filter in AMD processors
New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature.  This feature creates a new field in the VMCB
called Pause Filter Count.  If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting.  When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.

This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.

Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand.  Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID
bit.

On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm thati caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Zhai, Edwin
4b8d54f972 KVM: VMX: Add support for Pause-Loop Exiting
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time, between this execution of PAUSE and previous one, exceeds the
PLE_Gap, processor consider this PAUSE belongs to a new loop.
Otherwise, processor determins the the total execution time of this loop(since
1st PAUSE in this loop), and triggers a VM exit if total time exceeds the
PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out after hold a spinlock, then other VPs for same lock are sched-in
to waste the CPU time.

Our tests indicate that most spinlocks are held for less than 212 cycles.
Performance tests show that with 2X LP over-commitment we can get +2% perf
improvement for kernel build(Even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Joerg Roedel
d36f19e9ec KVM: SVM: Remove nsvm_printk debugging code
With all important informations now delivered through
tracepoints we can savely remove the nsvm_printk debugging
code for nested svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Joerg Roedel
532a46b989 KVM: SVM: Add tracepoint for skinit instruction
This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extenstion not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel
ec1ff79084 KVM: SVM: Add tracepoint for invlpga instruction
This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel
236649de33 KVM: SVM: Add tracepoint for #vmexit because intr pending
This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel
17897f3668 KVM: SVM: Add tracepoint for injected #vmexit
This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel
d8cabddf7e KVM: SVM: Add tracepoint for nested #vmexit
This patch adds a tracepoint for every #vmexit we get from a
nested guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel
0ac406de8f KVM: SVM: Add tracepoint for nested vmrun
This patch adds a dedicated kvm tracepoint for a nested
vmrun.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel
cd3ff653ae KVM: SVM: Move INTR vmexit out of atomic code
The nested SVM code emulates a #vmexit caused by a request
to open the irq window right in the request function. This
is a bug because the request function runs with preemption
and interrupts disabled but the #vmexit emulation might
sleep. This can cause a schedule()-while-atomic bug and is
fixed with this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Alexander Graf
8d23c46624 KVM: SVM: Notify nested hypervisor of lost event injections
If event_inj is valid on a #vmexit the host CPU would write
the contents to exit_int_info, so the hypervisor knows that
the event wasn't injected.

We don't do this in nested SVM by now which is a bug and
fixed by this patch.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:14 +02:00
Glauber Costa
e3267cbbbf KVM: x86: include pvclock MSRs in msrs_to_save
For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves then to the beginning of the list, and skip testing them.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:14 +02:00
Jan Kiszka
91586a3b7d KVM: x86: Rework guest single-step flag injection and filtering
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Marcelo Tosatti
a68a6a7282 KVM: x86: disable paravirt mmu reporting
Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Jan Kiszka
355be0b930 KVM: x86: Refactor guest debug IOCTL handling
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Juan Quintela
201d945bcf KVM: remove pre_task_link setting in save_state_to_tss16
Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37a1
  Author: Gleb Natapov <gleb@redhat.com>
  Date:   Mon Mar 30 16:03:24 2009 +0300

    KVM: Fix task switch back link handling.

CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:13 +02:00
Zachary Amsden
3230bb4707 KVM: Fix hotplug of CPUs
Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.

Backend was not allocating enough structure for all possible CPUs, so
new CPUs coming online could not be hardware enabled.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:13 +02:00
Zachary Amsden
e6732a5af9 KVM: Fix printk name error in svm.c
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:13 +02:00
Zachary Amsden
0cca790753 KVM: Kill the confusing tsc_ref_khz and ref_freq variables
They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.

On such machines, when a new CPU online is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Zachary Amsden
b820cc0ca2 KVM: Separate timer intialization into an indepedent function
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Joerg Roedel
e935d48e1b KVM: SVM: Remove remaining occurences of rdtscll
This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Joerg Roedel
33527ad7e1 KVM: SVM: don't copy exit_int_info on nested vmrun
The exit_int_info field is only written by the hardware and
never read. So it does not need to be copied on a vmrun
emulation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:11 +02:00
Joerg Roedel
7fcdb5103d KVM: SVM: reorganize svm_interrupt_allowed
This patch reorganizes the logic in svm_interrupt_allowed to
make it better to read. This is important because the logic
is a lot more complicated with Nested SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:11 +02:00
Huang Weiyi
bfc33beaed KVM: remove duplicated #include
Remove duplicated #include('s) in
  arch/x86/kvm/lapic.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:10 +02:00
Alexander Graf
10474ae894 KVM: Activate Virtualization On Demand
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Marcelo Tosatti
e8b3433a5c KVM: SVM: remove needless mmap_sem acquision from nested_svm_map
nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Mohammed Gamal
80ced186d1 KVM: VMX: Enhance invalid guest state emulation
- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:09 +02:00
Mohammed Gamal
abcf14b560 KVM: x86 emulator: Add pusha and popa instructions
This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting
MINIX with invalid guest state emulation on.

[marcelo: remove unused variable]

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:09 +02:00