Commit Graph

338 Commits

Author SHA1 Message Date
Rusty Russell
bfc733a7a3 KVM: SVM: Make set_msr_interception more reliable
set_msr_interception() is used by svm to set up which MSRs should be
intercepted.  It can only fail if someone has changed the code to try
to intercept an MSR without updating the array of ranges.

The return value is ignored anyway: it should just BUG() if it doesn't
work.  (A build-time failure would be better, but that's tricky).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:22 +02:00
Rusty Russell
7e9d619d2a KVM: Cleanup mark_page_dirty
For some reason, mark_page_dirty open-codes __gfn_to_memslot().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:22 +02:00
Rusty Russell
fb76441649 KVM: Don't assign vcpu->cr3 if it's invalid: check first, set last
sSigned-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:22 +02:00
Yang, Sheng
002c7f7c32 KVM: VMX: Add cpu consistency check
All the physical CPUs on the board should support the same VMX feature
set.  Add check_processor_compatibility to kvm_arch_ops for the consistency
check.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:22 +02:00
Rusty Russell
39214915f5 KVM: kvm_vm_ioctl_get_dirty_log restore "nothing dirty" optimization
kvm_vm_ioctl_get_dirty_log scans bitmap to see it it's all zero, but
doesn't use that information.

Avi says:
	Looks like it was used to guard	kvm_mmu_slot_remove_write_access();
	optimizing the case where the guest just leaves the screen alone (which
	it usually does, especially in benchmarks).

	I'd rather reinstate that optimization.  See
	90cb0529dd where the damage was done.

It's pretty simple: if the bitmap is all zero, we don't need to do anything to
clean it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
b114b0804d KVM: Use alignment properties of vcpu to simplify FPU ops
Now we use a kmem cache for allocating vcpus, we can get the 16-byte
alignment required by fxsave & fxrstor instructions, and avoid
manually aligning the buffer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
c16f862d02 KVM: Use kmem cache for allocating vcpus
Avi wants the allocations of vcpus centralized again.  The easiest way
is to add a "size" arg to kvm_init_arch, and expose the thus-prepared
cache to the modules.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Laurent Vivier
e7d5d76cae KVM: Remove kvm_{read,write}_guest()
... in favor of the more general emulator_{read,write}_*.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Laurent Vivier
cebff02b11 KVM: Change the emulator_{read,write,cmpxchg}_* functions to take a vcpu
... instead of a x86_emulate_ctxt, so that other callers can use it easily.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
0e5017d4ae KVM: SVM: internal function name cleanup
Changes some svm.c internal function names:
1) io_adress -> io_address  (de-germanify the spelling)
2) kvm_reput_irq -> reput_irq  (it's not a generic kvm function)
3) kvm_do_inject_irq -> (it's not a generic kvm function)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
e756fc626d KVM: SVM: de-containization
container_of is wonderful, but not casting at all is better.  This
patch changes svm.c's internal functions to pass "struct vcpu_svm"
instead of "struct kvm_vcpu" and using container_of.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
3077c4513c KVM: Remove three magic numbers
There are several places where hardcoded numbers are used in place of
the easily-available constant, which is poor form.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
8b9cf98cc7 KVM: VMX: pass vcpu_vmx internally
container_of is wonderful, but not casting at all is better.  This
patch changes vmx.c's internal functions to pass "struct vcpu_vmx"
instead of "struct kvm_vcpu" and using container_of.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:21 +02:00
Rusty Russell
9bd01506ee KVM: fx_init() needs preemption disabled while it plays with the FPU state
Now that kvm generally runs with preemption enabled, we need to protect
the fpu intialization sequence.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Shaohua Li
11ec280471 KVM: Convert vm lock to a mutex
This allows the kvm mmu to perform sleepy operations, such as memory
allocation.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Avi Kivity
15ad71460d KVM: Use the scheduler preemption notifiers to make kvm preemptible
Current kvm disables preemption while the new virtualization registers are
in use.  This of course is not very good for latency sensitive workloads (one
use of virtualization is to offload user interface and other latency
insensitive stuff to a container, so that it is easier to analyze the
remaining workload).  This patch re-enables preemption for kvm; preemption
is now only disabled when switching the registers in and out, and during
the switch to guest mode and back.

Contains fixes from Shaohua Li <shaohua.li@intel.com>.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Jeff Dike
519ef35341 KVM: add hypercall nr to kvm_run
Add the hypercall number to kvm_run and initialize it.  This changes the ABI,
but as this particular ABI was unusable before this no users are affected.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Yang, Sheng
1c3d14fe0a KVM: VMX: Improve the method of writing vmcs control
Put cpu feature detecting part in hardware_setup, and stored the vmcs
condition in global variable for further check.

[glommer: fix for some i386-only machines not supporting CR8 load/store
 exiting]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Rusty Russell
fb3f0f51d9 KVM: Dynamically allocate vcpus
This patch converts the vcpus array in "struct kvm" to a pointer
array, and changes the "vcpu_create" and "vcpu_setup" hooks into one
"vcpu_create" call which does the allocation and initialization of the
vcpu (calling back into the kvm_vcpu_init core helper).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Gregory Haskins
a2fa3e9f52 KVM: Remove arch specific components from the general code
struct kvm_vcpu has vmx-specific members; remove them to a private structure.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Rusty Russell
c820c2aa27 KVM: load_pdptrs() cleanups
load_pdptrs can be handed an invalid cr3, and it should not oops.
This can happen because we injected #gp in set_cr3() after we set
vcpu->cr3 to the invalid value, or from kvm_vcpu_ioctl_set_sregs(), or
memory configuration changes after the guest did set_cr3().

We should also copy the pdpte array once, before checking and
assigning, otherwise an SMP guest can potentially alter the values
between the check and the set.

Finally one nitpick: ret = 1 should be done as late as possible: this
allows GCC to check for unset "ret" should the function change in
future.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:20 +02:00
Aurelien Jarno
3ccb8827fb KVM: Remove dead code in the cmpxchg instruction emulation
The writeback fixes (02c03a326a) let
some dead code in the cmpxchg instruction emulation. Remove it.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Yang, Sheng
62b3ffb8b3 KVM: VMX: Import some constants of vmcs from IA32 SDM
This patch mainly imports some constants and rename two exist constants
of vmcs according to IA32 SDM.

It also adds two constants to indicate Lock bit and Enable bit in
MSR_IA32_FEATURE_CONTROL, and replace the hardcode _5_ with these two
bits.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Shaohua Li
fe55188194 KVM: Move gfn_to_page out of kmap/unmap pairs
gfn_to_page might sleep with swap support. Move it out of the kmap calls.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Shaohua Li
9ae0448f53 KVM: Hoist kvm_mmu_reload() out of the critical section
vmx_cpu_run doesn't handle error correctly and kvm_mmu_reload might
sleep with mutex changes, so I move it above.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Rusty Russell
310bc76c2b KVM: Return if the pdptrs are invalid when the guest turns on PAE.
Don't fall through and turn on PAE in this case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Avi Kivity
394b6e5944 KVM: x86 emulator: fix faulty check for two-byte opcode
Right now, the bug is harmless as we never emulate one-byte 0xb6 or 0xb7.
But things may change.

Noted by the mysterious Gabriel C.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Avi Kivity
e3243452f4 KVM: x86 emulator: fix cmov for writeback changes
The writeback fixes (02c03a326a) broke
cmov emulation.  Fix.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Rusty Russell
7075bc816c KVM: Use standard CR8 flags, and fix TPR definition
Intel manual (and KVM definition) say the TPR is 4 bits wide.  Also fix
CR8_RESEVED_BITS typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Jeff Dike
8fc0d085f5 KVM: Set exit_reason to KVM_EXIT_MMIO where run->mmio is initialized.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:19 +02:00
Rusty Russell
9eb829ced8 KVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed header
Creating one's own BITMAP macro seems suboptimal: if we use manual
arithmetic in the one place exposed to userspace, we can use standard
macros elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
66aee91aaa KVM: Use standard CR4 flags, tighten checking
On this machine (Intel), writing to the CR4 bits 0x00000800 and
0x00001000 cause a GPF.  The Intel manual is a little unclear, but
AFIACT they're reserved, too.

Also fix spelling of CR4_RESEVED_BITS.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
f802a307cb KVM: Use standard CR3 flags, tighten checking
The kernel now has asm/cpu-features.h: use those macros instead of inventing
our own.

Also spell out definition of CR3_RESEVED_BITS, fix spelling and
tighten it for the non-PAE case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
707d92fa72 KVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.h
The kernel now has asm/cpu-features.h: use those macros instead of
inventing our own.

Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
9a2b85c620 KVM: Trivial: Avoid hardware_disable predeclaration
Don't pre-declare hardware_disable: shuffle the reboot hook down.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
dcc0766b22 KVM: Trivial: Comment spelling may escape grep
Speling error in comment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
1e3c5cb0d5 KVM: Trivial: Make decode_register() static
I have shied away from touching x86_emulate.c (it could definitely use
some love, but it is forked from the Xen code, and it would be more
productive to cross-merge fixes).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Rusty Russell
5eb549a085 KVM: Trivial: Remove unused struct cpu_user_regs declaration
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:18 +02:00
Eddie Dong
65619eb5a8 KVM: In-kernel string pio write support
Add string pio write support to support some version of Windows.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:17 +02:00
Qing He
dad3795d2b KVM: SMP: Add vcpu_id field in struct vcpu
This patch adds a `vcpu_id' field in `struct vcpu', so we can
differentiate BSP and APs without pointer comparison or arithmetic.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:17 +02:00
Nguyen Anh Quynh
cd0d913797 KVM: Fix *nopage() in kvm_main.c
*nopage() in kvm_main.c should only store the type of mmap() fault if
the pointers are not NULL. This patch fixes the problem.

Signed-off-by: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:17 +02:00
Avi Kivity
36a7409741 KVM: Fix virtualization menu help text
What guest drivers?

Cc: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-22 12:55:18 -07:00
Avi Kivity
22d95b1282 KVM: MMU: Fix rare oops on guest context switch
A guest context switch to an uncached cr3 can require allocation of
shadow pages, but we only recycle shadow pages in kvm_mmu_page_fault().

Move shadow page recycling to mmu_topup_memory_caches(), which is called
from both the page fault handler and from guest cr3 reload.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-14 13:59:55 -07:00
Avi Kivity
6ec8a856e4 KVM: Avoid calling smp_call_function_single() with interrupts disabled
When taking a cpu down, we need to hardware_disable() it.
Unfortunately, the CPU_DYING notifier is called with interrupts
disabled, which means we can't use smp_call_function_single().

Fortunately, the CPU_DYING notifier is always called on the dying cpu,
so we don't need to use the function at all and can simply call
hardware_disable() directly.

Tested-by: Paolo Ornati <ornati@fastwebnet.it>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-19 10:13:49 -07:00
Jan Engelhardt
06bfb7eb15 Add some help texts to recently-introduced kconfig items
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (edited MACINTOSH_DRIVERS per Geert Uytterhoeven's remark)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-18 09:52:50 -07:00
Avi Kivity
bac27d35cb KVM: x86 emulator: fix debug reg mov instructions
More fallout from the writeback fixes: debug register transfer
instructions do their own writeback and thus need to disable the general
writeback mechanism.

This fixes oopses and some guest failures on AMD machines (the Intel
variant decodes the instruction in hardware and thus does not need
emulation).

Cc: Alistair John Strachan <alistair@devzero.co.uk>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-06 17:54:41 -07:00
Aurelien Jarno
d37c855719 KVM: disable writeback for 0x0f 0x01 instructions.
0x0f 0x01 instructions (ie lgdt, lidt, smsw, lmsw and invlpg) does
not use writeback. This patch set no_wb=1 when emulating those
instructions.

This fixes a regression booting the FreeBSD kernel on AMD.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-25 14:31:27 +03:00
Avi Kivity
4c981b43d7 KVM: Fix removal of nx capability from guest cpuid
Testing the wrong bit caused kvm not to disable nx on the guest when it is
disabled on the host (an mmu optimization relies on the nx bits being the
same in the guest and host).

This allows Windows to boot when nx is disabled on te host (e.g. when
host pae is disabled).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-25 14:31:13 +03:00
Avi Kivity
7cfa4b0a43 Revert "KVM: Avoid useless memory write when possible"
This reverts commit a3c870bdce.  While it
does save useless updates, it (probably) defeats the fork detector, causing
a massive performance loss.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-25 14:30:56 +03:00
Rusty Russell
5e58cfe41c KVM: Fix unlikely kvm_create vs decache_vcpus_on_cpu race
We add the kvm to the vm_list before initializing the vcpu mutexes,
which can be mutex_trylock()'ed by decache_vcpus_on_cpu().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-25 14:29:34 +03:00