Commit Graph

12335 Commits

Author SHA1 Message Date
Andrea Arcangeli
64cc6ae001 thp: bail out gup_fast on splitting pmd
Force gup_fast to take the slow path and block if the pmd is splitting,
not only if it's none.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00
Andrea Arcangeli
db3eb96f4e thp: add pmd mangling functions to x86
Add needed pmd mangling functions with symmetry with their pte
counterparts.  pmdp_splitting_flush() is the only new addition on the pmd_
methods and it's needed to serialize the VM against split_huge_page.  It
simply atomically sets the splitting bit in a similar way
pmdp_clear_flush_young atomically clears the accessed bit.
pmdp_splitting_flush() also has to flush the tlb to make it effective
against gup_fast, but it wouldn't really require to flush the tlb too.
Just the tlb flush is the simplest operation we can invoke to serialize
pmdp_splitting_flush() against gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00
Andrea Arcangeli
5f6e8da70a thp: special pmd_trans_* functions
These returns 0 at compile time when the config option is disabled, to
allow gcc to eliminate the transparent hugepage function calls at compile
time without additional #ifdefs (only the export of those functions have
to be visible to gcc but they won't be required at link time and
huge_memory.o can be not built at all).

_PAGE_BIT_UNUSED1 is never used for pmd, only on pte.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00
Andrea Arcangeli
2609ae6d10 thp: no paravirt version of pmd ops
No paravirt version of set_pmd_at/pmd_update/pmd_update_defer.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:39 -08:00
Andrea Arcangeli
331127f799 thp: add pmd paravirt ops
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at.  Not all might be
necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs
pmd_update_defer), but this is to keep full simmetry with pte paravirt
ops, which looks cleaner and simpler from a common code POV.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:39 -08:00
Andrea Arcangeli
0a47de52db thp: add native_set_pmd_at
Used by paravirt and not paravirt set_pmd_at.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:39 -08:00
Andrea Arcangeli
9180706344 thp: alter compound get_page/put_page
Alter compound get_page/put_page to keep references on subpages too, in
order to allow __split_huge_page_refcount to split an hugepage even while
subpages have been pinned by one of the get_user_pages() variants.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:39 -08:00
David Rientjes
d0a21265df mm: unify module_alloc code for vmalloc
Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
module_init().  Much of the code is duplicated and can be generalized in a
globally accessible function, __vmalloc_node_range().

__vmalloc_node() now calls into __vmalloc_node_range() with a range of
[VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.

Each architecture may then use __vmalloc_node_range() directly to remove
the duplication of code.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:34 -08:00
Linus Torvalds
27d189c02b Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
  hwrng: via_rng - Fix memory scribbling on some CPUs
  crypto: padlock - Move padlock.h into include/crypto
  hwrng: via_rng - Fix asm constraints
  crypto: n2 - use __devexit not __exit in n2_unregister_algs
  crypto: mark crypto workqueues CPU_INTENSIVE
  crypto: mv_cesa - dont return PTR_ERR() of wrong pointer
  crypto: ripemd - Set module author and update email address
  crypto: omap-sham - backlog handling fix
  crypto: gf128mul - Remove experimental tag
  crypto: af_alg - fix af_alg memory_allocated data type
  crypto: aesni-intel - Fixed build with binutils 2.16
  crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets
  net: Add missing lockdep class names for af_alg
  include: Install linux/if_alg.h for user-space crypto API
  crypto: omap-aes - checkpatch --file warning fixes
  crypto: omap-aes - initialize aes module once per request
  crypto: omap-aes - unnecessary code removed
  crypto: omap-aes - error handling implementation improved
  crypto: omap-aes - redundant locking is removed
  crypto: omap-aes - DMA initialization fixes for OMAP off mode
  ...
2011-01-13 10:25:58 -08:00
Linus Torvalds
e691d24e9c Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: Speed up device tree creation during boot
  x86, olpc: Add OLPC device-tree support
  x86, of: Define irq functions to allow drivers/of/* to build on x86
2011-01-13 10:15:12 -08:00
Linus Torvalds
55065bc527 Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: Initialize fpu state in preemptible context
  KVM: VMX: when entering real mode align segment base to 16 bytes
  KVM: MMU: handle 'map_writable' in set_spte() function
  KVM: MMU: audit: allow audit more guests at the same time
  KVM: Fetch guest cr3 from hardware on demand
  KVM: Replace reads of vcpu->arch.cr3 by an accessor
  KVM: MMU: only write protect mappings at pagetable level
  KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
  KVM: MMU: Initialize base_role for tdp mmus
  KVM: VMX: Optimize atomic EFER load
  KVM: VMX: Add definitions for more vm entry/exit control bits
  KVM: SVM: copy instruction bytes from VMCB
  KVM: SVM: implement enhanced INVLPG intercept
  KVM: SVM: enhance mov DR intercept handler
  KVM: SVM: enhance MOV CR intercept handler
  KVM: SVM: add new SVM feature bit names
  KVM: cleanup emulate_instruction
  KVM: move complete_insn_gp() into x86.c
  KVM: x86: fix CR8 handling
  KVM guest: Fix kvm clock initialization when it's configured out
  ...
2011-01-13 10:14:24 -08:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Lasse Collin
303148045a x86: support XZ-compressed kernel
This integrates the XZ decompression code to the x86 pre-boot code.

mkpiggy.c is updated to reserve about 32 KiB more buffer safety margin for
kernel decompression.  It is done unconditionally for all decompressors to
keep the code simpler.

The XZ decompressor needs around 30 KiB of heap, so the heap size is
increased to 32 KiB on both x86-32 and x86-64.

Documentation/x86/boot.txt is updated to list the XZ magic number.

With the x86 BCJ filter in XZ, XZ-compressed x86 kernel tends to be a few
percent smaller than the equivalent LZMA-compressed kernel.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:25 -08:00
Andres Salomon
7637c9259f drivers/staging/olpc_dcon: convert to new cs5535 gpio API
Drop the old geode_gpio crud, as well as the raw outl() calls; instead,
use the Linux GPIO API where possible, and the cs5535_gpio API in other
places.

Note that we don't actually clean up the driver properly yet (once loaded,
it always remains loaded).  That'll come later..

This patch is necessary for building the driver.

Signed-off-by: Andres Salomon <dilinger@queued.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:13 -08:00
Amerigo Wang
351f8f8e64 kernel: clean up USE_GENERIC_SMP_HELPERS
For arch which needs USE_GENERIC_SMP_HELPERS, it has to select
USE_GENERIC_SMP_HELPERS, rather than leaving a choice to user, since they
don't provide their own implementions.

Also, move on_each_cpu() to kernel/smp.c, it is strange to put it in
kernel/softirq.c.

For arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g.  blackfin, only
on_each_cpu() is compiled.

Signed-off-by: Amerigo Wang <amwang@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:08 -08:00
Stephen Hemminger
3e5c12409c set_rtc_mmss: show warning message only once
Occasionally the system gets into a state where the CMOS clock has gotten
slightly ahead of current time and the periodic update of RTC fails.  The
message is a nuisance and repeats spamming the log.

  See: http://www.ntp.org/ntpfaq/NTP-s-trbl-spec.htm#Q-LINUX-SET-RTC-MMSS

Rather than just removing the message, make it show only once and reduce
severity since it indicates a normal and non urgent condition.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:07 -08:00
Avi Kivity
e5c3014282 KVM: Initialize fpu state in preemptible context
init_fpu() (which is indirectly called by the fpu switching code) assumes
it is in process context.  Rather than makeing init_fpu() use an atomic
allocation, which can cause a task to be killed, make sure the fpu is
already initialized when we enter the run loop.

KVM-Stable-Tag.
Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 12:02:26 +02:00
Gleb Natapov
444e863d13 KVM: VMX: when entering real mode align segment base to 16 bytes
VMX checks that base is equal segment shifted 4 bits left. Otherwise
guest entry fails.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:31:20 +02:00
Xiao Guangrong
f8e453b00c KVM: MMU: handle 'map_writable' in set_spte() function
Move the operation of 'writable' to set_spte() to clean up code

[avi: remove unneeded booleanification]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:31:19 +02:00
Xiao Guangrong
b034cf0105 KVM: MMU: audit: allow audit more guests at the same time
It only allows to audit one guest in the system since:
- 'audit_point' is a glob variable
- mmu_audit_disable() is called in kvm_mmu_destroy(), so audit is disabled
  after a guest exited

this patch fix those issues then allow to audit more guests at the same time

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:31:17 +02:00
Avi Kivity
aff48baa34 KVM: Fetch guest cr3 from hardware on demand
Instead of syncing the guest cr3 every exit, which is expensince on vmx
with ept enabled, sync it only on demand.

[sheng: fix incorrect cr3 seen by Windows XP]

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:31:16 +02:00
Avi Kivity
9f8fe5043f KVM: Replace reads of vcpu->arch.cr3 by an accessor
This allows us to keep cr3 in the VMCS, later on.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:31:15 +02:00
Marcelo Tosatti
e49146dce8 KVM: MMU: only write protect mappings at pagetable level
If a pagetable contains a writeable large spte, all of its sptes will be
write protected, including non-leaf ones, leading to endless pagefaults.

Do not write protect pages above PT_PAGE_TABLE_LEVEL, as the spte fault
paths assume non-leaf sptes are writable.

Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:13 +02:00
Avi Kivity
16d8f72f70 KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
'error' is byte sized, so use a byte register constraint.

Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:12 +02:00
Avi Kivity
c445f8ef43 KVM: MMU: Initialize base_role for tdp mmus
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:11 +02:00
Avi Kivity
110312c84b KVM: VMX: Optimize atomic EFER load
When NX is enabled on the host but not on the guest, we use the entry/exit
msr load facility, which is slow.  Optimize it to use entry/exit efer load,
which is ~1200 cycles faster.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:09 +02:00
Avi Kivity
07c116d2f5 KVM: VMX: Add definitions for more vm entry/exit control bits
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:08 +02:00
Andre Przywara
dc25e89e07 KVM: SVM: copy instruction bytes from VMCB
In case of a nested page fault or an intercepted #PF newer SVM
implementations provide a copy of the faulting instruction bytes
in the VMCB.
Use these bytes to feed the instruction emulator and avoid the costly
guest instruction fetch in this case.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:07 +02:00
Andre Przywara
df4f310856 KVM: SVM: implement enhanced INVLPG intercept
When the DecodeAssist feature is available, the linear address
is provided in the VMCB on INVLPG intercepts. Use it directly to
avoid any decoding and emulation.
This is only useful for shadow paging, though.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:05 +02:00
Andre Przywara
cae3797a46 KVM: SVM: enhance mov DR intercept handler
Newer SVM implementations provide the GPR number in the VMCB, so
that the emulation path is no longer necesarry to handle debug
register access intercepts. Implement the handling in svm.c and
use it when the info is provided.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:04 +02:00
Andre Przywara
7ff76d58a9 KVM: SVM: enhance MOV CR intercept handler
Newer SVM implementations provide the GPR number in the VMCB, so
that the emulation path is no longer necesarry to handle CR
register access intercepts. Implement the handling in svm.c and
use it when the info is provided.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:03 +02:00
Andre Przywara
ddce97aac5 KVM: SVM: add new SVM feature bit names
the recent APM Vol.2 and the recent AMD CPUID specification describe
new CPUID features bits for SVM. Name them here for later usage.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:02 +02:00
Andre Przywara
51d8b66199 KVM: cleanup emulate_instruction
emulate_instruction had many callers, but only one used all
parameters. One parameter was unused, another one is now
hidden by a wrapper function (required for a future addition
anyway), so most callers use now a shorter parameter list.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:31:00 +02:00
Andre Przywara
db8fcefaa7 KVM: move complete_insn_gp() into x86.c
move the complete_insn_gp() helper function out of the VMX part
into the generic x86 part to make it usable by SVM.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:30:59 +02:00
Andre Przywara
eea1cff9ab KVM: x86: fix CR8 handling
The handling of CR8 writes in KVM is currently somewhat cumbersome.
This patch makes it look like the other CR register handlers
and fixes a possible issue in VMX, where the RIP would be incremented
despite an injected #GP.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:30:58 +02:00
Avi Kivity
a63512a4d7 KVM guest: Fix kvm clock initialization when it's configured out
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:56 +02:00
Takuya Yoshikawa
175504cdbf KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()
In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking
slots_lock in the error handling path.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:55 +02:00
Lai Jiangshan
a355c85c5f KVM: return true when user space query KVM_CAP_USER_NMI extension
userspace may check this extension in runtime.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:54 +02:00
Avi Kivity
61cfab2e83 KVM: Correct kvm_pio tracepoint count field
Currently, we record '1' for count regardless of the real count.  Fix.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:52 +02:00
Avi Kivity
d3c422bd33 KVM: MMU: Fix incorrect direct page write protection due to ro host page
If KVM sees a read-only host page, it will map it as read-only to prevent
breaking a COW.  However, if the page was part of a large guest page, KVM
incorrectly extends the write protection to the entire large page frame
instead of limiting it to the normal host page.

This results in the instantiation of a new shadow page with read-only access.

If this happens for a MOVS instruction that moves memory between two normal
pages, within a single large page frame, and mapped within the guest as a
large page, and if, in addition, the source operand is not writeable in the
host (perhaps due to KSM), then KVM will instantiate a read-only direct
shadow page, instantiate an spte for the source operand, then instantiate
a new read/write direct shadow page and instantiate an spte for the
destination operand.  Since these two sptes are in different shadow pages,
MOVS will never see them at the same time and the guest will not make
progress.

Fix by mapping the direct shadow page read/write, and only marking the
host page read-only.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:51 +02:00
Joerg Roedel
81dd35d42c KVM: SVM: Add xsetbv intercept
This patch implements the xsetbv intercept to the AMD part
of KVM. This makes AVX usable in a save way for the guest on
AVX capable AMD hardware.

The patch is tested by using AVX in the guest and host in
parallel and checking for data corruption. I also used the
KVM xsave unit-tests and they all pass.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:49 +02:00
Takuya Yoshikawa
d4dbf47009 KVM: MMU: Make the way of accessing lpage_info more generic
Large page information has two elements but one of them, write_count, alone
is accessed by a helper function.

This patch replaces this helper function with more generic one which returns
newly named kvm_lpage_info structure and use it to access the other element
rmap_pde.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:47 +02:00
Anthony Liguori
443381a828 KVM: VMX: add module parameter to avoid trapping HLT instructions (v5)
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling.  There are many approaches to achieve
this but the most direct is to simply avoid trapping the HLT instruction which
lets the guest directly execute the instruction putting the processor to sleep.

Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:46 +02:00
Joerg Roedel
38e5e92fe8 KVM: SVM: Implement Flush-By-Asid feature
This patch adds the new flush-by-asid of upcoming AMD
processors to the KVM-AMD module.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:45 +02:00
Joerg Roedel
f40f6a459c KVM: SVM: Use svm_flush_tlb instead of force_new_asid
This patch replaces all calls to force_new_asid which are
intended to flush the guest-tlb by the more appropriate
function svm_flush_tlb. As a side-effect the force_new_asid
function is removed.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:44 +02:00
Joerg Roedel
fa22a8d608 KVM: SVM: Remove flush_guest_tlb function
This function is unused and there is svm_flush_tlb which
does the same. So this function can be removed.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:42 +02:00
Xiao Guangrong
fb67e14fc9 KVM: MMU: retry #PF for softmmu
Retry #PF for softmmu only when the current vcpu has the same cr3 as the time
when #PF occurs

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:41 +02:00
Xiao Guangrong
2ec4739ddc KVM: MMU: fix accessed bit set on prefault path
Retry #PF is the speculative path, so don't set the accessed bit

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:40 +02:00
Xiao Guangrong
78b2c54aa4 KVM: MMU: rename 'no_apf' to 'prefault'
It's the speculative path if 'no_apf = 1' and we will specially handle this
speculative path in the later patch, so 'prefault' is better to fit the sense.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:38 +02:00
Joerg Roedel
b53ba3f9cc KVM: SVM: Add clean-bit for LBR state
This patch implements the clean-bit for all LBR related
state. This includes the debugctl, br_from, br_to,
last_excp_from, and last_excp_to msrs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:37 +02:00