Commit Graph

7681 Commits

Author SHA1 Message Date
Linus Torvalds
fcda12e7f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  modpost: Fix modpost's license checking V3
  module: add /sys/module/<name>/uevent files
  module: change attr callbacks to take struct module_kobject
  modules: make arch's use default loader hooks
  modules: add default loader hook implementations
  param: fix return value handling in param_set_*
2011-07-24 09:54:54 -07:00
Linus Torvalds
5fabc487c9 Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  KVM: IOMMU: Disable device assignment without interrupt remapping
  KVM: MMU: trace mmio page fault
  KVM: MMU: mmio page fault support
  KVM: MMU: reorganize struct kvm_shadow_walk_iterator
  KVM: MMU: lockless walking shadow page table
  KVM: MMU: do not need atomicly to set/clear spte
  KVM: MMU: introduce the rules to modify shadow page table
  KVM: MMU: abstract some functions to handle fault pfn
  KVM: MMU: filter out the mmio pfn from the fault pfn
  KVM: MMU: remove bypass_guest_pf
  KVM: MMU: split kvm_mmu_free_page
  KVM: MMU: count used shadow pages on prepareing path
  KVM: MMU: rename 'pt_write' to 'emulate'
  KVM: MMU: cleanup for FNAME(fetch)
  KVM: MMU: optimize to handle dirty bit
  KVM: MMU: cache mmio info on page fault path
  KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code
  KVM: MMU: do not update slot bitmap if spte is nonpresent
  KVM: MMU: fix walking shadow page table
  KVM guest: KVM Steal time registration
  ...
2011-07-24 09:07:03 -07:00
Jonas Bonn
66574cc054 modules: make arch's use default loader hooks
This patch removes all the module loader hook implementations in the
architecture specific code where the functionality is the same as that
now provided by the recently added default hooks.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-24 22:06:04 +09:30
Glauber Costa
d910f5c106 KVM guest: KVM Steal time registration
This patch implements the kvm bits of the steal time infrastructure.
The most important part of it, is the steal time clock. It is an
continuous clock that shows the accumulated amount of steal time
since vcpu creation. It is supposed to survive cpu offlining/onlining.

[marcelo: fix build with CONFIG_KVM_GUEST=n]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Avi Kivity <avi@redhat.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-24 11:49:36 +03:00
Linus Torvalds
b4db920c7f Merge branches 'x86-detect-hyper-for-linus', 'x86-fpu-for-linus', 'x86-kexec-for-linus', 'x86-platform-for-linus', 'x86-quirks-for-linus', 'x86-tsc-for-linus' and 'x86-smpboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-detect-hyper-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hyper: Change hypervisor detection order

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, fpu: Fix DNA exception during check_fpu()

* 'x86-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kexec, x86: Fix incorrect jump back address if not preserving context

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, config: Introduce an INTEL_MID configuration

* 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, quirks: Use pci_dev->revision

* 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: tsc: Remove unneeded DMI-based blacklisting

* 'x86-smpboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, boot: Wait for boot cpu to show up if nr_cpus limit is about to hit
2011-07-23 10:38:21 -07:00
Linus Torvalds
9d0715630e Merge branch 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: apb: Share APB timer code with other platforms
2011-07-23 10:34:47 -07:00
Linus Torvalds
8e204874db Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64, vdso: Do not allocate memory for the vDSO
  clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option
  x86, vdso: Drop now wrong comment
  Document the vDSO and add a reference parser
  ia64: Replace clocksource.fsys_mmio with generic arch data
  x86-64: Move vread_tsc and vread_hpet into the vDSO
  clocksource: Replace vread with generic arch data
  x86-64: Add --no-undefined to vDSO build
  x86-64: Allow alternative patching in the vDSO
  x86: Make alternative instruction pointers relative
  x86-64: Improve vsyscall emulation CS and RIP handling
  x86-64: Emulate legacy vsyscalls
  x86-64: Fill unused parts of the vsyscall page with 0xcc
  x86-64: Remove vsyscall number 3 (venosys)
  x86-64: Map the HPET NX
  x86-64: Remove kernel.vsyscall64 sysctl
  x86-64: Give vvars their own page
  x86-64: Document some of entry_64.S
  x86-64: Fix alignment of jiffies variable
2011-07-22 17:05:15 -07:00
Linus Torvalds
8051207959 Merge branch 'x86-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Kill handle_signal()->set_fs()
  x86, do_signal: Simplify the TS_RESTORE_SIGMASK logic
  x86, signals: Convert the X86_32 code to use set_current_blocked()
  x86, signals: Convert the IA32_EMULATION code to use set_current_blocked()
2011-07-22 17:04:32 -07:00
Linus Torvalds
dc43d9fa73 Merge branch 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mtrr: Use pci_dev->revision
  x86, mtrr: use stop_machine APIs for doing MTRR rendezvous
  stop_machine: implement stop_machine_from_inactive_cpu()
  stop_machine: reorganize stop_cpus() implementation
  x86, mtrr: lock stop machine during MTRR rendezvous sequence
2011-07-22 17:04:04 -07:00
Linus Torvalds
80775068db Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, microcode, AMD: Fix section header size check
  x86, microcode, AMD: Correct buf references
2011-07-22 17:03:52 -07:00
Linus Torvalds
7c6582b28a Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: Use mce_sysdev_ prefix to group functions
  x86, mce: Use mce_chrdev_ prefix to group functions
  x86, mce: Cleanup mce_read()
  x86, mce: Cleanup mce_create()/remove_device()
  x86, mce: Check the result of ancient_init()
  x86, mce: Introduce mce_gather_info()
  x86, mce: Replace MCM_ with MCI_MISC_
  x86, mce: Replace MCE_SELF_VECTOR by irq_work
  x86, mce, severity: Clean up trivial coding style problems
  x86, mce, severity: Cleanup severity table
  x86, mce, severity: Make formatting a bit more readable
  x86, mce, severity: Fix two severities table signatures
2011-07-22 17:03:40 -07:00
Linus Torvalds
35b004cce1 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message
  x86, msr: Fix typo in ENERGY_PERF_BIAS_POWERSAVE
  x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
2011-07-22 17:02:54 -07:00
Linus Torvalds
0a613b647b Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const
  x86: Convert vmalloc()+memset() to vzalloc()
2011-07-22 17:02:38 -07:00
Linus Torvalds
eb47418dc5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix write lock scalability 64-bit issue
  x86: Unify rwsem assembly implementation
  x86: Unify rwlock assembly implementation
  x86, asm: Fix binutils 2.16 issue with __USER32_CS
  x86, asm: Cleanup thunk_64.S
  x86, asm: Flip RESTORE_ARGS arguments logic
  x86, asm: Flip SAVE_ARGS arguments logic
  x86, asm: Thin down SAVE/RESTORE_* asm macros
2011-07-22 17:02:24 -07:00
Linus Torvalds
2c9e88a108 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, ioapic: Print IR_IO_APIC_route_entry when IR is enabled
  x86, ioapic: Print IRTE when IR is enabled
  x86, x2apic: Preserve high 32-bits of IA32_APIC_BASE MSR
  x86, ioapic: Also print Dest field
  x86, ioapic: Format clean up for IOAPIC output
  x86: print APIC data a little later during boot
2011-07-22 17:02:07 -07:00
Linus Torvalds
a99a7d1436 Merge branch 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mips: Fix i8253 clockevent fallout
  i8253: Cleanup outb/inb magic
  arm: Footbridge: Use common i8253 clockevent
  mips: Use common i8253 clockevent
  x86: Use common i8253 clockevent
  i8253: Create common clockevent implementation
  i8253: Export i8253_lock unconditionally
  pcpskr: MIPS: Make config dependencies finer grained
  pcspkr: Cleanup Kconfig dependencies
  i8253: Move remaining content and delete asm/i8253.h
  i8253: Consolidate definitions of PIT_LATCH
  x86: i8253: Consolidate definitions of global_clock_event
  i8253: Alpha, PowerPC: Remove unused asm/8253pit.h
  alpha: i8253: Cleanup remaining users of i8253pit.h
  i8253: Remove I8253_LOCK config
  i8253: Make pcsp sound driver use the shared i8253_lock
  i8253: Make pcspkr input driver use the shared i8253_lock
  i8253: Consolidate all kernel definitions of i8253_lock
  i8253: Unify all kernel declarations of i8253_lock
  i8253: Create linux/i8253.h and use it in all 8253 related files
2011-07-22 16:51:56 -07:00
Linus Torvalds
4d4abdcb1d Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
  perf: Remove the nmi parameter from the oprofile_perf backend
  x86, perf: Make copy_from_user_nmi() a library function
  perf: Remove perf_event_attr::type check
  x86, perf: P4 PMU - Fix typos in comments and style cleanup
  perf tools: Make test use the preset debugfs path
  perf tools: Add automated tests for events parsing
  perf tools: De-opt the parse_events function
  perf script: Fix display of IP address for non-callchain path
  perf tools: Fix endian conversion reading event attr from file header
  perf tools: Add missing 'node' alias to the hw_cache[] array
  perf probe: Support adding probes on offline kernel modules
  perf probe: Add probed module in front of function
  perf probe: Introduce debuginfo to encapsulate dwarf information
  perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
  perf probe: Remove redundant dwarf functions
  perf probe: Move strtailcmp to string.c
  perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
  tracing/kprobe: Update symbol reference when loading module
  tracing/kprobes: Support module init function probing
  kprobes: Return -ENOENT if probe point doesn't exist
  ...
2011-07-22 16:44:39 -07:00
Linus Torvalds
6d16d6d9bb Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  iommu/core: Fix build with INTR_REMAP=y && CONFIG_DMAR=n
  iommu/amd: Don't use MSI address range for DMA addresses
  iommu/amd: Move missing parts to drivers/iommu
  iommu: Move iommu Kconfig entries to submenu
  x86/ia64: intel-iommu: move to drivers/iommu/
  x86: amd_iommu: move to drivers/iommu/
  msm: iommu: move to drivers/iommu/
  drivers: iommu: move to a dedicated folder
  x86/amd-iommu: Store device alias as dev_data pointer
  x86/amd-iommu: Search for existind dev_data before allocting a new one
  x86/amd-iommu: Allow dev_data->alias to be NULL
  x86/amd-iommu: Use only dev_data in low-level domain attach/detach functions
  x86/amd-iommu: Use only dev_data for dte and iotlb flushing routines
  x86/amd-iommu: Store ATS state in dev_data
  x86/amd-iommu: Store devid in dev_data
  x86/amd-iommu: Introduce global dev_data_list
  x86/amd-iommu: Remove redundant device_flush_dte() calls
  iommu-api: Add missing header file

Fix up trivial conflicts (independent additions close to each other) in
drivers/Makefile and include/linux/pci.h
2011-07-22 16:39:42 -07:00
Linus Torvalds
acb41c0f92 Merge branch 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  pci/of: Consolidate pci_bus_to_OF_node()
  pci/of: Consolidate pci_device_to_OF_node()
  x86/devicetree: Use generic PCI <-> OF matching
  microblaze/pci: Move the remains of pci_32.c to pci-common.c
  microblaze/pci: Remove powermac originated cruft
  pci/of: Match PCI devices to OF nodes dynamically
2011-07-22 14:54:02 -07:00
Linus Torvalds
951cc93a74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1287 commits)
  icmp: Fix regression in nexthop resolution during replies.
  net: Fix ppc64 BPF JIT dependencies.
  acenic: include NET_SKB_PAD headroom to incoming skbs
  ixgbe: convert to ndo_fix_features
  ixgbe: only enable WoL for magic packet by default
  ixgbe: remove ifdef check for non-existent define
  ixgbe: Pass staterr instead of re-reading status and error bits from descriptor
  ixgbe: Move interrupt related values out of ring and into q_vector
  ixgbe: add structure for containing RX/TX rings to q_vector
  ixgbe: inline the ixgbe_maybe_stop_tx function
  ixgbe: Update ATR to use recorded TX queues instead of CPU for routing
  igb: Fix for DH89xxCC near end loopback test
  e1000: always call e1000_check_for_link() on e1000_ce4100 MACs.
  netxen: add fw version compatibility check
  be2net: request native mode each time the card is reset
  ipv4: Constrain UFO fragment sizes to multiples of 8 bytes
  virtio_net: Fix panic in virtnet_remove
  ipv6: make fragment identifications less predictable
  ipv6: unshare inetpeers
  can: make function can_get_bittiming static
  ...
2011-07-22 14:43:13 -07:00
Rusty Russell
5dea1c88ed lguest: use a special 1:1 linear pagetable mode until first switch.
The Host used to create some page tables for the Guest to use at the
top of Guest memory; it would then tell the Guest where this was.  In
particular, it created linear mappings for 0 and 0xC0000000 addresses
because lguest used to switch to its real page tables quite late in
boot.

However, since d50d8fe19 Linux initialized boot page tables in
head_32.S even before the "are we lguest?" boot jump.  So, now we can
simplify things: the Host pagetable code assumes 1:1 linear mapping
until it first calls the LHCALL_NEW_PGTABLE hypercall, which we now do
before we reach C code.

This also means that the Host doesn't need to know anything about the
Guest's PAGE_OFFSET.  (Non-Linux guests might not even have such a
thing).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-22 14:39:48 +09:30
H. Peter Anvin
a536877e77 x86: Make Dell Latitude E6420 use reboot=pci
Yet another variant of the Dell Latitude series which requires
reboot=pci.

From the E5420 bug report by Daniel J Blueman:

> The E6420 is affected also (same platform, different casing and
> features), which provides an external confirmation of the issue; I can
> submit a patch for that later or include it if you prefer:
> http://linux.koolsolutions.com/2009/08/04/howto-fix-linux-hangfreeze-during-reboots-and-restarts/

Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
2011-07-21 11:47:17 -07:00
Daniel J Blueman
b7798d28ec x86: Make Dell Latitude E5420 use reboot=pci
Rebooting on the Dell E5420 often hangs with the keyboard or ACPI
methods, but is reliable via the PCI method.

[ hpa: this was deferred because we believed for a long time that the
  recent reshuffling of the boot priorities in commit
  660e34cebf fixed this platform.
  Unfortunately that turned out to be incorrect. ]

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Link: http://lkml.kernel.org/r/1305248699-2347-1-git-send-email-daniel.blueman@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
2011-07-21 11:45:49 -07:00
Robert Richter
1ac2e6ca44 x86, perf: Make copy_from_user_nmi() a library function
copy_from_user_nmi() is used in oprofile and perf. Moving it to other
library functions like copy_from_user(). As this is x86 code for 32
and 64 bits, create a new file usercopy.c for unified code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110607172413.GJ20052@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 20:41:57 +02:00
Cyrill Gorcunov
f53173e47d x86, perf: P4 PMU - Fix typos in comments and style cleanup
This patch:

 - fixes typos in comments and clarifies the text
 - renames obscure p4_event_alias::original and ::alter members to
   ::original and ::alternative as appropriate
 - drops parenthesis from the return of p4_get_alias_event()

No functional changes.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/20110721160625.GX7492@sun
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 20:41:54 +02:00
Huang Ying
050438ed5a kexec, x86: Fix incorrect jump back address if not preserving context
In kexec jump support, jump back address passed to the kexeced
kernel via function calling ABI, that is, the function call
return address is the jump back entry.

Furthermore, jump back entry == 0 should be used to signal that
the jump back or preserve context is not enabled in the original
kernel.

But in the current implementation the stack position used for
function call return address is not cleared context
preservation is disabled. The patch fixes this bug.

Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 11:19:28 +02:00
Sergei Shtylyov
38175051f8 x86, quirks: Use pci_dev->revision
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to arch/x86/...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/201107111901.39281.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 10:26:00 +02:00
Greg Dietsche
a6c23905ff x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const
This array is read-only. Make it explicit by marking as const.

Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Link: http://lkml.kernel.org/r/1309482653-23648-1-git-send-email-Gregory.Dietsche@cuw.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 10:04:51 +02:00
Len Brown
17edf2d79f x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message
Fix the printk_once() so that it actually prints (didn't print before
due to a stray comma.)

[ hpa: changed to an incremental patch and adjusted the description
  accordingly. ]

Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107151732480.18606@x980
Cc: <table@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-15 15:13:55 -07:00
Oleg Nesterov
73d382decc x86: Kill handle_signal()->set_fs()
handle_signal()->set_fs() has a nice comment which explains what
set_fs() is, but it doesn't explain why it is needed and why it
depends on CONFIG_X86_64.

Afaics, the history of this confusion is:

	1. I guess today nobody can explain why it was needed
	   in arch/i386/kernel/signal.c, perhaps it was always
	   wrong. This predates 2.4.0 kernel.

	2. then it was copy-and-past'ed to the new x86_64 arch.

	3. then it was removed from i386 (but not from x86_64)
	   by b93b6ca3 "i386: remove unnecessary code".

	4. then it was reintroduced under CONFIG_X86_64 when x86
	   unified i386 and x86_64, because the patch above didn't
	   touch x86_64.

Remove it. ->addr_limit should be correct. Even if it was possible
that it is wrong, it is too late to fix it after setup_rt_frame().

Linus commented in:
http://lkml.kernel.org/r/alpine.LFD.0.999.0707170902570.19166@woody.linux-foundation.org

... about the equivalent bit from i386:

Heh. I think it's entirely historical.

Please realize that the whole reason that function is called "set_fs()" is 
that it literally used to set the %fs segment register, not 
"->addr_limit".

So I think the "set_fs(USER_DS)" is there _only_ to match the other

        regs->xds = __USER_DS;
        regs->xes = __USER_DS;
        regs->xss = __USER_DS;
        regs->xcs = __USER_CS;

things, and never mattered. And now it matters even less, and has been 
copied to all other architectures where it is just totally insane.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710164424.GA20261@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 21:46:20 -07:00
Oleg Nesterov
9b42962074 x86, do_signal: Simplify the TS_RESTORE_SIGMASK logic
1. do_signal() looks at TS_RESTORE_SIGMASK and calculates the
   mask which should be stored in the signal frame, then it
   passes "oldset" to the callees, down to setup_rt_frame().

   This is ugly, setup_rt_frame() can do this itself and nobody
   else needs this sigset_t. Move this code into setup_rt_frame.

2. do_signal() also clears TS_RESTORE_SIGMASK if handle_signal()
   succeeds.

   We can move this to setup_rt_frame() as well, this avoids the
   unnecessary checks and makes the logic more clear.

3. use set_current_blocked() instead of sigprocmask(SIG_SETMASK),
   sigprocmask() should be avoided.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710182203.GA27979@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 21:22:11 -07:00
Oleg Nesterov
3982294b03 x86, signals: Convert the X86_32 code to use set_current_blocked()
sys_sigsuspend() and sys_sigreturn() change ->blocked directly.
This is not correct, see the changelog in e6fa16ab
"signal: sigprocmask() should do retarget_shared_pending()"

Change them to use set_current_blocked().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710192727.GA31759@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 21:21:57 -07:00
Andy Lutomirski
98d0ac38ca x86-64: Move vread_tsc and vread_hpet into the vDSO
The vsyscall page now consists entirely of trap instructions.

Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 17:57:05 -07:00
Cyrill Gorcunov
f912987097 perf, x86: P4 PMU - Introduce event alias feature
Instead of hw_nmi_watchdog_set_attr() weak function
and appropriate x86_pmu::hw_watchdog_set_attr() call
we introduce even alias mechanism which allow us
to drop this routines completely and isolate quirks
of Netburst architecture inside P4 PMU code only.

The main idea remains the same though -- to allow
nmi-watchdog and perf top run simultaneously.

Note the aliasing mechanism applies to generic
PERF_COUNT_HW_CPU_CYCLES event only because arbitrary
event (say passed as RAW initially) might have some
additional bits set inside ESCR register changing
the behaviour of event and we can't guarantee anymore
that alias event will give the same result.

P.S. Thanks a huge to Don and Steven for for testing
     and early review.

Acked-by: Don Zickus <dzickus@redhat.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Stephane Eranian <eranian@google.com>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-14 17:25:04 -04:00
Len Brown
abe48b1082 x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
Since 2.6.36 (23016bf0d2), Linux prints the existence of "epb" in /proc/cpuinfo,
Since 2.6.38 (d5532ee7b4), the x86_energy_perf_policy(8) utility has
been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.

However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...

Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode...

Initialize the MSR to the "normal" by default during kernel boot.

x86_energy_perf_policy(8) is available to change the default after boot,
should the user have a different preference.

Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
2011-07-14 12:13:42 -07:00
David S. Miller
6a7ebdf2fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/bluetooth/l2cap_core.c
2011-07-14 07:56:40 -07:00
Glauber Costa
3c404b578f KVM guest: Add a pv_ops stub for steal time
This patch adds a function pointer in one of the many paravirt_ops
structs, to allow guests to register a steal time function. Besides
a steal time function, we also declare two jump_labels. They will be
used to allow the steal time code to be easily bypassed when not
in use.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-14 12:59:44 +03:00
Andy Lutomirski
433bd805e5 clocksource: Replace vread with generic arch data
The vread field was bloating struct clocksource everywhere except
x86_64, and I want to change the way this works on x86_64, so let's
split it out into per-arch data.

Cc: x86@kernel.org
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: linux-ia64@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:23:12 -07:00
Andy Lutomirski
59e97e4d6f x86: Make alternative instruction pointers relative
This save a few bytes on x86-64 and means that future patches can
apply alternatives to unrelocated code.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/ff64a6b9a1a3860ca4a7b8b6dc7b4754f9491cd7.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:22:56 -07:00
Andy Lutomirski
c9712944b2 x86-64: Improve vsyscall emulation CS and RIP handling
Three fixes here:
 - Send SIGSEGV if called from compat code or with a funny CS.
 - Don't BUG on impossible addresses.
 - Add a missing local_irq_disable.

This patch also removes an unused variable.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/6fb2b13ab39b743d1e4f466eef13425854912f7f.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:22:55 -07:00
Maxime Ripard
3628c3f5c8 x86. reboot: Make Dell Latitude E6320 use reboot=pci
The Dell Latitude E6320 doesn't reboot unless reboot=pci is set.
Force it thanks to DMI.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Link: http://lkml.kernel.org/r/1309269451-4966-1-git-send-email-maxime.ripard@free-electrons.com
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-07-12 21:42:48 -07:00
Naga Chumbalkar
42f0efc5aa x86, ioapic: Print IR_IO_APIC_route_entry when IR is enabled
When IR (interrupt remapping) is enabled print_IO_APIC() displays output according
to legacy RTE (redirection table entry) definitons:

 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 00  1    0    0   0   0    0    0    00
 01 00  0    0    0   0   0    0    0    01
 02 00  0    0    0   0   0    0    0    02
 03 00  1    0    0   0   0    0    0    03
 04 00  1    0    0   0   0    0    0    04
 05 00  1    0    0   0   0    0    0    05
 06 00  1    0    0   0   0    0    0    06
...

The above output is as per Sec 3.2.4 of the IOAPIC datasheet:
82093AA I/O Advanced Programmable Interrupt Controller (IOAPIC):
http://download.intel.com/design/chipsets/datashts/29056601.pdf

Instead the output should display the fields as discussed in Sec 5.5.1
of the VT-d specification:

(Intel Virtualization Technology for Directed I/O:
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf)

After the fix:
 NR Indx Fmt Mask Trig IRR Pol Stat Indx2 Zero Vect:
 00 0000 0   1    0    0   0   0    0     0    00
 01 000F 1   0    0    0   0   0    0     0    01
 02 0001 1   0    0    0   0   0    0     0    02
 03 0002 1   1    0    0   0   0    0     0    03
 04 0011 1   1    0    0   0   0    0     0    04
 05 0004 1   1    0    0   0   0    0     0    05
 06 0005 1   1    0    0   0   0    0     0    06
...

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712211658.2939.93123.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 20:17:58 -07:00
Naga Chumbalkar
3040db92ee x86, ioapic: Print IRTE when IR is enabled
When "apic=debug" is used as a boot parameter, Linux prints the IOAPIC routing
entries in "dmesg". Below is output from IOAPIC whose apic_id is 8:

# dmesg | grep "routing entry"
IOAPIC[8]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0)
...

Similarly, when IR (interrupt remapping) is enabled, and the IRTE
(interrupt remapping table entry) is set up we should display it.

After the fix:

# dmesg | grep IRTE
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:31 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:30 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:33 Dest:00000000 SID:00F1 SQ:0 SVT:1)
...

The IRTE is defined in Sec 9.5 of the Intel VT-d Specification.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712211704.2939.71291.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 14:34:00 -07:00
Naga Chumbalkar
2597085228 x86, x2apic: Preserve high 32-bits of IA32_APIC_BASE MSR
If there's no special reason to zero-out the "high" 32-bits of the IA32_APIC_BASE
MSR, let's preserve it.

The x2APIC Specification doesn't explicitly state any such requirement. (Sec 2.2
in: http://www.intel.com/Assets/PDF/manual/318148.pdf).

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712055831.2498.78521.sendpatchset@nchumbalkar.americas.cpqcorp.net
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 14:33:49 -07:00
Naga Chumbalkar
7fece83235 x86, ioapic: Also print Dest field
The code in setup_ioapic_irq() determines the Destination Field,
so why not also include it in the debug printk output that gets
displayed when the boot parameter "apic=debug" is used.

Before the change, "dmesg" will show:

 IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
 IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0)
 IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0) ...

After the change, you will see:

 IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
 IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
 IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0) ...

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184603.2734.91071.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-11 16:31:05 +02:00
Naga Chumbalkar
bd6a46e087 x86, ioapic: Format clean up for IOAPIC output
When IOAPIC data is displayed in "dmesg" with the help of the
boot parameter "apic=debug" certain values are not formatted
correctly wrt their size.

In the "dmesg" snippet below, note that the output for "max
redirection entries", and "IO APIC version" which are each
defined to be just 8-bits long are displayed as 2 bytes in
length. Similarly, "Dst" under the "IRQ redirection table"
should only be 8-bits long.

IO APIC #0......
...
...
.... register #01: 00170020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 0
.......     : IO APIC version: 0020
...
...
.... IRQ redirection table:
 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 000 1    0    0   0   0    0    0    00
 01 000 0    0    0   0   0    0    0    31
 02 000 0    0    0   0   0    0    0    30
 03 000 1    0    0   0   0    0    0    33
...
...

Do some formatting clean up, so you will see output like below:

IO APIC #0......
...
...
.... register #01: 00170020
.......     : max redirection entries: 17
.......     : PRQ implemented: 0
.......     : IO APIC version: 20
...
...
.... IRQ redirection table:
 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 00  1    0    0   0   0    0    0    00
 01 00  0    0    0   0   0    0    0    31
 02 00  0    0    0   0   0    0    0    30
 03 00  1    0    0   0   0    0    0    33
...
...

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184557.2734.61830.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-11 16:31:05 +02:00
Anupam Chanda
24a42bae68 x86, hyper: Change hypervisor detection order
Detect Xen before HyperV because in Viridian compatibility mode Xen
presents itself as HyperV.  Move Xen to the top since it seems more
likely that Xen would emulate VMware than vice versa.

Signed-off-by: Anupam Chanda <achanda@nicira.com>
Link: http://lkml.kernel.org/r/1310150570-26810-1-git-send-email-achanda@nicira.com
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-08 16:22:29 -07:00
Vivek Goyal
14cb6dcf0a x86, boot: Wait for boot cpu to show up if nr_cpus limit is about to hit
nr_cpus allows one to specify number of possible cpus in the system.
Current assumption seems to be that first cpu to show up is boot cpu
and this assumption will be broken in kdump scenario where we can be
booting on a non boot cpu with nr_cpus=1.

It might happen that first cpu we parse is not the cpu we boot on and
later we ignore boot cpu. Though code later seems to recognize this
anomaly and forcibly sets boot cpu in physical cpu map with following
warning.

if (!physid_isset(hard_smp_processor_id(), phys_cpu_present_map)) {
        printk(KERN_WARNING
                "weird, boot CPU (#%d) not listed by the BIOS.\n",
                hard_smp_processor_id());

        physid_set(hard_smp_processor_id(), phys_cpu_present_map);
}

This patch waits for boot cpu to show up and starts ignoring the cpus
once we have hit (nr_cpus - 1) number of cpus. So effectively we are
reserving one slot out of nr_cpus for boot cpu explicitly.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20110708171926.GF2930@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-08 15:33:35 -07:00
Naga Chumbalkar
ded1f6ab43 x86: print APIC data a little later during boot
To view IOAPIC data you could boot with "apic=debug".

When booting in such a way then the kernel will dump the
IO-APIC's registers, for example:

NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 000 1    0    0   0   0    0    0    00
 01 000 0    0    0   0   0    0    0    31
 02 000 0    0    0   0   0    0    0    30
 03 000 0    0    0   0   0    0    0    33
 04 000 0    0    0   0   0    0    0    34
 05 000 0    0    0   0   0    0    0    35
 06 000 0    0    0   0   0    0    0    36
 07 000 0    0    0   0   0    0    0    37
 08 000 0    0    0   0   0    0    0    38
 09 000 0    1    0   0   0    0    0    39
 0a 000 0    0    0   0   0    0    0    3A
 0b 000 0    0    0   0   0    0    0    3B
 0c 000 0    0    0   0   0    0    0    3C
 0d 000 0    0    0   0   0    0    0    3D
 0e 000 0    0    0   0   0    0    0    3E
 0f 000 0    0    0   0   0    0    0    3F
 10 000 1    0    0   0   0    0    0    00
 11 000 1    0    0   0   0    0    0    00
 12 000 1    0    0   0   0    0    0    00
 13 000 1    0    0   0   0    0    0    00
 14 000 1    0    0   0   0    0    0    00
 15 000 1    0    0   0   0    0    0    00
 16 000 1    0    0   0   0    0    0    00
 17 000 1    0    0   0   0    0    0    00

Delaying the call to print_ICs() gives better results:

NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 000 1    0    0   0   0    0    0    00
 01 000 0    0    0   0   0    0    0    31
 02 000 0    0    0   0   0    0    0    30
 03 000 1    0    0   0   0    0    0    33
 04 000 1    0    0   0   0    0    0    34
 05 000 1    0    0   0   0    0    0    35
 06 000 1    0    0   0   0    0    0    36
 07 000 1    0    0   0   0    0    0    37
 08 000 0    0    0   0   0    0    0    38
 09 000 0    1    0   0   0    0    0    39
 0a 000 1    0    0   0   0    0    0    3A
 0b 000 1    0    0   0   0    0    0    3B
 0c 000 0    0    0   0   0    0    0    3C
 0d 000 1    0    0   0   0    0    0    3D
 0e 000 1    0    0   0   0    0    0    3E
 0f 000 1    0    0   0   0    0    0    3F
 10 000 1    1    0   1   0    0    0    29
 11 000 1    0    0   0   0    0    0    00
 12 000 1    0    0   0   0    0    0    00
 13 000 1    0    0   0   0    0    0    00
 14 000 0    1    0   1   0    0    0    51
 15 000 1    0    0   0   0    0    0    00
 16 000 0    1    0   1   0    0    0    61
 17 000 0    1    0   1   0    0    0    59

Notice that the entries beyond interrupt input signal 0x0f also
get populated and arent just the hw-initialization default of
all zeroes.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708083555.2598.42216.sendpatchset@nchumbalkar.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-08 13:20:14 +02:00
Kees Cook
7a3136666b x86, suspend: Restore MISC_ENABLE MSR in realmode wakeup
Some BIOSes will reset the Intel MISC_ENABLE MSR (specifically the
XD_DISABLE bit) when resuming from S3, which can interact poorly with
ebba638ae7. In 32bit PAE mode, this can
lead to a fault when EFER is restored by the kernel wakeup routines,
due to it setting the NX bit for a CPU that (thanks to the BIOS reset)
now incorrectly thinks it lacks the NX feature. (64bit is not affected
because it uses a common CPU bring-up that specifically handles the
XD_DISABLE bit.)

The need for MISC_ENABLE being restored so early is specific to the S3
resume path. Normally, MISC_ENABLE is saved in save_processor_state(),
but this happens after the resume header is created, so just reproduce
the logic here. (acpi_suspend_lowlevel() creates the header, calls
do_suspend_lowlevel, which calls save_processor_state(), so the saved
processor context isn't available during resume header creation.)

[ hpa: Consider for stable if OK in mainline ]

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Link: http://lkml.kernel.org/r/20110707011034.GA8523@outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> 2.6.38+
2011-07-06 20:09:34 -07:00