Commit Graph

8713 Commits

Author SHA1 Message Date
Gleb Natapov
6edf14d8d0 KVM: Replace pending exception by PF if it happens serially
Replace previous exception with a new one in a hope that instruction
re-execution will regenerate lost exception.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:56 +03:00
Marcelo Tosatti
54dee9933e KVM: VMX: conditionally disable 2M pages
Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.

[avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:56 +03:00
Marcelo Tosatti
68f89400bc KVM: VMX: EPT misconfiguration handler
Handler for EPT misconfiguration which checks for valid state
in the shadow pagetables, printing the spte on each level.

The separate WARN_ONs are useful for kerneloops.org.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:56 +03:00
Marcelo Tosatti
94d8b056a2 KVM: MMU: add kvm_mmu_get_spte_hierarchy helper
Required by EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:56 +03:00
Marcelo Tosatti
4d88954d62 KVM: MMU: make for_each_shadow_entry aware of largepages
This way there is no need to add explicit checks in every
for_each_shadow_entry user.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:55 +03:00
Marcelo Tosatti
e799794e02 KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits
Required for EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:55 +03:00
Andre Przywara
71db602322 KVM: Move performance counter MSR access interception to generic x86 path
The performance counter MSRs are different for AMD and Intel CPUs and they
are chosen mainly by the CPUID vendor string. This patch catches writes to
all addresses (regardless of VMX/SVM path) and handles them in the generic
MSR handler routine. Writing a 0 into the event select register is something
we perfectly emulate ;-), so don't print out a warning to dmesg in this
case.
This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:54 +03:00
Marcelo Tosatti
2920d72857 KVM: MMU audit: largepage handling
Make the audit code aware of largepages.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:54 +03:00
Marcelo Tosatti
2aaf65e8c4 KVM: MMU audit: audit_mappings tweaks
- Fail early in case gfn_to_pfn returns is_error_pfn.
- For the pre pte write case, avoid spurious "gva is valid but spte is notrap"
  messages (the emulation code does the guest write first, so this particular
  case is OK).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:54 +03:00
Marcelo Tosatti
48fc03174b KVM: MMU audit: nontrapping ptes in nonleaf level
It is valid to set non leaf sptes as notrap.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:54 +03:00
Marcelo Tosatti
e58b0f9e0e KVM: MMU audit: update audit_write_protection
- Unsync pages contain writable sptes in the rmap.
- rmaps do not exclusively contain writable sptes anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:53 +03:00
Marcelo Tosatti
08a3732bf2 KVM: MMU audit: update count_writable_mappings / count_rmaps
Under testing, count_writable_mappings returns a value that is 2 integers
larger than what count_rmaps returns.

Suspicion is that either of the two functions is counting a duplicate (either
positively or negatively).

Modifying check_writable_mappings_rmap to check for rmap existance on
all present MMU pages fails to trigger an error, which should keep Avi
happy.

Also introduce mmu_spte_walk to invoke a callback on all present sptes visible
to the current vcpu, might be useful in the future.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:53 +03:00
Marcelo Tosatti
776e663336 KVM: MMU: introduce is_last_spte helper
Hiding some of the last largepage / level interaction (which is useful
for gbpages and for zero based levels).

Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:53 +03:00
Avi Kivity
3f5d18a965 KVM: Return to userspace on emulation failure
Instead of mindlessly retrying to execute the instruction, report the
failure to userspace.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:52 +03:00
Gleb Natapov
988a2cae6a KVM: Use macro to iterate over vcpus.
[christian: remove unused variables on s390]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:52 +03:00
Gleb Natapov
73880c80aa KVM: Break dependency between vcpu index in vcpus array and vcpu_id.
Archs are free to use vcpu_id as they see fit. For x86 it is used as
vcpu's apic id. New ioctl is added to configure boot vcpu id that was
assumed to be 0 till now.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:52 +03:00
Gleb Natapov
1ed0ce000a KVM: Use pointer to vcpu instead of vcpu_id in timer code.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:52 +03:00
Gleb Natapov
c5af89b68a KVM: Introduce kvm_vcpu_is_bsp() function.
Use it instead of open code "vcpu_id zero is BSP" assumption.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:51 +03:00
Avi Kivity
d555c333aa KVM: MMU: s/shadow_pte/spte/
We use shadow_pte and spte inconsistently, switch to the shorter spelling.

Rename set_shadow_pte() to __set_spte() to avoid a conflict with the
existing set_spte(), and to indicate its lowlevelness.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:51 +03:00
Avi Kivity
43a3795a3a KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte
Since the guest and host ptes can have wildly different format, adjust
the pte accessor names to indicate on which type of pte they operate on.

No functional changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:51 +03:00
Avi Kivity
439e218a6f KVM: MMU: Fix is_dirty_pte()
is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid
shadow_dirty_mask and use PT_DIRTY_MASK instead.

Misdetecting dirty pages could lead to unnecessarily setting the dirty bit
under EPT.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:50 +03:00
Avi Kivity
7ffd92c53c KVM: VMX: Move rmode structure to vmx-specific code
rmode is only used in vmx, so move it to vmx.c

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:50 +03:00
Nitin A Kamble
3a624e29c7 KVM: VMX: Support Unrestricted Guest feature
"Unrestricted Guest" feature is added in the VMX specification.
Intel Westmere and onwards processors will support this feature.

    It allows kvm guests to run real mode and unpaged mode
code natively in the VMX mode when EPT is turned on. With the
unrestricted guest there is no need to emulate the guest real mode code
in the vm86 container or in the emulator. Also the guest big real mode
code works like native.

  The attached patch enhances KVM to use the unrestricted guest feature
if available on the processor. It also adds a new kernel/module
parameter to disable the unrestricted guest feature at the boot time.

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:49 +03:00
Marcelo Tosatti
fa40a8214b KVM: switch irq injection/acking data structures to irq_lock
Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:

CPU A                               CPU B
kvm_vm_ioctl_deassign_dev_irq()
  mutex_lock(&kvm->lock);            worker_thread()
  -> kvm_deassign_irq()                -> kvm_assigned_dev_interrupt_work_handler()
    -> deassign_host_irq()               mutex_lock(&kvm->lock);
      -> cancel_work_sync() [blocked]

[gleb: fix ia64 path]

Reported-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:49 +03:00
Marcelo Tosatti
9f4cc12765 KVM: Grab pic lock in kvm_pic_clear_isr_ack
isr_ack is protected by kvm_pic->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:48 +03:00
Jan Kiszka
238adc7705 KVM: Cleanup LAPIC interface
None of the interface services the LAPIC emulation provides need to be
exported to modules, and kvm_lapic_get_base is even totally unused
today.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:48 +03:00
Avi Kivity
596ae89565 KVM: VMX: Fix reporting of unhandled EPT violations
Instead of returning -ENOTSUPP, exit normally but indicate the hardware
exit reason.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:46 +03:00
Avi Kivity
6de4f3ada4 KVM: Cache pdptrs
Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx,
guest memory access on svm) extract them on demand.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:46 +03:00
Avi Kivity
8f5d549f02 KVM: VMX: Simplify pdptr and cr3 management
Instead of reading the PDPTRs from memory after every exit (which is slow
and wrong, as the PDPTRs are stored on the cpu), sync the PDPTRs from
memory to the VMCS before entry, and from the VMCS to memory after exit.
Do the same for cr3.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:46 +03:00
Avi Kivity
2d84e993a8 KVM: VMX: Avoid duplicate ept tlb flush when setting cr3
vmx_set_cr3() will call vmx_tlb_flush(), which will flush the ept context.
So there is no need to call ept_sync_context() explicitly.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:46 +03:00
Gregory Haskins
6b66ac1ae3 KVM: do not register i8254 PIO regions until we are initialized
We currently publish the i8254 resources to the pio_bus before the devices
are fully initialized.  Since we hold the pit_lock, its probably not
a real issue.  But lets clean this up anyway.

Reported-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:45 +03:00
Gregory Haskins
d76685c4a0 KVM: cleanup io_device code
We modernize the io_device code so that we use container_of() instead of
dev->private, and move the vtable to a separate ops structure
(theoretically allows better caching for multiple instances of the same
ops structure)

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:45 +03:00
Avi Kivity
6c8166a77c KVM: SVM: Fold kvm_svm.h info svm.c
kvm_svm.h is only included from svm.c, so fold it in.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:44 +03:00
Andre Przywara
017cb99e87 KVM: SVM: use explicit 64bit storage for sysenter values
Since AMD does not support sysenter in 64bit mode, the VMCB fields storing
the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values
in a separate 64bit storage to avoid truncation.

[andre: fix amd->amd migration]

Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:43 +03:00
Jan Kiszka
c5ff41ce66 KVM: Allow PIT emulation without speaker port
The in-kernel speaker emulation is only a dummy and also unneeded from
the performance point of view. Rather, it takes user space support to
generate sound output on the host, e.g. console beeps.

To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel
speaker port emulation via a flag passed along the new IOCTL. It also
leaves room for future extensions of the PIT configuration interface.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Gregory Haskins
721eecbf4f KVM: irqfd
KVM provides a complete virtual system environment for guests, including
support for injecting interrupts modeled after the real exception/interrupt
facilities present on the native platform (such as the IDT on x86).
Virtual interrupts can come from a variety of sources (emulated devices,
pass-through devices, etc) but all must be injected to the guest via
the KVM infrastructure.  This patch adds a new mechanism to inject a specific
interrupt to a guest using a decoupled eventfd mechnanism:  Any legal signal
on the irqfd (using eventfd semantics from either userspace or kernel) will
translate into an injected interrupt in the guest at the next available
interrupt window.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Avi Kivity
0ba12d1081 KVM: Move common KVM Kconfig items to new file virt/kvm/Kconfig
Reduce Kconfig code duplication.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Gleb Natapov
787ff73637 KVM: Drop interrupt shadow when single stepping should be done only on VMX
The problem exists only on VMX. Also currently we skip this step if
there is pending exception. The patch fixes this too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Christoph Hellwig
284e9b0f5a KVM: cleanup arch/x86/kvm/Makefile
Use proper foo-y style list additions to cleanup all the conditionals,
move module selection after compound object selection and remove the
superflous comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:40 +03:00
Avi Kivity
ee3d29e8be KVM: x86 emulator: fix jmp far decoding (opcode 0xea)
The jump target should not be sign extened; use an unsigned decode flag.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:40 +03:00
Avi Kivity
c9eaf20f26 KVM: x86 emulator: Implement zero-extended immediate decoding
Absolute jumps use zero extended immediate operands.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Mark McLoughlin
cb007648de KVM: fix cpuid E2BIG handling for extended request types
If we run out of cpuid entries for extended request types
we should return -E2BIG, just like we do for the standard
request types.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Jaswinder Singh Rajput
60af2ecdc5 KVM: Use MSR names in place of address
Replace 0xc0010010 with MSR_K8_SYSCFG and 0xc0010015 with MSR_K7_HWCR.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Huang Ying
890ca9aefa KVM: Add MCE support
The related MSRs are emulated. MCE capability is exported via
extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED.  A new
vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
such as the mcg_cap. MCE is injected via vcpu ioctl command
KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
not implemented.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Jaswinder Singh Rajput
af24a4e4ae KVM: Replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h
Use standard msr-index.h's MSR declaration.

MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
80 column issue.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:38 +03:00
Gleb Natapov
ae0bb3e011 KVM: VMX: Properly handle software interrupt re-injection in real mode
When reinjecting a software interrupt or exception, use the correct
instruction length provided by the hardware instead of a hardcoded 1.

Fixes problems running the suse 9.1 livecd boot loader.

Problem introduced by commit f0a3602c20 ("KVM: Move interrupt injection
logic to x86.c").

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:38 +03:00
Yang Xiaowei
2496afbf1e xen: use stronger barrier after unlocking lock
We need to have a stronger barrier between releasing the lock and
checking for any waiting spinners.  A compiler barrier is not sufficient
because the CPU's ordering rules do not prevent the read xl->spinners
from happening before the unlock assignment, as they are different
memory locations.

We need to have an explicit barrier to enforce the write-read ordering
to different memory locations.

Because of it, I can't bring up > 4 HVM guests on one SMP machine.

[ Code and commit comments expanded -J ]

[ Impact: avoid deadlock when using Xen PV spinlocks ]

Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-09-09 16:38:44 -07:00
Jeremy Fitzhardinge
4d576b57b5 xen: only enable interrupts while actually blocking for spinlock
Where possible we enable interrupts while waiting for a spinlock to
become free, in order to reduce big latency spikes in interrupt handling.

However, at present if we manage to pick up the spinlock just before
blocking, we'll end up holding the lock with interrupts enabled for a
while.  This will cause a deadlock if we recieve an interrupt in that
window, and the interrupt handler tries to take the lock too.

Solve this by shrinking the interrupt-enabled region to just around the
blocking call.

[ Impact: avoid race/deadlock when using Xen PV spinlocks ]

Reported-by: "Yang, Xiaowei" <xiaowei.yang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-09-09 16:38:11 -07:00
Jeremy Fitzhardinge
577eebeae3 xen: make -fstack-protector work under Xen
-fstack-protector uses a special per-cpu "stack canary" value.
gcc generates special code in each function to test the canary to make
sure that the function's stack hasn't been overrun.

On x86-64, this is simply an offset of %gs, which is the usual per-cpu
base segment register, so setting it up simply requires loading %gs's
base as normal.

On i386, the stack protector segment is %gs (rather than the usual kernel
percpu %fs segment register).  This requires setting up the full kernel
GDT and then loading %gs accordingly.  We also need to make sure %gs is
initialized when bringing up secondary cpus too.

To keep things consistent, we do the full GDT/segment register setup on
both architectures.

Because we need to avoid -fstack-protected code before setting up the GDT
and because there's no way to disable it on a per-function basis, several
files need to have stack-protector inhibited.

[ Impact: allow Xen booting with stack-protector enabled ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-09-09 16:37:39 -07:00
Rafael J. Wysocki
bf992fa2bc Merge branch 'master' into for-linus 2009-09-10 00:02:02 +02:00
Jiri Slaby
748df9a4c6 x86/PCI: pci quirks, fix pci refcounting
Stanse found a pci reference leak in quirk_amd_nb_node.
Instead of putting nb_ht, there is a put of dev passed as
an argument.

http://stanse.fi.muni.cz/

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 14:11:02 -07:00
Jack Steiner
fa526d0d64 x86, pat: Fix cacheflush address in change_page_attr_set_clr()
Fix address passed to cpa_flush_range() when changing page
attributes from WB to UC. The address (*addr) is
modified by __change_page_attr_set_clr(). The result is that
the pages being flushed start at the _end_ of the changed range
instead of the beginning.

This should be considered for 2.6.30-stable and 2.6.31-stable.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Stable team <stable@kernel.org>
2009-09-09 14:05:24 -07:00
Alex Williamson
80286879c2 PCI iommu: iommu=pt is a valid early param
This avoids a "Malformed early option 'iommu'" on boot when trying
to use pass-through mode.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:28 -07:00
Jesse Barnes
2547089ca2 x86/PCI: initialize PCI bus node numbers early
The current mp_bus_to_node array is initialized only by AMD specific
code, since AMD platforms have registers that can be used for
determining mode numbers.  On new Intel platforms it's necessary to
initialize this array as well though, otherwise all PCI node numbers
will be 0, when in fact they should be -1 (indicating that I/O isn't
tied to any particular node).

So move the mp_bus_to_node code into the common PCI code, and
initialize it early with a default value of -1.  This may be overridden
later by arch code (e.g. the AMD code).

With this change, PCI consistent memory and other node specific
allocations (e.g. skbuff allocs) should occur on the "current" node.
If, for performance reasons, applications want to be bound to specific
nodes, they should open their devices only after being pinned to the
CPU where they'll run, for maximum locality.

Acked-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:21 -07:00
Alex Chiang
a7db504052 PCI: remove pcibios_scan_all_fns()
This was #define'd as 0 on all platforms, so let's get rid of it.

This change makes pci_scan_slot() slightly easier to read.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:18 -07:00
Tejun Heo
3e5cd1f257 dmi: extend dmi_get_year() to dmi_get_date()
There are cases where full date information is required instead of
just the year.  Add month and day parsing to dmi_get_year() and rename
it to dmi_get_date().

As the original function only required '/' followed by any number of
parseable characters at the end of the string, keep that behavior to
avoid upsetting existing users.

The new function takes dates of format [mm[/dd]]/yy[yy].  Year, month
and date are checked to be in the ranges of [1-9999], [1-12] and
[1-31] respectively and any invalid or out-of-range component is
returned as zero.

The dummy implementation is updated accordingly but the return value
is updated to indicate field not found which is consistent with how
other dummy functions behave.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08 21:17:48 -04:00
Peter Zijlstra
a8fae3ec5f sched: enable SD_WAKE_IDLE
Now that SD_WAKE_IDLE doesn't make pipe-test suck anymore,
enable it by default for MC, CPU and NUMA domains.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07 22:00:17 +02:00
Tobias Klauser
d535e4319a x86: Make memtype_seq_ops const
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-06 06:33:33 +02:00
Rafael J. Wysocki
23b6c52cf5 x86: Decrease the level of some NUMA messages to KERN_DEBUG
Some NUMA messages in srat_32.c are confusing to users,
because they seem to indicate errors, while in fact they
reflect normal behaviour.

Decrease the level of these messages to KERN_DEBUG so that
they don't show up unnecessarily.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <200909050107.45175.rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-06 06:32:23 +02:00
Ingo Molnar
ed011b22ce Merge commit 'v2.6.31-rc9' into tracing/core
Merge reason: move from -rc5 to -rc9.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-06 06:11:42 +02:00
H. Peter Anvin
b19ae39998 x86, msr: change msr-reg.o to obj-y, and export its symbols
Change msr-reg.o to obj-y (it will be included in virtually every
kernel since it is used by the initialization code for AMD processors)
and add a separate C file to export its symbols to modules, so that
msr.ko can use them; on uniprocessors we bypass the helper functions
in msr.o and use the accessor functions directly via inlines.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20090904140834.GA15789@elte.hu>
Cc: Borislav Petkov <petkovbb@googlemail.com>
2009-09-04 10:00:09 -07:00
Pekka Enberg
8e019366ba kmemleak: Don't scan uninitialized memory when kmemcheck is enabled
Ingo Molnar reported the following kmemcheck warning when running both
kmemleak and kmemcheck enabled:

  PM: Adding info for No Bus:vcsa7
  WARNING: kmemcheck: Caught 32-bit read from uninitialized memory
  (f6f6e1a4)
  d873f9f600000000c42ae4c1005c87f70000000070665f666978656400000000
   i i i i u u u u i i i i i i i i i i i i i i i i i i i i i u u u
           ^

  Pid: 3091, comm: kmemleak Not tainted (2.6.31-rc7-tip #1303) P4DC6
  EIP: 0060:[<c110301f>] EFLAGS: 00010006 CPU: 0
  EIP is at scan_block+0x3f/0xe0
  EAX: f40bd700 EBX: f40bd780 ECX: f16b46c0 EDX: 00000001
  ESI: f6f6e1a4 EDI: 00000000 EBP: f10f3f4c ESP: c2605fcc
   DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
  CR0: 8005003b CR2: e89a4844 CR3: 30ff1000 CR4: 000006f0
  DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
  DR6: ffff4ff0 DR7: 00000400
   [<c110313c>] scan_object+0x7c/0xf0
   [<c1103389>] kmemleak_scan+0x1d9/0x400
   [<c1103a3c>] kmemleak_scan_thread+0x4c/0xb0
   [<c10819d4>] kthread+0x74/0x80
   [<c10257db>] kernel_thread_helper+0x7/0x3c
   [<ffffffff>] 0xffffffff
  kmemleak: 515 new suspected memory leaks (see
  /sys/kernel/debug/kmemleak)
  kmemleak: 42 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

The problem here is that kmemleak will scan partially initialized
objects that makes kmemcheck complain. Fix that up by skipping
uninitialized memory regions when kmemcheck is enabled.

Reported-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-09-04 16:05:55 +01:00
Ingo Molnar
695a461296 Merge branch 'amd-iommu/2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2009-09-04 14:44:16 +02:00
Ingo Molnar
840a065310 sched: Turn on SD_BALANCE_NEWIDLE
Start the re-tuning of the balancer by turning on newidle.

It improves hackbench performance and parallelism on a 4x4 box.
The "perf stat --repeat 10" measurements give us:

  domain0             domain1
  .......................................
 -SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
   2041.273208  task-clock-msecs         #      9.354 CPUs    ( +-   0.363% )

 +SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
   2086.326925  task-clock-msecs         #     11.934 CPUs    ( +-   0.301% )

 +SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE:
   2115.289791  task-clock-msecs         #     12.158 CPUs    ( +-   0.263% )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 11:52:54 +02:00
Ingo Molnar
47734f89be sched: Clean up topology.h
Re-organize the flag settings so that it's visible at a glance
which sched-domains flags are set and which not.

With the new balancer code we'll need to re-tune these details
anyway, so make it cleaner to make fewer mistakes down the
road ;-)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 11:52:53 +02:00
Yinghai Lu
0d96b9ff74 x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus
Otherwise, system with apci id lifting will have wrong apicid in
/proc/cpuinfo.

and use that in srat_detect_node().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4A998CCA.1040407@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:55:29 +02:00
markus.t.metzger@intel.com
1653192f51 x86, perf_counter, bts: Do not allow kernel BTS tracing for now
Kernel BTS tracing generates too much data too fast for us to
handle, causing the kernel to hang.

Fail for BTS requests for kernel code.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zjilstra@chello.nl>
LKML-Reference: <20090902140616.901253000@intel.com>
[ This is really a workaround - but we want BTS tracing in .32
  so make sure we dont regress. The lockup should be fixed
  ASAP. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:26:40 +02:00
markus.t.metzger@intel.com
596da17f94 x86, perf_counter, bts: Correct pointer-to-u64 casts
On 32bit, pointers in the DS AREA configuration are cast to
u64. The current (long) cast to avoid compiler warnings results
in a signed 64bit address.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090902140615.305889000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:26:39 +02:00
markus.t.metzger@intel.com
747b50aaf7 x86, perf_counter, bts: Fail if BTS is not available
Reserve PERF_COUNT_HW_BRANCH_INSTRUCTIONS with sample_period ==
1 for BTS tracing and fail, if BTS is not available.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090902140612.943801000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 09:26:39 +02:00
Jeremy Fitzhardinge
53f824520b x86/i386: Put aligned stack-canary in percpu shared_aligned section
Pack aligned things together into a special section to minimize
padding holes.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <4AA035C0.9070202@goop.org>
[ queued up in tip:x86/asm because it depends on this commit:
  x86/i386: Make sure stack-protector segment base is cache aligned ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 07:10:31 +02:00
Andreas Herrmann
5a925b4282 x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
Current sched domain creation code can't handle multi-node processors.
When switching to power_savings scheduling errors show up and
system might hang later on (due to broken sched domain hierarchy):

  # echo 0  >> /sys/devices/system/cpu/sched_mc_power_savings
  CPU0 attaching sched-domain:
   domain 0: span 0-5 level MC
    groups: 0 1 2 3 4 5
    domain 1: span 0-23 level NODE
     groups: 0-5 6-11 18-23 12-17
  ...
  # echo 1  >> /sys/devices/system/cpu/sched_mc_power_savings
  CPU0 attaching sched-domain:
   domain 0: span 0-11 level MC
    groups: 0 1 2 3 4 5 6 7 8 9 10 11
  ERROR: parent span is not a superset of domain->span
    domain 1: span 0-5 level CPU
  ERROR: domain->groups does not contain CPU0
     groups: 6-11 (__cpu_power = 12288)
  ERROR: groups don't span domain->span
     domain 2: span 0-23 level NODE
      groups:
  ERROR: domain->cpu_power not set

  ERROR: groups don't span domain->span
  ...

Fixing all aspects of power-savings scheduling for Magny-Cours needs
some larger changes in the sched domain creation code.

As a short-term and temporary workaround avoid the problems by
extending "the worst possible hack" ;-(
and always use llc_shared_map on AMD Magny-Cours when MC domain span
is calculated.

With this I get:

  # echo 1  >> /sys/devices/system/cpu/sched_mc_power_savings
  CPU0 attaching sched-domain:
   domain 0: span 0-5 level MC
    groups: 0 1 2 3 4 5
    domain 1: span 0-5 level CPU
     groups: 0-5 (__cpu_power = 6144)
     domain 2: span 0-23 level NODE
      groups: 0-5 (__cpu_power = 6144) 6-11 (__cpu_power = 6144) 18-23 (__cpu_power = 6144) 12-17 (__cpu_power = 6144)
  ...

I.e. no errors during sched domain creation, no system hangs, and also
mc_power_savings scheduling works to a certain extend.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:10:14 -07:00
Andreas Herrmann
cb9805ab5b x86, mcheck: Use correct cpumask for shared bank4
This fixes threshold_bank4 support on multi-node processors.

The correct mask to use is llc_shared_map, representing an internal
node on Magny-Cours.

We need to create 2 sets of symlinks for sibling shared banks -- one
set for each internal node, symlinks of each set should target the
first core on same internal node.

Currently only one set is created where all symlinks are targeting
the first core of the entire socket.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:10:08 -07:00
Andreas Herrmann
a326e948c5 x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors
L3 cache size, associativity and shared_cpu information need to be
adapted to show information for an internal node instead of the
entire physical package.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:10:03 -07:00
Andreas Herrmann
4a376ec3a2 x86: Fix CPU llc_shared_map information for AMD Magny-Cours
Construct entire NodeID and use it as cpu_llc_id. Thus internal node
siblings are stored in llc_shared_map.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:09:59 -07:00
Jeremy Fitzhardinge
1ea0d14e48 x86/i386: Make sure stack-protector segment base is cache aligned
The Intel Optimization Reference Guide says:

	In Intel Atom microarchitecture, the address generation unit
	assumes that the segment base will be 0 by default. Non-zero
	segment base will cause load and store operations to experience
	a delay.
		- If the segment base isn't aligned to a cache line
		  boundary, the max throughput of memory operations is
		  reduced to one [e]very 9 cycles.
	[...]
	Assembly/Compiler Coding Rule 15. (H impact, ML generality)
	For Intel Atom processors, use segments with base set to 0
	whenever possible; avoid non-zero segment base address that is
	not aligned to cache line boundary at all cost.

We can't avoid having a non-zero base for the stack-protector
segment, but we can make it cache-aligned.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: <stable@kernel.org>
LKML-Reference: <4AA01893.6000507@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-03 21:30:51 +02:00
Ingo Molnar
8adf65cfae x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too
The macro was defined in the 32-bit path as well - breaking the
build on 32-bit platforms:

  arch/x86/lib/msr-reg.S: Assembler messages:
  arch/x86/lib/msr-reg.S:53: Error: Bad macro parameter list
  arch/x86/lib/msr-reg.S💯 Error: invalid character '_' in mnemonic
  arch/x86/lib/msr-reg.S:101: Error: invalid character '_' in mnemonic

Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-f6909f394c2d4a0a71320797df72d54c49c5927e@git.kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-03 21:26:34 +02:00
Joerg Roedel
2b681fafcc Merge branch 'amd-iommu/pagetable' into amd-iommu/2.6.32
Conflicts:
	arch/x86/kernel/amd_iommu.c
2009-09-03 17:14:57 +02:00
Joerg Roedel
03362a05c5 Merge branch 'amd-iommu/passthrough' into amd-iommu/2.6.32
Conflicts:
	arch/x86/kernel/amd_iommu.c
	arch/x86/kernel/amd_iommu_init.c
2009-09-03 16:34:23 +02:00
Joerg Roedel
85da07c409 Merge branches 'gart/fixes', 'amd-iommu/fixes+cleanups' and 'amd-iommu/fault-handling' into amd-iommu/2.6.32 2009-09-03 16:32:00 +02:00
Pavel Vasilyev
6ac162d6c0 x86/gart: Do not select AGP for GART_IOMMU
There is no dependency from the gart code to the agp code.
And since a lot of systems today do not have agp anymore
remove this dependency from the kernel configuration.

Signed-off-by: Pavel Vasilyev <pavel@pavlinux.ru>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:20:55 +02:00
Joerg Roedel
4751a95134 x86/amd-iommu: Initialize passthrough mode when requested
This patch enables the passthrough mode for AMD IOMMU by
running the initialization function when iommu=pt is passed
on the kernel command line.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:46 +02:00
Joerg Roedel
a1ca331c8a x86/amd-iommu: Don't detach device from pt domain on driver unbind
This patch makes sure a device is not detached from the
passthrough domain when the device driver is unloaded or
does otherwise release the device.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:46 +02:00
Joerg Roedel
21129f786f x86/amd-iommu: Make sure a device is assigned in passthrough mode
When the IOMMU driver runs in passthrough mode it has to
make sure that every device not assigned to an IOMMU-API
domain must be put into the passthrough domain instead of
keeping it unassigned.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:45 +02:00
Joerg Roedel
eba6ac60ba x86/amd-iommu: Align locking between attach_device and detach_device
This patch makes the locking behavior between the functions
attach_device and __attach_device consistent with the
locking behavior between detach_device and __detach_device.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:44 +02:00
Joerg Roedel
aa879fff5d x86/amd-iommu: Fix device table write order
The V bit of the device table entry has to be set after the
rest of the entry is written to not confuse the hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:43 +02:00
Joerg Roedel
0feae533dd x86/amd-iommu: Add passthrough mode initialization functions
When iommu=pt is passed on kernel command line the devices
should run untranslated. This requires the allocation of a
special domain for that purpose. This patch implements the
allocation and initialization path for iommu=pt.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:42 +02:00
Joerg Roedel
2650815fb0 x86/amd-iommu: Add core functions for pd allocation/freeing
This patch factors some code of protection domain allocation
into seperate functions. This way the logic can be used to
allocate the passthrough domain later. As a side effect this
patch fixes an unlikely domain id leakage bug.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:34 +02:00
Joerg Roedel
ac0101d396 x86/dma: Mark iommu_pass_through as __read_mostly
This variable is read most of the time. This patch marks it
as such. It also documents the meaning the this variable
while at it.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:13:46 +02:00
Joerg Roedel
abdc5eb3d6 x86/amd-iommu: Change iommu_map_page to support multiple page sizes
This patch adds a map_size parameter to the iommu_map_page
function which makes it generic enough to handle multiple
page sizes. This also requires a change to alloc_pte which
is also done in this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:11:17 +02:00
Joerg Roedel
a6b256b413 x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
This patch changes fetch_pte and iommu_page_unmap to support
different page sizes too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:11:08 +02:00
Joerg Roedel
674d798a80 x86/amd-iommu: Remove old page table handling macros
These macros are not longer required. So remove them.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:49 +02:00
Joerg Roedel
8f7a017ce0 x86/amd-iommu: Use 2-level page tables for dma_ops domains
The driver now supports a dynamic number of levels for IO
page tables. This allows to reduce the number of levels for
dma_ops domains by one because a dma_ops domain has usually
an address space size between 128MB and 4G.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:49 +02:00
Joerg Roedel
bad1cac28a x86/amd-iommu: Remove bus_addr check in iommu_map_page
The driver now supports full 64 bit device address spaces.
So this check is not longer required.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:48 +02:00
Joerg Roedel
8c8c143cdc x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX
This change allows to remove these old macros later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:47 +02:00
Joerg Roedel
8bc3e12742 x86/amd-iommu: Change alloc_pte to support 64 bit address space
This patch changes the alloc_pte function to be able to map
pages into the whole 64 bit address space supported by AMD
IOMMU hardware from the old limit of 2**39 bytes.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:46 +02:00
Joerg Roedel
50020fb632 x86/amd-iommu: Introduce increase_address_space function
This function will be used to increase the address space
size of a protection domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:46 +02:00
Joerg Roedel
04bfdd8406 x86/amd-iommu: Flush domains if address space size was increased
Thist patch introduces the update_domain function which
propagates the larger address space of a protection domain
to the device table and flushes all relevant DTEs and the
domain TLB.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:45 +02:00
Joerg Roedel
407d733e30 x86/amd-iommu: Introduce set_dte_entry function
This function factors out some logic of attach_device to a
seperate function. This new function will be used to update
device table entries when necessary.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:44 +02:00
Joerg Roedel
6a0dbcbe4e x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices
This patch adds a generic variant of
amd_iommu_flush_all_devices function which flushes only the
DTEs for a given protection domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:43 +02:00
Joerg Roedel
a6d41a4027 x86/amd-iommu: Use fetch_pte in amd_iommu_iova_to_phys
Don't reimplement the page table walker in this function.
Use the generic one.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:42 +02:00
Joerg Roedel
38a76eeeaf x86/amd-iommu: Use fetch_pte in iommu_unmap_page
Instead of reimplementing existing logic use fetch_pte to
walk the page table in iommu_unmap_page.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:41 +02:00
Joerg Roedel
9355a08186 x86/amd-iommu: Make fetch_pte aware of dynamic mapping levels
This patch changes the fetch_pte function in the AMD IOMMU
driver to support dynamic mapping levels.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:34 +02:00
Joerg Roedel
6a1eddd2f9 x86/amd-iommu: Reset command buffer if wait loop fails
Instead of a panic on an comletion wait loop failure, try to
recover from that event from resetting the command buffer.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:56:19 +02:00
Joerg Roedel
b26e81b871 x86/amd-iommu: Panic if IOMMU command buffer reset fails
To prevent the driver from doing recursive command buffer
resets, just panic when that recursion happens.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:35 +02:00
Joerg Roedel
a345b23b79 x86/amd-iommu: Reset command buffer on ILLEGAL_COMMAND_ERROR
On an ILLEGAL_COMMAND_ERROR the IOMMU stops executing
further commands. This patch changes the code to handle this
case better by resetting the command buffer in the IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:34 +02:00
Joerg Roedel
93f1cc67cf x86/amd-iommu: Add reset function for command buffers
This patch factors parts of the command buffer
initialization code into a seperate function which can be
used to reset the command buffer later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:34 +02:00
Joerg Roedel
d586d7852c x86/amd-iommu: Add function to flush all DTEs on one IOMMU
This function flushes all DTE entries on one IOMMU for all
devices behind this IOMMU. This is required for command
buffer resetting later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:23 +02:00
Joerg Roedel
e0faf54ee8 x86/amd-iommu: fix broken check in amd_iommu_flush_all_devices
The amd_iommu_pd_table is indexed by protection domain
number and not by device id. So this check is broken and
must be removed.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:56 +02:00
Joerg Roedel
ae908c22aa x86/amd-iommu: Remove redundant 'IOMMU' string
The 'IOMMU: ' prefix is not necessary because the
DUMP_printk macro already prints its own prefix.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:56 +02:00
Joerg Roedel
4c6f40d4e0 x86/amd-iommu: replace "AMD IOMMU" by "AMD-Vi"
This patch replaces the "AMD IOMMU" printk strings with the
official name for the hardware: "AMD-Vi".

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:56 +02:00
Joerg Roedel
f2430bd104 x86/amd-iommu: Remove some merge helper code
This patch removes some left-overs which where put into the code to
simplify merging code which also depends on changes in other trees.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:55 +02:00
Joerg Roedel
e394d72aa8 x86/amd-iommu: Introduce function for iommu-local domain flush
This patch introduces a function to flush all domain tlbs
for on one given IOMMU. This is required later to reset the
command buffer on one IOMMU.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:41:34 +02:00
Joerg Roedel
945b4ac44e x86/amd-iommu: Dump illegal command on ILLEGAL_COMMAND_ERROR
This patch adds code to dump the command which caused an
ILLEGAL_COMMAND_ERROR raised by the IOMMU hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 14:28:04 +02:00
Joerg Roedel
e3e59876e8 x86/amd-iommu: Dump fault entry on DTE error
This patch adds code to dump the content of the device table
entry which caused an ILLEGAL_DEV_TABLE_ENTRY error from the
IOMMU hardware.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 14:17:08 +02:00
Ingo Molnar
f76bd108e5 Merge branch 'perfcounters/urgent' into perfcounters/core
Merge reason: We are going to modify a place modified by
              perfcounters/urgent.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 21:42:59 +02:00
David Howells
ee18d64c1f KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]
Add a keyctl to install a process's session keyring onto its parent.  This
replaces the parent's session keyring.  Because the COW credential code does
not permit one process to change another process's credentials directly, the
change is deferred until userspace next starts executing again.  Normally this
will be after a wait*() syscall.

To support this, three new security hooks have been provided:
cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
the blank security creds and key_session_to_parent() - which asks the LSM if
the process may replace its parent's session keyring.

The replacement may only happen if the process has the same ownership details
as its parent, and the process has LINK permission on the session keyring, and
the session keyring is owned by the process, and the LSM permits it.

Note that this requires alteration to each architecture's notify_resume path.
This has been done for all arches barring blackfin, m68k* and xtensa, all of
which need assembly alteration to support TIF_NOTIFY_RESUME.  This allows the
replacement to be performed at the point the parent process resumes userspace
execution.

This allows the userspace AFS pioctl emulation to fully emulate newpag() and
the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
alter the parent process's PAG membership.  However, since kAFS doesn't use
PAGs per se, but rather dumps the keys into the session keyring, the session
keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
the newpag flag.

This can be tested with the following program:

	#include <stdio.h>
	#include <stdlib.h>
	#include <keyutils.h>

	#define KEYCTL_SESSION_TO_PARENT	18

	#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)

	int main(int argc, char **argv)
	{
		key_serial_t keyring, key;
		long ret;

		keyring = keyctl_join_session_keyring(argv[1]);
		OSERROR(keyring, "keyctl_join_session_keyring");

		key = add_key("user", "a", "b", 1, keyring);
		OSERROR(key, "add_key");

		ret = keyctl(KEYCTL_SESSION_TO_PARENT);
		OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");

		return 0;
	}

Compiled and linked with -lkeyutils, you should see something like:

	[dhowells@andromeda ~]$ keyctl show
	Session Keyring
	       -3 --alswrv   4043  4043  keyring: _ses
	355907932 --alswrv   4043    -1   \_ keyring: _uid.4043
	[dhowells@andromeda ~]$ /tmp/newpag
	[dhowells@andromeda ~]$ keyctl show
	Session Keyring
	       -3 --alswrv   4043  4043  keyring: _ses
	1055658746 --alswrv   4043  4043   \_ user: a
	[dhowells@andromeda ~]$ /tmp/newpag hello
	[dhowells@andromeda ~]$ keyctl show
	Session Keyring
	       -3 --alswrv   4043  4043  keyring: hello
	340417692 --alswrv   4043  4043   \_ user: a

Where the test program creates a new session keyring, sticks a user key named
'a' into it and then installs it on its parent.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 21:29:22 +10:00
Ingo Molnar
936e894a97 Merge commit 'v2.6.31-rc8' into x86/txt
Conflicts:
	arch/x86/kernel/reboot.c
	security/Kconfig

Merge reason: resolve the conflicts, bump up from rc3 to rc8.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 08:17:56 +02:00
Huang Ying
ae4b688db2 x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h
This function measures whether the FPU/SSE state can be touched in
interrupt context. If the interrupted code is in user space or has no
valid FPU/SSE context (CR0.TS == 1), FPU/SSE state can be used in IRQ
or soft_irq context too.

This is used by AES-NI accelerated AES implementation and PCLMULQDQ
accelerated GHASH implementation.

v3:
 - Renamed to irq_fpu_usable to reflect the purpose of the function.

v2:
 - Renamed to irq_is_fpu_using to reflect the real situation.

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-01 21:39:15 -07:00
Shane Wang
69575d3886 x86, intel_txt: clean up the impact on generic code, unbreak non-x86
Move tboot.h from asm to linux to fix the build errors of intel_txt
patch on non-X86 platforms. Remove the tboot code from generic code
init/main.c and kernel/cpu.c.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-01 18:25:07 -07:00
H. Peter Anvin
f6909f394c x86, msr: fix msr-reg.S compilation with gas 2.16.1
msr-reg.S used the :req option on a macro argument, which wasn't
supported by gas 2.16.1 (but apparently by some earlier versions of
gas, just to be confusing.)  It isn't necessary, so just remove it.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
2009-09-01 13:37:21 -07:00
Ingo Molnar
c931aaf0e1 Merge branch 'x86/paravirt' into x86/cpu
Conflicts:
	arch/x86/include/asm/paravirt.h

Manual merge:
	arch/x86/include/asm/paravirt_types.h

Merge reason: x86/paravirt conflicts non-trivially with x86/cpu,
              resolve it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-01 12:13:30 +02:00
Catalin Marinas
acde31dc46 kmemleak: Ignore the aperture memory hole on x86_64
This block is allocated with alloc_bootmem() and scanned by kmemleak but
the kernel direct mapping may no longer exist. This patch tells kmemleak
to ignore this memory hole. The dma32_bootmem_ptr in
dma32_reserve_bootmem() is also ignored.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-09-01 11:12:32 +01:00
H. Peter Anvin
ff55df53df x86, msr: Export the register-setting MSR functions via /dev/*/msr
Make it possible to access the all-register-setting/getting MSR
functions via the MSR driver.  This is implemented as an ioctl() on
the standard MSR device node.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <petkovbb@gmail.com>
2009-08-31 16:16:04 -07:00
H. Peter Anvin
8b956bf1f0 x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs()
Create _on_cpu helpers for {rw,wr}msr_safe_regs() analogously with the
other MSR functions.  This will be necessary to add support for these
to the MSR driver.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <petkovbb@gmail.com>
2009-08-31 16:15:57 -07:00
H. Peter Anvin
0cc0213e73 x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
For some reason, the _safe MSR functions returned -EFAULT, not -EIO.
However, the only user which cares about the return code as anything
other than a boolean is the MSR driver, which wants -EIO.  Change it
to -EIO across the board.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-08-31 15:15:23 -07:00
H. Peter Anvin
79c5dca361 x86, msr: CFI annotations, cleanups for msr-reg.S
Add CFI annotations for native_{rd,wr}msr_safe_regs().
Simplify the 64-bit implementation: we don't allow the upper half
registers to be set, and so we can use them to carry state across the
operation.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com>
2009-08-31 15:14:47 -07:00
H. Peter Anvin
709972b1f6 x86, asm: Make _ASM_EXTABLE() usable from assembly code
We have had this convenient macro _ASM_EXTABLE() to generate exception
table entry in inline assembly.  Make it also usable for pure
assembly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:30 -07:00
H. Peter Anvin
fe9b4e4e40 x86, asm: Add 32-bit versions of the combined CFI macros
Add 32-bit versions of the combined CFI macros, equivalent to the
64-bit ones except, obviously, operating on 32-bit stack words.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:29 -07:00
Borislav Petkov
6b0f43ddfa x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit
fbd8b1819e turns off the bit for
/proc/cpuinfo. However, a proper/full fix would be to additionally
turn off the bit in the CPUID output so that future callers get
correct CPU features info.

Do that by basically reversing what the BIOS wrongfully does at boot.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1251705011-18636-3-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:29 -07:00
Borislav Petkov
177fed1ee8 x86, msr: Rewrite AMD rd/wrmsr variants
Switch them to native_{rd,wr}msr_safe_regs and remove
pv_cpu_ops.read_msr_amd.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:28 -07:00
Borislav Petkov
132ec92f3f x86, msr: Add rd/wrmsr interfaces with preset registers
native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow
presetting of a subset of eight x86 GPRs before executing the rd/wrmsr
instructions. This is needed at least on AMD K8 for accessing an erratum
workaround MSR.

Originally based on an idea by H. Peter Anvin.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:26 -07:00
Michal Schmidt
23386d63bb x86: Detect stack protector for i386 builds on x86_64
Stack protector support was not detected when building with
ARCH=i386 on x86_64 systems:

  arch/x86/Makefile:80: stack protector enabled but no compiler support

The "-m32" argument needs to be passed to the detection script.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <20090829182718.10f566b1@leela>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
2009-08-30 20:39:48 +02:00
Jan Beulich
47d25003cb x86: Fix earlyprintk=dbgp for machines without NX
Since parse_early_param() may (e.g. for earlyprintk=dbgp)
involve calls to page table manipulation functions (here
set_fixmap_nocache()), NX hardware support must be determined
before calling that function (so that __supported_pte_mask gets
properly set up).

But the call after parse_early_param() can also not go away, as
that will honor eventual command line specified disabling of
the NX functionality.

( This will then just result in whatever mappings got
  established during parse_early_param() having the NX bit set
  despite it being disabled on the command line, but I think
  that's tolerable).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <4A97F3BD02000078000121B9@vpn.id2.novell.com>
[ merged to x86/pat to resolve a conflict. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29 15:47:32 +02:00
Ingo Molnar
eebc57f73d Merge branch 'for-ingo' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 into x86/apic
Merge reason: the SFI (Simple Firmware Interface) feature in the ACPI
              tree needs this cleanup, pull it into the APIC branch as
	      well so that there's no interactions.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29 09:31:47 +02:00
Feng Tang
2a4ab640d3 ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n
Some IO-APIC routines are ACPI specific now, but need to
be exposed when CONFIG_ACPI=n for the benefit of SFI.

Remove #ifdef ACPI around these routines:

io_apic_get_unique_id(int ioapic, int apic_id);
io_apic_get_version(int ioapic);
io_apic_get_redir_entries(int ioapic);

Move these routines from ACPI-specific boot.c to io_apic.c:

uniq_ioapic_id(u8 id)
mp_find_ioapic()
mp_find_ioapic_pin()
mp_register_ioapic()

Also, since uniq_ioapic_id() is now no longer static,
re-name it to io_apic_unique_id() for consistency
with the other public io_apic routines.

For simplicity, do not #ifdef the resulting code ACPI || SFI,
thought that could be done in the future if it is important
to optimize the !ACPI !SFI IO-APIC x86 kernel for size.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2009-08-28 19:57:27 -04:00
H. Peter Anvin
b855192c08 Merge branch 'x86/urgent' into x86/pat
Reason: Change to is_new_memtype_allowed() in x86/urgent

Resolved semantic conflicts in:

	 arch/x86/mm/pat.c
	 arch/x86/mm/ioremap.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 17:24:28 -07:00
Venkatesh Pallipadi
d886c73cd4 x86, pat: Sanity check remap_pfn_range for RAM region
Add sanity check for remap_pfn_range of RAM regions using
lookup_memtype(). Previously, we did not have anyway to get the type of
RAM memory regions as they were tracked using a single bit in
page_struct (WB, nonWB). Now we can get the actual type from page struct
(WB, WC, UC_MINUS) and make sure the requester gets that type.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:35 -07:00
Venkatesh Pallipadi
1087637616 x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
Lookup the reserved memtype during vm_insert_pfn and use that memtype
for the new mapping. This takes care or handling of vm_insert_pfn()
interface in track_pfn_vma*/untrack_pfn_vma.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:32 -07:00
Venkatesh Pallipadi
637b86e75f x86, pat: Add lookup_memtype to get the current memtype of a paddr
Add a new routine lookup_memtype() to get the current memtype based on
the PAT reserves and frees.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:28 -07:00
Venkatesh Pallipadi
f584174096 x86, pat: Use page flags to track memtypes of RAM pages
Change reserve_ram_pages_type and free_ram_pages_type to use 2 page
flags to track UC_MINUS, WC, WB and default types. Previous RAM tracking
just tracked WB or NonWB, which was not complete and did not allow
tracking of RAM fully and there was no way to get the actual type
reserved by looking at the page flags.

We use the memtype_lock spinlock for atomicity in dealing with
memtype tracking in struct page.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:24 -07:00
Venkatesh Pallipadi
46cf98cdae x86, pat: Generalize the use of page flag PG_uncached
Only IA64 was using PG_uncached as of now. We now intend to use this bit
in x86 as well, to keep track of memory type of those addresses that
have page struct for them. So, generalize the use of that bit across
ia64 and x86.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:22 -07:00
Venkatesh Pallipadi
335ef896d4 x86, pat: Add rbtree to do quick lookup in memtype tracking
PAT memtype tracking uses a linear link list to keep track of IO
(non-RAM) regions and their memtypes. The code used a last_accessed
pointer as a cache to speedup the lookup. As per discussions with
H. Peter Anvin a while back, having a rbtree here will avoid bad
performances in pathological cases where we may end up with huge
linked list. This may not add any noticable performance speedup
in normal case as the number of entires in PAT memtype list tend
to be ~20-30 range. The patch removes the "cached_entry" logic
as with rbtree we have more generic way of speeding up the lookup.

With this patch, we use rbtree to do the quick lookup. We still use
linked list as the memtype range tracked can be of different sizes
and can overlap in different ways. We also keep track of usage counts
with linked list.

Example:
Multiple ioremaps with different sizes
uncached-minus @ 0xfffff00000-0xfffff04000
uncached-minus @ 0xfffff02000-0xfffff03000

And one userlevel mmap and the thread forks a new process
uncached-minus @ 0xbf453000-0xbf454000
uncached-minus @ 0xbf453000-0xbf454000

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:19 -07:00
Venkatesh Pallipadi
9e36fda0b3 x86, pat: Add PAT reserve free to io_mapping* APIs
io_mapping_* interfaces were added, mainly for graphics drivers.
Make this interface go through the PAT reserve/free, instead of
hardcoding WC mapping. This makes sure that there are no
aliases due to unconditional WC setting.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:16 -07:00
Venkatesh Pallipadi
9fd126bc74 x86, pat: New i/f for driver to request memtype for IO regions
Add new routines to request memtype for IO regions. This will currently
be a backend for io_mapping_* routines. But, it can also be made available
to drivers directly in future, in case it is needed.

reserve interface reserves the memory, makes sure we have a compatible
memory type available and keeps the identity map in sync when needed.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:10 -07:00
Venkatesh Pallipadi
279e669b3f x86, pat: ioremap to follow same PAT restrictions as other PAT users
ioremap has this hard-coded check for new type and requested type. That
check differs from other PAT users like /dev/mem mmap, remap_pfn_range
in only one condition where requested type is UC_MINUS and new type
is WC. Under that condition, ioremap fails. But other PAT interfaces succeed
with a WC mapping.

Change to make ioremap be in sync with other PAT APIs and use the same
macro as others. Also changes the error print to KERN_ERR instead of
pr_debug.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:07 -07:00
Venkatesh Pallipadi
5fc517466d x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
Make reserve_memtype internally take care of pat disabled case and fallback
to default return values.

Remove the specific pat_disabled checks in track_* routines.

Change kernel_map_sync_memtype to sync identity map even when
pat_disabled.

This change ensures that, even for pat_disabled case, we take care of
keeping identity map in sync. Before this patch, in pat disabled case,
ioremap() keeps the identity maps in sync and other APIs like pci and
/dev/mem mmap don't, which is not a very consistent behavior.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:40:58 -07:00
Jason Baron
117226d158 tracing: Remove FTRACE_SYSCALL_MAX definitions
Remove the FTRACE_SYSCALL_MAX definitions now that we have converted the
syscall event tracing code to use NR_syscalls.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anwin <hpa@zytor.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <f2240cdc8f0b1ca7617390c8f5ec90ba2bd348cf.1251146513.git.jbaron@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 21:30:39 +02:00
Jason Baron
57421dbbdc tracing: Convert event tracing code to use NR_syscalls
Convert the syscalls event tracing code to use NR_syscalls, instead of
FTRACE_SYSCALL_MAX. NR_syscalls is standard accross most arches, and
reduces code confusion/complexity.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anwin <hpa@zytor.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <9b4f1a84ecae57cc6599412772efa36f0d2b815b.1251146513.git.jbaron@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 21:30:02 +02:00
Jason Baron
a5a2f8e2ac tracing: Define NR_syscalls for x86_64
Express the available number of syscalls in a standard way by defining
NR_syscalls.

The common way to define it is to place its definition in asm/unistd.h
However, the number of syscalls is defined using __NR_syscall_max in
x86-64 after building a dynamic header file "asm-offsets.h"

The source file that generates this header, asm-offsets-64.c includes
unistd.h, then if we want to express NR_syscalls from __NR_syscall_max
in unistd.h only after generating the dynamic header file, we need a
watchguard.

If unistd.h is included from asm-offsets-64.c, then we are generating
asm-offset.h which defines __NR_syscall_max. At this time, we don't
want to (we can't) define NR_syscalls, then we do nothing.
Otherwise we define NR_syscalls because we know asm-offsets.h has
been generated.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anwin <hpa@zytor.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20090826160910.GB2658@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 21:29:58 +02:00
Jason Baron
dd86dda24c tracing: Define NR_syscalls for x86 (32)
Add a NR_syscalls #define for x86. This is used in the syscall events
tracing code. Todo: make it dynamic like x86_64.

NR_syscalls is the usual name used to determine the number of syscalls
supported by the current arch. We want to unify the use of this number
across archs that support the syscall tracing. This also prepare to move
some of the arch code to core code in the syscall tracing area.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anwin <hpa@zytor.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <0f33c0f96d198fccc3ddd9ff7f5334ff5cb42706.1251146513.git.jbaron@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 21:29:55 +02:00
Cyrill Gorcunov
d3a247bfb2 x86, apic: Slim down stack usage in early_init_lapic_mapping()
As far as I see there is no external poking of mp_lapic_addr in
this procedure which could lead to unpredited changes and
require local storage unit for it. Lets use it plain forward.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090826171324.GC4548@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26 20:26:23 +02:00
Yinghai Lu
295594e9cf x86: Fix vSMP boot crash
2.6.31-rc7 does not boot on vSMP systems:

[    8.501108] CPU31: Thermal monitoring enabled (TM1)
[    8.501127] CPU 31 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
[    8.650254] CPU31: Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz stepping 04
[    8.710324] Brought up 32 CPUs
[    8.713916] Total of 32 processors activated (162314.96 BogoMIPS).
[    8.721489] ERROR: parent span is not a superset of domain->span
[    8.727686] ERROR: domain->groups does not contain CPU0
[    8.733091] ERROR: groups don't span domain->span
[    8.737975] ERROR: domain->cpu_power not set
[    8.742416]

Ravikiran Thirumalai bisected it to:

| commit 2759c3287d
| x86: don't call read_apic_id if !cpu_has_apic

The problem is that on vSMP systems the CPUID derived
initial-APICIDs are overlapping - so we need to fall
back on hard_smp_processor_id() which reads the local
APIC.

Both come from the hardware (influenced by firmware
though) so it's a tough call which one to trust.

Doing the quirk expresses the vSMP property properly
and also does not affect other systems, so we go for
this solution instead of a revert.

Reported-and-Tested-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shai Fultheim <shai@scalex86.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4A944D3C.5030100@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26 10:13:17 +02:00
Cyrill Gorcunov
5051fd6977 x86, e820: Guard against array overflowed in __e820_add_region()
Better to be paranoid against unpredicted nr_map modifications.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090824175551.146070377@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26 08:17:47 +02:00
Cyrill Gorcunov
ffc438366c x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources()
alloc_bootmem() already panics on allocation failure. There is
no need to check the result.

Also there is a way to unbind global variable from its body and
use it as a parameter which allow us to simplify
ioapic_init_mappings as well -- "for" cycle already uses
nr_ioapics as a conditional variable and there is no need to
check if ioapic_setup_resources was returning NULL again.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090824175551.493629148@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26 08:16:38 +02:00
Cyrill Gorcunov
8f3e1df48b x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant
We already have APIC_DEFAULT_PHYS_BASE so just to be
consistent.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090824175550.927946757@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26 08:16:38 +02:00
H. Peter Anvin
7adb4df410 x86, xen: Initialize cx to suppress warning
Initialize cx before calling xen_cpuid(), in order to suppress the
"may be used uninitialized in this function" warning.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
2009-08-25 21:10:32 -07:00
Jeremy Fitzhardinge
d560bc6157 x86, xen: Suppress WP test on Xen
Xen always runs on CPUs which properly support WP enforcement in
privileged mode, so there's no need to test for it.

This also works around a crash reported by Arnd Hannemann, though I
think its just a band-aid for that case.

Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-25 21:10:32 -07:00
H. Peter Anvin
ab94fcf528 x86: allow "=rm" in native_save_fl()
This is a partial revert of f1f029c7bf.

"=rm" is allowed in this context, because "pop" is explicitly defined
to adjust the stack pointer *before* it evaluates its effective
address, if it has one.  Thus, we do end up writing to the correct
address even if we use an on-stack memory argument.

The original reporter for f1f029c7bf was
apparently using a broken x86 simulator.

[ Impact: performance ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gabe Black <spamforgabe@umich.edu>
2009-08-25 16:47:16 -07:00
Josh Stone
1c569f0264 tracing: Create generic syscall TRACE_EVENTs
This converts the syscall_enter/exit tracepoints into TRACE_EVENTs, so
you can have generic ftrace events that capture all system calls with
arguments and return values.  These generic events are also renamed to
sys_enter/exit, so they're more closely aligned to the specific
sys_enter_foo events.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <1251150194-1713-5-git-send-email-jistone@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 00:41:48 +02:00
H. Peter Anvin
e8a2eb47e6 Merge commit 'origin/x86/urgent' into x86/asm 2009-08-25 15:40:29 -07:00
Josh Stone
9741987586 tracing: Move tracepoint callbacks from declaration to definition
It's not strictly correct for the tracepoint reg/unreg callbacks to
occur when a client is hooking up, because the actual tracepoint may not
be present yet.  This happens to be fine for syscall, since that's in
the core kernel, but it would cause problems for tracepoints defined in
a module that hasn't been loaded yet.  It also means the reg/unreg has
to be EXPORTed for any modules to use the tracepoint (as in SystemTap).

This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces
DEFINE_TRACE_FN which stores the callbacks in struct tracepoint.  The
callbacks are used now when the active state of the tracepoint changes
in set_tracepoint & disable_tracepoint.

This also introduces TRACE_EVENT_FN, so ftrace events can also provide
registration callbacks if needed.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 00:36:41 +02:00
Josh Stone
6670000119 tracing: Rename FTRACE_SYSCALLS for tracepoints
s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g
s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g

The syscall enter/exit tracing is no longer specific to just ftrace, so
they now have names that reflect their tie to tracepoints instead.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 00:17:35 +02:00
Linus Torvalds
44afa9a4b8 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clockevent: Prevent dead lock on clockevents_lock
  timers: Drop write permission on /proc/timer_list
2009-08-25 11:24:04 -07:00
Linus Torvalds
9f459fadbb Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix build with older binutils and consolidate linker script
  x86: Fix an incorrect argument of reserve_bootmem()
  x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile
  xen: rearrange things to fix stackprotector
  x86: make sure load_percpu_segment has no stackprotector
  i386: Fix section mismatches for init code with !HOTPLUG_CPU
  x86, pat: Allow ISA memory range uncacheable mapping requests
2009-08-25 11:23:25 -07:00
Roel Kluin
005155b1f6 x86: Fix x86_model test in es7000_apic_is_cluster()
For the x86_model to be greater than 6 or less than 12 is
logically always true.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-25 15:58:12 +02:00
Jan Beulich
c62e43202e x86: Fix build with older binutils and consolidate linker script
binutils prior to 2.17 can't deal with the currently possible
situation of a new segment following the per-CPU segment, but
that new segment being empty - objcopy misplaces the .bss (and
perhaps also the .brk) sections outside of any segment.

However, the current ordering of sections really just appears
to be the effect of cumulative unrelated changes; re-ordering
things allows to easily guarantee that the segment following
the per-CPU one is non-empty, and at once eliminates the need
for the bogus data.init2 segment.

Once touching this code, also use the various data section
helper macros from include/asm-generic/vmlinux.lds.h.

-v2: fix !SMP builds.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: <sam@ravnborg.org>
LKML-Reference: <4A94085D02000078000119A5@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-25 15:54:16 +02:00
Amerigo Wang
a6a06f7b57 x86: Fix an incorrect argument of reserve_bootmem()
This line looks suspicious, because if this is true, then the
'flags' parameter of function reserve_bootmem_generic() will be
unused when !CONFIG_NUMA. I don't think this is what we want.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-24 20:22:55 +02:00
Alexey Dobriyan
10f02d1168 x86: uv: Clean up uv_ptc_init(), use proc_create()
create_proc_entry() is getting duhprecated.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: cpw@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-24 12:26:46 +02:00
Ingo Molnar
5f9ece0240 Merge commit 'v2.6.31-rc7' into x86/cleanups
Merge reason: we were on -rc1 before - go up to -rc7

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-24 12:25:54 +02:00
Tobias Doerffel
366d19e181 x86: add specific support for Intel Atom architecture
Add another option when selecting CPU family so the kernel can be
optimized for Intel Atom CPUs. If GCC supports tuning options for
Intel Atom they will be used.

Signed-off-by: Tobias Doerffel <tobias.doerffel@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1251018457-19157-1-git-send-email-tobias.doerffel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 11:20:02 +02:00
Ingo Molnar
8a517c514d Merge commit 'v2.6.31-rc7' into x86/cpu 2009-08-23 11:18:47 +02:00
Rafael J. Wysocki
8400146d0d Merge branch 'master' into for-linus 2009-08-23 00:03:00 +02:00
H. Peter Anvin
5400743db5 x86, mtrr: make mtrr_aps_delayed_init static bool
mtr_aps_delayed_init was declared u32 and made global, but it only
ever takes boolean values and is only ever used in
arch/x86/kernel/cpu/mtrr/main.c.  Declare it "static bool" and remove
external references.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2009-08-21 17:00:02 -07:00
Suresh Siddha
d0af9eed5a x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
SDM Vol 3a section titled "MTRR considerations in MP systems" specifies
the need for synchronizing the logical cpu's while initializing/updating
MTRR.

Currently Linux kernel does the synchronization of all cpu's only when
a single MTRR register is programmed/updated. During an AP online
(during boot/cpu-online/resume)  where we initialize all the MTRR/PAT registers,
we don't follow this synchronization algorithm.

This can lead to scenarios where during a dynamic cpu online, that logical cpu
is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while other logical
HT sibling continue to run (also with cache disabled because of cr0.cd=1
on its sibling).

Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
(because of some VMX performance optimizations) and the above scenario
(with one logical cpu doing VMX activity and another logical cpu coming online)
can result in system crash.

Fix the MTRR initialization by doing rendezvous of all the cpus. During
boot and resume, we delay the MTRR/PAT init for APs till all the
logical cpu's come online and the rendezvous process at the end of AP's bringup,
will initialize the MTRR/PAT for all AP's.

For dynamic single cpu online, we synchronize all the logical cpus and
do the MTRR/PAT init on the AP that is coming online.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-21 16:25:55 -07:00
Jan Beulich
8b5a10fc6f x86: properly annotate alternatives.c
Some of the NOPs tables aren't used on 64-bits, quite some code and
data is needed post-init for module loading only, and a couple of
functions aren't used outside that file (i.e. can be static, and don't
need to be exported).

The change to __INITDATA/__INITRODATA is needed to avoid an assembler
warning.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4A8BC8A00200007800010823@vpn.id2.novell.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-21 15:30:12 -07:00
Linus Torvalds
b04e6373d6 x86: don't call '->send_IPI_mask()' with an empty mask
As noted in 83d349f35e ("x86: don't send
an IPI to the empty set of CPU's"), some APIC's will be very unhappy
with an empty destination mask.  That commit added a WARN_ON() for that
case, and avoided the resulting problem, but didn't fix the underlying
reason for why those empty mask cases happened.

This fixes that, by checking the result of 'cpumask_andnot()' of the
current CPU actually has any other CPU's left in the set of CPU's to be
sent a TLB flush, and not calling down to the IPI code if the mask is
empty.

The reason this started happening at all is that we started passing just
the CPU mask pointers around in commit 4595f9620 ("x86: change
flush_tlb_others to take a const struct cpumask"), and when we did that,
the cpumask was no longer thread-local.

Before that commit, flush_tlb_mm() used to create it's own copy of
'mm->cpu_vm_mask' and pass that copy down to the low-level flush
routines after having tested that it was not empty.  But after changing
it to just pass down the CPU mask pointer, the lower level TLB flush
routines would now get a pointer to that 'mm->cpu_vm_mask', and that
could still change - and become empty - after the test due to other
CPU's having flushed their own TLB's.

See

	http://bugzilla.kernel.org/show_bug.cgi?id=13933

for details.

Tested-by: Thomas Björnell <thomas.bjornell@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-21 09:48:10 -07:00
Linus Torvalds
83d349f35e x86: don't send an IPI to the empty set of CPU's
The default_send_IPI_mask_logical() function uses the "flat" APIC mode
to send an IPI to a set of CPU's at once, but if that set happens to be
empty, some older local APIC's will apparently be rather unhappy.  So
just warn if a caller gives us an empty mask, and ignore it.

This fixes a regression in 2.6.30.x, due to commit 4595f9620 ("x86:
change flush_tlb_others to take a const struct cpumask"), documented
here:

	http://bugzilla.kernel.org/show_bug.cgi?id=13933

which causes a silent lock-up.  It only seems to happen on PPro, P2, P3
and Athlon XP cores.  Most developers sadly (or not so sadly, if you're
a developer..) have more modern CPU's.  Also, on x86-64 we don't use the
flat APIC mode, so it would never trigger there even if the APIC didn't
like sending an empty IPI mask.

Reported-by: Pavel Vilim <wylda@volny.cz>
Reported-and-tested-by: Thomas Björnell <thomas.bjornell@gmail.com>
Reported-and-tested-by: Martin Rogge <marogge@onlinehome.de>
Cc: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-21 09:23:57 -07:00
Amerigo Wang
3e0e1e9c5a x86: Fix an incorrect argument of reserve_bootmem()
This line looks suspicious, because if this is true, then the
'flags' parameter of function reserve_bootmem_generic() will be
unused when !CONFIG_NUMA. I don't think this is what we want.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21 16:40:31 +02:00
Xiao Guangrong
8126dec327 x86: Fix system crash when loading with "reservetop" parameter
The system will die if the kernel is booted with "reservetop"
parameter, in present code, parse "reservetop" parameter after
early_ioremap_init(), and some function still use
early_ioremap() after it.

The problem is, "reservetop" parameter can modify
'FIXADDR_TOP', then the virtual address got by early_ioremap()
is base on old 'FIXADDR_TOP', but the page mapping is base on
new 'FIXADDR_TOP', it will occur page fault, and the IDT is not
prepare yet, so, the system is dead.

So, put parse_early_param() in the front of
early_ioremap_init() in this patch.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: yinghai@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A8D402F.4080805@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21 16:40:30 +02:00
Jan Beulich
fc0ce23506 x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile
The absence of vmlinux.lds here keeps .vmlinux.lds.cmd from being
included, which in turn leads to it and all its dependents always
getting rebuilt independent of whether they are already up-to-date.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4A8D84670200007800010D31@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-20 16:08:58 -07:00
Rafael J. Wysocki
39cf0518d8 Merge branch 'master' into for-linus 2009-08-20 20:24:33 +02:00
Ingo Molnar
cbcb340cb6 Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent 2009-08-20 12:05:24 +02:00
Jeremy Fitzhardinge
ce2eef33d3 xen: rearrange things to fix stackprotector
Make sure the stack-protector segment registers are properly set up
before calling any functions which may have stack-protection compiled
into them.

[ Impact: prevent Xen early-boot crash when stack-protector is enabled ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-08-19 17:09:28 -07:00
Jeremy Fitzhardinge
5416c26635 x86: make sure load_percpu_segment has no stackprotector
load_percpu_segment() is used to set up the per-cpu segment registers,
which are also used for -fstack-protector.  Make sure that the
load_percpu_segment() function doesn't have stackprotector enabled.

[ Impact: allow percpu setup before calling stack-protected functions ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-08-19 17:09:21 -07:00
Suresh Siddha
f833bab87f clockevent: Prevent dead lock on clockevents_lock
Currently clockevents_notify() is called with interrupts enabled at
some places and interrupts disabled at some other places.

This results in a deadlock in this scenario.

cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
cpu C doing set_mtrr() which will try to rendezvous of all the cpus.

This will result in C and A come to the rendezvous point and waiting
for B. B is stuck forever waiting for the spinlock and thus not
reaching the rendezvous point.

Fix the clockevents code so that clockevents_lock is taken with
interrupts disabled and thus avoid the above deadlock.

Also call lapic_timer_propagate_broadcast() on the destination cpu so
that we avoid calling smp_call_function() in the clockevents notifier
chain.

This issue left us wondering if we need to change the MTRR rendezvous
logic to use stop machine logic (instead of smp_call_function) or add
a check in spinlock debug code to see if there are other spinlocks
which gets taken under both interrupts enabled/disabled conditions.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: "Brown Len" <len.brown@intel.com>
LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-19 18:15:10 +02:00
Linus Torvalds
77f312a96d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: use the right flag for get_vm_area()
  percpu, sparc64: fix sparse possible cpu map handling
  init: set nr_cpu_ids before setup_per_cpu_areas()
2009-08-18 19:41:05 -07:00
Jan Beulich
78b89ecd73 i386: Fix section mismatches for init code with !HOTPLUG_CPU
Commit 0e83815be7 changed the
section the initial_code variable gets allocated in, in an
attempt to address a section conflict warning. This, however
created a new section conflict when building without
HOTPLUG_CPU. The apparently only (reasonable) way to address
this is to always use __REFDATA.

Once at it, also fix a second section mismatch when not using
HOTPLUG_CPU.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4A8AE7CD020000780001054B@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-18 17:52:35 +02:00
Suresh Siddha
1adcaafe74 x86, pat: Allow ISA memory range uncacheable mapping requests
Max Vozeler reported:
>  Bug 13877 -  bogl-term broken with CONFIG_X86_PAT=y, works with =n
>
>  strace of bogl-term:
>  814   mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0)
>				 = -1 EAGAIN (Resource temporarily unavailable)
>  814   write(2, "bogl: mmaping /dev/fb0: Resource temporarily unavailable\n",
>	       57) = 57

PAT code maps the ISA memory range as WB in the PAT attribute, so that
fixed range MTRR registers define the actual memory type (UC/WC/WT etc).

But the upper level is_new_memtype_allowed() API checks are failing,
as the request here is for UC and the return tracked type is WB (Tracked type is
WB as MTRR type for this legacy range potentially will be different for each
4k page).

Fix is_new_memtype_allowed() by always succeeding the ISA address range
checks, as the null PAT (WB) and def MTRR fixed range register settings
satisfy the memory type needs of the applications that map the ISA address
range.

Reported-and-Tested-by: Max Vozeler <xam@debian.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-17 14:12:44 -07:00
Yinghai Lu
b7f42ab2e2 x86, apic: Move dmar_table_init() out of enable_IR()
On an x2apic system, we got:

[    1.818072] ------------[ cut here ]------------
[    1.820376] WARNING: at kernel/lockdep.c:2461 lockdep_trace_alloc+0xa5/0xe9()
[    1.835282] Hardware name: ASSY,
[    1.839006] Modules linked in:
[    1.841253] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip-03926-g39aaa80-dirty #510
[    1.858056] Call Trace:
[    1.859913]  [<ffffffff810d13aa>] ? lockdep_trace_alloc+0xa5/0xe9
[    1.876270]  [<ffffffff81093f37>] warn_slowpath_common+0x8d/0xd0
[    1.879132]  [<ffffffff81093fa1>] warn_slowpath_null+0x27/0x3d
[    1.896823]  [<ffffffff810d13aa>] lockdep_trace_alloc+0xa5/0xe9
[    1.900659]  [<ffffffff810cf5a0>] ? lock_release_holdtime+0x2f/0x199
[    1.917188]  [<ffffffff81167a3c>] kmem_cache_alloc_notrace+0x42/0x111
[    1.922320]  [<ffffffff8106fe8c>] ? reserve_memtype+0x152/0x518
[    1.938137]  [<ffffffff8106f8b1>] ? pat_pagerange_is_ram+0x4a/0x91
[    1.941730]  [<ffffffff8106fe8c>] reserve_memtype+0x152/0x518
[    1.958115]  [<ffffffff8106ce62>] __ioremap_caller+0x1dd/0x30f
[    1.975507]  [<ffffffff81ce2c5c>] ? acpi_os_map_memory+0x2a/0x47
[    1.978987]  [<ffffffff8106d0fd>] ioremap_nocache+0x2a/0x40
[    2.031400]  [<ffffffff810d0364>] ? trace_hardirqs_off+0x20/0x36
[    2.036096]  [<ffffffff81ce2c5c>] acpi_os_map_memory+0x2a/0x47
[    2.046263]  [<ffffffff815cd642>] acpi_tb_verify_table+0x3d/0x85
[    2.050349]  [<ffffffff81d34af7>] ? _spin_unlock_irqrestore+0x50/0x76
[    2.067327]  [<ffffffff815ccad6>] acpi_get_table_with_size+0x64/0xd9
[    2.070860]  [<ffffffff81d34af7>] ? _spin_unlock_irqrestore+0x50/0x76
[    2.088000]  [<ffffffff825c88d5>] dmar_table_detect+0x33/0x70
[    2.092047]  [<ffffffff825c8a01>] dmar_table_init+0x43/0x428
[    2.106854]  [<ffffffff825a7537>] enable_IR+0x1c/0x8d
[    2.110256]  [<ffffffff825a7624>] enable_IR_x2apic+0x7c/0x19e
[    2.127139]  [<ffffffff825a4876>] native_smp_prepare_cpus+0x139/0x3b8
[    2.145175]  [<ffffffff8259678d>] kernel_init+0x71/0x1da
[    2.148913]  [<ffffffff8104305a>] child_rip+0xa/0x20
[    2.152349]  [<ffffffff810429fc>] ? restore_args+0x0/0x30
[    2.167931]  [<ffffffff8259671c>] ? kernel_init+0x0/0x1da
[    2.171671]  [<ffffffff81043050>] ? child_rip+0x0/0x20
[    2.187607] ---[ end trace a7919e7f17c0a725 ]---

Venkatesh Pallipadi said:

| Looks like the problem started with this commit
|
| commit ce69a78450
| Author: Gleb Natapov <gleb@redhat.com>
| Date:   Mon Jul 20 15:24:17 2009 +0300
|
| x86/apic: Enable x2APIC without interrupt remapping under KVM
|
| Before this commit, dmar_table_init() was getting called
| with interrupts enabled and after this commit, it is getting
| called with interrupts disabled.

so try to move out dmar_table_init out of that function.

Analyzed-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
LKML-Reference: <4A899F3C.2050104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-17 20:22:56 +02:00
H. Peter Anvin
62a3207b8c x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE
On 32 bits, we can have CONFIG_ACPI_SLEEP set without implying
CONFIG_X86_TRAMPOLINE.  In that case, we simply do not need to mark
the trampoline as a MAC region.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Shane Wang <shane.wang@intel.com>
Cc: Joseph Cihula <joseph.cihula@intel.com>
2009-08-17 11:16:16 -07:00
Ingo Molnar
e412cd257e x86, mce: Don't initialize MCEs on unknown CPUs
An older test-box started hanging at the following point during
bootup:

 [    0.022996] Mount-cache hash table entries: 512
 [    0.024996] Initializing cgroup subsys debug
 [    0.025996] Initializing cgroup subsys cpuacct
 [    0.026995] Initializing cgroup subsys devices
 [    0.027995] Initializing cgroup subsys freezer
 [    0.028995] mce: CPU supports 5 MCE banks

I've bisected it down to commit 4efc0670 ("x86, mce: use 64bit
machine check code on 32bit"), which utilizes the MCE code on
32-bit systems too.

The problem is caused by this detail in my config:

  # CONFIG_CPU_SUP_INTEL is not set

This disables the quirks in mce_cpu_quirks() but still enables
MCE support - which then hangs due to the missing quirk
workaround needed on this CPU:

	if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0)
		mce_banks[0].init = 0;

The safe solution is to not initialize MCEs if we dont know on
what CPU we are running (or if that CPU's support code got
disabled in the config).

Also be a bit more defensive on 32-bit systems: dont do a
boot-time dump of pending MCEs not just on the specific system
that we found a problem with (Pentium-M), but earlier ones as
well.

Now this problem is probably not common and disabling CPU
support is rare - but still being more defensive in something
we turned on for a wide range of CPUs is prudent.

Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: Message-ID: <4A88E3E4.40506@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-17 13:28:25 +02:00
Bartlomiej Zolnierkiewicz
c7f6fa4411 x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0

[ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error)
  and f200000000000115 (... READ Error).

  To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
  the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
  content of STATUS MSR before it is cleared during initialization. ]

Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs
by default ("mce=bootlog" boot parameter can be be used to get the old
behavior).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-17 10:17:02 +02:00
Leonardo Potenza
52459ab913 x86: Annotate section mismatch warnings in kernel/apic/x2apic_uv_x.c
The function uv_acpi_madt_oem_check() has been marked __init,
the struct apic_x2apic_uv_x has been marked __refdata.

The aim is to address the following section mismatch messages:

WARNING: arch/x86/kernel/apic/built-in.o(.data+0x1368): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: arch/x86/kernel/built-in.o(.data+0x68e8): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: arch/x86/built-in.o(.text+0x7b36f): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_ioremap()
The function uv_acpi_madt_oem_check() references
the function __init early_ioremap().
This is often because uv_acpi_madt_oem_check lacks a __init
annotation or the annotation of early_ioremap is wrong.

WARNING: arch/x86/built-in.o(.text+0x7b38d): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_iounmap()
The function uv_acpi_madt_oem_check() references
the function __init early_iounmap().
This is often because uv_acpi_madt_oem_check lacks a __init
annotation or the annotation of early_iounmap is wrong.

WARNING: arch/x86/built-in.o(.data+0x8668): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
LKML-Reference: <200908161855.48302.lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-16 19:44:13 +02:00
Hugh Dickins
4e5c25d405 x86, mce: therm_throt: Don't log redundant normality
0d01f31439 "x86, mce: therm_throt
- change when we print messages" removed redundant
announcements of "Temperature/speed normal".

They're not worth logging and remove their accompanying
"Machine check events logged" messages as well from the
console.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
LKML-Reference: <Pine.LNX.4.64.0908161544100.7929@sister.anvils>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-16 17:25:41 +02:00
Rafael J. Wysocki
3e2bcad898 Merge branch 'master' into for-linus 2009-08-16 11:50:10 +02:00
Ingo Molnar
be750231ce Merge branch 'perfcounters/urgent' into perfcounters/core
Conflicts:
	kernel/perf_counter.c

Merge reason: update to latest upstream (-rc6) and resolve
              the conflict with urgent fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-15 12:06:12 +02:00
Cliff Wickman
3ef12c3c97 x86: Fix UV BAU destination subnode id
The SGI UV Broadcast Assist Unit is used to send TLB shootdown
messages to remote nodes of the system.  The header of the
message must contain the subnode id of the block in the
receiving hub that handles such messages.  It should always be
0x10, the id of the "LB" block.

It had previously been documented as a "must be zero" field.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <E1Mc1x7-0005Ce-6t@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-15 11:58:02 +02:00
H. Peter Anvin
58c41d2825 x86, intel_txt: Factor out the code for S3 setup
S3 sleep requires special setup in tboot.  However, the data
structures needed to do such setup are only available if
CONFIG_ACPI_SLEEP is enabled.  Abstract them out as much as possible,
so we can have a single tboot_setup_sleep() which either is a proper
implementation or a stub which simply calls BUG().

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Shane Wang <shane.wang@intel.com>
Cc: Joseph Cihula <joseph.cihula@intel.com>
2009-08-14 12:14:19 -07:00
Tejun Heo
e933a73f48 percpu: kill lpage first chunk allocator
With x86 converted to embedding allocator, lpage doesn't have any user
left.  Kill it along with cpa handling code.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
2009-08-14 15:00:53 +09:00
Tejun Heo
4518e6a0c0 x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
Embedding percpu first chunk allocator can now handle very sparse unit
mapping.  Use embedding allocator instead of lpage for 64bit NUMA.
This removes extra TLB pressure and the need to do complex and fragile
dancing when changing page attributes.

For 32bit, using very sparse unit mapping isn't a good idea because
the vmalloc space is very constrained.  32bit NUMA machines aren't
exactly the focus of optimization and it isn't very clear whether
lpage performs better than page.  Use page first chunk allocator for
32bit NUMAs.

As this leaves setup_pcpu_*() functions pretty much empty, fold them
into setup_per_cpu_areas().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
2009-08-14 15:00:52 +09:00