kernel-ark

Author	SHA1	Message	Date
Andre Przywara	b586eb0253	KVM: SVM: Fix cross vendor migration issue in segment segment descriptor On AMD CPUs sometimes the DB bit in the stack segment descriptor is left as 1, although the whole segment has been made unusable. Clear it here to pass an Intel VMX entry check when cross vendor migrating. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:51 +03:00
Glauber Costa	9b5843ddd2	KVM: fix apic_debug instances Apparently nobody turned this on in a while... setting apic_debug to something compilable, generates some errors. This patch fixes it. Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:50 +03:00
Sheng Yang	522c68c441	KVM: Enable snooping control for supported hardware Memory aliases with different memory type is a problem for guest. For the guest without assigned device, the memory type of guest memory would always been the same as host(WB); but for the assigned device, some part of memory may be used as DMA and then set to uncacheable memory type(UC/WC), which would be a conflict of host memory type then be a potential issue. Snooping control can guarantee the cache correctness of memory go through the DMA engine of VT-d. [avi: fix build on ia64] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:50 +03:00
Sheng Yang	4b12f0de33	KVM: Replace get_mt_mask_shift with get_mt_mask Shadow_mt_mask is out of date, now it have only been used as a flag to indicate if TDP enabled. Get rid of it and use tdp_enabled instead. Also put memory type logical in kvm_x86_ops->get_mt_mask(). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:49 +03:00
Jan Blunck	9b62e5b10f	KVM: Wake up waitqueue before calling get_cpu() This moves the get_cpu() call down to be called after we wake up the waiters. Therefore the waitqueue locks can safely be rt mutex. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Sven-Thorsten Dietrich <sven@thebigcorporation.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:49 +03:00
Gleb Natapov	14d0bc1f7c	KVM: Get rid of get_irq() callback It just returns pending IRQ vector from the queue for VMX/SVM. Get IRQ directly from the queue before migration and put it back after. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:49 +03:00
Gleb Natapov	16d7a19117	KVM: Fix userspace IRQ chip migration Re-put pending IRQ vector into interrupt_bitmap before migration. Otherwise it will be lost if migration happens in the wrong time. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:48 +03:00
Gleb Natapov	95ba827313	KVM: SVM: Add NMI injection support Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:48 +03:00
Gleb Natapov	c4282df98a	KVM: Get rid of arch.interrupt_window_open & arch.nmi_window_open They are recalculated before each use anyway. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:48 +03:00
Gleb Natapov	0a5fff1923	KVM: Do not report TPR write to userspace if new value bigger or equal to a previous one. Saves many exits to userspace in a case of IRQ chip in userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:47 +03:00
Gleb Natapov	615d519305	KVM: sync_lapic_to_cr8() should always sync cr8 to V_TPR Even if IRQ chip is in userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:47 +03:00
Gleb Natapov	115666dfc7	KVM: Remove kvm_push_irq() No longer used. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:47 +03:00
Gleb Natapov	1d6ed0cb95	KVM: Remove inject_pending_vectors() callback It is the same as inject_pending_irq() for VMX/SVM now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:47 +03:00
Gleb Natapov	1cb948ae86	KVM: Remove exception_injected() callback. It always return false for VMX/SVM now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:46 +03:00
Gleb Natapov	9222be18f7	KVM: SVM: Coalesce userspace/kernel irqchip interrupt injection logic Start to use interrupt/exception queues like VMX does. This also fix the bug that if exit was caused by a guest internal exception access to IDT the exception was not reinjected. Use EVENTINJ to inject interrupts. Use VINT only for detecting when IRQ windows is open again. EVENTINJ ensures the interrupt is injected immediately and not delayed. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:46 +03:00
Gleb Natapov	5df5664647	KVM: Use kvm_arch_interrupt_allowed() instead of checking interrupt_window_open directly kvm_arch_interrupt_allowed() also checks IF so drop the check. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:46 +03:00
Gleb Natapov	1f21e79aac	KVM: VMX: Cleanup vmx_intr_assist() Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:45 +03:00
Gleb Natapov	863e8e658e	KVM: VMX: Consolidate userspace and kernel interrupt injection for VMX Use the same callback to inject irq/nmi events no matter what irqchip is in use. Only from VMX for now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:45 +03:00
Gleb Natapov	8061823a25	KVM: Make kvm_cpu_(has\|get)_interrupt() work for userspace irqchip too At the vector level, kernel and userspace irqchip are fairly similar. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:45 +03:00
Jan Kiszka	3438253926	KVM: MMU: Fix auditing code Fix build breakage of hpa lookup in audit_mappings_page. Moreover, make this function robust against shadow_notrap_nonpresent_pte entries. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:45 +03:00
Marcelo Tosatti	59839dfff5	KVM: x86: check for cr3 validity in ioctl_set_sregs Matt T. Yourst notes that kvm_arch_vcpu_ioctl_set_sregs lacks validity checking for the new cr3 value: "Userspace callers of KVM_SET_SREGS can pass a bogus value of cr3 to the kernel. This will trigger a NULL pointer access in gfn_to_rmap() when userspace next tries to call KVM_RUN on the affected VCPU and kvm attempts to activate the new non-existent page table root. This happens since kvm only validates that cr3 points to a valid guest physical memory page when code inside the guest sets cr3. However, kvm currently trusts the userspace caller (e.g. QEMU) on the host machine to always supply a valid page table root, rather than properly validating it along with the rest of the reloaded guest state." http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2687641&group_id=180599 Check for a valid cr3 address in kvm_arch_vcpu_ioctl_set_sregs, triple fault in case of failure. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:43 +03:00
Avi Kivity	463656c000	KVM: Replace kvmclock open-coded get_cpu_var() with the real thing Suggested by Ingo Molnar. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:42 +03:00
Gleb Natapov	8317c298ea	KVM: SVM: Skip instruction on a task switch only when appropriate If a task switch was initiated because off a task gate in IDT and IDT was accessed because of an external even the instruction should not be skipped. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:42 +03:00
Gleb Natapov	ba8afb6b0a	KVM: x86 emulator: Add new mode of instruction emulation: skip In the new mode instruction is decoded, but not executed. The EIP is moved to point after the instruction. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:42 +03:00
Gleb Natapov	e637b8238a	KVM: x86 emulator: Decode soft interrupt instructions Do not emulate them yet. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:41 +03:00
Gleb Natapov	84ce66a686	KVM: x86 emulator: Completely decode in/out at decoding stage Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:41 +03:00
Gleb Natapov	341de7e372	KVM: x86 emulator: Add unsigned byte immediate decode Extend "Source operand type" opcode description field to 4 bites to accommodate new option. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:41 +03:00
Gleb Natapov	d53c4777b3	KVM: x86 emulator: Complete decoding of call near in decode stage Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:41 +03:00
Gleb Natapov	b2833e3cde	KVM: x86 emulator: Complete short/near jcc decoding in decode stage Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:40 +03:00
Gleb Natapov	782b877c80	KVM: x86 emulator: Complete ljmp decoding at decode stage Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:40 +03:00
Gleb Natapov	0654169e73	KVM: x86 emulator: Add lcall decoding No emulation yet. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:40 +03:00
Gleb Natapov	a5f868bd45	KVM: x86 emulator: Add decoding of 16bit second immediate argument Such as segment number in lcall/ljmp Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:40 +03:00
Marcelo Tosatti	c2d0ee46e6	KVM: MMU: remove global page optimization logic Complexity to fix it not worthwhile the gains, as discussed in http://article.gmane.org/gmane.comp.emulators.kvm.devel/28649. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:39 +03:00
Marcelo Tosatti	ede2ccc517	KVM: PIT: fix count read and mode 0 handling Commit 46ee278652f4cbd51013471b64c7897ba9bcd1b1 causes Solaris 10 to hang on boot. Assuming that PIT counter reads should return 0 for an expired timer is wrong: when it is active, the counter never stops (see comment on __kpit_elapsed). Also arm a one shot timer for mode 0. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:39 +03:00
Gleb Natapov	2d03319654	KVM: x86 emulator: fix call near emulation The length of pushed on to the stack return address depends on operand size not address size. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:38 +03:00
Sheng Yang	4c26b4cd6f	KVM: MMU: Discard reserved bits checking on PDE bit 7-8 1. It's related to a Linux kernel bug which fixed by Ingo on `07a66d7c53`. The original code exists for quite a long time, and it would convert a PDE for large page into a normal PDE. But it fail to fit normal PDE well. With the code before Ingo's fix, the kernel would fall reserved bit checking with bit 8 - the remaining global bit of PTE. So the kernel would receive a double-fault. 2. After discussion, we decide to discard PDE bit 7-8 reserved checking for now. For this marked as reserved in SDM, but didn't checked by the processor in fact... Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:38 +03:00
Gleb Natapov	64a7ec0668	KVM: Fix unneeded instruction skipping during task switching. There is no need to skip instruction if the reason for a task switch is a task gate in IDT and access to it is caused by an external even. The problem is currently solved only for VMX since there is no reliable way to skip an instruction in SVM. We should emulate it instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:38 +03:00
Gleb Natapov	b237ac37a1	KVM: Fix task switch back link handling. Back link is written to a wrong TSS now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:37 +03:00
Gleb Natapov	8843419048	KVM: VMX: Do not zero idt_vectoring_info in vmx_complete_interrupts(). We will need it later in task_switch(). Code in handle_exception() is dead. is_external_interrupt(vect_info) will always be false since idt_vectoring_info is zeroed in vmx_complete_interrupts(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:37 +03:00
Gleb Natapov	37b96e9880	KVM: VMX: Rewrite vmx_complete_interrupt()'s twisted maze of if() statements ...with a more straightforward switch(). Also fix a bug when NMI could be dropped on exit. Although this should never happen in practice, since NMIs can only be injected, never triggered internally by the guest like exceptions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:37 +03:00
Gleb Natapov	7b4a25cb29	KVM: VMX: Fix handling of a fault during NMI unblocked due to IRET Bit 12 is undefined in any of the following cases: If the VM exit sets the valid bit in the IDT-vectoring information field. If the VM exit is due to a double fault. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:36 +03:00
Dong, Eddie	20c466b561	KVM: Use rsvd_bits_mask in load_pdptrs() Also remove bit 5-6 from rsvd_bits_mask per latest SDM. Signed-off-by: Eddie Dong <Eddie.Dong@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:36 +03:00
Sheng Yang	93ba03c2e2	KVM: VMX: Fix feature testing The testing of feature is too early now, before vmcs_config complete initialization. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:36 +03:00
Sheng Yang	045471563d	KVM: VMX: Clean up Flex Priority related And clean paranthes on returns. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:36 +03:00
Wei Yongjun	7a6ce84c74	KVM: remove pointless conditional before kfree() in lapic initialization Remove pointless conditional before kfree(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:35 +03:00
Avi Kivity	9645bb56b3	KVM: MMU: Use different shadows when EFER.NXE changes A pte that is shadowed when the guest EFER.NXE=1 is not valid when EFER.NXE=0; if bit 63 is set, the pte should cause a fault, and since the shadow EFER always has NX enabled, this won't happen. Fix by using a different shadow page table for different EFER.NXE bits. This allows vcpus to run correctly with different values of EFER.NXE, and for transitions on this bit to be handled correctly without requiring a full flush. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:35 +03:00
Dong, Eddie	82725b20e2	KVM: MMU: Emulate #PF error code of reserved bits violation Detect, indicate, and propagate page faults where reserved bits are set. Take care to handle the different paging modes, each of which has different sets of reserved bits. [avi: fix pte reserved bits for efer.nxe=0] Signed-off-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:35 +03:00
Eddie Dong	a8b876b1a4	KVM: MMU: Fix comment in page_fault() The original one is for the code before refactoring. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:34 +03:00
Sheng Yang	f9c617f611	KVM: VMX: Correct wrong vmcs field sizes EXIT_QUALIFICATION and GUEST_LINEAR_ADDRESS are natural width, not 64-bit. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:34 +03:00
Avi Kivity	7d433b9f94	KVM: VMX: Make flexpriority module parameter reflect hardware capability If the hardware does not support flexpriority, zero the module parameter. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:34 +03:00
Gleb Natapov	78646121e9	KVM: Fix interrupt unhalting a vcpu when it shouldn't kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking if interrupt window is actually opened. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:33 +03:00
Gleb Natapov	09cec75488	KVM: Timer event should not unconditionally unhalt vcpu. Currently timer events are processed before entering guest mode. Move it to main vcpu event loop since timer events should be processed even while vcpu is halted. Timer may cause interrupt/nmi to be injected and only then vcpu will be unhalted. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:33 +03:00
Avi Kivity	089d034e0c	KVM: VMX: Fold vm_need_ept() into callers Trivial. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:32 +03:00
Avi Kivity	575ff2dcb2	KVM: VMX: Zero ept module parameter if ept is not present Allows reading back hardware capability. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:32 +03:00
Avi Kivity	919818abc2	KVM: VMX: Zero the vpid module parameter if vpid is not supported This allows reading back how the hardware is configured. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:32 +03:00
Avi Kivity	4462d21a61	KVM: VMX: Annotate module parameters as __read_mostly Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:32 +03:00
Avi Kivity	736caefe15	KVM: VMX: Simplify module parameter names Instead of 'enable_vpid=1', use a simple 'vpid=1'. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:31 +03:00
Avi Kivity	6062d012ed	KVM: VMX: Rename kvm_handle_exit() to vmx_handle_exit() It is a static vmx-specific function. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:31 +03:00
Avi Kivity	c1f8bc04c6	KVM: VMX: Make module parameters readable Useful to see how the module was loaded. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:31 +03:00
Gleb Natapov	fe4c7b1914	KVM: reuse (pop\|push)_irq from svm.c in vmx.c The prioritized bit vector manipulation functions are useful in both vmx and svm. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:31 +03:00
Gleb Natapov	61c50edfcd	KVM: SVM: Remove duplicate code in svm_do_inject_vector() svm_do_inject_vector() reimplements pop_irq(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:30 +03:00
Amit Shah	7fe29e0faa	KVM: x86: Ignore reads to EVNTSEL MSRs We ignore writes to the performance counters and performance event selector registers already. Kaspersky antivirus reads the eventsel MSR causing it to crash with the current behaviour. Return 0 as data when the eventsel registers are read to stop the crash. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:30 +03:00
Gleb Natapov	f00be0cae4	KVM: MMU: do not free active mmu pages in free_mmu_pages() free_mmu_pages() should only undo what alloc_mmu_pages() does. Free mmu pages from the generic VM destruction function, kvm_destroy_vm(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:30 +03:00
Sheng Yang	e56d532f20	KVM: Device assignment framework rework After discussion with Marcelo, we decided to rework device assignment framework together. The old problems are kernel logic is unnecessary complex. So Marcelo suggest to split it into a more elegant way: 1. Split host IRQ assign and guest IRQ assign. And userspace determine the combination. Also discard msi2intx parameter, userspace can specific KVM_DEV_IRQ_HOST_MSI \| KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to enable MSI to INTx convertion. 2. Split assign IRQ and deassign IRQ. Import two new ioctls: KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ. This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the old interface). [avi: replace homemade bitcount() by hweight_long()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:29 +03:00
Hannes Eder	386eb6e8b3	KVM: make 'lapic_timer_ops' and 'kpit_ops' static Fix this sparse warnings: arch/x86/kvm/lapic.c:916:22: warning: symbol 'lapic_timer_ops' was not declared. Should it be static? arch/x86/kvm/i8254.c:268:22: warning: symbol 'kpit_ops' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:29 +03:00
Gleb Natapov	58c2dde17d	KVM: APIC: get rid of deliver_bitmask Deliver interrupt during destination matching loop. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-06-10 11:48:27 +03:00
Gleb Natapov	e1035715ef	KVM: change the way how lowest priority vcpu is calculated The new way does not require additional loop over vcpus to calculate the one with lowest priority as one is chosen during delivery bitmap construction. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-06-10 11:48:27 +03:00
Gleb Natapov	343f94fe4d	KVM: consolidate ioapic/ipi interrupt delivery logic Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in apic_send_ipi() to figure out ipi destination instead of reimplementing the logic. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-06-10 11:48:27 +03:00
Gleb Natapov	6da7e3f643	KVM: APIC: kvm_apic_set_irq deliver all kinds of interrupts Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-06-10 11:48:26 +03:00
Joerg Roedel	f5a1e9f895	KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr There is no reason to update the shadow pte here because the guest pte is only changed to dirty state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-06-10 11:48:26 +03:00
Marcelo Tosatti	d3c7b77d1a	KVM: unify part of generic timer handling Hide the internals of vcpu awakening / injection from the in-kernel emulated timers. This makes future changes in this logic easier and decreases the distance to more generic timer handling. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:25 +03:00
Marcelo Tosatti	fd66842370	KVM: PIT: remove usage of count_load_time for channel 0 We can infer elapsed time from hrtimer_expires_remaining. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:25 +03:00
Marcelo Tosatti	5a05d54554	KVM: PIT: remove unused scheduled variable Unused. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:25 +03:00
Marcelo Tosatti	a90ede7b17	KVM: x86: paravirt skip pit-through-ioapic boot check Skip the test which checks if the PIT is properly routed when using the IOAPIC, aimed at buggy hardware. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:24 +03:00
Matt T. Yourst	2dea4c84bc	KVM: x86: silence preempt warning on kvm_write_guest_time This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64) with PREEMPT enabled. We're getting syslog warnings like this many (but not all) times qemu tells KVM to run the VCPU: BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-x86/28938 caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm] Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit Call Trace: debug_smp_processor_id+0xf7/0x100 kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm] ? __wake_up+0x4e/0x70 ? wake_futex+0x27/0x40 kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm] enqueue_hrtimer+0x8a/0x110 _spin_unlock_irqrestore+0x27/0x50 vfs_ioctl+0x31/0xa0 do_vfs_ioctl+0x74/0x480 sys_futex+0xb4/0x140 sys_ioctl+0x99/0xa0 system_call_fastpath+0x16/0x1b As it turns out, the call trace is messed up due to gcc's inlining, but I isolated the problem anyway: kvm_write_guest_time() is being used in a non-thread-safe manner on preemptable kernels. Basically kvm_write_guest_time()'s body needs to be surrounded by preempt_disable() and preempt_enable(), since the kernel won't let us query any per-CPU data (indirectly using smp_processor_id()) without preemption disabled. The attached patch fixes this issue by disabling preemption inside kvm_write_guest_time(). [marcelo: surround only __get_cpu_var calls since the warning is harmless] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:24 +03:00
Sheng Yang	d510d6cc65	KVM: Enable MSI-X for KVM assigned device This patch finally enable MSI-X. What we need for MSI-X: 1. Intercept one page in MMIO region of device. So that we can get guest desired MSI-X table and set up the real one. Now this have been done by guest, and transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY. 2. Information for incoming interrupt. Now one device can have more than one interrupt, and they are all handled by one workqueue structure. So we need to identify them. The previous patch enable gsi_msg_pending_bitmap get this done. 3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X message address/data. We used same entry number for the host and guest here, so that it's easy to find the correlated guest gsi. What we lack for now: 1. The PCI spec said nothing can existed with MSI-X table in the same page of MMIO region, except pending bits. The patch ignore pending bits as the first step (so they are always 0 - no pending). 2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch didn't support this, and Linux also don't work in this way. 3. The patch didn't implement MSI-X mask all and mask single entry. I would implement the former in driver/pci/msi.c later. And for single entry, userspace should have reposibility to handle it. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:23 +03:00
Sheng Yang	bfd349d073	KVM: bit ops for deliver_bitmap It's also convenient when we extend KVM supported vcpu number in the future. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:22 +03:00
Sheng Yang	110c2faeba	KVM: Update intr delivery func to accept unsigned long* bitmap Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is increased. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:22 +03:00
Avi Kivity	5897297bc2	KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE Windows 2008 accesses this MSR often on context switch intensive workloads; since we run in guest context with the guest MSR value loaded (so swapgs can work correctly), we can simply disable interception of rdmsr/wrmsr for this MSR. A complication occurs since in legacy mode, we run with the host MSR value loaded. In this case we enable interception. This means we need two MSR bitmaps, one for legacy mode and one for long mode. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:21 +03:00
Avi Kivity	3e7c73e9b1	KVM: VMX: Don't use highmem pages for the msr and pio bitmaps Highmem pages are a pain, and saving three lowmem pages on i386 isn't worth the extra code. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:21 +03:00
Chuck Ebbert	0b8c3d5ab0	x86: Clear TS in irq_ts_save() when in an atomic section The dynamic FPU context allocation changes caused the padlock driver to generate the below warning. Fix it by masking TS when doing padlock encryption operations in an atomic section. This solves: BUG: sleeping function called from invalid context at mm/slub.c:1602 in_atomic(): 1, irqs_disabled(): 0, pid: 82, name: cryptomgr_test Pid: 82, comm: cryptomgr_test Not tainted 2.6.29.4-168.test7.fc11.x86_64 #1 Call Trace: [<ffffffff8103ff16>] __might_sleep+0x10b/0x110 [<ffffffff810cd3b2>] kmem_cache_alloc+0x37/0xf1 [<ffffffff81018505>] init_fpu+0x49/0x8a [<ffffffff81012a83>] math_state_restore+0x3e/0xbc [<ffffffff813ac6d0>] do_device_not_available+0x9/0xb [<ffffffff810123ab>] device_not_available+0x1b/0x20 [<ffffffffa001c066>] ? aes_crypt+0x66/0x74 [padlock_aes] [<ffffffff8119a51a>] ? blkcipher_walk_next+0x257/0x2e0 [<ffffffff8119a731>] ? blkcipher_walk_first+0x18e/0x19d [<ffffffffa001c1fe>] aes_encrypt+0x9d/0xe5 [padlock_aes] [<ffffffffa0027253>] crypt+0x6b/0x114 [xts] [<ffffffffa001c161>] ? aes_encrypt+0x0/0xe5 [padlock_aes] [<ffffffffa001c161>] ? aes_encrypt+0x0/0xe5 [padlock_aes] [<ffffffffa0027390>] encrypt+0x49/0x4b [xts] [<ffffffff81199acc>] async_encrypt+0x3c/0x3e [<ffffffff8119dafc>] test_skcipher+0x1da/0x658 [<ffffffff811979c3>] ? crypto_spawn_tfm+0x8e/0xb1 [<ffffffff8119672d>] ? __crypto_alloc_tfm+0x11b/0x15f [<ffffffff811979c3>] ? crypto_spawn_tfm+0x8e/0xb1 [<ffffffff81199dbe>] ? skcipher_geniv_init+0x2b/0x47 [<ffffffff8119a905>] ? async_chainiv_init+0x5c/0x61 [<ffffffff8119dfdd>] alg_test_skcipher+0x63/0x9b [<ffffffff8119e1bc>] alg_test+0x12d/0x175 [<ffffffff8119c488>] cryptomgr_test+0x38/0x54 [<ffffffff8119c450>] ? cryptomgr_test+0x0/0x54 [<ffffffff8105c6c9>] kthread+0x4d/0x78 [<ffffffff8101264a>] child_rip+0xa/0x20 [<ffffffff81011f67>] ? restore_args+0x0/0x30 [<ffffffff8105c67c>] ? kthread+0x0/0x78 [<ffffffff81012640>] ? child_rip+0x0/0x20 Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090609104050.50158cfe@dhcp-100-2-144.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-09 16:50:43 +02:00
Yong Wang	fecc8ac849	perf_counter, x86: Correct some event and umask values for Intel processors Correct some event and UMASK values according to Intel SDM, in the Nehalem and Atom tables. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090609131553.GA12489@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-09 16:50:07 +02:00
Andreas Herrmann	42937e81a8	x86: Detect use of extended APIC ID for AMD CPUs Booting a 32-bit kernel on Magny-Cours results in the following panic: ... Using APIC driver default ... Overriding APIC driver with bigsmp ... Getting VERSION: 80050010 Getting VERSION: 80050010 Getting ID: 10000000 Getting ID: ef000000 Getting LVT0: 700 Getting LVT1: 10000 Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0) Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2 Call Trace: [<c05194da>] ? panic+0x38/0xd3 [<c0743102>] ? native_smp_prepare_cpus+0x259/0x31f [<c073b19d>] ? kernel_init+0x3e/0x141 [<c073b15f>] ? kernel_init+0x0/0x141 [<c020325f>] ? kernel_thread_helper+0x7/0x10 The reason is that default_get_apic_id handled extension of local APIC ID field just in case of XAPIC. Thus for this AMD CPU, default_get_apic_id() returns 0 and bigsmp_get_apic_id() returns 16 which leads to the respective kernel panic. This patch introduces a Linux specific feature flag to indicate support for extended APIC id (8 bits instead of 4 bits width) and sets the flag on AMD CPUs if applicable. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: <stable@kernel.org> LKML-Reference: <20090608135509.GA12431@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-09 15:28:46 +02:00
Yinghai Lu	eaa958402e	cpumask: alloc zeroed cpumask for static cpumask_var_ts These are defined as static cpumask_var_t so if MAXSMP is not used, they are cleared already. Avoid surprises when MAXSMP is enabled. Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-09 22:30:27 +09:30
Joerg Roedel	e9a22a13c7	amd-iommu: remove unnecessary "AMD IOMMU: " prefix That prefix is already included in the DUMP_printk macro. So there is no need to repeat it in the format string. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-09 12:01:58 +02:00
Joerg Roedel	71ff3bca2f	amd-iommu: detach device explicitly before attaching it to a new domain This fixes a bug with a device that could not be assigned to a KVM guest because it is still assigned to a dma_ops protection domain. [chrisw: simply remove WARN_ON(), will always fire since dev->driver will be pci-sub] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-09 11:14:14 +02:00
Joerg Roedel	29150078d7	amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling Handling this event causes device assignment in KVM to fail because the device gets re-attached as soon as the pci-stub registers as the driver for the device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-09 10:54:18 +02:00
Joerg Roedel	d2dd01de99	Merge commit 'tip/core/iommu' into amd-iommu/fixes	2009-06-09 10:50:57 +02:00
Thomas Gleixner	820a644211	perf_counter, x86: Clean up hw_cache_event ids copies Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 23:10:42 +02:00
Thomas Gleixner	f86748e91a	perf_counter, x86: Implement generalized cache event types, add AMD support Fill in amd_hw_cache_event_id[] with the AMD CPU specific events, for family 0x0f, 0x10 and 0x11. There's apparently no distinction between load and store events, so we only fill in the load events. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 23:10:37 +02:00
Andreas Herrmann	c9690998ef	x86: memtest: remove 64-bit division Using gcc 3.3.5 a "make allmodconfig" + "CONFIG_KVM=n" triggers a build error: arch/x86/mm/built-in.o(.init.text+0x43f7): In function `__change_page_attr': arch/x86/mm/pageattr.c:114: undefined reference to `__udivdi3' make: *** [.tmp_vmlinux1] Error 1 The culprit turned out to be a division in arch/x86/mm/memtest.c For more info see this thread: http://marc.info/?l=linux-kernel&m=124416232620683 The patch entirely removes the division that caused the build error. [ Impact: build fix with certain GCC versions ] Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: xiyou.wangcong@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <20090608170939.GB12431@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 19:18:25 +02:00
Jack Steiner	c4ed3f04ba	x86, UV: Fix macros for multiple coherency domains Fix bug in the SGI UV macros that support systems with multiple coherency domains. The macros used for referencing global MMR (chipset registers) are failing to correctly "or" the NASID (node identifier) bits that reside above M+N. These high bits are supplied automatically by the chipset for memory accesses coming from the processor socket. However, the bits must be present for references to the special global MMR space used to map chipset registers. (See uv_hub.h for more details ...) The bug results in references to invalid/incorrect nodes. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: <stable@kernel.org> LKML-Reference: <20090608154405.GA16395@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 18:57:47 +02:00
Ingo Molnar	1123e3ad73	perf_counter: Clean up x86 boot messages Standardize and tidy up all the messages we print during perfcounter initialization. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 12:29:30 +02:00
Thomas Gleixner	ad68922061	perf_counter, x86: Implement generalized cache event types, add Atom support Fill in core2_hw_cache_event_id[] with the Atom model specific events. The events can be used in all the tools via the -e (--event) parameter, for example "-e l1-misses" or -"-e l2-accesses" or "-e l2-write-misses". ( Note: these are straight from the Intel manuals - not tested yet.) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 11:18:27 +02:00
Thomas Gleixner	0312af8416	perf_counter, x86: Implement generalized cache event types, add Core2 support Fill in core2_hw_cache_event_id[] with the Core2 model specific events. The events can be used in all the tools via the -e (--event) parameter, for example "-e l1-misses" or -"-e l2-accesses" or "-e l2-write-misses". Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 11:18:26 +02:00
Figo.zhang	aeef50bc04	x86, microcode: Simplify vfree() use vfree() does its own 'NULL' check, so no need for check before calling it. In v2, remove the stray newline. [ Impact: cleanup ] Signed-off-by: Figo.zhang <figo1802@gmail.com> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com> LKML-Reference: <1244385036.3402.11.camel@myhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 16:35:11 +02:00
Lubomir Rintel	3aa6b186f8	x86: Fix non-lazy GS handling in sys_vm86() This fixes a stack corruption panic or null dereference oops due to a bad GS in resume_userspace() when returning from sys_vm86() and calling lockdep_sys_exit(). Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR enabled. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1244384628.2323.4.camel@bimbo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 16:31:23 +02:00
Cyrill Gorcunov	a4046f8d29	x86, nmi: Use predefined numbers instead of hardcoded one [ Impact: cleanup ] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090607081937.GC4547@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 16:22:02 +02:00
Cyrill Gorcunov	103428e57b	x86, apic: Fix dummy apic read operation together with broken MP handling Ingo Molnar reported that read_apic is buggy novadays: [ 0.000000] Using APIC driver default [ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs [ 0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic" [ 0.000000] APIC: disable apic facility [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b() [ 0.000000] Hardware name: HP OmniBook PC Indeed we still rely on apic->read operation for SMP compiled kernel. And instead of disfigure the SMP code with #ifdef we allow to call apic->read. To capture any unexpected results we check for apic->read being called for sane reason via WARN_ON_ONCE but(!) instead of OR we should use AND logical operation (thanks Yinghai for spotting the root of the problem). Along with that we could be have bad MP table and we are to fix it that way no SMP started and no complains about BIOS bug if apic was just disabled via command line. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090607124840.GD4547@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 16:08:05 +02:00
Jean Delvare	4a4aca641b	x86: Add quirk for reboot stalls on a Dell Optiplex 360 The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so the same quirk is needed. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Steve Conklin <steve.conklin@canonical.com> Cc: Leann Ogasawara <leann.ogasawara@canonical.com> Cc: <stable@kernel.org> LKML-Reference: <200906051202.38311.jdelvare@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 15:51:20 +02:00
Jaswinder Singh Rajput	5095f59bda	x86: cpu_debug: Remove model information to reduce encoding-decoding Remove model information, encoding/decoding and reduce bookkeeping. This, besides removing a lot of code and cleaning up the code, also enables these features on many more CPUs that were enumerated before. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> LKML-Reference: <1244224637.8212.6.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 12:22:56 +02:00
Ingo Molnar	5f4457a4f6	Merge branch 'linus' into x86/cpu	2009-06-07 12:22:15 +02:00
Ingo Molnar	56fdd18c7b	Merge branch 'linus' into core/iommu Merge reason: This branch was on an -rc5 base so pull almost-2.6.30 to resync with the latest upstream fixes and make sure the combination works fine. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 11:35:05 +02:00
Linus Torvalds	ccc0d38ec1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86/pci: fix mmconfig detection with 32bit near 4g PCI: use fixed-up device class when configuring device	2009-06-06 14:33:54 -07:00
Ingo Molnar	75b5032212	Merge branch 'linus' into perfcounters/core Merge reason: Pick up the latest fixes before the -v8 perfcounters release. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 20:21:28 +02:00
Ingo Molnar	8326f44da0	perf_counter: Implement generalized cache event types Extend generic event enumeration with the PERF_TYPE_HW_CACHE method. This is a 3-dimensional space: { L1-D, L1-I, L2, ITLB, DTLB, BPU } x { load, store, prefetch } x { accesses, misses } User-space passes in the 3 coordinates and the kernel provides a counter. (if the hardware supports that type and if the combination makes sense.) Combinations that make no sense produce a -EINVAL. Combinations that are not supported by the hardware produce -ENOTSUP. Extend the tools to deal with this, and rewrite the event symbol parsing code with various popular aliases for the units and access methods above. So 'l1-cache-miss' and 'l1d-read-ops' are both valid aliases. ( x86 is supported for now, with the Nehalem event table filled in, and with Core2 and Atom having placeholder tables. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 13:14:47 +02:00
Ingo Molnar	a21ca2cac5	perf_counter: Separate out attr->type from attr->config Counter type is a frequently used value and we do a lot of bit juggling by encoding and decoding it from attr->config. Clean this up by creating a separate attr->type field. Also clean up the various similarly complex user-space bits all around counter attribute management. The net improvement is significant, and it will be easier to add a new major type (which is what triggered this cleanup). (This changes the ABI, all tools are adapted.) (PowerPC build-tested.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 11:37:22 +02:00
Mark Langsdorf	fe2245c905	x86: enable GART-IOMMU only after setting up protection methods The current code to set up the GART as an IOMMU enables GART translations before it removes the aperture from the kernel memory map, sets the GART PTEs to UC, sets up the guard and scratch pages, or does a wbinvd(). This leaves the possibility of cache aliasing open and can cause system crashes. Re-order the code so as to enable the GART translations only after all safeguards are in place and the tlb has been flushed. AMD has tested this patch on both Istanbul systems and 1st generation Opteron systems with APG enabled and seen no adverse effects. Istanbul systems with HT Assist enabled sometimes see MCE errors due to cache artifacts with the unmodified code. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Cc: <stable@kernel.org> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: akpm@linux-foundation.org Cc: jbarnes@virtuousgeek.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 09:42:09 +02:00
Dave Jones	2c701b1028	[CPUFREQ] powernow-k8: check space_id of _PCT registers to be FFH The powernow-k8 driver checks to see that the Performance Control/Status Registers are declared as FFH (functional fixed hardware) by the BIOS. However, this check got broken in the commit: `0e64a0c982` [CPUFREQ] checkpatch cleanups for powernow-k8 Fix based on an original patch from Naga Chumbalkar. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-05 13:25:25 -04:00
Peter Zijlstra	f7b6eb3fa0	x86: Set context.vdso before installing the mapping In order to make arch_vma_name() work from inside install_special_mapping() we need to set the context.vdso before calling it. ( This is needed for performance counters to be able to track this special executable area. ) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-05 14:46:40 +02:00
Rusty Russell	2cb7878a3a	lguest: fix 'unhandled trap 13' with CONFIG_CC_STACKPROTECTOR We don't set up the canary; let's disable stack protector on boot.c so we can get into lguest_init, then set it up. As a side effect, switch_to_new_gdt() sets up %fs for us properly too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-04 11:50:06 -07:00
Yinghai Lu	75e613cdc7	x86/pci: fix mmconfig detection with 32bit near 4g Pascal reported and bisected a commit: \| x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case which broke one system system. ACPI: Using IOAPIC for interrupt routing PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 255 PCI: MCFG area at f0000000 reserved in ACPI motherboard resources PCI: Using MMCONFIG for extended config space it didn't have PCI: updated MCFG configuration 0: base f0000000 segment 0 buses 0 - 63 anymore, and try to use 0xf000000 - 0xffffffff for mmconfig For 32bit, mcfg_res->end could be 32bit only (if 64 resources aren't used) So use end - 1 to pass the value in mcfg->end to avoid overflow. We don't need to worry about the e820 path, they are always 64 bit. Reported-by: Pascal Terjan <pterjan@mandriva.com> Bisected-by: Pascal Terjan <pterjan@mandriva.com> Tested-by: Pascal Terjan <pterjan@mandriva.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: stable@kernel.org Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-06-04 11:31:13 +01:00
Hidetoshi Seto	8051dbd2df	x86, mce: fix for mce counters Make the MCE counters work on 32bit and add poll count in arch_irq_stat_cpu. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:59 -07:00
Andi Kleen	9b1beaf2b5	x86, mce: support action-optional machine checks Newer Intel CPUs support a new class of machine checks called recoverable action optional. Action Optional means that the CPU detected some form of corruption in the background and tells the OS about using a machine check exception. The OS can then take appropiate action, like killing the process with the corrupted data or logging the event properly to disk. This is done by the new generic high level memory failure handler added in a earlier patch. The high level handler takes the address with the failed memory and does the appropiate action, like killing the process. In this version of the patch the high level handler is stubbed out with a weak function to not create a direct dependency on the hwpoison branch. The high level handler cannot be directly called from the machine check exception though, because it has to run in a defined process context to be able to sleep when taking VM locks (it is not expected to sleep for a long time, just do so in some exceptional cases like lock contention) Thus the MCE handler has to queue a work item for process context, trigger process context and then call the high level handler from there. This patch adds two path to process context: through a per thread kernel exit notify_user() callback or through a high priority work item. The first runs when the process exits back to user space, the other when it goes to sleep and there is no higher priority process. The machine check handler will schedule both, and whoever runs first will grab the event. This is done because quick reaction to this event is critical to avoid a potential more fatal machine check when the corruption is consumed. There is a simple lock less ring buffer to queue the corrupted addresses between the exception handler and the process context handler. Then in process context it just calls the high level VM code with the corrupted PFNs. The code adds the required code to extract the failed address from the CPU's machine check registers. It doesn't try to handle all possible cases -- the specification has 6 different ways to specify memory address -- but only the linear address. Most of the required checking has been already done earlier in the mce_severity rule checking engine. Following the Intel recommendations Action Optional errors are only enabled for known situations (encoded in MCACODs). The errors are ignored otherwise, because they are action optional. v2: Improve comment, disable preemption while processing ring buffer (reported by Ying Huang) Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:59 -07:00
Andi Kleen	8fa8dd9e3a	x86, mce: define MCE_VECTOR Add MCE_VECTOR for the #MC exception. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:05 -07:00
Andi Kleen	9ff36ee966	x86, mce: rename mce_notify_user to mce_notify_irq Rename the mce_notify_user function to mce_notify_irq. The next patch will split the wakeup handling of interrupt context and of process context and it's better to give it a clearer name for this. Contains a fix from Ying Huang [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:04 -07:00
Andi Kleen	4ef702c10b	x86: fix panic with interrupts off (needed for MCE) For some time each panic() called with interrupts disabled triggered the !irqs_disabled() WARN_ON in smp_call_function(), producing ugly backtraces and confusing users. This is a common situation with machine checks for example which tend to call panic with interrupts disabled, but will also hit in other situations e.g. panic during early boot. In fact it means that panic cannot be called in many circumstances, which would be bad. This all started with the new fancy queued smp_call_function, which is then used by the shutdown path to shut down the other CPUs. On closer examination it turned out that the fancy RCU smp_call_function() does lots of things not suitable in a panic situation anyways, like allocating memory and relying on complex system state. I originally tried to patch this over by checking for panic there, but it was quite complicated and the original patch was also not very popular. This also didn't fix some of the underlying complexity problems. The new code in post 2.6.29 tries to patch around this by checking for oops_in_progress, but that is not enough to make this fully safe and I don't think that's a real solution because panic has to be reliable. So instead use an own vector to reboot. This makes the reboot code extremly straight forward, which is definitely a big plus in a panic situation where it is important to avoid relying on too much kernel state. The new simple code is also safe to be called from interupts off region because it is very very simple. There can be situations where it is important that panic is reliable. For example on a fatal machine check the panic is needed to get the system up again and running as quickly as possible. So it's important that panic is reliable and all function it calls simple. This is why I came up with this simple vector scheme. It's very hard to beat in simplicity. Vectors are not particularly precious anymore since all big systems are using per CPU vectors. Another possibility would have been to use an NMI similar to kdump, but there is still the problem that NMIs don't work reliably on some systems due to BIOS issues. NMIs would have been able to stop CPUs running with interrupts off too. In the sake of universal reliability I opted for using a non NMI vector for now. I put the reboot vector into the highest priority bucket of the APIC vectors and moved the 64bit UV_BAU message down instead into the next lower priority. [ Impact: bug fix, fixes an old regression ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:35 -07:00
Huang Ying	4611a6fa4b	x86, mce: export MCE severities coverage via debugfs The MCE severity judgement code is data-driven, so code coverage tools such as gcov can not be used for measuring coverage. Instead a dedicated coverage mechanism is implemented. The kernel keeps track of rules executed and reports them in debugfs. This is useful for increasing coverage of the mce-test testsuite. Right now it's unconditionally enabled because it's very little code. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:34 -07:00
Andi Kleen	ed7290d0ee	x86, mce: implement new status bits The x86 architecture recently added some new machine check status bits: S(ignalled) and AR (Action-Required). Signalled allows to check if a specific event caused an exception or was just logged through CMCI. AR allows the kernel to decide if an event needs immediate action or can be delayed or ignored. Implement support for these new status bits. mce_severity() uses the new bits to grade the machine check correctly and decide what to do. The exception handler uses AR to decide to kill or not. The S bit is used to separate events between the poll/CMCI handler and the exception handler. Classical UC always leads to panic. That was true before anyways because the existing CPUs always passed a PCC with it. Also corrects the rules whether to kill in user or kernel context and how to handle missing RIPV. The machine check handler largely uses the mce-severity grading engine now instead of making its own decisions. This means the logic is centralized in one place. This is useful because it has to be evaluated multiple times. v2: Some rule fixes; Add AO events Fix RIPV, RIPV\|EIPV order (Ying Huang) Fix UCNA with AR=1 message (Ying Huang) Add comment about panicing in m_c_p. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:34 -07:00
Andi Kleen	86503560e4	x86, mce: print header/footer only once for multiple MCEs When multiple MCEs are printed print the "HARDWARE ERROR" header and "This is not a software error" footer only once. This makes the output much more compact with many CPUs. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:34 -07:00
Andi Kleen	29b0f591d6	x86, mce: default to panic timeout for machine checks Fatal machine checks can be logged to disk after boot, but only if the system did a warm reboot. That's unfortunately difficult with the default panic behaviour, which waits forever and the admin has to press the power button because modern systems usually miss a reset button. This clears the machine checks in the registers and make it impossible to log them. This patch changes the default for machine check panic to always reboot after 30s. Then the mce can be successfully logged after reboot. I believe this will improve machine check experience for any system running the X server. This is dependent on successfull boot logging of MCEs. This currently only works on Intel systems, on AMD there are quite a lot of systems around which leave junk in the machine check registers after boot, so it's disabled here. These systems will continue to default to endless waiting panic. v2: Only force panic timeout when it's shorter (H.Seto) v3: Only force timeout when there is no timeout (based on comment H.Seto) [ Fix changelog - HS ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:33 -07:00
Huang Ying	1b2797dcc9	x86, mce: improve mce_get_rip Assume IP on the stack is valid when either EIPV or RIPV are set. This influences whether the machine check exception handler decides to return or panic. This fixes a test case in the mce-test suite and is more compliant to the specification. This currently only makes a difference in a artificial testing scenario with the mce-test test suite. Also in addition do not force the EIPV to be valid with the exact register MSRs, and keep in trust the CS value on stack even if MSR is available. [AK: combination of patches from Huang Ying and Hidetoshi Seto, with new description by me] [add some description, no code changed - HS] Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:33 -07:00
Andi Kleen	ac9603754d	x86, mce: make non Monarch panic message "Fatal machine check" too ... instead of "Machine check". This is for consistency with the Monarch panic message. Based on a report from Ying Huang. v2: But add a descriptive postfix so that the test suite can distingush. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:12 -07:00
Andi Kleen	3c0797925f	x86, mce: switch x86 machine check handler to Monarch election. On Intel platforms machine check exceptions are always broadcast to all CPUs. This patch makes the machine check handler synchronize all these machine checks, elect a Monarch to handle the event and collect the worst event from all CPUs and then process it first. This has some advantages: - When there is a truly data corrupting error the system panics as quickly as possible. This improves containment of corrupted data and makes sure the corrupted data never hits stable storage. - The panics are synchronized and do not reenter the panic code on multiple CPUs (which currently does not handle this well). - All the errors are reported. Currently it often happens that another CPU happens to do the panic first, but reports useless information (empty machine check) because the real error happened on another CPU which came in later. This is a big advantage on Nehalem where the 8 threads per CPU lead to often the wrong CPU winning the race and dumping useless information on a machine check. The problem also occurs in a less severe form on older CPUs. - The system can detect when no CPUs detected a machine check and shut down the system. This can happen when one CPU is so badly hung that that it cannot process a machine check anymore or when some external agent wants to stop the system by asserting the machine check pin. This follows Intel hardware recommendations. - This matches the recommended error model by the CPU designers. - The events can be output in true severity order - When a panic happens on another CPU it makes sure to be actually be able to process the stop IPI by enabling interrupts. The code is extremly careful to handle timeouts while waiting for other CPUs. It can't rely on the normal timing mechanisms (jiffies, ktime_get) because of its asynchronous/lockless nature, so it uses own timeouts using ndelay() and a "SPINUNIT" The timeout is configurable. By default it waits for upto one second for the other CPUs. This can be also disabled. From some informal testing AMD systems do not see to broadcast machine checks, so right now it's always disabled by default on non Intel CPUs or also on very old Intel systems. Includes fixes from Ying Huang Fixed a "ecception" in a comment (H.Seto) Moved global_nwo reset later based on suggestion from H.Seto v2: Avoid duplicate messages [ Impact: feature, fixes long standing problems. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:12 -07:00
Andi Kleen	f94b61c2c9	x86, mce: implement panic synchronization In some circumstances multiple CPUs can enter mce_panic() in parallel. This gives quite confused output because they will all dump the same machine check buffer. The other problem is that they would all panic in parallel, but not process each other's shutdown IPIs because interrupts are disabled. Detect this situation early on in mce_panic(). On the first CPU entering will do the panic, the others will just wait to be killed. For paranoia reasons in case the other CPU dies during the MCE I added a 5 seconds timeout. If it expires each CPU will panic on its own again. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:12 -07:00
Andi Kleen	ccc3c3192a	x86, mce: implement bootstrapping for machine check wakeups Machine checks support waking up the mcelog daemon quickly. The original wake up code for this was pretty ugly, relying on a idle notifier and a special process flag. The reason it did it this way is that the machine check handler is not subject to normal interrupt locking rules so it's not safe to call wake_up(). Instead it set a process flag and then either did the wakeup in the syscall return or in the idle notifier. This patch adds a new "bootstraping" method as replacement. The idea is that the handler checks if it's in a state where it is unsafe to call wake_up(). If it's safe it calls it directly. When it's not safe -- that is it interrupted in a critical section with interrupts disables -- it uses a new "self IPI" to trigger an IPI to its own CPU. This can be done safely because IPI triggers are atomic with some care. The IPI is raised once the interrupts are reenabled and can then safely call wake_up(). When APICs are disabled the event is just queued and will be picked up eventually by the next polling timer. I think that's a reasonable compromise, since it should only happen quite rarely. Contains fixes from Ying Huang. [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:44:05 -07:00
Andi Kleen	bd19a5e6b7	x86, mce: check early in exception handler if panic is needed The exception handler should behave differently if the exception is fatal versus one that can be returned from. In the first case it should never clear any registers because these need to be preserved for logging after the next boot. Otherwise it should clear them on each CPU step by step so that other CPUs sharing the same bank don't see duplicate events. Otherwise we risk reporting events multiple times on any CPUs which have shared machine check banks, which is a common problem on Intel Nehalem which has both SMT (two CPU threads sharing banks) and shared machine check banks in the uncore. Determine early in a special pass if any event requires a panic. This uses the mce_severity() function added earlier. This is needed for the next patch. Also fixes a problem together with an earlier patch that corrected events weren't logged on a fatal MCE. [ Impact: Feature ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	817f32d02a	x86, mce: add table driven machine check grading The machine check grading (as in deciding what should be done for a given register value) has to be done multiple times soon and it's also getting more complicated. So it makes sense to consolidate it into a single function. To get smaller and more straight forward and possibly more extensible code I opted towards a new table driven method. The various rules are put into a table when is then executed by a very simple interpreter. The grading engine is in a new file mce-severity.c. I also added a private include file mce-internal.h, because mce.h is already a bit too cluttered. This is dead code right now, but will be used in followon patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	a0189c70e5	x86, mce: remove TSC print heuristic Previously mce_panic used a simple heuristic to avoid printing old so far unreported machine check events on a mce panic. This worked by comparing the TSC value at the start of the machine check handler with the event time stamp and only printing newer ones. This has a couple of issues, in particular on systems where the TSC is not fully synchronized between CPUs it could lose events or print old ones. It is also problematic with full system synchronization as it is added by the next patch. Remove the TSC heuristic and instead replace it with a simple heuristic to print corrected errors first and after that uncorrected errors and finally the worst machine check as determined by the machine check handler. This simplifies the code because there is no need to pass the original TSC value around. Contains fixes from Ying Huang [ Impact: bug fix, cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Ying Huang <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	de8a84d85a	x86, mce: log corrected errors when panicing Normally the machine check handler ignores corrected errors and leaves them to machine_check_poll(). But when panicing mcp won't run, so log all errors. Note: this can still miss some cases until the "early no way out" patch later is applied too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	8ee08347c1	x86, mce: extend struct mce user interface with more information. Experience has shown that struct mce which is used to pass an machine check to the user space daemon currently a few limitations. Also some data which is useful to print at panic level is also missing. This patch addresses most of them. The same information is also printed out together with mce panic. struct mce can be painlessly extended in a compatible way, the mcelog user space code just ignores additional fields with a warning. - It doesn't provide a wall time timestamp. There have been a few complaints about that. Fix that by adding a 64bit time_t - It doesn't provide the exact CPU identification. This makes it awkward for mcelog to decode the event correctly, especially when there are variations in the supported MCE codes on different CPU models or when mcelog is running on a different host after a panic. Previously the administrator had to specify the correct CPU when mcelog ran on a different host, but with the more variation in machine checks now it's better to auto detect that. It's also useful for more detailed analysis of CPU events. Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead. - Socket ID and initial APIC ID are useful to report because they allow to identify the failing CPU in some (not all) cases. This is also especially useful for the panic situation. This addresses one of the complaints from Thomas Gleixner earlier. - The MCG capabilities MSR needs to be reported for some advanced error processing in mcelog Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	d620c67fb9	x86, mce: support more than 256 CPUs in struct mce The old struct mce had a limitation to 256 CPUs. But x86 Linux supports more than that now with x2apic. Add a new field extcpu to report the extended number. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	f6fb0ac086	x86, mce: store record length into memory struct mce anchor This makes it easier for tools who want to extract the mcelog out of crash images or memory dumps to adapt to changing struct mce size. The length field replaces padding, so it's fully compatible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	ca84f69697	x86, mce: add MCE poll count to /proc/interrupts Keep a count of the machine check polls (or CMCI events) in /proc/interrupts. Andi needs this for debugging, but it's also useful in general to see what's going in by the kernel. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	01ca79f141	x86, mce: add machine check exception count in /proc/interrupts Useful for debugging, but it's also good general policy to have a counter for all special interrupts there. This makes it easier to diagnose where a CPU is spending its time. [ Impact: feature, debugging tool ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Ingo Molnar	128f048f0f	perf_counter: Fix throttling lock-up Throttling logic is broken and we can lock up with too small hw sampling intervals. Make the throttling code more robust: disable counters even if we already disabled them. ( Also clean up whitespace damage i noticed while reading various pieces of code related to throttling. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-03 23:39:51 +02:00
Cliff Wickman	0e2595cdfd	x86: Fix UV BAU activation descriptor init The UV tlb shootdown code has a serious initialization error. An array of structures [328] is initialized as if it were [32]. The array is indexed by (cpu number on the blade)8, so the short initialization works for up to 4 cpus on a blade. But above that, we provide an invalid opcode to the hub's broadcast assist unit. This patch changes the allocation of the array to use its symbolic dimensions for better clarity. And initializes all 32*8 entries. Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's recommendation. Tested on the UV simulator. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> LKML-Reference: <E1M6lZR-0007kV-Aq@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-03 13:07:31 +02:00
Jiri Slaby	367d04c4ec	amd_iommu: fix lock imbalance In alloc_coherent there is an omitted unlock on the path where mapping fails. Add the unlock. [ Impact: fix lock imbalance in alloc_coherent ] Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-03 10:34:55 +02:00
Yong Wang	a32881066e	perf_counter/x86: Remove the IRQ (non-NMI) handling bits Remove the IRQ (non-NMI) handling bits as NMI will be used always. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-03 09:53:34 +02:00
Mike Galbraith	6799687a53	x86, boot: add new generated files to the appropriate .gitignore files git status complains of untracked (generated) files in arch/x86/boot.. # Untracked files: # (use "git add <file>..." to include in what will be committed) # # ../../arch/x86/boot/compressed/mkpiggy # ../../arch/x86/boot/compressed/piggy.S # ../../arch/x86/boot/compressed/vmlinux.lds # ../../arch/x86/boot/voffset.h # ../../arch/x86/boot/zoffset.h ..so adjust .gitignore files accordingly. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-02 21:13:30 -07:00
Peter Zijlstra	0d48696f87	perf_counter: Rename perf_counter_hw_event => perf_counter_attr The structure isn't hw only and when I read event, I think about those things that fall out the other end. Rename the thing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> Cc: Stephane Eranian <eranian@googlemail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:33 +02:00
Peter Zijlstra	e4abb5d4f7	perf_counter: x86: Emulate longer sample periods Do as Power already does, emulate sample periods up to 2^63-1 by composing them of smaller values limited by hardware capabilities. Only once we wrap the software period do we generate an overflow event. Just 10 lines of new code. Reported-by: Stephane Eranian <eranian@googlemail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:31 +02:00
Peter Zijlstra	8a016db386	perf_counter: Remove the last nmi/irq bits IRQ (non-NMI) sampling is not used anymore - remove the last few bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:31 +02:00
Peter Zijlstra	b23f3325ed	perf_counter: Rename various fields A few renames: s/irq_period/sample_period/ s/irq_freq/sample_freq/ s/PERF_RECORD_/PERF_SAMPLE_/ s/record_type/sample_type/ And change both the new sample_type and read_format to u64. Reported-by: Stephane Eranian <eranian@googlemail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:30 +02:00
Huang Ying	2cf4ac8beb	crypto: aes-ni - Add support for more modes Because kernel_fpu_begin() and kernel_fpu_end() operations are too slow, the performance gain of general mode implementation + aes-aesni is almost all compensated. The AES-NI support for more modes are implemented as follow: - Add a new AES algorithm implementation named __aes-aesni without kernel_fpu_begin/end() - Use fpu(<mode>(AES)) to provide kenrel_fpu_begin/end() invoking - Add <mode>(AES) ablkcipher, which uses cryptd(fpu(<mode>(AES))) to defer cryption to cryptd context in soft_irq context. Now the ctr, lrw, pcbc and xts support are added. Performance testing based on dm-crypt shows that cryption time can be reduced to 50% of general mode implementation + aes-aesni implementation. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2009-06-02 14:04:16 +10:00
Huang Ying	150c7e8552	crypto: fpu - Add template for blkcipher touching FPU Blkcipher touching FPU need to be enclosed by kernel_fpu_begin() and kernel_fpu_end(). If they are invoked in cipher algorithm implementation, they will be invoked for each block, so that performance will be hurt, because they are "slow" operations. This patch implements "fpu" template, which makes these operations to be invoked for each request. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2009-06-02 14:04:15 +10:00
Jiri Slaby	3d58829b05	x86, apic: Restore irqs on fail paths lapic_resume forgets to restore interrupts on fail paths. Fix that. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <1243497289-18591-1-git-send-email-jirislaby@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com>	2009-06-02 02:48:59 +02:00
Naga Chumbalkar	58f892e022	x86: Print real IOAPIC version for x86-64 Fix the fact that the IOAPIC version number in the x86_64 code path always gets assigned to 0, instead of the correct value. Before the patch: (from "dmesg" output): ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23 <--- After the patch: ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 <--- History: io_apic_get_version() was compiled out of the x86_64 code path in the commit `f2c2cca3ac`: Author: Andi Kleen <ak@suse.de> Date: Tue Sep 26 10:52:37 2006 +0200 [PATCH] Remove APIC version/cpu capability mpparse checking/printing ACPI went to great trouble to get the APIC version and CPU capabilities of different CPUs before passing them to the mpparser. But all that data was used was to print it out. Actually it even faked some data based on the boot cpu, not on the actual CPU being booted. Remove all this code because it's not needed. Cc: len.brown@intel.com At the time, the IOAPIC version number was deliberately not printed in the x86_64 code path. However, after the x86 and x86_64 files were merged, the net result is that the IOAPIC version is printed incorrectly in the x86_64 code path. The patch below provides a fix. I have tested it with acpi, and with acpi=off, and did not see any problems. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <20090416014230.4885.94926.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu> *************************	2009-06-02 02:03:18 +02:00
H. Peter Anvin	48b1fddbb1	Merge branch 'irq/numa' into x86/mce3 Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa and modified in x86/mce3; this merge resolves the conflict. Conflicts: arch/x86/kernel/irqinit.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-01 15:25:31 -07:00
Ingo Molnar	ee4c24a5c9	Merge branch 'x86/cpufeature' into irq/numa Merge reason: irq/numa didnt build because this commit: `2759c32`: x86: don't call read_apic_id if !cpu_has_apic Had a dependency on x86/cpufeature changes. Pull in that (small) branch to fix the dependency. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 22:30:01 +02:00
Ingo Molnar	3d58f48ba0	Merge branch 'linus' into irq/numa Conflicts: arch/mips/sibyte/bcm1480/irq.c arch/mips/sibyte/sb1250/irq.c Merge reason: we gathered a few conflicts plus update to latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 21:06:21 +02:00
Ingo Molnar	23db9f430b	Merge branch 'linus' into perfcounters/core Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6 based - to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 10:01:39 +02:00
Joe Perches	61c8c67e3a	acpi-cpufreq: fix printk typo and indentation Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>	2009-05-29 21:26:26 -04:00
Mel Gorman	32b154c0b0	x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302 On x86 and x86-64, it is possible that page tables are shared beween shared mappings backed by hugetlbfs. As part of this, page_table_shareable() checks a pair of vma->vm_flags and they must match if they are to be shared. All VMA flags are taken into account, including VM_LOCKED. The problem is that VM_LOCKED is cleared on fork(). When a process with a shared memory segment forks() to exec() a helper, there will be shared VMAs with different flags. The impact is that the shared segment is sometimes considered shareable and other times not, depending on what process is checking. What happens is that the segment page tables are being shared but the count is inaccurate depending on the ordering of events. As the page tables are freed with put_page(), bad pmd's are found when some of the children exit. The hugepage counters also get corrupted and the Total and Free count will no longer match even when all the hugepage-backed regions are freed. This requires a reboot of the machine to "fix". This patch addresses the problem by comparing all flags except VM_LOCKED when deciding if pagetables should be shared or not for hugetlbfs-backed mapping. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <starlight@binnacle.cx> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-29 08:40:03 -07:00
Yong Wang	c323d95fa4	perf_counter/x86: Always use NMI for performance-monitoring interrupt Always use NMI for performance-monitoring interrupt as there could be racy situations if we switch between irq and nmi mode frequently. Signed-off-by: Yong Wang <yong.y.wang@intel.com> LKML-Reference: <20090529052835.GA13657@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-29 09:04:58 +02:00
H. Peter Anvin	38736072d4	x86, mce: drop "extern" from function prototypes in asm/mce.h Function prototypes don't need to be prefixed by "extern". [ Impact: cleanup ] Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 10:05:33 -07:00
Hidetoshi Seto	cd13adcc82	x86: trivial clean up for arch/x86/Kconfig Use tab. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 10:05:21 -07:00
Andi Kleen	eb2a6ab729	x86: trivial clean up for irq_vectors.h Fix a wrong comment. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	98a9c8c3ba	x86, mce: trivial clean up for mce-inject.c Fix for: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> +#include <asm/uaccess.h> WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc + if (m.cpu >= NR_CPUS \|\| !cpu_online(m.cpu)) ERROR: trailing whitespace +/* $ Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	61a021a070	x86, mce: trivial clean up for mce_intel_64.c Fix for: WARNING: space prohibited between function name and open parenthesis '(' + for_each_online_cpu (cpu) { Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	34fa1967aa	x86, mce: trivial clean up for mce_amd_64.c Fix for followings: WARNING: Use #include <linux/percpu.h> instead of <asm/percpu.h> +#include <asm/percpu.h> ERROR: Macros with multiple statements should be enclosed in a do - while loop +#define THRESHOLD_ATTR(_name, _mode, _show, _store) \ +{ \ + .attr = {.name = __stringify(_name), .mode = _mode }, \ + .show = _show, \ + .store = _store, \ +}; WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc + if (cpu >= NR_CPUS) Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	14a02530e2	x86, mce: trivial clean up for mce.c This fixs following checkpatch warnings: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> +#include <asm/uaccess.h> WARNING: Use #include <linux/smp.h> instead of <asm/smp.h> +#include <asm/smp.h> WARNING: line over 80 characters + set_bit(MCE_OVERFLOW, (unsigned long *)&mcelog.flags); WARNING: braces {} are not necessary for any arm of this statement + if (mce_notify_user()) { [...] + } else { [...] Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	cc3aec52ab	x86, mce: trivial clean up for therm_throt.c This patch removes following checkpatch warning: WARNING: Use #include <linux/cpu.h> instead of <asm/cpu.h> +#include <asm/cpu.h> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Hidetoshi Seto	9319cec8c1	x86, mce: use strict_strtoull Use strict_strtoull instead of simple_strtoull. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	b170204ddb	x86, mce: drop BKL in mce_open BKL is not needed for anything in mce_open because it has an own spinlock. Remove it. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	32561696c2	x86, mce: rename and align out2 label There's only a single out path in do_machine_check now, so rename the label from out2 to out. Also align it at the first column. [ Impact: minor cleanup, no functional changes ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Thomas Gleixner	8be9110569	x86, mce: remove mce_init unused argument Remove unused mce_init argument. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	fc016a49c2	x86, mce: remove unused mce_events variable Remove unused mce_events static variable. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	b56f642d2b	x86, mce: use extended sysattrs for the check_interval attribute. Instead of using own callbacks use the generic ones provided by the sysdev later. This finally allows to get rid of the ugly ACCESSOR macros. Should also save some text size. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	88921be302	x86, mce: synchronize core after machine check handling The example code in the IA32 SDM recommends to synchronize the CPU after machine check handling. So do that here. [ Impact: Spec compliance ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
H. Peter Anvin	5706001aac	x86, mce: fix comment style in mce-inject.c Fix style of winged comment in mce-inject.c. [ Impact: comment only ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
H. Peter Anvin	a1ff41bfc1	x86, mce: add comment about mce_chrdev_ops being writable Add a comment explaining that mce_chrdev_ops is intentionally writable. [ Impact: comment only ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
Andi Kleen	ea149b36c7	x86, mce: add basic error injection infrastructure Allow user programs to write mce records into /dev/mcelog. When they do that a fake machine check is triggered to test the machine check code. This uses the MCE MSR wrappers added earlier. The implementation is straight forward. There is a struct mce record per CPU and the MCE MSR accesses get data from there if there is valid data injected there. This allows to test the machine check code relatively realistically because only the lowest layer of hardware access is intercepted. The test suite and injector are available at git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
Andi Kleen	5f8c1a54ca	x86, mce: add MSR read wrappers for easier error injection This will be used by future patches to allow machine check error injection. Right now it's a nop, except for adding some wrappers around the MSR reads. This is early in the sequence to avoid too many conflicts. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
Andi Kleen	de5619dfef	x86, mce: enable MCE_AMD for 32bit NEW_MCE That's very easy using the infrastructure enabled earlier for MCE_INTEL Untested. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:13 -07:00
Andi Kleen	7856f6cce4	x86, mce: enable MCE_INTEL for 32bit new MCE Enable the 64bit MCE_INTEL code (CMCI, thermal interrupts) for 32bit NEW_MCE. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:13 -07:00
Andi Kleen	4efc0670ba	x86, mce: use 64bit machine check code on 32bit The 64bit machine check code is in many ways much better than the 32bit machine check code: it is more specification compliant, is cleaner, only has a single code base versus one per CPU, has better infrastructure for recovery, has a cleaner way to communicate with user space etc. etc. Use the 64bit code for 32bit too. This is the second attempt to do this. There was one a couple of years ago to unify this code for 32bit and 64bit. Back then this ran into some trouble with K7s and was reverted. I believe this time the K7 problems (and some others) are addressed. I went over the old handlers and was very careful to retain all quirks. But of course this needs a lot of testing on old systems. On newer 64bit capable systems I don't expect much problems because they have been already tested with the 64bit kernel. I made this a CONFIG for now that still allows to select the old machine check code. This is mostly to make testing easier, if someone runs into a problem we can ask them to try with the CONFIG switched. The new code is default y for more coverage. Once there is confidence the 64bit code works well on older hardware too the CONFIG_X86_OLD_MCE and the associated code can be easily removed. This causes a behaviour change for 32bit installations. They now have to install the mcelog package to be able to log corrected machine checks. The 64bit machine check code only handles CPUs which support the standard Intel machine check architecture described in the IA32 SDM. The 32bit code has special support for some older CPUs which have non standard machine check architectures, in particular WinChip C3 and Intel P5. I made those a separate CONFIG option and kept them for now. The WinChip variant could be probably removed without too much pain, it doesn't really do anything interesting. P5 is also disabled by default (like it was before) because many motherboards have it miswired, but according to Alan Cox a few embedded setups use that one. Forward ported/heavily changed version of old patch, original patch included review/fixes from Thomas Gleixner, Bert Wesarg. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:13 -07:00
Andi Kleen	d896a940ef	x86, mce: remove oops_begin() use in 64bit machine check First 32bit doesn't have oops_begin, so it's a barrier of using this code on 32bit. On closer examination it turns out oops_begin is not a good idea in a machine check panic anyways. All oops_begin does it so check for recursive/parallel oopses and implement the "wait on oops" heuristic. But there's actually no good reason to lock machine checks against oopses or prevent them from recursion. Also "wait on oops" does not really make sense for a machine check too. Replace it with a manual bust_spinlocks/console_verbose. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:13 -07:00
Andi Kleen	8e97aef5f4	x86, mce: remove machine check handler idle notify on 64bit i386 has no idle notifiers, but the 64bit machine check code uses them to wake up mcelog from a fatal machine check exception. For corrected machine checks found by the poller or threshold interrupts going through an idle notifier is not needed because the wake_up can is just done directly and doesn't need the idle notifier. It is only needed for logging exceptions. To be honest I never liked the idle notifier even though I signed off on it. On closer investigation the code actually turned out to be nearly. Right now machine check exceptions on x86 are always unrecoverable (lead to panic due to PCC), which means we never execute the idle notifier path. The only exception is the somewhat weird tolerant==3 case, which ignores PCC. I'll fix this in a future patch in a much cleaner way. So remove the "mcelog wakeup through idle notifier" code from 64bit. This allows to compile the 64bit machine check handler on 32bit which doesn't have idle notifiers. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	d7c3c9a609	x86, mce: move mce_disabled option into common 32bit/64bit code It's the same function, so let's share it. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	04b2b1a4df	x86, mce: rename 64bit mce_dont_init to mce_disabled Give it the same name as on 32bit. This makes further merging easier. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	5d7279268b	x86, mce: use a call vector to call the 64bit mce handler Allows to call different machine check handlers from the low level machine check entry vector. This is needed for later when it will be used for 32bit too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	2e6f694fde	x86, mce: port K7 bank 0 quirk to 64bit mce code Various K7 have broken bank 0s. Don't enable it by default Port from the 32bit code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	06b7a7a5ec	x86, mce: implement the PPro bank 0 quirk in the 64bit machine check code Quoting the comment: * SDM documents that on family 6 bank 0 should not be written * because it aliases to another special BIOS controlled * register. * But it's not aliased anymore on model 0x1a+ * Don't ignore bank 0 completely because there could be a valid * event later, merely don't write CTL0. This is mostly a port on the 32bit code, except that 32bit always didn't write it and didn't have the 0x1a heuristic. I checked with the CPU designers that the quirk is not required starting with this model. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	3cde5c8c83	x86, mce: initial steps to make 64bit mce code 32bit clean Replace unsigned long with u64s if they need to contain 64bit values. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Thomas Gleixner	01c6680a54	x86, mce: Cleanup MCG definitions Decode more magic constants and turn them into symbols. [ Sort definitions bitwise, introduce MCG_EXT_CNT - HS ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Thomas Gleixner	ba2d0f2b0c	x86, mce: Cleanup symbols in intel thermal codes Decode magic constants and turn them into symbols. [ Cleanup to use symbols already exists - HS ] [ Impact: cleanup ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	b659294b77	x86, mce: print number of MCE banks The number of MCE banks supported by a CPU is a useful number to know, so print it out during CPU initialization. [ Impact: add printout ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	cb491fca55	x86, mce: Rename sysfs variables Shorten variable names. This also compacts the code a bit. device_mce => mce_dev mce_device_initialized => mce_dev_initialized mce_attribute => mce_attrs [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	dba3725d44	x86, mce: unify move mce_64.c => mce.c and glue it up in the Makefile. Remove mce_32.c Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	711c2e481c	x86, mce: unify, prepare for 32-bit v2 Prepare the 64-bit mce_64.c code side to be built on 32-bit. [ includes ifdef relocation by Andi Kleen ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@firstfloor.org> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	a988d334ae	x86, mce: unify, prepare codes Move current 32-bit mce_32.c code into mce_64.c. [ Remove unused artifact stop/restart_mce pointed by Andi Kleen ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@firstfloor.org> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	06b851d982	x86, mce: unify, prepare 64bit in mce.h Prepare mce.h for unification, so that it will build on 32-bit x86 kernels too. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Thomas Gleixner	a65d086235	x86, mce: unify Intel thermal init Mechanic unification. No change in code. [ Impact: cleanup, 32-bit / 64-bit unification ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Thomas Gleixner	6cc6f3ebd1	x86, mce: unify Intel thermal init, prepare Prepare for unification, make two intel_init_thermal equal. [ Impact: cleanup ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	1cb2a8e176	x86, mce: clean up mce_amd_64.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	cb6f3c155b	x86, mce: clean up therm_throt.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	bdbfbdd5e8	x86, mce: clean up non-fatal.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	91425084f7	x86, mce: clean up winchip.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	efee4ca809	x86, mce: clean up k7.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00

... 2 3 4 5 6 ...

7952 Commits