kernel-ark/arch/x86/kernel
Glauber Costa 489fb490db x86, paravirt: Add a global synchronization point for pvclock
In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.

Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 11:41:00 +03:00
..
acpi include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
apic Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip 2010-04-07 11:02:23 -07:00
cpu perf & kvm: Clean up some of the guest profiling callback API details 2010-04-20 08:08:28 +02:00
.gitignore
alternative.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
amd_iommu_init.c Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-04-13 13:24:54 +02:00
amd_iommu.c Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-04-13 13:24:54 +02:00
apb_timer.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
aperture_64.c x86/gart: Disable GART explicitly before initialization 2010-04-07 14:36:30 +02:00
apm_32.c x86: Remove trailing spaces in messages 2010-02-07 17:47:51 +01:00
asm-offsets_32.c
asm-offsets_64.c tracing: Define NR_syscalls for x86_64 2009-08-26 21:29:58 +02:00
asm-offsets.c
audit_64.c
bios_uv.c Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-02-28 11:00:55 -08:00
bootflag.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
check.c
cpuid.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
crash_dump_32.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
crash_dump_64.c
crash.c Revert "x86: disable IOMMUs on kernel crash" 2010-04-07 11:51:17 +02:00
doublefault_32.c x86: Use get_desc_base() 2009-07-19 18:27:51 +02:00
dumpstack_32.c perf: Drop useless check for ignored frame 2010-01-13 10:09:08 +01:00
dumpstack_64.c perf/x86-64: Use frame pointer to walk on irq and process stacks 2010-03-10 14:26:40 +01:00
dumpstack.c x86, perf, bts, mm: Delete the never used BTS-ptrace code 2010-03-26 11:33:55 +01:00
dumpstack.h perf: Fix unsafe frame rewinding with hot regs fetching 2010-04-08 19:03:28 +02:00
e820.c x86: Make e820_remove_range to handle all covered case 2010-03-31 17:40:57 -07:00
early_printk.c x86: earlyprintk: Fix regression to handle serial,ttySn as 1 arg 2009-10-01 10:34:16 +02:00
early-quirks.c
efi_32.c
efi_64.c x86: Make 64-bit efi_ioremap use ioremap on MMIO regions 2009-08-03 13:34:25 -07:00
efi_stub_32.S
efi_stub_64.S
efi.c x86: Remove trailing spaces in messages 2010-02-07 17:47:51 +01:00
entry_32.S x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper 2009-12-10 15:55:36 -08:00
entry_64.S Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-16 12:02:37 -08:00
ftrace.c Merge branch 'tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/core 2010-02-27 10:06:10 +01:00
head32.c x86: Make sure free_init_pages() frees pages on page boundary 2010-03-29 18:55:33 +02:00
head64.c x86: Make sure free_init_pages() frees pages on page boundary 2010-03-29 18:55:33 +02:00
head_32.S Merge branch 'master' into percpu 2010-01-05 09:17:33 +09:00
head_64.S tree-wide: Assorted spelling fixes 2010-02-09 11:13:56 +01:00
head.c
hpet.c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip 2010-04-07 11:02:23 -07:00
hw_breakpoint.c Merge branch 'perf/core' into perf/urgent 2010-03-04 11:47:52 +01:00
i386_ksyms_32.c x86: Don't generate cmpxchg8b_emu if CONFIG_X86_CMPXCHG64=y 2009-10-01 08:42:24 +02:00
i387.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
i8237.c
i8253.c x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown 2009-08-21 21:13:37 +02:00
i8259.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
init_task.c Use new __init_task_data macro in arch init_task.c files. 2009-09-21 06:27:08 +02:00
io_delay.c
ioport.c x86-64, paravirt: Call set_iopl_mask() on 64 bits 2009-12-09 16:54:08 -08:00
irq_32.c x86: Unify fixup_irqs() for 32-bit and 64-bit kernels 2009-11-02 15:56:34 +01:00
irq_64.c x86: Unify fixup_irqs() for 32-bit and 64-bit kernels 2009-11-02 15:56:34 +01:00
irq.c genirq: Convert irq_desc.lock to raw_spinlock 2009-12-14 23:55:33 +01:00
irqinit.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
k8.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
kdebugfs.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
kgdb.c x86,kgdb: Always initialize the hw breakpoint attribute 2010-04-01 08:26:32 +02:00
kprobes.c x86, ptrace: Fix block-step 2010-03-26 11:33:57 +01:00
kvm.c KVM guest: do not batch pte updates from interrupt context 2009-09-10 18:10:50 +03:00
kvmclock.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-09-18 14:05:47 -07:00
ldt.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
machine_kexec_32.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-08 13:27:33 -08:00
machine_kexec_64.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
Makefile x86, perf, bts, mm: Delete the never used BTS-ptrace code 2010-03-26 11:33:55 +01:00
mca_32.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
microcode_amd.c Revert "x86: ucode-amd: Load ucode-patches once ..." 2010-01-23 06:21:59 +01:00
microcode_core.c Revert "x86: ucode-amd: Load ucode-patches once ..." 2010-01-23 06:21:59 +01:00
microcode_intel.c x86: Remove trailing spaces in messages 2010-02-07 17:47:51 +01:00
mmconf-fam10h_64.c x86: Move range related operation to one file 2010-02-10 17:47:17 -08:00
module.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mpparse.c x86: Handle overlapping mptables 2010-04-01 13:31:07 -07:00
mrst.c x86, mrst: Platform clock setup code 2010-02-24 11:01:33 -08:00
msr.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
olpc.c x86, olpc: Use pci subarch init for OLPC 2010-02-25 19:26:23 -08:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c locking: Convert __raw_spin* functions to arch_spin* 2009-12-14 23:55:32 +01:00
paravirt.c x86, paravirt: Remove kmap_atomic_pte paravirt op. 2010-02-27 14:41:35 -08:00
pci-calgary_64.c tree-wide: Assorted spelling fixes 2010-02-09 11:13:56 +01:00
pci-dma.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
pci-gart_64.c Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-04-13 13:24:54 +02:00
pci-nommu.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
pci-swiotlb.c x86: Split swiotlb initialization into two stages 2009-12-15 13:01:57 +01:00
pcspeaker.c
pmtimer_64.c
probe_roms_32.c
process_32.c x86, perf, bts, mm: Delete the never used BTS-ptrace code 2010-03-26 11:33:55 +01:00
process_64.c x86, perf, bts, mm: Delete the never used BTS-ptrace code 2010-03-26 11:33:55 +01:00
process.c Merge branch 'perf/urgent' into perf/core 2010-04-02 19:38:10 +02:00
ptrace.c Merge branch 'linus' into perf/core 2010-04-08 13:37:18 +02:00
pvclock.c x86, paravirt: Add a global synchronization point for pvclock 2010-05-19 11:41:00 +03:00
quirks.c x86: Disable HPET MSI on ATI SB700/SB800 2010-01-23 06:21:58 +01:00
reboot_fixups_32.c cs5535: move the DIVIL MSR definition into linux/cs5535.h 2009-12-15 08:53:28 -08:00
reboot.c x86: Add iMac9,1 to pci_reboot_dmi_table 2010-02-17 08:08:21 +01:00
relocate_kernel_32.S
relocate_kernel_64.S
rtc.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-09-18 14:05:47 -07:00
scx200_32.c
setup_percpu.c early_res: Add free_early_partial() 2010-02-26 08:25:35 +01:00
setup.c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip 2010-04-07 11:02:23 -07:00
sfi.c SFI: remove unneeded includes 2009-09-15 15:08:40 -04:00
signal.c x86: Merge sys_sigaltstack 2009-12-09 16:28:59 -08:00
smp.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
smpboot.c Merge branch 'master' into export-slabh 2010-04-05 11:37:28 +09:00
stacktrace.c perf events, x86/stacktrace: Make stack walking optional 2009-12-17 09:56:19 +01:00
step.c x86, ptrace: Fix block-step 2010-03-26 11:33:57 +01:00
sys_i386_32.c Add generic sys_olduname() 2010-03-12 15:52:32 -08:00
sys_x86_64.c improve sys_newuname() for compat architectures 2010-03-12 15:52:32 -08:00
syscall_64.c
syscall_table_32.S Add generic sys_old_mmap() 2010-03-12 15:52:32 -08:00
tboot.c KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) 2010-05-19 11:36:34 +03:00
tce_64.c
test_nx.c
test_rodata.c
time.c x86: Convert i8259_lock to raw_spinlock 2010-02-16 18:21:32 +01:00
tlb_uv.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
tls.c
tls.h
topology.c
trampoline_32.S x86: cpuinit-annotate SMP boot trampolines properly 2009-09-20 20:23:37 +02:00
trampoline_64.S x86: Fix Suspend to RAM freeze on Acer Aspire 1511Lmi laptop 2009-10-12 18:06:48 +02:00
trampoline.c x86: Use find_e820() instead of hard coded trampoline address 2009-12-11 09:28:22 +01:00
traps.c x86, ptrace: Fix block-step 2010-03-26 11:33:57 +01:00
tsc_sync.c locking: Convert __raw_spin* functions to arch_spin* 2009-12-14 23:55:32 +01:00
tsc.c Merge branch 'for-next' into for-linus 2010-03-08 16:55:37 +01:00
uv_irq.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
uv_sysfs.c x86: Remove trailing spaces in messages 2010-02-07 17:47:51 +01:00
uv_time.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
verify_cpu_64.S
visws_quirks.c Merge remote branch 'origin/x86/apic' into x86/mrst 2010-02-22 16:25:18 -08:00
vm86_32.c x86, 32-bit: Convert sys_vm86 & sys_vm86old 2009-12-09 16:29:23 -08:00
vmi_32.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
vmiclock_32.c Merge branch 'for-next' into for-linus 2010-03-08 16:55:37 +01:00
vmlinux.lds.S x86: Make smp_locks end with page alignment 2010-03-29 18:42:30 +02:00
vsmp_64.c
vsyscall_64.c x86: Raise vsyscall priority on hotplug notifier chain 2010-03-01 12:35:40 -03:00
x86_init.c Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-07 15:59:39 -08:00
x8664_ksyms_64.c x86-64: Modify copy_user_generic() alternatives mechanism 2009-12-30 11:57:31 +01:00
xsave.c x86, ptrace: regset extensions to support xstate 2010-02-11 15:08:17 -08:00