Commit Graph

114 Commits

Author SHA1 Message Date
Martin Schwidefsky
e370e47694 s390: fix floating pointer register corruption (again)
There is a tricky interaction between the machine check handler
and the critical sections of load_fpu_regs and save_fpu_regs
functions. If the machine check interrupts one of the two
functions the critical section cleanup will complete the function
before the machine check handler s390_do_machine_check is called.
Trouble is that the machine check handler needs to validate the
floating point registers *before* and not *after* the completion
of load_fpu_regs/save_fpu_regs.

The simplest solution is to rewind the PSW to the start of the
load_fpu_regs/save_fpu_regs and retry the function after the
return from the machine check handler.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org> # 4.3+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-10 14:35:42 +01:00
Christian Borntraeger
b1685ab9bd s390/cpumf: Improve guest detection heuristics
commit e22cf8ca6f ("s390/cpumf: rework program parameter setting
to detect guest samples") requires guest changes to get proper
guest/host. We can do better: We can use the primary asn value,
which is set on all Linux variants to compare this with the host
pp value.
We now have the following cases:
1. Guest using PP
host sample:  gpp == 0, asn == hpp --> host
guest sample: gpp != 0 --> guest
2. Guest not using PP
host sample:  gpp == 0, asn == hpp --> host
guest sample: gpp == 0, asn != hpp --> guest

As soon as the host no longer sets CR4, we must back out
this heuristics - let's add a comment in switch_to.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:28 -06:00
Martin Schwidefsky
419123f900 s390/spinlock: do not yield to a CPU in udelay/mdelay
It does not make sense to try to relinquish the time slice with diag 0x9c
to a CPU in a state that does not allow to schedule the CPU. The scenario
where this can happen is a CPU waiting in udelay/mdelay while holding a
spin-lock.

Add a CIF bit to tag a CPU in enabled wait and use it to detect that the
yield of a CPU will not be successful and skip the diagnose call.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:18 +01:00
Heiko Carstens
db7e007fd6 s390/udelay: make udelay have busy loop semantics
When using systemtap it was observed that our udelay implementation is
rather suboptimal if being called from a kprobe handler installed by
systemtap.

The problem observed when a kprobe was installed on lock_acquired().
When the probe was hit the kprobe handler did call udelay, which set
up an (internal) timer and reenabled interrupts (only the clock comparator
interrupt) and waited for the interrupt.
This is an optimization to avoid that the cpu is busy looping while waiting
that enough time passes. The problem is that the interrupt handler still
does call irq_enter()/irq_exit() which then again can lead to a deadlock,
since some accounting functions may take locks as well.

If one of these locks is the same, which caused lock_acquired() to be
called, we have a nice deadlock.

This patch reworks the udelay code for the interrupts disabled case to
immediately leave the low level interrupt handler when the clock
comparator interrupt happens. That way no C code is being called and the
deadlock cannot happen anymore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:13 +02:00
Christian Borntraeger
e22cf8ca6f s390/cpumf: rework program parameter setting to detect guest samples
The program parameter can be used to mark hardware samples with
some token.  Previously, it was used to mark guest samples only.

Improve the program parameter doubleword by combining two parts,
the leftmost LPP part and the rightmost PID part.  Set the PID
part for processes by using the task PID.
To distinguish host and guest samples for the kernel (PID part
is zero), the guest must always set the program paramater to a
non-zero value.  Use the leftmost bit in the LPP part of the
program parameter to be able to detect guest kernel samples.

[brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced
__LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts.
And updated the commit message accordingly.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:12 +02:00
Hendrik Brueckner
83abeffbd5 s390/entry: add assembler macro to conveniently tests under mask
Various functions in entry.S perform test-under-mask instructions
to test for particular bits in memory.  Because test-under-mask uses
a mask value of one byte, the mask value and the offset into the
memory must be calculated manually.  This easily introduces errors
and is hard to review and read.

Introduce the TSTMSK assembler macro to specify a mask constant and
let the macro calculate the offset and the byte mask to generate a
test-under-mask instruction.  The benefit is that existing symbolic
constants can now be used for tests.  Also the macro checks for
zero mask values and mask values that consist of multiple bytes.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:09 +02:00
Hendrik Brueckner
0ac277790e s390/fpu: add static FPU save area for init_task
Previously, the init task did not have an allocated FPU save area and
saving an FPU state was not possible.  Now if the vector extension is
always enabled, provide a static FPU save area to save FPU states of
vector instructions that can be executed quite early.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Hendrik Brueckner
b5510d9b68 s390/fpu: always enable the vector facility if it is available
If the kernel detects that the s390 hardware supports the vector
facility, it is enabled by default at an early stage.  To force
it off, use the novx kernel parameter.  Note that there is a small
time window, where the vector facility is enabled before it is
forced to be off.

With enabling the vector facility by default, the FPU save and
restore functions can be improved.  They do not longer require
to manage expensive control register updates to enable or disable
the vector enablement control for particular processes.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Martin Schwidefsky
72d38b1978 s390/vtime: correct scaled cputime of partially idle CPUs
The calculation for the SMT scaling factor for a hardware thread
which has been partially idle needs to disregard the cycles spent
by the other threads of the core while the thread is idle.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-30 16:22:38 +02:00
Heiko Carstens
9380cf5a88 s390: fix floating point register corruption
The critical section cleanup code misses to add the offset of the
thread_struct to the task address.
Therefore, if the critical section code gets executed, it may corrupt
the task struct or restore the contents of the floating point registers
from the wrong memory location.
Fixes d0164ee20d "s390/kernel: remove save_fpu_regs() parameter and use
__LC_CURRENT instead".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:41 +02:00
Christian Borntraeger
888d5e9804 KVM: s390: use pid of cpu thread for sampling tagging
Right now we use the address of the sie control block as tag for
the sampling data. This is hard to get for users. Let's just use
the PID of the cpu thread to mark the hardware samples.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-08-03 10:04:59 +02:00
Hendrik Brueckner
d0164ee20d s390/kernel: remove save_fpu_regs() parameter and use __LC_CURRENT instead
All calls to save_fpu_regs() specify the fpu structure of the current task
pointer as parameter.  The task pointer of the current task can also be
retrieved from the CPU lowcore directly.  Remove the parameter definition,
load the __LC_CURRENT task pointer from the CPU lowcore, and rebase the FPU
structure onto the task structure.  Apply the same approach for the
load_fpu_regs() function.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-08-03 10:04:37 +02:00
Martin Schwidefsky
2acb94f431 s390/nmi: use the normal asynchronous stack for machine checks
If a machine checks is received while the CPU is in the kernel, only
the s390_do_machine_check function will be called. The call to
s390_handle_mcck is postponed until the CPU returns to user space.
Because of this it is safe to use the asynchronous stack for machine
checks even if the CPU is already handling an interrupt.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:04 +02:00
Martin Schwidefsky
a359bb1190 s390/kernel: squeeze a few more cycles out of the system call handler
Reorder the instructions of UPDATE_VTIME to improve superscalar execution,
remove duplicate checks for problem-state from the asynchronous interrupt
handlers, and move the check for problem-state from the synchronous
exit path to the program check path as it is only needed for program
checks inside the kernel.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:04 +02:00
Martin Schwidefsky
d0fc41071a s390/kvm: integrate HANDLE_SIE_INTERCEPT into cleanup_critical
Currently there are two mechanisms to deal with cleanup work due to
interrupts. The HANDLE_SIE_INTERCEPT macro is used to undo the changes
required to enter SIE in sie64a. If the SIE instruction causes a program
check, or an asynchronous interrupt is received the HANDLE_SIE_INTERCEPT
code forwards the program execution to sie_exit.

All the other critical sections in entry.S are handled by the code in
cleanup_critical that is called by the SWITCH_ASYNC macro.

Move the sie64a function to the beginning of the critical section and
add the code from HANDLE_SIE_INTERCEPT to cleanup_critical. Add a special
case for the sie64a cleanup to the program check handler.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:03 +02:00
Martin Schwidefsky
dcd2a9aaa0 s390/kvm: fix interrupt race with HANDLE_SIE_INTERCEPT
The HANDLE_SIE_INTERCEPT macro is used in the interrupt handlers
and the program check handler to undo a few changes done by sie64a.
Among them are guest vs host LPP, the gmap ASCE vs kernel ASCE and
the bit that indicates that SIE is currently running on the CPU.

There is a race of a voluntary SIE exit vs asynchronous interrupts.
If the CPU completed the SIE instruction and the TM instruction of
the LPP macro at the time it receives an interrupt, the interrupt
handler will run while the LPP, the ASCE and the SIE bit are still
set up for guest execution. This might result in wrong sampling data,
but it will not cause data corruption or lockups.

The critical section in sie64a needs to be enlarged to include all
instructions that undo the changes required for guest execution.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:03 +02:00
Hendrik Brueckner
9977e886cb s390/kernel: lazy restore fpu registers
Improve the save and restore behavior of FPU register contents to use the
vector extension within the kernel.

The kernel does not use floating-point or vector registers and, therefore,
saving and restoring the FPU register contents are performed for handling
signals or switching processes only.  To prepare for using vector
instructions and vector registers within the kernel, enhance the save
behavior and implement a lazy restore at return to user space from a
system call or interrupt.

To implement the lazy restore, the save_fpu_regs() sets a CPU information
flag, CIF_FPU, to indicate that the FPU registers must be restored.
Saving and setting CIF_FPU is performed in an atomic fashion to be
interrupt-safe.  When the kernel wants to use the vector extension or
wants to change the FPU register state for a task during signal handling,
the save_fpu_regs() must be called first.  The CIF_FPU flag is also set at
process switch.  At return to user space, the FPU state is restored.  In
particular, the FPU state includes the floating-point or vector register
contents, as well as, vector-enablement and floating-point control.  The
FPU state restore and clearing CIF_FPU is also performed in an atomic
fashion.

For KVM, the restore of the FPU register state is performed when restoring
the general-purpose guest registers before the SIE instructions is started.
Because the path towards the SIE instruction is interruptible, the CIF_FPU
flag must be checked again right before going into SIE.  If set, the guest
registers must be reloaded again by re-entering the outer SIE loop.  This
is the same behavior as if the SIE critical section is interrupted.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:01 +02:00
Martin Schwidefsky
3827ec3d8f s390: adapt entry.S to the move of thread_struct
git commit 0c8c0f03e3
"x86/fpu, sched: Dynamically allocate 'struct fpu'"
moved the thread_struct to the end of the task_struct.

This causes some of the offsets used in entry.S to overflow their
instruction operand field. To fix this  use aghi to create a
dedicated pointer for the thread_struct.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-20 13:22:18 +02:00
Christian Borntraeger
8e23654687 KVM: s390: make exit_sie_sync more robust
exit_sie_sync is used to kick CPUs out of SIE and prevent reentering at
any point in time. This is used to reload the prefix pages and to
set the IBS stuff in a way that guarantees that after this function
returns we are no longer in SIE. All current users trigger KVM requests.

The request must be set before we block the CPUs to avoid races. Let's
make this implicit by adding the request into a new function
kvm_s390_sync_requests that replaces exit_sie_sync and split out
s390_vcpu_block and s390_vcpu_unblock, that can be used to keep
CPUs out of SIE independent of requests.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-05-08 15:51:14 +02:00
Heiko Carstens
a876cb3f6b s390: remove 31 bit syscalls
Remove the 31 bit syscalls from the syscall table. This is a separate patch
just in case I screwed something up so it can be easily reverted.
However the conversion was done with a script, so everything should be ok.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:35 +01:00
Heiko Carstens
4bfc86ce94 s390: remove "64" suffix from a couple of files
Rename a couple of files to get rid of the "64" suffix.
"git blame" will still work.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:34 +01:00
Heiko Carstens
5a79859ae0 s390: remove 31 bit support
Remove the 31 bit support in order to reduce maintenance cost and
effectively remove dead code. Since a couple of years there is no
distribution left that comes with a 31 bit kernel.

The 31 bit kernel also has been broken since more than a year before
anybody noticed. In addition I added a removal warning to the kernel
shown at ipl for 5 minutes: a960062e58 ("s390: add 31 bit warning
message") which let everybody know about the plan to remove 31 bit
code. We didn't get any response.

Given that the last 31 bit only machine was introduced in 1999 let's
remove the code.
Anybody with 31 bit user space code can still use the compat mode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:33 +01:00
Martin Schwidefsky
86ed42f401 s390: use local symbol names in entry[64].S
To improve the output of the perf tool hide most of the symbols
from entry[64].S by using the '.L' prefix.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:38 +01:00
Linus Torvalds
b05d59dfce At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM.  Changes include:
 
 - a lot of s390 changes: optimizations, support for migration,
   GDB support and more
 
 - ARM changes are pretty small: support for the PSCI 0.2 hypercall
   interface on both the guest and the host (the latter acked by Catalin)
 
 - initial POWER8 and little-endian host support
 
 - support for running u-boot on embedded POWER targets
 
 - pretty large changes to MIPS too, completing the userspace interface
   and improving the handling of virtualized timer hardware
 
 - for x86, a larger set of changes is scheduled for 3.17.  Still,
   we have a few emulator bugfixes and support for running nested
   fully-virtualized Xen guests (para-virtualized Xen guests have
   always worked).  And some optimizations too.
 
 The only missing architecture here is ia64.  It's not a coincidence
 that support for KVM on ia64 is scheduled for removal in 3.17.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
 BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
 CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
 nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
 TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
 ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
 VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
 bAknaZpPiOrW11Et1htY
 =j5Od
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next

Pull KVM updates from Paolo Bonzini:
 "At over 200 commits, covering almost all supported architectures, this
  was a pretty active cycle for KVM.  Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more

   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by
     Catalin)

   - initial POWER8 and little-endian host support

   - support for running u-boot on embedded POWER targets

   - pretty large changes to MIPS too, completing the userspace
     interface and improving the handling of virtualized timer hardware

   - for x86, a larger set of changes is scheduled for 3.17.  Still, we
     have a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have
     always worked).  And some optimizations too.

  The only missing architecture here is ia64.  It's not a coincidence
  that support for KVM on ia64 is scheduled for removal in 3.17"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
  KVM: add missing cleanup_srcu_struct
  KVM: PPC: Book3S PR: Rework SLB switching code
  KVM: PPC: Book3S PR: Use SLB entry 0
  KVM: PPC: Book3S HV: Fix machine check delivery to guest
  KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  KVM: PPC: Book3S HV: Fix dirty map for hugepages
  KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  KVM: PPC: Book3S: Add ONE_REG register names that were missed
  KVM: PPC: Add CAP to indicate hcall fixes
  KVM: PPC: MPIC: Reset IRQ source private members
  KVM: PPC: Graciously fail broken LE hypercalls
  PPC: ePAPR: Fix hypercall on LE guest
  KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
  KVM: PPC: BOOK3S: Always use the saved DAR value
  PPC: KVM: Make NX bit available with magic page
  KVM: PPC: Disable NX for old magic page using guests
  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
  ...
2014-06-04 08:47:12 -07:00
Martin Schwidefsky
d3a73acbc2 s390: split TIF bits into CIF, PIF and TIF bits
The oi and ni instructions used in entry[64].S to set and clear bits
in the thread-flags are not guaranteed to be atomic in regard to other
CPUs. Split the TIF bits into CPU, pt_regs and thread-info specific
bits. Updates on the TIF bits are done with atomic instructions,
updates on CPU and pt_regs bits are done with non-atomic instructions.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:47 +02:00
Martin Schwidefsky
beef560b4c s390/uaccess: simplify control register updates
Always switch to the kernel ASCE in switch_mm. Load the secondary
space ASCE in finish_arch_post_lock_switch after checking that
any pending page table operations have completed. The primary
ASCE is loaded in entry[64].S. With this the update_primary_asce
call can be removed from the switch_to macro and from the start
of switch_mm function. Remove the load_primary argument from
update_user_asce/clear_user_asce, rename update_user_asce to
set_user_asce and rename update_primary_asce to load_kernel_asce.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:46 +02:00
Jens Freimann
21ee7ffd17 s390: rename and split lowcore field per_perc_atmid
per_perc_atmid is currently a two-byte field that combines two
fields, the PER code and the PER Addressing-and-Translation-Mode
Identification (ATMID)

Let's make them accessible indepently and also rename per_cause to
per_code.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:48 +02:00
Heiko Carstens
457f218095 s390/uaccess: rework uaccess code - fix locking issues
The current uaccess code uses a page table walk in some circumstances,
e.g. in case of the in atomic futex operations or if running on old
hardware which doesn't support the mvcos instruction.

However it turned out that the page table walk code does not correctly
lock page tables when accessing page table entries.
In other words: a different cpu may invalidate a page table entry while
the current cpu inspects the pte. This may lead to random data corruption.

Adding correct locking however isn't trivial for all uaccess operations.
Especially copy_in_user() is problematic since that requires to hold at
least two locks, but must be protected against ABBA deadlock when a
different cpu also performs a copy_in_user() operation.

So the solution is a different approach where we change address spaces:

User space runs in primary address mode, or access register mode within
vdso code, like it currently already does.

The kernel usually also runs in home space mode, however when accessing
user space the kernel switches to primary or secondary address mode if
the mvcos instruction is not available or if a compare-and-swap (futex)
instruction on a user space address is performed.
KVM however is special, since that requires the kernel to run in home
address space while implicitly accessing user space with the sie
instruction.

So we end up with:

User space:
- runs in primary or access register mode
- cr1 contains the user asce
- cr7 contains the user asce
- cr13 contains the kernel asce

Kernel space:
- runs in home space mode
- cr1 contains the user or kernel asce
  -> the kernel asce is loaded when a uaccess requires primary or
     secondary address mode
- cr7 contains the user or kernel asce, (changed with set_fs())
- cr13 contains the kernel asce

In case of uaccess the kernel changes to:
- primary space mode in case of a uaccess (copy_to_user) and uses
  e.g. the mvcp instruction to access user space. However the kernel
  will stay in home space mode if the mvcos instruction is available
- secondary space mode in case of futex atomic operations, so that the
  instructions come from primary address space and data from secondary
  space

In case of kvm the kernel runs in home space mode, but cr1 gets switched
to contain the gmap asce before the sie instruction gets executed. When
the sie instruction is finished cr1 will be switched back to contain the
user asce.

A context switch between two processes will always load the kernel asce
for the next process in cr1. So the first exit to user space is a bit
more expensive (one extra load control register instruction) than before,
however keeps the code rather simple.

In sum this means there is no need to perform any error prone page table
walks anymore when accessing user space.

The patch seems to be rather large, however it mainly removes the
the page table walk code and restores the previously deleted "standard"
uaccess code, with a couple of changes.

The uaccess without mvcos mode can be enforced with the "uaccess_primary"
kernel parameter.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03 14:31:04 +02:00
Martin Schwidefsky
53e857f308 s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries
Git commit 050eef364a "[S390] fix tlb flushing vs. concurrent
/proc accesses" introduced the attach counter to avoid using the
mm_users value to decide between IPTE for every PTE and lazy TLB
flushing with IDTE. That fixed the problem with mm_users but it
introduced another subtle race, fortunately one that is very hard
to hit.
The background is the requirement of the architecture that a valid
PTE may not be changed while it can be used concurrently by another
cpu. The decision between IPTE and lazy TLB flushing needs to be
done while the PTE is still valid. Now if the virtual cpu is
temporarily stopped after the decision to use lazy TLB flushing but
before the invalid bit of the PTE has been set, another cpu can attach
the mm, find that flush_mm is set, do the IDTE, return to userspace,
and recreate a TLB that uses the PTE in question. When the first,
stopped cpu continues it will change the PTE while it is attached on
another cpu. The first cpu will do another IDTE shortly after the
modification of the PTE which makes the race window quite short.

To fix this race the CPU that wants to attach the address space of a
user space thread needs to wait for the end of the PTE modification.
The number of concurrent TLB flushers for an mm is tracked in the
upper 16 bits of the attach_count and finish_arch_post_lock_switch
is used to wait for the end of the flush operation if required.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:18 +01:00
Martin Schwidefsky
dbbfe487e5 s390: fix system call restart after inferior call
Git commit 616498813b "s390: system call path micro optimization"
introduced a regression in regard to system call restarting and inferior
function calls via the ptrace interface. The pointer to the system call
table needs to be loaded in sysc_sigpending if do_signal returns with
TIF_SYSCALl set after it restored a system call context.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-30 13:04:40 +02:00
Martin Schwidefsky
0587d409ec s390/time: return with irqs disabled from psw_idle
Modify the psw_idle waiting logic in entry[64].S to return with
interrupts disabled. This avoids potential issues with udelay
and interrupt loops as interrupts are not reenabled after
clock comparator interrupts.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-28 09:19:23 +02:00
Martin Schwidefsky
1f44a22577 s390: convert interrupt handling to use generic hardirq
With the introduction of PCI it became apparent that s390 should
convert to generic hardirqs as too many drivers do not have the
correct dependency for GENERIC_HARDIRQS. On the architecture
level s390 does not have irq lines. It has external interrupts,
I/O interrupts and adapter interrupts. This patch hard-codes all
external interrupts as irq #1, all I/O interrupts as irq #2 and
all adapter interrupts as irq #3. The additional information from
the lowcore associated with the interrupt is stored in the
pt_regs of the interrupt frame, where the interrupt handler can
pick it up. For PCI/MSI interrupts the adapter interrupt handler
scans the relevant bit fields and calls generic_handle_irq with
the virtual irq number for the MSI interrupt.

Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22 12:20:04 +02:00
Martin Schwidefsky
48f6b00c6e s390/irq: store interrupt information in pt_regs
Copy the interrupt parameters from the lowcore to the pt_regs structure
in entry[64].S and reduce the arguments of the low level interrupt handler
to the pt_regs pointer only. In addition move the test-pending-interrupt
loop from do_IRQ to entry[64].S to make sure that interrupt information
is always delivered via pt_regs.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-26 21:10:23 +02:00
Martin Schwidefsky
616498813b s390: system call path micro optimization
Add a pointer to the system call table to the thread_info structure.
The TIF_31BIT bit is set or cleared by SET_PERSONALITY exactly once
for the lifetime of a process. With the pointer to the correct system
call table in thread_info the system call code in entry64.S path can
drop the check for TIF_31BIT which saves a couple of instructions.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26 09:07:05 +02:00
Martin Schwidefsky
dc7ee00d47 s390: lowcore stack pointer offsets
Store the stack pointers in the lowcore for the kernel stack, the async
stack and the panic stack with the offset required for the first user.
This avoids an unnecessary add instruction on the system call path.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-26 09:07:01 +02:00
Martin Schwidefsky
6551fbdfd8 s390: critical section cleanup vs. machine checks
The current machine check code uses the registers stored by the machine
in the lowcore at __LC_GPREGS_SAVE_AREA as the registers of the interrupted
context. The registers 0-7 of a user process can get clobbered if a machine
checks interrupts the execution of a critical section in entry[64].S.

The reason is that the critical section cleanup code may need to modify
the PSW and the registers for the previous context to get to the end of a
critical section. If registers 0-7 have to be replaced the relevant copy
will be in the registers, which invalidates the copy in the lowcore. The
machine check handler needs to explicitly store registers 0-7 to the stack.

Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-03-05 10:21:35 +01:00
Linus Torvalds
c7708fac5a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 update from Martin Schwidefsky:
 "Add support to generate code for the latest machine zEC12, MOD and XOR
  instruction support for the BPF jit compiler, the dasd safe offline
  feature and the big one: the s390 architecture gets PCI support!!
  Right before the world ends on the 21st ;-)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
  s390/qdio: rename the misleading PCI flag of qdio devices
  s390/pci: remove obsolete email addresses
  s390/pci: speed up __iowrite64_copy by using pci store block insn
  s390/pci: enable NEED_DMA_MAP_STATE
  s390/pci: no msleep in potential IRQ context
  s390/pci: fix potential NULL pointer dereference in dma_free_seg_table()
  s390/pci: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  s390/bpf,jit: add support for XOR instruction
  s390/bpf,jit: add support MOD instruction
  s390/cio: fix pgid reserved check
  vga: compile fix, disable vga for s390
  s390/pci: add PCI Kconfig options
  s390/pci: s390 specific PCI sysfs attributes
  s390/pci: PCI hotplug support via SCLP
  s390/pci: CHSC PCI support for error and availability events
  s390/pci: DMA support
  s390/pci: PCI adapter interrupts for MSI/MSI-X
  s390/bitops: find leftmost bit instruction support
  s390/pci: CLP interface
  s390/pci: base support
  ...
2012-12-13 14:20:19 -08:00
Martin Schwidefsky
39efd4ec9a s390/ptrace: race of single stepping vs signal delivery
The current single step code is racy in regard to concurrent delivery
of signals. If a signal is delivered after a PER program check occurred
but before the TIF_PER_TRAP bit has been checked in entry[64].S the code
clears TIF_PER_TRAP and then calls do_signal. This is wrong, if the
instruction completed (or has been suppressed) a SIGTRAP should be
delivered to the debugger in any case. Only if the instruction has been
nullified the SIGTRAP may not be send.

The new logic always sets TIF_PER_TRAP if the program check indicates PER
tracing but removes it again for all program checks that are nullifying.
The effect is that for each change in the PSW address we now get a
single SIGTRAP.

Reported-by: Andreas Arnez <arnez@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-23 11:14:33 +01:00
Al Viro
30dcb0996e s390: switch to saner kernel_execve() semantics
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-29 10:54:37 -04:00
Al Viro
f322220d61 s390: convert to generic kernel_execve()
same situation as with alpha and arm - only massage needed

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-30 23:03:04 -04:00
Al Viro
37fe5d41f6 s390: fold kernel_thread_helper() into ret_from_fork()
... and don't bother with syscall return path in case of kernel
threads.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-30 23:03:03 -04:00
Al Viro
65f22a906e s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-30 23:03:02 -04:00
Martin Schwidefsky
27f6b41662 s390/vtimer: rework virtual timer interface
The current virtual timer interface is inherently per-cpu and hard to
use. The sole user of the interface is appldata which uses it to execute
a function after a specific amount of cputime has been used over all cpus.

Rework the virtual timer interface to hook into the cputime accounting.
This makes the interface independent from the CPU timer interrupts, and
makes the virtual timers global as opposed to per-cpu.
Overall the code is greatly simplified. The downside is that the accuracy
is not as good as the original implementation, but it is still good enough
for appldata.

Reviewed-by: Jan Glauber <jang@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-20 11:15:08 +02:00
Heiko Carstens
a53c8fab3f s390/comments: unify copyright messages and remove file names
Remove the file name from the comment at top of many files. In most
cases the file name was wrong anyway, so it's rather pointless.

Also unify the IBM copyright statement. We did have a lot of sightly
different statements and wanted to change them one after another
whenever a file gets touched. However that never happened. Instead
people start to take the old/"wrong" statements to use as a template
for new files.
So unify all of them in one go.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-07-20 11:15:04 +02:00
Heiko Carstens
fbe765680d s390/smp: make absolute lowcore / cpu restart parameter accesses more robust
Setting the cpu restart parameters is done in three different fashions:
- directly setting the four parameters individually
- copying the four parameters with memcpy (using 4 * sizeof(long))
- copying the four parameters using a private structure

In addition code in entry*.S relies on a certain order of the restart
members of struct _lowcore.

Make all of this more robust to future changes by adding a
mem_absolute_assign(dest, val) define, which assigns val to dest
using absolute addressing mode. Also the load multiple instructions
in entry*.S have been split into separate load instruction so the
order of the struct _lowcore members doesn't matter anymore.

In addition move the prototypes of memcpy_real/absolute from uaccess.h
to processor.h. These memcpy* variants are not related to uaccess at all.
string.h doesn't seem to match as well, so lets use processor.h.

Also replace the eight byte array in struct _lowcore which represents a
misaliged u64 with a u64. The compiler will always create code that
handles the misaligned u64 correctly.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-06-14 09:09:02 +02:00
Heiko Carstens
eb546195a7 s390/sigp: use sigp order code defines in assembly code
Use sigp order code defines in assembly code as well.
With this change all places that use sigp constants should
have been converted to use self describing defines instead
of directly using constants.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-06-05 13:23:36 +02:00
Martin Schwidefsky
eda0c6d6b0 s390: fix race on TIF_MCCK_PENDING
There is a small race window in the __switch_to code in regard to
the transfer of the TIF_MCCK_PENDING bit from the previous to the
next task. The bit is transferred before the task struct pointer
and the thread-info pointer for the next task has been stored to
lowcore. If a machine check sets the TIF_MCCK_PENDING bit between
the transfer code and the store of current/thread_info the bit
is still set for the previous task. And if the previous task has
terminated it can get lost. The effect is that a pending CRW is
not retrieved until the next machine checks sets TIF_MCCK_PENDING.
To fix this reorder __switch_to to first store the task struct
and thread-info pointer and then do the transfer of the bit.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:46 +02:00
Martin Schwidefsky
4c1051e37a [S390] rework idle code
Whenever the cpu loads an enabled wait PSW it will appear as idle to the
underlying host system. The code in default_idle calls vtime_stop_cpu
which does the necessary voodoo to get the cpu time accounting right.
The udelay code just loads an enabled wait PSW. To correct this rework
the vtime_stop_cpu/vtime_start_cpu logic and move the difficult parts
to entry[64].S, vtime_stop_cpu can now be called from anywhere and
vtime_start_cpu is gone. The correction of the cpu time during wakeup
from an enabled wait PSW is done with a critical section in entry[64].S.
As vtime_start_cpu is gone, s390_idle_check can be removed as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-11 11:59:28 -04:00
Martin Schwidefsky
8b646bd759 [S390] rework smp code
Define struct pcpu and merge some of the NR_CPUS arrays into it, including
__cpu_logical_map, current_set and smp_cpu_state. Split smp related
functions to those operating on physical cpus and the functions operating
on a logical cpu number. Make the functions for physical cpus use a
pointer to a struct pcpu. This hides the knowledge about cpu addresses in
smp.c, entry[64].S and swsusp_asm64.S, thus remove the sigp.h header.

The PSW restart mechanism is used to start secondary cpus, calling a
function on an online cpu, calling a function on the ipl cpu, and for
the nmi signal. Replace the different assembler functions with a
single function restart_int_handler. The new entry point calls a function
whose pointer is stored in the lowcore of the target cpu and it can wait
for the source cpu to stop. This covers all existing use cases.

Overall the code is now simpler and there are ~380 lines less code.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-11 11:59:28 -04:00
Martin Schwidefsky
7e180bd802 [S390] rename lowcore field
The 16 bit value at the lowcore location with offset 0x84 is the
cpu address that is associated with an external interrupt. Rename
the field from cpu_addr to ext_cpu_addr to make that clear.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-11 11:59:27 -04:00