kernel-ark

Author	SHA1	Message	Date
Aneesh Kumar K.V	2bfd65e45e	powerpc/mm/radix: Add radix callbacks for early init routines This adds routines for early setup for radix. We use device tree property "ibm,processor-radix-AP-encodings" to find supported page sizes. If we don't find the above we consider 64K and 4K as supported page sizes. We do map vmemap using 2M page size if we can. The linear mapping is done such that we use required page size for that range. For example memory of 3.5G is mapped such that we use 1G mapping till 3G range and use 2M mapping for the rest. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:33:00 +10:00
Aneesh Kumar K.V	756d08d1ba	powerpc/mm: Abstract early MMU init in preparation for radix Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:58 +10:00
Aneesh Kumar K.V	6cc1a0ee4c	powerpc/mm/radix: Add radix callback for pmd accessors This only does 64K Linux page support for now. 64K hash Linux config THP needs to differentiate it from hugetlb huge page because with THP we need to track hash pte slot information with respect to each subpage. This is not needed with hugetlb hugepage, because we don't do MPSS with hugetlb. Radix doesn't have any such restrictions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:57 +10:00
Aneesh Kumar K.V	a9252aaefe	powerpc/mm: Move hugetlb and THP related pmd accessors to pgtable.h Here we create pgtable-64/4k.h and move pmd accessors that are common between hash and radix there. We can't do much sharing with 4K Linux page size because 4K Linux page size with hash config doesn't support THP. So for now it is empty. In later patches we will add functions that does conditional hash/radix accessors there. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:56 +10:00
Aneesh Kumar K.V	ac94ac79dc	powerpc/mm: Add radix callbacks to pte accessors For those pte accessors, that operate on a different set of pte bits between hash/radix, we add a generic variant that does a conditional to hash linux or radix variant. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:55 +10:00
Aneesh Kumar K.V	566ca99af0	powerpc/mm/radix: Add dummy radix_enabled() In this patch we add the radix Kconfig and conditional check. radix_enabled() is written to always return 0 here. Once we have all needed radix changes added, we will update this to an mmu_feature check. We need to add this early so that we can get it all build in the early stage. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:54 +10:00
Aneesh Kumar K.V	b0b5e9b130	powerpc/mm/radix: Add radix pte #defines This adds Power ISA 3.0 specific pte defines. We share most of the details with hash Linux page table format. This patch indicates only things where we differ. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:52 +10:00
Aneesh Kumar K.V	34fbadd8e9	powerpc/mm: Move pte related functions together Only code movement. No functionality change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:51 +10:00
Aneesh Kumar K.V	aba480e137	powerpc/mm: Move page table index and and vaddr to pgtable.h Now that the page table size is a variable, we can move these to generic pgtable.h. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:50 +10:00
Aneesh Kumar K.V	dd1842a2a4	powerpc/mm: Make page table size a variable Radix and hash MMU models support different page table sizes. Make the #defines a variable so that existing code can work with variable sizes. Slice related code is only used by hash, so use hash constants there. We will replicate some of the boundary conditions with resepct to TASK_SIZE using radix values too. Right now we do boundary condition check using hash constants. Swapper pgdir size is initialized in asm code. We select the max pgd size to keep it simple. For now we select hash pgdir. When adding radix we will switch that to radix pgdir which is 64K. BUILD_BUG_ON check which is removed is already done in hugepage_init() using MAYBE_BUILD_BUG_ON(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:48 +10:00
Aneesh Kumar K.V	13f829a5a1	powerpc/mm: Move pte accessors that operate on common pte bits to pgtable.h These pte functions will remain the same between radix and hash. Move them to pgtable.h. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:47 +10:00
Aneesh Kumar K.V	2e8735198a	powerpc/mm: Move common pte bits and accessors to book3s/64/pgtable.h Now that we have moved book3s hash64 Linux pte bits to match Power ISA 3.0 radix pte bit positions, we move the matching pte bits to a common header. Only code movement in this patch. No functionality change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:46 +10:00
Aneesh Kumar K.V	d2cf005038	powerpc/mm: Handle _PTE_NONE_MASK I am splitting this as a separate patch to get better review. If ok we should merge this with previous patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:45 +10:00
Aneesh Kumar K.V	945537df7a	powerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix This helps to make following hash only pte bits easier. We have kept _PAGE_CHG_MASK, _HPAGE_CHG_MASK and _PAGE_PROT_BITS as it is in this patch eventhough they use hash specific bits. Using them in radix as it is should be ok, because with radix we expect those bit positions to be zero. Only renames in this patch, no change in functionality. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:43 +10:00
Aneesh Kumar K.V	eee24b5aaf	powerpc/mm: Move hash and no hash code to separate files This patch reduces the number of #ifdefs in C code and will also help in adding radix changes later. Only code movement in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Propagate copyrights and update GPL text] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:42 +10:00
Aneesh Kumar K.V	50de596de8	powerpc/mm/hash: Add support for Power9 Hash PowerISA 3.0 adds a parition table indexed by LPID. Parition table allows us to specify the MMU model that will be used for guest and host translation. This patch adds support with SLB based hash model (UPRT = 0). What is required with this model is to support the new hash page table entry format and also setup partition table such that we use hash table for address translation. We don't have segment table support yet. In order to make sure we don't load KVM module on Power9 (since we don't have kvm support yet) this patch also disables KVM on Power9. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:40 +10:00
Aneesh Kumar K.V	e99833448c	powerpc/mm/radix: Add partition table format & callback Add structs and #defines related to the radix MMU partition table format. We also add a ppc_md callback for updating a partition table entry. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:39 +10:00
Aneesh Kumar K.V	11a6f6abd7	powerpc/mm: Move radix/hash common data structures to book3s64 headers Start moving code that is generic between radix and hash to book3s64 specific headers from the book3s64 hash specific one. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:37 +10:00
Aneesh Kumar K.V	33d336d986	powerpc/mm: Use generic version of ptep_clear_flush_young() The radix variant is going to require a flush_tlb_range(). With flush_tlb_range() added, ptep_clear_flush_young() is the same as the generic version. So drop the powerpc specific variant. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:36 +10:00
Aneesh Kumar K.V	ff844b741e	powerpc/mm: Use generic version of pmdp_clear_flush_young() The radix variant is going to require a flush_pmd_tlb_range(). With flush_pmd_tlb_range() added, pmdp_clear_flush_young() is the same as the generic version. So drop the powerpc specific variant. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:35 +10:00
Aneesh Kumar K.V	30bda41aba	powerpc/mm: Drop WIMG in favour of new constants PowerISA 3.0 introduces two pte bits with the below meaning for radix: 00 -> Normal Memory 01 -> Strong Access Order (SAO) 10 -> Non idempotent I/O (Cache inhibited and guarded) 11 -> Tolerant I/O (Cache inhibited) We drop the existing WIMG bits in the Linux page table in favour of the above constants. We loose _PAGE_WRITETHRU with this conversion. We only use writethru via pgprot_cached_wthru() which is used by fbdev/controlfb.c which is Apple control display and also PPC32. With respect to _PAGE_COHERENCE, we have been marking hpte always coherent for some time now. htab_convert_pte_flags() always added HPTE_R_M. NOTE: KVM changes need closer review. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:33 +10:00
Aneesh Kumar K.V	72176dd0ad	powerpc/mm: Use a helper for finding pte bits mapping I/O area Use a helper instead of open coding with constants. A later patch will drop the WIMG bits and use PowerISA 3.0 defines. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:32 +10:00
Aneesh Kumar K.V	e58e87adc8	powerpc/mm: Update _PAGE_KERNEL_RO PS3 had used a PPP bit hack to implement a read only mapping in the kernel area. Since we are bolting the ioremap area, it used the pte flags _PAGE_PRESENT \| _PAGE_USER to get a PPP value of 0x3 there by resulting in a read only mapping. This means the area can be accessed by user space, but kernel will never return such an address to user space. But we can do better by implementing a read only kernel mapping using PPP bits 0b110. This also allows us to do read only kernel mapping for radix in later patches. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:30 +10:00
Aneesh Kumar K.V	96270b1fc2	powerpc/mm: Remove RPN_SHIFT and RPN_SIZE PTE_RPN_SHIFT is actually page size dependent. Even though PowerISA 3.0 expects only the lower 12 bits to be zero, we will always find the pages to be PAGE_SHIFT aligned. In case of hash config, this also allows us to use the additional 3 bits to track pte specific information. We need to make sure we use these bits only for hash specific pte flags. For both 4K and 64K config, pte now can hold 57 bits address. Inorder to keep things simpler, drop PTE_RPN_SHIFT and PTE_RPN_SIZE and specify the 57 bit detail explicitly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:29 +10:00
Aneesh Kumar K.V	ac29c64089	powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED _PAGE_PRIVILEGED means the page can be accessed only by the kernel. This is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User pages are now marked by clearing _PAGE_PRIVILEGED bit. Previously we allowed the kernel to have a privileged page in the lower address range (USER_REGION). With this patch such access is denied. We also prevent a kernel access to a non-privileged page in higher address range (ie, REGION_ID != 0). Both the above access scenarios should never happen. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:26 +10:00
Aneesh Kumar K.V	e7bfc462d3	powerpc/mm: Use pte_user() instead of open coding We have a common declaration in pte-common.h Add a book3s specific one and switch to pte_user() in callchain.c. In a subsequent patch we will switch _PAGE_USER to _PAGE_PRIVILEGED in the book3s version only. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:25 +10:00
Michael Ellerman	7e1e63c5e9	powerpc/mm: Convert pte_user() to static inline In a subsequent patch we want to add a second definition of pte_user(). Before we do that, make the signature clear, ie. it takes a pte_t and returns bool. We move it up inside the existing #ifndef __ASSEMBLY__ block, but otherwise it's a straight conversion. Convert the call in settlbcam(), which passes an unsigned long, to pass a pte_t. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:24 +10:00
Aneesh Kumar K.V	73a1441a9b	powerpc/mm/subpage: Clear RWX bit to indicate no access Subpage protection used to depend on the _PAGE_USER bit to implement no access mode. This patch switches that to use _PAGE_RWX. We clear Read, Write and Execute access from the pte instead of clearing _PAGE_USER now. This was done so that we can switch to _PAGE_PRIVILEGED in a later patch. subpage_protection() returns pte bits that need to be cleared. Instead of updating the interface to handle no-access in a separate way, it appears simpler to clear RWX acecss to indicate no access. We still don't insert hash ptes for no access implied by !_PAGE_RWX. Hence we should not get PROT_FAULT with change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:23 +10:00
Aneesh Kumar K.V	c7d54842de	powerpc/mm: Use _PAGE_READ to indicate Read access This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also removes the dependency on _PAGE_USER for implying read only. Few things to note here is that, we have read implied with write and execute permission. Hence we should always find _PAGE_READ set on hash pte fault. We still can't switch PROT_NONE to !(_PAGE_RWX). Auto numa depends on marking a prot none pte _PAGE_WRITE. (For more details look at `b191f9b106` "mm: numa: preserve PTE write permissions across a NUMA hinting fault") Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:21 +10:00
Michael Ellerman	ee3caed37d	powerpc/mm: Use pte_raw() in pte_same()/pmd_same() We can avoid doing endian conversions by using pte_raw() in pxx_same(). The swap of the constant (_PAGE_HPTEFLAGS) should be done at compile time by the compiler. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:19 +10:00
Aneesh Kumar K.V	5dc1ef858c	powerpc/mm: Use big endian Linux page tables for book3s 64 Traditionally Power server machines have used the Hashed Page Table MMU mode. In this mode Linux manages its own tree of nested page tables, aka. "the Linux page tables", which are not used by the hardware directly, and software loads translations into the hash page table for use by the hardware. Power ISA 3.0 defines a new MMU mode, known as Radix Tree Translation, where the hardware can directly operate on the Linux page tables. However the hardware requires that the page tables be in big endian format. To accommodate this, switch the pgtable types to __be64 and add appropriate endian conversions. Because we will be supporting a single kernel binary that boots using either radix or hash mode, we always store the Linux page tables big endian, even in hash mode where they are not actually used by the hardware. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix sparse errors, flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:18 +10:00
Michael Ellerman	3910a7f485	powerpc/mm: Add pte_xchg() helper We have five locations in 64-bit hash MMU code that do a cmpxchg() of a PTE. Currently doing it inline OK, but in a future patch we will be converting the PTEs to __be64 in some configs. In that case we will need casts at every cmpxchg() site in order to keep sparse happy. So move the logic into a helper, this is a reasonably nice cleanup on its own. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:16 +10:00
Aneesh Kumar K.V	4bece39b50	powerpc/mm: Drop PTE_ATOMIC_UPDATES from pmd_hugepage_update() pmd_hugepage_update() is inside #ifdef CONFIG_TRANSPARENT_HUGEPAGE. THP can only be enabled if PPC_BOOK3S_64=y && PPC_64K_PAGES=y, aka. hash64. On hash64 we always define PTE_ATOMIC_UPDATES to 1, meaning the #ifdef in pmd_hugepage_update() is unnecessary, so drop it. That is also the only use of PTE_ATOMIC_UPDATES in any of the hash code, meaning we no longer need to #define it at all in the hash headers. Note it's still #defined and used in the nohash code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:15 +10:00
Michael Ellerman	670eea9241	powerpc/mm: Always use STRICT_MM_TYPECHECKS Testing done by Paul Mackerras has shown that with a modern compiler there is no negative effect on code generation from enabling STRICT_MM_TYPECHECKS. So remove the option, and always use the strict type definitions. Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:14 +10:00
Aneesh Kumar K.V	8ffb4103f5	IB/qib: Use cache inhibitted and guarded mapping on powerpc The driver was requesting for a writethrough mapping. But with those flags we will end up with an SAO mapping because we now have memory conherence always enabled. ie, the existing mapping will end up with a WIMG value 0b1110 which is Strong Access Order. Update this to use cache inhibitted guarded mapping. Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:32:13 +10:00
Madhavan Srinivasan	5bcca743cb	powerpc/perf: Replace raw event hex values with #defines Minor cleanup patch to replace the raw event hex values in power8-pmu.c with #defines. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-27 16:27:34 +10:00
Thiago Jung Bauermann	7132e2d669	ftrace: Match dot symbols when searching functions on ppc64 In the ppc64 big endian ABI, function symbols point to function descriptors. The symbols which point to the function entry points have a dot in front of the function name. Consequently, when the ftrace filter mechanism searches for the symbol corresponding to an entry point address, it gets the dot symbol. As a result, ftrace filter users have to be aware of this ABI detail on ppc64 and prepend a dot to the function name when setting the filter. The perf probe command insulates the user from this by ignoring the dot in front of the symbol name when matching function names to symbols, but the sysfs interface does not. This patch makes the ftrace filter mechanism do the same when searching symbols. Fixes the following failure in ftracetest's kprobe_ftrace.tc: .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument That failure is on this line of kprobe_ftrace.tc: echo _do_fork > set_ftrace_filter This is because there's no _do_fork entry in the functions list: # cat available_filter_functions \| grep _do_fork ._do_fork This change introduces no regressions on the perf and ftracetest testsuite results. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-27 09:47:29 +10:00
Daniel Axtens	8fe088850f	powerpc: rework sparse for lib/xor_vmx.c Sparse doesn't seem to be passing -maltivec around properly, leading to lots of errors: .../include/altivec.h:34:2: error: Use the "-maltivec" flag to enable PowerPC AltiVec support arch/powerpc/lib/xor_vmx.c:27:16: error: Expected ; at end of declaration arch/powerpc/lib/xor_vmx.c:27:16: error: got signed arch/powerpc/lib/xor_vmx.c:60:9: error: No right hand side of '*'-expression arch/powerpc/lib/xor_vmx.c:60:9: error: Expected ; at end of statement arch/powerpc/lib/xor_vmx.c:60:9: error: got v1_in ... arch/powerpc/lib/xor_vmx.c:87:9: error: too many errors Only include the altivec.h header for non-__CHECKER__ builds. For builds with __CHECKER__, make up some stubs instead, as suggested by Balbir. (The vector size of 16 is arbitrary.) Suggested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-27 09:33:37 +10:00
Chris Smart	8a649045e7	powerpc: Add support for userspace P9 copy paste The copy paste facility introduced in POWER9 provides an optimised mechanism for a userspace application to copy a cacheline. This is provided by a pair of instructions, copy and paste, while a third, cp_abort (copy paste abort), provides a clean up of the state in case of a failure. The copy instruction will read a 128 byte cacheline and store it in an internal buffer. The subsequent paste instruction will store this internal buffer to memory and set a CR field if the paste succeeds. Since the state of the copy paste buffer is internal (and not architecturally visible), in the unlikely event of a context switch, the state cannot be stored and the paste should therefore fail. The cp_abort instruction exists to fail and clean up any such interrupted copy paste sequence and is to be called by the kernel as part of the context switch. Doing so prevents data from a preceding copy in one process leaking into the paste of another. This code enables use of the cp_abort instruction if a supported processor is detected. NOTE: this is for userspace only, not in kernel, and does not deal with KVM guests. Patch created with much assistance from Michael Neuling <mikey@neuling.org> Signed-off-by: Chris Smart <chris@distroguy.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-27 09:28:07 +10:00
Andrew Donnellan	4ad5e8831e	powerpc/mpic: handle subsys_system_register() failure mpic_init_sys() currently doesn't check whether subsys_system_register() succeeded or not. Check the return code of subsys_system_register() and clean up if there's an error. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-27 09:23:41 +10:00
Andrew Donnellan	2d5217840f	powerpc/eeh: fix misleading indentation Found by smatch. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-27 09:19:37 +10:00
Aneesh Kumar K.V	3b1dbfa14f	cxl: Fix DAR check & use REGION_ID instead of opencoding The current code will set _PAGE_USER to the access flags for any fault address, because the ~ operation will be true for all address we take a fault on. But setting _PAGE_USER also means that the fault will be handled only if the page table have _PAGE_USER set. Hence there is no security hole with the current code. Now if it is an user space access, then the change in this patch really don't have an impact because we have (!ctx->kernel) set true and we take the if condition true. Now kernel context created fault on an address in the kernel range will result in a fault loop because we will not insert the hash pte due to access and pte permission mismatch. This patch fix the above issue. Fixes: `f204e0b8ce` ("cxl: Driver code for powernv PCIe based cards for userspace access") Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-26 21:06:36 +10:00
Frederic Barrat	4aec6ec0da	cxl: Increase timeout for detection of AFU mmio hang PSL designers recommend a larger value for the mmio hang pulse, 256 us instead of 1 us. The CAIA architecture states that it needs to be smaller than 1/2 of the RTOS timeout set in the PHB for outbound non-posted transactions, which is still (easily) the case here. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Tested-by: Frank Haverkamp <haver@linux.vnet.ibm.com> Tested-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-22 21:45:50 +10:00
Frederic Barrat	e009a7e858	cxl: Allow initialization on timebase sync failures Failure to synchronize the PSL timebase currently prevents the initialization of the cxl card, thus rendering the card useless. This is too extreme for a feature which is rarely used, if at all. No hardware AFUs or software is currently using PSL timebase. This patch still tries to synchronize the PSL timebase when the card is initialized, but ignores the error if it can't. Instead, it reports a status via /sys. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-22 21:45:44 +10:00
Madhavan Srinivasan	bb62bad623	tool/perf: Add sample_reg_mask to include all perf_regs Add sample_reg_mask array with pt_regs registers. This is needed for printing supported regs ( -I? option). Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-21 23:33:00 +10:00
Anju T	dc642e8388	tools/perf: Map the ID values with register names Map ID values with corresponding register names. These names are then displayed when user issues perf record with the -I option followed by perf report/script with -D option. To test this patchset, Eg: $ perf record -I ls # record machine state at interrupt $ perf script -D # read the perf.data file Sample output obtained for this patch / output looks like as follows: 496768515470 0x1988 [0x188]: PERF_RECORD_SAMPLE(IP, 0x1): 4522/4522: 0xc0000000001e538c period: 1 addr: 0 ... intr regs: mask 0x7ffffffffff ABI 64-bit .... r0 0xc0000000001e5e34 .... r1 0xc000000fe733f9a0 .... r2 0xc000000001523100 .... r3 0xc000000ffaadeb60 .... r4 0xc000000003456800 .... r5 0x73a9b5e000 .... r6 0x1e000000 .... r7 0x0 .... r8 0x0 .... r9 0x0 .... r10 0x1 .... r11 0x0 .... r12 0x24022822 .... r13 0xc00000000feec180 .... r14 0x0 .... r15 0xc000001e4be18800 .... r16 0x0 .... r17 0xc000000ffaac5000 .... r18 0xc000000fe733f8a0 .... r19 0xc000000001523100 .... r20 0xc00000000009fd1c .... r21 0xc000000fcaa69000 .... r22 0xc0000000001e4968 .... r23 0xc000000001523100 .... r24 0xc000000fe733f850 .... r25 0xc000000fcaa69000 .... r26 0xc000000003b8fcf0 .... r27 0xfffffffffffffead .... r28 0x0 .... r29 0xc000000fcaa69000 .... r30 0x1 .... r31 0x0 .... nip 0xc0000000001dd320 .... msr 0x9000000000009032 .... orig_r3 0xc0000000001e538c .... ctr 0xc00000000009d550 .... link 0xc0000000001e5e34 .... xer 0x0 .... ccr 0x84022882 .... softe 0x0 .... trap 0xf01 .... dar 0x0 .... dsisr 0xf00040060000004 ... thread: :4522:4522 ...... dso: /root/.debug/.build-id/b0/ef11b1a1629e62ac9de75199117ee5ef9469e9 :4522 4522 496.768515: 1 cycles: c0000000001e538c .perf_event_context_sched_in (/boot/vmlinux) Signed-off-by: Anju T <anju@linux.vnet.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-21 23:33:00 +10:00
Anju T	ed4a4ef85c	powerpc/perf: Add support for sampling interrupt register state The perf infrastructure uses a bit mask to find out valid registers to display. Define a register mask for supported registers defined in uapi/asm/perf_regs.h. The bit positions also correspond to register IDs which is used by perf infrastructure to fetch the register values. CONFIG_HAVE_PERF_REGS enables sampling of the interrupted machine state. Signed-off-by: Anju T <anju@linux.vnet.ibm.com> [mpe: Add license, use CONFIG_PPC64, fix 32-bit build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-21 23:32:59 +10:00
Anju T	1bfadabfeb	powerpc/perf: Assign an id to each powerpc register The enum definition assigns an 'id' to each register in "struct pt_regs" of arch/powerpc. The order of these values in the enum definition are based on the order of members in pt_regs. Signed-off-by: Anju T <anju@linux.vnet.ibm.com> [mpe: Rename LNK to LINK, use _UAPI_ASM for include guards] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-21 23:32:59 +10:00
Hari Bathini	057b6d7e62	powerpc/book3s64: Remove __end_handlers marker The __end_handlers marker was intended to mark down upto code that gets called from exception prologs. But that hasn't kept pace with code changes. Case in point, slb_miss_realmode being called from exception prolog code but isn't below __end_handlers marker. So, __end_handlers marker is as good as a comment but could be misleading at times if it isn't in sync with the code, as is the case now. So, let us avoid this confusion by having a better comment and removing __end_handlers marker altogether. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-21 23:32:58 +10:00
Hari Bathini	8ed8ab4004	powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel Some of the interrupt vectors on 64-bit POWER server processors are only 32 bytes long (8 instructions), which is not enough for the full first-level interrupt handler. For these we need to branch to an out-of-line (OOL) handler. But when we are running a relocatable kernel, interrupt vectors till __end_interrupts marker are copied down to real address 0x100. So, branching to labels (ie. OOL handlers) outside this section must be handled differently (see LOAD_HANDLER()), considering relocatable kernel, which would need at least 4 instructions. However, branching from interrupt vector means that we corrupt the CFAR (come-from address register) on POWER7 and later processors as mentioned in commit `1707dd16`. So, EXCEPTION_PROLOG_0 (6 instructions) that contains the part up to the point where the CFAR is saved in the PACA should be part of the short interrupt vectors before we branch out to OOL handlers. But as mentioned already, there are interrupt vectors on 64-bit POWER server processors that are only 32 bytes long (like vectors 0x4f00, 0x4f20, etc.), which cannot accomodate the above two cases at the same time owing to space constraint. Currently, in these interrupt vectors, we simply branch out to OOL handlers, without using LOAD_HANDLER(), which leaves us vulnerable when running a relocatable kernel (eg. kdump case). While this has been the case for sometime now and kdump is used widely, we were fortunate not to see any problems so far, for three reasons: 1. In almost all cases, production kernel (relocatable) is used for kdump as well, which would mean that crashed kernel's OOL handler would be at the same place where we end up branching to, from short interrupt vector of kdump kernel. 2. Also, OOL handler was unlikely the reason for crash in almost all the kdump scenarios, which meant we had a sane OOL handler from crashed kernel that we branched to. 3. On most 64-bit POWER server processors, page size is large enough that marking interrupt vector code as executable (see commit `429d2e83`) leads to marking OOL handler code from crashed kernel, that sits right below interrupt vector code from kdump kernel, as executable as well. Let us fix this by moving the __end_interrupts marker down past OOL handlers to make sure that we also copy OOL handlers to real address 0x100 when running a relocatable kernel. This fix has been tested successfully in kdump scenario, on an LPAR with 4K page size by using different default/production kernel and kdump kernel. Also tested by manually corrupting the OOL handlers in the first kernel and then kdump'ing, and then causing the OOL handlers to fire - mpe. Fixes: `c1fb6816fb` ("powerpc: Add relocation on exception vector handlers") Cc: stable@vger.kernel.org Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-21 23:32:44 +10:00

1 2 3 4 5 ...

589138 Commits