Delivered-To: jwboyer@gmail.com
Received: by 10.229.175.203 with SMTP id bb11csp66243qcb;
        Fri, 8 Jun 2012 15:08:27 -0700 (PDT)
Received: by 10.68.222.133 with SMTP id qm5mr23412736pbc.113.1339193307132;
        Fri, 08 Jun 2012 15:08:27 -0700 (PDT)
Return-Path: <stable-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
        by mx.google.com with ESMTP id ku9si12482578pbc.355.2012.06.08.15.08.24;
        Fri, 08 Jun 2012 15:08:25 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of stable-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of stable-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mail=stable-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S964992Ab2FHWIW (ORCPT <rfc822;bigsmallbd@gmail.com> + 21 others);
        Fri, 8 Jun 2012 18:08:22 -0400
Received: from mail-bk0-f74.google.com ([209.85.214.74]:41783 "EHLO
        mail-bk0-f74.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S964922Ab2FHWIV (ORCPT
        <rfc822;stable@vger.kernel.org>); Fri, 8 Jun 2012 18:08:21 -0400
Received: by bkty5 with SMTP id y5so128736bkt.1
        for <stable@vger.kernel.org>; Fri, 08 Jun 2012 15:08:20 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20120113;
        h=subject:to:cc:from:date:message-id:x-gm-message-state;
        bh=RSdNZSZcXg/enKaYIM+JR4+Bd890ieO+blY9bsk9giI=;
        b=NwTZEmRSdqDAiTV/EW91GXpM/yrRd7CNzfPif0JcF0iFgxGAo4lB7W1I05vmrnPcCQ
         Va+P6xXLWle2rAVQLsPooKdtb3u2wnNRDEGvBPZl2alje+qzhKGlQcVgnI5+KCM6GaS+
         YWoE+2gv5UFmF6JlelThyecGTyZ0D93K5aVYewSxg0H7KZ6BgvMnB/qJKFdScatv1uDH
         g39MFwJzmD+DmNMn149jeUWYOLLTeMZJkymtJCLgxS8eJzQxXA0nes2Wz/pXCBdxXF2z
         mft6LyzKtoEUDeTtalgm9zxkT4XJ+6bsAMEXBFgkcyNq0Ic8P79AP0ynlET2L/Ql3ARP
         C5Sg==
Received: by 10.14.101.2 with SMTP id a2mr2823176eeg.6.1339193299969;
        Fri, 08 Jun 2012 15:08:19 -0700 (PDT)
Received: from hpza10.eem.corp.google.com ([74.125.121.33])
        by gmr-mx.google.com with ESMTPS id d52si7345113eei.1.2012.06.08.15.08.19
        (version=TLSv1/SSLv3 cipher=AES128-SHA);
        Fri, 08 Jun 2012 15:08:19 -0700 (PDT)
Received: from akpm.mtv.corp.google.com (akpm.mtv.corp.google.com [172.18.96.75])
        by hpza10.eem.corp.google.com (Postfix) with ESMTP id 9D09620004E;
        Fri, 8 Jun 2012 15:08:19 -0700 (PDT)
Received: from localhost.localdomain (localhost [127.0.0.1])
        by akpm.mtv.corp.google.com (Postfix) with ESMTP id D5FACA0329;
        Fri, 8 Jun 2012 15:08:18 -0700 (PDT)
Subject: + thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch added to -mm tree
To: mm-commits@vger.kernel.org
Cc: aarcange@redhat.com, hughd@google.com, jbeulich@suse.com,
        jrnieder@gmail.com, kosaki.motohiro@gmail.com, lwoodman@redhat.com,
        mgorman@suse.de, pmatouse@redhat.com, riel@redhat.com,
        stable@vger.kernel.org, uobergfe@redhat.com
From: akpm@linux-foundation.org
Date: Fri, 08 Jun 2012 15:08:18 -0700
Message-Id: <20120608220818.D5FACA0329@akpm.mtv.corp.google.com>
X-Gm-Message-State: ALoCoQnqC0C+2OVVfC5Yi43jUu5vH03b/RBncPoI4SpE4HFSgaRrM+gM2J8rR6MMoba3nM/OmDAU
Sender: stable-owner@vger.kernel.org
Precedence: bulk
List-ID: <stable.vger.kernel.org>
X-Mailing-List: stable@vger.kernel.org


The patch titled
     Subject: thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
has been added to the -mm tree.  Its filename is
     thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrea Arcangeli <aarcange@redhat.com>
Subject: thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE

In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
Xen.

So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid having
to use cmpxchg8b).

The only guarantee we get from pmd_read_atomic is that if the low part of
the pmd was found null, the high part will be null too (so the pmd will be
considered unstable).  And if the low part of the pmd is found "stable"
later, then it means the whole pmd was read atomically (because after a
pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
and we read the high part after the low part).

In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
atomically to declare the pmd as "stable", and that's true both with and
without THP.  Furthermore, in the THP case we also have a barrier() that
will prevent any inconsistent pmdvals from being cached by a later re-read
of the *pmd.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
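
Illustration only, not part of the patch: the low-then-high read described
in the changelog can be pictured with the userspace sketch below.  It is a
simplified model under stated assumptions, not the kernel code.  struct
pae_pmd, pmd_read_split() and pmd_low_is_unstable() are hypothetical names,
C11 atomics stand in for the kernel's own primitives, and the PSE mask
assumes the usual x86 layout (bit 7 of the low word).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PAE pmd entry, split in two halves. */
struct pae_pmd {
	_Atomic uint32_t low;	/* flag bits and low address bits */
	_Atomic uint32_t high;	/* upper address bits */
};

static_assert(sizeof(struct pae_pmd) == sizeof(uint64_t),
	      "the two halves must cover exactly one 64-bit entry");

#define PMD_PSE	0x080u		/* assumed x86 layout: bit 7 = huge page */

/*
 * Read the entry as two 32-bit loads, low half first.  If the low half
 * is zero the high half is never read, so the result is a fully null
 * value instead of a partial entry.
 */
static uint64_t pmd_read_split(struct pae_pmd *pmdp)
{
	uint64_t ret = atomic_load_explicit(&pmdp->low, memory_order_relaxed);

	if (ret) {
		/* Keep the high-half load ordered after the low-half load. */
		atomic_thread_fence(memory_order_acquire);
		ret |= (uint64_t)atomic_load_explicit(&pmdp->high,
						      memory_order_relaxed) << 32;
	}
	return ret;
}

/*
 * Simplified stability check that looks only at the low half:
 * a null or huge (PSE) low word is treated as unstable, anything
 * else as pointing to a pte page.
 */
static int pmd_low_is_unstable(uint64_t pmdval)
{
	uint32_t low = (uint32_t)pmdval;

	return low == 0 || (low & PMD_PSE) != 0;
}
```

With low == 0 and a stale non-zero high half, pmd_read_split() returns 0,
which matches the "low part null implies high part null" guarantee above.
A caller would then pass the result to pmd_low_is_unstable() and only trust
the full 64-bit value when that check reports a stable entry.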

 arch/x86/include/asm/pgtable-3level.h |   30 +++++++++++++-----------
 include/asm-generic/pgtable.h         |   10 ++++++++
 2 files changed, 27 insertions(+), 13 deletions(-)

diff -puN arch/x86/include/asm/pgtable-3level.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae arch/x86/include/asm/pgtable-3level.h
--- a/arch/x86/include/asm/pgtable-3level.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae
+++ a/arch/x86/include/asm/pgtable-3level.h
@@ -47,16 +47,26 @@ static inline void native_set_pte(pte_t
  * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
  * operations.
  *
- * Without THP if the mmap_sem is hold for reading, the
- * pmd can only transition from null to not null while pmd_read_atomic runs.
- * So there's no need of literally reading it atomically.
+ * Without THP if the mmap_sem is hold for reading, the pmd can only
+ * transition from null to not null while pmd_read_atomic runs. So
+ * we can always return atomic pmd values with this function.
  *
  * With THP if the mmap_sem is hold for reading, the pmd can become
- * THP or null or point to a pte (and in turn become "stable") at any
- * time under pmd_read_atomic, so it's mandatory to read it atomically
- * with cmpxchg8b.
+ * trans_huge or none or point to a pte (and in turn become "stable")
+ * at any time under pmd_read_atomic. We could read it really
+ * atomically here with a atomic64_read for the THP enabled case (and
+ * it would be a whole lot simpler), but to avoid using cmpxchg8b we
+ * only return an atomic pmdval if the low part of the pmdval is later
+ * found stable (i.e. pointing to a pte). And we're returning a none
+ * pmdval if the low part of the pmd is none. In some cases the high
+ * and low part of the pmdval returned may not be consistent if THP is
+ * enabled (the low part may point to previously mapped hugepage,
+ * while the high part may point to a more recently mapped hugepage),
+ * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
+ * of the pmd to be read atomically to decide if the pmd is unstable
+ * or not, with the only exception of when the low part of the pmd is
+ * zero in which case we return a none pmd.
  */
-#ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
 	pmdval_t ret;
@@ -74,12 +84,6 @@ static inline pmd_t pmd_read_atomic(pmd_
 
 	return (pmd_t) { ret };
 }
-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
-{
-	return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
 {
diff -puN include/asm-generic/pgtable.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae
+++ a/include/asm-generic/pgtable.h
@@ -484,6 +484,16 @@ static inline int pmd_none_or_trans_huge
 	/*
 	 * The barrier will stabilize the pmdval in a register or on
 	 * the stack so that it will stop changing under the code.
+	 *
+	 * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
+	 * pmd_read_atomic is allowed to return a not atomic pmdval
+	 * (for example pointing to an hugepage that has never been
+	 * mapped in the pmd). The below checks will only care about
+	 * the low part of the pmd with 32bit PAE x86 anyway, with the
+	 * exception of pmd_none(). So the important thing is that if
+	 * the low part of the pmd is found null, the high part will
+	 * be also null or the pmd_none() check below would be
+	 * confused.
 	 */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	barrier();
_
Subject: Subject: thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE

Patches currently in -mm which might be from aarcange@redhat.com are

origin.patch
linux-next.patch
mm-fix-slab-page-_count-corruption-when-using-slub.patch
thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch
hugetlb-rename-max_hstate-to-hugetlb_max_hstate.patch
hugetlbfs-dont-use-err_ptr-with-vm_fault-values.patch
hugetlbfs-add-an-inline-helper-for-finding-hstate-index.patch
hugetlbfs-add-an-inline-helper-for-finding-hstate-index-fix.patch
hugetlb-use-mmu_gather-instead-of-a-temporary-linked-list-for-accumulating-pages.patch
hugetlb-use-mmu_gather-instead-of-a-temporary-linked-list-for-accumulating-pages-fix.patch
hugetlb-use-mmu_gather-instead-of-a-temporary-linked-list-for-accumulating-pages-fix-fix.patch
hugetlb-avoid-taking-i_mmap_mutex-in-unmap_single_vma-for-hugetlb.patch
hugetlb-simplify-migrate_huge_page.patch
hugetlb-simplify-migrate_huge_page-fix.patch
memcg-add-hugetlb-extension.patch
memcg-add-hugetlb-extension-fix.patch
memcg-add-hugetlb-extension-fix-fix.patch
hugetlb-add-charge-uncharge-calls-for-hugetlb-alloc-free.patch
memcg-track-resource-index-in-cftype-private.patch
hugetlbfs-add-memcg-control-files-for-hugetlbfs.patch
hugetlbfs-add-memcg-control-files-for-hugetlbfs-use-scnprintf-instead-of-sprintf.patch
hugetlbfs-add-memcg-control-files-for-hugetlbfs-use-scnprintf-instead-of-sprintf-fix.patch
hugetlbfs-add-a-list-for-tracking-in-use-hugetlb-pages.patch
memcg-move-hugetlb-resource-count-to-parent-cgroup-on-memcg-removal.patch
memcg-move-hugetlb-resource-count-to-parent-cgroup-on-memcg-removal-fix.patch
memcg-move-hugetlb-resource-count-to-parent-cgroup-on-memcg-removal-fix-fix.patch
hugetlb-migrate-memcg-info-from-oldpage-to-new-page-during-migration.patch
memcg-add-memory-controller-documentation-for-hugetlb-management.patch

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html