kernel/thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-PAE.patch
2012-06-11 08:41:46 -04:00

229 lines
12 KiB
Diff

Delivered-To: jwboyer@gmail.com
Received: by 10.229.175.203 with SMTP id bb11csp66243qcb;
Fri, 8 Jun 2012 15:08:27 -0700 (PDT)
Received: by 10.68.222.133 with SMTP id qm5mr23412736pbc.113.1339193307132;
Fri, 08 Jun 2012 15:08:27 -0700 (PDT)
Return-Path: <stable-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
by mx.google.com with ESMTP id ku9si12482578pbc.355.2012.06.08.15.08.24;
Fri, 08 Jun 2012 15:08:25 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of stable-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of stable-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mail=stable-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
id S964992Ab2FHWIW (ORCPT <rfc822;bigsmallbd@gmail.com> + 21 others);
Fri, 8 Jun 2012 18:08:22 -0400
Received: from mail-bk0-f74.google.com ([209.85.214.74]:41783 "EHLO
mail-bk0-f74.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
with ESMTP id S964922Ab2FHWIV (ORCPT
<rfc822;stable@vger.kernel.org>); Fri, 8 Jun 2012 18:08:21 -0400
Received: by bkty5 with SMTP id y5so128736bkt.1
for <stable@vger.kernel.org>; Fri, 08 Jun 2012 15:08:20 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=google.com; s=20120113;
h=subject:to:cc:from:date:message-id:x-gm-message-state;
bh=RSdNZSZcXg/enKaYIM+JR4+Bd890ieO+blY9bsk9giI=;
b=NwTZEmRSdqDAiTV/EW91GXpM/yrRd7CNzfPif0JcF0iFgxGAo4lB7W1I05vmrnPcCQ
Va+P6xXLWle2rAVQLsPooKdtb3u2wnNRDEGvBPZl2alje+qzhKGlQcVgnI5+KCM6GaS+
YWoE+2gv5UFmF6JlelThyecGTyZ0D93K5aVYewSxg0H7KZ6BgvMnB/qJKFdScatv1uDH
g39MFwJzmD+DmNMn149jeUWYOLLTeMZJkymtJCLgxS8eJzQxXA0nes2Wz/pXCBdxXF2z
mft6LyzKtoEUDeTtalgm9zxkT4XJ+6bsAMEXBFgkcyNq0Ic8P79AP0ynlET2L/Ql3ARP
C5Sg==
Received: by 10.14.101.2 with SMTP id a2mr2823176eeg.6.1339193299969;
Fri, 08 Jun 2012 15:08:19 -0700 (PDT)
Received: from hpza10.eem.corp.google.com ([74.125.121.33])
by gmr-mx.google.com with ESMTPS id d52si7345113eei.1.2012.06.08.15.08.19
(version=TLSv1/SSLv3 cipher=AES128-SHA);
Fri, 08 Jun 2012 15:08:19 -0700 (PDT)
Received: from akpm.mtv.corp.google.com (akpm.mtv.corp.google.com [172.18.96.75])
by hpza10.eem.corp.google.com (Postfix) with ESMTP id 9D09620004E;
Fri, 8 Jun 2012 15:08:19 -0700 (PDT)
Received: from localhost.localdomain (localhost [127.0.0.1])
by akpm.mtv.corp.google.com (Postfix) with ESMTP id D5FACA0329;
Fri, 8 Jun 2012 15:08:18 -0700 (PDT)
Subject: + thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch added to -mm tree
To: mm-commits@vger.kernel.org
Cc: aarcange@redhat.com, hughd@google.com, jbeulich@suse.com,
jrnieder@gmail.com, kosaki.motohiro@gmail.com, lwoodman@redhat.com,
mgorman@suse.de, pmatouse@redhat.com, riel@redhat.com,
stable@vger.kernel.org, uobergfe@redhat.com
From: akpm@linux-foundation.org
Date: Fri, 08 Jun 2012 15:08:18 -0700
Message-Id: <20120608220818.D5FACA0329@akpm.mtv.corp.google.com>
X-Gm-Message-State: ALoCoQnqC0C+2OVVfC5Yi43jUu5vH03b/RBncPoI4SpE4HFSgaRrM+gM2J8rR6MMoba3nM/OmDAU
Sender: stable-owner@vger.kernel.org
Precedence: bulk
List-ID: <stable.vger.kernel.org>
X-Mailing-List: stable@vger.kernel.org
The patch titled
Subject: thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
has been added to the -mm tree. Its filename is
thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrea Arcangeli <aarcange@redhat.com>
Subject: thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
Xen.
So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid having
to use cmpxchg8b).
The only guarantee we get from pmd_read_atomic is that if the low part of
the pmd was found null, the high part will be null too (so the pmd will be
considered unstable). And if the low part of the pmd is found "stable"
later, then it means the whole pmd was read atomically (because after a
pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
and we read the high part after the low part).
In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
atomically to declare the pmd as "stable" and that's true for THP and no
THP, furthermore in the THP case we also have a barrier() that will
prevent any inconsistent pmdvals to be cached by a later re-read of the
*pmd.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/x86/include/asm/pgtable-3level.h | 30 +++++++++++++-----------
include/asm-generic/pgtable.h | 10 ++++++++
2 files changed, 27 insertions(+), 13 deletions(-)
diff -puN arch/x86/include/asm/pgtable-3level.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae arch/x86/include/asm/pgtable-3level.h
--- a/arch/x86/include/asm/pgtable-3level.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae
+++ a/arch/x86/include/asm/pgtable-3level.h
@@ -47,16 +47,26 @@ static inline void native_set_pte(pte_t
* they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
* operations.
*
- * Without THP if the mmap_sem is hold for reading, the
- * pmd can only transition from null to not null while pmd_read_atomic runs.
- * So there's no need of literally reading it atomically.
+ * Without THP if the mmap_sem is hold for reading, the pmd can only
+ * transition from null to not null while pmd_read_atomic runs. So
+ * we can always return atomic pmd values with this function.
*
* With THP if the mmap_sem is hold for reading, the pmd can become
- * THP or null or point to a pte (and in turn become "stable") at any
- * time under pmd_read_atomic, so it's mandatory to read it atomically
- * with cmpxchg8b.
+ * trans_huge or none or point to a pte (and in turn become "stable")
+ * at any time under pmd_read_atomic. We could read it really
+ * atomically here with a atomic64_read for the THP enabled case (and
+ * it would be a whole lot simpler), but to avoid using cmpxchg8b we
+ * only return an atomic pmdval if the low part of the pmdval is later
+ * found stable (i.e. pointing to a pte). And we're returning a none
+ * pmdval if the low part of the pmd is none. In some cases the high
+ * and low part of the pmdval returned may not be consistent if THP is
+ * enabled (the low part may point to previously mapped hugepage,
+ * while the high part may point to a more recently mapped hugepage),
+ * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
+ * of the pmd to be read atomically to decide if the pmd is unstable
+ * or not, with the only exception of when the low part of the pmd is
+ * zero in which case we return a none pmd.
*/
-#ifndef CONFIG_TRANSPARENT_HUGEPAGE
static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
{
pmdval_t ret;
@@ -74,12 +84,6 @@ static inline pmd_t pmd_read_atomic(pmd_
return (pmd_t) { ret };
}
-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
-{
- return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
{
diff -puN include/asm-generic/pgtable.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae
+++ a/include/asm-generic/pgtable.h
@@ -484,6 +484,16 @@ static inline int pmd_none_or_trans_huge
/*
* The barrier will stabilize the pmdval in a register or on
* the stack so that it will stop changing under the code.
+ *
+ * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
+ * pmd_read_atomic is allowed to return a not atomic pmdval
+ * (for example pointing to an hugepage that has never been
+ * mapped in the pmd). The below checks will only care about
+ * the low part of the pmd with 32bit PAE x86 anyway, with the
+ * exception of pmd_none(). So the important thing is that if
+ * the low part of the pmd is found null, the high part will
+ * be also null or the pmd_none() check below would be
+ * confused.
*/
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
barrier();
_
Subject: Subject: thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
Patches currently in -mm which might be from aarcange@redhat.com are
origin.patch
linux-next.patch
mm-fix-slab-page-_count-corruption-when-using-slub.patch
thp-avoid-atomic64_read-in-pmd_read_atomic-for-32bit-pae.patch
hugetlb-rename-max_hstate-to-hugetlb_max_hstate.patch
hugetlbfs-dont-use-err_ptr-with-vm_fault-values.patch
hugetlbfs-add-an-inline-helper-for-finding-hstate-index.patch
hugetlbfs-add-an-inline-helper-for-finding-hstate-index-fix.patch
hugetlb-use-mmu_gather-instead-of-a-temporary-linked-list-for-accumulating-pages.patch
hugetlb-use-mmu_gather-instead-of-a-temporary-linked-list-for-accumulating-pages-fix.patch
hugetlb-use-mmu_gather-instead-of-a-temporary-linked-list-for-accumulating-pages-fix-fix.patch
hugetlb-avoid-taking-i_mmap_mutex-in-unmap_single_vma-for-hugetlb.patch
hugetlb-simplify-migrate_huge_page.patch
hugetlb-simplify-migrate_huge_page-fix.patch
memcg-add-hugetlb-extension.patch
memcg-add-hugetlb-extension-fix.patch
memcg-add-hugetlb-extension-fix-fix.patch
hugetlb-add-charge-uncharge-calls-for-hugetlb-alloc-free.patch
memcg-track-resource-index-in-cftype-private.patch
hugetlbfs-add-memcg-control-files-for-hugetlbfs.patch
hugetlbfs-add-memcg-control-files-for-hugetlbfs-use-scnprintf-instead-of-sprintf.patch
hugetlbfs-add-memcg-control-files-for-hugetlbfs-use-scnprintf-instead-of-sprintf-fix.patch
hugetlbfs-add-a-list-for-tracking-in-use-hugetlb-pages.patch
memcg-move-hugetlb-resource-count-to-parent-cgroup-on-memcg-removal.patch
memcg-move-hugetlb-resource-count-to-parent-cgroup-on-memcg-removal-fix.patch
memcg-move-hugetlb-resource-count-to-parent-cgroup-on-memcg-removal-fix-fix.patch
hugetlb-migrate-memcg-info-from-oldpage-to-new-page-during-migration.patch
memcg-add-memory-controller-documentation-for-hugetlb-management.patch
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html