Another fix for stalls on AMD machines with C1E caused by 2.6.35.13 (#704059)

Chuck Ebbert 2011-05-13 02:53:14 -04:00
parent ca79828415
commit 3bf2bfa0d8
2 changed files with 70 additions and 0 deletions

@@ -845,6 +845,7 @@ Patch13962: af_netlink-add-needed-scm_destroy-after-scm_send.patch
# fix regression causing stalls on AMD processors in 2.6.35.13
Patch13963: x86-amd-fix-apic-timer-erratum-400-affecting-k8-rev.a-e-processors.patch
Patch13964: x86-amd-fix-another-erratum-400-bug.patch
%endif
@@ -1590,6 +1591,7 @@ ApplyPatch af_netlink-add-needed-scm_destroy-after-scm_send.patch
# fix regression causing stalls on AMD processors in 2.6.35.13
ApplyPatch x86-amd-fix-apic-timer-erratum-400-affecting-k8-rev.a-e-processors.patch
ApplyPatch x86-amd-fix-another-erratum-400-bug.patch
# END OF PATCH APPLICATIONS

@@ -0,0 +1,68 @@
Fix a bug that causes CPU hangs due to missing timer interrupts,
introduced by these three patches:

 (1) commit d78d671db478eb8b14c78501c0cee1cc7baf6967
     "x86, cpu: AMD errata checking framework"
 (2) commit 9d8888c2a214aece2494a49e699a097c2ba9498b
     "x86, cpu: Clean up AMD erratum 400 workaround"
 (3) commit b87cf80af3ba4b4c008b4face3c68d604e1715c6
     "x86, AMD: Set ARAT feature on AMD processors"

Patch (1) introduced a new framework that allowed checking for errata
using AMD's OSVW (OS Visible Workaround) feature combined with
explicit lists of models. It checked OSVW first, and completely
relied on that if it was present and usable.

Patch (2) switched the checking for erratum 400 to use the new
framework. But the original code checked for an explicit model range
first, then used OSVW if the CPU was not within that range. Patch (2)
also inexplicably added a second model range (for Family 10h) that
was never in the original code.

Then patch (3) used the new erratum 400 checks to decide whether
to enable the ARAT feature (always running APIC timer). However,
this causes notebooks using the Sempron processor (Family 10h
Model 6 Stepping 2) to enable ARAT when they shouldn't because the
explicit check for that model gets skipped.

The fix is to check the model list first, then use OSVW if the CPU
is not in that list.

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -723,6 +723,17 @@ bool cpu_has_amd_erratum(const int *erra
 	if (cpu->x86_vendor != X86_VENDOR_AMD)
 		return false;
 
+	/*
+	 * Must match family-model-stepping range first so that the
+	 * range checks will override OSVW checking.
+	 */
+	ms = (cpu->x86_model << 4) | cpu->x86_mask;
+	while ((range = *erratum++))
+		if ((cpu->x86 == AMD_MODEL_RANGE_FAMILY(range)) &&
+		    (ms >= AMD_MODEL_RANGE_START(range)) &&
+		    (ms <= AMD_MODEL_RANGE_END(range)))
+			return true;
+
 	if (osvw_id >= 0 && osvw_id < 65536 &&
 	    cpu_has(cpu, X86_FEATURE_OSVW)) {
 		u64 osvw_len;
@@ -737,13 +748,5 @@ bool cpu_has_amd_erratum(const int *erra
 		}
 	}
 
-	/* OSVW unavailable or ID unknown, match family-model-stepping range */
-	ms = (cpu->x86_model << 4) | cpu->x86_mask;
-	while ((range = *erratum++))
-		if ((cpu->x86 == AMD_MODEL_RANGE_FAMILY(range)) &&
-		    (ms >= AMD_MODEL_RANGE_START(range)) &&
-		    (ms <= AMD_MODEL_RANGE_END(range)))
-			return true;
-
 	return false;
 }
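
The ordering change in the patch can be illustrated with a small standalone sketch in C. This is not kernel code. The struct layouts, the `sketch_*` names, and the single range covering all of Family 10h are hypothetical and chosen only for illustration; the kernel packs its ranges with the AMD_MODEL_RANGE macros and reads OSVW from MSRs. The sketch also assumes that OSVW on the affected Sempron is usable but does not flag the erratum, which is what the description above implies. The only detail taken directly from the patch is the `ms` encoding, `(model << 4) | stepping`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few CPU fields the check reads. */
struct sketch_cpu {
    unsigned family;
    unsigned model;
    unsigned stepping;
    bool osvw_usable;           /* OSVW present and covers this erratum ID */
    bool osvw_reports_erratum;  /* what OSVW would answer (assumed) */
};

/* Hypothetical unpacked range; the kernel packs this into one integer. */
struct sketch_range {
    unsigned family;
    unsigned ms_start;
    unsigned ms_end;
};

/* Same encoding as the patch: model in the high nibble, stepping in the low. */
static unsigned sketch_ms(const struct sketch_cpu *c)
{
    return (c->model << 4) | c->stepping;
}

static bool sketch_in_ranges(const struct sketch_cpu *c,
                             const struct sketch_range *r, size_t n)
{
    unsigned ms = sketch_ms(c);

    for (size_t i = 0; i < n; i++)
        if (c->family == r[i].family &&
            ms >= r[i].ms_start && ms <= r[i].ms_end)
            return true;
    return false;
}

/* Order before the fix: a usable OSVW answer is final, ranges are a fallback. */
static bool sketch_osvw_first(const struct sketch_cpu *c,
                              const struct sketch_range *r, size_t n)
{
    if (c->osvw_usable)
        return c->osvw_reports_erratum;
    return sketch_in_ranges(c, r, n);
}

/* Order after the fix: an explicit range match overrides OSVW. */
static bool sketch_ranges_first(const struct sketch_cpu *c,
                                const struct sketch_range *r, size_t n)
{
    if (sketch_in_ranges(c, r, n))
        return true;
    if (c->osvw_usable)
        return c->osvw_reports_erratum;
    return false;
}

/* Walks through the Sempron case from the description. */
void sketch_self_check(void)
{
    const struct sketch_cpu sempron = {
        .family = 0x10, .model = 6, .stepping = 2,
        .osvw_usable = true, .osvw_reports_erratum = false,
    };
    /* Illustrative range only, not the kernel's actual erratum 400 list. */
    const struct sketch_range ranges[] = { { 0x10, 0x00, 0xff } };

    assert(sketch_ms(&sempron) == 0x62);
    assert(!sketch_osvw_first(&sempron, ranges, 1));   /* erratum missed */
    assert(sketch_ranges_first(&sempron, ranges, 1));  /* erratum detected */
}
```

Calling `sketch_self_check()` from a test harness exercises both orderings. With the OSVW-first order the range entry is never consulted for this CPU, so the erratum goes undetected and ARAT would be enabled; with the range-first order the explicit entry wins.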