Move all code relating to /dev/mcelog to a separate source file.
/dev/mcelog driver can now operate from the machine check notifier with
lowest prio.
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Move the mce_helper and trigger functionality behind CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-6-bp@alien8.de
[ Renamed CONFIG_X86_MCELOG to CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce a simple data structure for collecting correctable errors
along with accessors. More detailed description in the code itself.
The error decoding is done with the decoding chain now and
mce_first_notifier() gets to see the error first and the CEC decides
whether to log it and then the rest of the chain doesn't hear about it -
basically the main reason for the CE collector - or to continue running
the notifiers.
When the CEC hits the action threshold, it will try to soft-offine the
page containing the ECC and then the whole decoding chain gets to see
the error.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is confusing when staring at "struct mce_log mcelog" and then there's
also a function called mce_log(). So call the buffer what it is.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We call it everywhere "struct mce *m". Adjust that here too to avoid
confusion.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since:
cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")
all MCEs are printed even when mcelog is running. Fix the regression to
not print to dmesg when mcelog is running as it is a consumer too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.10..
Fixes: cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")
Link: http://lkml.kernel.org/r/20170327093304.10683-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Xen imposes special requirements on the GDT. Rather than using a
global variable for the pgprot, just use an explicit special case
for Xen -- this makes it clearer what's going on. It also debloats
64-bit kernels very slightly.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e9ea96abbfd6a8c87753849171bb5987ecfeb523.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__pa() cannot be used on percpu pointers because they may be
virtually mapped. Use per_cpu_ptr_to_phys() instead.
This fixes a boot crash on a some 32-bit configurations. I assume
this is related to which allocation strategy is chosen by the percpu
core.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 69218e4799 x86: ("Remap GDT tables in the fixmap section")
Link: http://lkml.kernel.org/r/22e0069c29fba31998f193201e359eebfdac4960.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.
When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991
Implement a new pair of arch_prctls, available on both x86-32 and x86-64.
ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
is enabled (and thus the CPUID instruction is not available) or 1 if
CPUID faulting is not enabled.
ARCH_SET_CPUID: Set the CPUID state to the second argument. If
cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
be deactivated. Returns ENODEV if CPUID faulting is not supported on
this system.
The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. This will allow a ptracer to emulate the CPUID
instruction.
Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991
Detect support for this feature and expose it as X86_FEATURE_CPUID_FAULT.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-8-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This matches the only public Intel documentation of this MSR, in the
"Virtualization Technology FlexMigration Application Note"
(preserved at https://bugzilla.kernel.org/attachment.cgi?id=243991)
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-2-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the MCA banks in __mcheck_cpu_init_generic() are polled for leftover
errors logged during boot or from the previous boot, its required to have
CPU features detected sufficiently so that the reading out and handling of
those early errors is done correctly.
If those features are not available, the decoding may miss some information
and get incomplete errors logged. For example, on SMCA systems the MCA_IPID
and MCA_SYND registers are not logged and MCA_ADDR is not masked
appropriately.
To cure that, do a subset of the basic feature detection early while the
rest happens in its usual place in __mcheck_cpu_init_vendor().
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1489599055-20756-1-git-send-email-Yazen.Ghannam@amd.com
[ Massage commit message and simplify. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
mc is a pointer to the static u8 array amd_ucode_patch and
therefore can never be null, so the check is redundant. Remove it.
Detected by CoverityScan, CID#1372871 ("Logically Dead Code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: kernel-janitors@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20170315171010.17536-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes the GDT remapped pages read-only, to prevent accidental
(or intentional) corruption of this key data structure.
This change is done only on 64-bit, because 32-bit needs it to be writable
for TSS switches.
The native_load_tr_desc function was adapted to correctly handle a
read-only GDT. The LTR instruction always writes to the GDT TSS entry.
This generates a page fault if the GDT is read-only. This change checks
if the current GDT is a remap and swap GDTs as needed. This function was
tested by booting multiple machines and checking hibernation works
properly.
KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the
per-cpu variable was removed for functions to fetch the original GDT.
Instead of reloading the previous GDT, VMX will reload the fixmap GDT as
expected. For testing, VMs were started and restored on multiple
configurations.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Each processor holds a GDT in its per-cpu structure. The sgdt
instruction gives the base address of the current GDT. This address can
be used to bypass KASLR memory randomization. With another bug, an
attacker could target other per-cpu structures or deduce the base of
the main memory section (PAGE_OFFSET).
This patch relocates the GDT table for each processor inside the
fixmap section. The space is reserved based on number of supported
processors.
For consistency, the remapping is done by default on 32 and 64-bit.
Each processor switches to its remapped GDT at the end of
initialization. For hibernation, the main processor returns with the
original GDT and switches back to the remapping at completion.
This patch was tested on both architectures. Hibernation and KVM were
both tested specially for their usage of the GDT.
Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
recommending changes for Xen support.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The rdtgroup_kn_unlock waits for the last user to release and put its
node. But it's calling kernfs_put on the node which calls the
rdtgroup_kn_unlock, which might not be the group's directory node, but
another group's file node.
This race could be easily reproduced by running 2 instances
of following script:
mount -t resctrl resctrl /sys/fs/resctrl/
pushd /sys/fs/resctrl/
mkdir krava
echo "krava" > krava/schemata
rmdir krava
popd
umount /sys/fs/resctrl
It triggers the slub debug error message with following command
line config: slub_debug=,kernfs_node_cache.
Call kernfs_put on the group's node to fix it.
Fixes: 60cf5e101f ("x86/intel_rdt: Add mkdir to resctrl file system")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When we are about to kexec a crash kernel and right then and there a
broadcasted MCE fires while we're still in the first kernel and while
the other CPUs remain in a holding pattern, the #MC handler of the
first kernel will timeout and then panic due to never completing MCE
synchronization.
Handle this in a similar way as to when the CPUs are offlined when that
broadcasted MCE happens.
[ Boris: rewrote commit message and comments. ]
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: kexec@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487857012-9059-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/20170313095019.19351-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove the wp_works_ok member of struct cpuinfo_x86. It's an
optimization back from Linux v0.99 times where we had no fixup support
yet and did the CR0.WP test via special code in the page fault handler.
The < 0 test was an optimization to not do the special casing for each
NULL ptr access violation but just for the first one doing the WP test.
Today it serves no real purpose as the test no longer needs special code
in the page fault handler and the only call side -- mem_init() -- calls
it just once, anyway. However, Xen pre-initializes it to 1, to skip the
test.
Doing the test again for Xen should be no issue at all, as even the
commit introducing skipping the test (commit d560bc6157 ("x86, xen:
Suppress WP test on Xen")) mentioned it being ban aid only. And, in
fact, testing the patch on Xen showed nothing breaks.
The pre-fixup times are long gone and with the removal of the fallback
handling code in commit a5c2a893db ("x86, 386 removal: Remove
CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway.
So just get rid of the "optimization" and do the test unconditionally.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/1486933932-585-3-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fixes from Ingo Molnar:
"Misc fixes and minor updates all over the place:
- an SGI/UV fix
- a defconfig update
- a build warning fix
- move the boot_params file to the arch location in debugfs
- a pkeys fix
- selftests fix
- boot message fixes
- sparse fixes
- a resume warning fix
- ioapic hotplug fixes
- reboot quirks
... plus various minor cleanups"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build/x86_64_defconfig: Enable CONFIG_R8169
x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk
x86/hpet: Prevent might sleep splat on resume
x86/boot: Correct setup_header.start_sys name
x86/purgatory: Fix sparse warning, symbol not declared
x86/purgatory: Make functions and variables static
x86/events: Remove last remnants of old filenames
x86/pkeys: Check against max pkey to avoid overflows
x86/ioapic: Split IOAPIC hot-removal into two steps
x86/PCI: Implement pcibios_release_device to release IRQ from IOAPIC
x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h
x86/vmware: Remove duplicate inclusion of asm/timer.h
x86/hyperv: Hide unused label
x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk
x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
x86/selftests: Add clobbers for int80 on x86_64
x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR()
x86/apic: Fix a warning message in logical CPU IDs allocation
x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/
Pull scheduler fixes from Ingo Molnar:
"A fix for KVM's scheduler clock which (erroneously) was always marked
unstable, a fix for RT/DL load balancing, plus latency fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
sched/core: Fix pick_next_task() for RT,DL
sched/fair: Make select_idle_cpu() more aggressive
Wanpeng Li reported that since the following commit:
acb04058de ("sched/clock: Fix hotplug crash")
... KVM always runs with unstable sched-clock even though KVM's
kvm_clock _is_ stable.
The problem is that we've tied clear_sched_clock_stable() to the TSC
state, and overlooked that sched_clock() is a paravirt function.
Solve this by doing two things:
- tie the sched_clock() stable state more clearly to the TSC stable
state for the normal (!paravirt) case.
- only call clear_sched_clock_stable() when we mark TSC unstable
when we use native_sched_clock().
The first means we can actually run with stable sched_clock in more
situations then before, which is good. And since commit:
12907fbb1a ("sched/clock, clocksource: Add optional cs::mark_unstable() method")
... this should be reliable. Since any detection of TSC fail now results
in marking the TSC unstable.
Reported-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: acb04058de ("sched/clock: Fix hotplug crash")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.
Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.
Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Two documentation updates, plus a debugging annotation fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/crash: Update the stale comment in reserve_crashkernel()
x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
Documentation, x86, resctrl: Recommend locking for resctrlfs
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:
git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'
This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.
(Michal Hocko provided most of the kerneldoc comment.)
Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the big char/misc driver patchset for 4.11-rc1.
Lots of different driver subsystems updated here. Rework for the hyperv
subsystem to handle new platforms better, mei and w1 and extcon driver
updates, as well as a number of other "minor" driver updates. Full
details are in the shortlog below.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2iRQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynhFACguVE+/ixj5u5bT5DXQaZNai/6zIAAmgMWwd/t
YTD2cwsJsGbTT1fY3SUe
=CiSI
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver patchset for 4.11-rc1.
Lots of different driver subsystems updated here: rework for the
hyperv subsystem to handle new platforms better, mei and w1 and extcon
driver updates, as well as a number of other "minor" driver updates.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (169 commits)
goldfish: Sanitize the broken interrupt handler
x86/platform/goldfish: Prevent unconditional loading
vmbus: replace modulus operation with subtraction
vmbus: constify parameters where possible
vmbus: expose hv_begin/end_read
vmbus: remove conditional locking of vmbus_write
vmbus: add direct isr callback mode
vmbus: change to per channel tasklet
vmbus: put related per-cpu variable together
vmbus: callback is in softirq not workqueue
binder: Add support for file-descriptor arrays
binder: Add support for scatter-gather
binder: Add extra size to allocator
binder: Refactor binder_transact()
binder: Support multiple /dev instances
binder: Deal with contexts in debugfs
binder: Support multiple context managers
binder: Split flat_binder_object
auxdisplay: ht16k33: remove private workqueue
auxdisplay: ht16k33: rework input device initialization
...
Pull x86 microcode updates from Ingo Molnar:
"The main changes are further simplification and unification of the
code between the AMD and Intel microcode loaders, plus other
simplifications - by Borislav Petkov"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Remove struct cont_desc.eq_id
x86/microcode/AMD: Remove AP scanning optimization
x86/microcode/AMD: Simplify saving from initrd
x86/microcode/AMD: Unify load_ucode_amd_ap()
x86/microcode/AMD: Check patch level only on the BSP
x86/microcode: Remove local vendor variable
x86/microcode/AMD: Use find_microcode_in_initrd()
x86/microcode/AMD: Get rid of global this_equiv_id
x86/microcode: Decrease CPUID use
x86/microcode/AMD: Rework container parsing
x86/microcode/AMD: Extend the container struct
x86/microcode/AMD: Shorten function parameter's name
x86/microcode/AMD: Clean up find_equiv_id()
x86/microcode: Convert to bare minimum MSR accessors
x86/MSR: Carve out bare minimum accessors
Pull x86 fpu updates from Ingo Molnar:
"The main changes relate to fixes between (lack of) CPUID and FPU
detection that should only affect old or weird CPUs, by Andy
Lutomirski"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Fix the "Giving up, no FPU found" test
x86/fpu: Fix CPUID-less FPU detection
x86/fpu: Fix "x86/fpu: Legacy x87 FPU detected" message
x86/cpu: Re-apply forced caps every time CPU caps are re-read
x86/cpu: Factor out application of forced CPU caps
x86/cpu: Add X86_FEATURE_CPUID
x86/fpu/xstate: Move XSAVES state init to a function
Pull x86 cpufeature updates from Ingo Molnar:
"The main changes in this cycle were related to enable ring-3
MONITOR/MWAIT instructions support on supported CPUs, by Grzegorz
Andrejczuk and Piotr Luc"
* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpufeature: Move RING3MWAIT feature to avoid conflicts
x86/cpufeature: Enable RING3MWAIT for Knights Mill
x86/cpufeature: Enable RING3MWAIT for Knights Landing
x86/cpufeature: Add RING3MWAIT to CPU features
x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT
x86/msr: Add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit
x86/cpufeature: Add AVX512_VPOPCNTDQ feature
Pull scheduler updates from Ingo Molnar:
"The main changes in this (fairly busy) cycle were:
- There was a class of scheduler bugs related to forgetting to update
the rq-clock timestamp which can cause weird and hard to debug
problems, so there's a new debug facility for this: which uncovered
a whole lot of bugs which convinced us that we want to keep the
debug facility.
(Peter Zijlstra, Matt Fleming)
- Various cputime related updates: eliminate cputime and use u64
nanoseconds directly, simplify and improve the arch interfaces,
implement delayed accounting more widely, etc. - (Frederic
Weisbecker)
- Move code around for better structure plus cleanups (Ingo Molnar)
- Move IO schedule accounting deeper into the scheduler plus related
changes to improve the situation (Tejun Heo)
- ... plus a round of sched/rt and sched/deadline fixes, plus other
fixes, updats and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
sched/core: Remove unlikely() annotation from sched_move_task()
sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
sched/topology: Split out scheduler topology code from core.c into topology.c
sched/core: Remove unnecessary #include headers
sched/rq_clock: Consolidate the ordering of the rq_clock methods
delayacct: Include <uapi/linux/taskstats.h>
sched/core: Clean up comments
sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
sched/clock: Add dummy clear_sched_clock_stable() stub function
sched/cputime: Remove generic asm headers
sched/cputime: Remove unused nsec_to_cputime()
s390, sched/cputime: Remove unused cputime definitions
powerpc, sched/cputime: Remove unused cputime definitions
s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
ia64, sched/cputime: Remove unused cputime definitions
ia64: Convert vtime to use nsec units directly
ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
sched/cputime: Remove jiffies based cputime
sched/cputime, vtime: Return nsecs instead of cputime_t to account
sched/cputime: Complete nsec conversion of tick based accounting
...
Pull RAS updates from Ingo Molnar:
"The main changes in this cycle were:
- Assign notifier chain priorities for all RAS related handlers to
make the ordering explicit (Borislav Petkov)
- Improve the AMD MCA banks sysfs output (Yazen Ghannam)
- Various cleanups and restructuring of the x86 RAS code (Borislav
Petkov)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
x86/ras: Get rid of mce_process_work()
EDAC/mce/amd: Dump TSC value
EDAC/mce/amd: Unexport amd_decode_mce()
x86/ras/amd/inj: Change dependency
x86/ras: Flip the TSC-adding logic
x86/ras/amd: Make sysfs names of banks more user-friendly
x86/ras/therm_throt: Do not log a fake MCE for thermal events
x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
After:
a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")
our SMT scheduling topology for Fam17h systems is broken, because
the ThreadId is included in the ApicId when SMT is enabled.
So, without further decoding cpu_core_id is unique for each thread
rather than the same for threads on the same core. This didn't affect
systems with SMT disabled. Make cpu_core_id be what it is defined to be.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170205105022.8705-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")
restored the initial approach we had with the Fam15h topology of
enumerating CU (Compute Unit) threads as cores. And this is still
correct - they're beefier than HT threads but still have some
shared functionality.
Our current approach has a problem with the Mad Max Steam game, for
example. Yves Dionne reported a certain "choppiness" while playing on
v4.9.5.
That problem stems most likely from the fact that the CU threads share
resources within one CU and when we schedule to a thread of a different
compute unit, this incurs latency due to migrating the working set to a
different CU through the caches.
When the thread siblings mask mirrors that aspect of the CUs and
threads, the scheduler pays attention to it and tries to schedule within
one CU first. Which takes care of the latency, of course.
Reported-by: Yves Dionne <yves.dionne@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi codenamed Knights Mill. We
can't guarantee that this (KNM) will be the last CPU model that needs this
hack. But, we do recognize that this is far from optimal, and there is an
effort to ensure we don't keep doing extending this hack forever.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-6-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi x200 codenamed Knights
Landing.
Presence of this feature cannot be detected automatically (by reading any
other MSR) therefore it is required to explicitly check for the family and
model of the CPU before attempting to enable it.
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-5-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose the
ring 3 MONITOR/MWAIT.
HWCAP variables contain bitmasks which can be used by userspace
applications to detect which instruction sets are supported by CPU. On x86
architecture information about CPU capabilities can be checked via CPUID
instructions, unfortunately presence of ring 3 MONITOR/MWAIT feature cannot
be checked this way. ELF_HWCAP cannot be used as well, because on x86 it is
set to CPUID[1].EDX which means that all bits are reserved there.
HWCAP2 approach was chosen because it reuses existing solution present
in other architectures, so only minor modifications are required to the
kernel and userspace applications. When ELF_HWCAP2 is defined
kernel maps it to AT_HWCAP2 during the start of the application.
This way the ring 3 MONITOR/MWAIT feature can be detected using getauxval()
API in a simple and fast manner. ELF_HWCAP2 type is u32 to be consistent
with x86 ELF_HWCAP type.
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-3-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Erik reported that on a preproduction hardware a CMCI storm triggers the
BUG_ON in add_timer_on(). The reason is that the per CPU MCE timer is
started by the CMCI logic before the MCE CPU hotplug callback starts the
timer with add_timer_on(). So the timer is already queued which triggers
the BUG.
Using add_timer_on() is pretty pointless in this code because the timer is
strictlty per CPU, initialized as pinned and all operations which arm the
timer happen on the CPU to which the timer belongs.
Simplify the whole machinery by using mod_timer() instead of add_timer_on()
which avoids the problem because mod_timer() can handle already queued
timers. Use __start_timer() everywhere so the earliest armed expiry time is
preserved.
Reported-by: Erik Veijola <erik.veijola@intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701310936080.3457@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>