Pull x86 RAS updates from Borislav Petkov:
"This time around we have a subsystem reorganization to offer, with the
new directory being
arch/x86/kernel/cpu/mce/
and all compilation units' names streamlined under it"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Restore MCE injector's module name
x86/mce: Unify pr_* prefix
x86/mce: Streamline MCE subsystem's naming
Pull x86 microcode loading updates from Borislav Petkov:
"This update contains work started by Maciej to make the microcode
container verification more robust against all kinds of corruption and
also unify verification paths between early and late loading.
The result is a set of verification routines which validate the
microcode blobs before loading it on the CPU. In addition, the code is
a lot more streamlined and unified.
In the process, some of the aspects of patch handling and loading were
simplified.
All provided by Maciej S. Szmigiero and Borislav Petkov"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Update copyright
x86/microcode/AMD: Check the equivalence table size when scanning it
x86/microcode/AMD: Convert CPU equivalence table variable into a struct
x86/microcode/AMD: Check microcode container data in the late loader
x86/microcode/AMD: Fix container size's type
x86/microcode/AMD: Convert early parser to the new verification routines
x86/microcode/AMD: Change verify_patch()'s return value
x86/microcode/AMD: Move chipset-specific check into verify_patch()
x86/microcode/AMD: Move patch family check to verify_patch()
x86/microcode/AMD: Simplify patch family detection
x86/microcode/AMD: Concentrate patch verification
x86/microcode/AMD: Cleanup verify_patch_size() more
x86/microcode/AMD: Clean up per-family patch size checks
x86/microcode/AMD: Move verify_patch_size() up in the file
x86/microcode/AMD: Add microcode container verification
x86/microcode/AMD: Subtract SECTION_HDR_SIZE from file leftover length
Pull x86 cache control updates from Borislav Petkov:
- The generalization of the RDT code to accommodate the addition of
AMD's very similar implementation of the cache monitoring feature.
This entails a subsystem move into a separate and generic
arch/x86/kernel/cpu/resctrl/ directory along with adding
vendor-specific initialization and feature detection helpers.
Ontop of that is the unification of user-visible strings, both in the
resctrl filesystem error handling and Kconfig.
Provided by Babu Moger and Sherry Hurwitz.
- Code simplifications and error handling improvements by Reinette
Chatre.
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix rdt_find_domain() return value and checks
x86/resctrl: Remove unnecessary check for cbm_validate()
x86/resctrl: Use rdt_last_cmd_puts() where possible
MAINTAINERS: Update resctrl filename patterns
Documentation: Rename and update intel_rdt_ui.txt to resctrl_ui.txt
x86/resctrl: Introduce AMD QOS feature
x86/resctrl: Fixup the user-visible strings
x86/resctrl: Add AMD's X86_FEATURE_MBA to the scattered CPUID features
x86/resctrl: Rename the config option INTEL_RDT to RESCTRL
x86/resctrl: Add vendor check for the MBA software controller
x86/resctrl: Bring cbm_validate() into the resource structure
x86/resctrl: Initialize the vendor-specific resource functions
x86/resctrl: Move all the macros to resctrl/internal.h
x86/resctrl: Re-arrange the RDT init code
x86/resctrl: Rename the RDT functions and definitions
x86/resctrl: Rename and move rdt files to a separate directory
single-stepping fixes, improved tracing, various timer and vGIC
fixes
* x86: Processor Tracing virtualization, STIBP support, some correctness fixes,
refactorings and splitting of vmx.c, use the Hyper-V range TLB flush hypercall,
reduce order of vcpu struct, WBNOINVD support, do not use -ftrace for __noclone
functions, nested guest support for PAUSE filtering on AMD, more Hyper-V
enlightenments (direct mode for synthetic timers)
* PPC: nested VFIO
* s390: bugfixes only this time
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcH0vFAAoJEL/70l94x66Dw/wH/2FZp1YOM5OgiJzgqnXyDbyf
dNEfWo472MtNiLsuf+ZAfJojVIu9cv7wtBfXNzW+75XZDfh/J88geHWNSiZDm3Fe
aM4MOnGG0yF3hQrRQyEHe4IFhGFNERax8Ccv+OL44md9CjYrIrsGkRD08qwb+gNh
P8T/3wJEKwUcVHA/1VHEIM8MlirxNENc78p6JKd/C7zb0emjGavdIpWFUMr3SNfs
CemabhJUuwOYtwjRInyx1y34FzYwW3Ejuc9a9UoZ+COahUfkuxHE8u+EQS7vLVF6
2VGVu5SA0PqgmLlGhHthxLqVgQYo+dB22cRnsLtXlUChtVAq8q9uu5sKzvqEzuE=
=b4Jx
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- selftests improvements
- large PUD support for HugeTLB
- single-stepping fixes
- improved tracing
- various timer and vGIC fixes
x86:
- Processor Tracing virtualization
- STIBP support
- some correctness fixes
- refactorings and splitting of vmx.c
- use the Hyper-V range TLB flush hypercall
- reduce order of vcpu struct
- WBNOINVD support
- do not use -ftrace for __noclone functions
- nested guest support for PAUSE filtering on AMD
- more Hyper-V enlightenments (direct mode for synthetic timers)
PPC:
- nested VFIO
s390:
- bugfixes only this time"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: x86: Add CPUID support for new instruction WBNOINVD
kvm: selftests: ucall: fix exit mmio address guessing
Revert "compiler-gcc: disable -ftracer for __noclone functions"
KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
KVM: Make kvm_set_spte_hva() return int
KVM: Replace old tlb flush function with new one to flush a specified range.
KVM/MMU: Add tlb flush with range helper function
KVM/VMX: Add hv tlb range flush support
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM: x86: Disable Intel PT when VMXON in L1 guest
KVM: x86: Set intercept for Intel PT MSRs read/write
KVM: x86: Implement Intel PT MSRs read/write emulation
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXBvKlAAKCRCAXGG7T9hj
vmIoAP0XpLCE+0Z1hhxcDcJ0hKah1NIniRSIGGr6Af+gxe8F4wEA0Vm55gtEZerU
9mL5S7e2EcuTo93XCIjsxU8uPLGtegQ=
=59wi
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.21-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Xen features and fixes:
- a series to enable KVM guests to be booted by qemu via the Xen PVH
boot entry for speeding up KVM guest tests
- a series for a common driver to be used by Xen PV frontends (right
now drm and sound)
- two other fixes in Xen related code"
* tag 'for-linus-4.21-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
ALSA: xen-front: Use Xen common shared buffer implementation
drm/xen-front: Use Xen common shared buffer implementation
xen: Introduce shared buffer helpers for page directory...
xen/pciback: Check dev_data before using it
kprobes/x86/xen: blacklist non-attachable xen interrupt functions
KVM: x86: Allow Qemu/KVM to use PVH entry point
xen/pvh: Add memory map pointer to hvm_start_info struct
xen/pvh: Move Xen code for getting mem map via hcall out of common file
xen/pvh: Move Xen specific PVH VM initialization out of common file
xen/pvh: Create a new file for Xen specific PVH code
xen/pvh: Move PVH entry code out of Xen specific tree
xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
Pull x86 pti updates from Thomas Gleixner:
"No point in speculating what's in this parcel:
- Drop the swap storage limit when L1TF is disabled so the full space
is available
- Add support for the new AMD STIBP always on mitigation mode
- Fix a bunch of STIPB typos"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add support for STIBP always-on preferred mode
x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off
x86/speculation: Change misspelled STIPB to STIBP
Pull x86 fixes from Ingo Molnar:
"The biggest part is a series of reverts for the macro based GCC
inlining workarounds. It caused regressions in distro build and other
kernel tooling environments, and the GCC project was very receptive to
fixing the underlying inliner weaknesses - so as time ran out we
decided to do a reasonably straightforward revert of the patches. The
plan is to rely on the 'asm inline' GCC 9 feature, which might be
backported to GCC 8 and could thus become reasonably widely available
on modern distros.
Other than those reverts, there's misc fixes from all around the
place.
I wish our final x86 pull request for v4.20 was smaller..."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs"
Revert "x86/objtool: Use asm macros to work around GCC inlining bugs"
Revert "x86/refcount: Work around GCC inlining bug"
Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs"
Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs"
Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"
Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs"
Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs"
Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"
x86/mtrr: Don't copy uninitialized gentry fields back to userspace
x86/fsgsbase/64: Fix the base write helper functions
x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
x86/vdso: Pass --eh-frame-hdr to the linker
x86/mm: Fix decoy address handling vs 32-bit builds
x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
x86/dump_pagetables: Fix LDT remap address marker
x86/mm: Fix guard hole handling
Only the mount namespace code that implements mount(2) should be using the
MS_* flags. Suppress them inside the kernel unless uapi/linux/mount.h is
included.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
Since commit 79922b8009 ("ftrace: Optimize function graph to be
called directly"), dynamic trampolines should not be calling the
function graph tracer at the end. If they do, it could cause the function
graph tracer to trace functions that it filtered out.
Right now it does not cause a problem because there's a test to check if
the function graph tracer is attached to the same function as the
function tracer, which for now is true. But the function graph tracer is
undergoing changes that can make this no longer true which will cause
the function graph tracer to trace other functions.
For example:
# cd /sys/kernel/tracing/
# echo do_IRQ > set_ftrace_filter
# mkdir instances/foo
# echo ip_rcv > instances/foo/set_ftrace_filter
# echo function_graph > current_tracer
# echo function > instances/foo/current_tracer
Would cause the function graph tracer to trace both do_IRQ and ip_rcv,
if the current tests change.
As the current tests prevent this from being a problem, this code does
not need to be backported. But it does make the code cleaner.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This reverts commit 5bdcd510c2.
The macro based workarounds for GCC's inlining bugs caused regressions: distcc
and other distro build setups broke, and the fixes are not easy nor will they
solve regressions on already existing installations.
So we are reverting this patch and the 8 followup patches.
What makes this revert easier is that GCC9 will likely include the new 'asm inline'
syntax that makes inlining of assembly blocks a lot more robust.
This is a superior method to any macro based hackeries - and might even be
backported to GCC8, which would make all modern distros get the inlining
fixes as well.
Many thanks to Masahiro Yamada and others for helping sort out these problems.
Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It was mce-inject.ko but it turned into inject.ko since the containing
source file got renamed. Restore it.
Fixes: 21afaf1813 ("x86/mce: Streamline MCE subsystem's naming")
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20181218182546.GA21386@zn.tnic
Currently the copy_to_user of data in the gentry struct is copying
uninitiaized data in field _pad from the stack to userspace.
Fix this by explicitly memset'ing gentry to zero, this also will zero any
compiler added padding fields that may be in struct (currently there are
none).
Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable")
Fixes: b263b31e8a ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: security@kernel.org
Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com
Andy spotted a regression in the fs/gs base helpers after the patch series
was committed. The helper functions which write fs/gs base are not just
writing the base, they are also changing the index. That's wrong and needs
to be separated because writing the base has not to modify the index.
While the regression is not causing any harm right now because the only
caller depends on that behaviour, it's a guarantee for subtle breakage down
the road.
Make the index explicitly changed from the caller, instead of including
the code in the helpers.
Subsequently, the task write helpers do not handle for the current task
anymore. The range check for a base value is also factored out, to minimize
code redundancy from the caller.
Fixes: b1378a561f ("x86/fsgsbase/64: Introduce FS/GS base helper functions")
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20181126195524.32179-1-chang.seok.bae@intel.com
Different AMD processors may have different implementations of STIBP.
When STIBP is conditionally enabled, some implementations would benefit
from having STIBP always on instead of toggling the STIBP bit through MSR
writes. This preference is advertised through a CPUID feature bit.
When conditional STIBP support is requested at boot and the CPU advertises
STIBP always-on mode as preferred, switch to STIBP "on" support. To show
that this transition has occurred, create a new spectre_v2_user_mitigation
value and a new spectre_v2_user_strings message. The new mitigation value
is used in spectre_v2_user_select_mitigation() to print the new mitigation
message as well as to return a new string from stibp_state().
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20181213230352.6937.74943.stgit@tlendack-t1.amdoffice.net
nr_cpu_ids can be limited on the command line via nr_cpus=. This can break the
logical package management because it results in a smaller number of packages
while in kdump kernel.
Check below case:
There is a two sockets system, each socket has 8 cores, which has 16 logical
cpus while HT was turn on.
0 1 2 3 4 5 6 7 | 16 17 18 19 20 21 22 23
cores on socket 0 threads on socket 0
8 9 10 11 12 13 14 15 | 24 25 26 27 28 29 30 31
cores on socket 1 threads on socket 1
While starting the kdump kernel with command line option nr_cpus=16 panic
was triggered on one of the cpus 24-31 eg. 26, then online cpu will be
1-15, 26(cpu 0 was disabled in kdump), ncpus will be 16 and
__max_logical_packages will be 1, but actually two packages were booted on.
This issue can reproduced by set kdump option nr_cpus=<real physical core
numbers>, and then trigger panic on last socket's thread, for example:
taskset -c 26 echo c > /proc/sysrq-trigger
Use total_cpus which will not be limited by nr_cpus command line to calculate
the value of __max_logical_packages.
Signed-off-by: Hui Wang <john.wanghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <guijianfeng@huawei.com>
Cc: <wencongyang2@huawei.com>
Cc: <douliyang1@huawei.com>
Cc: <qiaonuohan@huawei.com>
Link: https://lkml.kernel.org/r/20181107023643.22174-1-john.wanghui@huawei.com
From Mimi:
In Linux 4.19, a new LSM hook named security_kernel_load_data was
upstreamed, allowing LSMs and IMA to prevent the kexec_load
syscall. Different signature verification methods exist for verifying
the kexec'ed kernel image. This pull request adds additional support
in IMA to prevent loading unsigned kernel images via the kexec_load
syscall, independently of the IMA policy rules, based on the runtime
"secure boot" flag. An initial IMA kselftest is included.
In addition, this pull request defines a new, separate keyring named
".platform" for storing the preboot/firmware keys needed for verifying
the kexec'ed kernel image's signature and includes the associated IMA
kexec usage of the ".platform" keyring.
(David Howell's and Josh Boyer's patches for reading the
preboot/firmware keys, which were previously posted for a different
use case scenario, are included here.)
Remove x86 specific arch_within_kprobe_blacklist().
Since we have already added all blacklisted symbols to the
kprobe blacklist by arch_populate_kprobe_blacklist(),
we don't need arch_within_kprobe_blacklist() on x86
anymore.
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/154503491354.26176.13903264647254766066.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Show x86-64 specific blacklisted symbols in debugfs.
Since x86-64 prohibits probing on symbols which are in
entry text, those should be shown.
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/154503488425.26176.17136784384033608516.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
While the dma-direct code is (relatively) clean and simple we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits. Instead of keeping two
implementations around this commit allows the dma-direct
implementation to call the swiotlb bounce buffering functions and
thus share the guts of the mapping implementation. This also
simplified the dma-mapping setup on a few architectures where we
don't have to differenciate which implementation to use.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
In order to pave the way for hypervisors other than Xen to use the PVH
entry point for VMs, we need to factor the PVH entry code into Xen specific
and hypervisor agnostic components. The first step in doing that, is to
create a new config option for PVH entry that can be enabled
independently from CONFIG_XEN.
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
rdt_find_domain() returns an ERR_PTR() that is generated from a provided
domain id when the value is negative.
Care needs to be taken when creating an ERR_PTR() from this value
because a subsequent check using IS_ERR() expects the error to
be within the MAX_ERRNO range. Using an invalid domain id as an
ERR_PTR() does work at this time since this is currently always -1.
Using this undocumented assumption is fragile since future users of
rdt_find_domain() may not be aware of thus assumption.
Two related issues are addressed:
- Ensure that rdt_find_domain() always returns a valid error value by
forcing the error to be -ENODEV when a negative domain id is provided.
- In a few instances the return value of rdt_find_domain() is just
checked for NULL - fix these to include a check of ERR_PTR.
Fixes: d89b737901 ("x86/intel_rdt/cqm: Add mon_data")
Fixes: 521348b011 ("x86/intel_rdt: Introduce utility to obtain CDP peer")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: fenghua.yu@intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/b88cd4ff6a75995bf8db9b0ea546908fe50f69f3.1544479852.git.reinette.chatre@intel.com
The user triggers the creation of a pseudo-locked region when writing
the requested schemata to the schemata resctrl file. The pseudo-locking
of a region is required to be done on a CPU that is associated with the
cache on which the pseudo-locked region will reside. In order to run the
locking code on a specific CPU, the needed CPU has to be selected and
ensured to remain online during the entire locking sequence.
At this time, the cpu_hotplug_lock is not taken during the pseudo-lock
region creation and it is thus possible for a CPU to be selected to run
the pseudo-locking code and then that CPU to go offline before the
thread is able to run on it.
Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on
which code has to run needs to be controlled. Since the cpu_hotplug_lock
is always taken before rdtgroup_mutex the lock order is maintained.
Fixes: e0bdfe8e36 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
dma-debug is now capable of adding new entries to its pool on-demand if
the initial preallocation was insufficient, so the IOMMU_LEAK logic no
longer needs to explicitly change the pool size. This does lose it the
ability to save a couple of megabytes of RAM by reducing the pool size
below its default, but it seems unlikely that that is a realistic
concern these days (or indeed that anyone is actively debugging AGP
drivers' DMA usage any more). Getting rid of dma_debug_resize_entries()
will make room for further streamlining in the dma-debug code itself.
Removing the call reveals quite a lot of cruft which has been useless
for nearly a decade since commit 19c1a6f576 ("x86 gart: reimplement
IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire
'iommu=leak' parameter, which controlled nothing except whether
dma_debug_resize_entries() was called or not.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The secure boot mode may not be detected on boot for some reason (eg.
buggy firmware). This patch attempts one more time to detect the
secure boot mode.
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
On x86, there are two methods of verifying a kexec'ed kernel image
signature being loaded via the kexec_file_load syscall - an architecture
specific implementaton or a IMA KEXEC_KERNEL_CHECK appraisal rule. Neither
of these methods verify the kexec'ed kernel image signature being loaded
via the kexec_load syscall.
Secure boot enabled systems require kexec images to be signed. Therefore,
this patch loads an IMA KEXEC_KERNEL_CHECK policy rule on secure boot
enabled systems not configured with CONFIG_KEXEC_VERIFY_SIG enabled.
When IMA_APPRAISE_BOOTPARAM is configured, different IMA appraise modes
(eg. fix, log) can be specified on the boot command line, allowing unsigned
or invalidly signed kernel images to be kexec'ed. This patch permits
enabling IMA_APPRAISE_BOOTPARAM or IMA_ARCH_POLICY, but not both.
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever
the system is deemed affected by L1TF vulnerability. Even though the limit
is quite high for most deployments it seems to be too restrictive for
deployments which are willing to live with the mitigation disabled.
We have a customer to deploy 8x 6,4TB PCIe/NVMe SSD swap devices which is
clearly out of the limit.
Drop the swap restriction when l1tf=off is specified. It also doesn't make
much sense to warn about too much memory for the l1tf mitigation when it is
forcefully disabled by the administrator.
[ tglx: Folded the documentation delta change ]
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: <linux-mm@kvack.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.org
... and make it static. It is called only by the kretprobe_trampoline()
from asm.
It was marked __visible so that it is visible outside of the current
compilation unit but that is not needed as it is used only in this
compilation unit.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20181205162526.GB109259@gmail.com
... with the goal of eventually enabling -Wmissing-prototypes by
default. At least on x86.
Make functions static where possible, otherwise add prototypes or make
them visible through includes.
asm/trace/ changes courtesy of Steven Rostedt <rostedt@goodmis.org>.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> # ACPI + cpufreq bits
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: linux-acpi@vger.kernel.org
Return DMA_MAPPING_ERROR instead of the magic bad_dma_addr on a dma
mapping failure and let the core dma-mapping code handle the rest.
Remove the magic EMERGENCY_PAGES that the bad_dma_addr gets redirected to.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Return DMA_MAPPING_ERROR instead of the magic bad_dma_addr on a dma
mapping failure and let the core dma-mapping code handle the rest.
Remove the magic EMERGENCY_PAGES that the bad_dma_addr gets redirected to.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the pr_fmt prefix to internal.h and include it everywhere. This
way, all pr_* printed strings will be prepended with "mce: ".
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20181205200913.GR29510@zn.tnic
STIBP stands for Single Thread Indirect Branch Predictors. The acronym,
however, can be easily mis-spelled as STIPB. It is perhaps due to the
presence of another related term - IBPB (Indirect Branch Predictor
Barrier).
Fix the mis-spelling in the code.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1544039368-9009-1-git-send-email-longman@redhat.com
Rename the containing folder to "mce" which is the most widespread name.
Drop the "mce[-_]" filename prefix of some compilation units (while
others don't have it).
This unifies the file naming in the MCE subsystem:
mce/
|-- amd.c
|-- apei.c
|-- core.c
|-- dev-mcelog.c
|-- genpool.c
|-- inject.c
|-- intel.c
|-- internal.h
|-- Makefile
|-- p5.c
|-- severity.c
|-- therm_throt.c
|-- threshold.c
`-- winchip.c
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20181205141323.14995-1-bp@alien8.de
The User Mode Instruction Prevention (UMIP) feature is part of the x86_64
instruction set architecture and not specific to Intel. Make the message
generic.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is one user of __kernel_fpu_begin() and before invoking it,
it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
right away. The 32bit version of arch_efi_call_virt_setup() and
arch_efi_call_virt_teardown() does this already.
The comment above *kernel_fpu*() claims that before invoking
__kernel_fpu_begin() preemption should be disabled and that KVM is a
good example of doing it. Well, KVM doesn't do that since commit
f775b13eed ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
so it is not an example anymore.
With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
be made static and not exported anymore.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181129150210.2k4mawt37ow6c2vq@linutronix.de
The FSEC_PER_NSEC macro has had zero users since commit
ab0e08f15d ("x86: hpet: Cleanup the clockevents init and register code").
Remove it.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181130211450.5200-1-roland@purestorage.com
After copy_optimized_instructions() copies several instructions
to the working buffer it tries to fix up the real RIP address, but it
adjusts the RIP-relative instruction with an incorrect RIP address
for the 2nd and subsequent instructions due to a bug in the logic.
This will break the kernel pretty badly (with likely outcomes such as
a kernel freeze, a crash, or worse) because probed instructions can refer
to the wrong data.
For example putting kprobes on cpumask_next() typically hits this bug.
cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y
(in this case nr_cpumask_bits is an alias of nr_cpu_ids):
<cpumask_next>:
48 89 f0 mov %rsi,%rax
8b 35 7b fb e2 00 mov 0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids>
55 push %rbp
...
If we put a kprobe on it and it gets jump-optimized, it gets
patched by the kprobes code like this:
<cpumask_next>:
e9 95 7d 07 1e jmpq 0xffffffffa000207a
7b fb jnp 0xffffffff81f8a2e2 <cpumask_next+2>
e2 00 loop 0xffffffff81f8a2e9 <cpumask_next+9>
55 push %rbp
This shows that the first two MOV instructions were copied to a
trampoline buffer at 0xffffffffa000207a.
Here is the disassembled result of the trampoline, skipping
the optprobe template instructions:
# Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea:
54 push %rsp
...
48 83 c4 08 add $0x8,%rsp
9d popfq
48 89 f0 mov %rsi,%rax
8b 35 82 7d db e2 mov -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3>
This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of
the original *nr_cpu_ids. This leads to a kernel freeze because
cpumask_next() always returns 0 and for_each_cpu() never ends.
Fix this by adding 'len' correctly to the real RIP address while
copying.
[ mingo: Improved the changelog. ]
Reported-by: Michael Rodin <michael@rodin.online>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # v4.15+
Fixes: 63fef14fc9 ("kprobes/x86: Make insn buffer always ROX and use text_poke()")
Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The comment above __raw_xsave_addr() claims that the function does not
work for compacted buffers and was introduced in:
b8b9b6ba9d ("x86/fpu: Allow setting of XSAVE state")
In this commit, the function was factored out of get_xsave_addr() and
this function claims that it works with "standard format or compacted
format of xsave area". It accesses the "xstate_comp_offsets" variable
for the actual offset and it was introduced in commit
7496d6458f ("Define kernel API to get address of each state in xsave area")
Based on the code (back then and now):
- xstate_offsets holds the standard offset.
- if compacted mode is not supported then xstate_comp_offsets gets the
xstate_offsets copied.
- if compacted mode is supported then xstate_comp_offsets will hold the
offset for the compacted buffer.
Based on that the function works for compacted buffers as long as the
CPU supports it and this what we care about.
Remove the "Note:" which is not accurate.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181128222035.2996-7-bigeasy@linutronix.de