Commit graph

40103 commits

Author SHA1 Message Date
Feng Tang
91b56fc98b x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
commit 2c66ca3949 upstream.

0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit b81fac906a ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:

   arch_cpu_finalize_init
       identify_boot_cpu
	   identify_cpu
	       generic_identify
                   get_cpu_cap --> setup cpu capability
       ...
       fpu__init_cpu
           fpu__init_cpu_xstate
               cr4_set_bits(X86_CR4_OSXSAVE);

As the FPU is not yet initialized the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.

Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE
enabling.

[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]

Fixes: b81fac906a ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com
Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-05 01:25:09 +08:00
Rick Edgecombe
543bcdd06c x86/fpu: Invalidate FPU state correctly on exec()
commit 1f69383b20 upstream.

The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:

  1: fpu_fpregs_owner_ctx points to the current tasks FPU state
  2: the last CPU the registers were live in was the current CPU.

This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.

Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
 * If the FPU register state is valid, the kernel can skip restoring the
 * FPU state from memory.
 *
 * Any code that clobbers the FPU registers or updates the in-memory
 * FPU state for a task MUST let the rest of the kernel know that the
 * FPU registers are no longer valid for this task.
 *
 * Either one of these invalidation functions is enough. Invalidate
 * a resource you control: CPU if using the CPU for something else
 * (with preemption disabled), FPU for the current task, or a task that
 * is prevented from running by the current task.
 */

However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:

1. A task is executing in userspace on CPU0
	- CPU0’s FPU context points to tasks
	- fpu->last_cpu=CPU0

2. The task exec()’s

3. While in the kernel the task is preempted
	- CPU0 gets a thread executing in the kernel (such that no other
		FPU context is activated)
	- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out

4. Task is migrated to CPU1

5. Continuing the exec(), the task gets to
   fpu_flush_thread()->fpu_reset_fpregs()
	- Sets CPU1’s fpu context to NULL
	- Copies the init state to the task’s FPU buffer
	- Sets TIF_NEED_FPU_LOAD on the task

6. The task reschedules back to CPU0 before completing the exec() and
   returning to userspace
	- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
	- Skips saving the registers and updating task’s fpu→last_cpu,
	  because TIF_NEED_FPU_LOAD is the canonical source.

7. Now CPU0’s FPU context is still pointing to the task’s, and
   fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
   though the reset FPU state has not been restored.

So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().

Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.

Fixes: 33344368cb ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-05 01:25:09 +08:00
Sean Christopherson
ec9a583520 Revert "KVM: x86: enable TDP MMU by default"
This reverts commit 71ba3f3189.

Disable the TDP MMU by default in v5.15 kernels to "fix" several severe
performance bugs that have since been found and fixed in the TDP MMU, but
are unsuitable for backporting to v5.15.

The problematic bugs are fixed by upstream commit edbdb43fc9 ("KVM:
x86: Preserve TDP MMU roots until they are explicitly invalidated") and
commit 01b31714bd ("KVM: x86: Do not unload MMU roots when only toggling
CR0.WP with TDP enabled").  Both commits fix scenarios where KVM will
rebuild all TDP MMU page tables in paths that are frequently hit by
certain guest workloads.  While not exactly common, the guest workloads
are far from rare.  The fallout of rebuilding TDP MMU page tables can be
so severe in some cases that it induces soft lockups in the guest.

Commit edbdb43fc9 would require _significant_ effort and churn to
backport due it depending on a major rework that was done in v5.18.

Commit 01b31714bd has far fewer direct conflicts, but has several subtle
_known_ dependencies, and it's unclear whether or not there are more
unknown dependencies that have been missed.

Lastly, disabling the TDP MMU in v5.15 kernels also fixes a lurking train
wreck started by upstream commit a955cad84c ("KVM: x86/mmu: Retry page
fault if root is invalidated by memslot update").  That commit was tagged
for stable to fix a memory leak, but didn't cherry-pick cleanly and was
never backported to v5.15.  Which is extremely fortunate, as it introduced
not one but two bugs, one of which was fixed by upstream commit
18c841e1f4 ("KVM: x86: Retry page fault if MMU reload is pending and
root has no sp"), while the other was unknowingly fixed by upstream
commit ba6e3fe255 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
kvm_faultin_pfn()") in v6.3 (a one-off fix will be made for v6.1 kernels,
which did receive a backport for a955cad84c).  Disabling the TDP MMU
by default reduces the probability of breaking v5.15 kernels by
backporting only a subset of the fixes.

As far as what is lost by disabling the TDP MMU, the main selling point of
the TDP MMU is its ability to service page fault VM-Exits in parallel,
i.e. the main benefactors of the TDP MMU are deployments of large VMs
(hundreds of vCPUs), and in particular delployments that live-migrate such
VMs and thus need to fault-in huge amounts of memory on many vCPUs after
restarting the VM after migration.

Smaller VMs can see performance improvements, but nowhere enough to make
up for the TDP MMU (in v5.15) absolutely cratering performance for some
workloads.  And practically speaking, anyone that is deploying and
migrating VMs with hundreds of vCPUs is likely rolling their own kernel,
not using a stock v5.15 series kernel.

Link: https://lore.kernel.org/all/ZDmEGM+CgYpvDLh6@google.com
Link: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
Acked-by: Mathias Krause <minipli@grsecurity.net>
Acked-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-05 01:25:07 +08:00
Borislav Petkov (AMD)
3d1cca5778 x86/srso: Correct the mitigation status when SMT is disabled
commit 6405b72e8d upstream.

Specify how is SRSO mitigated when SMT is disabled. Also, correct the
SMT check for that.

Fixes: e9fbc47b81 ("x86/srso: Disable the mitigation on unaffected configurations")
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20230814200813.p5czl47zssuej7nv@treble
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:08 +08:00
Petr Pavlu
dbeb65be5b x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
commit 79cd2a1122 upstream.

The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk
sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows:

  .text {
    [...]
    TEXT_TEXT
    [...]
    __indirect_thunk_start = .;
    *(.text.__x86.*)
    __indirect_thunk_end = .;
    [...]
  }

Macro TEXT_TEXT references TEXT_MAIN which normally expands to only
".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes
".text .text.[0-9a-zA-Z_]*" which wrongly matches also the thunk
sections. The output layout is then different than expected. For
instance, the currently defined range [__indirect_thunk_start,
__indirect_thunk_end] becomes empty.

Prevent the problem by using ".." as the first separator, for example,
".text..__x86.indirect_thunk". This pattern is utilized by other
explicit section names which start with one of the standard prefixes,
such as ".text" or ".data", and that need to be individually selected in
the linker script.

  [ nathan: Fix conflicts with SRSO and fold in fix issue brought up by
    Andrew Cooper in post-review:
    https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ]

Fixes: dc5723b02e ("kbuild: add support for Clang LTO")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:08 +08:00
Borislav Petkov (AMD)
651b8c4c3f x86/srso: Disable the mitigation on unaffected configurations
commit e9fbc47b81 upstream.

Skip the srso cmd line parsing which is not needed on Zen1/2 with SMT
disabled and with the proper microcode applied (latter should be the
case anyway) as those are not affected.

Fixes: 5a15d83488 ("x86/srso: Tie SBPB bit setting to microcode patch detection")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230813104517.3346-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Borislav Petkov (AMD)
ad4e22de33 x86/CPU/AMD: Fix the DIV(0) initial fix attempt
commit f58d6fbcb7 upstream.

Initially, it was thought that doing an innocuous division in the #DE
handler would take care to prevent any leaking of old data from the
divider but by the time the fault is raised, the speculation has already
advanced too far and such data could already have been used by younger
operations.

Therefore, do the innocuous division on every exit to userspace so that
userspace doesn't see any potentially old data from integer divisions in
kernel space.

Do the same before VMRUN too, to protect host data from leaking into the
guest too.

Fixes: 77245f1c3c ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Sean Christopherson
cf9c6c326d x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
commit ba5ca5e5e6 upstream.

Use LEA instead of ADD when adjusting %rsp in srso_safe_ret{,_alias}()
so as to avoid clobbering flags.  Drop one of the INT3 instructions to
account for the LEA consuming one more byte than the ADD.

KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of each call is a small blob of code that performs fast
emulation by executing the target instruction with fixed operands.

E.g. to emulate ADC, fastop() invokes adcb_al_dl():

  adcb_al_dl:
    <+0>:  adc    %dl,%al
    <+2>:  jmp    <__x86_return_thunk>

A major motivation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of the call.  fastop() collects
the RFLAGS result by pushing RFLAGS onto the stack and popping them back
into a variable (held in %rdi in this case):

  asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"

  <+71>: mov    0xc0(%r8),%rdx
  <+78>: mov    0x100(%r8),%rcx
  <+85>: push   %rdi
  <+86>: popf
  <+87>: call   *%rsi
  <+89>: nop
  <+90>: nop
  <+91>: nop
  <+92>: pushf
  <+93>: pop    %rdi

and then propagating the arithmetic flags into the vCPU's emulator state:

  ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);

  <+64>:  and    $0xfffffffffffff72a,%r9
  <+94>:  and    $0x8d5,%edi
  <+109>: or     %rdi,%r9
  <+122>: mov    %r9,0x10(%r8)

The failures can be most easily reproduced by running the "emulator"
test in KVM-Unit-Tests.

If you're feeling a bit of deja vu, see commit b63f20a778
("x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386").

In addition, this breaks booting of clang-compiled guest on
a gcc-compiled host where the host contains the %rsp-modifying SRSO
mitigations.

  [ bp: Massage commit message, extend, remove addresses. ]

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Closes: https://lore.kernel.org/all/de474347-122d-54cd-eabf-9dcc95ab9eae@amd.com
Reported-by: Srikanth Aithal <sraithal@amd.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20230810013334.GA5354@dev-arch.thelio-3990X/
Link: https://lore.kernel.org/r/20230811155255.250835-1-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Peter Zijlstra
a9828750d3 x86/static_call: Fix __static_call_fixup()
commit 5409730962 upstream.

Christian reported spurious module load crashes after some of Song's
module memory layout patches.

Turns out that if the very last instruction on the very last page of the
module is a 'JMP __x86_return_thunk' then __static_call_fixup() will
trip a fault and die.

And while the module rework made this slightly more likely to happen,
it's always been possible.

Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Reported-by: Christian Bricart <christian@bricart.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Borislav Petkov (AMD)
e7a43bd51f x86/srso: Explain the untraining sequences a bit more
commit 9dbd23e42f upstream.

The goal is to eventually have a proper documentation about all this.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814164447.GFZNpZ/64H4lENIe94@fat_crate.local
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Peter Zijlstra
c967a120e8 x86/cpu: Cleanup the untrain mess
commit e7c25c441e upstream.

Since there can only be one active return_thunk, there only needs be
one (matching) untrain_ret. It fundamentally doesn't make sense to
allow multiple untrain_ret at the same time.

Fold all the 3 different untrain methods into a single (temporary)
helper stub.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Peter Zijlstra
14ff5432ad x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
commit 42be649dd1 upstream.

For a more consistent namespace.

  [ bp: Fixup names in the doc too. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.976236447@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Peter Zijlstra
41a53e20d3 x86/cpu: Rename original retbleed methods
commit d025b7bac0 upstream.

Rename the original retbleed return thunk and untrain_ret to
retbleed_return_thunk() and retbleed_untrain_ret().

No functional changes.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.909378169@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Peter Zijlstra
69669472b6 x86/cpu: Clean up SRSO return thunk mess
commit d43490d0ab upstream.

Use the existing configurable return thunk. There is absolute no
justification for having created this __x86_return_thunk alternative.

To clarify, the whole thing looks like:

Zen3/4 does:

  srso_alias_untrain_ret:
	  nop2
	  lfence
	  jmp srso_alias_return_thunk
	  int3

  srso_alias_safe_ret: // aliasses srso_alias_untrain_ret just so
	  add $8, %rsp
	  ret
	  int3

  srso_alias_return_thunk:
	  call srso_alias_safe_ret
	  ud2

While Zen1/2 does:

  srso_untrain_ret:
	  movabs $foo, %rax
	  lfence
	  call srso_safe_ret           (jmp srso_return_thunk ?)
	  int3

  srso_safe_ret: // embedded in movabs instruction
	  add $8,%rsp
          ret
          int3

  srso_return_thunk:
	  call srso_safe_ret
	  ud2

While retbleed does:

  zen_untrain_ret:
	  test $0xcc, %bl
	  lfence
	  jmp zen_return_thunk
          int3

  zen_return_thunk: // embedded in the test instruction
	  ret
          int3

Where Zen1/2 flush the BTB entry using the instruction decoder trick
(test,movabs) Zen3/4 use BTB aliasing. SRSO adds a return sequence
(srso_safe_ret()) which forces the function return instruction to
speculate into a trap (UD2).  This RET will then mispredict and
execution will continue at the return site read from the top of the
stack.

Pick one of three options at boot (evey function can only ever return
once).

  [ bp: Fixup commit message uarch details and add them in a comment in
    the code too. Add a comment about the srso_select_mitigation()
    dependency on retbleed_select_mitigation(). Add moar ifdeffery for
    32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
    32-bit alternatives needing the symbol. ]

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:07 +08:00
Peter Zijlstra
71ae8bd18d x86/alternative: Make custom return thunk unconditional
commit 095b8303f3 upstream.

There is infrastructure to rewrite return thunks to point to any
random thunk one desires, unwrap that from CALL_THUNKS, which up to
now was the sole user of that.

  [ bp: Make the thunks visible on 32-bit and add ifdeffery for the
    32-bit builds. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.775293785@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:06 +08:00
Peter Zijlstra
9a5b06e24d x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
commit af023ef335 upstream.

  vmlinux.o: warning: objtool: srso_untrain_ret() falls through to next function __x86_return_skl()
  vmlinux.o: warning: objtool: __x86_return_thunk() falls through to next function __x86_return_skl()

This is because these functions (can) end with CALL, which objtool
does not consider a terminating instruction. Therefore, replace the
INT3 instruction (which is a non-fatal trap) with UD2 (which is a
fatal-trap).

This indicates execution will not continue past this point.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.637802730@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:06 +08:00
Peter Zijlstra
2d6a4857cb x86/cpu: Fix __x86_return_thunk symbol type
commit 77f6711900 upstream.

Commit

  fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")

reimplemented __x86_return_thunk with a mix of SYM_FUNC_START and
SYM_CODE_END, this is not a sane combination.

Since nothing should ever actually 'CALL' this, make it consistently
CODE.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.571027074@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-28 23:27:06 +08:00
Arnd Bergmann
f274f9dc79 x86: Move gds_ucode_mitigated() declaration to header
commit eb3515dc99 upstream.

The declaration got placed in the .c file of the caller, but that
causes a warning for the definition:

arch/x86/kernel/cpu/bugs.c:682:6: error: no previous prototype for 'gds_ucode_mitigated' [-Werror=missing-prototypes]

Move it to a header where both sides can observe it instead.

Fixes: 81ac7e5d74 ("KVM: Add GDS_NO support to KVM")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/all/20230809130530.1913368-2-arnd%40kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:41 +08:00
Kirill A. Shutemov
b32a9d845f x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
commit 1b8b1aa90c upstream.

Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR
VMAs are placed above the 47-bit border:

8000001a9000-8000001ad000 r--p 00000000 00:00 0                          [vvar]
8000001ad000-8000001af000 r-xp 00000000 00:00 0                          [vdso]

This might confuse users who are not aware of 5-level paging and expect
all userspace addresses to be under the 47-bit border.

So far problem has only been triggered with ASLR disabled, although it
may also occur with ASLR enabled if the layout is randomized in a just
right way.

The problem happens due to custom placement for the VMAs in the VDSO
code: vdso_addr() tries to place them above the stack and checks the
result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to
the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW
instead.

Fixes: b569bab78d ("x86/mm: Prepare to expose larger address space to userspace")
Reported-by: Yingcong Wu <yingcong.wu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:40 +08:00
Cristian Ciocaltea
2a90acbfe3 x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
commit 6dbef74aeb upstream.

Commit

  522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")

provided a fix for the Zen2 VZEROUPPER data corruption bug affecting
a range of CPU models, but the AMD Custom APU 0405 found on SteamDeck
was not listed, although it is clearly affected by the vulnerability.

Add this CPU variant to the Zenbleed erratum list, in order to
unconditionally enable the fallback fix until a proper microcode update
is available.

Fixes: 522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230811203705.1699914-1-cristian.ciocaltea@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:40 +08:00
Nick Desaulniers
93939f38ee x86/srso: Fix build breakage with the LLVM linker
commit cbe8ded48b upstream.

The assertion added to verify the difference in bits set of the
addresses of srso_untrain_ret_alias() and srso_safe_ret_alias() would fail
to link in LLVM's ld.lld linker with the following error:

  ld.lld: error: ./arch/x86/kernel/vmlinux.lds:210: at least one side of
  the expression must be absolute
  ld.lld: error: ./arch/x86/kernel/vmlinux.lds:211: at least one side of
  the expression must be absolute

Use ABSOLUTE to evaluate the expression referring to at least one of the
symbols so that LLD can evaluate the linker script.

Also, add linker version info to the comment about XOR being unsupported
in either ld.bfd or ld.lld until somewhat recently.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Closes: https://lore.kernel.org/llvm/CA+G9fYsdUeNu-gwbs0+T6XHi4hYYk=Y9725-wFhZ7gJMspLDRA@mail.gmail.com/
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Daniel Kolesa <daniel@octaforge.org>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Suggested-by: Sven Volkinsfeld <thyrc@gmx.net>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://github.com/ClangBuiltLinux/linux/issues/1907
Link: https://lore.kernel.org/r/20230809-gds-v1-1-eaac90b0cbcc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:40 +08:00
Borislav Petkov (AMD)
0b38a076be x86/CPU/AMD: Do not leak quotient data after a division by 0
commit 77245f1c3c upstream.

Under certain circumstances, an integer division by 0 which faults, can
leave stale quotient data from a previous division operation on Zen1
microarchitectures.

Do a dummy division 0/1 before returning from the #DE exception handler
in order to avoid any leaks of potentially sensitive data.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:35 +08:00
Greg Kroah-Hartman
f043370869 x86: fix backwards merge of GDS/SRSO bit
Stable-tree-only change.

Due to the way the GDS and SRSO patches flowed into the stable tree, it
was a 50% chance that the merge of the which value GDS and SRSO should
be.  Of course, I lost that bet, and chose the opposite of what Linus
chose in commit 64094e7e31 ("Merge tag 'gds-for-linus-2023-08-01' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Fix this up by switching the values to match what is now in Linus's tree
as that is the correct value to mirror.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:30 +08:00
Borislav Petkov (AMD)
e9907fccfd x86/srso: Tie SBPB bit setting to microcode patch detection
commit 5a15d83488 upstream.

The SBPB bit in MSR_IA32_PRED_CMD is supported only after a microcode
patch has been applied so set X86_FEATURE_SBPB only then. Otherwise,
guests would attempt to set that bit and #GP on the MSR write.

While at it, make SMT detection more robust as some guests - depending
on how and what CPUID leafs their report - lead to cpu_smt_control
getting set to CPU_SMT_NOT_SUPPORTED but SRSO_NO should be set for any
guest incarnation where one simply cannot do SMT, for whatever reason.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:30 +08:00
Josh Poimboeuf
384b64f54c x86/srso: Fix return thunks in generated code
Upstream commit: 238ec850b9

Set X86_FEATURE_RETHUNK when enabling the SRSO mitigation so that
generated code (e.g., ftrace, static call, eBPF) generates "jmp
__x86_return_thunk" instead of RET.

  [ bp: Add a comment. ]

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:30 +08:00
Borislav Petkov (AMD)
6d94a0add2 x86/srso: Add IBPB on VMEXIT
Upstream commit: d893832d0e

Add the option to flush IBPB only on VMEXIT in order to protect from
malicious guests but one otherwise trusts the software that runs on the
hypervisor.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:30 +08:00
Borislav Petkov (AMD)
fe6b276bf9 x86/srso: Add IBPB
Upstream commit: 233d6f68b9

Add the option to mitigate using IBPB on a kernel entry. Pull in the
Retbleed alternative so that the IBPB call from there can be used. Also,
if Retbleed mitigation is done using IBPB, the same mitigation can and
must be used here.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:30 +08:00
Borislav Petkov (AMD)
82a625c063 x86/srso: Add SRSO_NO support
Upstream commit: 1b5277c0ea

Add support for the CPUID flag which denotes that the CPU is not
affected by SRSO.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:30 +08:00
Borislav Petkov (AMD)
1cf6b33abf x86/srso: Add IBPB_BRTYPE support
Upstream commit: 79113e4060

Add support for the synthetic CPUID flag which "if this bit is 1,
it indicates that MSR 49h (PRED_CMD) bit 0 (IBPB) flushes all branch
type predictions from the CPU branch predictor."

This flag is there so that this capability in guests can be detected
easily (otherwise one would have to track microcode revisions which is
impossible for guests).

It is also needed only for Zen3 and -4. The other two (Zen1 and -2)
always flush branch type predictions by default.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Borislav Petkov (AMD)
49a4b4f2ed x86/srso: Add a Speculative RAS Overflow mitigation
Upstream commit: fb3bd914b3

Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Kim Phillips
8adacb6566 x86/cpu, kvm: Add support for CPUID_80000021_EAX
commit 8415a74852 upstream.

Add support for CPUID leaf 80000021, EAX. The majority of the features will be
used in the kernel and thus a separate leaf is appropriate.

Include KVM's reverse_cpuid entry because features are used by VM guests, too.

  [ bp: Massage commit message. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-2-kim.phillips@amd.com
[bwh: Backported to 6.1: adjust context]
Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Borislav Petkov (AMD)
0146905ee0 x86/bugs: Increase the x86 bugs vector size to two u32s
Upstream commit: 0e52740ffd

There was never a doubt in my mind that they would not fit into a single
u32 eventually.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Peter Zijlstra
6be3828a8a x86/mm: Use mm_alloc() in poking_init()
commit 3f4c8211d9 upstream.

Instead of duplicating init_mm, allocate a fresh mm. The advantage is
that mm_alloc() has much simpler dependencies. Additionally it makes
more conceptual sense, init_mm has no (and must not have) user state
to duplicate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Juergen Gross
70174a6ec5 x86/mm: fix poking_init() for Xen PV guests
commit 26ce6ec364 upstream.

Commit 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()") broke
the kernel for running as Xen PV guest.

It seems as if the new address space is never activated before being
used, resulting in Xen rejecting to accept the new CR3 value (the PGD
isn't pinned).

Fix that by adding the now missing call of paravirt_arch_dup_mmap() to
poking_init(). That call was previously done by dup_mm()->dup_mmap() and
it is a NOP for all cases but for Xen PV, where it is just doing the
pinning of the PGD.

Fixes: 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230109150922.10578-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Juergen Gross
5482f887dc x86/xen: Fix secondary processors' FPU initialization
commit fe3e0a13e5 upstream.

Moving the call of fpu__init_cpu() from cpu_init() to start_secondary()
broke Xen PV guests, as those don't call start_secondary() for APs.

Call fpu__init_cpu() in Xen's cpu_bringup(), which is the Xen PV
replacement of start_secondary().

Fixes: b81fac906a ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230703130032.22916-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Daniel Sneddon
21b9ae4410 KVM: Add GDS_NO support to KVM
commit 81ac7e5d74 upstream

Gather Data Sampling (GDS) is a transient execution attack using
gather instructions from the AVX2 and AVX512 extensions. This attack
allows malicious code to infer data that was previously stored in
vector registers. Systems that are not vulnerable to GDS will set the
GDS_NO bit of the IA32_ARCH_CAPABILITIES MSR. This is useful for VM
guests that may think they are on vulnerable systems that are, in
fact, not affected. Guests that are running on affected hosts where
the mitigation is enabled are protected as if they were running
on an unaffected system.

On all hosts that are not affected or that are mitigated, set the
GDS_NO bit.

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Daniel Sneddon
26d98cae50 x86/speculation: Add Kconfig option for GDS
commit 53cf5797f1 upstream

Gather Data Sampling (GDS) is mitigated in microcode. However, on
systems that haven't received the updated microcode, disabling AVX
can act as a mitigation. Add a Kconfig option that uses the microcode
mitigation if available and disables AVX otherwise. Setting this
option has no effect on systems not affected by GDS. This is the
equivalent of setting gather_data_sampling=force.

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:29 +08:00
Daniel Sneddon
2f5b423f22 x86/speculation: Add force option to GDS mitigation
commit 553a5c03e9 upstream

The Gather Data Sampling (GDS) vulnerability allows malicious software
to infer stale data previously stored in vector registers. This may
include sensitive data such as cryptographic keys. GDS is mitigated in
microcode, and systems with up-to-date microcode are protected by
default. However, any affected system that is running with older
microcode will still be vulnerable to GDS attacks.

Since the gather instructions used by the attacker are part of the
AVX2 and AVX512 extensions, disabling these extensions prevents gather
instructions from being executed, thereby mitigating the system from
GDS. Disabling AVX2 is sufficient, but we don't have the granularity
to do this. The XCR0[2] disables AVX, with no option to just disable
AVX2.

Add a kernel parameter gather_data_sampling=force that will enable the
microcode mitigation if available, otherwise it will disable AVX on
affected systems.

This option will be ignored if cmdline mitigations=off.

This is a *big* hammer.  It is known to break buggy userspace that
uses incomplete, buggy AVX enumeration.  Unfortunately, such userspace
does exist in the wild:

	https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html

[ dhansen: add some more ominous warnings about disabling AVX ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:28 +08:00
Daniel Sneddon
c9332c5382 x86/speculation: Add Gather Data Sampling mitigation
commit 8974eb5882 upstream

Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.

Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.

This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.

Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.

The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:

    /sys/devices/system/cpu/vulnerabilities/gather_data_sampling

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:28 +08:00
Thomas Gleixner
a56f08b6c7 x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
commit b81fac906a upstream

Initializing the FPU during the early boot process is a pointless
exercise. Early boot is convoluted and fragile enough.

Nothing requires that the FPU is set up early. It has to be initialized
before fork_init() because the task_struct size depends on the FPU register
buffer size.

Move the initialization to arch_cpu_finalize_init() which is the perfect
place to do so.

No functional change.

This allows to remove quite some of the custom early command line parsing,
but that's subject to the next installment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.902376621@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:28 +08:00
Thomas Gleixner
8f28b4132e x86/fpu: Mark init functions __init
commit 1703db2b90 upstream

No point in keeping them around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.841685728@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:28 +08:00
Thomas Gleixner
bb3e4482ed x86/fpu: Remove cpuinfo argument from init functions
commit 1f34bb2a24 upstream

Nothing in the call chain requires it

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.783704297@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:28 +08:00
Thomas Gleixner
b8a1b9af79 x86/init: Initialize signal frame size late
commit 54d9a91a3d upstream

No point in doing this during really early boot. Move it to an early
initcall so that it is set up before possible user mode helpers are started
during device initialization.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.727330699@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:28 +08:00
Thomas Gleixner
bc4865f001 init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
commit 439e17576e upstream

Invoke the X86ism mem_encrypt_init() from X86 arch_cpu_finalize_init() and
remove the weak fallback from the core code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.670360645@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:28 +08:00
Thomas Gleixner
26a98a3229 x86/cpu: Switch to arch_cpu_finalize_init()
commit 7c7077a726 upstream

check_bugs() is a dumping ground for finalizing the CPU bringup. Only parts of
it has to do with actual CPU bugs.

Split it apart into arch_cpu_finalize_init() and cpu_select_mitigations().

Fixup the bogus 32bit comments while at it.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230613224545.019583869@linutronix.de
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:27 +08:00
Sean Christopherson
2666a64675 KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
[ Upstream commit 26a0652cb4 ]

Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode.  The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).

Opportunistically fix a benign typo in the prototype for is_valid_cr4().

Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-20 16:01:25 +08:00
Sean Christopherson
4feeac71fc KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest
commit c4abd73520 upstream.

Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only
if KVM itself is not configured to utilize unrestricted guests, i.e. don't
stuff CR0/CR4 for a restricted L2 that is running as the guest of an
unrestricted L1.  Any attempt to VM-Enter a restricted guest with invalid
CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should
never observe a restricted L2 with incompatible CR0/CR4, since nested
VM-Enter from L1 should have failed.

And if KVM does observe an active, restricted L2 with incompatible state,
e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail
does more harm than good, as KVM will often neglect to undo the side
effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus
the damage can easily spill over to L1.  On the other hand, letting
VM-Enter fail due to bad guest state is more likely to contain the damage
to L2 as KVM relies on hardware to perform most guest state consistency
checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into
L1 irrespective of (un)restricted guest behavior.

Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Fixes: bddd82d19e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:23 +08:00
Borislav Petkov (AMD)
a88d7d175e x86/cpu/amd: Add a Zenbleed fix
Upstream commit: 522b1d6921

Add a fix for the Zen2 VZEROUPPER data corruption bug where under
certain circumstances executing VZEROUPPER can cause register
corruption or leak data.

The optimal fix is through microcode but in the case the proper
microcode revision has not been applied, enable a fallback fix using
a chicken bit.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:07 +08:00
Borislav Petkov (AMD)
c655c9348f x86/cpu/amd: Move the errata checking functionality up
Upstream commit: 8b6f687743

Avoid new and remove old forward declarations.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-20 16:01:07 +08:00
Juergen Gross
fc8f928b97 x86/mm: Fix __swp_entry_to_pte() for Xen PV guests
[ Upstream commit 0f88130e8a ]

Normally __swp_entry_to_pte() is never called with a value translating
to a valid PTE. The only known exception is pte_swap_tests(), resulting
in a WARN splat in Xen PV guests, as __pte_to_swp_entry() did
translate the PFN of the valid PTE to a guest local PFN, while
__swp_entry_to_pte() doesn't do the opposite translation.

Fix that by using __pte() in __swp_entry_to_pte() instead of open
coding the native variant of it.

For correctness do the similar conversion for __swp_entry_to_pmd().

Fixes: 05289402d7 ("mm/debug_vm_pgtable: add tests validating arch helpers for core MM features")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230306123259.12461-1-jgross@suse.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-20 15:24:17 +08:00