No description
Find a file
Rick Edgecombe 543bcdd06c x86/fpu: Invalidate FPU state correctly on exec()
commit 1f69383b20 upstream.

The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:

  1: fpu_fpregs_owner_ctx points to the current tasks FPU state
  2: the last CPU the registers were live in was the current CPU.

This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.

Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
 * If the FPU register state is valid, the kernel can skip restoring the
 * FPU state from memory.
 *
 * Any code that clobbers the FPU registers or updates the in-memory
 * FPU state for a task MUST let the rest of the kernel know that the
 * FPU registers are no longer valid for this task.
 *
 * Either one of these invalidation functions is enough. Invalidate
 * a resource you control: CPU if using the CPU for something else
 * (with preemption disabled), FPU for the current task, or a task that
 * is prevented from running by the current task.
 */

However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:

1. A task is executing in userspace on CPU0
	- CPU0’s FPU context points to tasks
	- fpu->last_cpu=CPU0

2. The task exec()’s

3. While in the kernel the task is preempted
	- CPU0 gets a thread executing in the kernel (such that no other
		FPU context is activated)
	- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out

4. Task is migrated to CPU1

5. Continuing the exec(), the task gets to
   fpu_flush_thread()->fpu_reset_fpregs()
	- Sets CPU1’s fpu context to NULL
	- Copies the init state to the task’s FPU buffer
	- Sets TIF_NEED_FPU_LOAD on the task

6. The task reschedules back to CPU0 before completing the exec() and
   returning to userspace
	- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
	- Skips saving the registers and updating task’s fpu→last_cpu,
	  because TIF_NEED_FPU_LOAD is the canonical source.

7. Now CPU0’s FPU context is still pointing to the task’s, and
   fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
   though the reset FPU state has not been restored.

So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().

Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.

Fixes: 33344368cb ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-05 01:25:09 +08:00
arch x86/fpu: Invalidate FPU state correctly on exec() 2023-09-05 01:25:09 +08:00
block block/partition: fix signedness issue for Amiga partitions 2023-08-20 15:24:51 +08:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2023-04-19 17:50:34 +08:00
crypto KEYS: asymmetric: Copy sig and digest in public_key_verify_signature() 2023-08-20 15:21:12 +08:00
Documentation x86/cpu: Rename srso_(.*)_alias to srso_alias_\1 2023-08-28 23:27:07 +08:00
drivers drm/vmwgfx: Fix shader stage validation 2023-09-05 01:25:09 +08:00
fs nfsd: Fix race to FREE_STATEID and cl_revoked 2023-09-05 01:25:08 +08:00
include drm/display/dp: Fix the DP DSC Receiver cap size 2023-09-05 01:25:09 +08:00
init x86/mm: Initialize text poking earlier 2023-08-20 16:01:29 +08:00
io_uring io_uring: correct check for O_TMPFILE 2023-08-20 16:01:40 +08:00
ipc ipc/sem: Fix dangling sem_array access in semtimedop race 2023-04-19 17:56:54 +08:00
kernel tracing: Fix memleak due to race between current_tracer and trace 2023-09-05 01:25:05 +08:00
lib radix tree: remove unused variable 2023-09-05 01:25:09 +08:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm mm: add a call to flush_cache_vmap() in vmap_pfn() 2023-09-05 01:25:07 +08:00
net batman-adv: Hold rtnl lock during MTU update via netlink 2023-09-05 01:25:08 +08:00
samples samples: ftrace: Save required argument registers in sample trampolines 2023-08-20 16:01:05 +08:00
scripts kbuild: Disable GCOV for *.mod.o 2023-08-20 15:24:38 +08:00
security selinux: set next pointer before attaching to list 2023-09-05 01:25:08 +08:00
sound ALSA: ymfpci: Fix the missing snd_card_free() call at probe error 2023-09-05 01:25:07 +08:00
tools objtool/x86: Fix SRSO mess 2023-09-05 01:25:02 +08:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2023-04-19 17:44:58 +08:00
virt KVM: Grab a reference to KVM for VM and vCPU stats file descriptors 2023-08-20 16:01:23 +08:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: add Andrej Shadura 2021-10-18 20:22:03 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS iio: stx104: Move to addac subdirectory 2023-08-28 23:26:58 +08:00
Makefile Linux 5.15.128 2023-08-28 23:27:08 +08:00
README
stfsync.txt add file to track last rebase commit from stf 2023-08-21 00:12:49 +08:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.