No description
Find a file
Rongwei Wang 324f169fa5 mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
commit 6fe7d6b992 upstream.

The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption.  The only place we have found where this
happens is in the swapoff path.  This case can be described as below:

core 0                       core 1
swapoff

del_from_avail_list(si)      waiting

try lock si->lock            acquire swap_avail_lock
                             and re-add si into
                             swap_avail_head

acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.

It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g.  stress-ng-swap).

However, in the worst case, panic can be caused by the above scene.  In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap.  This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)

------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 plist_check_prev_next_node+0x50/0x70
 plist_check_head+0x80/0xf0
 plist_add+0x28/0x140
 add_to_avail_list+0x9c/0xf0
 _enable_swap_info+0x78/0xb4
 __do_sys_swapon+0x918/0xa10
 __arm64_sys_swapon+0x20/0x30
 el0_svc_common+0x8c/0x220
 do_el0_svc+0x2c/0x90
 el0_svc+0x1c/0x30
 el0_sync_handler+0xa8/0xb0
 el0_sync+0x148/0x180
irq event stamp: 2082270

Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.

This problem exists in versions after stable 5.10.y.

Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bf ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-19 18:01:23 +08:00
arch KVM: s390: pv: fix external interruption loop not always detected 2023-04-19 18:01:18 +08:00
block block: don't allow multiple bios for IOCB_NOWAIT issue 2023-04-19 18:00:13 +08:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2023-04-19 17:50:34 +08:00
crypto crypto: rsa-pkcs1pad - Use akcipher_request_complete 2023-04-19 17:59:48 +08:00
Documentation dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs 2023-04-19 18:01:22 +08:00
drivers drm/nouveau/disp: Support more modes by checking with lower bpc 2023-04-19 18:01:23 +08:00
fs fs: drop peer group ids under namespace lock 2023-04-19 18:01:23 +08:00
include ftrace: Mark get_lock_parent_ip() __always_inline 2023-04-19 18:01:23 +08:00
init kbuild: Add CONFIG_PAHOLE_VERSION 2023-04-19 17:59:33 +08:00
io_uring io_uring: avoid null-ptr-deref in io_arm_poll_handler 2023-04-19 18:00:57 +08:00
ipc ipc/sem: Fix dangling sem_array access in semtimedop race 2023-04-19 17:56:54 +08:00
kernel ring-buffer: Fix race while reader and writer are on the same page 2023-04-19 18:01:23 +08:00
lib kobject: Fix slab-out-of-bounds in fill_kobj_path() 2023-04-19 18:00:00 +08:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-19 18:01:23 +08:00
net can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events 2023-04-19 18:01:23 +08:00
samples samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe() 2023-04-19 17:57:46 +08:00
scripts kconfig: Update config changed flag before calling callback 2023-04-19 18:00:53 +08:00
security keys: Do not cache key in task struct if key is requested from kernel thread 2023-04-19 18:01:01 +08:00
sound ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots() 2023-04-19 18:01:23 +08:00
tools libbpf: Fix btf_dump's packed struct determination 2023-04-19 18:01:17 +08:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2023-04-19 17:44:58 +08:00
virt KVM: fix memoryleak in kvm_init() 2023-04-19 18:00:47 +08:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: add Andrej Shadura 2021-10-18 20:22:03 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS kbuild: Add CONFIG_PAHOLE_VERSION 2023-04-19 17:59:33 +08:00
Makefile kbuild: refactor single builds of *.ko 2023-04-19 18:01:20 +08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.