Star64_linux/kernel
Chen Lin 335de919ae ring-buffer: Do not swap cpu_buffer during resize process
[ Upstream commit 8a96c0288d ]

If ring_buffer_swap_cpu() is called while a resize is in progress,
the cpu buffer is swapped in the middle of the resize, leaving it in an
inconsistent state. Continuing to run in that state results in an oops.

This issue can be easily reproduced using the following two scripts:
/tmp # cat test1.sh
#!/bin/sh
for i in `seq 0 100000`
do
         echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
         echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
done
/tmp # cat test2.sh
#!/bin/sh
for i in `seq 0 100000`
do
        echo irqsoff > /sys/kernel/debug/tracing/current_tracer
        sleep 1
        echo nop > /sys/kernel/debug/tracing/current_tracer
        sleep 1
done
/tmp # ./test1.sh &
/tmp # ./test2.sh &

A typical oops log follows; other oops variants are sometimes seen as well.

[  231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
[  231.713375] Modules linked in:
[  231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 
[  231.716750] Hardware name: linux,dummy-virt (DT)
[  231.718152] Workqueue: events update_pages_handler
[  231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  231.721171] pc : rb_update_pages+0x378/0x3f8
[  231.722212] lr : rb_update_pages+0x25c/0x3f8
[  231.723248] sp : ffff800082b9bd50
[  231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
[  231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
[  231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
[  231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
[  231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
[  231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
[  231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
[  231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
[  231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
[  231.744196] Call trace:
[  231.744892]  rb_update_pages+0x378/0x3f8
[  231.745893]  update_pages_handler+0x1c/0x38
[  231.746893]  process_one_work+0x1f0/0x468
[  231.747852]  worker_thread+0x54/0x410
[  231.748737]  kthread+0x124/0x138
[  231.749549]  ret_from_fork+0x10/0x20
[  231.750434] ---[ end trace 0000000000000000 ]---
[  233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  233.721696] Mem abort info:
[  233.721935]   ESR = 0x0000000096000004
[  233.722283]   EC = 0x25: DABT (current EL), IL = 32 bits
[  233.722596]   SET = 0, FnV = 0
[  233.722805]   EA = 0, S1PTW = 0
[  233.723026]   FSC = 0x04: level 0 translation fault
[  233.723458] Data abort info:
[  233.723734]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  233.724176]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  233.724589]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
[  233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  233.726231] Internal error: Oops: 0000000096000004 [] PREEMPT SMP
[  233.726720] Modules linked in:
[  233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 
[  233.727777] Hardware name: linux,dummy-virt (DT)
[  233.728225] Workqueue: events update_pages_handler
[  233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  233.729054] pc : rb_update_pages+0x1a8/0x3f8
[  233.729334] lr : rb_update_pages+0x154/0x3f8
[  233.729592] sp : ffff800082b9bd50
[  233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
[  233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
[  233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
[  233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
[  233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
[  233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
[  233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
[  233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
[  233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
[  233.734418] Call trace:
[  233.734593]  rb_update_pages+0x1a8/0x3f8
[  233.734853]  update_pages_handler+0x1c/0x38
[  233.735148]  process_one_work+0x1f0/0x468
[  233.735525]  worker_thread+0x54/0x410
[  233.735852]  kthread+0x124/0x138
[  233.736064]  ret_from_fork+0x10/0x20
[  233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
[  233.736959] ---[ end trace 0000000000000000 ]---

After analysis, the sequence of events leading to the error is as follows [1-5]:

int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
			int cpu_id)
{
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		//1. get cpu_buffer, aka cpu_buffer(A)
		...
		...
		schedule_work_on(cpu,
		 &cpu_buffer->update_pages_work);
		//2. 'update_pages_work' is queued on 'cpu'; cpu_buffer(A) is passed to
		// update_pages_handler, which performs the update and then calls
		// complete(&cpu_buffer->update_done) to wake up the resize process.
	//---->
		//3. Just at this moment, ring_buffer_swap_cpu is triggered and
		//cpu_buffer(A) is swapped with cpu_buffer(B), the max_buffer.
		//ring_buffer_swap_cpu is reached via the 'Call trace' below.

		Call trace:
		 dump_backtrace+0x0/0x2f8
		 show_stack+0x18/0x28
		 dump_stack+0x12c/0x188
		 ring_buffer_swap_cpu+0x2f8/0x328
		 update_max_tr_single+0x180/0x210
		 check_critical_timing+0x2b4/0x2c8
		 tracer_hardirqs_on+0x1c0/0x200
		 trace_hardirqs_on+0xec/0x378
		 el0_svc_common+0x64/0x260
		 do_el0_svc+0x90/0xf8
		 el0_svc+0x20/0x30
		 el0_sync_handler+0xb0/0xb8
		 el0_sync+0x180/0x1c0
	//<----

	/* wait for all the updates to complete */
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		//4. get cpu_buffer again; cpu_buffer(B) is now used in the following
		//steps, so the states of cpu_buffer(A) and cpu_buffer(B) are both wrong.
		//For example, cpu_buffer(A)->update_done is left set to 1, so the next
		//resize round will not block in 'wait_for_completion'.
		if (!cpu_buffer->nr_pages_to_update)
			continue;

		if (cpu_online(cpu))
			wait_for_completion(&cpu_buffer->update_done);
		cpu_buffer->nr_pages_to_update = 0;
	}
	...
}
	//5. cpu_buffer(A) and cpu_buffer(B) are now both in an inconsistent state;
	//continuing to run in this state eventually triggers the oops.
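
The sequence above suggests one way to avoid the problem: have the swap path refuse to run while a resize is in flight. The following is a minimal user-space sketch of that gating idea, not the kernel patch itself. All names (demo_buffer, demo_cpu_buffer, the 'resizing' counter, demo_swap_cpu) are hypothetical stand-ins chosen for illustration, and the check is not atomic with the swap, so it only shows the shape of the guard.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a per-CPU ring buffer; not the kernel struct. */
struct demo_cpu_buffer {
	int nr_pages;
};

/* Hypothetical stand-in for a trace buffer, reduced to a single CPU. */
struct demo_buffer {
	atomic_int resizing;             /* > 0 while a resize is in progress */
	struct demo_cpu_buffer *cpu_buf;
};

/* Mark the start of a resize (step 1 in the sequence above). */
void demo_resize_begin(struct demo_buffer *b)
{
	atomic_fetch_add(&b->resizing, 1);
}

/* Mark the end of a resize, after all updates have completed. */
void demo_resize_end(struct demo_buffer *b)
{
	int prev = atomic_fetch_sub(&b->resizing, 1);

	assert(prev > 0); /* begin/end calls must be balanced */
}

/*
 * Swap the per-CPU buffers of a and b, unless either one is being resized.
 * Returns 0 on success or -EBUSY if a resize is in progress (step 3 is
 * then skipped instead of corrupting the state).
 */
int demo_swap_cpu(struct demo_buffer *a, struct demo_buffer *b)
{
	struct demo_cpu_buffer *tmp;

	if (atomic_load(&a->resizing) || atomic_load(&b->resizing))
		return -EBUSY;

	tmp = a->cpu_buf;
	a->cpu_buf = b->cpu_buf;
	b->cpu_buf = tmp;
	return 0;
}
```

In this sketch a caller would wrap the whole resize in demo_resize_begin()/demo_resize_end(); a swap attempted in between returns -EBUSY and leaves both buffers untouched, so the second loop in the resize path still sees cpu_buffer(A).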

Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn

Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-28 23:26:57 +08:00
bpf bpf: aggressively forget precise markings during state checkpointing 2023-08-20 16:01:39 +08:00
cgroup cgroup: Do not corrupt task iteration when rebinding subsystem 2023-08-20 15:23:59 +08:00
configs drivers/char: remove /dev/kmem for good 2021-05-07 00:26:34 -07:00
debug lockdown: also lock down previous kgdb use 2023-04-19 17:49:02 +08:00
dma dma-remap: use kvmalloc_array/kvfree for larger dma memory remap 2023-08-28 23:26:55 +08:00
entry entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up 2023-04-19 18:01:03 +08:00
events perf: Fix function pointer case 2023-08-20 16:01:30 +08:00
futex futex: Resend potentially swallowed owner death notification 2023-04-19 17:57:16 +08:00
gcov gcov: add support for checksum field 2023-04-19 17:58:06 +08:00
irq irqdomain: Fix mapping-creation race 2023-04-19 18:00:44 +08:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-08-20 15:24:34 +08:00
livepatch livepatch: fix race between fork and KLP transition 2023-04-19 17:54:26 +08:00
locking locking/rtmutex: Fix task->pi_waiters integrity 2023-08-20 16:01:25 +08:00
power workqueue: Introduce show_one_worker_pool and show_one_workqueue. 2023-06-06 18:37:26 +08:00
printk kernel/printk/index.c: fix memory leak with using debugfs_lookup() 2023-04-19 18:00:32 +08:00
rcu rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale 2023-08-20 15:24:17 +08:00
sched sched: Fix DEBUG && !SCHEDSTATS warn 2023-06-06 18:37:33 +08:00
time timers/nohz: Last resort update jiffies on nohz_full IRQ entry 2023-08-20 16:01:45 +08:00
trace ring-buffer: Do not swap cpu_buffer during resize process 2023-08-28 23:26:57 +08:00
.gitignore .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-04-19 17:57:58 +08:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2023-04-19 17:45:26 +08:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2023-04-19 17:45:01 +08:00
audit.h audit: log AUDIT_TIME_* records only from rules 2023-04-19 17:46:41 +08:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2023-04-19 17:53:17 +08:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-08-24 18:52:36 -04:00
audit_watch.c
auditfilter.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
auditsc.c audit: log AUDIT_TIME_* records only from rules 2023-04-19 17:46:41 +08:00
backtracetest.c
bounds.c
capability.c
cfi.c cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle 2023-04-19 17:50:36 +08:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-19 18:01:10 +08:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Do not bail-out in DYING/STARTING sections 2023-04-19 17:57:16 +08:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2023-04-19 17:43:23 +08:00
crash_dump.c
cred.c ucounts: Base set_cred_ucounts changes on the real user 2023-04-19 17:45:35 +08:00
delayacct.c delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-04-19 17:59:01 +08:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-04-19 18:00:35 +08:00
fork.c mm: Move mm_cachep initialization to mm_init() 2023-08-20 16:01:29 +08:00
freezer.c sched: Add get_current_state() 2021-06-18 11:43:08 +02:00
gen_kheaders.sh kbuild: clean up ${quiet} checks in shell scripts 2021-05-27 04:01:50 +09:00
groups.c
hung_task.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
iomem.c
irq_work.c irq_work: Make irq_work_queue() NMI-safe again 2021-06-10 10:00:08 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt sched/core: Disable CONFIG_SCHED_CORE by default 2021-06-28 22:43:05 +02:00
kcov.c
kexec.c panic, kexec: make __crash_kexec() NMI safe 2023-06-06 18:05:32 +08:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-08-20 15:24:20 +08:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-08-20 15:23:41 +08:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-06-06 18:05:32 +08:00
kheaders.c kheaders: Use array declaration instead of char 2023-06-06 18:34:51 +08:00
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-04-19 18:00:18 +08:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-06-06 18:05:31 +08:00
kthread.c kthread: add the helper function kthread_run_on_cpu() 2023-04-19 18:00:58 +08:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2023-04-19 17:57:16 +08:00
module-internal.h
module.c module: Don't wait for GOING modules 2023-04-19 17:59:01 +08:00
module_signature.c
module_signing.c
notifier.c notifier: Remove atomic_notifier_call_chain_robust() 2021-08-16 18:55:32 +02:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Fix list iterator in padata_do_serial() 2023-04-19 17:57:39 +08:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-04-19 17:59:01 +08:00
params.c params: lift param_set_uint_minmax to common code 2021-08-16 14:42:22 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2021-08-10 12:53:07 +02:00
pid_namespace.c rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() 2023-04-19 17:59:43 +08:00
profile.c profiling: fix shift too large makes kernel panic 2023-04-19 17:52:42 +08:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2023-04-19 17:49:21 +08:00
range.c
reboot.c reboot: Add hardware protection power-off 2021-06-21 13:08:36 +01:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-06-06 18:34:53 +08:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-04-19 18:00:22 +08:00
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2023-04-19 17:46:44 +08:00
scftorture.c scftorture: Fix distribution of short handler delays 2023-04-19 17:49:34 +08:00
scs.c scs: Release kasan vmalloc poison in scs_free process 2023-04-19 16:57:20 +08:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2023-04-19 17:45:21 +08:00
signal.c signal handling: don't use BUG_ON() for debugging 2023-04-19 17:51:27 +08:00
smp.c locking/csd_lock: Change csdlock_debug from early_param to __setup 2023-04-19 17:52:53 +08:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c timers/nohz: Last resort update jiffies on nohz_full IRQ entry 2023-08-20 16:01:45 +08:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2023-04-19 17:45:26 +08:00
stacktrace.c stacktrace: move filter_irq_stacks() to kernel/stacktrace.c 2023-04-19 17:47:53 +08:00
static_call.c static_call: Don't make __static_call_return0 static 2023-04-19 17:47:53 +08:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2023-04-19 17:47:53 +08:00
stop_machine.c stop_machine: Add caller debug info to queue_stop_cpus_work 2021-03-23 16:01:58 +01:00
sys.c kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() 2023-06-06 18:06:44 +08:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2023-04-19 17:53:17 +08:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c kernel/panic: move panic sysctls to its own file 2023-04-19 17:58:59 +08:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Replace deprecated CPU-hotplug functions. 2021-08-10 10:48:07 -07:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2023-04-19 17:44:42 +08:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2023-04-19 17:45:35 +08:00
uid16.c
uid16.h
umh.c kernel/umh.c: fix some spelling mistakes 2021-05-07 00:26:34 -07:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2023-04-19 17:45:56 +08:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname.c
utsname_sysctl.c
watch_queue.c watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths 2023-04-19 18:00:44 +08:00
watchdog.c watchdog: export lockup_detector_reconfigure 2023-04-19 17:53:14 +08:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-08-20 15:24:20 +08:00
workqueue.c workqueue: clean up WORK_* constant types, clarify masking 2023-08-20 15:24:52 +08:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00