linux-bl808/kernel
Gregory Haskins 07903af152 sched: Fix race in cpupri introduced by cpumask_var changes
Background:

Several race conditions in the scheduler have cropped up
recently, which Steven and I have tracked down using ftrace.
The most recent one turns out to be a race in how the scheduler
determines a suitable migration target for RT tasks, introduced
recently with commit:

    commit 68e74568fb
    Date:   Tue Nov 25 02:35:13 2008 +1030

        sched: convert struct cpupri_vec cpumask_var_t.

The original design of cpupri allowed lockless readers to
quickly determine a best-estimate target.  Races between the
pri_active bitmap and the vec->mask were handled in the
original code because we would detect and return "0" when this
occurred.  The design was predicated on the *effective*
atomicity (*) of caching the result of cpus_and() between the
cpus_allowed and the vec->mask.

Commit 68e74568 changed the behavior such that vec->mask is
accessed multiple times.  This introduces a subtle race in
which the lookup can return "1" while leaving the caller with
an empty bitmap.

*) yes, we know cpus_and() is not a locked operator across the
   entire composite array, but it is implicitly atomic on a
   per-word basis which is all the design required to work.

Implementation:

Rather than forgoing the lockless design, or reverting to a
stack-based cpumask_t, we simply check for when the race has
been encountered and continue processing in the event that the
race is hit.  This renders the removal race as if the priority
bit had been atomically cleared as well, and allows the
algorithm to execute correctly.
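
The check described above can be sketched in user space as follows.
This is a simplified model, not the kernel code: the names pri_model,
find_lowest, between_reads and simulate_removal are hypothetical, each
priority level is reduced to a single 64-bit CPU mask, and the hook
exists only to make the removal race reproducible.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PRI 4   /* hypothetical number of priority levels */

/* Simplified stand-in for cpupri: one 64-bit CPU mask per level. */
struct pri_model {
    _Atomic uint64_t mask[NR_PRI];   /* CPUs currently at each level */
    /* Hypothetical hook, called between the two reads of mask[idx]. */
    void (*between_reads)(struct pri_model *m, int idx);
};

/* Example hook: a concurrent writer empties level 0 mid-lookup. */
static void simulate_removal(struct pri_model *m, int idx)
{
    if (idx == 0)
        atomic_store(&m->mask[0], 0);
}

/*
 * Scan levels below task_pri, lowest first.  Returns 1 and stores the
 * candidate CPUs in *lowest (if non-NULL), or returns 0 if none exist.
 */
static int find_lowest(struct pri_model *m, int task_pri,
                       uint64_t allowed, uint64_t *lowest)
{
    for (int idx = 0; idx < NR_PRI && idx < task_pri; idx++) {
        /* First read: does this level hold any allowed CPU? */
        if ((atomic_load(&m->mask[idx]) & allowed) == 0)
            continue;

        if (m->between_reads)
            m->between_reads(m, idx);

        if (lowest) {
            /* Second read: the mask may have been emptied since. */
            *lowest = atomic_load(&m->mask[idx]) & allowed;

            /* Race hit: act as though this level was never active. */
            if (*lowest == 0)
                continue;
        }
        return 1;
    }
    return 0;
}
```

With level 0 emptied between the two reads, the lookup skips it and
falls through to the next level instead of returning "1" with an
empty result.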

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02 14:23:29 +02:00
gcov gcov: enable GCOV_PROFILE_ALL for x86_64 2009-06-18 13:03:58 -07:00
irq genirq: Fix UP compile failure caused by irq_thread_check_affinity 2009-07-22 23:18:46 +02:00
power headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
time clocksource: Prevent NULL pointer dereference 2009-07-19 17:15:54 +02:00
trace tracing/stat: Fix seqfile memory leak 2009-07-23 09:53:55 -04:00
.gitignore
acct.c bsdacct: fix access to invalid filp in acct_on() 2009-06-30 18:56:00 -07:00
async.c
audit.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit.h Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit_tree.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit_watch.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
auditfilter.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
auditsc.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup avoid permanent sleep at rmdir 2009-07-29 19:10:35 -07:00
cgroup_debug.c
cgroup_freezer.c
compat.c
configs.c
cpu.c mm/init: cpu_hotplug_init() must be initialized before SLAB 2009-06-22 21:18:12 -07:00
cpuset.c cpuset,mm: update tasks' mems_allowed in time 2009-06-16 19:47:31 -07:00
cred-internals.h
cred.c
delayacct.c
dma-coherent.c
dma.c
exec_domain.c
exit.c headers: mnt_namespace.h redux 2009-07-08 09:31:56 -07:00
extable.c
fork.c mm: copy over oom_adj value at fork time 2009-07-29 19:10:34 -07:00
freezer.c sched: fix nr_uninterruptible accounting of frozen tasks really 2009-07-18 14:19:53 +02:00
futex.c futexes: Fix infinite loop in get_futex_key() on huge page 2009-07-11 12:40:44 +02:00
futex_compat.c
groups.c groups: move code to kernel/groups.c 2009-06-16 19:47:48 -07:00
hrtimer.c hrtimer: Fix migration expiry check 2009-07-10 17:32:55 +02:00
hung_task.c
itimer.c
kallsyms.c
Kconfig.freezer
Kconfig.hz
Kconfig.preempt
kexec.c kexec: fix omitting offset in extended crashkernel syntax 2009-07-29 19:10:34 -07:00
kfifo.c kernel/kfifo.c: replace conditional test with is_power_of_2() 2009-06-16 19:47:47 -07:00
kgdb.c
kmod.c headers: mnt_namespace.h redux 2009-07-08 09:31:56 -07:00
kprobes.c kprobes: Use kernel_text_address() for checking probe address 2009-07-30 16:44:06 -07:00
ksysfs.c
kthread.c update the comment in kthread_stop() 2009-07-27 12:15:46 -07:00
latencytop.c
lockdep.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
Makefile Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-06-28 11:05:04 -07:00
marker.c
module.c module: use MODULE_SYMBOL_PREFIX with module_layout 2009-07-27 12:15:45 -07:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c nsproxy: extract create_nsproxy() 2009-06-18 13:03:56 -07:00
panic.c
params.c
perf_counter.c Merge branch 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf 2009-07-22 11:41:56 -07:00
pid.c kmemleak: Remove alloc_bootmem annotations introduced in the past 2009-07-09 17:07:02 +01:00
pid_namespace.c pidns: rewrite copy_pid_ns() 2009-06-18 13:03:55 -07:00
pm_qos_params.c
posix-cpu-timers.c
posix-timers.c
printk.c printk: Add KERN_DEFAULT printk log-level 2009-06-16 11:02:28 -07:00
profile.c profile: suppress warning about large allocations when profile=1 is specified 2009-07-29 19:10:36 -07:00
ptrace.c cred_guard_mutex: do not return -EINTR to user-space 2009-07-06 13:57:04 -07:00
rcuclassic.c
rcupdate.c
rcupreempt.c
rcupreempt_trace.c
rcutorture.c
rcutree.c rcu: Mark Hierarchical RCU no longer experimental 2009-06-24 15:02:48 +02:00
rcutree.h
rcutree_trace.c
relay.c
res_counter.c memcg: add interface to reset limits 2009-06-18 13:03:48 -07:00
resource.c kernel/resource.c: fix sign extension in reserve_setup() 2009-06-30 18:56:00 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c sched: fix load average accounting vs. cpu hotplug 2009-07-18 14:19:52 +02:00
sched_clock.c
sched_cpupri.c sched: Fix race in cpupri introduced by cpumask_var changes 2009-08-02 14:23:29 +02:00
sched_cpupri.h
sched_debug.c sched: Hide runqueues from direct refer at source code level 2009-06-17 18:29:42 +02:00
sched_fair.c sched: Fix latencytop and sleep profiling vs group scheduling 2009-08-02 14:10:12 +02:00
sched_features.h
sched_idletask.c
sched_rt.c sched_rt: Fix overload bug on rt group scheduling 2009-07-10 10:43:29 +02:00
sched_stats.h
seccomp.c
semaphore.c
signal.c ptrace: do_notify_parent_cldstop: fix the wrong ->nsproxy usage 2009-06-18 13:03:52 -07:00
slow-work.c slow-work: use round_jiffies() for thread pool's cull and OOM timers 2009-06-16 19:47:49 -07:00
smp.c
softirq.c softirq: introduce tasklet_hrtimer infrastructure 2009-07-22 17:01:17 +02:00
softlockup.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys.c groups: move code to kernel/groups.c 2009-06-16 19:47:48 -07:00
sys_ni.c
sysctl.c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-06-28 11:05:28 -07:00
sysctl_check.c
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c timer: Avoid reading uninitialized data 2009-07-18 23:11:43 +02:00
tracepoint.c
tsacct.c
uid16.c
up.c
user.c sched: delayed cleanup of user_struct 2009-06-15 21:30:23 -07:00
user_namespace.c
utsname.c utsns: extract creeate_uts_ns() 2009-06-18 13:03:55 -07:00
utsname_sysctl.c
wait.c
workqueue.c