Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - introduce and use task_rcu_dereference()/try_get_task_struct() to fix
   and generalize task_struct handling (Oleg Nesterov)

 - do various per entity load tracking (PELT) fixes and optimizations
   (Peter Zijlstra)

 - cputime virt-steal time accounting enhancements/fixes (Wanpeng Li)

 - introduce consolidated cputime output file cpuacct.usage_all and
   related refactorings (Zhao Lei)

 - ... plus misc fixes and enhancements

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is set
  sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
  sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
  sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
  sched/fair: Rework throttle_count sync
  sched/core: Fix sched_getaffinity() return value kerneldoc comment
  sched/fair: Reorder cgroup creation code
  sched/fair: Apply more PELT fixes
  sched/fair: Fix PELT integrity for new tasks
  sched/cgroup: Fix cpu_cgroup_fork() handling
  sched/fair: Fix PELT integrity for new groups
  sched/fair: Fix and optimize the fork() path
  sched/cputime: Add steal time support to full dynticks CPU time accounting
  sched/cputime: Fix prev steal time accouting during CPU hotplug
  KVM: Fix steal clock warp during guest CPU hotplug
  sched/debug: Always show 'nr_migrations'
  sched/fair: Use task_rcu_dereference()
  sched/api: Introduce task_rcu_dereference() and try_get_task_struct()
  sched/idle: Optimize the generic idle loop
  sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()
This commit is contained in:
Linus Torvalds 2016-07-25 13:59:34 -07:00
commit cca08cd66c
10 changed files with 418 additions and 190 deletions

View file

@ -210,6 +210,82 @@ repeat:
goto repeat;
}
/*
* Note that if this function returns a valid task_struct pointer (!NULL)
* task->usage must remain >0 for the duration of the RCU critical section.
*/
struct task_struct *task_rcu_dereference(struct task_struct **ptask)
{
struct sighand_struct *sighand;
struct task_struct *task;
/*
* We need to verify that release_task() was not called and thus
* delayed_put_task_struct() can't run and drop the last reference
* before rcu_read_unlock(). We check task->sighand != NULL,
* but we can read the already freed and reused memory.
*/
retry:
task = rcu_dereference(*ptask);
if (!task)
return NULL;
probe_kernel_address(&task->sighand, sighand);
/*
* Pairs with atomic_dec_and_test() in put_task_struct(). If this task
* was already freed we can not miss the preceding update of this
* pointer.
*/
smp_rmb();
if (unlikely(task != READ_ONCE(*ptask)))
goto retry;
/*
* We've re-checked that "task == *ptask", now we have two different
* cases:
*
* 1. This is actually the same task/task_struct. In this case
* sighand != NULL tells us it is still alive.
*
* 2. This is another task which got the same memory for task_struct.
* We can't know this of course, and we can not trust
* sighand != NULL.
*
* In this case we actually return a random value, but this is
* correct.
*
* If we return NULL - we can pretend that we actually noticed that
* *ptask was updated when the previous task has exited. Or pretend
* that probe_slab_address(&sighand) reads NULL.
*
* If we return the new task (because sighand is not NULL for any
* reason) - this is fine too. This (new) task can't go away before
* another gp pass.
*
* And note: We could even eliminate the false positive if re-read
* task->sighand once again to avoid the falsely NULL. But this case
* is very unlikely so we don't care.
*/
if (!sighand)
return NULL;
return task;
}
struct task_struct *try_get_task_struct(struct task_struct **ptask)
{
struct task_struct *task;
rcu_read_lock();
task = task_rcu_dereference(ptask);
if (task)
get_task_struct(task);
rcu_read_unlock();
return task;
}
/*
* Determine if a process group is "orphaned", according to the POSIX
* definition in 2.2.2.52. Orphaned process groups are not to be affected