TCP Appropriate Byte Count was added by me, but later disabled.
There is no point in maintaining it since it is a potential source
of bugs and Linux already implements other better window protection
heuristics.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typical cputime stats infrastructure relies on the timer tick and
its periodic polling on the CPU to account the amount of time
spent by the CPUs and the tasks per high level domains such as
userspace, kernelspace, guest, ...
Now we are preparing to implement full dynticks capability on
Linux for Real Time and HPC users who want full CPU isolation.
This feature requires a cputime accounting that doesn't depend
on the timer tick.
To implement it, this new cputime infrastructure plugs into
kernel/user/guest boundaries to take snapshots of cputime and
flush these to the stats when needed. This performs pretty
much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
and cputime snaphots are synchronized between write and read
side such that the latter can safely retrieve the pending tickless
cputime of a task and add it to its latest cputime snapshot to
return the correct result to the user.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRBsKnAAoJEIUkVEdQjox3lMgP/2R6DU2f8PyGIao3hne4M3Pu
L3q+mAG53b24Dy014KeW7gd8yv45fE7wp/rs8CGLte9VzbLkRCDSFQPgBuXVagRj
tV5nfAuqD0wHTnA+HhBE3l3C2RKAPGIu79rBpnIR/QIPPl8Z3Dby8YgmxEQKDf8G
j7MEBu2LthSuqEi2ZXemnO5r0oEnQAzAp4TTi/M38k0Fmt59nOGyjLnI+xHYCBMa
1pnz7j3jjR9NJExGu8iVvbo+jupuQngP8qmkLXHvYnj/TEJNwzO1hHVoSwOpjYpS
9ycl+T8IKQLbAkBywLtq3Mzde43xt/t8wYyGZ0oAV+Z7MIpz/9YIfDJwqQeqoNbD
dAdbNjKMbsxCgmrnyqSagfMQg/r3CPZ4vf40TMCaN4gNUJC4Ie+E4kPRKRh59+PB
Ukthmqujn0f40LAa+HXTUuzafd3b0s/ewH+8FuQ6LAG9b5+WnoN8JTJ5u6+ydokO
ZleeOowuRZZEg+abQ8Sm2GRm/BzN29gi/npb//I+ZDXWv/+3yccgsiPjCRzCAAaO
g1RmYryFSRUwHQbGNNypVWVuOLWvrBQ4jqbGO7BBuBByZMSHryKxR6mb+inH3qLE
xIDM9SdSJisc292OzoFKwVZki4MaXaadJXJduVvqYlZQvXXs7eAa4wo3euhtVITD
NLQO5OZXE4oIQmDFb0FV
=1Tzp
-----END PGP SIGNATURE-----
Merge tag 'full-dynticks-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core
Pull full-dynticks (user-space execution is undisturbed and
receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready,
from Frederic Weisbecker:
"This implements the cputime accounting on full dynticks CPUs.
Typical cputime stats infrastructure relies on the timer tick and
its periodic polling on the CPU to account the amount of time
spent by the CPUs and the tasks per high level domains such as
userspace, kernelspace, guest, ...
Now we are preparing to implement full dynticks capability on
Linux for Real Time and HPC users who want full CPU isolation.
This feature requires a cputime accounting that doesn't depend
on the timer tick.
To implement it, this new cputime infrastructure plugs into
kernel/user/guest boundaries to take snapshots of cputime and
flush these to the stats when needed. This performs pretty
much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
and cputime snaphots are synchronized between write and read
side such that the latter can safely retrieve the pending tickless
cputime of a task and add it to its latest cputime snapshot to
return the correct result to the user."
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In 7b270f6099 "sched: Bail out of yield_to when source and
target runqueue has one task" we changed this to store -ESRCH so
it needs to be signed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kbuild@01.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/20130205113751.GA20521@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_enqueue_reprogram contains a race which could result in
timer.base switch during unlock/lock sequence.
hrtimer_enqueue_reprogram is releasing the lock protecting the timer
base for calling raise_softirq_irqsoff() due to a lock ordering issue
versus rq->lock.
If during that time another CPU calls __hrtimer_start_range_ns() on
the same hrtimer, the timer base might switch, before the current CPU
can lock base->lock again and therefor the unlock_timer_base() call
will unlock the wrong lock.
[ tglx: Added comment and massaged changelog ]
Signed-off-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1359981217-389-1-git-send-email-izik.eidus@ravellosystems.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Conflicts:
kernel/irq_work.c
Add support for printk in full dynticks CPU.
* Don't stop tick with irq works pending. This
fix is generally useful and concerns archs that
can't raise self IPIs.
* Flush irq works before CPU offlining.
* Introduce "lazy" irq works that can wait for the
next tick to be executed, unless it's stopped.
* Implement klogd wake up using irq work. This
removes the ad-hoc printk_tick()/printk_needs_cpu()
hooks and make it working even in dynticks mode.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
There's no reason kgdb.h itself needs to include the 8250 serial port
header file. So push it down to the _very_ limited number of individual
drivers that need the values in that file, and fix up the places where
people really wanted serial_core.h and platform_device.h.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull scheduler fixes from Ingo Molnar:
"Three small fixlets"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Fix format string for 32-bit platforms
sched: Fix warning in kernel/sched/fair.c
sched/rt: Use root_domain of rt_rq not current processor
Pull perf fixes from Ingo Molnar:
"Three fixlets and two small (and low risk) hw-enablement changes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix event group context move
x86/perf: Add IvyBridge EP support
perf/x86: Fix P6 driver section warning
arch/x86/tools/insn_sanity.c: Identify source of messages
perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
Pull two small RCU fixlets from Ingo Molnar.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Make rcu_nocb_poll an early_param instead of module_param
rcu: Prevent soft-lockup complaints about no-CBs CPUs
The uses of trace_clock_local() are dead code when CONFIG_RCU_TRACE=n,
but some compilers might nevertheless generate code calling this function.
This commit therefore ensures that trace_clock_local() is invoked only
when CONFIG_RCU_TRACE=y.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If the previous CPU is cache affine and idle, select it.
The current implementation simply traverses the sd_llc domain,
taking the first idle CPU encountered, which walks buddy pairs
hand in hand over the package, inflicting excruciating pain.
1 tbench pair (worst case) in a 10 core + SMT package:
pre 15.22 MB/sec 1 procs
post 252.01 MB/sec 1 procs
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1359371965.5783.127.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As no one is using the return value of irq_work_queue(),
so it is better to just make it void.
Signed-off-by: anish kumar <anish198519851985@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[ Fix stale comments, remove now unnecessary __irq_work_queue() intermediate function ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1359925703-24304-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
mips was the last architecture not using the generic variant.
Both native and compat variants switched to generic, which is
made unconditional now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
usual "call force_sigsegv or signal_delivered" logics. Takes
ksignal instead of separate siginfo/k_sigaction.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Again, protected by a temporary config symbol (GENERIC_COMPAT_RT_SIGACTION);
will be gone by the end of series.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and make it unconditional - we want the sucker on all biarch
platforms, really. All kinds of wrappers and private implementations
can go now; fortunately, they don't cause name conflicts, so we can
do that one first without any bisect hazards.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
conditional on GENERIC_COMPAT_RT_SIGQUEUEINFO; by the end of that series
it will become the same thing as COMPAT and conditional will die out.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
conditional on GENERIC_COMPAT_RT_SIGPENDING; by the end of that series
it will become the same thing as COMPAT and conditional will die out.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
conditional on GENERIC_COMPAT_RT_SIGPROCMASK; by the end of that series
it will become the same thing as COMPAT and conditional will die out.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pull compat version alongside with the native one
* make little-endian compat variant just call the native
* don't bother with separate conditional for compat (both native and
compat are going to become unconditional very soon).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Switch from __ARCH_WANT_SYS_RT_SIGACTION to opposite
(!CONFIG_ODD_RT_SIGACTION); the only two architectures that
need it are alpha and sparc. The reason for use of CONFIG_...
instead of __ARCH_... is that it's needed only kernel-side
and doing it that way avoids a mess with include order on many
architectures.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
again, strictly speaking we are in nasal daemon territory on ppc
and mips - we need to sign-extend int arguments.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Function next_prio() has been removed and pull_rt_task() is the
only user of pick_next_highest_task_rt() at the moment.
pull_rt_task is not interested in p->nr_cpus_allowed, its only
interest is the fact that cpu is allowed to execute p. If
nr_cpus_allowed == 1, cpu != task_cpu(p) and cpu is allowed then
it means that task p is in the middle of the migration
techniques; the task waits until it is moved by migration
thread. So, lets pull it earlier.
Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
CC: linux-rt-users <linux-rt-users@vger.kernel.org>
Link: http://lkml.kernel.org/r/70871359644177@web16d.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When we have group with mixed events (hw/sw) we want to end up
with group leader being in hw context. So if group leader is
initialy sw event, we move all the events under hw context.
The move is done for each event by removing it from its context
and adding it back into proper one. As a part of the removal the
event is automatically disabled, which is not what we want at
this stage of creating groups.
The fix is to initialize event state after removal from sw
context.
This fix resulted from the following discussion:
http://thread.gmane.org/gmane.linux.kernel.perf.user/1144
Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/r/1359714225-4231-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On early boot up, when the ftrace ring buffer is initialized, the
static variable current_trace is initialized to &nop_trace.
Before this initialization, current_trace is NULL and will never
become NULL again. It is always reassigned to a ftrace tracer.
Several places check if current_trace is NULL before it uses
it, and this check is frivolous, because at the point in time
when the checks are made the only way current_trace could be
NULL is if ftrace failed its allocations at boot up, and the
paths to these locations would probably not be possible.
By initializing current_trace to &nop_trace where it is declared,
current_trace will never be NULL, and we can remove all these
checks of current_trace being NULL which never needed to be
checked in the first place.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently, the timer broadcast mechanism is defined by a function
pointer on struct clock_event_device. As the fundamental mechanism for
broadcast is architecture-specific, this means that clock_event_device
drivers cannot be shared across multiple architectures.
This patch adds an (optional) architecture-specific function for timer
tick broadcast, allowing drivers which may require broadcast
functionality to be shared across multiple architectures.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Will.Deacon@arm.com
Cc: Marc.Zyngier@arm.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1358183124-28461-3-git-send-email-mark.rutland@arm.com
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently the broadcast mechanism used for timers is abstracted by a
function pointer on struct clock_event_device. As the fundamental
mechanism for broadcast is architecture-specific, this ties each
clock_event_device driver to a single architecture, even where the
driver is otherwise generic.
This patch adds a standard path for the receipt of timer broadcasts, so
drivers and/or architecture backends need not manage redundant lists of
timers for the purpose of routing broadcast timer ticks.
[tglx: Made the implementation depend on the config switch as well ]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Will.Deacon@arm.com
Cc: Marc.Zyngier@arm.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1358183124-28461-2-git-send-email-mark.rutland@arm.com
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are several places of consecutive calls of
dequeue_task_rt() and put_prev_task_rt() in the scheduler.
For example, function rt_mutex_setprio() does it.
The both calls lead to update_curr_rt(), the second of it
receives zeroed delta_exec. The only effective action in this
case is call of sched_rt_avg_update(), which can change
rq->age_stamp and rq->rt_avg. But it is possible in case of
""floating"" rq->clock. This fact is not reasonable to be
accounted. Another actions do nothing.
Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
CC: linux-rt-users <linux-rt-users@vger.kernel.org>
Link: http://lkml.kernel.org/r/931541359550236@web1g.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Peter Anvin:
"This is a collection of miscellaneous fixes, the most important one is
the fix for the Samsung laptop bricking issue (auto-blacklisting the
samsung-laptop driver); the efi_enabled() changes you see below are
prerequisites for that fix.
The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI
debugging, and requiring CAP_SYS_RAWIO for MSR references, just as
with I/O port references."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
samsung-laptop: Disable on EFI hardware
efi: Make 'efi_enabled' a function to query EFI facilities
smp: Fix SMP function call empty cpu mask race
x86/msr: Add capabilities check
x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
x86/olpc: Fix olpc-xo1-sci.c build errors
arch/x86/platform/uv: Fix incorrect tlb flush all issue
x86-64: Fix unwind annotations in recent NMI changes
x86-32: Start out cr0 clean, disable paging before modifying cr3/4
This reverts commit daee779718.
I'll requeue this after the console locking fixes, so lockdep
is useful again for people until fbcon is fixed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ftrace has a snapshot feature available from kernel space and
latency tracers (e.g. irqsoff) are using it. This patch enables
user applictions to take a snapshot via debugfs.
Add "snapshot" debugfs file in "tracing" directory.
snapshot:
This is used to take a snapshot and to read the output of the
snapshot.
# echo 1 > snapshot
This will allocate the spare buffer for snapshot (if it is
not allocated), and take a snapshot.
# cat snapshot
This will show contents of the snapshot.
# echo 0 > snapshot
This will free the snapshot if it is allocated.
Any other positive values will clear the snapshot contents if
the snapshot is allocated, or return EINVAL if it is not allocated.
Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
[
Fixed irqsoff selftest and also a conflict with a change
that fixes the update_max_tr.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently the trace buffer read functions use a static variable
"old_tracer" for detecting if the current tracer changes. This
was suitable for a single trace file ("trace"), but to add a
snapshot feature that will use the same function for its file,
a check against a static variable is not sufficient.
To use the output functions for two different files, instead of
storing the current tracer in a static variable, as the trace
iterator descriptor contains a pointer to the original current
tracer's name, that pointer can now be used to check if the
current tracer has changed between different reads of the trace
file.
Link: http://lkml.kernel.org/r/20121226025252.3252.9276.stgit@liselsia
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
For systems with an unstable sched_clock, all cpu_clock() does is enable/
disable local irq during the call to sched_clock_cpu(). And for stable
systems they are same.
trace_clock_global() already disables interrupts, so it can call
sched_clock_cpu() directly.
Link: http://lkml.kernel.org/r/1356576585-28782-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a stat about the number of events read from the ring buffer:
# cat /debug/tracing/per_cpu/cpu0/stats
entries: 39869
overrun: 870512
commit overrun: 0
bytes: 1449912
oldest event ts: 6561.368690
now ts: 6565.246426
dropped events: 0
read events: 112 <-- Added
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
During kdump kernel's booting stage, it need to find low ram for
swiotlb buffer when system does not support intel iommu/dmar remapping.
kexed-tools is appending memmap=exactmap and range from /proc/iomem
with "Crash kernel", and that range is above 4G for 64bit after boot
protocol 2.12.
We need to add another range in /proc/iomem like "Crash kernel low",
so kexec-tools could find that info and append to kdump kernel
command line.
Try to reserve some under 4G if the normal "Crash kernel" is above 4G.
User could specify the size with crashkernel_low=XX[KMG].
-v2: fix warning that is found by Fengguang's test robot.
-v3: move out get_mem_size change to another patch, to solve compiling
warning that is found by Borislav Petkov <bp@alien8.de>
-v4: user must specify crashkernel_low if system does not support
intel or amd iommu.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jason pointed out the HAS_PERSISTENT_CLOCK name isn't
quite accurate for the config, as some systems may have
the persistent_clock in some cases, but not always.
So change the config name to the more clear
ALWAYS_USE_PERSISTENT_CLOCK.
Signed-off-by: John Stultz <john.stultz@linaro.org>
While debugging the virtual cputime with the function graph tracer
with a max_depth of 1 (most common use of the max_depth so far),
I found that I was missing kernel execution because of a race condition.
The code for the return side of the function has a slight race:
ftrace_pop_return_trace(&trace, &ret, frame_pointer);
trace.rettime = trace_clock_local();
ftrace_graph_return(&trace);
barrier();
current->curr_ret_stack--;
The ftrace_pop_return_trace() initializes the trace structure for
the callback. The ftrace_graph_return() uses the trace structure
for its own use as that structure is on the stack and is local
to this function. Then the curr_ret_stack is decremented which
is what the trace.depth is set to.
If an interrupt comes in after the ftrace_graph_return() but
before the curr_ret_stack, then the called function will get
a depth of 2. If max_depth is set to 1 this function will be
ignored.
The problem is that the trace has already been called, and the
timestamp for that trace will not reflect the time the function
was about to re-enter userspace. Calls to the interrupt will not
be traced because the max_depth has prevented this.
To solve this issue, the ftrace_graph_return() can safely be
moved after the current->curr_ret_stack has been updated.
This way the timestamp for the return callback will reflect
the actual time.
If an interrupt comes in after the curr_ret_stack update and
ftrace_graph_return(), it will be traced. It may look a little
confusing to see it within the other function, but at least
it will not be lost.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of undercomitted scenarios, especially in large guests
yield_to overhead is significantly high. when run queue length of
source and target is one, take an opportunity to bail out and return
-ESRCH. This return condition can be further exploited to quickly come
out of PLE handler.
(History: Raghavendra initially worked on break out of kvm ple handler upon
seeing source runqueue length = 1, but it had to export rq length).
Peter came up with the elegant idea of return -ESRCH in scheduler core.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Raghavendra, Checking the rq length of target vcpu condition added.(thanks Avi)
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation.
fixes.2013.01.26a: Miscellaneous fixes.
tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to
simplify callback advancement.
tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.