double iteration list (one for registered ftrace ops, and one for
registered "global" ops), to just use one list. That simplified the code
but also broke the function tracing filtering on pid.
This updates the code to handle the filtering again with the new logic.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVsq1dAAoJEEjnJuOKh9ldfagH/39dixx3tMMElO7pGM9Q3DBE
WmJuGSOmeZA1O0balnfcV2SgegEGCSEP6sjBtyuYCfgcFDWyIdAvGrMLS/hKTOxR
pilaqyNQz7CROx1zco9gbu5pGwkSkcAfnqYzOg6IpJ0WHtiyH36GMe2wMu29u2VT
I7NgSC6ByA82N4pwvgetlUcIDcPTyrkkhGmGPBJGY+diKzeSSo8NlRbv3SNYs0ua
V072Oumu64RrZBMdn/Sb2pCF2hf6vhTXD6qS4dbpK/Rfnlblqer9SqUIx2kpg603
yDOmPY7wN9FJ94Te3EeXubLi0LqDJH4iPOndrRn1fYsgMbUpq1BlViF7W7Ajze8=
=LQ+S
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"Back in 3.16 the ftrace code was redesigned and cleaned up to remove
the double iteration list (one for registered ftrace ops, and one for
registered "global" ops), to just use one list. That simplified the
code but also broke the function tracing filtering on pid.
This updates the code to handle the filtering again with the new
logic"
* tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix breakage of set_ftrace_pid
Commit 4104d326b6 ("ftrace: Remove global function list and call function
directly") simplified the ftrace code by removing the global_ops list with a
new design. But this cleanup also broke the filtering of PIDs that are added
to the set_ftrace_pid file.
Add back the proper hooks to have pid filtering working once again.
Cc: stable@vger.kernel.org # 3.16+
Reported-by: Matt Fleming <matt@console-pimps.org>
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Tested-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There are already two events for context switches, namely the tracepoint
sched:sched_switch and the software event context_switches.
Unfortunately neither are suitable for use by non-privileged users for
the purpose of synchronizing hardware trace data (e.g. Intel PT) to the
context switch.
Tracepoints are no good at all for non-privileged users because they
need either CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= -1.
On the other hand, kernel software events need either CAP_SYS_ADMIN or
/proc/sys/kernel/perf_event_paranoid <= 1.
Now many distributions do default perf_event_paranoid to 1 making
context_switches a contender, except it has another problem (which is
also shared with sched:sched_switch) which is that it happens before
perf schedules events out instead of after perf schedules events in.
Whereas a privileged user can see all the events anyway, a
non-privileged user only sees events for their own processes, in other
words they see when their process was scheduled out not when it was
scheduled in. That presents two problems to use the event:
1. the information comes too late, so tools have to look ahead in the
event stream to find out what the current state is
2. if they are unlucky tracing might have stopped before the
context-switches event is recorded.
This new PERF_RECORD_SWITCH event does not have those problems
and it also has a couple of other small advantages.
It is easier to use because it is an auxiliary event (like mmap, comm
and task events) which can be enabled by setting a single bit. It is
smaller than sched:sched_switch and easier to parse.
To make the event useful for privileged users also, if the
context is cpu-wide then the event record will be
PERF_RECORD_SWITCH_CPU_WIDE which is the same as
PERF_RECORD_SWITCH except it also provides the next or
previous pid/tid.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Conflicts:
net/bridge/br_mdb.c
br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU's rcu_oom_notify() disables CPU hotplug in order to stabilize the
list of online CPUs, which it traverses. However, this is completely
pointless because smp_call_function_single() will quietly fail if invoked
on an offline CPU. Because the count of requests is incremented in the
rcu_oom_notify_cpu() function that is remotely invoked, everything works
nicely even in the face of concurrent CPU-hotplug operations.
Furthermore, in recent kernels, invoking get_online_cpus() from an OOM
notifier can result in deadlock. This commit therefore removes the
call to get_online_cpus() and put_online_cpus() from rcu_oom_notify().
Reported-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
The RCU_LOCKDEP_WARN() in synchronize_rcu_tasks() triggers if the
scheduler is active, which is backwards. This commit therefore
negates the test.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros. This also requires
inverting the sense of the conditional, which this commit also does.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
and preempt_enable(), both of which can be traced. This defeats the
purpose of the notrace on rcu_is_watching(), so this commit substitutes
preempt_disable_notrace() and preempt_enable_notrace().
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
In kernels built with CONFIG_PREEMPT, _cpu_down() waits for RCU and
RCU-sched grace periods back-to-back, incurring quite a bit more latency
than required. This commit therefore uses the new synchronize_rcu_mult()
to allow waiting for both grace periods concurrently.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There have been several requests for a primitive that waits for
grace periods for several RCU flavors concurrently, so this
commit creates it. This is a variadic macro, and you pass in
the call_rcu() functions of the flavors of RCU that you wish to
wait for.
Note that you cannot pass in call_srcu() for two reasons: (1) This
would result in a type mismatch and (2) You need to specify which
srcu_struct you want to use. Handle this by creating a wrapper
function for your SRCU domain, for example:
void call_srcu_mine(struct rcu_head *head, rcu_callback_t func)
{
call_srcu(&ss_mine, head, func);
}
You can then do something like this:
synchronize_rcu_mult(call_srcu_mine, call_rcu, call_rcu_sched);
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tasks are no longer migrated to the root rcu_node, so there is no
longer any need for a boost kthread for the root rcu_node, and there no
longer is such a kthread. This commit therefore fixes the comment in
rcu_boost_kthread()'s header to reflect this new reality.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The get_state_synchronize_rcu() and cond_synchronize_rcu() functions
allow polling for grace-period completion, with an actual wait for a
grace period occurring only when cond_synchronize_rcu() is called too
soon after the corresponding get_state_synchronize_rcu(). However,
these functions work only for vanilla RCU. This commit adds the
get_state_synchronize_sched() and cond_synchronize_sched(), which provide
the same capability for RCU-sched.
Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Move alloc_msi_entry() from PCI MSI code into generic MSI code, so it
can be reused by other generic MSI drivers. Also introduce
free_msi_entry() for completeness.
Suggested-by: Stuart Yoder <stuart.yoder@freescale.com>.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/1436428847-8886-13-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
region_is_ram() looks up the iomem_resource table to check if
a target range is in RAM. However, it always returns with -1
due to invalid range checks. It always breaks the loop at the
first entry of the table.
Another issue is that it compares p->flags and flags, but it always
fails. flags is declared as int, which makes it as a negative value
with IORESOURCE_BUSY (0x80000000) set while p->flags is unsigned long.
Fix the range check and flags so that region_is_ram() works as
advertised.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1437088996-28511-4-git-send-email-toshi.kani@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Enabling locking-selftest in a VM guest may cause the following
kernel panic:
kernel BUG at .../kernel/locking/qspinlock_paravirt.h:137!
This is due to the fact that the pvqspinlock unlock function is
expecting either a _Q_LOCKED_VAL or _Q_SLOW_VAL in the lock
byte. This patch prevents that bug report by ignoring it when
debug_locks_silent is set. Otherwise, a warning will be printed
if it contains an unexpected value.
With this patch applied, the kernel locking-selftest completed
without any noise.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436663959-53092-1-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make sure to stop tracing only once we are past a point where
all latency tracing events have been processed (irqs are not
enabled again). This has the slight advantage of capturing more
latency related events in the idle path, but most importantly it
makes sure that latency tracing doesn't get re-enabled
inadvertently when new events are coming in.
This makes the irqsoff latency tracer useful again, as we stop
capturing CPU sleep time as IRQ latency.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel@pengutronix.de
Cc: patchwork-lst@pengutronix.de
Link: http://lkml.kernel.org/r/1437410090-3747-1-git-send-email-l.stach@pengutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
improve accuracy of timing in test_bpf and add two stress tests:
- {skb->data[0], get_smp_processor_id} repeated 2k times
- {skb->data[0], vlan_push} x 68 followed by {skb->data[0], vlan_pop} x 68
1st test is useful to test performance of JIT implementation of BPF_LD_ABS
together with BPF_CALL instructions.
2nd test is stressing skb_vlan_push/pop logic together with skb->data access
via BPF_LD_ABS insn which checks that re-caching of skb->data is done correctly.
In order to call bpf_skb_vlan_push() from test_bpf.ko have to add
three export_symbol_gpl.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A fine granulity support for delay would be very useful when profiling
VM logics, such as page allocation including page reclaim and memory
compaction with function graph.
Thus, this patch adds two additional marks with two changes.
- An equal sign in mark selection function is removed to align code
behavior with comments and documentation.
- The function graph example related to delay in ftrace.txt is updated
to cover all supported marks.
Link: http://lkml.kernel.org/r/1436626300-1679-3-git-send-email-jungseoklee85@gmail.com
Cc: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Jungseok Lee noticed the following:
Currently, row's width of 7-digit duration numbers not aligned with
other cases like the following example.
3) $ 3999884 us | }
3) | finish_task_switch() {
3) 0.365 us | _raw_spin_unlock_irq();
3) 3.333 us | }
3) $ 3999976 us | }
3) $ 3999979 us | } /* schedule */
As adding a single white space in case of 7-digit numbers, the format
could be unified easily as follows.
3) $ 2237472 us | }
3) | finish_task_switch() {
3) 0.364 us | _raw_spin_unlock_irq();
3) 3.125 us | }
3) $ 2237556 us | }
3) $ 2237559 us | } /* schedule */
Instead of making a special case for 7-digit numbers, the logic
of the len and the space loop is slightly modified to make the
two cases have the same format.
Link: http://lkml.kernel.org/r/1436626300-1679-2-git-send-email-jungseoklee85@gmail.com
Reported-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch extends tracing_thresh functionality to function profile tracer.
If tracing_thresh is set, print those entries only,
whose average is > tracing thresh.
Link: http://lkml.kernel.org/r/1434972488-8571-1-git-send-email-umesh.t@samsung.com
Signed-off-by: Umesh Tiwari <umesh.t@samsung.com>
[ Removed unnecessary 'moved' comment ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Akashi Takahiro was porting the stack tracer to arm64 and found some
issues with it. One was that it repeats the top function, due to the
stack frame added by the mcount caller and added by itself. This
was added when fentry came in, and before fentry created its own stack
frame. But x86's fentry now creates its own stack frame, and there's
no need to insert the function again.
This also cleans up the code a bit, where it doesn't need to do something
special for fentry, and doesn't include insertion of a duplicate
entry for the called function being traced.
Link: http://lkml.kernel.org/r/55A646EE.6030402@linaro.org
Some-suggestions-by: Jungseok Lee <jungseoklee85@gmail.com>
Some-suggestions-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Functions in ring-buffer.c have gotten interleaved between different
use cases. Move the functions around to get like functions closer
together. This may or may not help gcc keep cache locality, but it
makes it a little easier to work with the code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Now that events only add time extends after it is committed, in case
an event comes in before it can discard the allocated event, the time
extend needs to be stored within the event. If the event is bigger
than then size needed for the time extend, padding must be added.
The minimum padding size is 8 bytes. Thus if the event is 12 bytes
(size of time extend + 4), there will not be enough room to add both
the time extend and padding. Make sure all events are either 8 bytes
or 16 or more bytes.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move the capturing of the timestamp to after an event is allocated.
If the event is not a commit (where it is an event that preempted
another event), then no timestamp is needed, because the delta of
nested events is always zero.
If the event starts on a new page, no delta needs to be calculated
as the full timestamp will be added to the page header, and the
event will have a delta of zero.
Now if the event requires a time extend (the delta does not fit
in the 27 bit delta slot in the header), then the event is discarded,
the length is extended to hold the TIME_EXTEND event that allows for
a 59 bit delta, and the commit is tried again.
If the event can't be discarded (another event came in after it),
then the TIME_EXTEND is added directly to the allocated event and
the rest of the event is given padding.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Requiring a extended time stamp is an uncommon occurrence, and it is
best to do it out of line when needed.
Add a noinline function that handles the extended timestamp and
have it called with an unlikely to completely move it out of the
fast path.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add rb_event_info descriptor to pass event info to functions a bit
easier than using a bunch of parameters. This will also allow for
changing the code around a bit to find better fast paths.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In ftrace_dump, for disabling buffer, iter.tr->trace_buffer.data is used.
But for enabling, iter.trace_buffer->data is used.
Even though, both point to same buffer, for readability, same convention
should be used.
Link: http://lkml.kernel.org/r/1434972306-20043-1-git-send-email-umesh.t@samsung.com
Signed-off-by: Umesh Tiwari <umesh.t@samsung.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently exception occures due to access beyond buffer_iter
range while using index of cpu bigger than num_possible_cpus().
Below there is an example for such exception when we use
cpus 0,1,16,17.
In order to fix buffer allocation size for non-continuous cpu ids
we allocate according to the max cpu id and not according to the
amount of possible cpus.
Example:
$ cat /sys/kernel/debug/tracing/per_cpu/cpu1/trace
Path: /bin/busybox
CPU: 0 PID: 82 Comm: cat Not tainted 4.0.0 #29
task: 80734c80 ti: 80012000 task.ti: 80012000
[ECR ]: 0x00220100 => Invalid Read @ 0x00000000 by insn @ 0x800abafc
[EFA ]: 0x00000000
[BLINK ]: ring_buffer_read_finish+0x24/0x64
[ERET ]: rb_check_pages+0x20/0x188
[STAT32]: 0x00001a00 :
BTA: 0x800abafc SP: 0x80013f0c FP: 0x57719cf8
LPS: 0x200036b4 LPE: 0x200036b8 LPC: 0x00000000
r00: 0x8002aca0 r01: 0x00001606 r02: 0x00000000
r03: 0x00000001 r04: 0x00000000 r05: 0x804b4954
r06: 0x00030003 r07: 0x8002a260 r08: 0x00000286
r09: 0x00080002 r10: 0x00001006 r11: 0x807351a4
r12: 0x00000001
Stack Trace:
rb_check_pages+0x20/0x188
ring_buffer_read_finish+0x24/0x64
tracing_release+0x4e/0x170
__fput+0x62/0x158
task_work_run+0xa2/0xd4
do_notify_resume+0x52/0x7c
resume_user_mode_begin+0xdc/0xe0
Link: http://lkml.kernel.org/r/1433835155-6894-3-git-send-email-gilf@ezchip.com
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Gil Fruchter <gilf@ezchip.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Use kcalloc for allocating an array instead of kzalloc with multiply,
as that is what kcalloc is used for.
Found with checkpatch.
Link: http://lkml.kernel.org/r/1433835155-6894-2-git-send-email-gilf@ezchip.com
Signed-off-by: Gil Fruchter <gilf@ezchip.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
No one uses this anymore, and this is not the first time the
idea of replacing it with a (now possible) userspace side.
Lock stealing logic was removed long ago in when the lock
was granted to the highest prio.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1435782588-4177-2-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Although futexes are well known for being a royal pita,
we really have very little debugging capabilities - except
for relying on tglx's eye half the time.
By simply making use of the existing fault-injection machinery,
we can improve this situation, allowing generating artificial
uaddress faults and deadlock scenarios. Of course, when this is
disabled in production systems, the overhead for failure checks
is practically zero -- so this is very cheap at the same time.
Future work would be nice to now enhance trinity to make use of
this.
There is a special tunable 'ignore-private', which can filter
out private futexes. Given the tsk->make_it_fail filter and
this option, pi futexes can be narrowed down pretty closely.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-3-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
... serves a bit better to clarify between blocking
and non-blocking code paths.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-2-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fixes from Ingo Molnar:
"Two families of fixes:
- Fix an FPU context related boot crash on newer x86 hardware with
larger context sizes than what most people test. To fix this
without ugly kludges or extensive reverts we had to touch core task
allocator, to allow x86 to determine the task size dynamically, at
boot time.
I've tested it on a number of x86 platforms, and I cross-built it
to a handful of architectures:
(warns) (warns)
testing x86-64: -git: pass ( 0), -tip: pass ( 0)
testing x86-32: -git: pass ( 0), -tip: pass ( 0)
testing arm: -git: pass ( 1359), -tip: pass ( 1359)
testing cris: -git: pass ( 1031), -tip: pass ( 1031)
testing m32r: -git: pass ( 1135), -tip: pass ( 1135)
testing m68k: -git: pass ( 1471), -tip: pass ( 1471)
testing mips: -git: pass ( 1162), -tip: pass ( 1162)
testing mn10300: -git: pass ( 1058), -tip: pass ( 1058)
testing parisc: -git: pass ( 1846), -tip: pass ( 1846)
testing sparc: -git: pass ( 1185), -tip: pass ( 1185)
... so I hope the cross-arch impact 'none', as intended.
(by Dave Hansen)
- Fix various NMI handling related bugs unearthed by the big asm code
rewrite and generally make the NMI code more robust and more
maintainable while at it. These changes are a bit late in the
cycle, I hope they are still acceptable.
(by Andy Lutomirski)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
x86/fpu, sched: Dynamically allocate 'struct fpu'
x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
x86/nmi/64: Make the "NMI executing" variable more consistent
x86/nmi/64: Minor asm simplification
x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
x86/nmi/64: Reorder nested NMI checks
x86/nmi/64: Improve nested NMI comments
x86/nmi/64: Switch stacks on userspace NMI entry
x86/nmi/64: Remove asm code that saves CR2
x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
Pull timer fix from Ingo Molnar:
"Fix for a misplaced export that can cause build failures in certain
(rare) Kconfig situations"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Move the export of tick_broadcast_oneshot_control to the proper place
Pull scheduler fix from Ingo Molnar:
"A oneliner rq throttling fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Test list head instead of list entry in throttle_cfs_rq()
Pull irq fixes from Ingo Molnar:
"Misc irq fixes:
- two driver fixes
- a Xen regression fix
- a nested irq thread crash fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gicv3-its: Fix mapping of LPIs to collections
genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
gpio/davinci: Fix race in installing chained irq handler
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.
Also optimize the x86 code a bit by caching task_struct_size.
Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).
Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already. This saves from doing an extra slab
allocation at fork().
The only real downside here is that we have to stick everything
and the end of the task_struct. But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the common case, there will be only one expedited grace period in
the system at a given time, in which case it is not helpful to use
funnel locking. This commit therefore adds a fastpath that bypasses
funnel locking when the root ->exp_funnel_mutex is not held.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The grace-period kthread sleeps waiting to do a force-quiescent-state
scan, and when awakened sets rsp->gp_state to RCU_GP_DONE_FQS.
However, this is confusing because the kthread has not done the
force-quiescent-state, but is instead just starting to do it. This commit
therefore renames RCU_GP_DONE_FQS to RCU_GP_DOING_FQS in order to make
things a bit easier on reviewers.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The condition for the wait_event_interruptible_timeout() that waits
to do the next force-quiescent-state scan is a bit ornate:
((gf = READ_ONCE(rsp->gp_flags)) &
RCU_GP_FLAG_FQS) ||
(!READ_ONCE(rnp->qsmask) &&
!rcu_preempt_blocked_readers_cgp(rnp))
This commit therefore pulls this condition out into a helper function
and comments its component conditions.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Although synchronize_sched_expedited() historically has no RCU CPU stall
warnings, the availability of the rcupdate.rcu_expedited boot parameter
invalidates the old assumption that synchronize_sched()'s stall warnings
would suffice. This commit therefore adds RCU CPU stall warnings to
synchronize_sched_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The strictly rcu_node based funnel-locking scheme works well in many
cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get
all that much concurrency. This commit therefore extends the funnel
locking into the per-CPU rcu_data structure, providing concurrency equal
to the number of CPUs.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
One of the requirements on RCU grace periods is that if there is a
causal chain of operations that starts after one grace period and
ends before another grace period, then the two grace periods must
be serialized. There has been (and might still be) code that relies
on this, for example, certain types of reference-counting code that
does a call_rcu() within an RCU callback function.
This requirement is why there is an smp_mb() at the end of both
synchronize_sched_expedited() and synchronize_rcu_expedited().
However, this is the only smp_mb() in these functions, so it would
be nicer to consolidate it into rcu_exp_gp_seq_end(). This commit
does just that.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_seq operations were open-coded in _rcu_barrier(), so this commit
replaces the open-coding with the shiny new rcu_seq operations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock()
polling loop in favor of the funnel-locking scheme that was abstracted
from synchronize_sched_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The type of "s" has been "long" rather than the correct "unsigned long"
for quite some time. This commit fixes this type error.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit abstracts funnel locking from synchronize_sched_expedited()
so that it may be used by synchronize_rcu_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>