Commit graph

21095 commits

Author SHA1 Message Date
Linus Torvalds
763e326c8b Back in 3.16 the ftrace code was redesigned and cleaned up to remove the
double iteration list (one for registered ftrace ops, and one for
 registered "global" ops), to just use one list. That simplified the code
 but also broke the function tracing filtering on pid.
 
 This updates the code to handle the filtering again with the new logic.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVsq1dAAoJEEjnJuOKh9ldfagH/39dixx3tMMElO7pGM9Q3DBE
 WmJuGSOmeZA1O0balnfcV2SgegEGCSEP6sjBtyuYCfgcFDWyIdAvGrMLS/hKTOxR
 pilaqyNQz7CROx1zco9gbu5pGwkSkcAfnqYzOg6IpJ0WHtiyH36GMe2wMu29u2VT
 I7NgSC6ByA82N4pwvgetlUcIDcPTyrkkhGmGPBJGY+diKzeSSo8NlRbv3SNYs0ua
 V072Oumu64RrZBMdn/Sb2pCF2hf6vhTXD6qS4dbpK/Rfnlblqer9SqUIx2kpg603
 yDOmPY7wN9FJ94Te3EeXubLi0LqDJH4iPOndrRn1fYsgMbUpq1BlViF7W7Ajze8=
 =LQ+S
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
 "Back in 3.16 the ftrace code was redesigned and cleaned up to remove
  the double iteration list (one for registered ftrace ops, and one for
  registered "global" ops), to just use one list.  That simplified the
  code but also broke the function tracing filtering on pid.

  This updates the code to handle the filtering again with the new
  logic"

* tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix breakage of set_ftrace_pid
2015-07-25 11:42:54 -07:00
Steven Rostedt (Red Hat)
e3eea1404f ftrace: Fix breakage of set_ftrace_pid
Commit 4104d326b6 ("ftrace: Remove global function list and call function
directly") simplified the ftrace code by removing the global_ops list with a
new design. But this cleanup also broke the filtering of PIDs that are added
to the set_ftrace_pid file.

Add back the proper hooks to have pid filtering working once again.

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Matt Fleming <matt@console-pimps.org>
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Tested-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-24 13:58:14 -04:00
Adrian Hunter
45ac1403f5 perf: Add PERF_RECORD_SWITCH to indicate context switches
There are already two events for context switches, namely the tracepoint
sched:sched_switch and the software event context_switches.
Unfortunately neither are suitable for use by non-privileged users for
the purpose of synchronizing hardware trace data (e.g. Intel PT) to the
context switch.

Tracepoints are no good at all for non-privileged users because they
need either CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= -1.

On the other hand, kernel software events need either CAP_SYS_ADMIN or
/proc/sys/kernel/perf_event_paranoid <= 1.

Now many distributions do default perf_event_paranoid to 1 making
context_switches a contender, except it has another problem (which is
also shared with sched:sched_switch) which is that it happens before
perf schedules events out instead of after perf schedules events in.
Whereas a privileged user can see all the events anyway, a
non-privileged user only sees events for their own processes, in other
words they see when their process was scheduled out not when it was
scheduled in. That presents two problems to use the event:

1. the information comes too late, so tools have to look ahead in the
   event stream to find out what the current state is

2. if they are unlucky tracing might have stopped before the
   context-switches event is recorded.

This new PERF_RECORD_SWITCH event does not have those problems
and it also has a couple of other small advantages.

It is easier to use because it is an auxiliary event (like mmap, comm
and task events) which can be enabled by setting a single bit. It is
smaller than sched:sched_switch and easier to parse.

To make the event useful for privileged users also, if the
context is cpu-wide then the event record will be
PERF_RECORD_SWITCH_CPU_WIDE which is the same as
PERF_RECORD_SWITCH except it also provides the next or
previous pid/tid.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-23 22:51:12 -03:00
David S. Miller
c5e40ee287 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bridge/br_mdb.c

br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23 00:41:16 -07:00
Paul E. McKenney
9a54f98e34 rcu: Don't disable CPU hotplug during OOM notifiers
RCU's rcu_oom_notify() disables CPU hotplug in order to stabilize the
list of online CPUs, which it traverses.  However, this is completely
pointless because smp_call_function_single() will quietly fail if invoked
on an offline CPU.  Because the count of requests is incremented in the
rcu_oom_notify_cpu() function that is remotely invoked, everything works
nicely even in the face of concurrent CPU-hotplug operations.

Furthermore, in recent kernels, invoking get_online_cpus() from an OOM
notifier can result in deadlock.  This commit therefore removes the
call to get_online_cpus() and put_online_cpus() from rcu_oom_notify().

Reported-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
2015-07-22 15:27:43 -07:00
Paul E. McKenney
a76a9a485d rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
The RCU_LOCKDEP_WARN() in synchronize_rcu_tasks() triggers if the
scheduler is active, which is backwards.  This commit therefore
negates the test.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:33 -07:00
Paul E. McKenney
f78f5b90c4 rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2015-07-22 15:27:32 -07:00
Alexei Starovoitov
46f00d18fc rcu: Make rcu_is_watching() really notrace
Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
and preempt_enable(), both of which can be traced.  This defeats the
purpose of the notrace on rcu_is_watching(), so this commit substitutes
preempt_disable_notrace() and preempt_enable_notrace().

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-22 15:27:31 -07:00
Paul E. McKenney
779de6ce54 cpu: Wait for RCU grace periods concurrently
In kernels built with CONFIG_PREEMPT, _cpu_down() waits for RCU and
RCU-sched grace periods back-to-back, incurring quite a bit more latency
than required.  This commit therefore uses the new synchronize_rcu_mult()
to allow waiting for both grace periods concurrently.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:30 -07:00
Paul E. McKenney
ec90a194ae rcu: Create a synchronize_rcu_mult()
There have been several requests for a primitive that waits for
grace periods for several RCU flavors concurrently, so this
commit creates it.  This is a variadic macro, and you pass in
the call_rcu() functions of the flavors of RCU that you wish to
wait for.

Note that you cannot pass in call_srcu() for two reasons: (1) This
would result in a type mismatch and (2) You need to specify which
srcu_struct you want to use.  Handle this by creating a wrapper
function for your SRCU domain, for example:

	void call_srcu_mine(struct rcu_head *head, rcu_callback_t func)
	{
		call_srcu(&ss_mine, head, func);
	}

You can then do something like this:

	synchronize_rcu_mult(call_srcu_mine, call_rcu, call_rcu_sched);

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:29 -07:00
Paul E. McKenney
bc17ea1092 rcu: Fix obsolete priority-boosting comment
Tasks are no longer migrated to the root rcu_node, so there is no
longer any need for a boost kthread for the root rcu_node, and there no
longer is such a kthread.  This commit therefore fixes the comment in
rcu_boost_kthread()'s header to reflect this new reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:28 -07:00
Paul E. McKenney
24560056de rcu: Add RCU-sched flavors of get-state and cond-sync
The get_state_synchronize_rcu() and cond_synchronize_rcu() functions
allow polling for grace-period completion, with an actual wait for a
grace period occurring only when cond_synchronize_rcu() is called too
soon after the corresponding get_state_synchronize_rcu().  However,
these functions work only for vanilla RCU.  This commit adds the
get_state_synchronize_sched() and cond_synchronize_sched(), which provide
the same capability for RCU-sched.

Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:26:58 -07:00
Jiang Liu
aa48b6f708 genirq/MSI: Move alloc_msi_entry() from PCI into generic MSI code
Move alloc_msi_entry() from PCI MSI code into generic MSI code, so it
can be reused by other generic MSI drivers.  Also introduce
free_msi_entry() for completeness.

Suggested-by: Stuart Yoder <stuart.yoder@freescale.com>.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/1436428847-8886-13-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-22 18:37:44 +02:00
Toshi Kani
8c38de992b mm: Fix bugs in region_is_ram()
region_is_ram() looks up the iomem_resource table to check if
a target range is in RAM.  However, it always returns with -1
due to invalid range checks. It always breaks the loop at the
first entry of the table.

Another issue is that it compares p->flags and flags, but it always
fails. flags is declared as int, which makes it as a negative value
with IORESOURCE_BUSY (0x80000000) set while p->flags is unsigned long.

Fix the range check and flags so that region_is_ram() works as
advertised.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1437088996-28511-4-git-send-email-toshi.kani@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-22 17:20:34 +02:00
Waiman Long
cba77f03f2 locking/pvqspinlock: Fix kernel panic in locking-selftest
Enabling locking-selftest in a VM guest may cause the following
kernel panic:

  kernel BUG at .../kernel/locking/qspinlock_paravirt.h:137!

This is due to the fact that the pvqspinlock unlock function is
expecting either a _Q_LOCKED_VAL or _Q_SLOW_VAL in the lock
byte. This patch prevents that bug report by ignoring it when
debug_locks_silent is set. Otherwise, a warning will be printed
if it contains an unexpected value.

With this patch applied, the kernel locking-selftest completed
without any noise.

Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436663959-53092-1-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 10:18:07 +02:00
Lucas Stach
63caae8480 sched/idle: Move latency tracing stop/start calls deeper inside the idle loop
Make sure to stop tracing only once we are past a point where
all latency tracing events have been processed (irqs are not
enabled again). This has the slight advantage of capturing more
latency related events in the idle path, but most importantly it
makes sure that latency tracing doesn't get re-enabled
inadvertently when new events are coming in.

This makes the irqsoff latency tracer useful again, as we stop
capturing CPU sleep time as IRQ latency.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel@pengutronix.de
Cc: patchwork-lst@pengutronix.de
Link: http://lkml.kernel.org/r/1437410090-3747-1-git-send-email-l.stach@pengutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 08:18:51 +02:00
Alexei Starovoitov
4d9c5c53ac test_bpf: add bpf_skb_vlan_push/pop() tests
improve accuracy of timing in test_bpf and add two stress tests:
- {skb->data[0], get_smp_processor_id} repeated 2k times
- {skb->data[0], vlan_push} x 68 followed by {skb->data[0], vlan_pop} x 68

1st test is useful to test performance of JIT implementation of BPF_LD_ABS
together with BPF_CALL instructions.
2nd test is stressing skb_vlan_push/pop logic together with skb->data access
via BPF_LD_ABS insn which checks that re-caching of skb->data is done correctly.

In order to call bpf_skb_vlan_push() from test_bpf.ko have to add
three export_symbol_gpl.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:52:32 -07:00
Jungseok Lee
b838e1d96c tracing: Introduce two additional marks for delay
A fine granulity support for delay would be very useful when profiling
VM logics, such as page allocation including page reclaim and memory
compaction with function graph.

Thus, this patch adds two additional marks with two changes.

 - An equal sign in mark selection function is removed to align code
   behavior with comments and documentation.

 - The function graph example related to delay in ftrace.txt is updated
   to cover all supported marks.

Link: http://lkml.kernel.org/r/1436626300-1679-3-git-send-email-jungseoklee85@gmail.com

Cc: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:52 -04:00
Steven Rostedt (Red Hat)
82c355e81a ftrace: Fix function_graph duration spacing with 7-digits
Jungseok Lee noticed the following:

Currently, row's width of 7-digit duration numbers not aligned with
other cases like the following example.

 3) $ 3999884 us |      }
 3)               |      finish_task_switch() {
 3)   0.365 us    |        _raw_spin_unlock_irq();
 3)   3.333 us    |      }
 3) $ 3999976 us |    }
 3) $ 3999979 us |  } /* schedule */

As adding a single white space in case of 7-digit numbers, the format
could be unified easily as follows.

 3) $ 2237472 us  |      }
 3)               |      finish_task_switch() {
 3)   0.364 us    |        _raw_spin_unlock_irq();
 3)   3.125 us    |      }
 3) $ 2237556 us  |    }
 3) $ 2237559 us  |  } /* schedule */

Instead of making a special case for 7-digit numbers, the logic
of the len and the space loop is slightly modified to make the
two cases have the same format.

Link: http://lkml.kernel.org/r/1436626300-1679-2-git-send-email-jungseoklee85@gmail.com

Reported-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:52 -04:00
Umesh Tiwari
8e436ca042 ftrace: add tracing_thresh to function profile
This patch extends tracing_thresh functionality to function profile tracer.
If tracing_thresh is set, print those entries only,
whose average is > tracing thresh.

Link: http://lkml.kernel.org/r/1434972488-8571-1-git-send-email-umesh.t@samsung.com

Signed-off-by: Umesh Tiwari <umesh.t@samsung.com>
[ Removed unnecessary 'moved' comment ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:51 -04:00
Steven Rostedt (Red Hat)
72ac426a5b tracing: Clean up stack tracing and fix fentry updates
Akashi Takahiro was porting the stack tracer to arm64 and found some
issues with it. One was that it repeats the top function, due to the
stack frame added by the mcount caller and added by itself. This
was added when fentry came in, and before fentry created its own stack
frame. But x86's fentry now creates its own stack frame, and there's
no need to insert the function again.

This also cleans up the code a bit, where it doesn't need to do something
special for fentry, and doesn't include insertion of a duplicate
entry for the called function being traced.

Link: http://lkml.kernel.org/r/55A646EE.6030402@linaro.org

Some-suggestions-by: Jungseok Lee <jungseoklee85@gmail.com>
Some-suggestions-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:50 -04:00
Steven Rostedt (Red Hat)
d90fd77402 ring-buffer: Reorganize function locations
Functions in ring-buffer.c have gotten interleaved between different
use cases. Move the functions around to get like functions closer
together. This may or may not help gcc keep cache locality, but it
makes it a little easier to work with the code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:49 -04:00
Steven Rostedt (Red Hat)
7d75e6833b ring-buffer: Make sure event has enough room for extend and padding
Now that events only add time extends after it is committed, in case
an event comes in before it can discard the allocated event, the time
extend needs to be stored within the event. If the event is bigger
than then size needed for the time extend, padding must be added.
The minimum padding size is 8 bytes. Thus if the event is 12 bytes
(size of time extend + 4), there will not be enough room to add both
the time extend and padding. Make sure all events are either 8 bytes
or 16 or more bytes.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:49 -04:00
Steven Rostedt (Red Hat)
a4543a2fa9 ring-buffer: Get timestamp after event is allocated
Move the capturing of the timestamp to after an event is allocated.
If the event is not a commit (where it is an event that preempted
another event), then no timestamp is needed, because the delta of
nested events is always zero.

If the event starts on a new page, no delta needs to be calculated
as the full timestamp will be added to the page header, and the
event will have a delta of zero.

Now if the event requires a time extend (the delta does not fit
in the 27 bit delta slot in the header), then the event is discarded,
the length is extended to hold the TIME_EXTEND event that allows for
a 59 bit delta, and the commit is tried again.

If the event can't be discarded (another event came in after it),
then the TIME_EXTEND is added directly to the allocated event and
the rest of the event is given padding.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:48 -04:00
Steven Rostedt (Red Hat)
9826b2733a ring-buffer: Move the adding of the extended timestamp out of line
Requiring a extended time stamp is an uncommon occurrence, and it is
best to do it out of line when needed.

Add a noinline function that handles the extended timestamp and
have it called with an unlikely to completely move it out of the
fast path.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:47 -04:00
Steven Rostedt (Red Hat)
fcc742eaad ring-buffer: Add event descriptor to simplify passing data
Add rb_event_info descriptor to pass event info to functions a bit
easier than using a bunch of parameters. This will also allow for
changing the code around a bit to find better fast paths.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:46 -04:00
Umesh Tiwari
5e2d5ef8ec ftrace: correct the counter increment for trace_buffer data
In ftrace_dump, for disabling buffer, iter.tr->trace_buffer.data is used.
But for enabling, iter.trace_buffer->data is used.
Even though, both point to same buffer, for readability, same convention
should be used.

Link: http://lkml.kernel.org/r/1434972306-20043-1-git-send-email-umesh.t@samsung.com

Signed-off-by: Umesh Tiwari <umesh.t@samsung.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:45 -04:00
Gil Fruchter
72917235fd tracing: Fix for non-continuous cpu ids
Currently exception occures due to access beyond buffer_iter
range while using index of cpu bigger than num_possible_cpus().
Below there is an example for such exception when we use
cpus 0,1,16,17.

In order to fix buffer allocation size for non-continuous cpu ids
we allocate according to the max cpu id and not according to the
amount of possible cpus.

Example:
  $ cat /sys/kernel/debug/tracing/per_cpu/cpu1/trace
  Path: /bin/busybox
  CPU: 0 PID: 82 Comm: cat Not tainted 4.0.0 #29
  task: 80734c80 ti: 80012000 task.ti: 80012000

  [ECR   ]: 0x00220100 => Invalid Read @ 0x00000000 by insn @ 0x800abafc
  [EFA   ]: 0x00000000
  [BLINK ]: ring_buffer_read_finish+0x24/0x64
  [ERET  ]: rb_check_pages+0x20/0x188
  [STAT32]: 0x00001a00 :
  BTA: 0x800abafc  SP: 0x80013f0c  FP: 0x57719cf8
  LPS: 0x200036b4 LPE: 0x200036b8 LPC: 0x00000000
  r00: 0x8002aca0 r01: 0x00001606 r02: 0x00000000
  r03: 0x00000001 r04: 0x00000000 r05: 0x804b4954
  r06: 0x00030003 r07: 0x8002a260 r08: 0x00000286
  r09: 0x00080002 r10: 0x00001006 r11: 0x807351a4
  r12: 0x00000001

  Stack Trace:
    rb_check_pages+0x20/0x188
    ring_buffer_read_finish+0x24/0x64
    tracing_release+0x4e/0x170
    __fput+0x62/0x158
    task_work_run+0xa2/0xd4
    do_notify_resume+0x52/0x7c
    resume_user_mode_begin+0xdc/0xe0

Link: http://lkml.kernel.org/r/1433835155-6894-3-git-send-email-gilf@ezchip.com

Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Gil Fruchter <gilf@ezchip.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:45 -04:00
Gil Fruchter
9fe6b778ca tracing: Prefer kcalloc over kzalloc with multiply
Use kcalloc for allocating an array instead of kzalloc with multiply,
as that is what kcalloc is used for.
Found with checkpatch.

Link: http://lkml.kernel.org/r/1433835155-6894-2-git-send-email-gilf@ezchip.com

Signed-off-by: Gil Fruchter <gilf@ezchip.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:42 -04:00
kbuild test robot
5d285a7f35 futex: Make should_fail_futex() static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brian Silverman <bsilver16384@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-20 21:43:54 +02:00
Davidlohr Bueso
1b0b7c1762 rtmutex: Delete scriptable tester
No one uses this anymore, and this is not the first time the
idea of replacing it with a (now possible) userspace side.
Lock stealing logic was removed long ago in when the lock
was granted to the highest prio.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1435782588-4177-2-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-20 11:45:45 +02:00
Davidlohr Bueso
ab51fbab39 futex: Fault/error injection capabilities
Although futexes are well known for being a royal pita,
we really have very little debugging capabilities - except
for relying on tglx's eye half the time.

By simply making use of the existing fault-injection machinery,
we can improve this situation, allowing generating artificial
uaddress faults and deadlock scenarios. Of course, when this is
disabled in production systems, the overhead for failure checks
is practically zero -- so this is very cheap at the same time.
Future work would be nice to now enhance trinity to make use of
this.

There is a special tunable 'ignore-private', which can filter
out private futexes. Given the tsk->make_it_fail filter and
this option, pi futexes can be narrowed down pretty closely.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-3-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-20 11:45:45 +02:00
Davidlohr Bueso
767f509ca1 futex: Enhance comments in futex_lock_pi() for blocking paths
... serves a bit better to clarify between blocking
and non-blocking code paths.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-2-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-20 11:45:45 +02:00
James Morris
fe6c59dc17 Merge tag 'seccomp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into next 2015-07-20 17:19:19 +10:00
Linus Torvalds
0e1dbccd8f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two families of fixes:

   - Fix an FPU context related boot crash on newer x86 hardware with
     larger context sizes than what most people test.  To fix this
     without ugly kludges or extensive reverts we had to touch core task
     allocator, to allow x86 to determine the task size dynamically, at
     boot time.

     I've tested it on a number of x86 platforms, and I cross-built it
     to a handful of architectures:

                                        (warns)               (warns)
       testing     x86-64:  -git:  pass (    0),  -tip:  pass (    0)
       testing     x86-32:  -git:  pass (    0),  -tip:  pass (    0)
       testing        arm:  -git:  pass ( 1359),  -tip:  pass ( 1359)
       testing       cris:  -git:  pass ( 1031),  -tip:  pass ( 1031)
       testing       m32r:  -git:  pass ( 1135),  -tip:  pass ( 1135)
       testing       m68k:  -git:  pass ( 1471),  -tip:  pass ( 1471)
       testing       mips:  -git:  pass ( 1162),  -tip:  pass ( 1162)
       testing    mn10300:  -git:  pass ( 1058),  -tip:  pass ( 1058)
       testing     parisc:  -git:  pass ( 1846),  -tip:  pass ( 1846)
       testing      sparc:  -git:  pass ( 1185),  -tip:  pass ( 1185)

     ... so I hope the cross-arch impact 'none', as intended.

     (by Dave Hansen)

   - Fix various NMI handling related bugs unearthed by the big asm code
     rewrite and generally make the NMI code more robust and more
     maintainable while at it.  These changes are a bit late in the
     cycle, I hope they are still acceptable.

     (by Andy Lutomirski)"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
  x86/fpu, sched: Dynamically allocate 'struct fpu'
  x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
  x86/nmi/64: Make the "NMI executing" variable more consistent
  x86/nmi/64: Minor asm simplification
  x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
  x86/nmi/64: Reorder nested NMI checks
  x86/nmi/64: Improve nested NMI comments
  x86/nmi/64: Switch stacks on userspace NMI entry
  x86/nmi/64: Remove asm code that saves CR2
  x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
2015-07-18 10:49:57 -07:00
Linus Torvalds
dae57fb64e Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "Fix for a misplaced export that can cause build failures in certain
  (rare) Kconfig situations"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Move the export of tick_broadcast_oneshot_control to the proper place
2015-07-18 10:49:11 -07:00
Linus Torvalds
d65b78f5d8 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "A oneliner rq throttling fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Test list head instead of list entry in throttle_cfs_rq()
2015-07-18 10:47:44 -07:00
Linus Torvalds
59ee762156 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "Misc irq fixes:

   - two driver fixes
   - a Xen regression fix
   - a nested irq thread crash fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gicv3-its: Fix mapping of LPIs to collections
  genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
  genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
  gpio/davinci: Fix race in installing chained irq handler
2015-07-18 10:27:12 -07:00
Ingo Molnar
5aaeb5c01c x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:51 +02:00
Dave Hansen
0c8c0f03e3 x86/fpu, sched: Dynamically allocate 'struct fpu'
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already.  This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct.  But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:35 +02:00
Paul E. McKenney
cdacbe1f91 rcu: Add fastpath bypassing funnel locking
In the common case, there will be only one expedited grace period in
the system at a given time, in which case it is not helpful to use
funnel locking.  This commit therefore adds a fastpath that bypasses
funnel locking when the root ->exp_funnel_mutex is not held.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:06 -07:00
Paul E. McKenney
32bb1c7999 rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
The grace-period kthread sleeps waiting to do a force-quiescent-state
scan, and when awakened sets rsp->gp_state to RCU_GP_DONE_FQS.
However, this is confusing because the kthread has not done the
force-quiescent-state, but is instead just starting to do it.  This commit
therefore renames RCU_GP_DONE_FQS to RCU_GP_DOING_FQS in order to make
things a bit easier on reviewers.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:05 -07:00
Paul E. McKenney
b9a425cfcb rcu: Pull out wait_event*() condition into helper function
The condition for the wait_event_interruptible_timeout() that waits
to do the next force-quiescent-state scan is a bit ornate:

	((gf = READ_ONCE(rsp->gp_flags)) &
	 RCU_GP_FLAG_FQS) ||
	(!READ_ONCE(rnp->qsmask) &&
	 !rcu_preempt_blocked_readers_cgp(rnp))

This commit therefore pulls this condition out into a helper function
and comments its component conditions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:04 -07:00
Paul E. McKenney
cf3620a6c7 rcu: Add stall warnings to synchronize_sched_expedited()
Although synchronize_sched_expedited() historically has no RCU CPU stall
warnings, the availability of the rcupdate.rcu_expedited boot parameter
invalidates the old assumption that synchronize_sched()'s stall warnings
would suffice.  This commit therefore adds RCU CPU stall warnings to
synchronize_sched_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:01 -07:00
Paul E. McKenney
2cd6ffafec rcu: Extend expedited funnel locking to rcu_data structure
The strictly rcu_node based funnel-locking scheme works well in many
cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get
all that much concurrency.  This commit therefore extends the funnel
locking into the per-CPU rcu_data structure, providing concurrency equal
to the number of CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:59:00 -07:00
Paul E. McKenney
704dd435ac rcu: Consolidate last open-coded expedited memory barrier
One of the requirements on RCU grace periods is that if there is a
causal chain of operations that starts after one grace period and
ends before another grace period, then the two grace periods must
be serialized.  There has been (and might still be) code that relies
on this, for example, certain types of reference-counting code that
does a call_rcu() within an RCU callback function.

This requirement is why there is an smp_mb() at the end of both
synchronize_sched_expedited() and synchronize_rcu_expedited().
However, this is the only smp_mb() in these functions, so it would
be nicer to consolidate it into rcu_exp_gp_seq_end().  This commit
does just that.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:59 -07:00
Paul E. McKenney
4f525a528b rcu: Apply rcu_seq operations to _rcu_barrier()
The rcu_seq operations were open-coded in _rcu_barrier(), so this commit
replaces the open-coding with the shiny new rcu_seq operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:57 -07:00
Paul E. McKenney
29fd930940 rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop
This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock()
polling loop in favor of the funnel-locking scheme that was abstracted
from synchronize_sched_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:56 -07:00
Paul E. McKenney
7fd0ddc5bf rcu: Fix synchronize_sched_expedited() type error for "s"
The type of "s" has been "long" rather than the correct "unsigned long"
for quite some time.  This commit fixes this type error.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:55 -07:00
Paul E. McKenney
b09e5f8601 rcu: Abstract funnel locking from synchronize_sched_expedited()
This commit abstracts funnel locking from synchronize_sched_expedited()
so that it may be used by synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17 14:58:53 -07:00