Roman Gushchin says:
====================
This patchset adds basic cgroup bpf operations to bpftool.
Right now there is no convenient way to perform these operations.
The /samples/bpf/load_sock_ops.c implements attach/detacg operations,
but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
bpf introspection, but lacks any cgroup-related specific.
I find having a tool to perform these basic operations in the kernel tree
very useful, as it can be used in the corresponding bpf documentation
without creating additional dependencies. And bpftool seems to be
a right tool to extend with such functionality.
v4:
- ATTACH_FLAGS and ATTACH_TYPE are listed and described in docs and usage
- ATTACH_FLAG names converted to "multi" and "override"
- do_attach() recognizes ATTACH_FLAG abbreviations, e.g "mul"
- Local variables sorted ("reverse Christmas tree")
- unknown attach flags value will be never truncated
v3:
- SRC replaced with OBJ in prog load docs
- Output unknown attach type in hex
- License header in SPDX format
- Minor style fixes (e.g. variable reordering)
v2:
- Added prog load operations
- All cgroup operations are looking like bpftool cgroup <command>
- All cgroup-related stuff is moved to a separate file
- Added support for attach flags
- Added support for attaching/detaching programs by id, pinned name, etc
- Changed cgroup detach arguments order
- Added empty json output for succesful programs
- Style fixed: includes order, strncmp and macroses, error handling
- Added man pages
v1:
https://lwn.net/Articles/740366/
====================
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds basic cgroup bpf operations to bpftool:
cgroup list, attach and detach commands.
Usage is described in the corresponding man pages,
and examples are provided.
Syntax:
$ bpftool cgroup list CGROUP
$ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
$ bpftool cgroup detach CGROUP ATTACH_TYPE PROG
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add the prog load command to load a bpf program from a specified
binary file and pin it to bpffs.
Usage description and examples are given in the corresponding man
page.
Syntax:
$ bpftool prog load OBJ FILE
FILE is a non-existing file on bpffs.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Libbpf picks the name of the first symbol in the corresponding
elf section to use as a program name. But without taking symbol's
scope into account it may end's up with some local label
as a program name. E.g.:
$ bpftool prog
1: type 15 name LBB0_10 tag 0390a5136ba23f5c
loaded_at Dec 07/17:22 uid 0
xlated 456B not jited memlock 4096B
Fix this by preferring global symbols as program name.
For instance:
$ bpftool prog
1: type 15 name bpf_prog1 tag 0390a5136ba23f5c
loaded_at Dec 07/17:26 uid 0
xlated 456B not jited memlock 4096B
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The bpf_prog_load() function will guess program type if it's not
specified explicitly. This functionality will be used to implement
loading of different programs without asking a user to specify
the program type. In first order it will be used by bpftool.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In ptdump_check_wx(), we pass walk_pgd() a start address of 0 (rather
than VA_START) for the init_mm. This means that any reported W&X
addresses are offset by VA_START, which is clearly wrong and can make
them appear like userspace addresses.
Fix this by telling the ptdump code that we're walking init_mm starting
at VA_START. We don't need to update the addr_markers, since these are
still valid bounds regardless.
Cc: <stable@vger.kernel.org>
Fixes: 1404d6f13e ("arm64: dump: Add checking for writable and exectuable pages")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Conform two stray warning messages to the standard overlayfs: prefix.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add a line for the total number of events and current average at the
bottom of the body.
Note that both values exclude child trace events. I.e. if drilldown is
activated via interactive command 'x', only the totals are accounted, or
we'd be counting these twice (see previous commit "tools/kvm_stat: fix
child trace events accounting").
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unhandled arguments, which could easily include typos, are simply
ignored. We should be strict to avoid undetected typos.
To reproduce start kvm_stat with an extra argument, e.g.
'kvm_stat -d bnuh5ol' and note that this will actually work.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Errors while parsing the '-g' command line argument result in display of
usage information prior to the error message. This is a bit confusing,
as the command line is syntactically correct.
To reproduce, run 'kvm_stat -g' and specify a non-existing or inactive
guest.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Passing an invalid regular expression on the command line results in a
traceback. Note that interactive specification of invalid regular
expressions is not affected
To reproduce, run "kvm_stat -f '*'".
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The man page update for this new functionality was omitted.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Child trace events were included in calculation of the overall total,
which is used for calculation of the percentages of the '%Total' column.
However, the parent trace envents' stats summarize the child trace
events, hence we'd incorrectly account for them twice, leading to
slightly wrong stats.
With this fix, we use the correct total. Consequently, the sum of the
child trace events' '%Total' column values is identical to the
respective value of the respective parent event. However, this also
means that the sum of the '%Total' column values will aggregate to more
than 100 percent.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 67fbcd62f5 ("tools/kvm_stat: add '-f help' to get the available
event list") added support for '-f help'. However, the extra handling of
'help' will also take effect when 'help' is specified as a regex in
interactive mode via 'f'. This results in display of all events while
only those matching this regex should be shown.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When updating the fields filter, tracepoint events of fields previously
not visible were not enabled, as TracepointProvider.update_fields()
updated the member variable directly instead of using the setter, which
triggers the event enable/disable.
To reproduce, run 'kvm_stat -f kvm_exit', press 'c' to remove the
filter, and notice that no add'l fields that do not match the regex
'kvm_exit' will appear.
This issue was introduced by commit c469117df0 ("tools/kvm_stat:
simplify initializers").
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When displaying debugfs events listed by guests, an attempt to switch to
reporting of stats for individual child trace events results in garbled
output. Reason is that when toggling drilldown, the update of the stats
doesn't honor when events are displayed by guests, as indicated by
Tui._display_guests.
To reproduce, run 'kvm_stat -d' and press 'b' followed by 'x'.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Specifying a guest via '-g foo' always results in an error:
$ kvm_stat -g foo
Usage: kvm_stat [options]
kvm_stat: error: Error while searching for guest "foo", use "-p" to
specify a pid instead
Reason is that Tui.get_pid_from_gname() is not static, as it is supposed
to be.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
------------[ cut here ]------------
Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
RIP: 0010:ex_handler_fprestore+0x88/0x90
Call Trace:
fixup_exception+0x4e/0x60
do_general_protection+0xff/0x270
general_protection+0x22/0x30
RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
kvm_apic_accept_events+0x1c0/0x240 [kvm]
kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
kvm_vcpu_ioctl+0x479/0x880 [kvm]
do_vfs_ioctl+0x142/0x9a0
SyS_ioctl+0x74/0x80
do_syscall_64+0x15f/0x600
where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
To fix it, move kvm_load_guest_fpu to the very beginning of
kvm_arch_vcpu_ioctl_run.
Cc: stable@vger.kernel.org
Fixes: f775b13eed
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The below test case can cause infinite loop in kvm when ept=0.
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdint.h>
#include <linux/kvm.h>
#include <fcntl.h>
#include <sys/ioctl.h>
long r[5];
int main()
{
r[2] = open("/dev/kvm", O_RDONLY);
r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
ioctl(r[4], KVM_RUN, 0);
}
It doesn't setup the memory regions, mmu_alloc_shadow/direct_roots() in
kvm return 1 when kvm fails to allocate root page table which can result
in beblow infinite loop:
vcpu_run() {
for (;;) {
r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
if (r <= 0)
break;
if (need_resched())
cond_resched();
}
}
This patch fixes it by returning -ENOSPC when there is no available kvm mmu
page for root page table.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 26eeb53cf0 (KVM: MMU: Bail out immediately if there is no available mmu page)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This case can been seen when creating the lease with the same objects passed.
[ 605.515097] 2 locks held by testapp/3337:
[ 605.519027] #0: (&dev->mode_config.idr_mutex){......}, at: [<ffff0000085f1664>] drm_mode_create_lease_ioctl+0x384/0x858
[ 605.530045] #1: (&dev->mode_config.idr_mutex){......}, at: [<ffff0000085f11bc>] drm_lease_destroy+0x2c/0x110
Which was causing the process to hang:
[ 605.398827] [<ffff0000080856cc>] __switch_to+0x94/0xa8
[ 605.404030] [<ffff000008c05d00>] __schedule+0x1b0/0x698
[ 605.409322] [<ffff000008c06224>] schedule+0x3c/0xa8
[ 605.414260] [<ffff000008c06628>] schedule_preempt_disabled+0x20/0x38
[ 605.420677] [<ffff000008c07370>] mutex_lock_nested+0x158/0x340
[ 605.426572] [<ffff0000085f11bc>] drm_lease_destroy+0x2c/0x110
[ 605.432389] [<ffff0000085cecf0>] drm_master_put+0xc0/0xc8
[ 605.437845] [<ffff0000085f175c>] drm_mode_create_lease_ioctl+0x47c/0x858
[ 605.444612] [<ffff0000085d4460>] drm_ioctl+0x198/0x448
[ 605.449811] [<ffff000008201134>] do_vfs_ioctl+0xa4/0x748
[ 605.455192] [<ffff000008201864>] SyS_ioctl+0x8c/0xa0
[ 605.460216] [<ffff000008082f4c>] __sys_trace_return+0x0/0x4
drm_mode_create_lease_ioctl() calls drm_lease_create() which acquires a lock
on dev->mode_config.idr_mutex. In case of failure, drm_lease_create() calls
drm_master_put() which in turn tries to acquire the same lock when calling
drm_lease_destroy().
v2: - Reverse the order at exit in case of fail, so that unlocking takes place
before dropping the reference.
- Include detail information about deadlock (Daniel Vetter)
Signed-off-by: Marius Vlad <marius-cristian.vlad@nxp.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213181048.32719-1-marius-cristian.vlad@nxp.com
- Clean up duplicate includes
- Remove ancient 'no-alloc' crap code that occasionally caused hard fs
shutdowns due to lack of proper space reservations
- Fix regression in FIEMAP behavior when reporting xattr extents
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJaK0JUAAoJEPh/dxk0SrTrWOcP/iDoE1nV8BHru8ynwCr0ABun
Hc+dmtQ1uQezu1qewzWkxH/zkyvpMBtH3wkqkYQApbPw7jSN4WDUazEGPY4Ju6pJ
gMyg64EEC6UEGN8B9M2mf1QB/Q/TjZSeFiKOLw78ikWYSG/dbf814zC2fyWO79eG
mjGzNbdvBbId35HLd62vd8VAW7zYY3acOyzQEl41LqKoGXD9eFWIh/uvH0bGuxN3
3YipW/PM7MBq+1rCi6pFVX+wt7pemi8hQ4vRZqMp24SB5JmvruP9E45iOt/8sep+
D/x1YjDyhutshAjbXyIaruxeIfsrs/r/3SAkOQgktwc8ihadBTJF3TPL9aTUGwLS
1dCL7Gd2Mx317yeHzSFs+FCq8pc+ioysbyZcCIlJPnhb1ZCaA98XD/desbNL/BY4
uf/Uq/5dJ6Kwllzol1VVz4CVKne4x1vQhPuIT1/wYsd2tSIYiBg+XlFV67CB7Fsv
9wRetybw2c22qINLNPc50tocGcormQT940PieketssFsOHa96GduT5Z5DEbZa7FV
/yk68o50VU2zlKuAMtTYbLT+uL/TimgeHU1pSCXOwT2wvJA/O5hVQEadIZ51cMct
KSFlY8xEGwDZM8S88Xf1H7yFmUpGvmAnIwPHCZSJur026rZMWeANl6MTZJTJSpTx
Wdj87C+2s5awNUcZmX0n
=cmic
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are a few more bug fixes & cleanups for 4.15-rc4:
- clean up duplicate includes
- remove ancient 'no-alloc' crap code that occasionally caused hard
fs shutdowns due to lack of proper space reservations
- fix regression in FIEMAP behavior when reporting xattr extents"
* tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: make iomap_begin functions trim iomaps consistently
xfs: remove "no-allocation" reservations for file creations
fs: xfs: remove duplicate includes
This pull request contains three small fixes that I'm hoping to get into
4.15-rc4:
* A fix to a typo in sys_riscv_flush_icache. This only effects error
handling, but I think it's a small and obvious enough change that it's
sane outside the merge window.
* The addition of smp_mb__after_spinlock(), which was recently removed
due to an incorrect comment. This is largly a comment change (as
there's a big one now), and while it's necessary for complience with
the RISC-V memory model the lack of this fence shouldn't manifest as a
bug on current implementations. Nonetheless, it still seems saner to
have the fence in 4.15.
* The removal of some of the HVC_RISCV_SBI driver that snuck into the
arch port. This is compile-time dead code in 4.15 (as the driver
isn't in yet), and during the review process we found a better way to
implement early printk on RISC-V. While this change doesn't do
anything, it will make staging our HVC driver easier: without this
change the HVC driver we hope to upstream won't build on 4.15 (because
the 4.15 arch code would reference a function that no longer exists).
Additionally, I'm instituting a bit of a process change so I don't
submit things too quickly again:
* All the patches I submit during an RC will be from the week before, so
everyone has gotten a chance to see them and they've made it through
our autobuilders and integration trees.
* I'll cherry-pick (single patches) or merge (if it's a patch set) on
top of the new RC on Monday morning.
* I'll sign the tag on Monday morning, to let the autobuilders pick up
exactly what I'm submitting.
* Assuming nothing goes wrong, I'll mail the pull request out on
Wednesday. If something goes wrong, I'll wait at least a day after
re-spinning the tag to let the autobuilders pick things up.
Hopefully this will avoid any headaches in the future, barring any
emergency fixes.
I don't think this is the last patch set we'll want for 4.15: I think
I'll want to remove some of the first-level irqchip driver that snuck in
as well, which will look a lot like the HVC patch here. This is pending
some asm-generic cleanup I'm doing that I haven't quite gotten clean
enough to send out yet, though, but hopefully it'll be ready by next
week (and still OK for that late).
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlou0sQTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQebsD/4gPSr0adXxl3hXhrHWQIcigWM1QNgp
EjNnsHiT1zLs9SNK+CBLQXEjXp/GwqfYDiE6yzaJEhoBm5rHB9c72IThCJkt6fRR
5fuybYXxBXaXVxq0CbaItNzTdqm99iqSCD2p3mh7Dmw/X23EglkYdPqQlUIxWTtw
BmlCaH8tn8PwGIgVATwJe8fWEe8tep8uPUW6NAMznLVCKKaOd56fkP+FVA1+M7u4
7kaoG6B8tg3mEbHdsfvo4VSII7HcHsaTa4tOaD7RPjEDKdxFQbzY8HUP38BunqOu
v9bsV0BXYvG5dHF0eWtQAfB+y5V7n1fipcfKtZl2HagN9hbKruwQMPZTO+xPT7Am
F/VyYxoirzfY5quA0ancS+DV54LfcHC5MH9Xvhacvg7yY3zYTvhsZfEaND0GAMLT
sqO3Mi24+yw5Cs24uSJEj6qkxxuHMZUePpzWJWET2vduXe+pqOYF8jQn13SqHyxs
QoIaw18RqcBPtX6vABFoMKP6WHPk8No3iki5DrSVSQ0jN9r3HSBxvYCIdG89P6A+
E0a65H91GhY/kVNJAE4YC8MF3e6D8/+KX9+h/q1wyueJV2tTE0cd0ZiQY7hN36HY
/QIf8zH4tmYhHAJFNLQnK1sg1Ipino6ABHNv76MhZMxguYD4FAE0nAHsfv1AAa97
Webx5zolwHEz8A==
=FRZX
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains three small fixes:
- A fix to a typo in sys_riscv_flush_icache. This only effects error
handling, but I think it's a small and obvious enough change that
it's sane outside the merge window.
- The addition of smp_mb__after_spinlock(), which was recently
removed due to an incorrect comment. This is largly a comment
change (as there's a big one now), and while it's necessary for
complience with the RISC-V memory model the lack of this fence
shouldn't manifest as a bug on current implementations.
Nonetheless, it still seems saner to have the fence in 4.15.
- The removal of some of the HVC_RISCV_SBI driver that snuck into the
arch port. This is compile-time dead code in 4.15 (as the driver
isn't in yet), and during the review process we found a better way
to implement early printk on RISC-V. While this change doesn't do
anything, it will make staging our HVC driver easier: without this
change the HVC driver we hope to upstream won't build on 4.15
(because the 4.15 arch code would reference a function that no
longer exists).
I don't think this is the last patch set we'll want for 4.15: I think
I'll want to remove some of the first-level irqchip driver that snuck
in as well, which will look a lot like the HVC patch here. This is
pending some asm-generic cleanup I'm doing that I haven't quite gotten
clean enough to send out yet, though, but hopefully it'll be ready by
next week (and still OK for that late)"
* tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
RISC-V: Remove unused CONFIG_HVC_RISCV_SBI code
RISC-V: Resurrect smp_mb__after_spinlock()
RISC-V: Logical vs Bitwise typo
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-13
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Addition of explicit scheduling points to map alloc/free
in order to avoid having to hold the CPU for too long,
from Eric.
2) Fixing of a corruption in overlapping perf_event_output
calls from different BPF prog types on the same CPU out
of different contexts, from Daniel.
3) Fallout fixes for recent correction of broken uapi for
BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header
that needed to be pulled in from asm-generic and for
BPF selftests the asm-generic include did not work,
so similar asm include scheme was adapted for that
problematic header that perf is having with other
header files under tools, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
PROBE_DEFER also uses system_wq to reprobe drivers, which means when
that again fails, and we try to flush the overall system_wq (to get
all the delayed connectore cleanup work_struct completed), we
deadlock.
Fix this by using just a single cleanup work, so that we can only
flush that one and don't block on anything else. That means a free
list plus locking, a standard pattern.
v2:
- Correctly free connectors only on last ref. Oops (Chris).
- use llist_head/node (Chris).
v3
- Add init_llist_head (Chris).
Fixes: a703c55004 ("drm: safely free connectors from connector_iter")
Fixes: 613051dac4 ("drm: locking&new iterators for connector_list")
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: <stable@vger.kernel.org> # v4.11+: 613051dac4 ("drm: locking&new iterators for connector_list"
Cc: <stable@vger.kernel.org> # v4.11+
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Javier Martinez Canillas <javier@dowhile0.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Matt Hart <matthew.hart@linaro.org>
Cc: Thierry Escande <thierry.escande@collabora.co.uk>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213124936.17914-1-daniel.vetter@ffwll.ch
Commit f371b304f1 ("bpf/tracing: allow user space to
query prog array on the same tp") introduced a perf
ioctl command to query prog array attached to the
same perf tracepoint. The commit introduced a
compilation error under certain config conditions, e.g.,
(1). CONFIG_BPF_SYSCALL is not defined, or
(2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
nor CONFIG_KPROBE_EVENTS is defined.
Error message:
kernel/events/core.o: In function `perf_ioctl':
core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'
This patch fixed this error by guarding the real definition under
CONFIG_BPF_EVENTS and provided static inline dummy function
if CONFIG_BPF_EVENTS was not defined.
It renamed the function from bpf_event_query_prog_array to
perf_event_query_prog_array and moved the definition from linux/bpf.h
to linux/trace_events.h so the definition is in proximity to
other prog_array related functions.
Fixes: f371b304f1 ("bpf/tracing: allow user space to query prog array on the same tp")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tariq Toukan says:
====================
mlx4 misc fixes
This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.
Patch 1 by Eugenia fixes an MTU issue in selftest.
Patch 2 by Eran fixes an accounting issue in the resource tracker.
Patch 3 by Eran fixes a race condition that causes counter inconsistency.
Series generated against net commit:
200809716a fou: fix some member types in guehdr
v2:
Patch 2: Add reviewer credit, rephrase commit message.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this patch, the stats_lock was acquired twice. In between the
locks Driver sent command to gather some more statistics (per priority
and counter statistics). If the stats lock was acquired by get
statistics NDO in between we would have report out of sync counters.
Fix this by collecting all stats from Firmware in advance and then
fill the Software structs under one lock.
Fixes: 0b131561a7 ("net/mlx4_en: Add Flow control statistics display via ethtool")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field res_free indicates the total number of counters which are
available for allocation (reserved and unreserved). Fixed a bug where
the reserved counters were subtracted from res_free before any
allocation was performed.
Before this fix, free counters which were not reserved could not be
allocated.
Fixes: 9de92c60be ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the minimal MTU threshold for running loopback selftest.
MTU should be big enough to include packet payload, NET_IP_ALIGN,
Ethernet headers and preamble length.
Fixes: e7c1c2c462 ("mlx4_en: Added self diagnostics test implementation")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify lan9303_indirect_phy_wait_for_completion()
and lan9303_switch_wait_for_completion() by using a new function
lan9303_read_wait()
Changes v1 -> v2:
- param 'mask' type u32
- removed param 'value' (will probably never be used)
- add newline before return
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When in SGMII-to-Copper mode, the fiber page is used for the MAC facing
link, and does not require configuration of the fiber auto-negotiation
settings. Avoid trying.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu will join to maintain dwc-xlgmac.
He will help with new feature development for
this driver. Thanks Jose and welcome on board!
Signed-off-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only the retransmit timer currently refreshes tcp_mstamp
We should do the same for delayed acks and keepalives.
Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.
Fixes: 385e20706f ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ms timestamp is used, current logic uses 1us in
tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us.
This could cause rcv_rtt underestimation.
Fix it by always using a min value of 1ms if ms timestamp is used.
Fixes: 645f4c6f2e ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
hv_netvsc: minor changes
This includes minor cleanup of code in send and receive path and
also a new statistic to check for allocation failures. This also
eliminates some of the extra RCU when not needed.
There is a theoritical bug where buffered data could be blocked
for longer than necessary if the ring buffer got full. This
has not been seen in the wild, found by inspection.
The reference count between net device and internal RNDIS
is not needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If the transmit queue is known full, then don't keep aggregating
data. And the cp_partial flag which indicates that the current
aggregation buffer is full can be folded in to avoid more
conditionals.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is only ever a single instance of network device object
referencing the internal rndis object. Therefore the open_cnt atomic
is not necessary.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc_receive_callback function was using RCU to find the
appropriate underlying netvsc_device. Since calling function already
had that pointer, this was unnecessary.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The caller (netvsc_receive) already has the net device pointer,
and should just pass that to functions rather than the hyperv device.
This eliminates several impossible error paths in the process.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb can not be allocated, update ethtool statisitics
rather than rx_dropped which is intended for netif_receive.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since only caller does not care about return value.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
PHYLINK preparatory patches for DSA
In preparation for having DSA migrate to PHYLINK, I had to come up with a
number of preparatory patches:
- we need to be able to pass phy_flags from an external component calling
phylink_of_phy_connect()
- DSA tries to connect through OF first, then fallsback using its own internal
MDIO bus, in that case we would both show an error, but also not know what
the correct phy_interface_t would be, instead use the PHY device/driver provided
one
- Finally bcm_sf2 makes use of all possible PHYs out there: internal, external,
fixed, and MoCA, the latter requires a bit of help to signal link notifications
through a MMIO interrupt, as well a report a correct PORT type
Changes in v2:
- rebased against latest net-next/master
- added kernel doc documentation
- dropped error message in phylink_of_phy_connect() as suggested by Russell
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to what PHYLIB already does, make sure that
PHY_INTERFACE_MODE_MOCA is reported as PORT_BNC.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
phylink_get_fixed_state() currently consults an optional "link_gpio"
GPIO descriptor, expand this mechanism to allow specifying a custom
callback. This is necessary to support out of band link notifcation
(e.g: from an interrupt within a MMIO register).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some subsystems like DSA may be trying to connect to a PHY through OF first,
and then attempt a connect using a local MDIO bus, remove the error message:
"unable to find PHY node" so we can let MAC drivers whether to print it or not.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
We may not always be able to resolve a correct phy_interface_t value before
actually connecting to the PHY device, when that happens, just have
phylink_connect_phy() utilize what the PHY device/driver provided.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to let subsystems like DSA fully utilize PHYLINK, we need to be able
to communicate phy_device::flags from of_phy_{connect,attach} even when using
PHYLINK APIs.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, active Fast Open is paused on a specific
destination IP address if the previous connections to the
IP address have experienced recurring timeouts . But recent
experiments by Microsoft (https://goo.gl/cykmn7) and Mozilla
browsers indicate the isssue is often caused by broken middle-boxes
sitting close to the client. Therefore it is much better user
experience if Fast Open is disabled out-right globally to avoid
experiencing further timeouts on connections toward other
destinations.
This patch changes the destination-IP disablement to global
disablement if a connection experiencing recurring timeouts
or aborts due to timeout. Repeated incidents would still
exponentially increase the pause time, starting from an hour.
This is extremely conservative but an unfortunate compromise to
minimize bad experience due to broken middle-boxes.
Reported-by: Dragana Damjanovic <ddamjanovic@mozilla.com>
Reported-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not correct to return NULL when that is actually an error and
function returns errors in any other wrong case. In the same time,
the cpsw driver and davinci emac doesn't check error case while
creating channel and it can miss actual error. Also remove WARNs
replacing them on dev_err msgs.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>