Commit graph

724073 commits

Author SHA1 Message Date
Daniel Borkmann
7310c23328 Merge branch 'bpf-bpftool-cgroup-ops'
Roman Gushchin says:

====================
This patchset adds basic cgroup bpf operations to bpftool.

Right now there is no convenient way to perform these operations.
The /samples/bpf/load_sock_ops.c implements attach/detach operations,
but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
bpf introspection, but lacks anything cgroup-specific.

I find having a tool to perform these basic operations in the kernel tree
very useful, as it can be used in the corresponding bpf documentation
without creating additional dependencies. And bpftool seems to be
the right tool to extend with such functionality.

v4:
  - ATTACH_FLAGS and ATTACH_TYPE are listed and described in docs and usage
  - ATTACH_FLAG names converted to "multi" and "override"
  - do_attach() recognizes ATTACH_FLAG abbreviations, e.g "mul"
  - Local variables sorted ("reverse Christmas tree")
  - an unknown attach flags value is never truncated

v3:
  - SRC replaced with OBJ in prog load docs
  - Output unknown attach type in hex
  - License header in SPDX format
  - Minor style fixes (e.g. variable reordering)

v2:
  - Added prog load operations
  - All cgroup operations now take the form bpftool cgroup <command>
  - All cgroup-related stuff is moved to a separate file
  - Added support for attach flags
  - Added support for attaching/detaching programs by id, pinned name, etc
  - Changed cgroup detach arguments order
  - Added empty json output for successful programs
  - Style fixed: includes order, strncmp and macros, error handling
  - Added man pages

v1:
  https://lwn.net/Articles/740366/
====================

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14 13:37:14 +01:00
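The v4 notes of the merge above mention that do_attach() recognizes ATTACH_FLAG abbreviations such as "mul". A minimal Python sketch of that kind of prefix matching follows. It is a user-space model, not bpftool's C code: the function name `parse_attach_flag`, the table layout, and the rule that an ambiguous or empty prefix is rejected are all assumptions made for illustration.

```python
# Hypothetical model of abbreviation matching for attach flags.
# The flag names come from the commit message; the numeric values and
# the lookup function are illustrative assumptions.
ATTACH_FLAGS = {"multi": 2, "override": 1}


def parse_attach_flag(word):
    """Return the value of the single flag whose name starts with `word`.

    Returns None for an empty word, an unknown word, or a prefix that
    matches more than one flag name.
    """
    if not word:
        return None
    matches = [value for name, value in ATTACH_FLAGS.items()
               if name.startswith(word)]
    return matches[0] if len(matches) == 1 else None
```

With this table, "mul" and "multi" resolve to the same flag, while an unrelated word resolves to nothing.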
Roman Gushchin
5ccda64d38 bpftool: implement cgroup bpf operations
This patch adds basic cgroup bpf operations to bpftool:
cgroup list, attach and detach commands.

Usage is described in the corresponding man pages,
and examples are provided.

Syntax:
$ bpftool cgroup list CGROUP
$ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
$ bpftool cgroup detach CGROUP ATTACH_TYPE PROG

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14 13:37:13 +01:00
Roman Gushchin
49a086c201 bpftool: implement prog load command
Add the prog load command to load a bpf program from a specified
binary file and pin it to bpffs.

Usage description and examples are given in the corresponding man
page.

Syntax:
$ bpftool prog load OBJ FILE

FILE is a non-existing file on bpffs.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14 13:37:13 +01:00
Roman Gushchin
fe4d44b23f libbpf: prefer global symbols as bpf program name source
Libbpf picks the name of the first symbol in the corresponding
ELF section to use as a program name. But without taking the symbol's
scope into account, it may end up with some local label
as a program name. E.g.:

$ bpftool prog
1: type 15  name LBB0_10    tag 0390a5136ba23f5c
	loaded_at Dec 07/17:22  uid 0
	xlated 456B  not jited  memlock 4096B

Fix this by preferring global symbols as program name.

For instance:
$ bpftool prog
1: type 15  name bpf_prog1  tag 0390a5136ba23f5c
	loaded_at Dec 07/17:26  uid 0
	xlated 456B  not jited  memlock 4096B

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14 13:37:13 +01:00
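The selection rule in commit fe4d44b23f above (prefer a global symbol over a local label such as LBB0_10) can be sketched as follows. This is a Python model, not libbpf code: symbols are represented as hypothetical `(name, binding)` tuples, and falling back to the first symbol when no global one exists is an assumption, since the message does not say what happens in that case.

```python
def pick_program_name(symbols):
    """Pick a program name from the symbols of one ELF section.

    `symbols` is a list of (name, binding) tuples in section order, where
    binding is "GLOBAL" or "LOCAL" (a simplified, hypothetical encoding).
    """
    for name, binding in symbols:
        if binding == "GLOBAL":
            return name
    # Assumed fallback: no global symbol, so use the first one, if any.
    return symbols[0][0] if symbols else None
```

For a section whose first symbol is the local label LBB0_10 and whose second is the global bpf_prog1, the model returns bpf_prog1, matching the "after" output in the message.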
Roman Gushchin
583c90097f libbpf: add ability to guess program type based on section name
The bpf_prog_load() function will guess program type if it's not
specified explicitly. This functionality will be used to implement
loading of different programs without asking a user to specify
the program type. Its first user will be bpftool.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-14 13:37:13 +01:00
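Guessing a program type from the section name, as described in commit 583c90097f above, amounts to a prefix lookup. The sketch below is a Python model under stated assumptions: the table is a small illustrative subset and is not claimed to be the list that libbpf ships, and `guess_program_type` is a hypothetical name.

```python
# Illustrative (prefix, program type) pairs; an assumed subset only.
SECTION_PREFIXES = [
    ("socket", "BPF_PROG_TYPE_SOCKET_FILTER"),
    ("kprobe/", "BPF_PROG_TYPE_KPROBE"),
    ("xdp", "BPF_PROG_TYPE_XDP"),
    ("cgroup/skb", "BPF_PROG_TYPE_CGROUP_SKB"),
    ("sockops", "BPF_PROG_TYPE_SOCK_OPS"),
]


def guess_program_type(section_name):
    """Return the type of the first prefix that matches, or None."""
    for prefix, prog_type in SECTION_PREFIXES:
        if section_name.startswith(prefix):
            return prog_type
    return None
```

A caller would use the guess only when the user has not specified a type explicitly, and would have to report an error when the result is None.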
Mark Rutland
1d08a044cf arm64: fix CONFIG_DEBUG_WX address reporting
In ptdump_check_wx(), we pass walk_pgd() a start address of 0 (rather
than VA_START) for the init_mm. This means that any reported W&X
addresses are offset by VA_START, which is clearly wrong and can make
them appear like userspace addresses.

Fix this by telling the ptdump code that we're walking init_mm starting
at VA_START. We don't need to update the addr_markers, since these are
still valid bounds regardless.

Cc: <stable@vger.kernel.org>
Fixes: 1404d6f13e ("arm64: dump: Add checking for writable and exectuable pages")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-14 10:18:23 +00:00
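The effect described in commit 1d08a044cf above can be illustrated with a small arithmetic model: if the reported address is computed as the walk's start address plus an offset, starting at 0 instead of VA_START shifts every report by exactly VA_START. The constants below assume one particular configuration (48-bit virtual addresses, 4K pages) and the helper name is hypothetical; none of this is the ptdump code itself.

```python
VA_START = 0xFFFF000000000000   # assumed: 48-bit VA configuration
PGDIR_SIZE = 1 << 39            # assumed: 4K pages, four translation levels


def reported_address(walk_start, pgd_index):
    """Address a page-table walk would report for a given top-level entry."""
    return walk_start + pgd_index * PGDIR_SIZE
```

In this model `reported_address(0, 3)` falls below VA_START and so looks like a userspace address, while `reported_address(VA_START, 3)` is the kernel address that was actually walked.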
Amir Goldstein
da2e6b7eed ovl: fix overlay: warning prefix
Conform two stray warning messages to the standard overlayfs: prefix.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-14 11:14:52 +01:00
Stefan Raspl
cf656c7661 tools/kvm_stat: add line for totals
Add a line for the total number of events and current average at the
bottom of the body.
Note that both values exclude child trace events. I.e. if drilldown is
activated via interactive command 'x', only the totals are accounted, or
we'd be counting these twice (see previous commit "tools/kvm_stat: fix
child trace events accounting").

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:47 +01:00
Stefan Raspl
73fab6ffbd tools/kvm_stat: stop ignoring unhandled arguments
Unhandled arguments, which could easily include typos, are simply
ignored. We should be strict to avoid undetected typos.
To reproduce, start kvm_stat with an extra argument, e.g.
'kvm_stat -d bnuh5ol', and note that this will actually work.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:46 +01:00
Stefan Raspl
822cfe3e48 tools/kvm_stat: suppress usage information on command line errors
Errors while parsing the '-g' command line argument result in display of
usage information prior to the error message. This is a bit confusing,
as the command line is syntactically correct.
To reproduce, run 'kvm_stat -g' and specify a non-existing or inactive
guest.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:46 +01:00
Stefan Raspl
08e20a6300 tools/kvm_stat: handle invalid regular expressions
Passing an invalid regular expression on the command line results in a
traceback. Note that interactive specification of invalid regular
expressions is not affected.
To reproduce, run "kvm_stat -f '*'".

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:45 +01:00
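For commit 08e20a6300 above, the traceback comes from compiling a pattern such as '*', which Python's `re` module rejects. A minimal sketch of validating the filter up front is shown below; the helper name `valid_fields_filter` is hypothetical and the actual kvm_stat change may be structured differently.

```python
import re


def valid_fields_filter(pattern):
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

A command line parser could call such a check and print a short error message instead of letting the exception propagate.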
Stefan Raspl
f3d11b0e86 tools/kvm_stat: add hint on '-f help' to man page
The man page update for this new functionality was omitted.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:44 +01:00
Stefan Raspl
fff8c9eb48 tools/kvm_stat: fix child trace events accounting
Child trace events were included in calculation of the overall total,
which is used for calculation of the percentages of the '%Total' column.
However, the parent trace events' stats summarize the child trace
events, hence we'd incorrectly account for them twice, leading to
slightly wrong stats.
With this fix, we use the correct total. Consequently, the sum of the
child trace events' '%Total' column values is identical to the
respective value of the respective parent event. However, this also
means that the sum of the '%Total' column values will aggregate to more
than 100 percent.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:44 +01:00
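The accounting rule of commit fff8c9eb48 above can be sketched in a few lines: the overall total counts parent events only, and every percentage is taken against that total. The naming convention used to recognize a child event ("parent(child)") is an assumption for this model, as are the function names.

```python
def is_child_event(name):
    # Assumption: child trace events are named "parent(child)".
    return "(" in name


def percent_of_total(stats):
    """Map each event to its share of the total, excluding children
    from the total so they are not counted twice."""
    total = sum(v for k, v in stats.items() if not is_child_event(k))
    if not total:
        return {}
    return {k: 100.0 * v / total for k, v in stats.items()}
```

With `{"kvm_exit": 80, "kvm_exit(HLT)": 50, "kvm_exit(IO)": 30, "kvm_entry": 20}` the two children add up to the parent's 80 percent, and the column as a whole sums to more than 100 percent, as the message notes.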
Stefan Raspl
b74faa930d tools/kvm_stat: fix extra handling of 'help' with fields filter
Commit 67fbcd62f5 ("tools/kvm_stat: add '-f help' to get the available
event list") added support for '-f help'. However, the extra handling of
'help' will also take effect when 'help' is specified as a regex in
interactive mode via 'f'. This results in display of all events while
only those matching this regex should be shown.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:43 +01:00
Stefan Raspl
67c162b089 tools/kvm_stat: fix missing field update after filter change
When updating the fields filter, tracepoint events of fields previously
not visible were not enabled, as TracepointProvider.update_fields()
updated the member variable directly instead of using the setter, which
triggers the event enable/disable.
To reproduce, run 'kvm_stat -f kvm_exit', press 'c' to remove the
filter, and notice that no additional fields that do not match the regex
'kvm_exit' will appear.
This issue was introduced by commit c469117df0 ("tools/kvm_stat:
simplify initializers").

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:42 +01:00
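The bug class in commit 67c162b089 above (assigning to the backing member instead of going through a setter that has side effects) can be reproduced with a small Python property. The class below is a hypothetical stand-in, not the TracepointProvider from kvm_stat; only the setter pattern is modeled.

```python
class TracepointProviderModel:
    """Hypothetical stand-in that models only the setter pattern."""

    def __init__(self):
        self._fields = []
        self.enabled = set()

    @property
    def fields(self):
        return self._fields

    @fields.setter
    def fields(self, fields):
        self._fields = fields
        self.enabled = set(fields)  # side effect: enable matching events

    def update_fields_buggy(self, fields):
        self._fields = fields       # bypasses the setter, nothing is enabled

    def update_fields_fixed(self, fields):
        self.fields = fields        # goes through the setter
```

After `update_fields_buggy()` the visible field list changes but `enabled` stays stale, which mirrors the symptom of newly visible fields never receiving events.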
Stefan Raspl
faa0665041 tools/kvm_stat: fix drilldown in events-by-guests mode
When displaying debugfs events listed by guests, an attempt to switch to
reporting of stats for individual child trace events results in garbled
output. Reason is that when toggling drilldown, the update of the stats
doesn't honor when events are displayed by guests, as indicated by
Tui._display_guests.
To reproduce, run 'kvm_stat -d' and press 'b' followed by 'x'.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:42 +01:00
Stefan Raspl
19e8e54f43 tools/kvm_stat: fix command line option '-g'
Specifying a guest via '-g foo' always results in an error:
  $ kvm_stat -g foo
  Usage: kvm_stat [options]

  kvm_stat: error: Error while searching for guest "foo", use "-p" to
  specify a pid instead

Reason is that Tui.get_pid_from_gname() is not static, as it is supposed
to be.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:25:41 +01:00
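Commit 19e8e54f43 above attributes the failure to a method that should have been static. One way such a failure can arise in Python is sketched below: a method written without `self` and without `@staticmethod` raises TypeError when called on an instance. How kvm_stat actually invokes Tui.get_pid_from_gname() is not stated in the message, so the class, method names, and call path here are assumptions.

```python
class TuiModel:
    def lookup_plain(gname):            # no self and no decorator
        return 1234 if gname == "foo" else None

    @staticmethod
    def lookup_static(gname):           # callable via class or instance
        return 1234 if gname == "foo" else None


def call_on_instance(method_name, gname):
    """Call the named method on an instance; report TypeError as a string."""
    try:
        return getattr(TuiModel(), method_name)(gname)
    except TypeError:
        return "TypeError"
```

The undecorated variant receives the instance as its only positional argument and then rejects the extra `gname`, whereas the decorated variant returns the pid.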
Peter Xu
5663d8f9bb kvm: x86: fix WARN due to uninitialized guest FPU state
------------[ cut here ]------------
 Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
 WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
 CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G    B      OE    4.15.0-rc2+ #10
 RIP: 0010:ex_handler_fprestore+0x88/0x90
 Call Trace:
  fixup_exception+0x4e/0x60
  do_general_protection+0xff/0x270
  general_protection+0x22/0x30
 RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
 RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
  kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
  kvm_apic_accept_events+0x1c0/0x240 [kvm]
  kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
  kvm_vcpu_ioctl+0x479/0x880 [kvm]
  do_vfs_ioctl+0x142/0x9a0
  SyS_ioctl+0x74/0x80
  do_syscall_64+0x15f/0x600

where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
To fix it, move kvm_load_guest_fpu to the very beginning of
kvm_arch_vcpu_ioctl_run.

Cc: stable@vger.kernel.org
Fixes: f775b13eed
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:24:35 +01:00
Wanpeng Li
d73235d17b KVM: X86: Fix load RFLAGS w/o the fixed bit
*** Guest State ***
 CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
 CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
 CR3 = 0x00000000fffbc000
 RSP = 0x0000000000000000  RIP = 0x0000000000000000
 RFLAGS=0x00000000         DR7 = 0x0000000000000400
        ^^^^^^^^^^

The failed vmentry is triggered by the following testcase when ept=Y:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
    	r[2] = open("/dev/kvm", O_RDONLY);
    	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
    	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
    	struct kvm_regs regs = {
    		.rflags = 0,
    	};
    	ioctl(r[4], KVM_SET_REGS, &regs);
    	ioctl(r[4], KVM_RUN, 0);
    }

X86 RFLAGS bit 1 is fixed to 1. Userspace can simply clear bit 1
of RFLAGS with the KVM_SET_REGS ioctl, which makes the vmentry fail.
This patch fixes it by ORing in X86_EFLAGS_FIXED during the ioctl.

Cc: stable@vger.kernel.org
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Quan Xu <quan.xu0@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:24:26 +01:00
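The fix in commit d73235d17b above is a single OR with the fixed bit. A user-space Python model of that operation is shown below; `sanitize_rflags` is a hypothetical name, and the kernel applies the OR inside its KVM_SET_REGS handling rather than in a helper like this.

```python
X86_EFLAGS_FIXED = 0x2  # bit 1 of RFLAGS, which always reads as 1


def sanitize_rflags(rflags):
    """Force the fixed bit on, leaving all other bits untouched."""
    return rflags | X86_EFLAGS_FIXED
```

For the test case in the message, where userspace passes `.rflags = 0`, the model yields 0x2 instead of the invalid value 0.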
Wanpeng Li
ed52870f46 KVM: MMU: Fix infinite loop when there is no available mmu page
The below test case can cause an infinite loop in kvm when ept=0.

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
    	r[2] = open("/dev/kvm", O_RDONLY);
    	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
    	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
    	ioctl(r[4], KVM_RUN, 0);
    }

It doesn't set up the memory regions. mmu_alloc_shadow/direct_roots() in
kvm return 1 when kvm fails to allocate the root page table, which can result
in the infinite loop below:

    vcpu_run() {
    	for (;;) {
    		r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
    		if (r <= 0)
    			break;
    		if (need_resched())
    			cond_resched();
    	}
    }

This patch fixes it by returning -ENOSPC when there is no available kvm mmu
page for root page table.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 26eeb53cf0 (KVM: MMU: Bail out immediately if there is no available mmu page)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:24:14 +01:00
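The loop behaviour in commit ed52870f46 above depends only on the sign of the return value: a positive value keeps the loop running, while zero or a negative errno ends it. The Python model below makes that visible with an iteration cap so it cannot hang. The function name and the cap are illustrative assumptions; this is not the kernel's vcpu_run().

```python
ENOSPC = 28  # errno value on Linux


def vcpu_run_model(enter_guest, max_iterations=1000):
    """Run the loop from the commit message with a safety cap.

    Returns (r, iterations). r is None when the cap was reached, which
    stands in for the infinite loop.
    """
    for i in range(1, max_iterations + 1):
        r = enter_guest()
        if r <= 0:
            return r, i
    return None, max_iterations
```

`vcpu_run_model(lambda: 1)` runs into the cap, while `vcpu_run_model(lambda: -ENOSPC)` leaves after the first iteration with the error code.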
Marius Vlad
bd36d3bab2 drm/drm_lease: Prevent deadlock in case drm_lease_create() fails
This case can be seen when creating the lease with the same objects passed.

[  605.515097] 2 locks held by testapp/3337:
[  605.519027]  #0:  (&dev->mode_config.idr_mutex){......}, at: [<ffff0000085f1664>] drm_mode_create_lease_ioctl+0x384/0x858
[  605.530045]  #1:  (&dev->mode_config.idr_mutex){......}, at: [<ffff0000085f11bc>] drm_lease_destroy+0x2c/0x110

Which was causing the process to hang:

[  605.398827] [<ffff0000080856cc>] __switch_to+0x94/0xa8
[  605.404030] [<ffff000008c05d00>] __schedule+0x1b0/0x698
[  605.409322] [<ffff000008c06224>] schedule+0x3c/0xa8
[  605.414260] [<ffff000008c06628>] schedule_preempt_disabled+0x20/0x38
[  605.420677] [<ffff000008c07370>] mutex_lock_nested+0x158/0x340
[  605.426572] [<ffff0000085f11bc>] drm_lease_destroy+0x2c/0x110
[  605.432389] [<ffff0000085cecf0>] drm_master_put+0xc0/0xc8
[  605.437845] [<ffff0000085f175c>] drm_mode_create_lease_ioctl+0x47c/0x858
[  605.444612] [<ffff0000085d4460>] drm_ioctl+0x198/0x448
[  605.449811] [<ffff000008201134>] do_vfs_ioctl+0xa4/0x748
[  605.455192] [<ffff000008201864>] SyS_ioctl+0x8c/0xa0
[  605.460216] [<ffff000008082f4c>] __sys_trace_return+0x0/0x4

drm_mode_create_lease_ioctl() calls drm_lease_create() which acquires a lock
on dev->mode_config.idr_mutex. In case of failure, drm_lease_create() calls
drm_master_put() which in turn tries to acquire the same lock when calling
drm_lease_destroy().

v2: - Reverse the order at exit in case of failure, so that unlocking takes place
      before dropping the reference.
    - Include detailed information about the deadlock (Daniel Vetter)

Signed-off-by: Marius Vlad <marius-cristian.vlad@nxp.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213181048.32719-1-marius-cristian.vlad@nxp.com
2017-12-14 08:25:37 +01:00
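The self-deadlock in commit bd36d3bab2 above, and the v2 ordering fix, can be modeled with a non-reentrant lock. In the Python sketch below the nested acquisition is non-blocking so the demonstration reports the problem instead of hanging. All function names are hypothetical stand-ins for the drm functions named in the message, and the model leaves out reference counting entirely.

```python
import threading

idr_mutex = threading.Lock()  # non-reentrant, like the mutex in the report


def lease_destroy():
    """Stand-in for the destroy path, which needs idr_mutex itself."""
    if not idr_mutex.acquire(blocking=False):
        return "would deadlock"
    idr_mutex.release()
    return "ok"


def create_lease_buggy():
    # Error path drops the reference while idr_mutex is still held.
    with idr_mutex:
        return lease_destroy()


def create_lease_fixed():
    # Error path unlocks first, then drops the reference.
    idr_mutex.acquire()
    idr_mutex.release()
    return lease_destroy()
```

The buggy ordering reports "would deadlock" because the same thread already holds the lock, whereas the fixed ordering completes normally.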
Linus Torvalds
7c5cac1bc7 Changes since last update:
- Clean up duplicate includes
 - Remove ancient 'no-alloc' crap code that occasionally caused hard fs
   shutdowns due to lack of proper space reservations
 - Fix regression in FIEMAP behavior when reporting xattr extents

Merge tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Here are a few more bug fixes & cleanups for 4.15-rc4:

   - clean up duplicate includes

   - remove ancient 'no-alloc' crap code that occasionally caused hard
     fs shutdowns due to lack of proper space reservations

   - fix regression in FIEMAP behavior when reporting xattr extents"

* tag 'xfs-4.15-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: make iomap_begin functions trim iomaps consistently
  xfs: remove "no-allocation" reservations for file creations
  fs: xfs: remove duplicate includes
2017-12-13 20:15:49 -08:00
Linus Torvalds
4e746cf4f7 RISC-V Fixes for 4.15-rc4
This pull request contains three small fixes that I'm hoping to get into
 4.15-rc4:
 
 * A fix to a typo in sys_riscv_flush_icache.  This only affects error
   handling, but I think it's a small and obvious enough change that it's
   sane outside the merge window.
 * The addition of smp_mb__after_spinlock(), which was recently removed
   due to an incorrect comment.  This is largely a comment change (as
   there's a big one now), and while it's necessary for compliance with
   the RISC-V memory model the lack of this fence shouldn't manifest as a
   bug on current implementations.  Nonetheless, it still seems saner to
   have the fence in 4.15.
 * The removal of some of the HVC_RISCV_SBI driver that snuck into the
   arch port.  This is compile-time dead code in 4.15 (as the driver
   isn't in yet), and during the review process we found a better way to
   implement early printk on RISC-V.  While this change doesn't do
   anything, it will make staging our HVC driver easier: without this
   change the HVC driver we hope to upstream won't build on 4.15 (because
   the 4.15 arch code would reference a function that no longer exists).
 
 Additionally, I'm instituting a bit of a process change so I don't
 submit things too quickly again:
 
 * All the patches I submit during an RC will be from the week before, so
   everyone has gotten a chance to see them and they've made it through
   our autobuilders and integration trees.
 * I'll cherry-pick (single patches) or merge (if it's a patch set) on
   top of the new RC on Monday morning.
 * I'll sign the tag on Monday morning, to let the autobuilders pick up
   exactly what I'm submitting.
 * Assuming nothing goes wrong, I'll mail the pull request out on
   Wednesday.  If something goes wrong, I'll wait at least a day after
   re-spinning the tag to let the autobuilders pick things up.
 
 Hopefully this will avoid any headaches in the future, barring any
 emergency fixes.
 
 I don't think this is the last patch set we'll want for 4.15: I think
 I'll want to remove some of the first-level irqchip driver that snuck in
 as well, which will look a lot like the HVC patch here.  This is pending
 some asm-generic cleanup I'm doing that I haven't quite gotten clean
 enough to send out yet, though, but hopefully it'll be ready by next
 week (and still OK for that late).

Merge tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This contains three small fixes:

   - A fix to a typo in sys_riscv_flush_icache. This only affects error
     handling, but I think it's a small and obvious enough change that
     it's sane outside the merge window.

   - The addition of smp_mb__after_spinlock(), which was recently
     removed due to an incorrect comment. This is largely a comment
     change (as there's a big one now), and while it's necessary for
     compliance with the RISC-V memory model the lack of this fence
     shouldn't manifest as a bug on current implementations.
     Nonetheless, it still seems saner to have the fence in 4.15.

   - The removal of some of the HVC_RISCV_SBI driver that snuck into the
     arch port. This is compile-time dead code in 4.15 (as the driver
     isn't in yet), and during the review process we found a better way
     to implement early printk on RISC-V. While this change doesn't do
     anything, it will make staging our HVC driver easier: without this
     change the HVC driver we hope to upstream won't build on 4.15
     (because the 4.15 arch code would reference a function that no
     longer exists).

  I don't think this is the last patch set we'll want for 4.15: I think
  I'll want to remove some of the first-level irqchip driver that snuck
  in as well, which will look a lot like the HVC patch here. This is
  pending some asm-generic cleanup I'm doing that I haven't quite gotten
  clean enough to send out yet, though, but hopefully it'll be ready by
  next week (and still OK for that late)"

 * tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
  RISC-V: Remove unused CONFIG_HVC_RISCV_SBI code
  RISC-V: Resurrect smp_mb__after_spinlock()
  RISC-V: Logical vs Bitwise typo
2017-12-13 20:13:05 -08:00
David S. Miller
8c8f67a46f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-13

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Addition of explicit scheduling points to map alloc/free
   in order to avoid having to hold the CPU for too long,
   from Eric.

2) Fixing of a corruption in overlapping perf_event_output
   calls from different BPF prog types on the same CPU out
   of different contexts, from Daniel.

3) Fallout fixes for recent correction of broken uapi for
   BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header
   that needed to be pulled in from asm-generic and for
   BPF selftests the asm-generic include did not work,
   so similar asm include scheme was adapted for that
   problematic header that perf is having with other
   header files under tools, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 17:30:04 -05:00
Daniel Vetter
ea497bb920 drm: rework delayed connector cleanup in connector_iter
PROBE_DEFER also uses system_wq to reprobe drivers, which means when
that again fails, and we try to flush the overall system_wq (to get
all the delayed connector cleanup work_struct completed), we
deadlock.

Fix this by using just a single cleanup work, so that we can only
flush that one and don't block on anything else. That means a free
list plus locking, a standard pattern.

v2:
- Correctly free connectors only on last ref. Oops (Chris).
- use llist_head/node (Chris).

v3:
- Add init_llist_head (Chris).

Fixes: a703c55004 ("drm: safely free connectors from connector_iter")
Fixes: 613051dac4 ("drm: locking&new iterators for connector_list")
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: <stable@vger.kernel.org> # v4.11+: 613051dac4 ("drm: locking&new iterators for connector_list")
Cc: <stable@vger.kernel.org> # v4.11+
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Javier Martinez Canillas <javier@dowhile0.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Matt Hart <matthew.hart@linaro.org>
Cc: Thierry Escande <thierry.escande@collabora.co.uk>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213124936.17914-1-daniel.vetter@ffwll.ch
2017-12-13 22:59:00 +01:00
Yonghong Song
f4e2298e63 bpf/tracing: fix kernel/events/core.c compilation error
Commit f371b304f1 ("bpf/tracing: allow user space to
query prog array on the same tp") introduced a perf
ioctl command to query prog array attached to the
same perf tracepoint. The commit introduced a
compilation error under certain config conditions, e.g.,
  (1). CONFIG_BPF_SYSCALL is not defined, or
  (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
       nor CONFIG_KPROBE_EVENTS is defined.

Error message:
  kernel/events/core.o: In function `perf_ioctl':
  core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'

This patch fixes the error by guarding the real definition under
CONFIG_BPF_EVENTS and providing a static inline dummy function
if CONFIG_BPF_EVENTS is not defined.
It also renames the function from bpf_event_query_prog_array to
perf_event_query_prog_array and moves the definition from linux/bpf.h
to linux/trace_events.h so the definition is in proximity to
other prog_array related functions.

Fixes: f371b304f1 ("bpf/tracing: allow user space to query prog array on the same tp")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-13 22:44:10 +01:00
David S. Miller
f6e168b4a1 Merge branch 'mlx4-misc-fixes'
Tariq Toukan says:

====================
mlx4 misc fixes

This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.

Patch 1 by Eugenia fixes an MTU issue in selftest.
Patch 2 by Eran fixes an accounting issue in the resource tracker.
Patch 3 by Eran fixes a race condition that causes counter inconsistency.

Series generated against net commit:
200809716a fou: fix some member types in guehdr

v2:
Patch 2: Add reviewer credit, rephrase commit message.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:37 -05:00
Eran Ben Elisha
5a1647c391 net/mlx4_en: Fill all counters under one call of stats lock
Before this patch, the stats_lock was acquired twice. In between the
two acquisitions, the driver sent commands to gather some more statistics
(per priority and counter statistics). If the stats lock was acquired by
the get statistics NDO in between, we would have reported out of sync counters.

Fix this by collecting all stats from Firmware in advance and then
fill the Software structs under one lock.

Fixes: 0b131561a7 ("net/mlx4_en: Add Flow control statistics display via ethtool")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:37 -05:00
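The approach of commit 5a1647c391 above (query everything first, then fill the software structures in one critical section) can be sketched as follows. This is a Python model with hypothetical names; the two callables stand in for firmware queries and the two attributes for the driver's counter groups.

```python
import threading


class StatsModel:
    """Hypothetical model: two counter groups published under one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.port = 0
        self.prio = 0

    def update(self, query_port, query_prio):
        # Gather all values first, without holding the lock.
        port, prio = query_port(), query_prio()
        # Publish both in a single critical section so that a reader
        # never observes one group updated and the other stale.
        with self.lock:
            self.port, self.prio = port, prio

    def snapshot(self):
        with self.lock:
            return self.port, self.prio
```

A reader calling `snapshot()` either sees both old values or both new ones, which is the consistency property the message is after.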
Eran Ben Elisha
0bb9fc4f54 net/mlx4_core: Fix wrong calculation of free counters
The field res_free indicates the total number of counters which are
available for allocation (reserved and unreserved). Fixed a bug where
the reserved counters were subtracted from res_free before any
allocation was performed.

Before this fix, free counters which were not reserved could not be
allocated.

Fixes: 9de92c60be ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:36 -05:00
Eugenia Emantayev
78034f5fdd net/mlx4_en: Fix selftest for small MTUs
Set the minimal MTU threshold for running loopback selftest.
MTU should be big enough to include packet payload, NET_IP_ALIGN,
Ethernet headers and preamble length.

Fixes: e7c1c2c462 ("mlx4_en: Added self diagnostics test implementation")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:36 -05:00
Egil Hjelmeland
5c13e07580 net: dsa: lan9303: Introduce lan9303_read_wait
Simplify lan9303_indirect_phy_wait_for_completion()
and lan9303_switch_wait_for_completion() by using a new function
lan9303_read_wait().

Changes v1 -> v2:
 - param 'mask' type u32
 - removed param 'value' (will probably never be used)
 - add newline before return

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:25:34 -05:00
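A wait helper of the kind commit 5c13e07580 above introduces typically polls a register until the bits selected by a mask are clear. The Python sketch below shows that pattern only. The retry count, the delay, the boolean return value, and the `read_reg` callable are assumptions for illustration; the message itself only states that the helper takes a u32 'mask' and no 'value' parameter.

```python
import time


def read_wait(read_reg, mask, retries=25, delay_s=0.001):
    """Poll `read_reg()` until (value & mask) == 0.

    Returns True once the masked bits are clear, or False if they are
    still set after `retries` reads.
    """
    for _ in range(retries):
        if (read_reg() & mask) == 0:
            return True
        time.sleep(delay_s)
    return False
```

Two callers that previously each carried their own polling loop can then share one implementation and differ only in the register and mask they pass.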
Russell King
de9c4e06bb net: phy: marvell: avoid configuring fiber page for SGMII-to-Copper
When in SGMII-to-Copper mode, the fiber page is used for the MAC facing
link, and does not require configuration of the fiber auto-negotiation
settings.  Avoid trying.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:10:54 -05:00
Jie Deng
53c64870d0 dwc-xlgmac: Add co-maintainer
Jose Abreu will join to maintain dwc-xlgmac.
He will help with new feature development for
this driver. Thanks Jose and welcome on board!

Signed-off-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:09:20 -05:00
Eric Dumazet
4688eb7cf3 tcp: refresh tcp_mstamp from timers callbacks
Only the retransmit timer currently refreshes tcp_mstamp.

We should do the same for delayed acks and keepalives.

Even if RFC 7323 does not request it, this is consistent with what Linux
did in the past, when TS values were based on jiffies.

Fixes: 385e20706f ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:04:04 -05:00
Wei Wang
9ee11bd03c tcp: fix potential underestimation on rcv_rtt
When ms timestamp is used, current logic uses 1us in
tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us.
This could cause rcv_rtt underestimation.
Fix it by always using a min value of 1ms if ms timestamp is used.
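
The clamping can be sketched as follows. This is a standalone illustration, not
the kernel code: the function name and the assumption that the sample is the
difference of two millisecond timestamps are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_MSEC 1000u

/*
 * Convert a receive-side RTT sample taken from millisecond timestamps
 * into microseconds. A delta of 0 ticks only means the RTT was below
 * the clock resolution, so it is rounded up to one full tick (1 ms)
 * instead of being reported as 1 us.
 */
static uint32_t rcv_rtt_sample_us(uint32_t now_ms, uint32_t echoed_ms)
{
	/* Unsigned subtraction stays correct across timestamp wraparound. */
	uint32_t delta_ms = now_ms - echoed_ms;

	if (delta_ms == 0)
		delta_ms = 1;

	/* The result must still fit in 32 bits of microseconds. */
	assert(delta_ms <= UINT32_MAX / USEC_PER_MSEC);

	return delta_ms * USEC_PER_MSEC;
}
```

With this floor, a sub-millisecond sample can no longer pull the estimate down to
1 us when the clock cannot resolve anything finer than 1 ms.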

Fixes: 645f4c6f2e ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:01:17 -05:00
David S. Miller
f4d87ad2a3 Merge branch 'hv_netvsc-minor-changes'
Stephen Hemminger says:

====================
hv_netvsc: minor changes

This includes minor cleanup of code in send and receive path and
also a new statistic to check for allocation failures. This also
eliminates some of the extra RCU when not needed.

There is a theoretical bug where buffered data could be blocked
for longer than necessary if the ring buffer got full. This
has not been seen in the wild; it was found by inspection.

The reference count between net device and internal RNDIS
is not needed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:39 -05:00
Stephen Hemminger
cfd8afd986 hv_netvsc: empty current transmit aggregation if flow blocked
If the transmit queue is known full, then don't keep aggregating
data. And the cp_partial flag which indicates that the current
aggregation buffer is full can be folded in to avoid more
conditionals.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:39 -05:00
Stephen Hemminger
0da6edbd3a hv_netvsc: remove open_cnt reference count
There is only ever a single instance of network device object
referencing the internal rndis object. Therefore the open_cnt atomic
is not necessary.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:39 -05:00
Stephen Hemminger
345ac08990 hv_netvsc: pass netvsc_device to receive callback
The netvsc_receive_callback function was using RCU to find the
appropriate underlying netvsc_device. Since calling function already
had that pointer, this was unnecessary.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:39 -05:00
Stephen Hemminger
79cf1bae38 hv_netvsc: simplify function args in receive status path
The caller (netvsc_receive) already has the net device pointer,
and should just pass that to functions rather than the hyperv device.
This eliminates several impossible error paths in the process.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:38 -05:00
Stephen Hemminger
f61a9d62b2 hv_netvsc: track memory allocation failures in ethtool stats
When an skb cannot be allocated, update ethtool statistics
rather than rx_dropped, which is intended for netif_receive.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:38 -05:00
Stephen Hemminger
26a112626d hv_netvsc: copy_to_send buf can be void
The only caller does not care about the return value.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:57:38 -05:00
David S. Miller
824c2d672a Merge branch 'phylink-dsa-prep'
Florian Fainelli says:

====================
PHYLINK preparatory patches for DSA

In preparation for having DSA migrate to PHYLINK, I had to come up with a
number of preparatory patches:

- we need to be able to pass phy_flags from an external component calling
  phylink_of_phy_connect()
- DSA tries to connect through OF first, then falls back to using its own internal
  MDIO bus; in that case we would both show an error and not know what
  the correct phy_interface_t would be, so instead use the one provided by the
  PHY device/driver
- Finally, bcm_sf2 makes use of all possible PHYs out there: internal, external,
  fixed, and MoCA; the latter requires a bit of help to signal link notifications
  through an MMIO interrupt, as well as to report a correct PORT type

Changes in v2:

- rebased against latest net-next/master
- added kernel doc documentation
- dropped error message in phylink_of_phy_connect() as suggested by Russell
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:55:02 -05:00
Florian Fainelli
4be11ef0bd net: phy: phylink: Report MoCA as PORT_BNC
Similarly to what PHYLIB already does, make sure that
PHY_INTERFACE_MODE_MOCA is reported as PORT_BNC.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:55:01 -05:00
Florian Fainelli
1ac63e392e net: phy: phylink: Allow setting a custom link state callback
phylink_get_fixed_state() currently consults an optional "link_gpio"
GPIO descriptor. Expand this mechanism to allow specifying a custom
callback. This is necessary to support out-of-band link notification
(e.g., from an interrupt within an MMIO register).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:55:01 -05:00
Florian Fainelli
d38b4afd51 net: phy: phylink: Remove error message
Some subsystems like DSA may be trying to connect to a PHY through OF first,
and then attempt a connect using a local MDIO bus. Remove the error message
"unable to find PHY node" so we can let MAC drivers decide whether to print it or not.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:55:01 -05:00
Florian Fainelli
4904b6ea1f net: phy: phylink: Use PHY device interface if N/A
We may not always be able to resolve a correct phy_interface_t value before
actually connecting to the PHY device. When that happens, just have
phylink_connect_phy() utilize what the PHY device/driver provided.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:55:01 -05:00
Florian Fainelli
0a62964c90 net: phy: phylink: Allow specifying PHY device flags
In order to let subsystems like DSA fully utilize PHYLINK, we need to be able
to communicate phy_device::flags from of_phy_{connect,attach} even when using
PHYLINK APIs.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:55:00 -05:00
Yuchung Cheng
7268586baa tcp: pause Fast Open globally after third consecutive timeout
Prior to this patch, active Fast Open is paused on a specific
destination IP address if the previous connections to the
IP address have experienced recurring timeouts. But recent
experiments by Microsoft (https://goo.gl/cykmn7) and Mozilla
browsers indicate the issue is often caused by broken middle-boxes
sitting close to the client. Therefore it is a much better user
experience if Fast Open is disabled outright globally to avoid
experiencing further timeouts on connections toward other
destinations.

This patch changes the destination-IP disablement to global
disablement if a connection experiences recurring timeouts
or aborts due to timeout.  Repeated incidents would still
exponentially increase the pause time, starting from an hour.
This is extremely conservative but an unfortunate compromise to
minimize bad experiences due to broken middle-boxes.
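
The exponential pause can be sketched as below. The one-hour base comes from the
text above; the function name, the doubling factor, and the cap on the number of
doublings are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TFO_BASE_PAUSE_SEC 3600u /* "starting from an hour" */
#define TFO_MAX_DOUBLINGS  6u    /* assumed cap, for illustration only */

/*
 * Return how long active Fast Open stays paused after the given number
 * of repeated timeout incidents (1 = first incident). The pause doubles
 * with each further incident until the assumed cap is reached.
 */
static uint32_t tfo_pause_seconds(unsigned int incidents)
{
	unsigned int doublings;

	assert(incidents >= 1);

	doublings = incidents - 1;
	if (doublings > TFO_MAX_DOUBLINGS)
		doublings = TFO_MAX_DOUBLINGS;

	return TFO_BASE_PAUSE_SEC << doublings;
}
```

Under these assumptions the first incident pauses Fast Open for one hour, the
second for two hours, and the third for four hours.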

Reported-by: Dragana Damjanovic <ddamjanovic@mozilla.com>
Reported-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:51:12 -05:00
Ivan Khoronzhuk
8a83c5d796 net: ethernet: ti: cpdma: correct error handling for chan create
It's not correct to return NULL when that is actually an error and the
function returns errors in every other failure case. At the same time,
the cpsw driver and davinci emac don't check for errors when
creating a channel and can miss an actual error. Also remove the WARNs,
replacing them with dev_err messages.
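
The convention being enforced can be sketched in userspace C. The helpers below
are stand-ins modeled on the kernel's ERR_PTR(), PTR_ERR() and IS_ERR();
chan_create() and struct chan are hypothetical and do not mirror the cpdma code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct chan {
	int num;
};

/* Hypothetical constructor: every failure is an error pointer, never NULL. */
static struct chan *chan_create(int num, int max_chans)
{
	struct chan *c;

	assert(max_chans > 0);

	if (num < 0 || num >= max_chans)
		return ERR_PTR(-EINVAL);

	c = malloc(sizeof(*c));
	if (!c)
		return ERR_PTR(-ENOMEM);

	c->num = num;
	return c;
}
```

A caller then needs a single IS_ERR() check and can propagate PTR_ERR() upward,
instead of handling NULL and error pointers as two separate failure modes.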

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:49:53 -05:00