Commit graph

797222 commits

Author SHA1 Message Date
Paolo Abeni
bcd1665e35 udp: add support for UDP_GRO cmsg
When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
datagram size. User-space can use such info to compute the original
packets layout.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:04 -08:00
Paolo Abeni
e20cf8d3f1 udp: implement GRO for plain UDP sockets.
This is the RX counterpart of commit bec1f6f697 ("udp: generate gso
with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
eligible for GRO in the rx path: UDP segments directed to such socket
are assembled into a larger GSO_UDP_L4 packet.

The core UDP GRO support is enabled with setsockopt(UDP_GRO).

Initial benchmark numbers:

Before:
udp rx:   1079 MB/s   769065 calls/s

After:
udp rx:   1466 MB/s    24877 calls/s

This change introduces a side effect in respect to UDP tunnels:
after a UDP tunnel creation, now the kernel performs a lookup per ingress
UDP packet, while before such lookup happened only if the ingress packet
carried a valid internal header csum.

rfc v2 -> rfc v3:
 - fixed typos in macro name and comments
 - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1
 - acquire socket lock in UDP_GRO setsockopt

rfc v1 -> rfc v2:
 - use a new option to enable UDP GRO
 - use static keys to protect the UDP GRO socket lookup

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:04 -08:00
Paolo Abeni
60fb9567bf udp: implement complete book-keeping for encap_needed
The *encap_needed static keys are enabled by UDP tunnels
and several UDP encapsulations type, but they are never
turned off. This can cause unneeded overall performance
degradation for systems where such features are used
transiently.

This patch introduces complete book-keeping for such keys,
decreasing the usage at socket destruction time, if needed,
and avoiding that the same socket could increase the key
usage multiple times.

rfc v3 -> v1:
 - add socket lock around udp_tunnel_encap_enable()

rfc v2 -> rfc v3:
 - use udp_tunnel_encap_enable() in setsockopt()

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:04 -08:00
David S. Miller
7e225619e8 Merge branch 'vrf-allow-simultaneous-service-instances-in-default-and-other-VRFs'
Mike Manning says:

====================
vrf: allow simultaneous service instances in default and other VRFs

Services currently have to be VRF-aware if they are using an unbound
socket. One cannot have multiple service instances running in the
default and other VRFs for services that are not VRF-aware and listen
on an unbound socket. This is because there is no easy way of isolating
packets received in the default VRF from those arriving in other VRFs.

This series provides this isolation for stream sockets subject to the
existing kernel parameter net.ipv4.tcp_l3mdev_accept not being set,
given that this is documented as allowing a single service instance to
work across all VRF domains. Similarly, net.ipv4.udp_l3mdev_accept is
checked for datagram sockets, and net.ipv4.raw_l3mdev_accept is
introduced for raw sockets. The functionality applies to UDP & TCP
services as well as those using raw sockets, and is for IPv4 and IPv6.

Example of running ssh instances in default and blue VRF:

$ /usr/sbin/sshd -D
$ ip vrf exec vrf-blue /usr/sbin/sshd
$ ss -ta | egrep 'State|ssh'
State   Recv-Q   Send-Q           Local Address:Port       Peer Address:Port
LISTEN  0        128           0.0.0.0%vrf-blue:ssh             0.0.0.0:*
LISTEN  0        128                    0.0.0.0:ssh             0.0.0.0:*
ESTAB   0        0              192.168.122.220:ssh       192.168.122.1:50282
LISTEN  0        128              [::]%vrf-blue:ssh                [::]:*
LISTEN  0        128                       [::]:ssh                [::]:*
ESTAB   0        0           [3000::2]%vrf-blue:ssh           [3000::9]:45896
ESTAB   0        0                    [2000::2]:ssh           [2000::9]:46398

v1:
   - Address Paolo Abeni's comments (patch 4/5)
   - Fix build when CONFIG_NET_L3_MASTER_DEV not defined (patch 1/5)
v2:
   - Address David Aherns' comments (patches 4/5 and 5/5)
   - Remove patches 3/5 and 5/5 from series for individual submissions
   - Include a sysctl for raw sockets as recommended by David Ahern
   - Expand series into 10 patches and provide improved descriptions
v3:
   - Update description for patch 1/10 and remove patch 6/10
v4:
   - Set default to enabled for raw socket sysctl as recommended by David Ahern
v5:
   - Address review comments from David Ahern in patches 2-5
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:39 -08:00
Dewi Morgan
7bd2db404e ipv6: do not drop vrf udp multicast packets
For bound udp sockets in a vrf, also check the sdif to get the index
for ingress devices enslaved to an l3mdev.

Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:39 -08:00
Mike Manning
5226b6a920 ipv6: handling of multicast packets received in VRF
If the skb for multicast packets marked as enslaved to a VRF are
received, then the secondary device index should be used to obtain
the real device. And verify the multicast address against the
enslaved rather than the l3mdev device.

Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:39 -08:00
Mike Manning
d839a0ebeb ipv6: allow ping to link-local address in VRF
If link-local packets are marked as enslaved to a VRF, then to allow
ping to the link-local from a vrf, the error handling for IPV6_PKTINFO
needs to be relaxed to also allow the pkt ipi6_ifindex to be that of a
slave device to the vrf.

Note that the real device also needs to be retrieved in icmp6_iif()
to set the ipv6 flow oif to this for icmp echo reply handling. The
recent commit 24b711edfc ("net/ipv6: Fix linklocal to global address
with VRF") takes care of this, so the sdif does not need checking here.

This fix makes ping to link-local consistent with that to global
addresses, in that this can now be done from within the same VRF that
the address is in.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:39 -08:00
Mike Manning
6f12fa7755 vrf: mark skb for multicast or link-local as enslaved to VRF
The skb for packets that are multicast or to a link-local address are
not marked as being enslaved to a VRF, if they are received on a socket
bound to the VRF. This is needed for ND and it is preferable for the
kernel not to have to deal with the additional use-cases if ll or mcast
packets are handled as enslaved. However, this does not allow service
instances listening on unbound and bound to VRF sockets to distinguish
the VRF used, if packets are sent as multicast or to a link-local
address. The fix is for the VRF driver to also mark these skb as being
enslaved to the VRF.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:39 -08:00
Duncan Eastoe
7055420fb6 net: fix raw socket lookup device bind matching with VRFs
When there exist a pair of raw sockets one unbound and one bound
to a VRF but equal in all other respects, when a packet is received
in the VRF context, __raw_v4_lookup() matches on both sockets.

This results in the packet being delivered over both sockets,
instead of only the raw socket bound to the VRF. The bound device
checks in __raw_v4_lookup() are replaced with a call to
raw_sk_bound_dev_eq() which correctly handles whether the packet
should be delivered over the unbound socket in such cases.

In __raw_v6_lookup() the match on the device binding of the socket is
similarly updated to use raw_sk_bound_dev_eq() which matches the
handling in __raw_v4_lookup().

Importantly raw_sk_bound_dev_eq() takes the raw_l3mdev_accept sysctl
into account.

Signed-off-by: Duncan Eastoe <deastoe@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:39 -08:00
Mike Manning
6897445fb1 net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner
similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept
for datagram sockets. Have this default to enabled for reasons of
backwards compatibility. This is so as to specify the output device
with cmsg and IP_PKTINFO, but using a socket not bound to the
corresponding VRF. This allows e.g. older ping implementations to be
run with specifying the device but without executing it in the VRF.
If the option is disabled, packets received in a VRF context are only
handled by a raw socket bound to the VRF, and correspondingly packets
in the default VRF are only handled by a socket not bound to any VRF.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:38 -08:00
Mike Manning
6da5b0f027 net: ensure unbound datagram socket to be chosen when not in a VRF
Ensure an unbound datagram skt is chosen when not in a VRF. The check
for a device match in compute_score() for UDP must be performed when
there is no device match. For this, a failure is returned when there is
no device match. This ensures that bound sockets are never selected,
even if there is no unbound socket.

Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These
packets are currently blocked, as flowi6_oif was set to that of the
master vrf device, and the ipi6_ifindex is that of the slave device.
Allow these packets to be sent by checking the device with ipi6_ifindex
has the same L3 scope as that of the bound device of the skt, which is
the master vrf device. Note that this check always succeeds if the skt
is unbound.

Even though the right datagram skt is now selected by compute_score(),
a different skt is being returned that is bound to the wrong vrf. The
difference between these and stream sockets is the handling of the skt
option for SO_REUSEPORT. While the handling when adding a skt for reuse
correctly checks that the bound device of the skt is a match, the skts
in the hashslot are already incorrect. So for the same hash, a skt for
the wrong vrf may be selected for the required port. The root cause is
that the skt is immediately placed into a slot when it is created,
but when the skt is then bound using SO_BINDTODEVICE, it remains in the
same slot. The solution is to move the skt to the correct slot by
forcing a rehash.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:38 -08:00
Mike Manning
e78190581a net: ensure unbound stream socket to be chosen when not in a VRF
The commit a04a480d43 ("net: Require exact match for TCP socket
lookups if dif is l3mdev") only ensures that the correct socket is
selected for packets in a VRF. However, there is no guarantee that
the unbound socket will be selected for packets when not in a VRF.
By checking for a device match in compute_score() also for the case
when there is no bound device and attaching a score to this, the
unbound socket is selected. And if a failure is returned when there
is no device match, this ensures that bound sockets are never selected,
even if there is no unbound socket.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:38 -08:00
Robert Shearman
3c82a21f43 net: allow binding socket in a VRF when there's an unbound socket
Change the inet socket lookup to avoid packets arriving on a device
enslaved to an l3mdev from matching unbound sockets by removing the
wildcard for non sk_bound_dev_if and instead relying on check against
the secondary device index, which will be 0 when the input device is
not enslaved to an l3mdev and so match against an unbound socket and
not match when the input device is enslaved.

Change the socket binding to take the l3mdev into account to allow an
unbound socket to not conflict sockets bound to an l3mdev given the
datapath isolation now guaranteed.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:38 -08:00
Lyude Paul
63237f8748 drm/amd/amdgpu/dm: Fix dm_dp_create_fake_mst_encoder()
[why]
Removing connector reusage from DM to match the rest of the tree ended
up revealing an issue that was surprisingly subtle. The original amdgpu
code for DC that was submitted appears to have left a chunk in
dm_dp_create_fake_mst_encoder() that tries to find a "master encoder",
the likes of which isn't actually used or stored anywhere. It does so at
the wrong time as well by trying to access parts of the drm_connector
from the encoder init before it's actually been initialized. This
results in a NULL pointer deref on MST hotplugs:

[  160.696613] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  160.697234] PGD 0 P4D 0
[  160.697814] Oops: 0010 [#1] SMP PTI
[  160.698430] CPU: 2 PID: 64 Comm: kworker/2:1 Kdump: loaded Tainted: G           O      4.19.0Lyude-Test+ #2
[  160.699020] Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.22 05/17/2018
[  160.699672] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  160.700322] RIP: 0010:          (null)
[  160.700920] Code: Bad RIP value.
[  160.701541] RSP: 0018:ffffc9000029fc78 EFLAGS: 00010206
[  160.702183] RAX: 0000000000000000 RBX: ffff8804440ed468 RCX: ffff8804440e9158
[  160.702778] RDX: 0000000000000000 RSI: ffff8804556c5700 RDI: ffff8804440ed000
[  160.703408] RBP: ffff880458e21800 R08: 0000000000000002 R09: 000000005fca0a25
[  160.704002] R10: ffff88045a077a3d R11: ffff88045a077a3c R12: ffff8804440ed000
[  160.704614] R13: ffff880458e21800 R14: ffff8804440e9000 R15: ffff8804440e9000
[  160.705260] FS:  0000000000000000(0000) GS:ffff88045f280000(0000) knlGS:0000000000000000
[  160.705854] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  160.706478] CR2: ffffffffffffffd6 CR3: 000000000200a001 CR4: 00000000003606e0
[  160.707124] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  160.707724] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  160.708372] Call Trace:
[  160.708998]  ? dm_dp_add_mst_connector+0xed/0x1d0 [amdgpu]
[  160.709625]  ? drm_dp_add_port+0x2fa/0x470 [drm_kms_helper]
[  160.710284]  ? wake_up_q+0x54/0x70
[  160.710877]  ? __mutex_unlock_slowpath.isra.18+0xb3/0x110
[  160.711512]  ? drm_dp_dpcd_access+0xe7/0x110 [drm_kms_helper]
[  160.712161]  ? drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper]
[  160.712762]  ? drm_dp_check_and_send_link_address+0xa3/0xd0 [drm_kms_helper]
[  160.713408]  ? drm_dp_mst_link_probe_work+0x4b/0x80 [drm_kms_helper]
[  160.714013]  ? process_one_work+0x1a1/0x3a0
[  160.714667]  ? worker_thread+0x30/0x380
[  160.715326]  ? wq_update_unbound_numa+0x10/0x10
[  160.715939]  ? kthread+0x112/0x130
[  160.716591]  ? kthread_create_worker_on_cpu+0x70/0x70
[  160.717262]  ? ret_from_fork+0x35/0x40
[  160.717886] Modules linked in: amdgpu(O) vfat fat snd_hda_codec_generic joydev i915 chash gpu_sched ttm i2c_algo_bit drm_kms_helper snd_hda_codec_hdmi hp_wmi syscopyarea iTCO_wdt sysfillrect sparse_keymap sysimgblt fb_sys_fops snd_hda_intel usbhid wmi_bmof drm snd_hda_codec btusb snd_hda_core intel_rapl btrtl x86_pkg_temp_thermal btbcm btintel coretemp snd_pcm crc32_pclmul bluetooth psmouse snd_timer snd pcspkr i2c_i801 mei_me i2c_core soundcore mei tpm_tis wmi tpm_tis_core hp_accel ecdh_generic lis3lv02d tpm video rfkill acpi_pad input_polldev hp_wireless pcc_cpufreq crc32c_intel serio_raw tg3 xhci_pci xhci_hcd [last unloaded: amdgpu]
[  160.720141] CR2: 0000000000000000

Somehow the connector reusage DM was using for MST connectors managed to
paper over this issue entirely; hence why this was never caught until
now.

[how]
Since this code isn't used anywhere and seems useless anyway, we can
just drop it entirely. This appears to fix the issue on my HP ZBook with
an AMD WX4150.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-07 18:21:05 -05:00
Jerry (Fangzhi) Zuo
0e6613e46f drm/amd/display: Drop reusing drm connector for MST
[why]
It is not safe to keep existing connector while entire topology
has been removed. Could lead potential impact to uapi.
Entirely unregister all the connectors on the topology,
and use a new set of connectors when the topology is plugged back
on.

[How]
Remove the drm connector entirely each time when the
corresponding MST topology is gone.
When hotunplug a connector (e.g., DP2)
1. Remove connector from userspace.
2. Drop it's reference.
When hotplug back on:
1. Detect new topology, and create new connectors.
2. Notify userspace with sysfs hotplug event.
3. Reprobe new connectors, and reassign CRTC from old (e.g., DP2)
to new (e.g., DP3) connector.

Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-07 18:20:56 -05:00
Jerry (Fangzhi) Zuo
8be17ac95f drm/amd/display: Cleanup MST non-atomic code workaround
[why]
It is not correct to touch aconnector within atomic_check.

[How]
It was added as workaround before, and no longer needed.

Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-07 18:20:49 -05:00
Evan Quan
108110a3ff drm/amd/powerplay: always use fast UCLK switching when UCLK DPM enabled
With UCLK DPM enabled, slow switching is not supported any more.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-07 18:20:40 -05:00
Evan Quan
3c7eda0b65 drm/amd/powerplay: set a default fclk/gfxclk ratio
Otherwise big gap between these two clocks may causes
some hangs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-07 18:20:32 -05:00
Keith Busch
f3587d76da block: Clear kernel memory before copying to user
If the kernel allocates a bounce buffer for user read data, this memory
needs to be cleared before copying it to the user, otherwise it may leak
kernel memory to user space.

Laurence Oberman <loberman@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 15:41:31 -07:00
Geert Uytterhoeven
e31d36b0a4 MAINTAINERS: Fix remaining pointers to obsolete libata.git
libata.git no longer exists.  Replace the remaining pointers to it by
pointers to the block tree, which is where all libata development
happens now.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 15:27:26 -07:00
Jens Axboe
6961cd4d0f ubd: fix missing lock around request issue
We need to hold the device lock (and disable interrupts) while
writing new commands, or we could be interrupted while that
is happening and read invalid requests in the completion path.

Fixes: 4e6da0fe80 ("um: Convert ubd driver to blk-mq")
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 14:34:05 -07:00
Geert Uytterhoeven
406e7f986b Documentation: ABI: led-trigger-pattern: Fix typos
- Spelling s/brigntess/brightness/,
  - Double "use".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
2018-11-07 21:43:46 +01:00
Baolin Wang
3a40cfe8ba leds: trigger: Fix sleeping function called from invalid context
We will meet below issue due to mutex_lock() is called in interrupt context.
The mutex lock is used to protect the pattern trigger data, but before changing
new pattern trigger data (pattern values or repeat value) by users, we always
cancel the timer firstly to clear previous patterns' performance. That means
there is no race in pattern_trig_timer_function(), so we can drop the mutex
lock in pattern_trig_timer_function() to avoid this issue.

Moreover we can move the timer cancelling into mutex protection, since there
is no deadlock risk if we remove the mutex lock in pattern_trig_timer_function().

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted
4.20.0-rc1-koelsch-00841-ga338c8181013c1a9 #171
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
[<c020f19c>] (unwind_backtrace) from [<c020aecc>] (show_stack+0x10/0x14)
[<c020aecc>] (show_stack) from [<c07affb8>] (dump_stack+0x7c/0x9c)
[<c07affb8>] (dump_stack) from [<c02417d4>] (___might_sleep+0xf4/0x158)
[<c02417d4>] (___might_sleep) from [<c07c92c4>] (mutex_lock+0x18/0x60)
[<c07c92c4>] (mutex_lock) from [<c067b28c>] (pattern_trig_timer_function+0x1c/0x11c)
[<c067b28c>] (pattern_trig_timer_function) from [<c027f6fc>] (call_timer_fn+0x1c/0x90)
[<c027f6fc>] (call_timer_fn) from [<c027f944>] (expire_timers+0x94/0xa4)
[<c027f944>] (expire_timers) from [<c027fc98>] (run_timer_softirq+0x108/0x15c)
[<c027fc98>] (run_timer_softirq) from [<c02021cc>] (__do_softirq+0x1d4/0x258)
[<c02021cc>] (__do_softirq) from [<c0224d24>] (irq_exit+0x64/0xc4)
[<c0224d24>] (irq_exit) from [<c0268dd0>] (__handle_domain_irq+0x80/0xb4)
[<c0268dd0>] (__handle_domain_irq) from [<c045e1b0>] (gic_handle_irq+0x58/0x90)
[<c045e1b0>] (gic_handle_irq) from [<c02019f8>] (__irq_svc+0x58/0x74)
Exception stack(0xeb483f60 to 0xeb483fa8)
3f60: 00000000 00000000 eb9afaa0 c0217e80 00000000 ffffe000 00000000 c0e06408
3f80: 00000002 c0e0647c c0c6a5f0 00000000 c0e04900 eb483fb0 c0207ea8 c0207e98
3fa0: 60020013 ffffffff
[<c02019f8>] (__irq_svc) from [<c0207e98>] (arch_cpu_idle+0x1c/0x38)
[<c0207e98>] (arch_cpu_idle) from [<c0247ca8>] (do_idle+0x138/0x268)
[<c0247ca8>] (do_idle) from [<c0248050>] (cpu_startup_entry+0x18/0x1c)
[<c0248050>] (cpu_startup_entry) from [<402022ec>] (0x402022ec)

Fixes: 5fd752b6b3 ("leds: core: Introduce LED pattern trigger")
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
2018-11-07 21:43:25 +01:00
Johannes Thumshirn
df376b2ed5 block: respect virtual boundary mask in bvecs
With drivers that are settting a virtual boundary constrain, we are
seeing a lot of bio splitting and smaller I/Os being submitted to the
driver.

This happens because the bio gap detection code does not account cases
where PAGE_SIZE - 1 is bigger than queue_virt_boundary() and thus will
split the bio unnecessarily.

Cc: Jan Kara <jack@suse.cz>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:04:22 -07:00
YueHaibing
f601a85bd7 net: hns3: Remove set but not used variable 'reset_level'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c: In function 'hclge_log_and_clear_ppp_error':
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c:821:24: warning:
 variable 'reset_level' set but not used [-Wunused-but-set-variable]
  enum hnae3_reset_type reset_level = HNAE3_NONE_RESET;

It never used since introduction in commit
01865a50d7 ("net: hns3: Add enable and process hw errors of TM scheduler")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:46:37 -08:00
David S. Miller
75790a7425 Merge branch 'nfp-more-set-actions-and-notifier-refactor'
Jakub Kicinski says:

====================
nfp: more set actions and notifier refactor

This series brings updates to flower offload code.  First Pieter adds
support for setting TTL, ToS, Flow Label and Hop Limit fields in IPv4
and IPv6 headers.

Remaining 5 patches deal with factoring out netdev notifiers from flower
code.  We already have two instances, and more is coming, so it's time
to move to one central notifier which then feeds individual feature
handlers.

I start that part by cleaning up the existing notifiers.  Next a central
notifier is added, and used by flower offloads.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:22 -08:00
Jakub Kicinski
0c665e2bf4 nfp: flower: use the common netdev notifier
Use driver's common notifier for LAG and tunnel configuration.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:22 -08:00
Jakub Kicinski
3e33359040 nfp: register a notifier handler in a central location for the device
Code interested in networking events registers its own notifier
handlers.  Create one device-wide notifier instance.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:22 -08:00
Jakub Kicinski
659bb404eb nfp: flower: make nfp_fl_lag_changels_event() void
nfp_fl_lag_changels_event() never fails, and therefore we would
never return NOTIFY_BAD for NETDEV_CHANGELOWERSTATE.  Make this
clearer by changing nfp_fl_lag_changels_event()'s return type
to void.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:22 -08:00
Jakub Kicinski
a558c982a8 nfp: flower: don't try to nack device unregister events
Returning an error from a notifier means we want to veto the change.
We shouldn't veto NETDEV_UNREGISTER just because we couldn't find
the tracking info for given master.

I can't seem to find a way to trigger this unless we have some
other bug, so it's probably not fix-worthy.

While at it move the checking if the netdev really is of interest
into the handling functions, like we do for other events.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:22 -08:00
Jakub Kicinski
e50bfdf74d nfp: flower: remove unnecessary iteration over devices
For flower tunnel offloads FW has to be informed about MAC addresses
of tunnel devices.  We use a netdev notifier to keep track of these
addresses.

Remove unnecessary loop over netdevices after notifier is registered.
The intention of the loop was to catch devices which already existed
on the system before nfp driver got loaded, but netdev notifier will
replay NETDEV_REGISTER events.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:22 -08:00
Pieter Jansen van Vuuren
4234d62c27 nfp: flower: add ipv6 set flow label and hop limit offload
Add ipv6 set flow label and hop limit action offload. Since pedit sets
headers per 4 byte word, we need to ensure that setting either version,
priority, payload_len or nexthdr does not get offloaded.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:21 -08:00
Pieter Jansen van Vuuren
a3c6b063fe nfp: flower: add ipv4 set ttl and tos offload
Add ipv4 set ttl and tos action offload. Since pedit sets headers per 4
byte word, we need to ensure that setting either version, ihl, protocol,
total length or checksum does not get offloaded.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:45:21 -08:00
David S. Miller
6a02d1fa03 Merge branch 'hns3-next'
Huazhong Tan says:

====================
hns3: provide new interfaces & bugfixes & code optimization

This patchset provides some reset interfaces for RAS & RoCE, also
some bugfixes and optimization related to reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:18 -08:00
Huazhong Tan
8b0195a305 net: hns3: fix for cmd queue memory not freed problem during reset
It is not necessary to reallocate the descriptor and remap the
descriptor memory in reset process, otherwise it may cause memory
not freed problem.

Also, this patch initializes the cmd queue's spinlocks in
hclgevf_alloc_cmd_queue, and take the spinlocks when reinitializing
cmd queue' registers.

Fixes: fedd0c15d2 ("net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:18 -08:00
Huazhong Tan
65e41e7e68 net: hns3: add error handler for hclge_reset()
When hclge_reset() is called, it may fail for several reasons.
For example, an higher-level reset event occurs, memory allocation
failure, hardware reset timeout, etc. Therefore, it is necessary
to add corresponding error handling for these situations.
1. A high-level reset is required due to a high-level reset failure.
2. For memory allocation failure, a high-level reset is initiated by
the timer to recover. The reason for using the timer is to prevent this
new high-level reset to interrupt the reset process of other pf/vf;
3. For the case of hardware reset timeout, reschedule the reset task
to wait for the hardware to complete the reset.
For memory allocation failure and reset timeouts, in order to prevent
an infinite number of scheduled reset tasks, the number of error
recovery needs to be limited.

This patch also add some reset related debug log printing.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:18 -08:00
Huazhong Tan
f403a84fb2 net: hns3: call roce's reset notify callback when resetting
While doing resetting, roce should do its uninitailization part
before nic's, and do its initialization part after nic's.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:18 -08:00
Huazhong Tan
35d93a3004 net: hns3: adjust the process of PF reset
When doing PF reset, the driver needs to do some preparatory work
before asserting PF reset. Since when hardware is resetting, it
is necessary to stop tx/rx queue, clear hardware table, etc,
otherwise hardware may run into unrecoverable state if there is
still IO running when the hardware is resetting.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:18 -08:00
Huazhong Tan
0742ed7c24 net: hns3: move some reset information from hnae3_handle into hclge_dev/hclgevf_dev
Saving reset related information in the hclge_dev/hclgevf_dev
structure is more suitable than the hnae3_handle, since hardware
related information is kept in these two structure.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:18 -08:00
Huazhong Tan
7cea834d94 net: hns3: ignore new coming low-level reset while doing high-level reset
When processing a higher level reset, the pending lower level reset
does not have to be processed anymore, because the higher level
reset is the superset of the lower level reset.

Therefore, when processing an higher level reset, the request of
lower level reset needs to be cleared.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:17 -08:00
Huazhong Tan
257e4f2994 net: hns3: use HNS3_NIC_STATE_RESETTING to indicate resetting
While hclge is going to reset, it will notify its client with
HNAE3_DOWN_CLIENT, so this client should get into a resetting
status from this moment, other operations from the stack need to
be blocked as well. And when the reset is finished, the client
will be notified with HNAE3_UP_CLIENT, so this is the end of
the resetting status.

This patch uses HNS3_NIC_STATE_RESETTING flag to implement that,
and adds hns3_nic_resetting() to indicate which operation is not
allowed.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:17 -08:00
Huazhong Tan
8df0fa9168 net: hns3: enable/disable ring in the enet while doing UP/DOWN
While hardware gets into reset status, the firmware will not respond to
driver's command request, which may cause ring not disabled problem
during reset process.

So this patch uses register instead of command to enable/disable the ring
in the enet while doing UP/DOWN operation.

Also, HNS3_RING_RX_VM_REG is previously unused, so change it to the
correct meaning, and add a wrapper function for readl().

Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:17 -08:00
Huazhong Tan
7edff5339a net: hns3: adjust the location of clearing the table when doing reset
When doing a function reset, the hardware table should be cleared
before the hardware reset. In current code, this clearing is done
in hns3_reset_notify_uninit_enet, but it is too late, because
the hardware reset is already done, hns3_reset_notify_down_enet
is more suitable to do that.

Fixes: bb6b94a896 ("net: hns3: Add reset interface implementation in client")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:17 -08:00
Huazhong Tan
4d60291b6b net: hns3: provide some interface & information for the client
The client needs to know if the hardware is resetting when
loading or unloading itself, because client may abort the loading
process or wait for the reset process to finish when unloading
if hardware is resetting.

So this patch provides these interfaces to do it.
1. get_hw_reset_stat, the reset status of hardware.
2. ae_dev_resetting, whether reset task is scheduling.
3. ae_dev_reset_cnt, how many reset has been done.

Also, the RoCE client needs some field in the hnae3_roce_private_info
to save its state, and process_hw_error interface in the
hnae3_client_ops to process hardware errors.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:17 -08:00
Huazhong Tan
720bd5837e net: hns3: add set_default_reset_request in the hnae3_ae_ops
Currently, when reset_event is called because of tx timeout, it will
upgrade the reset level (For PF, HNAE3_FUNC_RESET -> HNAE3_CORE_RESET
-> HNAE3_GLOBAL_RESET) if the time between the new reset and last reset
is within 20 secs, or restore the reset level to HNAE3_FUNC_RESET if
the time between the new reset and last reset is over 20 secs.

There is requirement that the caller needs to decide the reset level
when triggering a reset, for example, RAS recovery. So this patch
adds the set_default_reset_request to meet this requirement.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:17 -08:00
Huazhong Tan
814da63c55 net: hns3: use HNS3_NIC_STATE_INITED to indicate the initialization state of enet
Besides of module_init and module_exit, the process of reset will
also uninitialize and initialize the enet client. When reset process
fails with enet client uninitialized, the module_exit does not need
to uninitialize the enet client, otherwise it may cause double
uninitialization problem.

So we need the HNS3_NIC_STATE_INITED flag to indicate whether
the enet client is initialized.

Also HNS3_NIC_STATE_REINITING is previously unused, so change it to
HNS3_NIC_STATE_INITED.

Fixes: bb6b94a896 ("net: hns3: Add reset interface implementation in client")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 11:42:17 -08:00
Linus Torvalds
85758777c2 hwmon fixes for v4.20-rc2
- Remove bogus __init annotations in ibmpowernv driver
 - Fix double-free in error handling of __hwmon_device_register()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJb4enqAAoJEMsfJm/On5mBcX4QAJntHN4e0jXYqe6kY0lq5DBT
 pvL43WoqkpjM5//vOzyLXBAvM2nlZvisCBPGVnB9TD0kPZ8cdG00SEuwrOXyt8Pr
 RoTbiPXV4PY3t2Ksmm8KPmbkysbAJMKsNsfGCQqy3HK+8OrbHjgsrBvdrDF9bCDO
 d4tPttVidZuTx4X7+kSCkpwUqTslOFw1kcOs4meGSPi1lHphwRXwI2+pC/fgz+IQ
 voqG5v20fJEZ6AM9rVEn/qRQubNGmBxenMyo5LuPjwJzTLo6xtVgdE6zGtxiKeJE
 NIgJxKvwFAdVsHOqMU1O+THsi7l1HefMK0K/1vS1X49UOnwE/CcOLznxRfdd9yqm
 dxaqqnTrUQ453HCjnlT79FSQPINAWDMsyu7ztsHQ2864rfkCZgpiEPDtvt15rT/K
 f5QutzRYOO2bma+tSumprlAU3TXnXuTohbFBxis+384N1VuBtxwJ63dY3cG9Ed6v
 xCOqxndB/y8epHErujFBSX1NO5NE8bU3W9eW7CITrqrxclUuG9F5RkKDbyFYyRnk
 HNR7RB2SBF5M4XdZQwjH+IGR9j288ChUGKbzEJNDVFbTfdD29Sw6MbSS4dmkNepL
 9KKdBWEu/Q2py5Z0SXpY+kAwIl3MIixZ0Uofqt43R9KGdHZaBRxuKrcd8Kppsu7A
 4OZxBtHSF9Z9MAWvDTiv
 =MwAW
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Remove bogus __init annotations in ibmpowernv driver

 - Fix double-free in error handling of __hwmon_device_register()

* tag 'hwmon-for-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (ibmpowernv) Remove bogus __init annotations
  hwmon: (core) Fix double-free in __hwmon_device_register()
2018-11-07 11:39:07 -08:00
Omar Sandoval
d6fd0ae25c Btrfs: fix missing delayed iputs on unmount
There's a race between close_ctree() and cleaner_kthread().
close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it
sees it set, but this is racy; the cleaner might have already checked
the bit and could be cleaning stuff. In particular, if it deletes unused
block groups, it will create delayed iputs for the free space cache
inodes. As of "btrfs: don't run delayed_iputs in commit", we're no
longer running delayed iputs after a commit. Therefore, if the cleaner
creates more delayed iputs after delayed iputs are run in
btrfs_commit_super(), we will leak inodes on unmount and get a busy
inode crash from the VFS.

Fix it by parking the cleaner before we actually close anything. Then,
any remaining delayed iputs will always be handled in
btrfs_commit_super(). This also ensures that the commit in close_ctree()
is really the last commit, so we can get rid of the commit in
cleaner_kthread().

The fstest/generic/475 followed by 476 can trigger a crash that
manifests as a slab corruption caused by accessing the freed kthread
structure by a wake up function. Sample trace:

[ 5657.077612] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[ 5657.079432] PGD 1c57a067 P4D 1c57a067 PUD da10067 PMD 0
[ 5657.080661] Oops: 0000 [#1] PREEMPT SMP
[ 5657.081592] CPU: 1 PID: 5157 Comm: fsstress Tainted: G        W         4.19.0-rc8-default+ #323
[ 5657.083703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 5657.086577] RIP: 0010:shrink_page_list+0x2f9/0xe90
[ 5657.091937] RSP: 0018:ffffb5c745c8f728 EFLAGS: 00010287
[ 5657.092953] RAX: 0000000000000074 RBX: ffffb5c745c8f830 RCX: 0000000000000000
[ 5657.094590] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9a8747fdf3d0
[ 5657.095987] RBP: ffffb5c745c8f9e0 R08: 0000000000000000 R09: 0000000000000000
[ 5657.097159] R10: ffff9a8747fdf5e8 R11: 0000000000000000 R12: ffffb5c745c8f788
[ 5657.098513] R13: ffff9a877f6ff2c0 R14: ffff9a877f6ff2c8 R15: dead000000000200
[ 5657.099689] FS:  00007f948d853b80(0000) GS:ffff9a877d600000(0000) knlGS:0000000000000000
[ 5657.101032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5657.101953] CR2: 00000000000000cc CR3: 00000000684bd000 CR4: 00000000000006e0
[ 5657.103159] Call Trace:
[ 5657.103776]  shrink_inactive_list+0x194/0x410
[ 5657.104671]  shrink_node_memcg.constprop.84+0x39a/0x6a0
[ 5657.105750]  shrink_node+0x62/0x1c0
[ 5657.106529]  try_to_free_pages+0x1a4/0x500
[ 5657.107408]  __alloc_pages_slowpath+0x2c9/0xb20
[ 5657.108418]  __alloc_pages_nodemask+0x268/0x2b0
[ 5657.109348]  kmalloc_large_node+0x37/0x90
[ 5657.110205]  __kmalloc_node+0x236/0x310
[ 5657.111014]  kvmalloc_node+0x3e/0x70

Fixes: 30928e9baa ("btrfs: don't run delayed_iputs in commit")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add trace ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-07 20:17:45 +01:00
Jacob Keller
d5596fd467 i40e: enable NETIF_F_NTUPLE and NETIF_F_HW_TC at driver load
The assignment of the feature flag NETIF_F_NTUPLE and NETIF_F_HW_TC
occurs prior to the initial setup of the local hw_features variable.

This means the features are set as user-changeable, but are not set in
the currently active feature list. This results in the features being
disabled at the driver's initial load.

Move the assignment after the initial assignment of hw_features, and
assign to the local variable. This ensures that NETIF_F_NTUPLE and
NETIF_F_HW_TC are marked as user-changeable, and also enables them by
default when the driver loads.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-07 10:32:15 -08:00
Sasha Neftin
920664a8f7 igc: Clean up code
Address few community comments.
Remove unused code, will be added per demand.
Remove blank lines and unneeded includes.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-07 09:47:01 -08:00