gpstate_timer_handler() uses a synchronous smp_call to set the pstate
on the requested core. This causes the following hard lockup:
smp_call_function_single+0x110/0x180 (unreliable)
smp_call_function_any+0x180/0x250
gpstate_timer_handler+0x1e8/0x580
call_timer_fn+0x50/0x1c0
expire_timers+0x138/0x1f0
run_timer_softirq+0x1e8/0x270
__do_softirq+0x158/0x3e4
irq_exit+0xe8/0x120
timer_interrupt+0x9c/0xe0
decrementer_common+0x114/0x120
-- interrupt: 901 at doorbell_global_ipi+0x34/0x50
LR = arch_send_call_function_ipi_mask+0x120/0x130
arch_send_call_function_ipi_mask+0x4c/0x130
smp_call_function_many+0x340/0x450
pmdp_invalidate+0x98/0xe0
change_huge_pmd+0xe0/0x270
change_protection_range+0xb88/0xe40
mprotect_fixup+0x140/0x340
SyS_mprotect+0x1b4/0x350
system_call+0x58/0x6c
One way to avoid this is to remove the smp-call. We can ensure that the
timer always runs on one of the policy->cpus. If the timer gets
migrated to a CPU outside the policy, re-queue it back on one of the
policy->cpus. This way we can get rid of the smp-call that was being
used to set the pstate on the policy->cpus.
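The re-queue idea above can be modelled in a few lines. This is a minimal Python sketch, not the powernv driver code. `Policy`, `TimerLog` and the use of `min()` to stand in for picking the first CPU of `policy->cpus` are all hypothetical. In the kernel, re-arming on a specific CPU would presumably go through `add_timer_on()`, but the sketch only captures the decision logic.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    cpus: frozenset  # hypothetical stand-in for policy->cpus


@dataclass
class TimerLog:
    requeued_on: list = field(default_factory=list)
    pstate_set_on: list = field(default_factory=list)


def gpstate_timer_handler(current_cpu: int, policy: Policy, log: TimerLog) -> None:
    """Model of the handler: never call into another CPU synchronously."""
    if current_cpu not in policy.cpus:
        # Timer migrated outside the policy: re-queue it on a policy CPU
        # (min() models picking the first CPU in the mask) and return.
        log.requeued_on.append(min(policy.cpus))
        return
    # Already on a policy CPU, so the pstate can be set locally.
    log.pstate_set_on.append(current_cpu)
```

The handler either does the work locally or defers it to a policy CPU; no path waits on another CPU, which is what removes the lockup scenario in the trace.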
Fixes: 7bc54b652f ("timers, cpufreq/powernv: Initialize the gpstate timer as pinned")
Cc: stable@vger.kernel.org # v4.8+
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- Fix for black screen issues (FDO #104158 and #104425)
- A correction for wrongly applied display W/A
- Fixes for HDA codec interop issue (no audio) and too eager HW timeouts
* tag 'drm-intel-fixes-2018-04-26' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915/fbdev: Enable late fbdev initial configuration
drm/i915: Use ktime on wait_for
drm/i915: Enable display WA#1183 from its correct spot
drm/i915/audio: set minimum CD clock to twice the BCLK
If the user specifies an invalid field modifier for a hist trigger,
the current code correctly flags that as an error, but doesn't tell
the user what happened.
Fix this by invoking hist_err() with an appropriate message when
invalid modifiers are specified.
Before:
# echo 'hist:keys=pid:ts0=common_timestamp.junkusecs' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
After:
# echo 'hist:keys=pid:ts0=common_timestamp.junkusecs' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
ERROR: Invalid field modifier: junkusecs
Last command: keys=pid:ts0=common_timestamp.junkusecs
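A minimal Python sketch of the check whose failure is now reported. It assumes a hypothetical `VALID_MODIFIERS` set (the exact list the kernel accepts may differ) and a hypothetical `HistError` exception. It only shows that an unknown suffix after the dot is rejected with a message naming the offending modifier, instead of failing silently.

```python
from typing import Optional, Tuple

# Hypothetical set of accepted modifiers, for illustration only.
VALID_MODIFIERS = {"hex", "sym", "sym-offset", "execname", "syscall", "log2", "usecs"}


class HistError(ValueError):
    """Error whose message would be surfaced through the hist file."""


def parse_field(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name.modifier' and reject unknown modifiers with a clear message."""
    name, dot, modifier = spec.partition(".")
    if not dot:
        return name, None
    if modifier not in VALID_MODIFIERS:
        raise HistError(f"Invalid field modifier: {modifier}")
    return name, modifier


def report(err: Exception, last_cmd: str) -> str:
    """Roughly mirror the 'ERROR: ... / Last command: ...' output shown above."""
    return f"ERROR: {err}\nLast command: {last_cmd}"
```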
Link: http://lkml.kernel.org/r/b043c59fa79acd06a5f14a1d44dee9e5a3cd1248.1524790601.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If the user specifies a nonexistent field for a hist trigger, the
current code correctly flags that as an error, but doesn't tell the
user what happened.
Fix this by invoking hist_err() with an appropriate message when
nonexistent fields are specified.
Before:
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
After:
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: Invalid argument
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
ERROR: Couldn't find field: pid
Last command: keys=pid:ts0=common_timestamp.usecs
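The example fails because sched_switch has no plain `pid` field; its PID fields are `prev_pid` and `next_pid`. Below is a minimal Python sketch of the lookup, with hand-written field tables. `EVENT_FIELDS`, `COMMON_FIELDS`, `HistError` and `find_field()` are hypothetical stand-ins for the kernel's field lookup.

```python
from typing import Dict, List

# Illustrative field tables; the real ones come from the event's format.
COMMON_FIELDS: List[str] = ["common_type", "common_flags", "common_preempt_count", "common_pid"]
EVENT_FIELDS: Dict[str, List[str]] = {
    "sched_switch": ["prev_comm", "prev_pid", "prev_prio", "prev_state",
                     "next_comm", "next_pid", "next_prio"],
}


class HistError(ValueError):
    pass


def find_field(event: str, name: str) -> str:
    """Return the field name if the event (or the common header) has it."""
    if name == "common_timestamp":  # synthetic field, always available
        return name
    if name in COMMON_FIELDS or name in EVENT_FIELDS.get(event, []):
        return name
    raise HistError(f"Couldn't find field: {name}")
```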
Link: http://lkml.kernel.org/r/fdc8746969d16906120f162b99dd71c741e0b62c.1524790601.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The flag-printing code used when displaying hist triggers somehow got
dropped during refactoring of the inter-event patchset. This restores
it.
Below are a couple of examples: in the first case, the .execname flag
wasn't being displayed for common_pid, and the second illustrates the
same for the .usecs flag on common_timestamp.
Before:
# echo 'hist:key=common_pid.execname:val=count:sort=count' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
hist:keys=common_pid:vals=hitcount,count:sort=count:size=2048 [active]
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
hist:keys=pid:vals=hitcount:ts0=common_timestamp:sort=hitcount:size=2048:clock=global if comm=="cyclictest" [active]
After:
# echo 'hist:key=common_pid.execname:val=count:sort=count' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
hist:keys=common_pid.execname:vals=hitcount,count:sort=count:size=2048 [active]
# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global if comm=="cyclictest" [active]
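The restored behaviour amounts to appending each field's modifiers back onto its name when the trigger is printed. Below is a minimal Python sketch. It assumes a hypothetical `HistField` record that keeps the modifiers as strings, whereas the kernel stores them as flag bits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistField:
    name: str
    modifiers: List[str] = field(default_factory=list)  # e.g. ["execname"]


def field_to_str(f: HistField) -> str:
    # Re-append every modifier so the printed trigger matches what was written.
    return f.name + "".join("." + m for m in f.modifiers)


def keys_to_str(keys: List[HistField]) -> str:
    return "keys=" + ",".join(field_to_str(k) for k in keys)
```

For example, `keys_to_str([HistField("common_pid", ["execname"])])` yields `keys=common_pid.execname`, matching the "After" output above.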
Link: http://lkml.kernel.org/r/492bab42ff21806600af98a8ea901af10efbee0c.1524790601.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-04-27
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add extensive BPF helper description into include/uapi/linux/bpf.h
and a new script bpf_helpers_doc.py which allows for generating a
man page out of it. Thus, every helper in BPF now comes with proper
function signature, detailed description and return code explanation,
from Quentin.
2) Migrate the BPF collect metadata tunnel tests from BPF samples over
to the BPF selftests and further extend them with v6 vxlan, geneve
and ipip tests, simplify the ipip tests, improve documentation and
convert to bpf_ntoh*() / bpf_hton*() api, from William.
3) Currently, helpers that expect ARG_PTR_TO_MAP_{KEY,VALUE} can only
access stack and packet memory. Extend this to allow such helpers
to also use map values, which enables use cases where a value from
a first lookup can be directly used as a key for a second lookup,
from Paul.
4) Add a new helper bpf_skb_get_xfrm_state() for tc BPF programs in
order to retrieve XFRM state information containing SPI, peer
address and reqid values, from Eyal.
5) Various optimizations in the nfp driver's BPF JIT in order to turn ADD
and SUB instructions with a negative immediate into the opposite
operation with a positive immediate such that nfp can better fit
small immediates into instructions. Savings in instruction count
up to 4% have been observed, from Jakub.
6) Add the BPF prog's gpl_compatible flag to struct bpf_prog_info
and add support for dumping this through bpftool, from Jiri.
7) Move the BPF sockmap samples over into BPF selftests instead, since
sockmap was more a series of tests than a sample anyway, and this way
they can be run by automated bots, from John.
8) Follow-up fix for the bpf_xdp_adjust_tail() helper in order to make it
work with generic XDP, from Nikita.
9) Some follow-up cleanups to BTF, namely, removing unused defines from
the BTF uapi header and renaming the 'name' members of struct btf_* to
name_off to make it clearer that they are offsets into the string
section, from Martin.
10) Remove test_sock_addr from TEST_GEN_PROGS in BPF selftests since it
is not run directly but invoked from test_sock_addr.sh, from Yonghong.
11) Remove redundant ret assignments in sample BPF loader, from Wang.
12) Add a couple of missing files to BPF selftest's gitignore, from Anders.
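The nfp rewrite in item 5 can be illustrated with a small Python model. The rewrite relies on `dst + imm` and `dst - (-imm)` being equal modulo 2^64. `SMALL_IMM_MAX` is a hypothetical limit, not the nfp's real encoding constraint. The sketch deliberately leaves `imm == -2**31` alone, because its negation does not fit in a signed 32-bit immediate.

```python
from typing import Tuple

U64_MASK = (1 << 64) - 1
S32_MIN = -(1 << 31)
SMALL_IMM_MAX = 255  # hypothetical "fits in the instruction" limit


def normalize_alu_imm(op: str, imm: int) -> Tuple[str, int]:
    """Turn ADD/SUB with a negative immediate into the opposite op."""
    if op not in ("add", "sub") or imm >= 0 or imm == S32_MIN:
        return op, imm
    return ("sub" if op == "add" else "add"), -imm


def fits_small_imm(imm: int) -> bool:
    return 0 <= imm <= SMALL_IMM_MAX


def apply(op: str, dst: int, imm: int) -> int:
    """64-bit ALU op; the immediate is treated as already sign-extended."""
    res = dst + imm if op == "add" else dst - imm
    return res & U64_MASK
```

After normalization, `add -4` becomes `sub 4`. The result is identical, but the immediate is now small and non-negative.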
There are two trivial merge conflicts while pulling:
1) Remove samples/sockmap/Makefile since all sockmap tests have been
moved to selftests.
2) Add both hunks from tools/testing/selftests/bpf/.gitignore to the
file since git should ignore all of them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The latest header update will break QEMU (if it is rebuilt with the new
header), and the code there seems so fragile that any change to this
header will break it. Add a better interface so that users do not need
to change their code every time the header changes.
Fix virtio console for spec compliance.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixups from Michael Tsirkin:
- The latest header update will break QEMU (if it is rebuilt with the new
header), and the code there seems so fragile that any change to this
header will break it. Add a better interface so that users do not need
to change their code every time the header changes.
- Fix virtio console for spec compliance.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_console: reset on out of memory
virtio_console: move removal code
virtio_console: drop custom control queue cleanup
virtio_console: free buffers after reset
virtio: add ability to iterate over vqs
virtio_console: don't tie bufs to a vq
virtio_balloon: add array of stat names
Merge tag 'hwmon-for-linus-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Add support for new Ryzen chips to k10temp driver
... making Phoronix happy
- Fix inconsistent chip access in nct6683 driver
- Handle the absence of a few types of sensors in scmi driver
* tag 'hwmon-for-linus-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (k10temp) Add support for AMD Ryzen w/ Vega graphics
hwmon: (k10temp) Add temperature offset for Ryzen 2700X
hwmon: (nct6683) Enable EC access if disabled at boot
hwmon: (scmi) handle absence of few types of sensors
Merge tag 'trace-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Add workqueue forward declaration (for new work, but a nice clean up)
- selftest fixes for the new histogram code
- Print output fix for hwlat tracer
- Fix missing system call events - due to change in x86 syscall naming
- Fix kprobe address being used by perf being hashed
* tag 'trace-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix missing tab for hwlat_detector print format
selftests: ftrace: Add a testcase for multiple actions on trigger
selftests: ftrace: Fix trigger extended error testcase
kprobes: Fix random address output of blacklist file
tracing: Fix kernel crash while using empty filter with perf
tracing/x86: Update syscall trace events to handle new prefixed syscall func names
tracing: Add missing forward declaration
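The syscall-event fix comes down to matching symbol names again after x86 started prefixing its syscall entry points (for example `__x64_sys_read` instead of `sys_read`). Below is a minimal Python sketch. It assumes the prefixes are `__x64_sys_` and `__ia32_sys_`, and `syscall_matches()` is a hypothetical name, not the kernel's helper.

```python
# Assumed prefixes: the plain name plus the new x86 wrapper prefixes.
PREFIXES = ("sys_", "__x64_sys_", "__ia32_sys_")


def syscall_matches(sym: str, name: str) -> bool:
    """Compare a kernel symbol with a syscall name such as 'sys_read'."""
    if not name.startswith("sys_"):
        return False
    base = name[len("sys_"):]
    # Strip whichever known prefix the symbol carries and compare the rest.
    return any(sym.startswith(p) and sym[len(p):] == base for p in PREFIXES)
```

Comparing only the part after the prefix is what lets one syscall name match every arch-specific wrapper.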
Remove two redundant ret assignments:
* 'ret' is already set to 1 before the 'if (data_maps)' logic, and any
error there jumps to the 'done' label, so no 'ret = 1' is needed before
the error jump.
* After the '/* load programs */' part, if everything goes well, the
BPF code is loaded and 'ret' is set to 0 by load_and_attach(). If
something goes wrong, 'ret' is non-zero, and the redundant 'ret = 0'
after the for loop causes the error to be ignored.
For example, if the BPF code only provides unsupported program types,
such as ELF SEC("unknown"), the for loop never calls load_and_attach()
to load the BPF code, so 1 should be returned to callers instead of 0.
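Below is a minimal Python model of the resulting control flow. It assumes a hypothetical subset of recognised section prefixes and a stand-in `load_and_attach` callable that returns 0 on success. It shows why dropping the final `ret = 0` matters: a file containing only unsupported sections now returns 1.

```python
from typing import Callable, Iterable

# Hypothetical subset of the section prefixes the sample loader recognises.
SUPPORTED_PREFIXES = ("kprobe/", "kretprobe/", "tracepoint/", "socket", "xdp")


def load_programs(sections: Iterable[str], load_and_attach: Callable[[str], int]) -> int:
    ret = 1  # set once up front; error paths jump straight to "done"
    for sec in sections:
        if not sec.startswith(SUPPORTED_PREFIXES):
            continue  # e.g. SEC("unknown"): nothing is loaded
        ret = load_and_attach(sec)
        if ret != 0:
            return ret  # models "goto done"
    # No unconditional "ret = 0" here, so "nothing loaded" still reports 1.
    return ret
```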
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet says:
====================
eBPF helper functions can be called from within eBPF programs to perform
a variety of tasks that would otherwise be hard or impossible to do with
eBPF itself. There is a growing number of such helper functions in the
kernel, but documentation is scarce. The main user space header file
does contain a short commented description of most helpers, but it is
somewhat outdated and incomplete. It is more a "cheat sheet" than real
documentation accessible to new eBPF developers.
This commit attempts to improve the situation by replacing the existing
overview for the helpers with a more developed description. Furthermore,
a Python script is added to generate a manual page for eBPF helpers. The
workflow is the following, and requires the rst2man utility:
$ ./scripts/bpf_helpers_doc.py \
--filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
$ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
$ man /tmp/bpf-helpers.7
The objective is to keep all documentation related to the helpers in a
single place, and to be able to generate from here a manual page that
could be packaged in the man-pages repository and shipped with most
distributions.
Additionally, parsing the prototypes of the helper functions could
hopefully be reused, with a different Printer object, to generate
header files needed in some eBPF-related projects.
Regarding the description of each helper, it comprises several items:
- The function prototype.
- A description of the function and of its arguments (except for a
couple of cases, when there are no arguments and the return value
makes the function usage really obvious).
- A description of return values (if not void).
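The three items above map naturally onto a small record per helper. Below is a minimal Python sketch of parsing them. It assumes a layout in which each entry starts with an unindented prototype line inside the ` * ` comment, followed by tab-indented `Description` and `Return` headings whose text is indented one tab further. It is not the actual bpf_helpers_doc.py, and `Helper` and `parse_helpers()` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Helper:
    proto: str
    desc: List[str] = field(default_factory=list)
    ret: List[str] = field(default_factory=list)


def parse_helpers(text: str) -> List[Helper]:
    helpers: List[Helper] = []
    section: Optional[str] = None  # "desc" or "ret"
    for raw in text.splitlines():
        line = re.sub(r"^ \* ?", "", raw)  # drop the C comment prefix
        if not line.strip():
            continue
        if not line.startswith("\t"):
            # Unindented line: a new helper prototype starts here.
            helpers.append(Helper(proto=line.strip()))
            section = None
        elif line.strip() == "Description":
            section = "desc"
        elif line.strip() == "Return":
            section = "ret"
        elif helpers and section:
            getattr(helpers[-1], section).append(line.strip())
    return helpers
```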
Additional items such as the list of compatible eBPF program and map
types for each helper, Linux kernel version that introduced the helper,
GPL-only restriction, and commit hash could be added in the future, but
it was decided on the mailing list to leave them aside for now.
For several helpers, descriptions are inspired by (at times, nearly
copied from) the commit logs introducing them in the kernel. Many thanks
to their respective authors! Some sentences were also adapted from
review comments, thanks to the reviewers as well. Descriptions were
completed as much as possible, the objective being to have something
easily accessible even for people just starting with eBPF. There is
probably a bit more work to do in this direction for some helpers.
Some RST formatting is used in the descriptions (not in function
prototypes, to keep them readable; the Python script that generates the
RST for the manual page does add formatting to prototypes, to produce
something pretty) to get "bold" and "italics" in manual pages.
Hopefully, the descriptions in the bpf.h file remain perfectly readable.
Note that the few trailing white spaces are intentional: removing them
would break paragraphs for rst2man.
The descriptions should ideally be updated each time someone adds a new
helper, or updates the behaviour (new socket option supported, ...) or
the interface (new flags available, ...) of existing ones.
To ease the review process, the documentation has been split into several
patches.
v3 -> v4:
- Add a patch (#9) for newly added BPF helpers.
- Add a patch (#10) to update UAPI bpf.h version under tools/.
- Use SPDX tag in Python script.
- Several fixes on man page header and footer, and helpers documentation.
Please refer to individual patches for details.
RFC v2 -> PATCH v3:
Several fixes on man page header and footer, and helpers documentation.
Please refer to individual patches for details.
RFC v1 -> RFC v2:
- Remove "For" (compatible program and map types), "Since" (minimal
Linux kernel version required), "GPL only" sections and commit hashes
for the helpers.
- Add comment on top of the description list to explain how this
documentation is supposed to be processed.
- Update Python script accordingly (remove the same sections, and remove
paragraphs on program types and GPL restrictions from man page
header).
- Split series into several patches.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Update tools/include/uapi/linux/bpf.h file in order to reflect the
changes for BPF helper functions documentation introduced in previous
commits.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions:
Helper from Nikita:
- bpf_xdp_adjust_tail()
Helper from Eyal:
- bpf_skb_get_xfrm_state()
v4:
- New patch (helpers did not exist yet for previous versions).
Cc: Nikita V. Shirokov <tehnerd@tehnerd.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by John:
- bpf_redirect_map()
- bpf_sk_redirect_map()
- bpf_sock_map_update()
- bpf_msg_redirect_map()
- bpf_msg_apply_bytes()
- bpf_msg_cork_bytes()
- bpf_msg_pull_data()
v4:
- bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
"his" to "this". Also add a paragraph on performance improvement over
bpf_redirect() helper.
v3:
- bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
for generic XDP but supported on native XDP.
- bpf_msg_pull_data(): Clarify comment about invalidated verifier
checks.
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions:
Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()
Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()
Helper from Josef:
- bpf_override_return()
Helper from Andrey:
- bpf_bind()
v4:
- bpf_perf_event_read_value(): State that this helper should be
preferred over bpf_perf_event_read().
v3:
- bpf_perf_event_read_value(): Fix time of selection for perf event type
in description. Remove occurrences of "cores" to avoid confusion with
"CPU".
- bpf_bind(): Remove last paragraph of description, which was off topic.
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
[for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
Acked-by: Andrey Ignatov <rdna@fb.com>
[for bpf_bind()]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions:
Helper from Kaixu:
- bpf_perf_event_read()
Helpers from Martin:
- bpf_skb_under_cgroup()
- bpf_xdp_adjust_head()
Helpers from Sargun:
- bpf_probe_write_user()
- bpf_current_task_under_cgroup()
Helper from Thomas:
- bpf_skb_change_head()
Helper from Gianluca:
- bpf_probe_read_str()
Helpers from Chenbo:
- bpf_get_socket_cookie()
- bpf_get_socket_uid()
v4:
- bpf_perf_event_read(): State that bpf_perf_event_read_value() should
be preferred over this helper.
- bpf_skb_change_head(): Clarify comment about invalidated verifier
checks.
- bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
checks.
- bpf_probe_write_user(): Add that dst must be a valid user space
address.
- bpf_get_socket_cookie(): Improve description by making clearer that
the cookie belongs to the socket, and state that it remains stable for
the life of the socket.
v3:
- bpf_perf_event_read(): Fix time of selection for perf event type in
description. Remove occurrences of "cores" to avoid confusion with
"CPU".
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Gianluca Borello <g.borello@gmail.com>
Cc: Chenbo Feng <fengc@google.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
[for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Daniel:
- bpf_get_hash_recalc()
- bpf_skb_change_tail()
- bpf_skb_pull_data()
- bpf_csum_update()
- bpf_set_hash_invalid()
- bpf_get_numa_node_id()
- bpf_set_hash()
- bpf_skb_adjust_room()
- bpf_xdp_adjust_meta()
v4:
- bpf_skb_change_tail(): Clarify comment about invalidated verifier
checks.
- bpf_skb_pull_data(): Clarify the motivation for using this helper or
bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
*skb*. Clarify comment about invalidated verifier checks.
- bpf_csum_update(): Fix description of checksum (entire packet, not IP
checksum). Fix a typo: "header" instead of "helper".
- bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
- bpf_get_numa_node_id(): State that the helper is not restricted to
programs attached to sockets.
- bpf_skb_adjust_room(): Clarify comment about invalidated verifier
checks.
- bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
checks.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Daniel:
- bpf_get_prandom_u32()
- bpf_get_smp_processor_id()
- bpf_get_cgroup_classid()
- bpf_get_route_realm()
- bpf_skb_load_bytes()
- bpf_csum_diff()
- bpf_skb_get_tunnel_opt()
- bpf_skb_set_tunnel_opt()
- bpf_skb_change_proto()
- bpf_skb_change_type()
v4:
- bpf_get_prandom_u32(): Warn that the prng is not cryptographically
secure.
- bpf_get_smp_processor_id(): Fix a typo (case).
- bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
being limited to cgroup v1, and to egress path.
- bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
Add a note about usage with TC and advantage of clsact. Fix a typo in
return value ("sdb" instead of "skb").
- bpf_skb_load_bytes(): Make explicit that loading large data loads it
onto the eBPF stack.
- bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
bpf_l3|l4_csum_replace().
- bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
metadata" mode, and example of this with Geneve.
- bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
description.
- bpf_skb_change_proto(): Mention that the main use case is NAT64.
Clarify comment about invalidated verifier checks.
v3:
- bpf_get_prandom_u32(): Fix helper name :(. Add description, including
a note on the internal random state.
- bpf_get_smp_processor_id(): Add description, including a note on the
processor id remaining stable during program run.
- bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
required to use the helper. Add a reference to related documentation.
State that placing a task in net_cls controller disables cgroup-bpf.
- bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
required to use this helper.
- bpf_skb_load_bytes(): Fix comment on current use cases for the helper.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Alexei:
- bpf_get_current_pid_tgid()
- bpf_get_current_uid_gid()
- bpf_get_current_comm()
- bpf_skb_vlan_push()
- bpf_skb_vlan_pop()
- bpf_skb_get_tunnel_key()
- bpf_skb_set_tunnel_key()
- bpf_redirect()
- bpf_perf_event_output()
- bpf_get_stackid()
- bpf_get_current_task()
v4:
- bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
note on bpf_redirect_map() providing better performance. Replace "Save
for" with "Except for".
- bpf_skb_vlan_push(): Clarify comment about invalidated verifier
checks.
- bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
checks.
- bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
mode, and example tunneling protocols with which it can be used.
- bpf_skb_set_tunnel_key(): Add a reference to the description of
bpf_skb_get_tunnel_key().
- bpf_perf_event_output(): Specify that, and for what purpose, the
helper can be used with programs attached to TC and XDP.
v3:
- bpf_skb_get_tunnel_key(): Change and improve description and example.
- bpf_redirect(): Improve description of BPF_F_INGRESS flag.
- bpf_perf_event_output(): Fix first sentence of description. Delete
wrong statement on context being evaluated as a struct pt_regs. Remove
the long yet incomplete example.
- bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
configurable.
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide an RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Alexei:
- bpf_map_lookup_elem()
- bpf_map_update_elem()
- bpf_map_delete_elem()
- bpf_probe_read()
- bpf_ktime_get_ns()
- bpf_trace_printk()
- bpf_skb_store_bytes()
- bpf_l3_csum_replace()
- bpf_l4_csum_replace()
- bpf_tail_call()
- bpf_clone_redirect()
v4:
- bpf_map_lookup_elem(): Add "const" qualifier for key.
- bpf_map_update_elem(): Add "const" qualifier for key and value.
- bpf_map_delete_elem(): Add "const" qualifier for key.
- bpf_skb_store_bytes(): Clarify comment about invalidated verifier
checks.
- bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
about bpf_csum_diff().
- bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
note about bpf_csum_diff().
- bpf_tail_call(): Bring minor edits to description.
- bpf_clone_redirect(): Add a note about the relation with
bpf_redirect(). Also clarify comment about invalidated verifier
checks.
v3:
- bpf_map_update_elem(): Fix description of restrictions for flags
related to the existence of the entry.
- bpf_trace_printk(): State that trace_pipe can be configured. Fix
return value in case an unknown format specifier is met. Add a note on
kernel log notice when the helper is used. Edit example.
- bpf_tail_call(): Improve comment on stack inheritance.
- bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.
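The entry-existence flags for map updates are `BPF_ANY` (0), `BPF_NOEXIST` (1) and `BPF_EXIST` (2). Below is a minimal Python model of their semantics for a hash-map-like store. The dict stands in for the map, `map_update_elem()` is a hypothetical model rather than the helper itself, and other map types, such as arrays, behave differently.

```python
import errno

BPF_ANY = 0      # create a new entry or update an existing one
BPF_NOEXIST = 1  # create only if the entry does not exist yet
BPF_EXIST = 2    # update only if the entry already exists


def map_update_elem(m: dict, key, value, flags: int) -> int:
    """Return 0 on success or a negative errno, like the helper."""
    if flags not in (BPF_ANY, BPF_NOEXIST, BPF_EXIST):
        return -errno.EINVAL
    exists = key in m
    if flags == BPF_NOEXIST and exists:
        return -errno.EEXIST
    if flags == BPF_EXIST and not exists:
        return -errno.ENOENT
    m[key] = value
    return 0
```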
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Remove the previous "overview" of eBPF helpers from the user bpf.h header.
Replace it with a comment explaining how to process the new documentation
(to come in following patches) with a Python script to produce RST, then
man page documentation.
Also add the aforementioned Python script under scripts/. It is used to
process include/uapi/linux/bpf.h, extract the helper descriptions, and
turn them into an RST document that can further be processed with rst2man
to produce a man page. The script takes one "--filename <path/to/file>"
option. If the script is launched from scripts/ in the kernel root
directory, it should be able to find the location of the header to
parse, and "--filename <path/to/file>" is then optional. If it cannot
find the file, then the option becomes mandatory. RST-formatted
documentation is printed to standard output.
Typical workflow for producing the final man page would be:
$ ./scripts/bpf_helpers_doc.py \
--filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
$ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
$ man /tmp/bpf-helpers.7
Note that the kernel-doc tool cannot be used to document eBPF helpers,
whose signatures are not available directly in the header files
(pre-processor directives are used to produce them at the beginning of
the compilation process).
v4:
- Also remove overviews for newly added bpf_xdp_adjust_tail() and
bpf_skb_get_xfrm_state().
- Remove vague statement about what helpers are restricted to GPL
programs in "LICENSE" section for man page footer.
- Replace license boilerplate with SPDX tag for Python script.
v3:
- Change license for man page.
- Remove "for safety reasons" from man page header text.
- Change "packets metadata" to "packets" in man page header text.
- Move and fix comment on helpers introducing no overhead.
- Remove "NOTES" section from man page footer.
- Add "LICENSE" section to man page footer.
- Edit description of file include/uapi/linux/bpf.h in man page footer.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Similarly, tbl->entries is not initialized after kmalloc() and therefore
causes an uninit-value warning in ip_vs_lblc_check_expire(),
as reported by syzbot.
Reported-by: <syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tbl->entries is not initialized after kmalloc() and therefore
causes an uninit-value warning in ip_vs_lblc_check_expire(),
as reported by syzbot.
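A user-space analogue of the missing initialization: malloc(), like kmalloc(), returns memory with indeterminate contents, so every field read later must be set explicitly (or a zeroing allocator used instead, such as calloc(), or kzalloc() in the kernel). This is a minimal sketch; struct toy_lblc_table, its max_size field and toy_table_alloc() are hypothetical, and only the entries counter corresponds to the field named above.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy table: only the entries counter mirrors the field named above. */
struct toy_lblc_table {
	atomic_int entries;	/* number of cached entries */
	int max_size;		/* hypothetical extra field */
};

static struct toy_lblc_table *toy_table_alloc(int max_size)
{
	/* malloc(), like kmalloc(), leaves the contents indeterminate */
	struct toy_lblc_table *tbl = malloc(sizeof(*tbl));

	if (tbl == NULL)
		return NULL;
	atomic_init(&tbl->entries, 0);	/* the initialization that was missing */
	tbl->max_size = max_size;
	return tbl;
}
```

Any later reader of entries (an expiry check, for instance) then sees 0 rather than heap garbage.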
Reported-by: <syzbot+3dfdea57819073a04f21@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
IPVS Updates for v4.18
Please consider these IPVS enhancements for v4.18.
* Whitespace cleanup
* Add Maglev hashing algorithm as an IPVS scheduler
Inju Song says "Implements Google's Maglev hashing algorithm as an
IPVS scheduler. It provides consistent hashing but also offers some
special properties regarding disruption and load balancing:
1) minimal disruption: when the set of destinations changes,
a connection will likely be sent to the same destination
as it was before.
2) load balancing: each destination will receive an almost
equal number of connections.
See also: [3.4 Consistent Hashing] in
https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
"
* Fix the implementation of Knuth's multiplicative hashing used in
the sh/dh/lblc/lblcr algorithms by using the implementation provided
by the hash_32() macro instead.
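The populate step of Maglev hashing, as described in the paper cited above, can be sketched as follows: each backend walks its own permutation of the table slots, defined by an offset and a skip, and the backends take turns claiming their next free slot until the table is full. This is a minimal sketch under stated assumptions: h1[] and h2[] stand in for two independent hashes of each backend's name, the table size must be prime, backends are unweighted, and maglev_populate() is a hypothetical name, not the IPVS scheduler code.

```c
#include <stdint.h>

#define MAGLEV_MAX_BACKENDS 16	/* arbitrary cap for this sketch */

static int is_prime(unsigned int x)
{
	unsigned int d;

	if (x < 2)
		return 0;
	for (d = 2; d <= x / d; d++)
		if (x % d == 0)
			return 0;
	return 1;
}

/*
 * Fill entry[0..m-1] with backend indices. m must be prime so that every
 * skip in [1, m-1] visits all slots. Returns 0 on success, -1 on bad input.
 */
static int maglev_populate(unsigned int n, const uint32_t h1[],
			   const uint32_t h2[], unsigned int m, int entry[])
{
	unsigned int offset[MAGLEV_MAX_BACKENDS], skip[MAGLEV_MAX_BACKENDS];
	unsigned int next[MAGLEV_MAX_BACKENDS];
	unsigned int i, filled = 0;

	if (n == 0 || n > MAGLEV_MAX_BACKENDS || !is_prime(m))
		return -1;
	for (i = 0; i < n; i++) {
		offset[i] = h1[i] % m;
		skip[i] = h2[i] % (m - 1) + 1;	/* never 0 */
		next[i] = 0;
	}
	for (i = 0; i < m; i++)
		entry[i] = -1;	/* -1 marks an empty slot */

	for (;;) {
		for (i = 0; i < n; i++) {
			/* next preferred slot in backend i's permutation */
			unsigned int c = (unsigned int)((offset[i] +
				(uint64_t)next[i] * skip[i]) % m);

			while (entry[c] >= 0) {	/* skip slots already taken */
				next[i]++;
				c = (unsigned int)((offset[i] +
					(uint64_t)next[i] * skip[i]) % m);
			}
			entry[c] = (int)i;
			next[i]++;
			if (++filled == m)
				return 0;
		}
	}
}
```

For example, with m = 7 and h1 = {3, 0, 3}, h2 = {3, 1, 0} (so the (offset, skip) pairs are (3, 4), (0, 2) and (3, 1)), the table becomes 1, 0, 1, 0, 2, 2, 0: each backend ends up with two or three of the seven slots.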
====================
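The difference between two ways of reducing a multiplicative hash to a table index can be sketched as follows. sketch_hash_32() follows the generic hash_32() computation as we understand it (multiply by GOLDEN_RATIO_32, keep the top bits); lowbits_hash() shows the mask-the-low-bits variant for contrast. Both function names are hypothetical, and the sketch does not claim to reproduce the old IPVS code.

```c
#include <assert.h>
#include <stdint.h>

/* Same constant as GOLDEN_RATIO_32 in the kernel's <linux/hash.h>. */
#define SKETCH_GOLDEN_RATIO_32 0x61C88647u

/* Multiplicative hashing: multiply, then keep the TOP `bits` bits. */
static uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * SKETCH_GOLDEN_RATIO_32) >> (32 - bits);
}

/*
 * The pitfall: keeping the LOW bits of the product instead. Bit k of
 * val * C depends only on bits 0..k of val, so keys that differ only
 * in their high bits always collide.
 */
static uint32_t lowbits_hash(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 31);
	return (uint32_t)(val * SKETCH_GOLDEN_RATIO_32) &
	       ((UINT32_C(1) << bits) - 1);
}
```

For instance, 10.0.0.1 and 11.0.0.1 (as host-order u32 values) land in the same 8-bit bucket with lowbits_hash() but in different buckets with sketch_hash_32().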
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
William Tu says:
====================
This patch series provides an end-to-end eBPF tunnel testsuite. The following
common topology is used for all types of tunnels:
Topology:
---------
root namespace | at_ns0 namespace
|
----------- | -----------
| tnl dev | | | tnl dev | (overlay network)
----------- | -----------
metadata-mode | native-mode
with bpf |
|
---------- | ----------
| veth1 | --------- | veth0 | (underlay network)
---------- peer ----------
Device Configuration
--------------------
Root namespace with metadata-mode tunnel + BPF
Device names and addresses:
veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)
tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)
Namespace at_ns0 with native tunnel
Device names and addresses:
veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)
End-to-end ping packet flow
---------------------------
Most of the tests start with namespace creation and device configuration,
then ping the underlay and overlay networks. When doing 'ping 10.1.1.100'
from the root namespace, the following operations happen:
1) Route lookup shows 10.1.1.100/24 belongs to tnl dev; forward to tnl dev.
2) The tnl device's egress BPF program is triggered and sets the tunnel
metadata, with remote_ip=172.16.1.100 and others.
3) The outer tunnel header is prepended and the packet is routed to veth1's egress.
4) veth0's ingress queue receives the tunneled packet in namespace at_ns0.
5) The tunnel protocol handler, ex: vxlan_rcv, decaps the packet.
6) The packet is forwarded to the overlay tnl dev.
Test Cases
-----------------------------
Tunnel Type | BPF Programs
-----------------------------
GRE: gre_set_tunnel, gre_get_tunnel
IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
ERSPAN: erspan_set_tunnel, erspan_get_tunnel
IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
GENEVE: geneve_set_tunnel, geneve_get_tunnel
IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
IPIP: ipip_set_tunnel, ipip_get_tunnel
IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
ip6ip6_set_tunnel, ip6ip6_get_tunnel
XFRM: xfrm_get_state
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Move the testsuite to
selftests/bpf/{test_tunnel_kern.c, test_tunnel.sh}
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
and samples/bpf/test_tunnel_bpf.sh to selftests. There are a couple of
changes from the original:
1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
2) simplify the original ipip tests (remove iperf tests)
3) improve documentation
4) use bpf_ntoh* and bpf_hton* api
In summary, 'test_tunnel_kern.o' contains the following bpf programs:
GRE: gre_set_tunnel, gre_get_tunnel
IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
ERSPAN: erspan_set_tunnel, erspan_get_tunnel
IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
GENEVE: geneve_set_tunnel, geneve_get_tunnel
IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
IPIP: ipip_set_tunnel, ipip_get_tunnel
IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
ip6ip6_set_tunnel, ip6ip6_get_tunnel
XFRM: xfrm_get_state
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When bpf_adjust_tail was introduced for generic XDP, it changed the skb's
tail pointer so that it pointed to the new "end of the packet". However,
the skb's len field was not updated accordingly, so the Ethernet frame on
the wire kept its original size (or an even bigger one, if adjust_head was
used). This diff fixes that.
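The invariant at stake is that the tail pointer and the length must move together. A toy model of a shrink-only tail adjustment, under stated assumptions: struct toy_skb and toy_apply_adjust_tail() are hypothetical stand-ins, not the kernel's sk_buff or its generic XDP code, and the data pointer is assumed to already reflect any head adjustment.

```c
#include <stddef.h>

/* Toy stand-in for the few sk_buff fields involved. */
struct toy_skb {
	unsigned char *data;	/* start of packet */
	unsigned char *tail;	/* one past the last byte */
	unsigned int len;	/* must always equal tail - data */
};

/*
 * The XDP program moved data_end from orig_data_end to new_data_end
 * (shrink only). Updating tail without len is the bug described above.
 */
static void toy_apply_adjust_tail(struct toy_skb *skb,
				  const unsigned char *orig_data_end,
				  const unsigned char *new_data_end)
{
	ptrdiff_t off = orig_data_end - new_data_end;

	if (off > 0) {
		skb->tail = skb->data + (new_data_end - skb->data);
		skb->len -= (unsigned int)off;	/* keep len in sync */
	}
}
```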
Fixes: 198d83bb3 ("bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail")
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a gpl_compatible flag to struct bpf_prog_info
so it can be dumped via bpf_prog_get_info_by_fd() and
displayed via bpftool progs dump.
Alexei noticed a 4-byte hole in struct bpf_prog_info,
so we put the u32 flags field there, and we can
keep adding bit fields in it without breaking
user space.
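The "4-byte hole" argument can be checked with offsetof() and sizeof(). The two layouts below are hypothetical, not the real struct bpf_prog_info. _Alignas(8) forces the 64-bit member to 8-byte alignment on every ABI (some 32-bit ABIs would otherwise align it to 4 bytes and leave no hole).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout. */
struct info_before {
	uint32_t count;
	/* 4-byte hole: addr is forced to an 8-byte boundary */
	_Alignas(8) uint64_t addr;
};

/* Hypothetical "after" layout: flag bits placed in the former hole. */
struct info_after {
	uint32_t count;
	uint32_t gpl_compatible:1;	/* room for 31 more flag bits */
	_Alignas(8) uint64_t addr;
};
```

Because the size and every pre-existing offset are unchanged, binaries built against the old layout still read the same fields at the same places.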
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Here the variable cont is used as the save pointer for a call to
strtok_r(). It is safe to leave it uninitialized in this context,
because the later reference is only ever used if strtok_r() succeeds.
However, 'gcc-5' at least cannot infer this, so initialize cont to
NULL. Additionally, do the natural NULL check before accessing it, for
completeness.
The warning is the following:
./bpf/tools/bpf/bpf_dbg.c: In function ‘cmd_load’:
./bpf/tools/bpf/bpf_dbg.c:1077:13: warning: ‘cont’ may be used uninitialized in this function [-Wmaybe-uninitialized]
} else if (matches(subcmd, "pcap") == 0) {
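A minimal sketch of the pattern described above, not the bpf_dbg code itself: split_subcmd() is a hypothetical helper that splits "subcmd rest" with strtok_r(), initializes the save pointer, and checks it before use. Some C libraries may set the save pointer to NULL once the string is exhausted, so the NULL check is more than cosmetic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "subcmd rest" into the first word and the remainder.
 * Returns the remainder, or NULL if there is no subcommand or no rest.
 */
static char *split_subcmd(char *line, char **subcmd_out)
{
	char *cont = NULL;	/* initialized: silences -Wmaybe-uninitialized */
	char *subcmd;

	assert(line != NULL && subcmd_out != NULL);
	subcmd = strtok_r(line, " ", &cont);
	*subcmd_out = subcmd;
	if (subcmd == NULL || cont == NULL)	/* natural NULL check */
		return NULL;
	return cont;
}
```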
Fixes: fd981e3c32 ("filter: bpf_dbg: add minimal bpf debugger")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When deleting a flow counter, the modify mask should include both the
action and the flow counter. Otherwise the flow counter is not deleted and
we'll get a firmware warning when deleting the remaining destinations on
the same FTE.
This only happens in the presence of a flow counter and multiple vport
destinations. If there is only one vport destination, there is no need
to update the FTE when deleting that destination; we just delete the
FTE.
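A toy decision function for the FTE update described above, assuming a simplified model: removing a counter must update both the action and the flow counters, removing a regular destination updates the destination list, and an FTE with nothing left is deleted outright. The enum values and on_dest_removed() are hypothetical and do not reproduce the mlx5 firmware interface.

```c
#include <stdint.h>

/* Hypothetical bit positions for this sketch only. */
enum {
	MOD_ACTION = 0,
	MOD_DEST_LIST = 1,
	MOD_FLOW_COUNTERS = 2,
};
#define MOD_BIT(b) (UINT32_C(1) << (b))

enum fte_op { FTE_DELETE, FTE_UPDATE };

/*
 * Decide what to do with an FTE when one destination is removed.
 * remaining: destinations left after the removal.
 */
static enum fte_op on_dest_removed(int is_counter, unsigned int remaining,
				   uint32_t *mask)
{
	*mask = 0;
	if (remaining == 0)
		return FTE_DELETE;	/* nothing left: drop the whole FTE */
	if (is_counter)	/* the fix: action AND counters, not counters alone */
		*mask = MOD_BIT(MOD_ACTION) | MOD_BIT(MOD_FLOW_COUNTERS);
	else
		*mask = MOD_BIT(MOD_DEST_LIST);
	return FTE_UPDATE;
}
```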
Fixes: ae05831424 ("net/mlx5: Add option to add fwd rule with counter")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After the cited commit, the WQE RQ size is calculated based on sw_mtu, but
sw_mtu was not set for representors. This commit fixes that.
Fixes: 472a1e44b3 ("net/mlx5e: Save MTU in channels params")
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When we fail to initialize the RX root namespace, we need
to clean up only that and not the entire flow steering.
Currently the code may try to clean up the flow steering twice
on error, which leads to a NULL pointer dereference.
Make sure we clean up correctly.
Fixes: fba53f7b57 ("net/mlx5: Introduce mlx5_flow_steering structure")
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In case of a dma_mapping_error, do not pass wi->num_dma
to the DMA unmap function: it is yet to be set and holds
an out-of-date value.
Use the actual value (the local variable num_dma) instead.
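The rule above (on a mapping failure, unmap only what has actually been mapped, as counted by a local variable) can be shown with a toy user-space model. All names are hypothetical stand-ins: toy_map() simulates a mapping that fails at a chosen fragment, and wi->num_dma is only written once every fragment is mapped, so it cannot be trusted on the error path.

```c
#include <stdbool.h>

#define TOY_MAX_FRAGS 8

/* Stand-in for the per-WQE info; num_dma is only valid after success. */
struct toy_wqe_info {
	int num_dma;
};

static bool toy_mapped[TOY_MAX_FRAGS];	/* simulated mapping state */

/* Simulated DMA map of fragment i; fails when i == fail_at. */
static bool toy_map(int i, int fail_at)
{
	if (i == fail_at)
		return false;
	toy_mapped[i] = true;
	return true;
}

/* Unmap the first n fragments. */
static void toy_unmap(int n)
{
	for (int i = 0; i < n; i++)
		toy_mapped[i] = false;
}

static int toy_xmit(struct toy_wqe_info *wi, int nfrags, int fail_at)
{
	int num_dma = 0;	/* counts successful mappings so far */

	for (int i = 0; i < nfrags && i < TOY_MAX_FRAGS; i++) {
		if (!toy_map(i, fail_at)) {
			/* use the local count, NOT the stale wi->num_dma */
			toy_unmap(num_dma);
			return -1;
		}
		num_dma++;
	}
	wi->num_dma = num_dma;	/* published only on success */
	return 0;
}
```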
Fixes: 34802a42b3 ("net/mlx5e: Do not modify the TX SKB")
Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Adding the vector offset when calling mlx5_vector2eqn() is wrong.
This is because mlx5_vector2eqn() checks whether the EQ index is equal to
the vector number, and the internal completion vectors that mlx5 allocates
don't get an EQ index.
The second problem is that using effective_affinity_mask gives the same
CPU for different vectors.
This leads to unmapped queues when called from blk_mq_rdma_map_queues().
This doesn't happen when using the affinity_hint mask.
Fixes: 2572cf57d7 ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
Fixes: 05e0cc84e0 ("net/mlx5: Fix get vector affinity helper function")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
When the trust state is set to dscp and the netdev is down, the inline
header size is not updated. When the netdev is brought up, the inline
header size stays at L2 instead of IP.
Fix this issue by updating the private parameter while the netdev is
down, so that when it is brought up, it picks up the right header size.
Fixes: fbcb127e89 ("net/mlx5e: Support DSCP trust state ...")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For ICMPv4, the checksum is calculated over the ICMP header and data.
Since the ICMPv4 checksum doesn't cover the IP header, we can allow
L3 header rewrite for this protocol.
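ICMPv4 uses the standard Internet checksum (RFC 1071), computed over the ICMP message alone. A minimal sketch; inet_csum() is a hypothetical helper, not the mlx5 or kernel implementation. Because no IP header field enters the sum, rewriting IP header fields leaves the ICMP checksum valid, whereas TCP and UDP checksums also cover an IP pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: one's-complement of the one's-complement
 * sum of 16-bit big-endian words, padding an odd final byte with zero.
 */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)	/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A quick self-check: once the computed checksum is written into the message, summing the whole message again yields 0.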
Fixes: bdd66ac0ae ("net/mlx5e: Disallow TC offloading of unsupported match/action combinations")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Make the kernel print the correct number of TLB entries on Intel Xeon Phi 7210
(and others).
Before:
[ 0.320005] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
After:
[ 0.320005] Last level dTLB entries: 4KB 256, 2MB 128, 4MB 128, 1GB 16
The entries do exist in the official Intel SDM, but the type column there is
incorrect (it states "Cache" where it should read "TLB"); the entries for
the values 0x6B, 0x6C and 0x6D are, however, correctly described as 'Data TLB'.
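A sketch of the kind of descriptor lookup involved: CPUID leaf 2 reports one-byte descriptors, and the kernel maps known values to TLB sizes. The descriptor bytes come from the text above; the page-size and entry-count pairing is an assumption inferred from the "After" line. struct tlb_desc, enum tlb_kind and tlb_lookup() are hypothetical and differ from the kernel's own table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for this sketch. */
enum tlb_kind { TLB_DATA_4K, TLB_DATA_2M_4M, TLB_DATA_1G };

struct tlb_desc {
	uint8_t descriptor;	/* CPUID leaf 2 descriptor byte */
	enum tlb_kind kind;
	unsigned int entries;
};

/* Pairing inferred from the "After" output above. */
static const struct tlb_desc tlb_table[] = {
	{ 0x6B, TLB_DATA_4K,    256 },
	{ 0x6C, TLB_DATA_2M_4M, 128 },
	{ 0x6D, TLB_DATA_1G,     16 },
};

/* Unknown descriptors return NULL, leaving the counts at 0 ("Before"). */
static const struct tlb_desc *tlb_lookup(uint8_t desc)
{
	for (size_t i = 0; i < sizeof(tlb_table) / sizeof(tlb_table[0]); i++)
		if (tlb_table[i].descriptor == desc)
			return &tlb_table[i];
	return NULL;
}
```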
Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180423161425.24366-1-jacekt@dugeo.com
Willem de Bruijn says:
====================
udp gso
Segmentation offload reduces cycles/byte for large packets by
amortizing the cost of protocol stack traversal.
This patchset implements GSO for UDP. A process can concatenate and
submit multiple datagrams to the same destination in one send call
by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
or passing an analogous cmsg at send time.
The stack will send the entire large (up to network layer max size)
datagram through the protocol layer. At the GSO layer, it is broken
up into individual segments. All receive the same network layer header
and UDP src and dst ports. All but the last segment have the same UDP
header, but the last may differ in length and checksum.
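The segmentation arithmetic implied above can be sketched as follows: every segment but the last carries exactly the segment size in payload, and the last carries the remainder, which is why its UDP length and checksum may differ. udp_gso_split() and struct gso_split are hypothetical, and no per-skb segment cap (such as the UDP_MAX_SEGMENTS limit mentioned in the changelog) is enforced here.

```c
/* Result of splitting one large UDP payload into GSO segments. */
struct gso_split {
	unsigned int nsegs;	/* number of datagrams on the wire */
	unsigned int last_len;	/* payload bytes in the final segment */
};

static struct gso_split udp_gso_split(unsigned int payload_len,
				      unsigned int gso_size)
{
	struct gso_split s = { 0, 0 };

	if (payload_len == 0 || gso_size == 0)
		return s;
	/* ceiling division without risking overflow */
	s.nsegs = payload_len / gso_size + (payload_len % gso_size != 0);
	s.last_len = payload_len - (s.nsegs - 1) * gso_size;
	return s;
}
```

For example, a 3000-byte payload with a 1400-byte segment size yields three datagrams, the last carrying 200 bytes.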
Initial results show a significant reduction in UDP cycles/byte.
See the main patch for more details and benchmark results.
udp
876 MB/s 14873 msg/s 624666 calls/s
11,205,777,429 cycles
udp gso
2139 MB/s 36282 msg/s 36282 calls/s
11,204,374,561 cycles
The patch set is broken down as follows:
- patch 1 is a prerequisite: code rearrangement, noop otherwise
- patch 2 implements the gso logic
- patch 3 adds protocol stack support for UDP_SEGMENT
- patch 4,5,7 are refinements
- patch 6 adds the cmsg interface
- patch 8..11 are tests
This idea was presented previously at netconf 2017-2
http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
Changes v1 -> v2
- Convert __udp_gso_segment to modify headers after skb_segment
- Split main patch into two, one for gso logic, one for UDP_SEGMENT
Changes RFC -> v1
- MSG_MORE:
fixed, by allowing checksum offload with corking if gso
- SKB_GSO_UDP_L4:
made independent from SKB_GSO_UDP
and removed skb_is_ufo() wrapper
- NETIF_F_GSO_UDP_L4:
add to netdev_features_string
and to netdev-features.txt
add BUILD_BUG_ON to match SKB_GSO_UDP_L4 value
- UDP_MAX_SEGMENTS:
introduce limit on number of segments per gso skb
to avoid extreme cases like IP_MAX_MTU/IPV4_MIN_MTU
- CHECKSUM_PARTIAL:
test against missing feature after ndo_features_check
if not supported return error, analogous to udp_send_check
- MSG_ZEROCOPY: removed, deferred for now
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Send udp data between a source and sink, optionally with udp gso.
The two processes are expected to be run on separate hosts.
A script is included that runs them together over loopback in a
single namespace for functionality testing.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corked sockets take a different path to construct a udp datagram than
the lockless fast path. Test this alternate path.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connected sockets use path mtu instead of device mtu.
Test this path by inserting a route mtu that is lower than the device
mtu. Verify that the path mtu for the connection matches this lower
number, then run the same test as in the connectionless case.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validate udp gso, including edge cases (such as min/max gso sizes).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Virtual devices such as tunnels and bonding can handle large packets.
Only segment packets when reaching a physical or loopback device.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>