mirror of
https://github.com/Fishwaldo/Star64_linux.git
synced 2025-06-14 10:37:51 +00:00
On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
elimination" optimization results in ___bpf_prog_run()'s jumptable code
changing from this:
select_insn:
jmp *jumptable(, %rax, 8)
...
ALU64_ADD_X:
...
jmp *jumptable(, %rax, 8)
ALU_ADD_X:
...
jmp *jumptable(, %rax, 8)
to this:
select_insn:
mov jumptable, %r12
jmp *(%r12, %rax, 8)
...
ALU64_ADD_X:
...
jmp *(%r12, %rax, 8)
ALU_ADD_X:
...
jmp *(%r12, %rax, 8)
The jumptable address is placed in a register once, at the beginning of
the function. The function execution can then go through multiple
indirect jumps which rely on that same register value. This has a few
issues:
1) Objtool isn't smart enough to be able to track such a register value
across multiple recursive indirect jumps through the jump table.
2) With CONFIG_RETPOLINE enabled, this optimization actually results in
a small slowdown. I measured a ~4.7% slowdown in the test_bpf
"tcpdump port 22" selftest.
This slowdown is actually predicted by the GCC manual:
Note: When compiling a program using computed gotos, a GCC
extension, you may get better run-time performance if you
disable the global common subexpression elimination pass by
adding -fno-gcse to the command line.
So just disable the optimization for this function.
Fixes:
|
||
---|---|---|
.. | ||
arraymap.c | ||
bpf_lru_list.c | ||
bpf_lru_list.h | ||
btf.c | ||
cgroup.c | ||
core.c | ||
cpumap.c | ||
devmap.c | ||
disasm.c | ||
disasm.h | ||
hashtab.c | ||
helpers.c | ||
inode.c | ||
local_storage.c | ||
lpm_trie.c | ||
Makefile | ||
map_in_map.c | ||
map_in_map.h | ||
offload.c | ||
percpu_freelist.c | ||
percpu_freelist.h | ||
queue_stack_maps.c | ||
reuseport_array.c | ||
stackmap.c | ||
syscall.c | ||
tnum.c | ||
verifier.c | ||
xskmap.c |