Commit graph

Edward Cree
03714bbb22 sfc: make mem_bar a function rather than a constant
Support using BAR 0 on SFC9250, even though the driver doesn't bind to such
 devices yet.
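
A minimal sketch of the shape of such a change (names are hypothetical, not
the actual sfc hunk): the fixed BAR constant becomes a per-NIC-type callback,
so SFC9250 can report a different BAR while existing controllers keep BAR 2.

  /* before: one compile-time constant shared by every NIC type */
  #define EFX_MEM_BAR 2

  /* after (sketch): the NIC type supplies the BAR at probe time */
  struct efx_nic_type {
          unsigned int (*mem_bar)(struct efx_nic *efx);
          /* ... */
  };

  static unsigned int ef10_mem_bar(struct efx_nic *efx)
  {
          return 2;               /* pre-SFC9250 controllers use BAR 2 */
  }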

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18 13:07:49 -05:00
Linus Torvalds
64a48099b3 Merge branch 'WIP.x86-pti.entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 syscall entry code changes for PTI from Ingo Molnar:
 "The main changes here are Andy Lutomirski's changes to switch the
  x86-64 entry code to use the 'per CPU entry trampoline stack'. This,
  besides helping fix KASLR leaks (the pending Page Table Isolation
  (PTI) work), also robustifies the x86 entry code"

* 'WIP.x86-pti.entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  x86/cpufeatures: Make CPU bugs sticky
  x86/paravirt: Provide a way to check for hypervisors
  x86/paravirt: Dont patch flush_tlb_single
  x86/entry/64: Make cpu_entry_area.tss read-only
  x86/entry: Clean up the SYSENTER_stack code
  x86/entry/64: Remove the SYSENTER stack canary
  x86/entry/64: Move the IST stacks into struct cpu_entry_area
  x86/entry/64: Create a per-CPU SYSCALL entry trampoline
  x86/entry/64: Return to userspace from the trampoline stack
  x86/entry/64: Use a per-CPU trampoline stack for IDT entries
  x86/espfix/64: Stop assuming that pt_regs is on the entry stack
  x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0
  x86/entry: Remap the TSS into the CPU entry area
  x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct
  x86/dumpstack: Handle stack overflow on all stacks
  x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  x86/kasan/64: Teach KASAN about the cpu_entry_area
  x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce struct cpu_entry_area
  x86/entry/gdt: Put per-CPU GDT remaps in ascending order
  x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
  ...
2017-12-18 08:59:15 -08:00
David S. Miller
59436c9ee1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-18

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Allow arbitrary function calls from one BPF function to another BPF function.
   As of today when writing BPF programs, __always_inline had to be used in
   the BPF C programs for all functions, unnecessarily causing LLVM to inflate
   code size. Handle this more naturally with support for BPF to BPF calls
   such that this __always_inline restriction can be overcome. As a result,
   it allows for better optimized code and finally makes it possible to
   introduce core BPF libraries in the future that can be reused across
   different projects.
   x86 and arm64 JIT support was added as well, from Alexei.

2) Add infrastructure for tagging functions as error injectable and allow for
   BPF to return arbitrary error values when BPF is attached via kprobes on
   those. This way of injecting errors generically eases testing and debugging
   without having to recompile or restart the kernel. Tags for opting-in for
   this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.

3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
   call for XDP programs. First part of this work adds handling of BPF
   capabilities included in the firmware, and the later patches add support
   to the nfp verifier part and JIT as well as some small optimizations,
   from Jakub.

4) The bpftool now also gets support for basic cgroup BPF operations such
   as attaching, detaching and listing current BPF programs. As a requirement
   for the attach part, bpftool can now also load object files through
   'bpftool prog load'. This reuses libbpf which we have in the kernel tree
   as well. bpftool-cgroup man page is added along with it, from Roman.

5) Back then, commit e87c6bc385 ("bpf: permit multiple bpf attachments for
   a single perf event") added support for attaching multiple BPF programs
   to a single perf event. Given they are configured through perf's ioctl()
   interface, the interface has been extended in this work with a
   PERF_EVENT_IOC_QUERY_BPF command in order to return an array of one or
   more BPF prog ids that are currently attached, from Yonghong.

6) Various minor fixes and cleanups to bpftool's Makefile as well
   as new 'uninstall' and 'doc-uninstall' targets for removing bpftool
   itself or previously installed documentation related to it, from Quentin.

7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
   required for the test_dev_cgroup test case to run, from Naresh.

8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.

9) Fix libbpf's exit code from the Makefile when libelf was not found in
   the system, also from Jakub.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18 10:51:06 -05:00
David S. Miller
b36025b19a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-17

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a corner case in generic XDP where we have non-linear skbs but
   enough tailroom in the skb, making sure we don't miss linearizing
   there, from Song.

2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
   BPF context is not skb, from Daniel.

3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
   call would use the wrong register for the skb, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18 10:49:22 -05:00
Paolo Bonzini
43aabca38a KVM/ARM Fixes for v4.15, Round 2
Fixes:
  - A bug in our handling of SPE state for non-vhe systems
  - A bug that causes hyp unmapping to go off limits and crash the system on
    shutdown
  - Three timer fixes that were introduced as part of the timer optimizations
    for v4.15
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaN5C1AAoJEEtpOizt6ddyodkH/jN1lquFVdYJBlEO6NXiumEk
 GBH6x6CmuGyiUL3J0ffx5U51x0NN2jE89TpH5d1dsnQg77CCjTCxHtQ9suHne3n1
 5/r0BzHZhaCbnbY0f7+E4EL0UOTpiAwUIqin1ufLPjs4XywcFyiLa7xiWkQkDmyr
 WXKOdppTc4j/FUyqb1fQBmYY8pENR5jjfgdaeZ6C6o7e6aksXgrPWqXhV/6OSRLd
 MOcxA06QfwTWy+MT1x4yo1hzCTjOEvvQXT2Va09moiNxT7hVWWvO/kwJVQL+YpWW
 di7t4CLCvGYUsxM5t8fHHV7X+dfd2nvpJA46TWggPye7yMYkTYXFQu1LHwPIdDU=
 =c5Kt
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-fixes-for-v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Fixes for v4.15, Round 2

Fixes:
 - A bug in our handling of SPE state for non-vhe systems
 - A bug that causes hyp unmapping to go off limits and crash the system on
   shutdown
 - Three timer fixes that were introduced as part of the timer optimizations
   for v4.15
2017-12-18 12:57:43 +01:00
Wanpeng Li
e39d200fa5 KVM: Fix stack-out-of-bounds read in write_mmio
Reported by syzkaller:

  BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
  Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298

  CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
  Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
  Call Trace:
   dump_stack+0xab/0xe1
   print_address_description+0x6b/0x290
   kasan_report+0x28a/0x370
   write_mmio+0x11e/0x270 [kvm]
   emulator_read_write_onepage+0x311/0x600 [kvm]
   emulator_read_write+0xef/0x240 [kvm]
   emulator_fix_hypercall+0x105/0x150 [kvm]
   em_hypercall+0x2b/0x80 [kvm]
   x86_emulate_insn+0x2b1/0x1640 [kvm]
   x86_emulate_instruction+0x39a/0xb90 [kvm]
   handle_exception+0x1b4/0x4d0 [kvm_intel]
   vcpu_enter_guest+0x15a0/0x2640 [kvm]
   kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
   kvm_vcpu_ioctl+0x479/0x880 [kvm]
   do_vfs_ioctl+0x142/0x9a0
   SyS_ioctl+0x74/0x80
   entry_SYSCALL_64_fastpath+0x23/0x9a

The patched-vmmcall path writes the 3-byte opcode 0F 01 C1 (vmcall)
to guest memory; however, the write_mmio tracepoint always prints 8 bytes
through *(u64 *)val since kvm splits the mmio access into 8-byte chunks.
This leaks 5 bytes from the kernel stack (CVE-2017-17741).  This patch
fixes it by accessing only the bytes which we actually operate on.

Before patch:

syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f

After patch:

syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f
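
A sketch of the effect on the tracepoint assignment (reconstructed from the
description above; the __entry field name is an assumption): instead of
dereferencing the value as a full u64, copy at most the bytes the access
covers.

  /* before: always reads 8 bytes, leaking stack data for short writes */
  __entry->val = *(u64 *)val;

  /* after (sketch): copy only 'len' bytes, zero-fill the rest */
  __entry->val = 0;
  if (val)
          memcpy(&__entry->val, val,
                 min_t(u32, sizeof(__entry->val), len));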

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-18 12:57:01 +01:00
Takashi Iwai
bb82e0b4a7 ACPI: APEI / ERST: Fix missing error handling in erst_reader()
The commit f6f8285132 ("pstore: pass allocated memory region back to
caller") changed the check of the return value from erst_read() in
erst_reader() in the following way:

        if (len == -ENOENT)
                goto skip;
-       else if (len < 0) {
-               rc = -1;
+       else if (len < sizeof(*rcd)) {
+               rc = -EIO;
                goto out;

This introduced another bug: since the comparison with sizeof() is
cast to unsigned, a negative len value no longer matches the check.
As a result, when an error is returned from erst_read(), the code
falls through, and it may eventually lead to something like memory
corruption.

This patch adds an explicit check for negative error values to
address the issue.
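
A sketch of the restored control flow, reconstructed from the diff above:
keep the new sizeof() check, but test for negative errors first so the
signed-to-unsigned cast can no longer swallow them.

  if (len == -ENOENT)
          goto skip;
  else if (len < 0) {                     /* explicit signed check */
          rc = -1;
          goto out;
  } else if (len < sizeof(*rcd)) {
          rc = -EIO;
          goto out;
  }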

Fixes: f6f8285132 (pstore: pass allocated memory region back to caller)
Cc: All applicable <stable@vger.kernel.org>
Tested-by: Jerry Tang <jtang@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-12-18 12:12:08 +01:00
Colin Ian King
951ef0e19f ACPI: CPPC: remove initial assignment of pcc_ss_data
The initialization of pcc_ss_data from pcc_data[pcc_ss_id] before
pcc_ss_id has been range checked could lead to an out-of-bounds array
read.  The very same initialization is also performed after the range
check on pcc_ss_id, so we can simply remove this problematic and
redundant assignment to fix the issue.

Detected by cppcheck:
warning: Value stored to 'pcc_ss_data' during its initialization is never
read
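
A sketch of the fixed pattern (the surrounding local-variable context is an
assumption): drop the early initializer and index the array only after the
range check.

  int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
  struct cppc_pcc_data *pcc_ss_data;      /* no early initializer */

  if (pcc_ss_id < 0)
          return -EIO;
  pcc_ss_data = pcc_data[pcc_ss_id];      /* safe after the range check */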

Fixes: 85b1407bf6 (ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-12-18 12:10:37 +01:00
Rafael J. Wysocki
56026645e2 cpufreq: governor: Ensure sufficiently large sampling intervals
After commit aa7519af45 (cpufreq: Use transition_delay_us for legacy
governors as well) the sampling_rate field of struct dbs_data may be
less than the tick period which causes dbs_update() to produce
incorrect results, so make the code ensure that the value of that
field will always be sufficiently large.
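
A sketch of the clamping described above (the exact placement and variable
names are assumptions):

  /* never sample more often than once per tick */
  unsigned int min_rate = jiffies_to_usecs(1);

  dbs_data->sampling_rate = max(requested_rate, min_rate);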

Fixes: aa7519af45 (cpufreq: Use transition_delay_us for legacy governors as well)
Reported-by: Andy Tang <andy.tang@nxp.com>
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Andy Tang <andy.tang@nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-12-18 12:09:39 +01:00
Lucas Stach
ccc153a6de cpufreq: imx6q: fix speed grading regression on i.MX6 QuadPlus
The commit moving the speed grading check to the cpufreq driver introduced
some additional checks, so the OPP disable is only attempted on SoCs where
those OPPs are present. The compatible checks are missing the QuadPlus
compatible, so invalid OPPs are not correctly disabled there.

Move both checks to a single condition, so we don't need to sprinkle even
more calls to of_machine_is_compatible().
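
A sketch of the combined condition (the QuadPlus compatible string is an
assumption):

  if (of_machine_is_compatible("fsl,imx6q") ||
      of_machine_is_compatible("fsl,imx6qp")) {
          /* disable the OPPs this speed grade does not support */
  }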

Fixes: 2b3d58a3ad (cpufreq: imx6q: Move speed grading check to cpufreq driver)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-12-18 12:06:37 +01:00
Rafael J. Wysocki
5839ee7389 PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
It is incorrect to call pci_restore_state() for devices in low-power
states (D1-D3), as that involves the restoration of MSI setup which
requires MMIO to be operational and that is only the case in D0.

However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
callbacks put the device into a low-power state, so fix it by making
it force devices into D0 via pci_set_power_state() instead of trying
to "update" their power state which is pointless.

Fixes: e60514bd44 (PCI/PM: Restore the status of PCI devices across hibernation)
Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2017-12-18 12:06:07 +01:00
Kailang Yang
9226665159 ALSA: hda/realtek - Fix Dell AIO LineOut issue
Dell AIO machines have a LineOut jack.
Add the LineOut verb in this patch.

[ Additional notes:
  the ALC274 codec seems to require fixed pin / DAC connections for
  HP / line-out pins for enabling EQ for speakers; i.e. the HP / LO
  pins are expected to be connected with NID 0x03 while keeping the
  speaker with NID 0x02.  However, by adding a new line-out pin, the
  auto-parser assigns the NID 0x02 for HP/LO pins as primary outputs.
  As an easy workaround, we provide the preferred_pairs[] to map
  forcibly for these pins. -- tiwai ]

Fixes: 75ee94b20b ("ALSA: hda - fix headset mic problem for Dell machines with alc274")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-12-18 11:09:05 +01:00
Christoffer Dall
0eb7c33cad KVM: arm/arm64: Fix timer enable flow
When enabling the timer on the first run, we fail to ever restore the
state and mark it as loaded.  That means that on the initial entry to
the VCPU ioctl, unless we exit to userspace for some reason such as a
pending signal, if the guest programs a timer and blocks, we will wait
forever, because we never read back the hardware state (the loaded flag
is not set), and so we think the timer is disabled, and we never
schedule a background soft timer.

The end result?  The VCPU blocks forever, and the only solution is to
kill the thread.

Fixes: 4a2c4da125 ("arm/arm64: KVM: Load the timer state when enabling the timer")
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:24 +01:00
Christoffer Dall
36e5cfd410 KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state
The recent timer rework was assuming that once the timer was disabled,
we should no longer see any interrupts from the timer.  This assumption
turns out to not be true, and instead we have to handle the case when
the timer ISR runs even after the timer has been disabled.

This requires a couple of changes:

First, we should never overwrite the cached guest state of the timer
control register when the ISR runs, because KVM may have disabled its
timers when doing vcpu_put(), even though the guest still had the timer
enabled.

Second, we shouldn't assume that the timer is actually firing just
because we see an interrupt, but we should check the actual state of the
timer in the timer control register to understand if the hardware timer
is really firing or not.

We also add an ISB to vtimer_save_state() to ensure the timer is
actually disabled once we enable interrupts, which should clarify the
intention of the implementation, and reduce the risk of unwanted
interrupts.
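
A sketch of the "check the hardware, not the interrupt" test (register
access style as used elsewhere in the arch-timer code; the exact placement
is an assumption):

  u32 cnt_ctl = read_sysreg_el0(cntv_ctl);

  /* a real expiry: timer enabled, status set, interrupt not masked */
  if ((cnt_ctl & (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_STAT |
                  ARCH_TIMER_CTRL_IT_MASK)) ==
      (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_STAT))
          return true;            /* the hardware timer really fired */
  return false;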

Fixes: b103cc3f10 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Jia He <hejianet@gmail.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:24 +01:00
Marc Zyngier
f384dcfe4d KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
If we don't have a usable GIC, do not try to set the vcpu affinity
as this is guaranteed to fail.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:23 +01:00
Marc Zyngier
7839c672e5 KVM: arm/arm64: Fix HYP unmapping going off limits
When we unmap the HYP memory, we try to be clever and unmap one
PGD at a time. If we start with a non-PGD aligned address and try
to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
(addr and end can never match, and it all goes really badly as we
keep incrementing pgd and parse random memory as page tables...).

The obvious fix is to let unmap_hyp_range do what it does best,
which is to iterate over a range.

The size of the linear mapping, which begins at PAGE_OFFSET, can be
easily calculated by subtracting PAGE_OFFSET from high_memory, because
high_memory is defined as the linear map address of the last byte of
DRAM, plus one.

The size of the vmalloc region is given trivially by VMALLOC_END -
VMALLOC_START.
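
A sketch of the resulting teardown (the kern_hyp_va() translation is an
assumption):

  unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
                  (unsigned long)high_memory - PAGE_OFFSET);
  unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
                  VMALLOC_END - VMALLOC_START);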

Cc: stable@vger.kernel.org
Reported-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:23 +01:00
Julien Thierry
bfe766cf65 arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu
When VHE is not present, KVM needs to save and restore PMSCR_EL1 when
possible. If SPE is used by the host, the value of PMSCR_EL1 cannot be
saved for the guest.
If the host starts using SPE between two save+restore on the same vcpu,
restore will write the value of PMSCR_EL1 read during the first save.

Make sure __debug_save_spe_nvhe clears the value of the saved PMSCR_EL1
when the guest cannot use SPE.
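
A sketch of the clearing described above (the function signature is an
assumption):

  static void __debug_save_spe_nvhe(u64 *pmscr_el1)
  {
          /* start clean: if the guest cannot use SPE, the caller must
           * not restore a stale value on the next vcpu load */
          *pmscr_el1 = 0;

          /* ... save the live value only when SPE is usable ... */
  }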

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:22 +01:00
Miquel Raynal
d82c368216 mtd: Fix mtd_check_oob_ops()
The mtd_check_oob_ops() helper verifies if the operation defined by the
user is correct.

Fix the check that verifies if the entire requested area exists. This
check is too restrictive and will fail anytime the last data byte of the
very last page is included in an operation.
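
A sketch of the boundary condition being relaxed (the exact expression is an
assumption): an access ending exactly on the device's last byte must be
allowed, so the comparison has to be strict.

  /* before: rejected an op whose last byte is the device's last byte */
  if (offs + ops->len >= mtd->size)
          return -EINVAL;

  /* after (sketch): reject only when the op runs past the device */
  if (offs + ops->len > mtd->size)
          return -EINVAL;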

Fixes: 5cdd929da5 ("mtd: Add sanity checks in mtd_write/read_oob()")
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2017-12-18 09:16:35 +01:00
Linus Torvalds
1291a0d504 Linux 4.15-rc4 2017-12-17 18:59:59 -08:00
Kees Cook
779f4e1c6c Revert "exec: avoid RLIMIT_STACK races with prlimit()"
This reverts commit 04e35f4495.

SELinux runs with secureexec for all non-"noatsecure" domain transitions,
which means lots of processes end up hitting the stack hard-limit change
that was introduced in order to fix a race with prlimit(). That race fix
will need to be redesigned.

Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Tomáš Trnka <trnka@scm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-17 14:26:25 -08:00
Chunyan Zhang
36b0cb84ee ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
An additional 'ip' will be pushed to the stack, for restoring the
DACR later, if CONFIG_CPU_SW_DOMAIN_PAN is defined.

However, the fixup still gets the err_ptr by adding #8*4 to sp, which
results in the code area pointed to by the LR being overwritten, or in
a kernel crash if CONFIG_DEBUG_RODATA is enabled.

This patch fixes the stack mismatch.

Fixes: a5e090acbf ("ARM: software-based priviledged-no-access support")
Signed-off-by: Lvqiang Huang <Lvqiang.Huang@spreadtrum.com>
Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-12-17 22:20:39 +00:00
Linus Torvalds
f8940a0f20 Merge branch 'WIP.x86-pti.base-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Page Table Isolation (PTI) v4.14 backporting base tree from Ingo Molnar:
 "This tree contains the v4.14 PTI backport preparatory tree, which
  consists of four merges of upstream trees and 7 cherry-picked commits,
  which the upcoming PTI work depends on"

NOTE! The resulting tree is exactly the same as the original base tree
(ie the diff between this commit and its immediate first parent is
empty).

The only reason for this merge is literally to have a common point for
the actual PTI changes so that the commits can be shared in both the
4.15 and 4.14 trees.

* 'WIP.x86-pti.base-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
  locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
  locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()
  bpf: fix build issues on um due to mising bpf_perf_event.h
  perf/x86: Enable free running PEBS for REGS_USER/INTR
  x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
2017-12-17 13:57:08 -08:00
Linus Torvalds
6ba64feff6 Merge branch 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Page Table Isolation (PTI) preparatory tree from Ingo Molnar:
 "This does a rename to free up linux/pti.h to be used by the upcoming
  page table isolation feature"

* 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers/misc/intel/pti: Rename the header file to free up the namespace
2017-12-17 13:54:31 -08:00
Linus Torvalds
2ffb448ccb Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single bugfix which prevents arbitrary sigev_notify values in
  posix-timers"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timer: Properly check sigevent->sigev_notify
2017-12-17 13:48:50 -08:00
Linus Torvalds
c43727908f dmaengine fixes for v4.15-rc4
Here are fixes for this round
 - Fix for disable clk on error path in fsl-edma driver
 - Disable clk fail fix in jz4740 driver
 - Fix long pending bug in dmatest driver for dangling pointer
 - Fix potential NULL pointer dereference in at_hdmac driver
 - Error handling path in ioat driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaNp2PAAoJEHwUBw8lI4NHNfEP/RcERW0adeh4HXIges3FMW9N
 cqhRQpnUWmYIOTsfrWDH6WyR7pmEBOLryBzZXQ6Y8I3AOW4FTwBOG/W8NqFj+aiA
 JjeKRkXdpHD1eZpav72FUUhf4inb7BTtpGN5lBEFFk+gA3rWLscUw33OJLZmxsb5
 Vi82pJARfst42XB/jZUBDq+YNYlgG+NspYgUWvS5EnNXJJmy+47c5TS9EQUovctM
 X0yrKNomSW2qEO/IEXTUvXtArxpmxBrZI1q9RU21JJvF3izElqYO9C3OhGijM27u
 ERHWM1iStOxOl7zrzp8JJmVswdHyW/rT5uSlhl4689S2UZEJrqBGPo67Nh5w0Isi
 Nk+2X8I7Z/tNByggXASHX/xXymQGJF/mxAs1KypnSE0DpCJQi7oJAG6UY7lbIOz8
 CRGRt0gM8Xe/GrZa+7ZXmc7BoCrNPaGmW070XX3BoMcqT9m+SFOD2DhT351AVoOI
 uIPw5gAw0OCYhzkPP1c6AZPdLLFlVGrmPARRto/Af7w4Fnw4UlWGQ/+d6oQlvEyG
 G0kPxKE3e6uHXOizgm3GkbqpQWjTz649JMNw+xVey9e/pSVXORYIIomPmwahbh8x
 7Rfojaccq3w0YYG58z/Q5+WmzWf0jU//G/Cq5LsOm/5ghZTYRna7olA6lxFcn+ab
 6WQBW1rV2y8XUpDl4ScB
 =2egq
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-4.15-rc4' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "This time consisting of fixes in a bunch of drivers and the dmatest
  module:

   - Fix for disable clk on error path in fsl-edma driver
   - Disable clk fail fix in jz4740 driver
   - Fix long pending bug in dmatest driver for dangling pointer
   - Fix potential NULL pointer dereference in at_hdmac driver
   - Error handling path in ioat driver"

* tag 'dmaengine-fix-4.15-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: fsl-edma: disable clks on all error paths
  dmaengine: jz4740: disable/unprepare clk if probe fails
  dmaengine: dmatest: move callback wait queue to thread context
  dmaengine: at_hdmac: fix potential NULL pointer dereference in atc_prep_dma_interleaved
  dmaengine: ioat: Fix error handling path
2017-12-17 13:28:49 -08:00
Arnd Bergmann
b9f5fb1800 cramfs: fix MTD dependency
With CONFIG_MTD=m and CONFIG_CRAMFS=y, we now get a link failure:

  fs/cramfs/inode.o: In function `cramfs_mount': inode.c:(.text+0x220): undefined reference to `mount_mtd'
  fs/cramfs/inode.o: In function `cramfs_mtd_fill_super':
  inode.c:(.text+0x6d8): undefined reference to `mtd_point'
  inode.c:(.text+0xae4): undefined reference to `mtd_unpoint'

This adds a more specific Kconfig dependency to avoid the broken
configuration.

Alternatively we could make CRAMFS itself depend on "MTD || !MTD" with a
similar result.

Fixes: 99c18ce580 ("cramfs: direct memory access support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-17 12:20:58 -08:00
Linus Torvalds
73d080d374 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "The alloc_super() one is a regression in this merge window, lazytime
  thing is older..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Handle lazytime in do_mount()
  alloc_super(): do ->s_umount initialization earlier
2017-12-17 12:18:35 -08:00
Linus Torvalds
1c6b942d7d Fix a regression which caused us to fail to interpret symlinks in very
ancient ext3 file system images.  Also fix two xfstests failures, one
 of which could cause an OOPS, plus an additional bug fix caught by fuzz
 testing.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlo1y3EACgkQ8vlZVpUN
 gaNFOQf/bMf6ynai1dGGRwef+UcT874NZ2Hqm+UqI6pxusz0ZeKWm8HWfPfg31Fa
 o+OnUsZ7NXFBIHyfXKFJzdOgutjZ5eY0vMu+NrlyBdd6W+ZcHwn1PvQsLapFYvqK
 Rt+8nWTKqtnksSfh0vyODmUYgItOULOPPepjnIPm/Pd0DinJwo0GY/8MzLkz4SpX
 g6R60ou0ToEYNqBXAKIBnZ4aq8KWMtCMGcD270U5eAm/63Pt4riRwJbjITxZPAH1
 wKzivP4Ce5ce8W2g2/6mFFlBFWvtlB491T+BsgHUEv3OLze+kYS2PcxQthhEmBR8
 zeZ2o2/0tTxejE//cyJ4gCe3fYGRDg==
 =xqLC
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a regression which caused us to fail to interpret symlinks in very
  ancient ext3 file system images.

  Also fix two xfstests failures, one of which could cause an OOPS, plus
  an additional bug fix caught by fuzz testing"

* tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix crash when a directory's i_size is too small
  ext4: add missing error check in __ext4_new_inode()
  ext4: fix fdatasync(2) after fallocate(2) operation
  ext4: support fast symlinks from ext3 file systems
2017-12-17 12:14:33 -08:00
John David Anglin
da57c5414f parisc: Reduce thread stack to 16 kb
In testing, I found that the thread stack can be 16 kB when using an irq
stack.  Without it, the thread stack needs to be 32 kB. Currently, the irq
stack is 32 kB. While it probably could be 16 kB, I would prefer to leave it
as is for safety.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-12-17 21:06:25 +01:00
John David Anglin
9352aeada4 Revert "parisc: Re-enable interrupts early"
This reverts commit 5c38602d83.

Interrupts can't be enabled early because the register saves are done on
the thread stack prior to switching to the IRQ stack.  This caused stack
overflows and the thread stack needed increasing to 32k.  Even then,
stack overflows still occasionally occurred.

Background:
Even with a 32 kB thread stack, I have seen instances where the thread
stack overflowed on the mx3210 buildd.  Detection of stack overflow only
occurs when we have an external interrupt.  When an external interrupt
occurs, we switch to the thread stack if we are not already on a kernel
stack.  Then, registers and specials are saved to the kernel stack.

The bug occurs in intr_return where interrupts are reenabled prior to
returning from the interrupt.  This was done in case we need to schedule
or deliver signals.  However, it introduces the possibility that
multiple external interrupts may occur on the thread stack and cause a
stack overflow.  These might not be detected and cause the kernel to
misbehave in random ways.

This patch changes the code back to only reenable interrupts when we are
going to schedule or deliver signals.  As a result, we generally return
from an interrupt before reenabling interrupts.  This minimizes the
growth of the thread stack.

Fixes: 5c38602d83 ("parisc: Re-enable interrupts early")
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Helge Deller <deller@gmx.de>
2017-12-17 21:06:25 +01:00
Pravin Shedge
6a16fc3220 parisc: remove duplicate includes
These duplicate includes have been found with scripts/checkincludes.pl
but they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-12-17 21:06:25 +01:00
Helge Deller
bcf3f1752a parisc: Hide Diva-built-in serial aux and graphics card
The Diva GSP card has a built-in serial AUX port and an ATI graphics card
which simply don't work and which both lack external connectors.  The User
Guides even mention that those devices shouldn't be used.
So, prevent Linux drivers from trying to enable those devices.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v3.0+
2017-12-17 21:06:25 +01:00
Helge Deller
0ed9d3de5f parisc: Align os_hpmc_size on word boundary
The os_hpmc_size variable sometimes wasn't aligned on a word boundary and
thus triggered the unaligned fault handler at startup.
Fix it by aligning it properly.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.14+
2017-12-17 21:06:25 +01:00
Helge Deller
203c110b39 parisc: Fix indenting in puts()
Static analysis tools complain that we intended to have curly braces
around this indent block. In this case this assumption is wrong, so fix
the indenting.

Fixes: 2f3c7b8137 ("parisc: Add core code for self-extracting kernel")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.14+
2017-12-17 21:06:25 +01:00
Josef Bacik
46df3d209d trace: reenable preemption if we modify the ip
Things got moved around between the original bpf_override_return patches
and the final version, and now the ftrace kprobe dispatcher assumes that
if you modified the ip you also enabled preemption.  Document this with
a comment and enable preemption; this fixes the lockdep splat that
happened when using this feature.
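
A sketch of the re-enable path described above (the surrounding dispatcher
context is an assumption):

  if (orig_ip != instruction_pointer(regs)) {
          /* the BPF program modified the ip; the dispatcher skips the
           * usual exit path, so re-enable preemption here */
          preempt_enable_no_resched();
          return 1;
  }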

Fixes: 9802d86585 ("bpf: add a bpf_override_function helper")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:47:32 +01:00
Jakub Kicinski
4a29c0db69 nfp: set flags in the correct member of netdev_bpf
netdev_bpf.flags is the input member for installing the program.
netdev_bpf.prog_flags is the output member for querying.  Set
the correct one on query.
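
A sketch of the query path after the fix (the nfp-side field names are
assumptions):

  case XDP_QUERY_PROG:
          xdp->prog_attached = !!nn->xdp_prog;
          /* fill the query output member, not the install-time input */
          xdp->prog_flags = nn->xdp_prog ? nn->xdp_flags : 0;
          return 0;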

Fixes: 92f0292b35 ("net: xdp: report flags program was installed with on query")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:41:59 +01:00
Jakub Kicinski
21567eded9 libbpf: fix Makefile exit code if libelf not found
/bin/sh's exit does not recognize -1 as a number, leading to
the following error message:

/bin/sh: 1: exit: Illegal number: -1

Use 1 as the exit code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:40:29 +01:00
Daniel Borkmann
ef9fde06a2 Merge branch 'bpf-to-bpf-function-calls'
Alexei Starovoitov says:

====================
First of all, a huge thank you to Daniel, John, Jakub, Edward and others who
reviewed multiple iterations of this patch set over the last many months,
and to Dave and others who gave critical feedback during netconf/netdev.

The patch is solid enough and we thought through numerous corner cases,
but it's not the end. More followups with code reorg and features to follow.

TLDR: Allow arbitrary function calls from one bpf function to another bpf function.

Since the beginning of bpf, all bpf programs were represented as a single function
and program authors were forced to use always_inline for all functions
in their C code. That was causing llvm to unnecessarily inflate the code size
and forcing developers to move code to header files with little code reuse.

With a bit of additional complexity, teach the verifier to recognize
arbitrary function calls from one bpf function to another as long as
all of the functions are presented to the verifier as a single bpf program.
Extended program layout:
..
r1 = ..    // arg1
r2 = ..    // arg2
call pc+1  // function call pc-relative
exit
.. = r1    // access arg1
.. = r2    // access arg2
..
call pc+20 // second level of function call
...

It allows for better optimized code and finally allows the introduction of
core bpf libraries that can be reused in different projects,
since programs are no longer limited to a single elf file.
With function calls bpf can be compiled into multiple .o files.

This patch is the first step. It detects programs that contain
multiple functions and checks that calls between them are valid.
It splits the sequence of bpf instructions (one program) into a set
of bpf functions that call each other. Only calls to known
functions are allowed. Since all functions are presented to
the verifier at once, conceptually this is 'static linking'.

Future plans:
- introduce BPF_PROG_TYPE_LIBRARY and allow a set of bpf functions
  to be loaded into the kernel that can be later linked to other
  programs with concrete program types. Aka 'dynamic linking'.

- introduce function pointer type and indirect calls to allow
  bpf functions call other dynamically loaded bpf functions while
  the caller bpf function is already executing. Aka 'runtime linking'.
  This will be more generic and more flexible alternative
  to bpf_tail_calls.

FAQ:
Q: Do the interpreter and JIT changes mean that a new instruction is introduced?
A: No. The call instruction technically stays the same. Now it can call
   both kernel helpers and other bpf functions.
   The calling convention stays the same as well.
   From the uapi point of view the call insn got a new 'relocation', BPF_PSEUDO_CALL,
   similar to the BPF_PSEUDO_MAP_FD 'relocation' of the bpf_ldimm64 insn.

Q: What had to change on the LLVM side?
A: A trivial LLVM patch to allow calls was applied to the upcoming 6.0 release:
   https://reviews.llvm.org/rL318614
   with a few bugfixes as well.
   Make sure to build the latest llvm to have bpf_call support.

More details in the patches.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:37 +01:00
Daniel Borkmann
28ab173e96 selftests/bpf: additional bpf_call tests
Add some additional checks for few more corner cases.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:36 +01:00
Alexei Starovoitov
db496944fd bpf: arm64: add JIT support for multi-function programs
Similar to x64, add support for bpf-to-bpf calls.
When a program has calls to in-kernel helpers, the target call offset
is known at JIT time and the arm64 architecture needs 2 passes.
With bpf-to-bpf calls the dynamically allocated function start
is unknown until all functions of the program are JITed.
Therefore (just like x64) the arm64 JIT needs one extra pass over
the program to emit correct call offsets.

Implementation detail:
Avoid being too clever in 64-bit immediate moves and
always use 4 instructions (instead of 3-4 depending on the address)
to make sure only one extra pass is needed.
If some future optimization would make it worthwhile to optimize
'call 64-bit imm' further, the JIT would need to do 4 passes
over the program instead of 3 as in this patch.
For a typical bpf program address the mov needs 3 or 4 insns,
so an unconditional 4 insns to save an extra pass is a worthy trade-off
at this stage of the JIT.
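
A sketch of the stable-size immediate move (the emission helpers are
assumptions): emit movz plus three movk instructions unconditionally, so the
image size cannot change between passes.

  /* always 4 insns for a 64-bit call target, even when fewer would do */
  emit(A64_MOVZ(1, reg, addr & 0xffff, 0), ctx);
  emit(A64_MOVK(1, reg, (addr >> 16) & 0xffff, 16), ctx);
  emit(A64_MOVK(1, reg, (addr >> 32) & 0xffff, 32), ctx);
  emit(A64_MOVK(1, reg, (addr >> 48) & 0xffff, 48), ctx);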

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:36 +01:00
Alexei Starovoitov
1c2a088a66 bpf: x64: add JIT support for multi-function programs
A typical JIT does several passes over bpf instructions to
compute the total size and relative offsets of jumps and calls.
With multiple bpf functions calling each other, all relative calls
will initially have invalid offsets, therefore we need an additional
last pass over the program to emit calls with correct offsets.
For example in case of three bpf functions:
main:
  call foo
  call bpf_map_lookup
  exit
foo:
  call bar
  exit
bar:
  exit

We will call bpf_int_jit_compile() independently for main(), foo() and bar().
The x64 JIT typically does 4-5 passes to converge.
After these initial passes the image for these 3 functions
will be good except for call targets, since the start addresses of
foo() and bar() are unknown while we are JITing main()
(note that the call to bpf_map_lookup will be resolved properly
during the initial passes).
Once the start addresses of all 3 functions are known, we patch
call_insn->imm to point to the right functions and call
bpf_int_jit_compile() again, which needs only one pass.
Additional safety checks are done to make sure this
last pass doesn't produce an image that is larger or smaller
than the previous pass.

When constant blinding is on, it's applied to all functions
at the first pass, since doing it once again at the last
pass can change the size of the JITed code.

Tested on x64 and arm64 hw with JIT on/off, blinding on/off.
x64 jits bpf-to-bpf calls correctly while arm64 falls back to the interpreter.
All other JITs that support normal BPF_CALL will behave the same way
since a bpf-to-bpf call is equivalent to a bpf-to-kernel call from
the JIT's point of view.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:36 +01:00
Alexei Starovoitov
60b58afc96 bpf: fix net.core.bpf_jit_enable race
The global bpf_jit_enable variable is tested multiple times in the JITs,
blinding and verifier core. A malicious root can try to toggle
it while programs are being loaded. This race condition was accounted
for and there should be no issues, but it's safer to avoid
it altogether.
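
A sketch of one way to avoid the race (placement is an assumption): snapshot
the sysctl once per program load and use that value consistently.

  /* a concurrent toggle by root can no longer give the verifier,
   * blinding and JIT different views of the knob */
  int jit_enabled = READ_ONCE(bpf_jit_enable);

  if (jit_enabled)
          prog = bpf_int_jit_compile(prog);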

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:36 +01:00
Alexei Starovoitov
1ea47e01ad bpf: add support for bpf_call to interpreter
Though bpf_call is still the same call instruction and the
calling convention for 'bpf to bpf' and 'bpf to helper' is the same,
the interpreter has to operate on 'struct bpf_insn *'.
To distinguish these two cases, add a kernel-internal opcode and
mark call insns with it.
This opcode is seen by the interpreter only. JITs will never see it.
Also add a tiny bit of debug code to aid interpreter debugging.
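
A sketch of the retagging (the internal opcode name is an assumption):
bpf-to-bpf call instructions, marked with BPF_PSEUDO_CALL by the loader, are
given a kernel-only opcode before the interpreter runs.

  /* seen by the interpreter only; JITs never encounter this opcode */
  if (insn->src_reg == BPF_PSEUDO_CALL)
          insn->code = BPF_JMP | BPF_CALL_ARGS;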

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:36 +01:00
Alexei Starovoitov
b0b04fc49e selftests/bpf: add xdp noinline test
Add a large semi-artificial XDP test with 18 functions to stress-test
the bpf call verification logic.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:36 +01:00
Alexei Starovoitov
3bc35c63cb selftests/bpf: add bpf_call test
Strip always_inline from test_l4lb.c and compile it with -fno-inline
to let the verifier go through 11 functions with various function arguments
and return values.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:36 +01:00
Alexei Starovoitov
48cca7e44f libbpf: add support for bpf_call
- recognize the relocation emitted by llvm
- since all regular functions will be kept in the .text section and llvm
  takes care of pc-relative offsets in the bpf_call instruction,
  simply copy all of .text to the relevant program section while adjusting
  bpf_call instructions in the program section to point to the newly copied
  body of instructions from .text
- do so for all programs in the elf file
- set all programs types to the one passed to bpf_prog_load()

Note that for elf files with multiple programs that use different
functions in the .text section, we need to do 'linker'-style logic.
This work is still TBD.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:35 +01:00
Alexei Starovoitov
d98588cef0 selftests/bpf: add tests for stack_zero tracking
Adjust two tests, since the verifier got smarter,
and add a new one to test the stack_zero logic.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:35 +01:00
Alexei Starovoitov
cc2b14d510 bpf: teach verifier to recognize zero initialized stack
Programs with function calls often pass various
pointers via the stack. When all calls are inlined, llvm
flattens stack accesses and optimizes away extra branches.
When functions are not inlined, it becomes the job of
the verifier to recognize zero-initialized stack to avoid
exploring paths that the program will not take.
The following program would fail otherwise:

ptr = &buffer_on_stack;
*ptr = 0;
...
func_call(.., ptr, ...) {
  if (..)
    *ptr = bpf_map_lookup();
}
...
if (*ptr != 0) {
  // Access (*ptr)->field is valid.
  // Without stack_zero tracking such (*ptr)->field access
  // will be rejected
}

Since stack slots are no longer uniformly invalid | spill | misc,
add liveness marking to all slots, but do it in 8-byte chunks.
So if nothing was read or written in [fp-16, fp-9] range
it will be marked as LIVE_NONE.
If any byte in that range was read, it will be marked LIVE_READ
and stacksafe() check will perform byte-by-byte verification.
If all bytes in the range were written the slot will be
marked as LIVE_WRITTEN.
This significantly speeds up state equality comparison
and reduces total number of states processed.

                    before   after
bpf_lb-DLB_L3.o       2051    2003
bpf_lb-DLB_L4.o       3287    3164
bpf_lb-DUNKNOWN.o     1080    1080
bpf_lxc-DDROP_ALL.o   24980   12361
bpf_lxc-DUNKNOWN.o    34308   16605
bpf_netdev.o          15404   10962
bpf_overlay.o         7191    6679

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:35 +01:00
Alexei Starovoitov
a7ff3eca95 selftests/bpf: add verifier tests for bpf_call
Add extensive set of tests for bpf_call verification logic:

calls: basic sanity
calls: using r0 returned by callee
calls: callee is using r1
calls: callee using args1
calls: callee using wrong args2
calls: callee using two args
calls: callee changing pkt pointers
calls: two calls with args
calls: two calls with bad jump
calls: recursive call. test1
calls: recursive call. test2
calls: unreachable code
calls: invalid call
calls: jumping across function bodies. test1
calls: jumping across function bodies. test2
calls: call without exit
calls: call into middle of ld_imm64
calls: call into middle of other call
calls: two calls with bad fallthrough
calls: two calls with stack read
calls: two calls with stack write
calls: spill into caller stack frame
calls: two calls with stack write and void return
calls: ambiguous return value
calls: two calls that return map_value
calls: two calls that return map_value with bool condition
calls: two calls that return map_value with incorrect bool check
calls: two calls that receive map_value via arg=ptr_stack_of_caller. test1
calls: two calls that receive map_value via arg=ptr_stack_of_caller. test2
calls: two jumps that receive map_value via arg=ptr_stack_of_jumper. test3
calls: two calls that receive map_value_ptr_or_null via arg. test1
calls: two calls that receive map_value_ptr_or_null via arg. test2
calls: pkt_ptr spill into caller stack

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:35 +01:00
Alexei Starovoitov
f4d7e40a5b bpf: introduce function calls (verification)
Allow arbitrary function calls from bpf function to another bpf function.

To recognize such set of bpf functions the verifier does:
1. runs control flow analysis to detect function boundaries
2. proceeds with verification of all functions starting from main(root) function
It recognizes that the stack of the caller can be accessed by the callee
(if the caller passed a pointer to its stack to the callee) and the callee
can store map_value and other pointers into the stack of the caller.
3. keeps track of the stack_depth of each function to make sure that total
stack depth is still less than 512 bytes
4. disallows pointers to the callee stack to be stored into the caller stack,
since they will be invalid as soon as the callee returns
5. to reuse all of the existing state_pruning logic, each function call
is considered to be an independent call from the verifier's point of view.
The verifier pretends to inline all function calls it sees being made.
It stores the callsite instruction index as part of the state to make sure
that two calls to the same callee from two different places in the caller
will be different from the state pruning point of view
6. more safety checks are added to liveness analysis

Implementation details:
. struct bpf_verifier_state now consists of all stack frames that
  led to this function
. struct bpf_func_state represents one stack frame. It consists of
  registers in the given frame and its stack
. propagate_liveness() logic had a premature optimization where
  mark_reg_read() and mark_stack_slot_read() were manually inlined
  with loop iterating over parents for each register or stack slot.
  Undo this optimization to reuse more complex mark_*_read() logic
. skip_callee() logic is not necessary from a safety point of view,
  but without it mark_*_read() markings become too conservative,
  since after returning from the function call a read of r6-r9
  will incorrectly propagate the read marks into the callee, causing
  inefficient pruning later
. mark_*_read() logic is now aware of control flow which makes it
  more complex. In the future the plan is to rewrite liveness
  to be hierarchical. So that liveness can be done within
  basic block only and control flow will be responsible for
  propagation of liveness information along cfg and between calls.
. tail_calls and ld_abs insns are not allowed in the programs with
  bpf-to-bpf calls
. returning stack pointers to the caller or storing them into stack
  frame of the caller is not allowed

Testing:
. no difference in cilium processed_insn numbers
. large number of tests follows in next patches

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:35 +01:00