Mirror of https://github.com/Fishwaldo/Star64_linux.git, synced 2025-06-21 14:11:20 +00:00
Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull seccomp updates from James Morris:

 - Add SECCOMP_RET_USER_NOTIF

 - seccomp fixes for sparse warnings and s390 build (Tycho)

* 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  seccomp, s390: fix build for syscall type change
  seccomp: fix poor type promotion
  samples: add an example of seccomp user trap
  seccomp: add a return code to trap to userspace
  seccomp: switch system call argument type to void *
  seccomp: hoist struct seccomp_data recalculation higher

commit d9a7fa67b4
11 changed files with 1411 additions and 24 deletions

@@ -122,6 +122,11 @@ In precedence order, they are:
    Results in the lower 16 bits of the return value being passed
    to userland as the errno without executing the system call.

``SECCOMP_RET_USER_NOTIF``:
    Results in a ``struct seccomp_notif`` message sent on the userspace
    notification fd, if it is attached, or ``-ENOSYS`` if it is not. See below
    for a discussion of how to handle user notifications.

``SECCOMP_RET_TRACE``:
    When returned, this value will cause the kernel to attempt to
    notify a ``ptrace()``-based tracer prior to executing the system

@@ -183,6 +188,85 @@ The ``samples/seccomp/`` directory contains both an x86-specific example
and a more generic example of a higher-level macro interface for BPF
program generation.

Userspace Notification
======================

The ``SECCOMP_RET_USER_NOTIF`` return code lets seccomp filters pass a
particular syscall to userspace to be handled. This may be useful for
applications like container managers, which wish to intercept particular
syscalls (``mount()``, ``finit_module()``, etc.) and change their behavior.
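
As an illustration, the following minimal sketch shows a classic BPF filter that passes ``mount()`` to the listener and allows every other syscall. The names ``mount_filter`` and ``build_mount_filter()`` are hypothetical. The architecture check only covers x86-64 and arm64. A production filter would also need to handle ABIs that share an ``AUDIT_ARCH_*`` value, such as x32.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Assumption: only these two architectures are handled in this sketch. */
#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Hypothetical filter: trap mount() to userspace, allow everything else. */
static struct sock_filter mount_filter[] = {
    /* Kill the process if the syscall uses an unexpected ABI. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* mount() falls through to USER_NOTIF; anything else skips to ALLOW. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mount, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Point `prog` at the static filter above. */
static void build_mount_filter(struct sock_fprog *prog)
{
    prog->len = (unsigned short)(sizeof(mount_filter) / sizeof(mount_filter[0]));
    prog->filter = mount_filter;
}
```

The resulting ``struct sock_fprog`` is what the ``seccomp()`` call below receives as ``&prog``.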

To acquire a notification fd, pass the ``SECCOMP_FILTER_FLAG_NEW_LISTENER``
flag to the ``seccomp()`` syscall:

.. code-block:: c

    fd = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);

On success, this returns a listener fd for the filter, which can then be
passed to another process via ``SCM_RIGHTS`` or similar. Note that filter fds
correspond to a particular filter, not a particular task: if this task then
forks, notifications from both tasks will appear on the same filter fd. Reads
from and writes to a filter fd are also synchronized, so a filter fd can
safely have many readers.
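
The following minimal sketch acquires the listener fd and hands it to another process over a Unix domain socket. It assumes no libc wrapper for ``seccomp()`` is available, so it calls ``syscall(__NR_seccomp, ...)`` directly. It also assumes an unprivileged caller, which must set ``PR_SET_NO_NEW_PRIVS`` first. ``install_listener()``, ``send_fd()`` and ``recv_fd()`` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Raw syscall wrapper (assumption: no libc-provided seccomp()). */
static int seccomp(unsigned int op, unsigned int flags, void *args)
{
    return (int)syscall(__NR_seccomp, op, flags, args);
}

/* Install `prog` and return its listener fd, or -1 on error. */
static int install_listener(struct sock_fprog *prog)
{
    /* Unprivileged tasks need no_new_privs before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return seccomp(SECCOMP_SET_MODE_FILTER,
                   SECCOMP_FILTER_FLAG_NEW_LISTENER, prog);
}

/* Send `fd` as SCM_RIGHTS ancillary data, with one dummy payload byte. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };

    memset(&u, 0, sizeof(u));
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In a container manager, the sandboxed child would typically call ``install_listener()`` and ``send_fd()`` before ``exec``, while the supervisor calls ``recv_fd()`` and services notifications.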

The interface for a seccomp notification fd consists of two structures,
``struct seccomp_notif`` and ``struct seccomp_notif_resp``, along with
``struct seccomp_notif_sizes`` for querying their sizes:

.. code-block:: c

    struct seccomp_notif_sizes {
        __u16 seccomp_notif;
        __u16 seccomp_notif_resp;
        __u16 seccomp_data;
    };

    struct seccomp_notif {
        __u64 id;
        __u32 pid;
        __u32 flags;
        struct seccomp_data data;
    };

    struct seccomp_notif_resp {
        __u64 id;
        __s64 val;
        __s32 error;
        __u32 flags;
    };

The ``struct seccomp_notif_sizes`` structure can be used to determine the size
of the various structures used in seccomp notifications. The size of
``struct seccomp_data`` may change in the future, so rather than relying on
compile-time ``sizeof()`` values, code should use:

.. code-block:: c

    struct seccomp_notif_sizes sizes;
    seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes);

to determine the size of the various structures to allocate. See
``samples/seccomp/user-trap.c`` for an example.
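
The following minimal sketch applies that pattern to allocate request and response buffers. It assumes a raw ``syscall()`` wrapper, and ``alloc_notif_buffers()`` is a hypothetical name. Each buffer takes the larger of the compiled-in size and the size reported by the running kernel. That choice belongs to this sketch, not to the interface: it lets the code use the fields it was compiled against while leaving room for a larger kernel structure.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/seccomp.h>

/* Raw syscall wrapper (assumption: no libc-provided seccomp()). */
static int seccomp(unsigned int op, unsigned int flags, void *args)
{
    return (int)syscall(__NR_seccomp, op, flags, args);
}

/*
 * Allocate zeroed request/response buffers sized for the running kernel.
 * Returns 0 on success; on failure returns -1 and leaves both pointers NULL.
 */
static int alloc_notif_buffers(struct seccomp_notif **req,
                               struct seccomp_notif_resp **resp)
{
    struct seccomp_notif_sizes sizes;
    size_t req_size, resp_size;

    *req = NULL;
    *resp = NULL;
    if (seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes) < 0)
        return -1;

    /* Use whichever is larger: the kernel's size or the header's size. */
    req_size = sizes.seccomp_notif > sizeof(**req)
                   ? sizes.seccomp_notif : sizeof(**req);
    resp_size = sizes.seccomp_notif_resp > sizeof(**resp)
                    ? sizes.seccomp_notif_resp : sizeof(**resp);

    *req = calloc(1, req_size);
    *resp = calloc(1, resp_size);
    if (!*req || !*resp) {
        free(*req);
        free(*resp);
        *req = NULL;
        *resp = NULL;
        return -1;
    }
    return 0;
}
```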

Users can wait for notifications with ``poll()`` on a seccomp notification fd
and read them with ``ioctl(SECCOMP_IOCTL_NOTIF_RECV)``, which fills in a
``struct seccomp_notif``. This structure contains four members: a
unique-per-filter ``id``, the ``pid`` of the task which triggered this request
(which may be 0 if the task is in a pid namespace not visible from the
listener's pid namespace), a ``flags`` member which for now only has
``SECCOMP_NOTIF_FLAG_SIGNALED``, representing whether or not the notification
is a result of a non-fatal signal, and the ``data`` passed to seccomp.
Userspace can then decide what to do based on this information and send a
response with ``ioctl(SECCOMP_IOCTL_NOTIF_SEND)``, indicating what should be
returned to the task that made the syscall. The ``id`` member of
``struct seccomp_notif_resp`` must be the same ``id`` as in
``struct seccomp_notif``.
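
The following minimal sketch shows one round trip: wait with ``poll()``, receive with ``SECCOMP_IOCTL_NOTIF_RECV``, then reply with ``SECCOMP_IOCTL_NOTIF_SEND``. The sketch makes several assumptions:

* The compiled-in structure sizes match the kernel's. The ``SECCOMP_GET_NOTIF_SIZES`` approach described above is more robust.
* A placeholder policy denies every trapped syscall with ``EPERM``.
* ``error`` carries a negative errno, and ``flags`` is left at zero.

``handle_one()`` and ``deny_request()`` are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Fill `resp` so the trapped syscall fails with errno `err`. */
static void deny_request(const struct seccomp_notif *req,
                         struct seccomp_notif_resp *resp, int err)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;  /* must echo the id from the request */
    resp->error = -err;  /* negative errno; 0 would return `val` instead */
    resp->val = 0;
    resp->flags = 0;
}

/*
 * Wait for one notification on `listener` and answer it.
 * Placeholder policy: deny everything. Returns 0 on success, -1 on error.
 */
static int handle_one(int listener)
{
    struct seccomp_notif req;
    struct seccomp_notif_resp resp;
    struct pollfd pfd = { .fd = listener, .events = POLLIN };

    /* poll() ignores negative fds and would block forever on them. */
    if (listener < 0) {
        errno = EBADF;
        return -1;
    }
    if (poll(&pfd, 1, -1) < 0)
        return -1;
    if (pfd.revents & POLLNVAL) {
        errno = EBADF;
        return -1;
    }

    memset(&req, 0, sizeof(req)); /* start from a zeroed buffer */
    if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) < 0)
        return -1;

    /* A real handler would inspect req.pid and req.data here. */
    deny_request(&req, &resp, EPERM);
    return ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp) < 0 ? -1 : 0;
}
```

Because reads and writes on the fd are synchronized, several threads could each run a loop around ``handle_one()`` on the same listener.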

It is worth noting that ``struct seccomp_data`` contains the values of the
syscall's register arguments, but not the memory those arguments may point to.
The task's memory is accessible to suitably privileged tracers via ``ptrace()``
or ``/proc/pid/mem``. However, care should be taken to avoid the
time-of-check/time-of-use (TOCTOU) race mentioned above in this document: all
arguments being read from the tracee's memory should be copied into the
tracer's memory before any policy decisions are made. This allows for an atomic
decision on syscall arguments.
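
The following minimal sketch copies a pointer argument into the listener's own buffer exactly once, so that every later check inspects that single snapshot. It makes two assumptions:

* The listener is privileged enough to open the target's ``/proc/<pid>/mem``.
* ``pid`` came from a ``struct seccomp_notif``, so it is nonzero and valid in the listener's pid namespace.

``read_task_mem()`` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy `len` bytes at `addr` in task `pid` into `buf` via /proc/<pid>/mem.
 * Returns the number of bytes copied, or -1 on error. Policy decisions
 * should be made on `buf` only, never by re-reading the task's memory.
 */
static ssize_t read_task_mem(pid_t pid, uint64_t addr, void *buf, size_t len)
{
    char path[64];
    ssize_t n;
    int fd;

    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    /* The file offset in /proc/<pid>/mem is the virtual address. */
    n = pread(fd, buf, len, (off_t)addr);
    close(fd);
    return n;
}
```

For a trapped ``mount()``, for example, ``req.data.args[0]`` would hold the address of the source string in the task that made the syscall.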

Sysctls
=======