| .\" Man page generated from reStructuredText. |
| .\" Copyright (C) All BPF authors and contributors from 2014 to present. |
| .\" See git log include/uapi/linux/bpf.h in kernel tree for details. |
| .\" |
| .\" %%%LICENSE_START(VERBATIM) |
| .\" Permission is granted to make and distribute verbatim copies of this |
| .\" manual provided the copyright notice and this permission notice are |
| .\" preserved on all copies. |
| .\" |
| .\" Permission is granted to copy and distribute modified versions of this |
| .\" manual under the conditions for verbatim copying, provided that the |
| .\" entire resulting derived work is distributed under the terms of a |
| .\" permission notice identical to this one. |
| .\" |
| .\" Since the Linux kernel and libraries are constantly changing, this |
| .\" manual page may be incorrect or out-of-date. The author(s) assume no |
| .\" responsibility for errors or omissions, or for damages resulting from |
| .\" the use of the information contained herein. The author(s) may not |
| .\" have taken the same level of care in the production of this manual, |
| .\" which is licensed free of charge, as they might when working |
| .\" professionally. |
| .\" |
| .\" Formatted or processed versions of this manual, if unaccompanied by |
| .\" the source, must acknowledge the copyright and authors of this work. |
| .\" %%%LICENSE_END |
| .\" |
| .\" Please do not edit this file. It was generated from the documentation |
| .\" located in file include/uapi/linux/bpf.h of the Linux kernel sources |
| .\" (helpers description), and from scripts/bpf_helpers_doc.py in the same |
| .\" repository (header and footer). |
| .TH BPF-HELPERS 7 2019-03-06 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| BPF-HELPERS \- list of eBPF helper functions |
| .nr rst2man-indent-level 0 |
| .de1 rstReportMargin |
| \\$1 \\n[an-margin] |
| level \\n[rst2man-indent-level] |
| level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] |
| \\n[rst2man-indent0] |
| \\n[rst2man-indent1] |
| \\n[rst2man-indent2] |
| .. |
| .de1 INDENT |
| .\" .rstReportMargin pre: |
| . RS \\$1 |
| . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] |
| . nr rst2man-indent-level +1 |
| .\" .rstReportMargin post: |
| .. |
| .de UNINDENT |
| . RE |
| .\" indent \\n[an-margin] |
| .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] |
| .nr rst2man-indent-level -1 |
| .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] |
| .in \\n[rst2man-indent\\n[rst2man-indent-level]]u |
| .. |
| .SH DESCRIPTION |
| .sp |
| The extended Berkeley Packet Filter (eBPF) subsystem consists of programs |
| written in a pseudo\-assembly language, then attached to one of the several |
| kernel hooks and run in reaction to specific events. This framework differs |
| from the older, "classic" BPF (or "cBPF") in several aspects, one of them being |
| the ability to call special functions (or "helpers") from within a program. |
| These functions are restricted to a white\-list of helpers defined in the |
| kernel. |
| .sp |
| These helpers are used by eBPF programs to interact with the system, or with |
| the context in which they work. For instance, they can be used to print |
| debugging messages, to get the time since the system was booted, to interact |
| with eBPF maps, or to manipulate network packets. Since there are several eBPF |
| program types, and they do not all run in the same context, each program type |
| can only call a subset of those helpers. |
| .sp |
| Due to eBPF conventions, a helper cannot have more than five arguments. |
| .sp |
| Internally, eBPF programs call directly into the compiled helper functions |
| without requiring any foreign\-function interface. As a result, calling a helper |
| adds no interface overhead, thus offering excellent performance. |
| .sp |
| This document is an attempt to list and document the helpers available to eBPF |
| developers. They are sorted by chronological order (the oldest helpers in the |
| kernel at the top). |
| .SH HELPERS |
| .INDENT 0.0 |
| .TP |
| .B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Perform a lookup in \fImap\fP for an entry associated to \fIkey\fP\&. |
| .TP |
| .B Return |
| Map value associated to \fIkey\fP, or \fBNULL\fP if no entry was |
| found. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_map_update_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Add or update the value of the entry associated to \fIkey\fP in |
| \fImap\fP with \fIvalue\fP\&. \fIflags\fP is one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_NOEXIST\fP |
| The entry for \fIkey\fP must not exist in the map. |
| .TP |
| .B \fBBPF_EXIST\fP |
| The entry for \fIkey\fP must already exist in the map. |
| .TP |
| .B \fBBPF_ANY\fP |
| No condition on the existence of the entry for \fIkey\fP\&. |
| .UNINDENT |
| .sp |
| Flag value \fBBPF_NOEXIST\fP cannot be used for maps of types |
| \fBBPF_MAP_TYPE_ARRAY\fP or \fBBPF_MAP_TYPE_PERCPU_ARRAY\fP (all |
| elements always exist); the helper would return an error. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
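The three flag values of bpf_map_update_elem() can be illustrated with a small user space model. The sketch below is not kernel code: struct toy_map and toy_update() are hypothetical names, and the error codes (EEXIST, ENOENT, E2BIG, EINVAL) are an assumption based on how hash maps are commonly reported to behave.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/bpf.h. */
#define BPF_ANY     0 /* create a new element or update an existing one */
#define BPF_NOEXIST 1 /* create a new element only if it did not exist */
#define BPF_EXIST   2 /* update an existing element only */

#define TOY_SLOTS 4

/* Hypothetical user space stand-in for a tiny hash map. */
struct toy_map {
    int used[TOY_SLOTS];
    uint32_t key[TOY_SLOTS];
    uint64_t value[TOY_SLOTS];
};

/* Mimic the flag checks of an update; returns 0 or a negative errno.
 * The specific error codes are an assumption, not taken from this page. */
static int toy_update(struct toy_map *m, uint32_t key, uint64_t value,
                      uint64_t flags)
{
    int i, found = -1, free_slot = -1;

    if (flags > BPF_EXIST)
        return -EINVAL;

    for (i = 0; i < TOY_SLOTS; i++) {
        if (m->used[i] && m->key[i] == key)
            found = i;
        else if (!m->used[i] && free_slot < 0)
            free_slot = i;
    }

    if (found >= 0 && flags == BPF_NOEXIST)
        return -EEXIST;          /* entry must not exist, but it does */
    if (found < 0 && flags == BPF_EXIST)
        return -ENOENT;          /* entry must exist, but it does not */

    if (found < 0) {
        if (free_slot < 0)
            return -E2BIG;       /* no room left in the toy map */
        found = free_slot;
        m->used[found] = 1;
        m->key[found] = key;
    }
    m->value[found] = value;
    return 0;
}
```

With an empty map, an update with BPF_EXIST fails, BPF_NOEXIST creates the entry, a second BPF_NOEXIST fails, and BPF_ANY succeeds in both situations.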
| .TP |
| .B \fBint bpf_map_delete_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Delete entry with \fIkey\fP from \fImap\fP\&. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_map_push_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Push an element \fIvalue\fP in \fImap\fP\&. \fIflags\fP is one of: |
| .sp |
| \fBBPF_EXIST\fP |
| If the queue/stack is full, the oldest element is removed to |
| make room for this. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_probe_read(void *\fP\fIdst\fP\fB, u32\fP \fIsize\fP\fB, const void *\fP\fIsrc\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| For tracing programs, safely attempt to read \fIsize\fP bytes from |
| address \fIsrc\fP and store the data in \fIdst\fP\&. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_ktime_get_ns(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Return the time elapsed since system boot, in nanoseconds. |
| .TP |
| .B Return |
| Current \fIktime\fP\&. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_trace_printk(const char *\fP\fIfmt\fP\fB, u32\fP \fIfmt_size\fP\fB, ...)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is a "printk()\-like" facility for debugging. It |
| prints a message defined by format \fIfmt\fP (of size \fIfmt_size\fP) |
| to file \fI/sys/kernel/debug/tracing/trace\fP from DebugFS, if |
| available. It can take up to three additional \fBu64\fP |
| arguments (as with all eBPF helpers, the total number of arguments is |
| limited to five). |
| .sp |
| Each time the helper is called, it appends a line to the trace. |
| The format of the trace is customizable, and the exact output |
| one will get depends on the options set in |
| \fI/sys/kernel/debug/tracing/trace_options\fP (see also the |
| \fIREADME\fP file under the same directory). However, it usually |
| defaults to something like: |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| telnet\-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg> |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| In the above: |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .INDENT 0.0 |
| .IP \(bu 2 |
| \fBtelnet\fP is the name of the current task. |
| .IP \(bu 2 |
| \fB470\fP is the PID of the current task. |
| .IP \(bu 2 |
| \fB001\fP is the CPU number on which the task is |
| running. |
| .IP \(bu 2 |
| In \fB\&.N..\fP, each character refers to a set of |
| options (whether irqs are enabled, scheduling |
| options, whether hard/softirqs are running, level of |
| preempt_disabled respectively). \fBN\fP means that |
| \fBTIF_NEED_RESCHED\fP and \fBPREEMPT_NEED_RESCHED\fP |
| are set. |
| .IP \(bu 2 |
| \fB419421.045894\fP is a timestamp. |
| .IP \(bu 2 |
| \fB0x00000001\fP is a fake value used by BPF for the |
| instruction pointer register. |
| .IP \(bu 2 |
| \fB<formatted msg>\fP is the message formatted with |
| \fIfmt\fP\&. |
| .UNINDENT |
| .UNINDENT |
| .UNINDENT |
| .sp |
| The conversion specifiers supported by \fIfmt\fP are similar to, but |
| more limited than, those for printk(). They are \fB%d\fP, \fB%i\fP, |
| \fB%u\fP, \fB%x\fP, \fB%ld\fP, \fB%li\fP, \fB%lu\fP, \fB%lx\fP, \fB%lld\fP, |
| \fB%lli\fP, \fB%llu\fP, \fB%llx\fP, \fB%p\fP, \fB%s\fP\&. No modifier (size |
| of field, padding with zeroes, etc.) is available, and the |
| helper will return \fB\-EINVAL\fP (but print nothing) if it |
| encounters an unknown specifier. |
| .sp |
| Also, note that \fBbpf_trace_printk\fP() is slow, and should |
| only be used for debugging purposes. For this reason, the first |
| time this helper is used (or more precisely, when |
| \fBtrace_printk\fP() buffers are allocated), a notice block |
| (spanning several lines) is printed to kernel logs and states |
| that the helper should not be used "for production use". For |
| passing values to user space, perf events should be preferred. |
| .TP |
| .B Return |
| The number of bytes written to the buffer, or a negative error |
| in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBu32 bpf_get_prandom_u32(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Get a pseudo\-random number. |
| .sp |
| From a security point of view, this helper uses its own |
| pseudo\-random internal state, and cannot be used to infer the |
| seed of other random functions in the kernel. However, it is |
| essential to note that the generator used by the helper is not |
| cryptographically secure. |
| .TP |
| .B Return |
| A random 32\-bit unsigned value. |
| .UNINDENT |
| .TP |
| .B \fBu32 bpf_get_smp_processor_id(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Get the SMP (symmetric multiprocessing) processor id. Note that |
| all programs run with preemption disabled, which means that the |
| SMP processor id is stable throughout the execution of the |
| program. |
| .TP |
| .B Return |
| The SMP id of the processor running the program. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Store \fIlen\fP bytes from address \fIfrom\fP into the packet |
| associated to \fIskb\fP, at \fIoffset\fP\&. \fIflags\fP are a combination of |
| \fBBPF_F_RECOMPUTE_CSUM\fP (automatically recompute the |
| checksum for the packet after storing the bytes) and |
| \fBBPF_F_INVALIDATE_HASH\fP (set \fIskb\fP\fB\->hash\fP, \fIskb\fP\fB\->swhash\fP and \fIskb\fP\fB\->l4hash\fP to 0). |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_l3_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIsize\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Recompute the layer 3 (e.g. IP) checksum for the packet |
| associated to \fIskb\fP\&. Computation is incremental, so the helper |
| must know the former value of the header field that was |
| modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the |
| number of bytes (2 or 4) for this field, stored in \fIsize\fP\&. |
| Alternatively, it is possible to store the difference between |
| the previous and the new values of the header field in \fIto\fP, by |
| setting \fIfrom\fP and \fIsize\fP to 0. For both methods, \fIoffset\fP |
| indicates the location of the IP checksum within the packet. |
| .sp |
| This helper works in combination with \fBbpf_csum_diff\fP(), |
| which does not update the checksum in\-place, but offers more |
| flexibility and can handle sizes larger than 2 or 4 for the |
| checksum to update. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
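The incremental computation described for bpf_l3_csum_replace() can be sketched in user space. The code below is an illustration only, assuming the classic one's complement update from RFC 1624 for a single 16 bit field; csum_fold(), csum_full() and csum_replace2() are hypothetical names and are not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit sum into 16 bits, adding the carries back in. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over an array of 16-bit words
 * (the checksum field itself must be zero while summing). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold(sum);
}

/* Incremental update for one 16-bit field:
 *   new_csum = ~(~old_csum + ~from + to)
 * "from" is the former field value, "to" is the new one. */
static uint16_t csum_replace2(uint16_t old_csum, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~old_csum;

    sum += (uint16_t)~from;
    sum += to;
    return (uint16_t)~csum_fold(sum);
}
```

Only the old checksum and the two field values are needed, which is why the helper takes from, to and size rather than the whole header. Changing one header word and calling csum_replace2() gives the same result as recomputing the checksum over the full header with csum_full().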
| .TP |
| .B \fBint bpf_l4_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the |
| packet associated to \fIskb\fP\&. Computation is incremental, so the |
| helper must know the former value of the header field that was |
| modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the |
| number of bytes (2 or 4) for this field, stored on the lowest |
| four bits of \fIflags\fP\&. Alternatively, it is possible to store |
| the difference between the previous and the new values of the |
| header field in \fIto\fP, by setting \fIfrom\fP and the four lowest |
| bits of \fIflags\fP to 0. For both methods, \fIoffset\fP indicates the |
| location of the layer 4 checksum within the packet. In addition to |
| the size of the field, actual flags can be combined into \fIflags\fP |
| with a bitwise OR. With \fBBPF_F_MARK_MANGLED_0\fP, a null checksum is left |
| untouched (unless \fBBPF_F_MARK_ENFORCE\fP is added as well), and |
| for updates resulting in a null checksum the value is set to |
| \fBCSUM_MANGLED_0\fP instead. Flag \fBBPF_F_PSEUDO_HDR\fP indicates |
| the checksum is to be computed against a pseudo\-header. |
| .sp |
| This helper works in combination with \fBbpf_csum_diff\fP(), |
| which does not update the checksum in\-place, but offers more |
| flexibility and can handle sizes larger than 2 or 4 for the |
| checksum to update. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
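The way bpf_l4_csum_replace() shares its flags argument between the field size (lowest four bits) and the actual flags can be sketched as follows. The constant values are reproduced from memory of include/uapi/linux/bpf.h and should be checked against the installed headers; l4_csum_flags() and l4_csum_field_size() are hypothetical helper names.

```c
#include <stdint.h>

/* Values believed to mirror include/uapi/linux/bpf.h (assumption). */
#define BPF_F_HDR_FIELD_MASK 0xfULL
#define BPF_F_PSEUDO_HDR     (1ULL << 4)
#define BPF_F_MARK_MANGLED_0 (1ULL << 5)
#define BPF_F_MARK_ENFORCE   (1ULL << 6)

/* Combine the field size (0, 2 or 4 bytes) with the actual flags. */
static uint64_t l4_csum_flags(uint64_t field_size, uint64_t extra)
{
    return (field_size & BPF_F_HDR_FIELD_MASK) |
           (extra & ~BPF_F_HDR_FIELD_MASK);
}

/* Recover the field size stored in the lowest four bits. */
static uint64_t l4_csum_field_size(uint64_t flags)
{
    return flags & BPF_F_HDR_FIELD_MASK;
}
```

For a 4 byte field that is part of the pseudo header, a program would typically pass something like l4_csum_flags(4, BPF_F_PSEUDO_HDR) as the last argument of the helper.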
| .TP |
| .B \fBint bpf_tail_call(void *\fP\fIctx\fP\fB, struct bpf_map *\fP\fIprog_array_map\fP\fB, u32\fP \fIindex\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This special helper is used to trigger a "tail call", or in |
| other words, to jump into another eBPF program. The same stack |
| frame is used (but values on stack and in registers for the |
| caller are not accessible to the callee). This mechanism allows |
| for program chaining, either for raising the maximum number of |
| available eBPF instructions, or to execute given programs in |
| conditional blocks. For security reasons, there is an upper |
| limit to the number of successive tail calls that can be |
| performed. |
| .sp |
| Upon call of this helper, the program attempts to jump into a |
| program referenced at index \fIindex\fP in \fIprog_array_map\fP, a |
| special map of type \fBBPF_MAP_TYPE_PROG_ARRAY\fP, and passes |
| \fIctx\fP, a pointer to the context. |
| .sp |
| If the call succeeds, the kernel immediately runs the first |
| instruction of the new program. This is not a function call, |
| and it never returns to the previous program. If the call |
| fails, then the helper has no effect, and the caller continues |
| to run its subsequent instructions. A call can fail if the |
| destination program for the jump does not exist (i.e. \fIindex\fP |
| is greater than or equal to the number of entries in \fIprog_array_map\fP), or |
| if the maximum number of tail calls has been reached for this |
| chain of programs. This limit is defined in the kernel by the |
| macro \fBMAX_TAIL_CALL_CNT\fP (not accessible to user space), |
| which is currently set to 32. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_clone_redirect(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Clone and redirect the packet associated to \fIskb\fP to another |
| net device of index \fIifindex\fP\&. Both ingress and egress |
| interfaces can be used for redirection. The \fBBPF_F_INGRESS\fP |
| value in \fIflags\fP is used to make the distinction (ingress path |
| is selected if the flag is present, egress path otherwise). |
| This is the only flag supported for now. |
| .sp |
| In comparison with the \fBbpf_redirect\fP() helper, |
| \fBbpf_clone_redirect\fP() has the associated cost of |
| duplicating the packet buffer, but the redirection is performed from |
| within the eBPF program. Conversely, \fBbpf_redirect\fP() is more |
| efficient, but it is handled through an action code where the |
| redirection happens only after the eBPF program has returned. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_get_current_pid_tgid(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Return |
| A 64\-bit integer containing the current tgid and pid, and |
| created as such: |
| \fIcurrent_task\fP\fB\->tgid << 32 |\fP |
| \fIcurrent_task\fP\fB\->pid\fP\&. |
| .UNINDENT |
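The value returned by bpf_get_current_pid_tgid() packs two 32 bit identifiers into one 64 bit integer. A minimal user space sketch of packing and unpacking, with hypothetical function names, could look like this:

```c
#include <stdint.h>

/* Build the value the way the helper is documented to:
 * tgid in the upper 32 bits, pid in the lower 32 bits. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Upper half: thread group id (what user space usually calls the PID). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower half: kernel pid (what user space usually calls the thread id). */
static uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

Inside an eBPF program, the same shift and truncation would be applied directly to the return value of the helper.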
| .TP |
| .B \fBu64 bpf_get_current_uid_gid(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Return |
| A 64\-bit integer containing the current GID and UID, and |
| created as such: \fIcurrent_gid\fP \fB<< 32 |\fP \fIcurrent_uid\fP\&. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_get_current_comm(char *\fP\fIbuf\fP\fB, u32\fP \fIsize_of_buf\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Copy the \fBcomm\fP attribute of the current task into \fIbuf\fP of |
| \fIsize_of_buf\fP\&. The \fBcomm\fP attribute contains the name of |
| the executable (excluding the path) for the current task. The |
| \fIsize_of_buf\fP must be strictly positive. On success, the |
| helper makes sure that the \fIbuf\fP is NUL\-terminated. On failure, |
| it is filled with zeroes. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBu32 bpf_get_cgroup_classid(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Retrieve the classid for the current task, i.e. for the net_cls |
| cgroup to which \fIskb\fP belongs. |
| .sp |
| This helper can be used on TC egress path, but not on ingress. |
| .sp |
| The net_cls cgroup provides an interface to tag network packets |
| based on a user\-provided identifier for all traffic coming from |
| the tasks belonging to the related cgroup. See also the related |
| kernel documentation, available from the Linux sources in file |
| \fIDocumentation/cgroup\-v1/net_cls.txt\fP\&. |
| .sp |
| The Linux kernel has two versions for cgroups: there are |
| cgroups v1 and cgroups v2. Both are available to users, who can |
| use a mixture of them, but note that the net_cls cgroup is for |
| cgroup v1 only. This makes it incompatible with BPF programs |
| run on cgroups, which is a cgroup\-v2\-only feature (a socket can |
| only hold data for one version of cgroups at a time). |
| .sp |
| This helper is only available if the kernel was compiled with |
| the \fBCONFIG_CGROUP_NET_CLASSID\fP configuration option set to |
| "\fBy\fP" or to "\fBm\fP". |
| .TP |
| .B Return |
| The classid, or 0 for the default unconfigured classid. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_vlan_push(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIvlan_proto\fP\fB, u16\fP \fIvlan_tci\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Push a \fIvlan_tci\fP (VLAN tag control information) of protocol |
| \fIvlan_proto\fP to the packet associated to \fIskb\fP, then update |
| the checksum. Note that if \fIvlan_proto\fP is different from |
| \fBETH_P_8021Q\fP and \fBETH_P_8021AD\fP, it is considered to |
| be \fBETH_P_8021Q\fP\&. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_vlan_pop(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Pop a VLAN header from the packet associated to \fIskb\fP\&. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_get_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Get tunnel metadata. This helper takes a pointer \fIkey\fP to an |
| empty \fBstruct bpf_tunnel_key\fP of \fIsize\fP, that will be |
| filled with tunnel metadata for the packet associated to \fIskb\fP\&. |
| The \fIflags\fP can be set to \fBBPF_F_TUNINFO_IPV6\fP, which |
| indicates that the tunnel is based on IPv6 protocol instead of |
| IPv4. |
| .sp |
| The \fBstruct bpf_tunnel_key\fP is an object that generalizes the |
| principal parameters used by various tunneling protocols into a |
| single struct. This way, it can be used to easily make a |
| decision based on the contents of the encapsulation header, |
| "summarized" in this struct. In particular, it holds the IP |
| address of the remote end (IPv4 or IPv6, depending on the case) |
| in \fIkey\fP\fB\->remote_ipv4\fP or \fIkey\fP\fB\->remote_ipv6\fP\&. Also, |
| this struct exposes the \fIkey\fP\fB\->tunnel_id\fP, which is |
| generally mapped to a VNI (Virtual Network Identifier), making |
| it programmable together with the \fBbpf_skb_set_tunnel_key\fP() helper. |
| .sp |
| Let\(aqs imagine that the following code is part of a program |
| attached to the TC ingress interface, on one end of a GRE |
| tunnel, and is supposed to filter out all messages coming from |
| remote ends with IPv4 address other than 10.0.0.1: |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| int ret; |
| struct bpf_tunnel_key key = {}; |
| |
| ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0); |
| if (ret < 0) |
| return TC_ACT_SHOT; // drop packet |
| |
| if (key.remote_ipv4 != 0x0a000001) |
| return TC_ACT_SHOT; // drop packet |
| |
| return TC_ACT_OK; // accept packet |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| This interface can also be used with all encapsulation devices |
| that can operate in "collect metadata" mode: instead of having |
| one network device per specific configuration, the "collect |
| metadata" mode only requires a single device where the |
| configuration can be extracted from this helper. |
| .sp |
| This can be used together with various tunnels such as VXLan, |
| Geneve, GRE or IP in IP (IPIP). |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
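The constant 0x0a000001 in the filtering example for bpf_skb_get_tunnel_key() is the address 10.0.0.1 written as a 32 bit integer, which suggests that the remote_ipv4 field is compared in host byte order. The hypothetical ipv4_host_order() function below is a small sketch of how such a constant can be derived from dotted decimal notation:

```c
#include <stdint.h>

/* Build a host-order IPv4 value from its four dotted-decimal parts,
 * most significant octet first. */
static uint32_t ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```

Here ipv4_host_order(10, 0, 0, 1) yields 0x0a000001, the value used in the comparison above.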
| .TP |
| .B \fBint bpf_skb_set_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Populate tunnel metadata for packet associated to \fIskb\fP\&. The |
| tunnel metadata is set to the contents of \fIkey\fP, of \fIsize\fP\&. The |
| \fIflags\fP can be set to a combination of the following values: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_F_TUNINFO_IPV6\fP |
| Indicate that the tunnel is based on IPv6 protocol |
| instead of IPv4. |
| .TP |
| .B \fBBPF_F_ZERO_CSUM_TX\fP |
| For IPv4 packets, add a flag to tunnel metadata |
| indicating that checksum computation should be skipped |
| and checksum set to zeroes. |
| .TP |
| .B \fBBPF_F_DONT_FRAGMENT\fP |
| Add a flag to tunnel metadata indicating that the |
| packet should not be fragmented. |
| .TP |
| .B \fBBPF_F_SEQ_NUMBER\fP |
| Add a flag to tunnel metadata indicating that a |
| sequence number should be added to tunnel header before |
| sending the packet. This flag was added for GRE |
| encapsulation, but might be used with other protocols |
| as well in the future. |
| .UNINDENT |
| .sp |
| Here is a typical usage on the transmit path: |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| struct bpf_tunnel_key key = {}; |
| /* populate key ... */ |
| bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0); |
| bpf_clone_redirect(skb, vxlan_dev_ifindex, 0); |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| See also the description of the \fBbpf_skb_get_tunnel_key\fP() |
| helper for additional information. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_perf_event_read(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Read the value of a perf event counter. This helper relies on a |
| \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of |
| the perf event counter is selected when \fImap\fP is updated with |
| perf event file descriptors. The \fImap\fP is an array whose size |
| is the number of available CPUs, and each cell contains a value |
| relative to one CPU. The value to retrieve is indicated by |
| \fIflags\fP, that contains the index of the CPU to look up, masked |
| with \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to |
| \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the |
| current CPU should be retrieved. |
| .sp |
| Note that before Linux 4.13, only hardware perf events can be |
| retrieved. |
| .sp |
| Also, be aware that the newer helper |
| \fBbpf_perf_event_read_value\fP() is recommended over |
| \fBbpf_perf_event_read\fP() in general. The latter has some ABI |
| quirks where error and counter value are used as a return code |
| (which is wrong to do since ranges may overlap). This issue is |
| fixed with \fBbpf_perf_event_read_value\fP(), which at the same |
| time provides more features over the \fBbpf_perf_event_read\fP() |
| interface. Please refer to the description of |
| \fBbpf_perf_event_read_value\fP() for details. |
| .TP |
| .B Return |
| The value of the perf event counter read from the map, or a |
| negative error code in case of failure. |
| .UNINDENT |
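The overlap between counter values and error codes in the return value of bpf_perf_event_read() can be made concrete with a short sketch. The looks_like_error() check is hypothetical, and the 4095 bound is an assumption borrowed from the usual kernel convention for the largest errno value.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical caller-side check on the raw u64 return value: treat the
 * top 4095 values of the range as negative errno codes (assumption). */
static int looks_like_error(uint64_t ret)
{
    return ret >= (uint64_t)-4095;
}
```

A very large but legitimate counter value, for instance UINT64_MAX minus one, has the same bit pattern as a small negative error code such as the negated ENOENT, so a check of this kind cannot tell the two apart. This is the ambiguity that bpf_perf_event_read_value() avoids by returning the counter through a separate buffer.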
| .TP |
| .B \fBint bpf_redirect(u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Redirect the packet to another net device of index \fIifindex\fP\&. |
| This helper is somewhat similar to \fBbpf_clone_redirect\fP(), |
| except that the packet is not cloned, which provides |
| increased performance. |
| .sp |
| Except for XDP, both ingress and egress interfaces can be used |
| for redirection. The \fBBPF_F_INGRESS\fP value in \fIflags\fP is used |
| to make the distinction (ingress path is selected if the flag |
| is present, egress path otherwise). Currently, XDP only |
| supports redirection to the egress interface, and accepts no |
| flag at all. |
| .sp |
| The same effect can be attained with the more generic |
| \fBbpf_redirect_map\fP(), which requires specific maps to be |
| used but offers better performance. |
| .TP |
| .B Return |
| For XDP, the helper returns \fBXDP_REDIRECT\fP on success or |
| \fBXDP_ABORTED\fP on error. For other program types, the values |
| are \fBTC_ACT_REDIRECT\fP on success or \fBTC_ACT_SHOT\fP on |
| error. |
| .UNINDENT |
| .TP |
| .B \fBu32 bpf_get_route_realm(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Retrieve the realm of the route, that is to say the |
| \fBtclassid\fP field of the destination for the \fIskb\fP\&. The |
| identifier retrieved is a user\-provided tag, similar to the |
| one used with the net_cls cgroup (see description for |
| \fBbpf_get_cgroup_classid\fP() helper), but here this tag is |
| held by a route (a destination entry), not by a task. |
| .sp |
| Retrieving this identifier works with the clsact TC egress hook |
| (see also \fBtc\-bpf(8)\fP), or alternatively on conventional |
| classful egress qdiscs, but not on TC ingress path. In case of |
| clsact TC egress hook, this has the advantage that, internally, |
| the destination entry has not been dropped yet in the transmit |
| path. Therefore, the destination entry does not need to be |
| artificially held via \fBnetif_keep_dst\fP() for a classful |
| qdisc until the \fIskb\fP is freed. |
| .sp |
| This helper is available only if the kernel was compiled with |
| \fBCONFIG_IP_ROUTE_CLASSID\fP configuration option. |
| .TP |
| .B Return |
| The realm of the route for the packet associated to \fIskb\fP, or 0 |
| if none was found. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_perf_event_output(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, void *\fP\fIdata\fP\fB, u64\fP \fIsize\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Write raw \fIdata\fP blob into a special BPF perf event held by |
| \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. This perf |
| event must have the following attributes: \fBPERF_SAMPLE_RAW\fP |
| as \fBsample_type\fP, \fBPERF_TYPE_SOFTWARE\fP as \fBtype\fP, and |
| \fBPERF_COUNT_SW_BPF_OUTPUT\fP as \fBconfig\fP\&. |
| .sp |
| The \fIflags\fP are used to indicate the index in \fImap\fP for which |
| the value must be put, masked with \fBBPF_F_INDEX_MASK\fP\&. |
| Alternatively, \fIflags\fP can be set to \fBBPF_F_CURRENT_CPU\fP |
| to indicate that the index of the current CPU core should be |
| used. |
| .sp |
| The value to write, of \fIsize\fP, is passed through the eBPF stack and |
| pointed to by \fIdata\fP\&. |
| .sp |
| The context of the program \fIctx\fP must also be passed to the |
| helper. |
| .sp |
| In user space, a program willing to read the values needs to |
| call \fBperf_event_open\fP() on the perf event (either for |
| one or for all CPUs) and to store the file descriptor into the |
| \fImap\fP\&. This must be done before the eBPF program can send data |
| into it. An example is available in file |
| \fIsamples/bpf/trace_output_user.c\fP in the Linux kernel source |
| tree (the eBPF program counterpart is in |
| \fIsamples/bpf/trace_output_kern.c\fP). |
| .sp |
| \fBbpf_perf_event_output\fP() achieves better performance |
| than \fBbpf_trace_printk\fP() for sharing data with user |
| space, and is much better suited for streaming data from eBPF |
| programs. |
| .sp |
| Note that this helper is not restricted to tracing use cases |
| and can be used with programs attached to TC or XDP as well, |
| where it allows for passing data to user space listeners. Data |
| can be: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| Only custom structs, |
| .IP \(bu 2 |
| Only the packet payload, or |
| .IP \(bu 2 |
| A combination of both. |
| .UNINDENT |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_load_bytes(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper was provided as an easy way to load data from a |
| packet. It can be used to load \fIlen\fP bytes from \fIoffset\fP from |
| the packet associated to \fIskb\fP, into the buffer pointed by |
| \fIto\fP\&. |
| .sp |
| Since Linux 4.7, usage of this helper has mostly been replaced |
| by "direct packet access", enabling packet data to be |
| manipulated with \fIskb\fP\fB\->data\fP and \fIskb\fP\fB\->data_end\fP |
| pointing respectively to the first byte of packet data and to |
| the byte after the last byte of packet data. However, it |
| remains useful if one wishes to read large quantities of data |
| at once from a packet into the eBPF stack. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_get_stackid(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Walk a user or a kernel stack and return its id. To achieve |
| this, the helper needs \fIctx\fP, which is a pointer to the context |
| on which the tracing program is executed, and a pointer to a |
| \fImap\fP of type \fBBPF_MAP_TYPE_STACK_TRACE\fP\&. |
| .sp |
| The last argument, \fIflags\fP, holds the number of stack frames to |
| skip (from 0 to 255), masked with |
| \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set |
| a combination of the following flags: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_F_USER_STACK\fP |
| Collect a user space stack instead of a kernel stack. |
| .TP |
| .B \fBBPF_F_FAST_STACK_CMP\fP |
| Compare stacks by hash only. |
| .TP |
| .B \fBBPF_F_REUSE_STACKID\fP |
| If two different stacks hash into the same \fIstackid\fP, |
| discard the old one. |
| .UNINDENT |
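| .sp |
| As an illustration of the layout of \fIflags\fP described above, the |
| following user\-space sketch combines a skip count with the option bits. |
| The constant values mirror \fIinclude/uapi/linux/bpf.h\fP; the function |
| name \fBstackid_flags\fP() is hypothetical. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <stdint.h>

/* Values mirrored from include/uapi/linux/bpf.h. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_FAST_STACK_CMP  (1ULL << 9)
#define BPF_F_REUSE_STACKID   (1ULL << 10)

/*
 * Hypothetical helper: the low 8 bits carry the number of frames to
 * skip, the next bits carry the option flags.
 */
static uint64_t stackid_flags(unsigned int skip, uint64_t options)
{
    return ((uint64_t)skip & BPF_F_SKIP_FIELD_MASK) |
           (options & ~BPF_F_SKIP_FIELD_MASK);
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |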
| .sp |
| The stack id retrieved is a 32 bit long integer handle which |
| can be further combined with other data (including other stack |
| ids) and used as a key into maps. This can be useful for |
| generating a variety of graphs (such as flame graphs or off\-cpu |
| graphs). |
| .sp |
| For walking a stack, this helper is an improvement over |
| \fBbpf_probe_read\fP(), which can be used with unrolled loops |
| but is not efficient and consumes a lot of eBPF instructions. |
| Instead, \fBbpf_get_stackid\fP() can collect up to |
| \fBPERF_MAX_STACK_DEPTH\fP kernel and user frames. Note that |
| this limit can be controlled with the \fBsysctl\fP program, and |
| that it should be manually increased in order to profile long |
| user stacks (such as stacks for Java programs). To do so, use: |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| # sysctl kernel.perf_event_max_stack=<new value> |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .TP |
| .B Return |
| The positive or null stack id on success, or a negative error |
| in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBs64 bpf_csum_diff(__be32 *\fP\fIfrom\fP\fB, u32\fP \fIfrom_size\fP\fB, __be32 *\fP\fIto\fP\fB, u32\fP \fIto_size\fP\fB, __wsum\fP \fIseed\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Compute a checksum difference, from the raw buffer pointed by |
| \fIfrom\fP, of length \fIfrom_size\fP (that must be a multiple of 4), |
| towards the raw buffer pointed by \fIto\fP, of size \fIto_size\fP |
| (same remark). An optional \fIseed\fP can be added to the value |
| (this can be cascaded, the seed may come from a previous call |
| to the helper). |
| .sp |
| This is flexible enough to be used in several ways: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| With \fIfrom_size\fP == 0, \fIto_size\fP > 0 and \fIseed\fP set to |
| checksum, it can be used when pushing new data. |
| .IP \(bu 2 |
| With \fIfrom_size\fP > 0, \fIto_size\fP == 0 and \fIseed\fP set to |
| checksum, it can be used when removing data from a packet. |
| .IP \(bu 2 |
| With \fIfrom_size\fP > 0, \fIto_size\fP > 0 and \fIseed\fP set to 0, it |
| can be used to compute a diff. Note that \fIfrom_size\fP and |
| \fIto_size\fP do not need to be equal. |
| .UNINDENT |
| .sp |
| This helper can be used in combination with |
| \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(), to |
| which one can feed in the difference computed with |
| \fBbpf_csum_diff\fP(). |
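| .sp |
| The three usage patterns listed above can be modelled in user space as |
| follows. This is a simplified sketch under stated assumptions, not the |
| kernel implementation: words are treated as host\-order integers rather |
| than \fB__be32\fP values, the result is an unfolded 32\-bit ones' |
| complement sum, and the names \fBcsum_add32\fP() and |
| \fBcsum_diff_model\fP() are hypothetical. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement addition of two 32-bit sums (end-around carry). */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
    uint64_t sum = (uint64_t)a + b;

    return (uint32_t)(sum & 0xffffffffULL) + (uint32_t)(sum >> 32);
}

/*
 * Hypothetical model: subtract the words of "from" (by adding their
 * complement) and add the words of "to", starting from seed.
 * Sizes are in bytes and must be multiples of 4.
 */
static int64_t csum_diff_model(const uint32_t *from, uint32_t from_size,
                               const uint32_t *to, uint32_t to_size,
                               uint32_t seed)
{
    uint32_t sum = seed;
    size_t i;

    if ((from_size | to_size) & 3)
        return -1; /* model's stand-in for a negative error code */

    for (i = 0; i < from_size / 4; i++)
        sum = csum_add32(sum, ~from[i]);
    for (i = 0; i < to_size / 4; i++)
        sum = csum_add32(sum, to[i]);
    return sum;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |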
| .TP |
| .B Return |
| The checksum result, or a negative error code in case of |
| failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_get_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Retrieve tunnel options metadata for the packet associated to |
| \fIskb\fP, and store the raw tunnel option data to the buffer \fIopt\fP |
| of \fIsize\fP\&. |
| .sp |
| This helper can be used with encapsulation devices that can |
| operate in "collect metadata" mode (please refer to the related |
| note in the description of \fBbpf_skb_get_tunnel_key\fP() for |
| more details). A particular example where this can be used is |
| in combination with the Geneve encapsulation protocol, where it |
| allows for pushing (with \fBbpf_skb_set_tunnel_opt\fP() helper) |
| and retrieving arbitrary TLVs (Type\-Length\-Value headers) from |
| the eBPF program. This allows for full customization of these |
| headers. |
| .TP |
| .B Return |
| The size of the option data retrieved. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_set_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Set tunnel options metadata for the packet associated to \fIskb\fP |
| to the option data contained in the raw buffer \fIopt\fP of \fIsize\fP\&. |
| .sp |
| See also the description of the \fBbpf_skb_get_tunnel_opt\fP() |
| helper for additional information. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_change_proto(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIproto\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Change the protocol of the \fIskb\fP to \fIproto\fP\&. Currently |
| supported are transition from IPv4 to IPv6, and from IPv6 to |
| IPv4. The helper takes care of the groundwork for the |
| transition, including resizing the socket buffer. The eBPF |
| program is expected to fill the new headers, if any, via |
| \fBbpf_skb_store_bytes\fP() and to recompute the checksums with |
| \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(). The |
| main use case for this helper is to perform NAT64 |
| operations out of an eBPF program. |
| .sp |
| Internally, the GSO type is marked as dodgy so that headers are |
| checked and segments are recalculated by the GSO/GRO engine. |
| The size for GSO target is adapted as well. |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_change_type(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Change the packet type for the packet associated to \fIskb\fP\&. This |
| comes down to setting \fIskb\fP\fB\->pkt_type\fP to \fItype\fP, except |
| that the eBPF program has no write access to |
| \fIskb\fP\fB\->pkt_type\fP other than through this helper. Using a |
| helper here allows for graceful handling of errors. |
| .sp |
| The major use case is to change incoming \fIskb\fPs to |
| \fBPACKET_HOST\fP in a programmatic way instead of having to |
| recirculate via \fBredirect\fP(..., \fBBPF_F_INGRESS\fP), for |
| example. |
| .sp |
| Note that \fItype\fP only allows certain values. At this time, they |
| are: |
| .INDENT 7.0 |
| .TP |
| .B \fBPACKET_HOST\fP |
| Packet is for us. |
| .TP |
| .B \fBPACKET_BROADCAST\fP |
| Send packet to all. |
| .TP |
| .B \fBPACKET_MULTICAST\fP |
| Send packet to group. |
| .TP |
| .B \fBPACKET_OTHERHOST\fP |
| Send packet to someone else. |
| .UNINDENT |
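| .sp |
| The restriction on \fItype\fP listed above can be pictured with the |
| following user\-space sketch. It is an assumption\-laden model rather |
| than the kernel code: the constants mirror |
| \fIinclude/uapi/linux/if_packet.h\fP, the function name |
| \fBchange_type_model\fP() is hypothetical, and the use of |
| \fB\-EINVAL\fP as the error value is an assumption of the model. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <errno.h>

/* Values mirrored from include/uapi/linux/if_packet.h. */
#define PACKET_HOST      0
#define PACKET_BROADCAST 1
#define PACKET_MULTICAST 2
#define PACKET_OTHERHOST 3

/*
 * Hypothetical model: accept only the four packet types listed in
 * this page, and leave *pkt_type untouched on error.
 */
static int change_type_model(unsigned int *pkt_type, unsigned int type)
{
    if (type > PACKET_OTHERHOST)
        return -EINVAL; /* assumed error value */
    *pkt_type = type;
    return 0;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |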
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_under_cgroup(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Check whether \fIskb\fP is a descendant of the cgroup2 held by |
| \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. |
| .TP |
| .B Return |
| The return value depends on the result of the test, and can be: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| 0, if the \fIskb\fP failed the cgroup2 descendant test. |
| .IP \(bu 2 |
| 1, if the \fIskb\fP passed the cgroup2 descendant test. |
| .IP \(bu 2 |
| A negative error code, if an error occurred. |
| .UNINDENT |
| .UNINDENT |
| .TP |
| .B \fBu32 bpf_get_hash_recalc(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Retrieve the hash of the packet, \fIskb\fP\fB\->hash\fP\&. If it is |
| not set, in particular if the hash was cleared due to mangling, |
| recompute this hash. Later accesses to the hash can be done |
| directly with \fIskb\fP\fB\->hash\fP\&. |
| .sp |
| Calling \fBbpf_set_hash_invalid\fP(), changing the packet |
| protocol with \fBbpf_skb_change_proto\fP(), or calling |
| \fBbpf_skb_store_bytes\fP() with the |
| \fBBPF_F_INVALIDATE_HASH\fP flag are actions that may clear |
| the hash and trigger a new computation on the next call to |
| \fBbpf_get_hash_recalc\fP(). |
| .TP |
| .B Return |
| The 32\-bit hash. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_get_current_task(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Return |
| A pointer to the current task struct. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_probe_write_user(void *\fP\fIdst\fP\fB, const void *\fP\fIsrc\fP\fB, u32\fP \fIlen\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Attempt in a safe way to write \fIlen\fP bytes from the buffer |
| \fIsrc\fP to \fIdst\fP in memory. It only works for threads that are in |
| user context, and \fIdst\fP must be a valid user space address. |
| .sp |
| This helper should not be used to implement any kind of |
| security mechanism because of TOC\-TOU attacks, but rather to |
| debug, divert, and manipulate execution of semi\-cooperative |
| processes. |
| .sp |
| Keep in mind that this feature is meant for experiments, and it |
| has a risk of crashing the system and running programs. |
| Therefore, when an eBPF program using this helper is attached, |
| a warning including PID and process name is printed to kernel |
| logs. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_current_task_under_cgroup(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Check whether the probe is being run in the context of a given |
| subset of the cgroup2 hierarchy. The cgroup2 to test is held by |
| \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. |
| .TP |
| .B Return |
| The return value depends on the result of the test, and can be: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| 0, if the current task does not belong to the cgroup2. |
| .IP \(bu 2 |
| 1, if the current task belongs to the cgroup2. |
| .IP \(bu 2 |
| A negative error code, if an error occurred. |
| .UNINDENT |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_change_tail(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Resize (trim or grow) the packet associated to \fIskb\fP to the |
| new \fIlen\fP\&. The \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| The basic idea is that the helper performs the needed work to |
| change the size of the packet, then the eBPF program rewrites |
| the rest via helpers like \fBbpf_skb_store_bytes\fP(), |
| \fBbpf_l3_csum_replace\fP(), \fBbpf_l4_csum_replace\fP() |
| and others. This helper is a slow path utility intended for |
| replies with control messages. And because it is targeted for |
| slow path, the helper itself can afford to be slow: it |
| implicitly linearizes, unclones and drops offloads from the |
| \fIskb\fP\&. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_pull_data(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Pull in non\-linear data in case the \fIskb\fP is non\-linear and not |
| all of the \fIlen\fP bytes are part of the linear section. Make \fIlen\fP bytes |
| from \fIskb\fP readable and writable. If a zero value is passed for |
| \fIlen\fP, then the whole length of the \fIskb\fP is pulled. |
| .sp |
| This helper is only needed for reading and writing with direct |
| packet access. |
| .sp |
| For direct packet access, testing that offsets to access |
| are within packet boundaries (test on \fIskb\fP\fB\->data_end\fP) is |
| susceptible to fail if offsets are invalid, or if the requested |
| data is in non\-linear parts of the \fIskb\fP\&. On failure the |
| program can just bail out, or in the case of a non\-linear |
| buffer, use a helper to make the data available. The |
| \fBbpf_skb_load_bytes\fP() helper is a first solution to access |
| the data. Another one consists in using \fBbpf_skb_pull_data\fP() |
| to pull in the non\-linear parts once, then retesting and |
| finally accessing the data. |
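| .sp |
| The "test, pull, retest" pattern described above can be pictured with a |
| small user\-space model. Everything here is hypothetical and only |
| stands in for the real objects: \fBstruct skb_model\fP is not |
| \fBstruct sk_buff\fP, and \fBpull_data_model\fP() merely extends the |
| readable window the way \fBbpf_skb_pull_data\fP() is described to do. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket buffer. */
struct skb_model {
    const uint8_t *bytes; /* whole packet */
    size_t len;           /* total packet length */
    size_t linear_len;    /* bytes between data and data_end */
};

/* Model of the pull: a zero len pulls the whole packet. */
static int pull_data_model(struct skb_model *skb, size_t len)
{
    if (len == 0)
        len = skb->len;
    if (len > skb->len)
        return -1;
    if (len > skb->linear_len)
        skb->linear_len = len;
    return 0;
}

/* Read one byte: test bounds, pull on failure, reload and retest. */
static int read_byte(struct skb_model *skb, size_t off, uint8_t *out)
{
    const uint8_t *data = skb->bytes;
    const uint8_t *data_end = data + skb->linear_len;

    if ((size_t)(data_end - data) <= off) {
        if (pull_data_model(skb, off + 1) < 0)
            return -1;
        /* Pointers must be reloaded and checked again. */
        data = skb->bytes;
        data_end = data + skb->linear_len;
        if ((size_t)(data_end - data) <= off)
            return -1;
    }
    *out = data[off];
    return 0;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |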
| .sp |
| At the same time, this also makes sure the \fIskb\fP is uncloned, |
| which is a necessary condition for direct write. As this needs |
| to be an invariant for the write part only, the verifier |
| detects writes and adds a prologue that is calling |
| \fBbpf_skb_pull_data()\fP to effectively unclone the \fIskb\fP from |
| the very beginning in case it is indeed cloned. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBs64 bpf_csum_update(struct sk_buff *\fP\fIskb\fP\fB, __wsum\fP \fIcsum\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Add the checksum \fIcsum\fP into \fIskb\fP\fB\->csum\fP in case the |
| driver has supplied a checksum for the entire packet into that |
| field. Return an error otherwise. This helper is intended to be |
| used in combination with \fBbpf_csum_diff\fP(), in particular |
| when the checksum needs to be updated after data has been |
| written into the packet through direct packet access. |
| .TP |
| .B Return |
| The checksum on success, or a negative error code in case of |
| failure. |
| .UNINDENT |
| .TP |
| .B \fBvoid bpf_set_hash_invalid(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Invalidate the current \fIskb\fP\fB\->hash\fP\&. It can be used after |
| mangling on headers through direct packet access, in order to |
| indicate that the hash is outdated and to trigger a |
| recalculation the next time the kernel tries to access this |
| hash or when the \fBbpf_get_hash_recalc\fP() helper is called. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_get_numa_node_id(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Return the id of the current NUMA node. The primary use case |
| for this helper is the selection of sockets for the local NUMA |
| node, when the program is attached to sockets using the |
| \fBSO_ATTACH_REUSEPORT_EBPF\fP option (see also \fBsocket(7)\fP), |
| but the helper is also available to other eBPF program types, |
| similarly to \fBbpf_get_smp_processor_id\fP(). |
| .TP |
| .B Return |
| The id of current NUMA node. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_change_head(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Grows headroom of packet associated to \fIskb\fP and adjusts the |
| offset of the MAC header accordingly, adding \fIlen\fP bytes of |
| space. It automatically extends and reallocates memory as |
| required. |
| .sp |
| This helper can be used on a layer 3 \fIskb\fP to push a MAC header |
| for redirection into a layer 2 device. |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_xdp_adjust_head(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Adjust (move) \fIxdp_md\fP\fB\->data\fP by \fIdelta\fP bytes. Note that |
| it is possible to use a negative value for \fIdelta\fP\&. This helper |
| can be used to prepare the packet for pushing or popping |
| headers. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_probe_read_str(void *\fP\fIdst\fP\fB, int\fP \fIsize\fP\fB, const void *\fP\fIunsafe_ptr\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Copy a NUL terminated string from an unsafe address |
| \fIunsafe_ptr\fP to \fIdst\fP\&. The \fIsize\fP should include the |
| terminating NUL byte. In case the string length is smaller than |
| \fIsize\fP, the target is not padded with further NUL bytes. If the |
| string length is larger than \fIsize\fP, just \fIsize\fP\-1 bytes are |
| copied and the last byte is set to NUL. |
| .sp |
| On success, the length of the copied string is returned. This |
| makes this helper useful in tracing programs for reading |
| strings, and more importantly to get its length at runtime. See |
| the following snippet: |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| SEC("kprobe/sys_open") |
| void bpf_sys_open(struct pt_regs *ctx) |
| { |
|         char buf[PATHLEN]; // PATHLEN is defined to 256 |
|         int res = bpf_probe_read_str(buf, sizeof(buf), |
|                                      ctx\->di); |
| |
|         // Consume buf, for example push it to |
|         // userspace via bpf_perf_event_output(); we |
|         // can use res (the string length) as event |
|         // size, after checking its boundaries. |
| } |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| In comparison, using \fBbpf_probe_read()\fP helper here instead |
| to read the string would require to estimate the length at |
| compile time, and would often result in copying more memory |
| than necessary. |
| .sp |
| Another useful use case is when parsing individual process |
| arguments or individual environment variables navigating |
| \fIcurrent\fP\fB\->mm\->arg_start\fP and \fIcurrent\fP\fB\->mm\->env_start\fP: using this helper and the return value, |
| one can quickly iterate at the right offset of the memory area. |
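| .sp |
| The iteration over consecutive strings mentioned above can be sketched |
| in user space. This is a model under stated assumptions, not an eBPF |
| program: \fBread_str_model\fP() is a hypothetical function that copies |
| the documented copy and return\-value semantics, and |
| \fBcount_strings\fP() is a hypothetical walker that assumes every |
| string fits in its 16\-byte buffer. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <stddef.h>

/*
 * Hypothetical model of the documented semantics: copy at most
 * size - 1 bytes, always NUL-terminate, and return the length of the
 * copied string including the trailing NUL.
 */
static int read_str_model(char *dst, int size, const char *src)
{
    int i;

    if (size <= 0)
        return -1; /* choice made for this model only */

    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return i + 1;
}

/*
 * Count the NUL-terminated strings packed back to back in
 * [area, area + len), using the return value as the step.
 */
static int count_strings(const char *area, size_t len)
{
    char buf[16];
    size_t off = 0;
    int count = 0;

    while (off < len) {
        size_t room = len - off;
        int size = room < sizeof(buf) ? (int)room : (int)sizeof(buf);
        int res = read_str_model(buf, size, area + off);

        if (res <= 0)
            break;
        off += (size_t)res; /* jump to the start of the next string */
        count++;
    }
    return count;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |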
| .TP |
| .B Return |
| On success, the strictly positive length of the string, |
| including the trailing NUL character. On error, a negative |
| value. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_get_socket_cookie(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| If the \fBstruct sk_buff\fP pointed by \fIskb\fP has a known socket, |
| retrieve the cookie (generated by the kernel) of this socket. |
| If no cookie has been set yet, generate a new cookie. Once |
| generated, the socket cookie remains stable for the life of the |
| socket. This helper can be useful for monitoring per socket |
| networking traffic statistics as it provides a unique socket |
| identifier per namespace. |
| .TP |
| .B Return |
| An 8\-byte long non\-decreasing number on success, or 0 if the |
| socket field is missing inside \fIskb\fP\&. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_addr *\fP\fIctx\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts |
| \fIskb\fP, but gets the socket from the \fBstruct bpf_sock_addr\fP context. |
| .TP |
| .B Return |
| An 8\-byte long non\-decreasing number. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_ops *\fP\fIctx\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts |
| \fIskb\fP, but gets the socket from the \fBstruct bpf_sock_ops\fP context. |
| .TP |
| .B Return |
| An 8\-byte long non\-decreasing number. |
| .UNINDENT |
| .TP |
| .B \fBu32 bpf_get_socket_uid(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Return |
| The owner UID of the socket associated to \fIskb\fP\&. If the socket |
| is \fBNULL\fP, or if it is not a full socket (i.e. if it is a |
| time\-wait or a request socket instead), \fBoverflowuid\fP value |
| is returned (note that \fBoverflowuid\fP might also be the actual |
| UID value for the socket). |
| .UNINDENT |
| .TP |
| .B \fBu32 bpf_set_hash(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIhash\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Set the full hash for \fIskb\fP (set the field \fIskb\fP\fB\->hash\fP) |
| to value \fIhash\fP\&. |
| .TP |
| .B Return |
| 0 |
| .UNINDENT |
| .TP |
| .B \fBint bpf_setsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Emulate a call to \fBsetsockopt()\fP on the socket associated to |
| \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at |
| which the option resides and the name \fIoptname\fP of the option |
| must be specified, see \fBsetsockopt(2)\fP for more information. |
| The option value of length \fIoptlen\fP is pointed by \fIoptval\fP\&. |
| .sp |
| This helper actually implements a subset of \fBsetsockopt()\fP\&. |
| It supports the following \fIlevel\fPs: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| \fBSOL_SOCKET\fP, which supports the following \fIoptname\fPs: |
| \fBSO_RCVBUF\fP, \fBSO_SNDBUF\fP, \fBSO_MAX_PACING_RATE\fP, |
| \fBSO_PRIORITY\fP, \fBSO_RCVLOWAT\fP, \fBSO_MARK\fP\&. |
| .IP \(bu 2 |
| \fBIPPROTO_TCP\fP, which supports the following \fIoptname\fPs: |
| \fBTCP_CONGESTION\fP, \fBTCP_BPF_IW\fP, |
| \fBTCP_BPF_SNDCWND_CLAMP\fP\&. |
| .IP \(bu 2 |
| \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. |
| .IP \(bu 2 |
| \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. |
| .UNINDENT |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_adjust_room(struct sk_buff *\fP\fIskb\fP\fB, s32\fP \fIlen_diff\fP\fB, u32\fP \fImode\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Grow or shrink the room for data in the packet associated to |
| \fIskb\fP by \fIlen_diff\fP, and according to the selected \fImode\fP\&. |
| .sp |
| There is a single supported mode at this time: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| \fBBPF_ADJ_ROOM_NET\fP: Adjust room at the network layer |
| (room space is added or removed below the layer 3 header). |
| .UNINDENT |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Redirect the packet to the endpoint referenced by \fImap\fP at |
| index \fIkey\fP\&. Depending on its type, this \fImap\fP can contain |
| references to net devices (for forwarding packets through other |
| ports), or to CPUs (for redirecting XDP frames to another CPU; |
| but this is only implemented for native XDP (with driver |
| support) as of this writing). |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| When used to redirect packets to net devices, this helper |
| provides a high performance increase over \fBbpf_redirect\fP(). |
| This is due to various implementation details of the underlying |
| mechanisms, one of which is the fact that |
| \fBbpf_redirect_map\fP() tries to send packets in bulk to the device. |
| .TP |
| .B Return |
| \fBXDP_REDIRECT\fP on success, or \fBXDP_ABORTED\fP on error. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_sk_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Redirect the packet to the socket referenced by \fImap\fP (of type |
| \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and |
| egress interfaces can be used for redirection. The |
| \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the |
| distinction (ingress path is selected if the flag is present, |
| egress path otherwise). This is the only flag supported for now. |
| .TP |
| .B Return |
| \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_sock_map_update(struct bpf_sock_ops *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Add an entry to, or update a \fImap\fP referencing sockets. The |
| \fIskops\fP is used as a new value for the entry associated to |
| \fIkey\fP\&. \fIflags\fP is one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_NOEXIST\fP |
| The entry for \fIkey\fP must not exist in the map. |
| .TP |
| .B \fBBPF_EXIST\fP |
| The entry for \fIkey\fP must already exist in the map. |
| .TP |
| .B \fBBPF_ANY\fP |
| No condition on the existence of the entry for \fIkey\fP\&. |
| .UNINDENT |
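| .sp |
| The three conditions above can be summarized by the following |
| user\-space sketch. The flag values mirror |
| \fIinclude/uapi/linux/bpf.h\fP; the function name |
| \fBupdate_allowed\fP() is hypothetical, and the specific error codes |
| are an assumption modelled on usual map update behavior. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from include/uapi/linux/bpf.h. */
#define BPF_ANY     0
#define BPF_NOEXIST 1
#define BPF_EXIST   2

/*
 * Hypothetical model: decide whether an update with the given flags
 * may proceed, depending on whether the entry for key already exists.
 */
static int update_allowed(uint64_t flags, bool entry_exists)
{
    if (flags > BPF_EXIST)
        return -EINVAL;  /* assumed: unknown flag value */
    if (flags == BPF_NOEXIST && entry_exists)
        return -EEXIST;  /* assumed error code */
    if (flags == BPF_EXIST && !entry_exists)
        return -ENOENT;  /* assumed error code */
    return 0;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |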
| .sp |
| If the \fImap\fP has eBPF programs (parser and verdict), those will |
| be inherited by the socket being added. If the socket is |
| already attached to eBPF programs, this results in an error. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_xdp_adjust_meta(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Adjust the address pointed by \fIxdp_md\fP\fB\->data_meta\fP by |
| \fIdelta\fP (which can be positive or negative). Note that this |
| operation modifies the address stored in \fIxdp_md\fP\fB\->data\fP, |
| so the latter must be loaded only after the helper has been |
| called. |
| .sp |
| The use of \fIxdp_md\fP\fB\->data_meta\fP is optional and programs |
| are not required to use it. The rationale is that when the |
| packet is processed with XDP (e.g. as DoS filter), it is |
| possible to push further meta data along with it before passing |
| to the stack, and to give the guarantee that an ingress eBPF |
| program attached as a TC classifier on the same device can pick |
| this up for further post\-processing. Since TC works with socket |
| buffers, it remains possible to set from XDP the \fBmark\fP or |
| \fBpriority\fP pointers, or other pointers for the socket buffer. |
| Having this scratch space generic and programmable allows for |
| more flexibility as the user is free to store whatever meta |
| data they need. |
| .sp |
| A call to this helper may change the underlying |
| packet buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_perf_event_read_value(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Read the value of a perf event counter, and store it into \fIbuf\fP |
| of size \fIbuf_size\fP\&. This helper relies on a \fImap\fP of type |
| \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of the perf event |
| counter is selected when \fImap\fP is updated with perf event file |
| descriptors. The \fImap\fP is an array whose size is the number of |
| available CPUs, and each cell contains a value relative to one |
| CPU. The value to retrieve is indicated by \fIflags\fP, that |
| contains the index of the CPU to look up, masked with |
| \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to |
| \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the |
| current CPU should be retrieved. |
| .sp |
| This helper behaves in a way close to |
| \fBbpf_perf_event_read\fP() helper, save that instead of |
| just returning the value observed, it fills the \fIbuf\fP |
| structure. This allows for additional data to be retrieved: in |
| particular, the enabled and running times (in \fIbuf\fP\fB\->enabled\fP and \fIbuf\fP\fB\->running\fP, respectively) are |
| copied. In general, \fBbpf_perf_event_read_value\fP() is |
| recommended over \fBbpf_perf_event_read\fP(), which has some |
| ABI issues and provides fewer functionalities. |
| .sp |
| These values are interesting, because hardware PMU (Performance |
| Monitoring Unit) counters are limited resources. When there are |
| more PMU based perf events opened than available counters, |
| the kernel multiplexes these events so that each event gets a certain |
| percentage (but not all) of the PMU time. When multiplexing |
| happens, the number of samples or the counter value does not |
| match what would be observed without multiplexing. |
| This makes comparison between different runs difficult. |
| Typically, the counter value should be normalized before |
| comparing to other experiments. The usual normalization is done |
| as follows. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| normalized_counter = counter * t_enabled / t_running |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| Where t_enabled is the time enabled for event and t_running is |
| the time running for event since last normalization. The |
| enabled and running times are accumulated since the perf event |
| open. To achieve scaling factor between two invocations of an |
| eBPF program, users can use CPU id as the key (which is |
| typical for perf array usage model) to remember the previous |
| value and do the calculation inside the eBPF program. |
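| .sp |
| The normalization between two invocations can be sketched as follows. |
| This is a user\-space illustration, not an eBPF program: |
| \fBstruct perf_value_model\fP is a hypothetical stand\-in carrying the |
| counter, enabled and running values, and \fBscaled_delta\fP() is a |
| hypothetical name. The sketch ignores overflow of the 64\-bit |
| multiplication. |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <stdint.h>

/* Hypothetical stand-in for the values filled in by the helper. */
struct perf_value_model {
    uint64_t counter;
    uint64_t enabled;
    uint64_t running;
};

/*
 * Apply normalized = counter * t_enabled / t_running to the deltas
 * between a previously remembered reading and the current one.
 */
static uint64_t scaled_delta(const struct perf_value_model *prev,
                             const struct perf_value_model *cur)
{
    uint64_t d_counter = cur->counter - prev->counter;
    uint64_t d_enabled = cur->enabled - prev->enabled;
    uint64_t d_running = cur->running - prev->running;

    if (d_running == 0)
        return 0; /* event did not run in this interval */
    return d_counter * d_enabled / d_running;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |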
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_perf_prog_read_value(struct bpf_perf_event_data *\fP\fIctx\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| For an eBPF program attached to a perf event, retrieve the |
| value of the event counter associated to \fIctx\fP and store it in |
| the structure pointed by \fIbuf\fP and of size \fIbuf_size\fP\&. Enabled |
| and running times are also stored in the structure (see |
| description of helper \fBbpf_perf_event_read_value\fP() for |
| more details). |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_getsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Emulate a call to \fBgetsockopt()\fP on the socket associated to |
| \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at |
| which the option resides and the name \fIoptname\fP of the option |
| must be specified, see \fBgetsockopt(2)\fP for more information. |
| The retrieved value is stored in the structure pointed by |
| \fIoptval\fP and of length \fIoptlen\fP\&. |
| .sp |
| This helper actually implements a subset of \fBgetsockopt()\fP\&. |
| It supports the following \fIlevel\fPs: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| \fBIPPROTO_TCP\fP, which supports \fIoptname\fP |
| \fBTCP_CONGESTION\fP\&. |
| .IP \(bu 2 |
| \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. |
| .IP \(bu 2 |
| \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. |
| .UNINDENT |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_override_return(struct pt_regs *\fP\fIregs\fP\fB, u64\fP \fIrc\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Used for error injection, this helper uses kprobes to override |
| the return value of the probed function, and to set it to \fIrc\fP\&. |
| The first argument is the context \fIregs\fP on which the kprobe |
| works. |
| .sp |
| This helper works by setting the PC (program counter) |
| to an override function which is run in place of the original |
| probed function. This means the probed function is not run at |
| all. The replacement function just returns with the required |
| value. |
| .sp |
| This helper has security implications, and thus is subject to |
| restrictions. It is only available if the kernel was compiled |
| with the \fBCONFIG_BPF_KPROBE_OVERRIDE\fP configuration |
| option, and in this case it only works on functions tagged with |
| \fBALLOW_ERROR_INJECTION\fP in the kernel code. |
| .sp |
| Also, the helper is only available for the architectures having |
| the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing, |
| x86 architecture is the only one to support this feature. |
| .TP |
| .B Return |
| 0 |
| .UNINDENT |
| .TP |
| .B \fBint bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *\fP\fIbpf_sock\fP\fB, int\fP \fIargval\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Attempt to set the value of the \fBbpf_sock_ops_cb_flags\fP field |
| for the full TCP socket associated to \fIbpf_sock\fP to |
| \fIargval\fP\&. |
| .sp |
| The primary use of this field is to determine if there should |
| be calls to eBPF programs of type |
| \fBBPF_PROG_TYPE_SOCK_OPS\fP at various points in the TCP |
| code. A program of the same type can change its value, per |
| connection and as necessary, when the connection is |
| established. This field is directly accessible for reading, but |
| this helper must be used for updates in order to return an |
| error if an eBPF program tries to set a callback that is not |
| supported in the current kernel. |
| .sp |
| The supported callback values that \fIargval\fP can combine are: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| \fBBPF_SOCK_OPS_RTO_CB_FLAG\fP (retransmission time out) |
| .IP \(bu 2 |
| \fBBPF_SOCK_OPS_RETRANS_CB_FLAG\fP (retransmission) |
| .IP \(bu 2 |
| \fBBPF_SOCK_OPS_STATE_CB_FLAG\fP (TCP state change) |
| .UNINDENT |
| .sp |
| Here are some examples of where one could call such an eBPF |
| program: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| When RTO fires. |
| .IP \(bu 2 |
| When a packet is retransmitted. |
| .IP \(bu 2 |
| When the connection terminates. |
| .IP \(bu 2 |
| When a packet is sent. |
| .IP \(bu 2 |
| When a packet is received. |
| .UNINDENT |
| .TP |
| .B Return |
| Code \fB\-EINVAL\fP if the socket is not a full TCP socket; |
| otherwise, a positive number containing the bits that could not |
| be set is returned (which comes down to 0 if all bits were set |
| as required). |
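| .sp |
| The return value for a full TCP socket can be modelled with the |
| short sketch below. The flag values are assumed to mirror |
| \fBinclude/uapi/linux/bpf.h\fP, and \fBSUPPORTED_CB_FLAGS\fP and |
| \fBunsupported_cb_bits\fP() are hypothetical names used only for |
| illustration; this is not the kernel implementation. |
| .sp |
| .nf |
| .ft C |
```c
/* Flag values assumed to mirror include/uapi/linux/bpf.h. */
#define BPF_SOCK_OPS_RTO_CB_FLAG     (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG   (1 << 2)

/* Hypothetical mask of the callbacks this sketch treats as supported. */
#define SUPPORTED_CB_FLAGS (BPF_SOCK_OPS_RTO_CB_FLAG | \
                            BPF_SOCK_OPS_RETRANS_CB_FLAG | \
                            BPF_SOCK_OPS_STATE_CB_FLAG)

/* Model of the documented return value: the bits of argval that
 * could not be set, which is 0 when every requested bit is supported. */
static int unsupported_cb_bits(int argval)
{
    return argval & ~(SUPPORTED_CB_FLAGS);
}
```
| .ft P |
| .fi |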
| .UNINDENT |
| .TP |
| .B \fBint bpf_msg_redirect_map(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is used in programs implementing policies at the |
| socket level. If the message \fImsg\fP is allowed to pass (i.e. if |
| the verdict eBPF program returns \fBSK_PASS\fP), redirect it to |
| the socket referenced by \fImap\fP (of type |
| \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and |
| egress interfaces can be used for redirection. The |
| \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the |
| distinction (ingress path is selected if the flag is present, |
| egress path otherwise). This is the only flag supported for now. |
| .TP |
| .B Return |
| \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_msg_apply_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| For socket policies, apply the verdict of the eBPF program to |
| the next \fIbytes\fP (number of bytes) of message \fImsg\fP\&. |
| .sp |
| For example, this helper can be used in the following cases: |
| .INDENT 7.0 |
| .IP \(bu 2 |
| A single \fBsendmsg\fP() or \fBsendfile\fP() system call |
| contains multiple logical messages that the eBPF program is |
| supposed to read and for which it should apply a verdict. |
| .IP \(bu 2 |
| An eBPF program only cares to read the first \fIbytes\fP of a |
| \fImsg\fP\&. If the message has a large payload, then setting up |
| and calling the eBPF program repeatedly for all bytes, even |
| though the verdict is already known, would create unnecessary |
| overhead. |
| .UNINDENT |
| .sp |
| When called from within an eBPF program, the helper sets a |
| counter internal to the BPF infrastructure, that is used to |
| apply the last verdict to the next \fIbytes\fP\&. If \fIbytes\fP is |
| smaller than the current data being processed from a |
| \fBsendmsg\fP() or \fBsendfile\fP() system call, the first |
| \fIbytes\fP will be sent and the eBPF program will be re\-run with |
| the pointer for start of data pointing to byte number \fIbytes\fP |
| \fB+ 1\fP\&. If \fIbytes\fP is larger than the current data being |
| processed, then the eBPF verdict will be applied to multiple |
| \fBsendmsg\fP() or \fBsendfile\fP() calls until \fIbytes\fP are |
| consumed. |
| .sp |
| Note that if a socket closes with the internal counter holding |
| a non\-zero value, this is not a problem because data is not |
| being buffered for \fIbytes\fP and is sent as it is received. |
| .TP |
| .B Return |
| 0 |
| .UNINDENT |
| .TP |
| .B \fBint bpf_msg_cork_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| For socket policies, prevent the execution of the verdict eBPF |
| program for message \fImsg\fP until \fIbytes\fP (byte number) have been |
| accumulated. |
| .sp |
| This can be used when one needs a specific number of bytes |
| before a verdict can be assigned, even if the data spans |
| multiple \fBsendmsg\fP() or \fBsendfile\fP() calls. The extreme |
| case would be a user calling \fBsendmsg\fP() repeatedly with |
| 1\-byte long message segments. Obviously, this is bad for |
| performance, but it is still valid. If the eBPF program needs |
| \fIbytes\fP bytes to validate a header, this helper can be used to |
| prevent the eBPF program from being called again until \fIbytes\fP have |
| been accumulated. |
| .TP |
| .B Return |
| 0 |
| .UNINDENT |
| .TP |
| .B \fBint bpf_msg_pull_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIend\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| For socket policies, pull in non\-linear data from user space |
| for \fImsg\fP and set pointers \fImsg\fP\fB\->data\fP and \fImsg\fP\fB\->data_end\fP to \fIstart\fP and \fIend\fP bytes offsets into \fImsg\fP, |
| respectively. |
| .sp |
| If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a |
| \fImsg\fP it can only parse data that the (\fBdata\fP, \fBdata_end\fP) |
| pointers have already consumed. For \fBsendmsg\fP() hooks this |
| is likely the first scatterlist element. But for calls relying |
| on the \fBsendpage\fP handler (e.g. \fBsendfile\fP()) this will |
| be the range (\fB0\fP, \fB0\fP) because the data is shared with |
| user space and by default the objective is to avoid allowing |
| user space to modify data while (or after) eBPF verdict is |
| being decided. This helper can be used to pull in data and to |
| set the start and end pointer to given values. Data will be |
| copied if necessary (i.e. if data was not linear and if start |
| and end pointers do not point to the same chunk). |
| .sp |
| A call to this helper may change the underlying packet |
| buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_bind(struct bpf_sock_addr *\fP\fIctx\fP\fB, struct sockaddr *\fP\fIaddr\fP\fB, int\fP \fIaddr_len\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Bind the socket associated to \fIctx\fP to the address pointed by |
| \fIaddr\fP, of length \fIaddr_len\fP\&. This allows for making outgoing |
| connection from the desired IP address, which can be useful for |
| example when all processes inside a cgroup should use one |
| single IP address on a host that has multiple IP addresses configured. |
| .sp |
| This helper works for IPv4 and IPv6, TCP and UDP sockets. The |
| domain (\fIaddr\fP\fB\->sa_family\fP) must be \fBAF_INET\fP (or |
| \fBAF_INET6\fP). Looking for a free port to bind to can be |
| expensive, therefore binding to a port is not permitted by the |
| helper: \fIaddr\fP\fB\->sin_port\fP (or \fBsin6_port\fP, respectively) |
| must be set to zero. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_xdp_adjust_tail(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Adjust (move) \fIxdp_md\fP\fB\->data_end\fP by \fIdelta\fP bytes. It is |
| only possible to shrink the packet as of this writing, |
| therefore \fIdelta\fP must be a negative integer. |
| .sp |
| A call to this helper may change the underlying packet |
| buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_get_xfrm_state(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIindex\fP\fB, struct bpf_xfrm_state *\fP\fIxfrm_state\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Retrieve the XFRM state (IP transform framework, see also |
| \fBip\-xfrm(8)\fP) at \fIindex\fP in XFRM "security path" for \fIskb\fP\&. |
| .sp |
| The retrieved value is stored in the \fBstruct bpf_xfrm_state\fP |
| pointed by \fIxfrm_state\fP and of length \fIsize\fP\&. |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| This helper is available only if the kernel was compiled with |
| \fBCONFIG_XFRM\fP configuration option. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_get_stack(struct pt_regs *\fP\fIregs\fP\fB, void *\fP\fIbuf\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Return a user or a kernel stack in a buffer provided by the |
| bpf program. To achieve this, the helper needs \fIregs\fP, which |
| is a pointer to the context on which the tracing program is |
| executed. To store the stacktrace, the bpf program provides |
| \fIbuf\fP with a nonnegative \fIsize\fP\&. |
| .sp |
| The last argument, \fIflags\fP, holds the number of stack frames to |
| skip (from 0 to 255), masked with |
| \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set |
| the following flags: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_F_USER_STACK\fP |
| Collect a user space stack instead of a kernel stack. |
| .TP |
| .B \fBBPF_F_USER_BUILD_ID\fP |
| Collect buildid+offset instead of ips for user stack, |
| only valid if \fBBPF_F_USER_STACK\fP is also specified. |
| .UNINDENT |
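| .sp |
| The layout of \fIflags\fP described above can be sketched as |
| follows. The constant values are assumed to mirror |
| \fBinclude/uapi/linux/bpf.h\fP (real programs should include that |
| header instead), and \fBstack_flags\fP() is a hypothetical helper |
| written for this example. |
| .sp |
| .nf |
| .ft C |
```c
#include <stdint.h>

/* Values assumed to mirror include/uapi/linux/bpf.h. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Hypothetical helper: build the flags argument from the number of
 * frames to skip (low 8 bits) and the optional stack selection bits.
 * BPF_F_USER_BUILD_ID is only added together with BPF_F_USER_STACK,
 * since it is documented as valid only in that combination. */
static uint64_t stack_flags(uint64_t skip, int user, int build_id)
{
    uint64_t flags = skip & BPF_F_SKIP_FIELD_MASK;

    if (user) {
        flags |= BPF_F_USER_STACK;
        if (build_id)
            flags |= BPF_F_USER_BUILD_ID;
    }
    return flags;
}
```
| .ft P |
| .fi |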
| .sp |
| \fBbpf_get_stack\fP() can collect up to |
| \fBPERF_MAX_STACK_DEPTH\fP frames, both kernel and user, subject |
| to a sufficiently large buffer size. Note that |
| this limit can be controlled with the \fBsysctl\fP program, and |
| that it should be manually increased in order to profile long |
| user stacks (such as stacks for Java programs). To do so, use: |
| .INDENT 7.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| # sysctl kernel.perf_event_max_stack=<new value> |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .TP |
| .B Return |
| A non\-negative value equal to or less than \fIsize\fP on success, |
| or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_skb_load_bytes_relative(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB, u32\fP \fIstart_header\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is similar to \fBbpf_skb_load_bytes\fP() in that |
| it provides an easy way to load \fIlen\fP bytes from \fIoffset\fP |
| from the packet associated to \fIskb\fP, into the buffer pointed |
| by \fIto\fP\&. The difference to \fBbpf_skb_load_bytes\fP() is that |
| a fifth argument \fIstart_header\fP exists in order to select a |
| base offset to start from. \fIstart_header\fP can be one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_HDR_START_MAC\fP |
| Base offset to load data from is \fIskb\fP\(aqs mac header. |
| .TP |
| .B \fBBPF_HDR_START_NET\fP |
| Base offset to load data from is \fIskb\fP\(aqs network header. |
| .UNINDENT |
| .sp |
| In general, "direct packet access" is the preferred method to |
| access packet data, however, this helper is in particular useful |
| in socket filters where \fIskb\fP\fB\->data\fP does not always point |
| to the start of the mac header and where "direct packet access" |
| is not available. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_fib_lookup(void *\fP\fIctx\fP\fB, struct bpf_fib_lookup *\fP\fIparams\fP\fB, int\fP \fIplen\fP\fB, u32\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Do FIB lookup in kernel tables using parameters in \fIparams\fP\&. |
| If lookup is successful and result shows packet is to be |
| forwarded, the neighbor tables are searched for the nexthop. |
| If successful (i.e., FIB lookup shows forwarding and nexthop |
| is resolved), the nexthop address is returned in ipv4_dst |
| or ipv6_dst based on family, smac is set to mac address of |
| egress device, dmac is set to nexthop mac address, rt_metric |
| is set to metric from route (IPv4/IPv6 only), and ifindex |
| is set to the device index of the nexthop from the FIB lookup. |
| .sp |
| \fIplen\fP argument is the size of the passed in struct. |
| \fIflags\fP argument can be a combination of one or more of the |
| following values: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_FIB_LOOKUP_DIRECT\fP |
| Do a direct table lookup vs full lookup using FIB |
| rules. |
| .TP |
| .B \fBBPF_FIB_LOOKUP_OUTPUT\fP |
| Perform lookup from an egress perspective (default is |
| ingress). |
| .UNINDENT |
| .sp |
| \fIctx\fP is either \fBstruct xdp_md\fP for XDP programs or |
| \fBstruct sk_buff\fP for tc cls_act programs. |
| .TP |
| .B Return |
| .INDENT 7.0 |
| .IP \(bu 2 |
| < 0 if any input argument is invalid |
| .IP \(bu 2 |
| 0 on success (packet is forwarded, nexthop neighbor exists) |
| .IP \(bu 2 |
| > 0 one of \fBBPF_FIB_LKUP_RET_\fP codes explaining why the |
| packet is not forwarded or needs assist from full stack |
| .UNINDENT |
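| .sp |
| A caller usually branches on the three ranges listed above. The |
| sketch below shows one possible way to do so; the \fBfib_verdict\fP |
| enumeration and \fBclassify_fib_ret\fP() are hypothetical names, and |
| what a program does in each case remains a policy decision. |
| .sp |
| .nf |
| .ft C |
```c
/* Hypothetical classification of the helper's return value. */
enum fib_verdict {
    FIB_ERROR,     /* rc < 0: an input argument was invalid */
    FIB_FORWARD,   /* rc == 0: forward, nexthop neighbor exists */
    FIB_TO_STACK   /* rc > 0: not forwarded or needs the full stack */
};

static enum fib_verdict classify_fib_ret(int rc)
{
    if (rc < 0)
        return FIB_ERROR;
    if (rc == 0)
        return FIB_FORWARD;
    return FIB_TO_STACK;
}
```
| .ft P |
| .fi |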
| .UNINDENT |
| .TP |
| .B \fBint bpf_sock_hash_update(struct bpf_sock_ops_kern *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Add an entry to, or update a sockhash \fImap\fP referencing sockets. |
| The \fIskops\fP is used as a new value for the entry associated to |
| \fIkey\fP\&. \fIflags\fP is one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_NOEXIST\fP |
| The entry for \fIkey\fP must not exist in the map. |
| .TP |
| .B \fBBPF_EXIST\fP |
| The entry for \fIkey\fP must already exist in the map. |
| .TP |
| .B \fBBPF_ANY\fP |
| No condition on the existence of the entry for \fIkey\fP\&. |
| .UNINDENT |
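| .sp |
| The three conditions above can be modelled in user space as shown |
| below. The flag values are assumed to mirror |
| \fBinclude/uapi/linux/bpf.h\fP; \fBcheck_update_flags\fP() is a |
| hypothetical function, and the specific error codes it returns are |
| an assumption based on the usual eBPF map conventions rather than |
| something stated in this page. |
| .sp |
| .nf |
| .ft C |
```c
#include <errno.h>
#include <stdint.h>

/* Values assumed to mirror include/uapi/linux/bpf.h. */
#define BPF_ANY     0
#define BPF_NOEXIST 1
#define BPF_EXIST   2

/* Hypothetical model: decide whether an update may proceed, given
 * whether an entry for the key already exists. Returns 0 if the
 * update is allowed, or a negative error code otherwise (the exact
 * codes are an assumption). */
static int check_update_flags(int key_exists, uint64_t flags)
{
    switch (flags) {
    case BPF_ANY:
        return 0;
    case BPF_NOEXIST:
        return key_exists ? -EEXIST : 0;
    case BPF_EXIST:
        return key_exists ? 0 : -ENOENT;
    default:
        return -EINVAL;
    }
}
```
| .ft P |
| .fi |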
| .sp |
| If the \fImap\fP has eBPF programs (parser and verdict), those will |
| be inherited by the socket being added. If the socket is |
| already attached to eBPF programs, this results in an error. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_msg_redirect_hash(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is used in programs implementing policies at the |
| socket level. If the message \fImsg\fP is allowed to pass (i.e. if |
| the verdict eBPF program returns \fBSK_PASS\fP), redirect it to |
| the socket referenced by \fImap\fP (of type |
| \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and |
| egress interfaces can be used for redirection. The |
| \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the |
| distinction (ingress path is selected if the flag is present, |
| egress path otherwise). This is the only flag supported for now. |
| .TP |
| .B Return |
| \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_sk_redirect_hash(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is used in programs implementing policies at the |
| skb socket level. If the sk_buff \fIskb\fP is allowed to pass (i.e. |
| if the verdict eBPF program returns \fBSK_PASS\fP), redirect it |
| to the socket referenced by \fImap\fP (of type |
| \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and |
| egress interfaces can be used for redirection. The |
| \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the |
| distinction (ingress path is selected if the flag is present, |
| egress otherwise). This is the only flag supported for now. |
| .TP |
| .B Return |
| \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_lwt_push_encap(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB, void *\fP\fIhdr\fP\fB, u32\fP \fIlen\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Encapsulate the packet associated to \fIskb\fP within a Layer 3 |
| protocol header. This header is provided in the buffer at |
| address \fIhdr\fP, with \fIlen\fP its size in bytes. \fItype\fP indicates |
| the protocol of the header and can be one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBBPF_LWT_ENCAP_SEG6\fP |
| IPv6 encapsulation with Segment Routing Header |
| (\fBstruct ipv6_sr_hdr\fP). \fIhdr\fP only contains the SRH, |
| the IPv6 header is computed by the kernel. |
| .TP |
| .B \fBBPF_LWT_ENCAP_SEG6_INLINE\fP |
| Only works if \fIskb\fP contains an IPv6 packet. Insert a |
| Segment Routing Header (\fBstruct ipv6_sr_hdr\fP) inside |
| the IPv6 header. |
| .UNINDENT |
| .sp |
| A call to this helper may change the underlying packet |
| buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_lwt_seg6_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Store \fIlen\fP bytes from address \fIfrom\fP into the packet |
| associated to \fIskb\fP, at \fIoffset\fP\&. Only the flags, tag and TLVs |
| inside the outermost IPv6 Segment Routing Header can be |
| modified through this helper. |
| .sp |
| A call to this helper may change the underlying packet |
| buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_lwt_seg6_adjust_srh(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, s32\fP \fIdelta\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Adjust the size allocated to TLVs in the outermost IPv6 |
| Segment Routing Header contained in the packet associated to |
| \fIskb\fP, at position \fIoffset\fP by \fIdelta\fP bytes. Only offsets |
| after the segments are accepted. \fIdelta\fP can be either |
| positive (growing) or negative (shrinking). |
| .sp |
| A call to this helper may change the underlying packet |
| buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_lwt_seg6_action(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIaction\fP\fB, void *\fP\fIparam\fP\fB, u32\fP \fIparam_len\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Apply an IPv6 Segment Routing action of type \fIaction\fP to the |
| packet associated to \fIskb\fP\&. Each action takes a parameter |
| contained at address \fIparam\fP, and of length \fIparam_len\fP bytes. |
| \fIaction\fP can be one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBSEG6_LOCAL_ACTION_END_X\fP |
| End.X action: Endpoint with Layer\-3 cross\-connect. |
| Type of \fIparam\fP: \fBstruct in6_addr\fP\&. |
| .TP |
| .B \fBSEG6_LOCAL_ACTION_END_T\fP |
| End.T action: Endpoint with specific IPv6 table lookup. |
| Type of \fIparam\fP: \fBint\fP\&. |
| .TP |
| .B \fBSEG6_LOCAL_ACTION_END_B6\fP |
| End.B6 action: Endpoint bound to an SRv6 policy. |
| Type of param: \fBstruct ipv6_sr_hdr\fP\&. |
| .TP |
| .B \fBSEG6_LOCAL_ACTION_END_B6_ENCAP\fP |
| End.B6.Encap action: Endpoint bound to an SRv6 |
| encapsulation policy. |
| Type of param: \fBstruct ipv6_sr_hdr\fP\&. |
| .UNINDENT |
| .sp |
| A call to this helper may change the underlying packet |
| buffer. Therefore, at load time, all checks on pointers |
| previously done by the verifier are invalidated and must be |
| performed again, if the helper is used in combination with |
| direct packet access. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_rc_keydown(void *\fP\fIctx\fP\fB, u32\fP \fIprotocol\fP\fB, u64\fP \fIscancode\fP\fB, u32\fP \fItoggle\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is used in programs implementing IR decoding, to |
| report a successfully decoded key press with \fIscancode\fP, |
| \fItoggle\fP value in the given \fIprotocol\fP\&. The scancode will be |
| translated to a keycode using the rc keymap, and reported as |
| an input key down event. After a period a key up event is |
| generated. This period can be extended by calling either |
| \fBbpf_rc_keydown\fP() again with the same values, or calling |
| \fBbpf_rc_repeat\fP(). |
| .sp |
| Some protocols include a toggle bit, in case the button was |
| released and pressed again between consecutive scancodes. |
| .sp |
| The \fIctx\fP should point to the lirc sample as passed into |
| the program. |
| .sp |
| The \fIprotocol\fP is the decoded protocol number (see |
| \fBenum rc_proto\fP for some predefined values). |
| .sp |
| This helper is only available if the kernel was compiled with |
| the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to |
| "\fBy\fP". |
| .TP |
| .B Return |
| 0 |
| .UNINDENT |
| .TP |
| .B \fBint bpf_rc_repeat(void *\fP\fIctx\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is used in programs implementing IR decoding, to |
| report a successfully decoded repeat key message. This delays |
| the generation of a key up event for previously generated |
| key down event. |
| .sp |
| Some IR protocols like NEC have a special IR message for |
| repeating last button, for when a button is held down. |
| .sp |
| The \fIctx\fP should point to the lirc sample as passed into |
| the program. |
| .sp |
| This helper is only available if the kernel was compiled with |
| the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to |
| "\fBy\fP". |
| .TP |
| .B Return |
| 0 |
| .UNINDENT |
| .TP |
| .B \fBuint64_t bpf_skb_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Return the cgroup v2 id of the socket associated with the \fIskb\fP\&. |
| This is roughly similar to the \fBbpf_get_cgroup_classid\fP() |
| helper for cgroup v1 by providing a tag or identifier that |
| can be matched on or used for map lookups e.g. to implement |
| policy. The cgroup v2 id of a given path in the hierarchy is |
| exposed in user space through the f_handle API in order to get |
| to the same 64\-bit id. |
| .sp |
| This helper can be used on TC egress path, but not on ingress, |
| and is available only if the kernel was compiled with the |
| \fBCONFIG_SOCK_CGROUP_DATA\fP configuration option. |
| .TP |
| .B Return |
| The id is returned or 0 in case the id could not be retrieved. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_skb_ancestor_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB, int\fP \fIancestor_level\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Return id of cgroup v2 that is ancestor of cgroup associated |
| with the \fIskb\fP at the \fIancestor_level\fP\&. The root cgroup is at |
| \fIancestor_level\fP zero and each step down the hierarchy |
| increments the level. If \fIancestor_level\fP == level of cgroup |
| associated with \fIskb\fP, then return value will be same as that |
| of \fBbpf_skb_cgroup_id\fP(). |
| .sp |
| The helper is useful to implement policies based on cgroups |
| that are upper in hierarchy than immediate cgroup associated |
| with \fIskb\fP\&. |
| .sp |
| The format of returned id and helper limitations are same as in |
| \fBbpf_skb_cgroup_id\fP(). |
| .TP |
| .B Return |
| The id is returned or 0 in case the id could not be retrieved. |
| .UNINDENT |
| .TP |
| .B \fBu64 bpf_get_current_cgroup_id(void)\fP |
| .INDENT 7.0 |
| .TP |
| .B Return |
| A 64\-bit integer containing the current cgroup id based |
| on the cgroup within which the current task is running. |
| .UNINDENT |
| .TP |
| .B \fBvoid *bpf_get_local_storage(void *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Get the pointer to the local storage area. |
| The type and the size of the local storage is defined |
| by the \fImap\fP argument. |
| The \fIflags\fP meaning is specific for each map type, |
| and has to be 0 for cgroup local storage. |
| .sp |
| Depending on the BPF program type, a local storage area |
| can be shared between multiple instances of the BPF program, |
| running simultaneously. |
| .sp |
| Users must take care of synchronization themselves, |
| for example by using the \fBBPF_STX_XADD\fP instruction to alter |
| the shared data. |
| .TP |
| .B Return |
| A pointer to the local storage area. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_sk_select_reuseport(struct sk_reuseport_md *\fP\fIreuse\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Select a \fBSO_REUSEPORT\fP socket from a |
| \fBBPF_MAP_TYPE_REUSEPORT_ARRAY\fP \fImap\fP\&. |
| It checks that the selected socket matches the incoming |
| request in the socket buffer. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBstruct bpf_sock *bpf_sk_lookup_tcp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Look for TCP socket matching \fItuple\fP, optionally in a child |
| network namespace \fInetns\fP\&. The return value must be checked, |
| and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP(). |
| .sp |
| The \fIctx\fP should point to the context of the program, such as |
| the skb or socket (depending on the hook in use). This is used |
| to determine the base network namespace for the lookup. |
| .sp |
| \fItuple_size\fP must be one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP) |
| Look for an IPv4 socket. |
| .TP |
| .B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP) |
| Look for an IPv6 socket. |
| .UNINDENT |
| .sp |
| If the \fInetns\fP is a negative signed 32\-bit integer, then the |
| socket lookup table in the netns associated with the \fIctx\fP |
| will be used. For the TC hooks, this is the netns of the device |
| in the skb. For socket hooks, this is the netns of the socket. |
| If \fInetns\fP is any other signed 32\-bit value greater than or |
| equal to zero then it specifies the ID of the netns relative to |
| the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the |
| range of 32\-bit integers are reserved for future use. |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| This helper is available only if the kernel was compiled with |
| \fBCONFIG_NET\fP configuration option. |
| .TP |
| .B Return |
| Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure. |
| For sockets with reuseport option, the \fBstruct bpf_sock\fP |
| result is from \fBreuse\->socks\fP[] using the hash of the tuple. |
| .UNINDENT |
| .TP |
| .B \fBstruct bpf_sock *bpf_sk_lookup_udp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Look for UDP socket matching \fItuple\fP, optionally in a child |
| network namespace \fInetns\fP\&. The return value must be checked, |
| and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP(). |
| .sp |
| The \fIctx\fP should point to the context of the program, such as |
| the skb or socket (depending on the hook in use). This is used |
| to determine the base network namespace for the lookup. |
| .sp |
| \fItuple_size\fP must be one of: |
| .INDENT 7.0 |
| .TP |
| .B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP) |
| Look for an IPv4 socket. |
| .TP |
| .B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP) |
| Look for an IPv6 socket. |
| .UNINDENT |
| .sp |
| If the \fInetns\fP is a negative signed 32\-bit integer, then the |
| socket lookup table in the netns associated with the \fIctx\fP will |
| be used. For the TC hooks, this is the netns of the device |
| in the skb. For socket hooks, this is the netns of the socket. |
| If \fInetns\fP is any other signed 32\-bit value greater than or |
| equal to zero then it specifies the ID of the netns relative to |
| the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the |
| range of 32\-bit integers are reserved for future use. |
| .sp |
| All values for \fIflags\fP are reserved for future usage, and must |
| be left at zero. |
| .sp |
| This helper is available only if the kernel was compiled with the |
| \fBCONFIG_NET\fP configuration option. |
| .TP |
| .B Return |
| Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure. |
| For sockets with reuseport option, the \fBstruct bpf_sock\fP |
| result is from \fBreuse\->socks\fP[] using the hash of the tuple. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_sk_release(struct bpf_sock *\fP\fIsock\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Release the reference held by \fIsock\fP\&. \fIsock\fP must be a |
| non\-\fBNULL\fP pointer that was returned from |
| \fBbpf_sk_lookup_xxx\fP(). |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_map_pop_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Pop an element from \fImap\fP\&. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_map_peek_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Get an element from \fImap\fP without removing it. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_msg_push_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| For socket policies, insert \fIlen\fP bytes into \fImsg\fP at offset |
| \fIstart\fP\&. |
| .sp |
| If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a |
| \fImsg\fP it may want to insert metadata or options into the \fImsg\fP\&. |
| This can later be read and used by any of the lower layer BPF |
| hooks. |
| .sp |
| This helper may fail under memory pressure (if an allocation |
| fails). In such cases the BPF program gets an appropriate |
| error, which it needs to handle. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_msg_pop_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIpop\fP\fB, u64\fP \fIflags\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| Remove \fIpop\fP bytes from \fImsg\fP, starting at byte \fIstart\fP\&. |
| This may result in \fBENOMEM\fP errors under certain situations if |
| an allocation and copy are required due to a full ring buffer. |
| However, the helper will try to avoid doing the allocation |
| if possible. Other errors can occur if the input parameters are |
| invalid, either because the \fIstart\fP byte is not a valid part of the |
| \fImsg\fP payload or because the \fIpop\fP value is too large. |
| .TP |
| .B Return |
| 0 on success, or a negative error in case of failure. |
| .UNINDENT |
| .TP |
| .B \fBint bpf_rc_pointer_rel(void *\fP\fIctx\fP\fB, s32\fP \fIrel_x\fP\fB, s32\fP \fIrel_y\fP\fB)\fP |
| .INDENT 7.0 |
| .TP |
| .B Description |
| This helper is used in programs implementing IR decoding, to |
| report a successfully decoded pointer movement. |
| .sp |
| The \fIctx\fP should point to the lirc sample as passed into |
| the program. |
| .sp |
| This helper is only available if the kernel was compiled with |
| the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to |
| "\fBy\fP". |
| .TP |
| .B Return |
| 0 |
| .UNINDENT |
| .UNINDENT |
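| .sp |
| The rule given above for the \fInetns\fP argument of \fBbpf_sk_lookup_udp\fP() |
| can be illustrated in plain user\-space C. The following is a minimal sketch |
| of one reading of that text, not the kernel implementation: the function |
| \fBclassify_netns\fP() and the \fBenum netns_kind\fP values are hypothetical |
| names, and the kernel's own validation may differ in detail. |
| .INDENT 0.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, used only for this illustration. */
enum netns_kind {
    NETNS_FROM_CTX, /* negative s32: use the netns associated with ctx */
    NETNS_BY_ID,    /* 0..INT32_MAX: netns ID relative to the ctx netns */
    NETNS_RESERVED  /* outside the 32-bit range: reserved for future use */
};

/* Classify a u64 netns argument as described in the text above. */
static enum netns_kind classify_netns(uint64_t netns)
{
    /* Lowest u64 value that is a sign-extended negative s32. */
    const uint64_t min_negative = (uint64_t)(int64_t)INT32_MIN;

    if (netns <= (uint64_t)INT32_MAX)
        return NETNS_BY_ID;
    if (netns >= min_negative)
        return NETNS_FROM_CTX;
    return NETNS_RESERVED;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| Under this reading, passing \-1 selects the netns associated with \fIctx\fP, |
| while 0 is treated as a netns ID. |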
| .SH EXAMPLES |
| .sp |
| Examples of usage for most of the eBPF helpers listed in this manual page are |
| available within the Linux kernel sources, at the following locations: |
| .INDENT 0.0 |
| .IP \(bu 2 |
| \fIsamples/bpf/\fP |
| .IP \(bu 2 |
| \fItools/testing/selftests/bpf/\fP |
| .UNINDENT |
| .SH LICENSE |
| .sp |
| eBPF programs can have an associated license, passed along with the bytecode |
| instructions to the kernel when the programs are loaded. The format for that |
| string is identical to the one in use for kernel modules (dual licenses, such |
| as "Dual BSD/GPL", may be used). Some helper functions are only accessible to |
| programs that are compatible with the GNU General Public License (GPL). |
| .sp |
| In order to use such helpers, the eBPF program must be loaded with the correct |
| license string passed (via \fBattr\fP) to the \fBbpf\fP() system call, and this |
| generally translates into the C source code of the program containing a line |
| similar to the following: |
| .INDENT 0.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
| char ____license[] __attribute__((section("license"), used)) = "GPL"; |
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
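| .sp |
| Whether a license string counts as GPL\-compatible can be pictured as an exact |
| string comparison. The following user\-space sketch is modeled on the check |
| applied to kernel module license strings. The function name |
| \fBis_gpl_compatible\fP() is hypothetical, and the list of accepted strings is |
| an assumption that should be verified against the kernel sources in use. |
| .INDENT 0.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed list of GPL-compatible license strings; check your kernel tree. */
static const char *const gpl_compatible[] = {
    "GPL",
    "GPL v2",
    "GPL and additional rights",
    "Dual BSD/GPL",
    "Dual MIT/GPL",
    "Dual MPL/GPL",
};

/* Hypothetical helper: true only on an exact match with one entry. */
static bool is_gpl_compatible(const char *license)
{
    size_t i;

    for (i = 0; i < sizeof(gpl_compatible) / sizeof(gpl_compatible[0]); i++) {
        if (strcmp(license, gpl_compatible[i]) == 0)
            return true;
    }
    return false;
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| The comparison is case\-sensitive in this sketch, so "GPL" matches but "gpl" |
| does not. |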
| .SH IMPLEMENTATION |
| .sp |
| This manual page is an effort to document the existing eBPF helper functions. |
| But as of this writing, the BPF sub\-system is under heavy development. New eBPF |
| program or map types are added, along with new helper functions. Some helpers |
| are occasionally made available for additional program types. So in spite of |
| the efforts of the community, this page might not be up\-to\-date. If you want to |
| check for yourself what helper functions exist in your kernel, or what types of |
| programs they can support, here are some files in the kernel tree that you |
| may be interested in: |
| .INDENT 0.0 |
| .IP \(bu 2 |
| \fIinclude/uapi/linux/bpf.h\fP is the main BPF header. It contains the full list |
| of all helper functions, as well as many other BPF definitions including most |
| of the flags, structs or constants used by the helpers. |
| .IP \(bu 2 |
| \fInet/core/filter.c\fP contains the definition of most network\-related helper |
| functions, and the list of program types from which they can be used. |
| .IP \(bu 2 |
| \fIkernel/trace/bpf_trace.c\fP is the equivalent for most tracing program\-related |
| helpers. |
| .IP \(bu 2 |
| \fIkernel/bpf/verifier.c\fP contains the functions used to check that valid types |
| of eBPF maps are used with a given helper function. |
| .IP \(bu 2 |
| The \fIkernel/bpf/\fP directory contains other files in which additional helpers are |
| defined (for cgroups, sockmaps, etc.). |
| .UNINDENT |
| .sp |
| Compatibility between helper functions and program types can generally be found |
| in the files where helper functions are defined. Look for the \fBstruct |
| bpf_func_proto\fP objects and for functions returning them: these functions |
| contain a list of helpers that a given program type can call. Note that the |
| \fBdefault:\fP label of the \fBswitch ... case\fP used to filter helpers can call |
| other functions, themselves allowing access to additional helpers. The |
| requirement for a GPL license is also stated in those \fBstruct bpf_func_proto\fP\&. |
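| .sp |
| The filtering pattern described above can be modeled in user\-space C. This is |
| a simplified sketch, not kernel code: \fBstruct func_proto\fP, the \fBFUNC_*\fP |
| identifiers, and the functions \fBbase_func_proto\fP(), \fBtc_func_proto\fP() |
| and \fBhelper_allowed\fP() are hypothetical stand\-ins, and the \fBgpl_only\fP |
| values are illustrative. |
| .INDENT 0.0 |
| .INDENT 3.5 |
| .sp |
| .nf |
| .ft C |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct bpf_func_proto. */
struct func_proto {
    const char *name;
    bool gpl_only; /* helper reserved for GPL-compatible programs */
};

/* Hypothetical helper IDs for this sketch. */
enum func_id {
    FUNC_MAP_LOOKUP,
    FUNC_TRACE_PRINTK,
    FUNC_SK_LOOKUP_UDP,
    FUNC_OTHER
};

static const struct func_proto map_lookup_proto = { "map_lookup", false };
static const struct func_proto trace_printk_proto = { "trace_printk", true };
static const struct func_proto sk_lookup_udp_proto = { "sk_lookup_udp", false };

/* Base set shared by several program types. */
static const struct func_proto *base_func_proto(enum func_id id)
{
    switch (id) {
    case FUNC_MAP_LOOKUP:
        return &map_lookup_proto;
    case FUNC_TRACE_PRINTK:
        return &trace_printk_proto;
    default:
        return NULL; /* helper not available */
    }
}

/* Filter for one program type; the default label chains to the base set. */
static const struct func_proto *tc_func_proto(enum func_id id)
{
    switch (id) {
    case FUNC_SK_LOOKUP_UDP:
        return &sk_lookup_udp_proto;
    default:
        return base_func_proto(id);
    }
}

/* A call is accepted if a proto exists and the program license allows it. */
static bool helper_allowed(enum func_id id, bool prog_is_gpl)
{
    const struct func_proto *proto = tc_func_proto(id);

    return proto != NULL && (prog_is_gpl || !proto->gpl_only);
}
```
| .ft P |
| .fi |
| .UNINDENT |
| .UNINDENT |
| .sp |
| In this model a helper missing from the program type's own \fBswitch\fP can |
| still be reached through the \fBdefault:\fP label, and a helper marked |
| \fBgpl_only\fP is refused for a program without a GPL\-compatible license. |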
| .sp |
| Compatibility between helper functions and map types can be found in the |
| \fBcheck_map_func_compatibility\fP() function in file \fIkernel/bpf/verifier.c\fP\&. |
| .sp |
| Helper functions that invalidate the checks on \fBdata\fP and \fBdata_end\fP |
| pointers for network processing are listed in function |
| \fBbpf_helper_changes_pkt_data\fP() in file \fInet/core/filter.c\fP\&. |
| .SH SEE ALSO |
| .sp |
| \fBbpf\fP(2), |
| \fBcgroups\fP(7), |
| \fBip\fP(8), |
| \fBperf_event_open\fP(2), |
| \fBsendmsg\fP(2), |
| \fBsocket\fP(7), |
| \fBtc\-bpf\fP(8) |
| .\" Generated by docutils manpage writer. |