|  | .. SPDX-License-Identifier: GPL-2.0-only | 
|  | .. Copyright (C) 2022 Red Hat, Inc. | 
|  |  | 
|  | ================================================= | 
|  | BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH | 
|  | ================================================= | 
|  |  | 
|  | .. note:: | 
|  | - ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14 | 
|  | - ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4 | 
|  |  | 
|  | ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily | 
|  | used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``. | 
|  | ``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as | 
|  | the index to lookup a reference to a net device. While ``BPF_MAP_TYPE_DEVMAP_HASH`` | 
|  | is backed by a hash table that uses a key to lookup a reference to a net device. | 
|  | The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``> | 
|  | pairs to update the maps with new net devices. | 
|  |  | 
|  | .. note:: | 
|  | - The key to a hash map doesn't have to be an ``ifindex``. | 
|  | - While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices | 
|  | it comes at the cost of a hash of the key when performing a look up. | 
|  |  | 
|  | The setup and packet enqueue/send code is shared between the two types of | 
|  | devmap; only the lookup and insertion is different. | 
|  |  | 
|  | Usage | 
|  | ===== | 
|  | Kernel BPF | 
|  | ---------- | 
|  | bpf_redirect_map() | 
|  | ^^^^^^^^^^^^^^^^^^ | 
|  | .. code-block:: c | 
|  |  | 
|  | long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags) | 
|  |  | 
|  | Redirect the packet to the endpoint referenced by ``map`` at index ``key``. | 
|  | For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains | 
|  | references to net devices (for forwarding packets through other ports). | 
|  |  | 
|  | The lower two bits of *flags* are used as the return code if the map lookup | 
|  | fails. This is so that the return value can be one of the XDP program return | 
|  | codes up to ``XDP_TX``, as chosen by the caller. The higher bits of ``flags`` | 
|  | can be set to ``BPF_F_BROADCAST`` or ``BPF_F_EXCLUDE_INGRESS`` as defined | 
|  | below. | 
|  |  | 
|  | With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces | 
|  | in the map, with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded | 
|  | from the broadcast. | 
|  |  | 
|  | .. note:: | 
|  | - The key is ignored if BPF_F_BROADCAST is set. | 
|  | - The broadcast feature can also be used to implement multicast forwarding: | 
|  | simply create multiple DEVMAPs, each one corresponding to a single multicast group. | 
|  |  | 
|  | This helper will return ``XDP_REDIRECT`` on success, or the value of the two | 
|  | lower bits of the ``flags`` argument if the map lookup fails. | 
|  |  | 
|  | More information about redirection can be found :doc:`redirect` | 
|  |  | 
|  | bpf_map_lookup_elem() | 
|  | ^^^^^^^^^^^^^^^^^^^^^ | 
|  | .. code-block:: c | 
|  |  | 
|  | void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) | 
|  |  | 
|  | Net device entries can be retrieved using the ``bpf_map_lookup_elem()`` | 
|  | helper. | 
|  |  | 
|  | User space | 
|  | ---------- | 
|  | .. note:: | 
|  | DEVMAP entries can only be updated/deleted from user space and not | 
|  | from an eBPF program. Trying to call these functions from a kernel eBPF | 
|  | program will result in the program failing to load and a verifier warning. | 
|  |  | 
|  | bpf_map_update_elem() | 
|  | ^^^^^^^^^^^^^^^^^^^^^ | 
|  | .. code-block:: c | 
|  |  | 
|  | int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags); | 
|  |  | 
|  | Net device entries can be added or updated using the ``bpf_map_update_elem()`` | 
|  | helper. This helper replaces existing elements atomically. The ``value`` parameter | 
|  | can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards | 
|  | compatibility. | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | struct bpf_devmap_val { | 
|  | __u32 ifindex;   /* device index */ | 
|  | union { | 
|  | int   fd;  /* prog fd on map write */ | 
|  | __u32 id;  /* prog id on map read */ | 
|  | } bpf_prog; | 
|  | }; | 
|  |  | 
|  | The ``flags`` argument can be one of the following: | 
|  | - ``BPF_ANY``: Create a new element or update an existing element. | 
|  | - ``BPF_NOEXIST``: Create a new element only if it did not exist. | 
|  | - ``BPF_EXIST``: Update an existing element. | 
|  |  | 
|  | DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd`` | 
|  | to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have | 
|  | access to both Rx device and Tx device. The  program associated with the ``fd`` | 
|  | must have type XDP with expected attach type ``xdp_devmap``. | 
|  | When a program is associated with a device index, the program is run on an | 
|  | ``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples | 
|  | of how to attach/use xdp_devmap progs can be found in the kernel selftests: | 
|  |  | 
|  | - ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c`` | 
|  | - ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c`` | 
|  |  | 
|  | bpf_map_lookup_elem() | 
|  | ^^^^^^^^^^^^^^^^^^^^^ | 
|  | .. code-block:: c | 
|  |  | 
|  | .. c:function:: | 
|  | int bpf_map_lookup_elem(int fd, const void *key, void *value); | 
|  |  | 
|  | Net device entries can be retrieved using the ``bpf_map_lookup_elem()`` | 
|  | helper. | 
|  |  | 
|  | bpf_map_delete_elem() | 
|  | ^^^^^^^^^^^^^^^^^^^^^ | 
|  | .. code-block:: c | 
|  |  | 
|  | .. c:function:: | 
|  | int bpf_map_delete_elem(int fd, const void *key); | 
|  |  | 
|  | Net device entries can be deleted using the ``bpf_map_delete_elem()`` | 
|  | helper. This helper will return 0 on success, or negative error in case of | 
|  | failure. | 
|  |  | 
|  | Examples | 
|  | ======== | 
|  |  | 
|  | Kernel BPF | 
|  | ---------- | 
|  |  | 
|  | The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP`` | 
|  | called tx_port. | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | struct { | 
|  | __uint(type, BPF_MAP_TYPE_DEVMAP); | 
|  | __type(key, __u32); | 
|  | __type(value, __u32); | 
|  | __uint(max_entries, 256); | 
|  | } tx_port SEC(".maps"); | 
|  |  | 
|  | The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH`` | 
|  | called forward_map. | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | struct { | 
|  | __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); | 
|  | __type(key, __u32); | 
|  | __type(value, struct bpf_devmap_val); | 
|  | __uint(max_entries, 32); | 
|  | } forward_map SEC(".maps"); | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  | The value type in the DEVMAP above is a ``struct bpf_devmap_val`` | 
|  |  | 
|  | The following code snippet shows a simple xdp_redirect_map program. This program | 
|  | would work with a user space program that populates the devmap ``forward_map`` based | 
|  | on ingress ifindexes. The BPF program (below) is redirecting packets using the | 
|  | ingress ``ifindex`` as the ``key``. | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | SEC("xdp") | 
|  | int xdp_redirect_map_func(struct xdp_md *ctx) | 
|  | { | 
|  | int index = ctx->ingress_ifindex; | 
|  |  | 
|  | return bpf_redirect_map(&forward_map, index, 0); | 
|  | } | 
|  |  | 
|  | The following code snippet shows a BPF program that is broadcasting packets to | 
|  | all the interfaces in the ``tx_port`` devmap. | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | SEC("xdp") | 
|  | int xdp_redirect_map_func(struct xdp_md *ctx) | 
|  | { | 
|  | return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS); | 
|  | } | 
|  |  | 
|  | User space | 
|  | ---------- | 
|  |  | 
|  | The following code snippet shows how to update a devmap called ``tx_port``. | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | int update_devmap(int ifindex, int redirect_ifindex) | 
|  | { | 
|  | int ret; | 
|  |  | 
|  | ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0); | 
|  | if (ret < 0) { | 
|  | fprintf(stderr, "Failed to update devmap_ value: %s\n", | 
|  | strerror(errno)); | 
|  | } | 
|  |  | 
|  | return ret; | 
|  | } | 
|  |  | 
|  | The following code snippet shows how to update a hash_devmap called ``forward_map``. | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | int update_devmap(int ifindex, int redirect_ifindex) | 
|  | { | 
|  | struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex }; | 
|  | int ret; | 
|  |  | 
|  | ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0); | 
|  | if (ret < 0) { | 
|  | fprintf(stderr, "Failed to update devmap_ value: %s\n", | 
|  | strerror(errno)); | 
|  | } | 
|  | return ret; | 
|  | } | 
|  |  | 
|  | References | 
|  | =========== | 
|  |  | 
|  | - https://lwn.net/Articles/728146/ | 
|  | - https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176 | 
|  | - https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106 |