| \section{Packed Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Packed Virtqueues} |
| |
| Packed virtqueues is an alternative compact virtqueue layout using |
| read-write memory, that is memory that is both read and written |
| by both host and guest. |
| |
| Use of packed virtqueues is negotiated by the VIRTIO_F_RING_PACKED |
| feature bit. |
| |
| Packed virtqueues support up to $2^{15}$ entries each. |
| |
| With current transports, virtqueues are located in guest memory |
| allocated by the driver. |
| Each packed virtqueue consists of three parts: |
| |
| \begin{itemize} |
| \item Descriptor Ring - occupies the Descriptor Area |
| \item Driver Event Suppression - occupies the Driver Area |
| \item Device Event Suppression - occupies the Device Area |
| \end{itemize} |
| |
| Where the Descriptor Ring in turn consists of descriptors, |
| and where each descriptor can contain the following parts: |
| |
| \begin{itemize} |
| \item Buffer ID |
| \item Element Address |
| \item Element Length |
| \item Flags |
| \end{itemize} |
| |
| A buffer consists of zero or more device-readable physically-contiguous |
| elements followed by zero or more physically-contiguous |
| device-writable elements (each buffer has at least one element). |
| |
| When the driver wants to send such a buffer to the device, it |
| writes at least one available descriptor describing elements of |
| the buffer into the Descriptor Ring. The descriptor(s) are |
| associated with a buffer by means of a Buffer ID stored within |
| the descriptor. |
| |
| The driver then notifies the device. When the device has finished |
| processing the buffer, it writes a used device descriptor |
| including the Buffer ID into the Descriptor Ring (overwriting a |
| driver descriptor previously made available), and sends a |
| used event notification. |
| |
| The Descriptor Ring is used in a circular manner: the driver writes |
| descriptors into the ring in order. After reaching the end of the ring, |
| the next descriptor is placed at the head of the ring. Once the ring is |
| full of driver descriptors, the driver stops sending new requests and |
| waits for the device to start processing descriptors and to write out |
| some used descriptors before making new driver descriptors |
| available. |
| |
| Similarly, the device reads descriptors from the ring in order and |
| detects that a driver descriptor has been made available. As |
| processing of descriptors is completed, used descriptors are |
| written by the device back into the ring. |
| |
| Note: after reading driver descriptors and starting their |
| processing in order, the device might complete their processing out |
| of order. Used device descriptors are written in the order |
| in which their processing is complete. |
| |
| The Device Event Suppression data structure is write-only by the |
| device. It includes information for reducing the number of |
| device events, i.e., sending fewer available buffer notifications |
| to the device. |
| |
| The Driver Event Suppression data structure is read-only by the |
| device. It includes information for reducing the number of |
| driver events, i.e., sending fewer used buffer notifications |
| to the driver. |
| |
| \subsection{Driver and Device Ring Wrap Counters} |
| \label{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters} |
| Each of the driver and the device are expected to maintain, |
| internally, a single-bit ring wrap counter initialized to 1. |
| |
| The counter maintained by the driver is called the Driver |
| Ring Wrap Counter. The driver changes the value of this counter |
| each time it makes available the |
| last descriptor in the ring (after making the last descriptor |
| available). |
| |
| The counter maintained by the device is called the Device Ring Wrap |
| Counter. The device changes the value of this counter |
| each time it uses the last descriptor in |
| the ring (after marking the last descriptor used). |
| |
| It is easy to see that the Driver Ring Wrap Counter in the driver matches |
| the Device Ring Wrap Counter in the device when both are processing the same |
| descriptor, or when all available descriptors have been used. |
| |
| To mark a descriptor as available and used, both the driver and |
| the device use the following two flags: |
| \begin{lstlisting} |
| #define VIRTQ_DESC_F_AVAIL (1 << 7) |
| #define VIRTQ_DESC_F_USED (1 << 15) |
| \end{lstlisting} |
| |
| To mark a descriptor as available, the driver sets the |
| VIRTQ_DESC_F_AVAIL bit in Flags to match the internal Driver |
| Ring Wrap Counter. It also sets the VIRTQ_DESC_F_USED bit to match the |
| \emph{inverse} value (i.e. to not match the internal Driver Ring |
| Wrap Counter). |
| |
| To mark a descriptor as used, the device sets the |
| VIRTQ_DESC_F_USED bit in Flags to match the internal Device |
| Ring Wrap Counter. It also sets the VIRTQ_DESC_F_AVAIL bit to match the |
| \emph{same} value. |
| |
| Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different |
| for an available descriptor and equal for a used descriptor. |
| |
| Note that this observation is mostly useful for sanity-checking |
| as these are necessary but not sufficient conditions - for |
| example, all descriptors are zero-initialized. To detect used and |
| available descriptors it is possible for drivers and devices to |
| keep track of the last observed value of |
| VIRTQ_DESC_F_USED/VIRTQ_DESC_F_AVAIL. Other techniques to detect |
| VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes might also be |
| possible. |
| |
| \subsection{Polling of available and used descriptors} |
| \label{sec:Packed Virtqueues / Polling of available and used descriptors} |
| |
| Writes of device and driver descriptors can generally be |
| reordered, but each side (driver and device) are only required to |
| poll (or test) a single location in memory: the next device descriptor after |
| the one they processed previously, in circular order. |
| |
| Sometimes the device needs to only write out a single used descriptor |
| after processing a batch of multiple available descriptors. As |
| described in more detail below, this can happen when using |
| descriptor chaining or with in-order |
| use of descriptors. In this case, the device writes out a used |
| descriptor with the buffer id of the last descriptor in the group. |
| After processing the used descriptor, both device and driver then |
| skip forward in the ring the number of the remaining descriptors |
| in the group until processing (reading for the driver and writing |
| for the device) the next used descriptor. |
| |
| \subsection{Write Flag} |
| \label{sec:Packed Virtqueues / Write Flag} |
| |
| In an available descriptor, the VIRTQ_DESC_F_WRITE bit within Flags |
| is used to mark a descriptor as corresponding to a write-only or |
| read-only element of a buffer. |
| |
| \begin{lstlisting} |
| /* This marks a descriptor as device write-only (otherwise device read-only). */ |
| #define VIRTQ_DESC_F_WRITE 2 |
| \end{lstlisting} |
| |
| In a used descriptor, this bit is used to specify whether any |
| data has been written by the device into any parts of the buffer. |
| |
| |
| \subsection{Element Address and Length} |
| \label{sec:Packed Virtqueues / Element Address and Length} |
| |
| In an available descriptor, Element Address corresponds to the |
| physical address of the buffer element. The length of the element assumed |
| to be physically contiguous is stored in Element Length. |
| |
| In a used descriptor, Element Address is unused. Element Length |
| specifies the length of the buffer that has been initialized |
| (written to) by the device. |
| |
| Element Length is reserved for used descriptors without the |
| VIRTQ_DESC_F_WRITE flag, and is ignored by drivers. |
| |
| \subsection{Scatter-Gather Support} |
| \label{sec:Packed Virtqueues / Scatter-Gather Support} |
| |
| Some drivers need an ability to supply a list of multiple buffer |
| elements (also known as a scatter/gather list) with a request. |
| Two features support this: descriptor chaining and indirect descriptors. |
| |
| If neither feature is in use by the driver, each buffer is |
| physically-contiguous, either read-only or write-only and is |
| described completely by a single descriptor. |
| |
| While unusual (most implementations either create all lists |
| solely using non-indirect descriptors, or always use a single |
| indirect element), if both features have been negotiated, mixing |
| indirect and non-indirect descriptors in a ring is valid, as long as each |
| list only contains descriptors of a given type. |
| |
| Scatter/gather lists only apply to available descriptors. A |
| single used descriptor corresponds to the whole list. |
| |
| The device limits the number of descriptors in a list through a |
| transport-specific and/or device-specific value. If not limited, |
| the maximum number of descriptors in a list is the virt queue |
| size. |
| |
| \subsection{Next Flag: Descriptor Chaining} |
| \label{sec:Packed Virtqueues / Next Flag: Descriptor Chaining} |
| |
| The packed ring format allows the driver to supply |
| a scatter/gather list to the device |
| by using multiple descriptors, and setting the VIRTQ_DESC_F_NEXT bit in |
| Flags for all but the last available descriptor. |
| |
| \begin{lstlisting} |
| /* This marks a buffer as continuing. */ |
| #define VIRTQ_DESC_F_NEXT 1 |
| \end{lstlisting} |
| |
| Buffer ID is included in the last descriptor in the list. |
| |
| The driver always makes the first descriptor in the list |
| available after the rest of the list has been written out into |
| the ring. This guarantees that the device will never observe a |
| partial scatter/gather list in the ring. |
| |
| Note: all flags, including VIRTQ_DESC_F_AVAIL, VIRTQ_DESC_F_USED, |
| VIRTQ_DESC_F_WRITE must be set/cleared correctly in all |
| descriptors in the list, not just the first one. |
| |
| The device only writes out a single used descriptor for the whole |
| list. It then skips forward according to the number of |
| descriptors in the list. The driver needs to keep track of the size |
| of the list corresponding to each buffer ID, to be able to skip |
| to where the next used descriptor is written by the device. |
| |
| For example, if descriptors are used in the same order in which |
| they are made available, this will result in the used descriptor |
| overwriting the first available descriptor in the list, the used |
| descriptor for the next list overwriting the first available |
| descriptor in the next list, etc. |
| |
| VIRTQ_DESC_F_NEXT is reserved in used descriptors, and |
| should be ignored by drivers. |
| |
| \subsection{Indirect Flag: Scatter-Gather Support} |
| \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support} |
| |
| Some devices benefit by concurrently dispatching a large number |
| of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase |
| ring capacity the driver can store a (read-only by the device) table of indirect |
| descriptors anywhere in memory, and insert a descriptor in the main |
| virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to |
| a buffer element |
| containing this indirect descriptor table; \field{addr} and \field{len} |
| refer to the indirect table address and length in bytes, |
| respectively. |
| \begin{lstlisting} |
| /* This means the element contains a table of descriptors. */ |
| #define VIRTQ_DESC_F_INDIRECT 4 |
| \end{lstlisting} |
| |
| The indirect table layout structure looks like this |
| (\field{len} is the Buffer Length of the descriptor that refers to this table, |
| which is a variable): |
| |
| \begin{lstlisting} |
| struct pvirtq_indirect_descriptor_table { |
| /* The actual descriptor structures (struct pvirtq_desc each) */ |
| struct pvirtq_desc desc[len / sizeof(struct pvirtq_desc)]; |
| }; |
| \end{lstlisting} |
| |
| The first descriptor is located at the start of the indirect |
| descriptor table, additional indirect descriptors come |
| immediately afterwards. The VIRTQ_DESC_F_WRITE \field{flags} bit is the |
| only valid flag for descriptors in the indirect table. Others |
| are reserved and are ignored by the device. |
| Buffer ID is also reserved and is ignored by the device. |
| |
| In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE |
| is reserved and is ignored by the device. |
| |
| \subsection{In-order use of descriptors} |
| \label{sec:Packed Virtqueues / In-order use of descriptors} |
| |
| Some devices always use descriptors in the same order in which |
| they have been made available. These devices can offer the |
| VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows |
| devices to notify the use of a batch of buffers to the driver by |
| only writing out a single used descriptor with the Buffer ID |
| corresponding to the last descriptor in the batch. |
| |
| The device then skips forward in the ring according to the size of |
| the batch. The driver needs to look up the used Buffer ID and |
| calculate the batch size to be able to advance to where the next |
| used descriptor will be written by the device. |
| |
| This will result in the used descriptor overwriting the first |
| available descriptor in the batch, the used descriptor for the |
| next batch overwriting the first available descriptor in the next |
| batch, etc. |
| |
| The skipped buffers (for which no used descriptor was written) |
| are assumed to have been used (read or written) by the |
| device completely. |
| |
| \subsection{Multi-buffer requests} |
| \label{sec:Packed Virtqueues / Multi-buffer requests} |
| Some devices combine multiple buffers as part of processing of a |
| single request. These devices always mark the descriptor |
| corresponding to the first buffer in the request used after the |
| rest of the descriptors (corresponding to rest of the buffers) in |
| the request - which follow the first descriptor in ring order - |
| has been marked used and written out into the ring. This |
| guarantees that the driver will never observe a partial request |
| in the ring. |
| |
| \subsection{Driver and Device Event Suppression} |
| \label{sec:Packed Virtqueues / Driver and Device Event Suppression} |
| In many systems used and available buffer notifications involve |
| significant overhead. To mitigate this overhead, |
| each virtqueue includes two identical structures used for |
| controlling notifications between the device and the driver. |
| |
| The Driver Event Suppression structure is read-only by the |
| device and controls the used buffer notifications sent by the device |
| to the driver. |
| |
| The Device Event Suppression structure is read-only by |
| the driver and controls the available buffer notifications sent by the |
| driver to the device. |
| |
| Each of these Event Suppression structures includes the following fields: |
| |
| \begin{description} |
| \item [Descriptor Ring Change Event Flags] Takes values: |
| \begin{lstlisting} |
| /* Enable events */ |
| #define RING_EVENT_FLAGS_ENABLE 0x0 |
| /* Disable events */ |
| #define RING_EVENT_FLAGS_DISABLE 0x1 |
| /* |
| * Enable events for a specific descriptor |
| * (as specified by Descriptor Ring Change Event Offset/Wrap Counter). |
| * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated. |
| */ |
| #define RING_EVENT_FLAGS_DESC 0x2 |
| /* The value 0x3 is reserved */ |
| \end{lstlisting} |
| \item [Descriptor Ring Change Event Offset] If Event Flags set to descriptor |
| specific event: offset within the ring (in units of descriptor |
| size). Event will only trigger when this descriptor is |
| made available/used respectively. |
| \item [Descriptor Ring Change Event Wrap Counter] If Event Flags set to descriptor |
| specific event: offset within the ring (in units of descriptor |
| size). Event will only trigger when Ring Wrap Counter |
| matches this value and a descriptor is |
| made available/used respectively. |
| \end{description} |
| |
| After writing out some descriptors, both the device and the driver |
| are expected to consult the relevant structure to find out |
| whether a used respectively an available buffer notification should be sent. |
| |
| \subsubsection{Structure Size and Alignment} |
| \label{sec:Packed Virtqueues / Structure Size and Alignment} |
| |
| Each part of the virtqueue is physically-contiguous in guest memory, |
| and has different alignment requirements. |
| |
| The memory alignment and size requirements, in bytes, of each part of the |
| virtqueue are summarized in the following table: |
| |
| \begin{tabular}{|l|l|l|} |
| \hline |
| Virtqueue Part & Alignment & Size \\ |
| \hline \hline |
| Descriptor Ring & 16 & $16 * $(Queue Size) \\ |
| \hline |
| Device Event Suppression & 4 & 4 \\ |
| \hline |
| Driver Event Suppression & 4 & 4 \\ |
| \hline |
| \end{tabular} |
| |
| The Alignment column gives the minimum alignment for each part |
| of the virtqueue. |
| |
| The Size column gives the total number of bytes for each |
| part of the virtqueue. |
| |
| Queue Size corresponds to the maximum number of descriptors in the |
| virtqueue\footnote{For example, if Queue Size is 4 then at most 4 buffers |
| can be queued at any given time.}. The Queue Size value does not |
| have to be a power of 2. |
| |
| \drivernormative{\subsection}{Virtqueues}{Basic Facilities of a |
| Virtio Device / Packed Virtqueues} |
| The driver MUST ensure that the physical address of the first byte |
| of each virtqueue part is a multiple of the specified alignment value |
| in the above table. |
| |
| \devicenormative{\subsection}{Virtqueues}{Basic Facilities of a |
| Virtio Device / Packed Virtqueues} |
| The device MUST start processing driver descriptors in the order |
| in which they appear in the ring. |
| The device MUST start writing device descriptors into the ring in |
| the order in which they complete. |
| The device MAY reorder descriptor writes once they are started. |
| |
| \subsection{The Virtqueue Descriptor Format}\label{sec:Basic |
| Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue |
| Descriptor Format} |
| |
| The available descriptor refers to the buffers the driver is sending |
| to the device. \field{addr} is a physical address, and the |
| descriptor is identified with a buffer using the \field{id} field. |
| |
| \begin{lstlisting} |
| struct pvirtq_desc { |
| /* Buffer Address. */ |
| le64 addr; |
| /* Buffer Length. */ |
| le32 len; |
| /* Buffer ID. */ |
| le16 id; |
| /* The flags depending on descriptor type. */ |
| le16 flags; |
| }; |
| \end{lstlisting} |
| |
| The descriptor ring is zero-initialized. |
| |
| \subsection{Event Suppression Structure Format}\label{sec:Basic |
| Facilities of a Virtio Device / Packed Virtqueues / Event Suppression Structure |
| Format} |
| |
| The following structure is used to reduce the number of |
| notifications sent between driver and device. |
| |
| \begin{lstlisting} |
| struct pvirtq_event_suppress { |
| le16 { |
| desc_event_off : 15; /* Descriptor Ring Change Event Offset */ |
| desc_event_wrap : 1; /* Descriptor Ring Change Event Wrap Counter */ |
| } desc; /* If desc_event_flags set to RING_EVENT_FLAGS_DESC */ |
| le16 { |
| desc_event_flags : 2, /* Descriptor Ring Change Event Flags */ |
| reserved : 14; /* Reserved, set to 0 */ |
| } flags; |
| }; |
| \end{lstlisting} |
| |
| \devicenormative{\subsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table} |
| A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT |
| read a device-writable buffer. |
| A device MUST NOT use a descriptor unless it observes |
| the VIRTQ_DESC_F_AVAIL bit in its \field{flags} being changed |
| (e.g. as compared to the initial zero value). |
| A device MUST NOT change a descriptor after changing it's |
| the VIRTQ_DESC_F_USED bit in its \field{flags}. |
| |
| \drivernormative{\subsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / PAcked Virtqueues / The Virtqueue Descriptor Table} |
| A driver MUST NOT change a descriptor unless it observes |
| the VIRTQ_DESC_F_USED bit in its \field{flags} being changed. |
| A driver MUST NOT change a descriptor after changing |
| the VIRTQ_DESC_F_AVAIL bit in its \field{flags}. |
| When notifying the device, driver MUST set |
| \field{next_off} and |
| \field{next_wrap} to match the next descriptor |
| not yet made available to the device. |
| A driver MAY send multiple available buffer notifications without making |
| any new descriptors available to the device. |
| |
| \drivernormative{\subsection}{Scatter-Gather Support}{Basic Facilities of a |
| Virtio Device / Packed Virtqueues / Scatter-Gather Support} |
| A driver MUST NOT create a descriptor list longer than allowed |
| by the device. |
| |
| A driver MUST NOT create a descriptor list longer than the Queue |
| Size. |
| |
| This implies that loops in the descriptor list are forbidden! |
| |
| The driver MUST place any device-writable descriptor elements after |
| any device-readable descriptor elements. |
| |
| A driver MUST NOT depend on the device to use more descriptors |
| to be able to write out all descriptors in a list. A driver |
| MUST make sure there's enough space in the ring |
| for the whole list before making the first descriptor in the list |
| available to the device. |
| |
| A driver MUST NOT make the first descriptor in the list available |
| before all subsequent descriptors comprising the list are made |
| available. |
| |
| \devicenormative{\subsection}{Scatter-Gather Support}{Basic Facilities of a |
| Virtio Device / Packed Virtqueues / Scatter-Gather Support} |
| The device MUST use descriptors in a list chained by the |
| VIRTQ_DESC_F_NEXT flag in the same order that they |
| were made available by the driver. |
| |
| The device MAY limit the number of buffers it will allow in a |
| list. |
| |
| \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors} |
| The driver MUST NOT set the DESC_F_INDIRECT flag unless the |
| VIRTIO_F_INDIRECT_DESC feature was negotiated. The driver MUST NOT |
| set any flags except DESC_F_WRITE within an indirect descriptor. |
| |
| A driver MUST NOT create a descriptor chain longer than allowed |
| by the device. |
| |
| A driver MUST NOT write direct descriptors with |
| DESC_F_INDIRECT set in a scatter-gather list linked by |
| VIRTQ_DESC_F_NEXT. |
| \field{flags}. |
| |
| \subsection{Virtqueue Operation}\label{sec:Basic Facilities of a Virtio Device / Packed Virtqueues / Virtqueue Operation} |
| |
| There are two parts to virtqueue operation: supplying new |
| available buffers to the device, and processing used buffers from |
| the device. |
| |
| What follows is the requirements of each of these two parts |
| when using the packed virtqueue format in more detail. |
| |
| \subsection{Supplying Buffers to The Device}\label{sec:Basic Facilities of a Virtio Device / Packed Virtqueues / Supplying Buffers to The Device} |
| |
| The driver offers buffers to one of the device's virtqueues as follows: |
| |
| \begin{enumerate} |
| \item The driver places the buffer into free descriptor(s) in the Descriptor Ring. |
| |
| \item The driver performs a suitable memory barrier to ensure that it updates |
| the descriptor(s) before checking for notification suppression. |
| |
| \item If notifications are not suppressed, the driver notifies the device |
| of the new available buffers. |
| \end{enumerate} |
| |
| What follows are the requirements of each stage in more detail. |
| |
| \subsubsection{Placing Available Buffers Into The Descriptor Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Placing Available Buffers Into The Descriptor Ring} |
| |
| For each buffer element, b: |
| |
| \begin{enumerate} |
| \item Get the next descriptor table entry, d |
| \item Get the next free buffer id value |
| \item Set \field{d.addr} to the physical address of the start of b |
| \item Set \field{d.len} to the length of b. |
| \item Set \field{d.id} to the buffer id |
| \item Calculate the flags as follows: |
| \begin{enumerate} |
| \item If b is device-writable, set the VIRTQ_DESC_F_WRITE bit to 1, otherwise 0 |
| \item Set the VIRTQ_DESC_F_AVAIL bit to the current value of the Driver Ring Wrap Counter |
| \item Set the VIRTQ_DESC_F_USED bit to inverse value |
| \end{enumerate} |
| \item Perform a memory barrier to ensure that the descriptor has |
| been initialized |
| \item Set \field{d.flags} to the calculated flags value |
| \item If d is the last descriptor in the ring, toggle the |
| Driver Ring Wrap Counter |
| \item Otherwise, increment d to point at the next descriptor |
| \end{enumerate} |
| |
| This makes a single descriptor buffer available. However, in |
| general the driver MAY make use of a batch of descriptors as part |
| of a single request. In that case, it defers updating |
| the descriptor flags for the first descriptor |
| (and the previous memory barrier) until after the rest of |
| the descriptors have been initialized. |
| |
| Once the descriptor \field{flags} field is updated by the driver, this exposes |
| the descriptor and its contents. The device MAY |
| access the descriptor and any following descriptors the driver created and the |
| memory they refer to immediately. |
| |
| \drivernormative{\paragraph}{Updating flags}{Basic Facilities of |
| a Virtio Device / Packed Virtqueues / Supplying Buffers to The |
| Device / Updating flags} |
| The driver MUST perform a suitable memory barrier before the |
| \field{flags} update, to ensure the |
| device sees the most up-to-date copy. |
| |
| \subsubsection{Sending Available Buffer Notifications}\label{sec:Basic Facilities |
| of a Virtio Device / Packed Virtqueues / Supplying Buffers to The Device |
| / Sending Available Buffer Notifications} |
| |
| The actual method of device notification is bus-specific, but generally |
| it can be expensive. So the device MAY suppress such notifications if it |
| doesn't need them, using the Event Suppression structure comprising the |
| Device Area as detailed in section \ref{sec:Basic |
| Facilities of a Virtio Device / Packed Virtqueues / Event |
| Suppression Structure Format}. |
| |
| The driver has to be careful to expose the new \field{flags} |
| value before checking if notifications are suppressed. |
| |
| \subsubsection{Implementation Example}\label{sec:Basic Facilities of a Virtio Device / Packed Virtqueues / Supplying Buffers to The Device / Implementation Example} |
| |
| Below is a driver code example. It does not attempt to reduce |
| the number of available buffer notifications, neither does it support |
| the VIRTIO_F_RING_EVENT_IDX feature. |
| |
| \begin{lstlisting} |
| /* Note: vq->avail_wrap_count is initialized to 1 */ |
| /* Note: vq->sgs is an array same size as the ring */ |
| |
| id = alloc_id(vq); |
| |
| first = vq->next_avail; |
| sgs = 0; |
| for (each buffer element b) { |
| sgs++; |
| |
| vq->ids[vq->next_avail] = -1; |
| vq->desc[vq->next_avail].address = get_addr(b); |
| vq->desc[vq->next_avail].len = get_len(b); |
| |
| avail = vq->avail_wrap_count ? VIRTQ_DESC_F_AVAIL : 0; |
| used = !vq->avail_wrap_count ? VIRTQ_DESC_F_USED : 0; |
| f = get_flags(b) | avail | used; |
| if (b is not the last buffer element) { |
| f |= VIRTQ_DESC_F_NEXT; |
| } |
| |
| /* Don't mark the 1st descriptor available until all of them are ready. */ |
| if (vq->next_avail == first) { |
| flags = f; |
| } else { |
| vq->desc[vq->next_avail].flags = f; |
| } |
| |
| last = vq->next_avail; |
| |
| vq->next_avail++; |
| |
| if (vq->next_avail >= vq->size) { |
| vq->next_avail = 0; |
| vq->avail_wrap_count ^= 1; |
| } |
| } |
| vq->sgs[id] = sgs; |
| /* ID included in the last descriptor in the list */ |
| vq->desc[last].id = id; |
| write_memory_barrier(); |
| vq->desc[first].flags = flags; |
| |
| memory_barrier(); |
| |
| if (vq->device_event.flags != RING_EVENT_FLAGS_DISABLE) { |
| notify_device(vq); |
| } |
| |
| \end{lstlisting} |
| |
| |
| \drivernormative{\paragraph}{Sending Available Buffer Notifications} {Basic Facilities of a Virtio Device / Packed Virtqueues / Supplying Buffers to The Device / Sending Available Buffer Notifications} |
| The driver MUST perform a suitable memory barrier before reading |
| the Event Suppression structure occupying the Device Area. Failing |
| to do so could result in mandatory available buffer notifications |
| not being sent. |
| |
| \subsection{Receiving Used Buffers From The Device}\label{sec:Basic Facilities of a Virtio Device / Packed Virtqueues / Receiving Used Buffers From The Device} |
| |
| Once the device has used buffers referred to by a descriptor (read from or written to them, or |
| parts of both, depending on the nature of the virtqueue and the |
| device), it sends a used buffer notification to the driver |
| as detailed in section \ref{sec:Basic |
| Facilities of a Virtio Device / Packed Virtqueues / Event |
| Suppression Structure Format}. |
| |
| \begin{note} |
| |
| For optimal performance, a driver MAY disable used buffer notifications |
| while processing the used buffers, but beware the problem of missing |
| notifications between emptying the ring and reenabling used buffer |
| notifications. This is usually handled by re-checking for more used |
| buffers after notifications are re-enabled: |
| |
| \end{note} |
| |
| \begin{lstlisting} |
| /* Note: vq->used_wrap_count is initialized to 1 */ |
| |
| vq->driver_event.flags = RING_EVENT_FLAGS_DISABLE; |
| |
| for (;;) { |
| struct pvirtq_desc *d = vq->desc[vq->next_used]; |
| |
| /* |
| * Check that |
| * 1. Descriptor has been made available. This check is necessary |
| * if the driver is making new descriptors available in parallel |
| * with this processing of used descriptors (e.g. from another thread). |
| * Note: there are many other ways to check this, e.g. |
| * track the number of outstanding available descriptors or buffers |
| * and check that it's not 0. |
| * 2. Descriptor has been used by the device. |
| */ |
| flags = d->flags; |
| bool avail = flags & VIRTQ_DESC_F_AVAIL; |
| bool used = flags & VIRTQ_DESC_F_USED; |
| if (avail != vq->used_wrap_count || used != vq->used_wrap_count) { |
| vq->driver_event.flags = RING_EVENT_FLAGS_ENABLE; |
| memory_barrier(); |
| |
| /* |
| * Re-test in case the driver made more descriptors available in |
| * parallel with the used descriptor processing (e.g. from another |
| * thread) and/or the device used more descriptors before the driver |
| * enabled events. |
| */ |
| flags = d->flags; |
| bool avail = flags & VIRTQ_DESC_F_AVAIL; |
| bool used = flags & VIRTQ_DESC_F_USED; |
| if (avail != vq->used_wrap_count || used != vq->used_wrap_count) { |
| break; |
| } |
| |
| vq->driver_event.flags = RING_EVENT_FLAGS_DISABLE; |
| } |
| |
| read_memory_barrier(); |
| |
| /* skip descriptors until the next buffer */ |
| id = d->id; |
| assert(id < vq->size); |
| sgs = vq->sgs[id]; |
| vq->next_used += sgs; |
| if (vq->next_used >= vq->size) { |
| vq->next_used -= vq->size; |
| vq->used_wrap_count ^= 1; |
| } |
| |
| free_id(vq, id); |
| |
| process_buffer(d); |
| } |
| \end{lstlisting} |