		The MSI Driver Guide HOWTO
	Tom L Nguyen <tom.l.nguyen@intel.com>
			10/03/2003
	Revised Feb 12, 2004 by Martine Silbermann
		email: Martine.Silbermann@hp.com

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms, and
how to enable your driver to use MSI or MSI-X. A Frequently Asked
Questions section is also included.

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for
conventional PCI devices and a required feature for PCI Express
devices. MSI enables a device function to request service by sending
an inbound Memory Write on its PCI bus to the FSB as a Message
Signaled Interrupt transaction. Because an MSI is generated in the
form of a Memory Write, all transaction termination conditions, such
as Retry, Master-Abort, Target-Abort, or normal completion, are
supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems that support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during device
initial configuration.

An MSI-capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both, but should instead enable only the MSI-X capability
structure.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver with control over MSI. The Message
Control register indicates the MSI capabilities supported by the
device. The Message Address register specifies the target address and
the Message Data register specifies the characteristics of the
message. To request service, the device function writes the content
of the Message Data register to the target address. The device and
its software driver are prohibited from writing to these registers.

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure, as described below.

	- It supports a larger maximum number of vectors per function.

	- It provides the ability for system software to configure
	each vector with an independent message address and message
	data, specified by a table that resides in Memory Space.

	- MSI and MSI-X both support per-vector masking. Per-vector
	masking is an optional extension of MSI but a required
	feature for MSI-X. Per-vector masking provides the kernel
	with the ability to mask/unmask an MSI while servicing its
	software interrupt service routine. If per-vector masking is
	not supported, then the device driver should provide the
	hardware/software synchronization to ensure that the device
	generates an MSI only when the driver wants it to do so.

4. Why use MSI?

One benefit is simplified board design: MSI allows board designers
to remove out-of-band interrupt routing. MSI is another step towards
a legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge), while
an MSI is unique (non-shared) and does not require BIOS configuration
support. As a result, the PCI Express technology requires MSI
support for better interrupt performance.

Using MSI enables the device function to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel will not enable MSI/MSI-X on any device that
supports this capability. The CONFIG_PCI_USE_VECTOR kernel option
must be selected to enable MSI/MSI-X support.

5.1 Including MSI support into the kernel

To allow MSI-capable device drivers to selectively enable MSI (using
pci_enable_msi as described below), the VECTOR based scheme needs to
be enabled by setting CONFIG_PCI_USE_VECTOR.

Since the target of the inbound message is the local APIC, enabling
CONFIG_PCI_USE_VECTOR depends on whether CONFIG_X86_LOCAL_APIC is
enabled or not.

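For example, a .config fragment satisfying both requirements might
read as follows (option names as used by kernels of this era; verify
against your kernel version):

```
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI_USE_VECTOR=y
```
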
int pci_enable_msi(struct pci_dev *)

With this new API, any existing device driver that would like to
have MSI enabled on its device function must call this API
explicitly. A successful call will initialize the MSI/MSI-X
capability structure with ONE vector, regardless of whether the
device function is capable of supporting multiple messages. This
vector replaces the pre-assigned dev->irq with a new MSI vector. To
avoid a conflict between the newly assigned vector and the
pre-assigned vector, the device driver must call this API before
calling request_irq(...).

The diagram below shows the events that switch the interrupt mode of
an MSI-capable device function between MSI mode and PIN-IRQ
assertion mode.

	 ------------   pci_enable_msi   ------------------------
	|            | <===============  |                        |
	|  MSI MODE  |                   | PIN-IRQ ASSERTION MODE |
	|            | ===============>  |                        |
	 ------------      free_irq      ------------------------

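The required ordering can be sketched in a hypothetical 2.6-era
probe routine; mydev_probe, mydev_interrupt, and the flags are
illustrative only, and this fragment is not runnable outside a
kernel tree:

```
static int mydev_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int rc;

	rc = pci_enable_device(pdev);
	if (rc)
		return rc;

	/* Switch to MSI mode BEFORE request_irq(), so that dev->irq
	 * already holds the new MSI vector when it is registered.
	 * On failure, dev->irq still holds the PIN-IRQ vector. */
	pci_enable_msi(pdev);

	return request_irq(pdev->irq, mydev_interrupt, SA_SHIRQ,
			   "mydev", pdev);
}
```
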
5.2 Configuring for MSI support

Due to the non-contiguous fashion of vector assignment in the
existing Linux kernel, this version does not support multiple
messages, regardless of whether the device function is capable of
supporting more than one vector. The bus driver initializes only
entry 0 of this capability if pci_enable_msi(...) is called
successfully by the device driver.

5.3 Configuring for MSI-X support

Both the MSI capability structure and the MSI-X capability structure
share the same semantics described above; however, because system
software can configure each vector of the MSI-X capability structure
with an independent message address and message data, the
non-contiguous fashion of vector assignment in the existing Linux
kernel has no impact on supporting multiple messages on an MSI-X
capable device function. By default, as mentioned above, ONE vector
is always allocated to the MSI-X capability structure at entry 0.
The bus driver does not initialize the other entries of the MSI-X
table.

Note that the PCI subsystem should have full control of an MSI-X
table that resides in Memory Space. The software device driver
should not access this table.

To request additional vectors, the software device driver should
call the function msi_alloc_vectors(). It is recommended that the
driver call this function once, during the initialization phase of
the device driver.

The function msi_alloc_vectors(), once invoked, enables either
all or nothing, depending on the current availability of vector
resources. If no vector resources are available, the device function
still works with ONE vector. If vector resources are available for
the number of vectors requested by the driver, this function will
reconfigure the MSI-X capability structure of the device with
additional messages, starting from entry 1. For example, a device
may be capable of supporting a maximum of 32 vectors while its
software driver typically requests only 4 vectors.

For each vector, after this successful call, the device driver is
responsible for calling other functions like request_irq(),
enable_irq(), etc. to enable this vector with its corresponding
interrupt service handler. It is the device driver's choice to have
all vectors share the same interrupt service handler or to give each
vector a unique interrupt service handler.

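Putting the steps above together, a driver's initialization path
might look roughly like this. This is a sketch only: NVECS, the
mydev structure, the handler, and the error handling are all
hypothetical, and the fragment is not runnable outside a kernel
tree:

```
#define NVECS 4    /* vectors requested beyond the default entry 0 */

static int mydev_setup_msix(struct pci_dev *pdev, struct mydev *mydev)
{
	int i;

	/* All-or-nothing: on failure the device still works with
	 * the ONE vector set up by pci_enable_msi(). */
	if (msi_alloc_vectors(pdev, mydev->vectors, NVECS) != 0)
		return -1;

	/* Register the same handler on every additional vector;
	 * per-vector handlers would work equally well. */
	for (i = 0; i < NVECS; i++)
		if (request_irq(mydev->vectors[i], mydev_interrupt,
				SA_SHIRQ, "mydev", mydev))
			return -1;
	return 0;
}
```
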
In addition to the function msi_alloc_vectors(), another function,
msi_free_vectors(), is provided to allow the software driver to
release a number of vectors back to the vector resources. Once
invoked, the PCI subsystem disables (masks) each vector released.
These vectors are no longer valid for the hardware device and its
software driver to use. Like free_irq(), it is recommended that the
device driver call msi_free_vectors() to release all additional
vectors previously requested.

int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)

This API enables the software driver to request that the PCI
subsystem allocate additional messages. Depending on the number of
vectors available, the PCI subsystem enables either all or nothing.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer to an array of integers; the number of
elements is indicated in argument nvec.
Argument nvec is an integer indicating the number of messages
requested.
A return of zero indicates that the requested vectors were
successfully allocated. Otherwise, vector resources are not
available.

int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec)

This API enables the software driver to inform the PCI subsystem
that it is willing to release a number of vectors back to the
MSI resource pool. Once invoked, the PCI subsystem disables each
MSI-X entry associated with each vector stored in argument vector.
These vectors are no longer valid for the hardware device and
its software driver to use.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer to an array of integers; the number of
elements is indicated in argument nvec.
Argument nvec is an integer indicating the number of messages
released.
A return of zero indicates that the vectors were successfully
released. Otherwise, it indicates a failure.

5.4 Hardware requirements for MSI support

MSI support requires support from both the system hardware and the
individual hardware device functions.

5.4.1 System hardware support

Since the target of an MSI address is the local APIC of a CPU,
enabling MSI support in the Linux kernel depends on whether the
existing system hardware supports a local APIC. Users should verify
that their system runs when CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_USE_VECTOR enables the VECTOR based scheme and the
option for MSI-capable device drivers to selectively enable MSI
(using pci_enable_msi as described above).

Note that the CONFIG_X86_IO_APIC setting is irrelevant because an
MSI vector is allocated anew at runtime and MSI support does not
depend on BIOS support. This key independence enables MSI support
on future IOxAPIC-free platforms.

5.4.2 Device hardware support

A hardware device function indicates support for MSI by implementing
the MSI/MSI-X capability structure in its PCI capability list. By
default, this capability structure will not be initialized by the
kernel to enable MSI during system boot. In other words, the device
function runs in its default pin assertion mode. Note that in many
cases hardware supporting MSI has bugs, which may result in system
hangs. The software driver of a specific piece of MSI-capable
hardware is responsible for deciding whether or not to call
pci_enable_msi. A return of zero indicates that the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is now running in MSI mode.

5.5 How to tell whether MSI is enabled on a device function

At the driver level, a return of zero from pci_enable_msi(...)
indicates to the device driver that its device function is
initialized successfully and ready to run in MSI mode.

At the user level, users can run the command 'cat /proc/interrupts'
to display the vectors allocated for devices and their interrupt
modes, as shown below.

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI MSI  aic79xx
201:         30          0         PCI MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

6. FAQ

Q1. Are there any limitations on using MSI?

A1. If the PCI device supports MSI and conforms to the
specification, and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3, IPIs are transmitted on the APIC local
bus, and in P4 and Xeon they are transmitted on the system
bus. Are there any implications of this?

A2. MSI support enables a PCI device to send an inbound memory
write (with 0xfeexxxxx as the target address) on its PCI bus
directly to the FSB. Since the message address has the
redirection hint bit cleared, it should work.

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on chipsets such as the Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with the
target address set to 0xfeexxxxx, in conformance with the PCI
Specification 2.3 or later, then it should work.

Q4. From the driver's point of view, if MSI is lost because
errors occur during the inbound memory write, then the driver
may wait forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory
write, all transaction termination conditions (Retry,
Master-Abort, Target-Abort, or normal completion) are
supported. A device sending an MSI must abide by all the PCI
rules and conditions regarding that inbound memory write. So,
if a Retry is signaled it must retry, etc. We believe that
the recommendation for an Abort is also a retry (refer to the
PCI Specification 2.3 or later).