Merge tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- fix DesignWare driver ENABLE-ABORT sequence, ensuring ABORT can
always be sent when needed
- check for PCLK in the SynQuacer controller as an optional clock,
allowing ACPI to directly provide the clock rate
- KEBA driver Kconfig dependency fix
- fix XIIC driver power suspend sequence
* tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xiic: Fix pm_runtime_set_suspended() with runtime pm enabled
i2c: keba: I2C_KEBA should depend on KEBA_CP500
i2c: synquacer: Deal with optional PCLK correctly
i2c: designware: fix controller is holding SCL low while ENABLE bit is disabled
diff --git a/Documentation/admin-guide/device-mapper/delay.rst b/Documentation/admin-guide/device-mapper/delay.rst
index 917ba8c..4d66722 100644
--- a/Documentation/admin-guide/device-mapper/delay.rst
+++ b/Documentation/admin-guide/device-mapper/delay.rst
@@ -3,29 +3,52 @@
========
Device-Mapper's "delay" target delays reads and/or writes
-and maps them to different devices.
+and/or flushes and optionally maps them to different devices.
-Parameters::
+Arguments::
<device> <offset> <delay> [<write_device> <write_offset> <write_delay>
[<flush_device> <flush_offset> <flush_delay>]]
-With separate write parameters, the first set is only used for reads.
+The table line has to have either 3, 6 or 9 arguments:
+
+3: apply offset and delay to read, write and flush operations on device
+
+6: apply offset and delay to read operations on device, also apply
+   write_offset and write_delay to write and flush operations on an
+   optionally different write_device with an optionally different sector
+   offset
+
+9: same as 6 arguments plus define flush_offset and flush_delay explicitly
+   on/with an optionally different flush_device.
+
Offsets are specified in sectors.
+
Delays are specified in milliseconds.
+
Example scripts
===============
::
-
#!/bin/sh
- # Create device delaying rw operation for 500ms
- echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
+ #
+ # Create mapped device named "delayed" delaying read, write and flush operations for 500ms.
+ #
+ dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 0 500"
::
-
#!/bin/sh
- # Create device delaying only write operation for 500ms and
- # splitting reads and writes to different devices $1 $2
- echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
+ #
+ # Create mapped device delaying write and flush operations for 400ms and
+	# sending reads to device $1 but writes and flushes to a different device $2,
+	# at offsets of 2048 and 4096 sectors respectively.
+ #
+ dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 2048 0 $2 4096 400"
+
+::
+ #!/bin/sh
+ #
+	# Create mapped device delaying reads for 50ms, writes for 100ms and flushes
+	# for 333ms (reads and flushes on device $1, writes on device $2), all at
+	# offset 0 sectors.
+ #
+ dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 0 50 $2 0 100 $1 0 333"
diff --git a/Documentation/admin-guide/device-mapper/dm-crypt.rst b/Documentation/admin-guide/device-mapper/dm-crypt.rst
index 48a48bd0..9f8139f 100644
--- a/Documentation/admin-guide/device-mapper/dm-crypt.rst
+++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst
@@ -160,6 +160,10 @@
The <iv_offset> must be multiple of <sector_size> (in 512 bytes units)
if this flag is specified.
+integrity_key_size:<bytes>
+    Use an integrity key of <bytes> size instead of using the digest size
+    of the used HMAC algorithm as the integrity key size.
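+
+    For example (an illustrative value, not a recommendation), adding
+    ``integrity_key_size:32`` to the optional arguments requests a 32-byte
+    integrity key.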
+
Module parameters::
max_read_size
diff --git a/Documentation/admin-guide/device-mapper/vdo.rst b/Documentation/admin-guide/device-mapper/vdo.rst
index c69ac18..a14e6d3 100644
--- a/Documentation/admin-guide/device-mapper/vdo.rst
+++ b/Documentation/admin-guide/device-mapper/vdo.rst
@@ -251,7 +251,12 @@
by the vdostats userspace program to interpret the output
buffer.
- dump:
+ config:
+ Outputs useful vdo configuration information. Mostly used
+ by users who want to recreate a similar VDO volume and
+ want to know the creation configuration used.
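+
+		For example (with a hypothetical mapped device named "vdo0")::
+
+		  dmsetup message vdo0 0 config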
+
+ dump:
Dumps many internal structures to the system log. This is
not always safe to run, so it should only be used to debug
a hung vdo. Optional parameters to specify structures to
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bb48ae2..1518343 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2677,6 +2677,23 @@
Default is Y (on).
+ kvm.enable_virt_at_load=[KVM,ARM64,LOONGARCH,MIPS,RISCV,X86]
+ If enabled, KVM will enable virtualization in hardware
+ when KVM is loaded, and disable virtualization when KVM
+ is unloaded (if KVM is built as a module).
+
+ If disabled, KVM will dynamically enable and disable
+ virtualization on-demand when creating and destroying
+ VMs, i.e. on the 0=>1 and 1=>0 transitions of the
+ number of VMs.
+
+			Enabling virtualization at module load avoids potential
+			latency for creation of the 0=>1 VM, as KVM serializes
+			virtualization enabling across all online CPUs. The
+			"cost" of enabling virtualization when KVM is loaded is
+			that doing so may interfere with using out-of-tree
+			hypervisors that want to "own" virtualization hardware.
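+
+			For example (an illustrative invocation), on-demand
+			enabling can be selected with kvm.enable_virt_at_load=0
+			on the kernel command line (built-in KVM), or with
+			"modprobe kvm enable_virt_at_load=0" when KVM is built
+			as a module.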
+
kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
Default is false (don't support).
diff --git a/Documentation/arch/loongarch/irq-chip-model.rst b/Documentation/arch/loongarch/irq-chip-model.rst
index 7988f41..6dd4825 100644
--- a/Documentation/arch/loongarch/irq-chip-model.rst
+++ b/Documentation/arch/loongarch/irq-chip-model.rst
@@ -85,6 +85,38 @@
| Devices |
+---------+
+Advanced Extended IRQ model
+===========================
+
+In this model, IPI (Inter-Processor Interrupt) and CPU Local Timer interrupts go
+to CPUINTC directly, CPU UART interrupts go to LIOINTC, and PCH-MSI interrupts go
+to AVECINTC and then to CPUINTC directly, while all other device interrupts go
+to PCH-PIC/PCH-LPC, are gathered by EIOINTC, and then go to CPUINTC directly::
+
+ +-----+ +-----------------------+ +-------+
+ | IPI | --> | CPUINTC | <-- | Timer |
+ +-----+ +-----------------------+ +-------+
+ ^ ^ ^
+ | | |
+ +---------+ +----------+ +---------+ +-------+
+ | EIOINTC | | AVECINTC | | LIOINTC | <-- | UARTs |
+ +---------+ +----------+ +---------+ +-------+
+ ^ ^
+ | |
+ +---------+ +---------+
+ | PCH-PIC | | PCH-MSI |
+ +---------+ +---------+
+ ^ ^ ^
+ | | |
+ +---------+ +---------+ +---------+
+ | Devices | | PCH-LPC | | Devices |
+ +---------+ +---------+ +---------+
+ ^
+ |
+ +---------+
+ | Devices |
+ +---------+
+
ACPI-related definitions
========================
diff --git a/Documentation/arch/s390/vfio-ap.rst b/Documentation/arch/s390/vfio-ap.rst
index ea744cb..eba1991 100644
--- a/Documentation/arch/s390/vfio-ap.rst
+++ b/Documentation/arch/s390/vfio-ap.rst
@@ -999,6 +999,36 @@
resulting from plugging it in references a queue device bound to the vfio_ap
device driver.
+Driver Features
+===============
+The vfio_ap driver exposes a sysfs file containing supported features.
+This exists so third party tools (like Libvirt and mdevctl) can query the
+availability of specific features.
+
+The features list can be found here: /sys/bus/matrix/devices/matrix/features
+
+Entries are space delimited. Each entry consists of a combination of
+alphanumeric and underscore characters.
+
+Example:
+cat /sys/bus/matrix/devices/matrix/features
+guest_matrix dyn ap_config
+
+The following features are advertised:
+
++--------------+---------------------------------------------------------------+
+| Flag         | Description                                                   |
++==============+===============================================================+
+| guest_matrix | guest_matrix attribute exists. It reports the matrix of       |
+|              | adapters and domains that are or will be passed through to a  |
+|              | guest when the mdev is attached to it.                        |
++--------------+---------------------------------------------------------------+
+| dyn          | Indicates hot plug/unplug of AP adapters, domains and control |
+|              | domains for a guest to which the mdev is attached.            |
++--------------+---------------------------------------------------------------+
+| ap_config    | ap_config interface for one-shot modifications to mdev config |
++--------------+---------------------------------------------------------------+
+
Limitations
===========
Live guest migration is not supported for guests using AP devices without
diff --git a/Documentation/core-api/cleanup.rst b/Documentation/core-api/cleanup.rst
new file mode 100644
index 0000000..527eb2f
--- /dev/null
+++ b/Documentation/core-api/cleanup.rst
@@ -0,0 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Scope-based Cleanup Helpers
+===========================
+
+.. kernel-doc:: include/linux/cleanup.h
+ :doc: scope-based cleanup helpers
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index e18a2ff..a331d2c 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -35,6 +35,7 @@
kobject
kref
+ cleanup
assoc_array
xarray
maple_tree
diff --git a/Documentation/driver-api/cxl/access-coordinates.rst b/Documentation/driver-api/cxl/access-coordinates.rst
new file mode 100644
index 0000000..b07950e
--- /dev/null
+++ b/Documentation/driver-api/cxl/access-coordinates.rst
@@ -0,0 +1,91 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==================================
+CXL Access Coordinates Computation
+==================================
+
+Shared Upstream Link Calculation
+================================
+For certain CXL region constructions with endpoints behind CXL switches (SW) or
+Root Ports (RP), the total bandwidth of all the endpoints behind a switch can
+exceed the bandwidth of the switch upstream link. A similar situation can occur
+within the host, upstream of the root ports. The CXL driver performs an
+additional pass after all the targets for a region have arrived in order to
+recalculate the bandwidths with a possibly limiting upstream link in mind.
+
+The algorithm assumes the configuration is a symmetric topology, as that
+maximizes performance. When an asymmetric topology is detected, the calculation
+is aborted. An asymmetric topology is detected during the topology walk when the
+number of RPs detected as grandparents is not equal to the number of devices
+iterated in the same iteration loop. The assumption is made that subtle
+asymmetry in properties does not happen and that all paths to EPs are equal.
+
+There can be multiple switches under an RP. There can be multiple RPs under
+a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory
+Window Structure (CFMWS).
+
+An example hierarchy:
+
+> CFMWS 0
+> |
+> _________|_________
+> | |
+> ACPI0017-0 ACPI0017-1
+> GP0/HB0/ACPI0016-0 GP1/HB1/ACPI0016-1
+> | | | |
+> RP0 RP1 RP2 RP3
+> | | | |
+> SW 0 SW 1 SW 2 SW 3
+> | | | | | | | |
+> EP0 EP1 EP2 EP3 EP4 EP5 EP6 EP7
+
+Computation for the example hierarchy:
+
+Min (GP0 to CPU BW,
+ Min(SW 0 Upstream Link to RP0 BW,
+ Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) +
+ Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream link)) +
+ Min(SW 1 Upstream Link to RP1 BW,
+ Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) +
+ Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream link))) +
+Min (GP1 to CPU BW,
+ Min(SW 2 Upstream Link to RP2 BW,
+ Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) +
+ Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream link)) +
+ Min(SW 3 Upstream Link to RP3 BW,
+ Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) +
+ Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream link))))
+
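+As a purely illustrative sketch (standalone user-space C with made-up GB/s
+numbers, not driver code), the per-switch terms above clamp the sum of the
+endpoint bandwidths to the switch upstream link, and each host bridge term is
+further clamped to the GP-to-CPU bandwidth::
+
+  #include <stdio.h>
+
+  static unsigned int min_u(unsigned int a, unsigned int b)
+  {
+          return a < b ? a : b;
+  }
+
+  /* One switch: sum the per-EP mins, then clamp to the upstream link. */
+  static unsigned int switch_bw(const unsigned int *ep_bw, int nr_ep,
+                                unsigned int usp_link_bw)
+  {
+          unsigned int sum = 0;
+
+          for (int i = 0; i < nr_ep; i++)
+                  sum += ep_bw[i]; /* each is min(SSLBIS, DSLBIS, EP link) */
+
+          return min_u(sum, usp_link_bw);
+  }
+
+  int main(void)
+  {
+          unsigned int sw0_eps[] = { 16, 16 };  /* EP0, EP1 */
+          unsigned int sw1_eps[] = { 16, 16 };  /* EP2, EP3 */
+          /* GP0 term: clamp the two switch sums to the GP0-to-CPU bandwidth. */
+          unsigned int gp0 = min_u(switch_bw(sw0_eps, 2, 24) +
+                                   switch_bw(sw1_eps, 2, 24), 48);
+
+          printf("GP0 contribution: %u GB/s\n", gp0); /* prints 48 */
+          return 0;
+  }
+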
+The calculation starts at cxl_region_shared_upstream_perf_update(). An xarray
+is created to collect all the endpoint bandwidths via the
+cxl_endpoint_gather_bandwidth() function. The min() of the bandwidth from the
+endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint
+has a CXL switch as a parent, then the min() of the calculated bandwidth and the
+bandwidth from the SSLBIS for the switch downstream port that is associated
+with the endpoint is calculated. The final bandwidth is stored in a
+'struct cxl_perf_ctx' in the xarray indexed by a device pointer. If the
+endpoint is directly attached to a root port (RP), the device pointer would be
+an RP device. If the endpoint is behind a switch, the device pointer would be
+the upstream device of the parent switch.
+
+At the next stage, the code walks through one or more switches if they exist
+in the topology. For endpoints directly attached to RPs, this step is skipped.
+If there is another switch upstream, the code takes the min() of the current
+gathered bandwidth and the upstream link bandwidth, and the SSLBIS of the
+upstream switch is factored into that min() as well.
+
+Once the topology walk reaches the RP, whether through directly attached
+endpoints or by walking through the switch(es), cxl_rp_gather_bandwidth() is
+called. At this point all the bandwidths are aggregated per host bridge, which
+is also the index of the resulting xarray.
+
+The next step is to take the min() of the per-host-bridge bandwidth and the
+bandwidth from the Generic Port (GP). The bandwidth for the GP is retrieved
+via the ACPI SRAT/HMAT tables. The min bandwidths are aggregated under the same
+ACPI0017 device to form a new xarray.
+
+Finally, cxl_region_update_bandwidth() is called and the aggregated bandwidth
+from all the members of the last xarray is updated for the access coordinates
+residing in the cxl region (cxlr) context.
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 12b8272..965ba90 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -8,6 +8,7 @@
:maxdepth: 1
memory-devices
+ access-coordinates
maturity-map
diff --git a/Documentation/translations/zh_CN/arch/loongarch/irq-chip-model.rst b/Documentation/translations/zh_CN/arch/loongarch/irq-chip-model.rst
index f1e9ab1..4727619 100644
--- a/Documentation/translations/zh_CN/arch/loongarch/irq-chip-model.rst
+++ b/Documentation/translations/zh_CN/arch/loongarch/irq-chip-model.rst
@@ -87,6 +87,38 @@
| Devices |
+---------+
+高级扩展IRQ模型
+===============
+
+在这种模型里面,IPI(Inter-Processor Interrupt)和CPU本地时钟中断直接发送到CPUINTC,
+CPU串口(UARTs)中断发送到LIOINTC,PCH-MSI中断发送到AVECINTC,而后通过AVECINTC直接
+送达CPUINTC,而其他所有设备的中断则分别发送到所连接的PCH-PIC/PCH-LPC,然后由EIOINTC
+统一收集,再直接到达CPUINTC::
+
+ +-----+ +-----------------------+ +-------+
+ | IPI | --> | CPUINTC | <-- | Timer |
+ +-----+ +-----------------------+ +-------+
+ ^ ^ ^
+ | | |
+ +---------+ +----------+ +---------+ +-------+
+ | EIOINTC | | AVECINTC | | LIOINTC | <-- | UARTs |
+ +---------+ +----------+ +---------+ +-------+
+ ^ ^
+ | |
+ +---------+ +---------+
+ | PCH-PIC | | PCH-MSI |
+ +---------+ +---------+
+ ^ ^ ^
+ | | |
+ +---------+ +---------+ +---------+
+ | Devices | | PCH-LPC | | Devices |
+ +---------+ +---------+ +---------+
+ ^
+ |
+ +---------+
+ | Devices |
+ +---------+
+
ACPI相关的定义
==============
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b3be874..e324719 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4214,7 +4214,9 @@
enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace
on denied accesses, i.e. userspace effectively intercepts the MSR access. If
KVM_MSR_EXIT_REASON_FILTER is not enabled, KVM will inject a #GP into the guest
-on denied accesses.
+on denied accesses. Note, if an MSR access is denied during emulation of MSR
+load/stores during VMX transitions, KVM ignores KVM_MSR_EXIT_REASON_FILTER.
+See the below warning for full details.
If an MSR access is allowed by userspace, KVM will emulate and/or virtualize
the access in accordance with the vCPU model. Note, KVM may still ultimately
@@ -4229,9 +4231,22 @@
an error.
.. warning::
- MSR accesses as part of nested VM-Enter/VM-Exit are not filtered.
- This includes both writes to individual VMCS fields and reads/writes
- through the MSR lists pointed to by the VMCS.
+ MSR accesses that are side effects of instruction execution (emulated or
+ native) are not filtered as hardware does not honor MSR bitmaps outside of
+ RDMSR and WRMSR, and KVM mimics that behavior when emulating instructions
+ to avoid pointless divergence from hardware. E.g. RDPID reads MSR_TSC_AUX,
+ SYSENTER reads the SYSENTER MSRs, etc.
+
+ MSRs that are loaded/stored via dedicated VMCS fields are not filtered as
+ part of VM-Enter/VM-Exit emulation.
+
+  MSRs that are loaded/stored via VMX's load/store lists _are_ filtered as part
+  of VM-Enter/VM-Exit emulation. If an MSR access is denied on VM-Enter, KVM
+  synthesizes a consistency check VM-Exit (EXIT_REASON_MSR_LOAD_FAIL). If an
+  MSR access is denied on VM-Exit, KVM synthesizes a VM-Abort. In short, KVM
+  extends Intel's architectural list of MSRs that cannot be loaded/saved via
+  the VM-Enter/VM-Exit MSR list. It is the platform owner's responsibility
+  to communicate any such restrictions to their end users.
x2APIC MSR accesses cannot be filtered (KVM silently ignores filters that
cover any x2APIC MSRs).
@@ -8082,6 +8097,14 @@
guest CPUID on writes to MISC_ENABLE if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled.
+
+KVM_X86_QUIRK_SLOT_ZAP_ALL          By default, KVM invalidates all SPTEs in
+                                    a fast way for memslot deletion when the VM
+                                    type is KVM_X86_DEFAULT_VM.
+                                    When this quirk is disabled or when the VM
+                                    type is other than KVM_X86_DEFAULT_VM, KVM
+                                    zaps only leaf SPTEs that are within the
+                                    range of the memslot being deleted.
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 02880d5..20a9a37 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -11,6 +11,8 @@
- cpus_read_lock() is taken outside kvm_lock
+- kvm_usage_lock is taken outside cpus_read_lock()
+
- kvm->lock is taken outside vcpu->mutex
- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
@@ -24,6 +26,13 @@
are taken on the waiting side when modifying memslots, so MMU notifiers
must not take either kvm->slots_lock or kvm->slots_arch_lock.
+cpus_read_lock() vs kvm_lock:
+
+- Taking cpus_read_lock() outside of kvm_lock is problematic, despite that
+ being the official ordering, as it is quite easy to unknowingly trigger
+ cpus_read_lock() while holding kvm_lock. Use caution when walking vm_list,
+ e.g. avoid complex operations when possible.
+
For SRCU:
- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
@@ -227,10 +236,16 @@
:Type: mutex
:Arch: any
:Protects: - vm_list
- - kvm_usage_count
+
+``kvm_usage_lock``
+^^^^^^^^^^^^^^^^^^
+
+:Type: mutex
+:Arch: any
+:Protects: - kvm_usage_count
- hardware virtualization enable/disable
-:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
- enable/disable.
+:Comment: Exists to allow taking cpus_read_lock() while kvm_usage_count is
+ protected, which simplifies the virtualization enabling logic.
``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -290,11 +305,12 @@
wakeup.
``vendor_module_lock``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^
:Type: mutex
:Arch: x86
:Protects: loading a vendor module (kvm_amd or kvm_intel)
-:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
- taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
- many operations need to take cpu_hotplug_lock when loading a vendor module,
- e.g. updating static calls.
+:Comment: Exists because using kvm_lock leads to deadlock. kvm_lock is taken
+ in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while
+ cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many
+ operations need to take cpu_hotplug_lock when loading a vendor module, e.g.
+ updating static calls.
diff --git a/Documentation/virt/uml/user_mode_linux_howto_v2.rst b/Documentation/virt/uml/user_mode_linux_howto_v2.rst
index 2794244..584000b 100644
--- a/Documentation/virt/uml/user_mode_linux_howto_v2.rst
+++ b/Documentation/virt/uml/user_mode_linux_howto_v2.rst
@@ -217,6 +217,8 @@
+-----------+--------+------------------------------------+------------+
| fd | vector | dependent on fd type | varies |
+-----------+--------+------------------------------------+------------+
+| vde | vector | dep. on VDE VPN: Virt.Net Locator | varies |
++-----------+--------+------------------------------------+------------+
| tuntap | legacy | none | ~ 500Mbit |
+-----------+--------+------------------------------------+------------+
| daemon | legacy | none | ~ 450Mbit |
@@ -573,6 +575,41 @@
BESS transport does not require any special privileges.
+VDE vector transport
+--------------------
+
+Virtual Distributed Ethernet (VDE) is a project whose main goal is to provide
+highly flexible support for virtual networking.
+
+http://wiki.virtualsquare.org/#/tutorials/vdebasics
+
+Common usages of VDE include fast prototyping and teaching.
+
+Examples:
+
+ ``vecX:transport=vde,vnl=tap://tap0``
+
+use tap0
+
+ ``vecX:transport=vde,vnl=slirp://``
+
+use slirp
+
+ ``vec0:transport=vde,vnl=vde:///tmp/switch``
+
+connect to a vde switch
+
+ ``vecX:transport=\"vde,vnl=cmd://ssh remote.host //tmp/sshlirp\"``
+
+connect to a remote slirp (instant VPN: convert ssh to VPN, it uses sshlirp)
+https://github.com/virtualsquare/sshlirp
+
+ ``vec0:transport=vde,vnl=vxvde://234.0.0.1``
+
+connect to a local area cloud (all the UML nodes using the same
+multicast address running on hosts in the same multicast domain (LAN)
+will be automagically connected together to a virtual LAN).
+
Configuring Legacy transports
=============================
diff --git a/Documentation/watchdog/convert_drivers_to_kernel_api.rst b/Documentation/watchdog/convert_drivers_to_kernel_api.rst
index a1c3f03..e83609a 100644
--- a/Documentation/watchdog/convert_drivers_to_kernel_api.rst
+++ b/Documentation/watchdog/convert_drivers_to_kernel_api.rst
@@ -75,7 +75,6 @@
-static const struct file_operations s3c2410wdt_fops = {
- .owner = THIS_MODULE,
- - .llseek = no_llseek,
- .write = s3c2410wdt_write,
- .unlocked_ioctl = s3c2410wdt_ioctl,
- .open = s3c2410wdt_open,
diff --git a/MAINTAINERS b/MAINTAINERS
index 00716c1..c27f319 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5728,8 +5728,7 @@
S: Maintained
F: Documentation/driver-api/cxl
F: drivers/cxl/
-F: include/linux/einj-cxl.h
-F: include/linux/cxl-event.h
+F: include/cxl/
F: include/uapi/linux/cxl_mem.h
F: tools/testing/cxl/
@@ -15679,6 +15678,9 @@
MODULE SUPPORT
M: Luis Chamberlain <mcgrof@kernel.org>
+R: Petr Pavlu <petr.pavlu@suse.com>
+R: Sami Tolvanen <samitolvanen@google.com>
+R: Daniel Gomez <da.gomez@samsung.com>
L: linux-modules@vger.kernel.org
L: linux-kernel@vger.kernel.org
S: Maintained
@@ -19345,10 +19347,7 @@
F: include/linux/random.h
F: include/uapi/linux/random.h
F: drivers/virt/vmgenid.c
-F: include/vdso/getrandom.h
-F: lib/vdso/getrandom.c
-F: arch/x86/entry/vdso/vgetrandom*
-F: arch/x86/include/asm/vdso/getrandom*
+N: ^.*/vdso/[^/]*getrandom[^/]+$
RAPIDIO SUBSYSTEM
M: Matt Porter <mporter@kernel.crashing.org>
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe07641..a0d01c4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2164,7 +2164,7 @@
}
}
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
{
/*
* Most calls to this function are made with migration
@@ -2184,7 +2184,7 @@
return 0;
}
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
{
kvm_timer_cpu_down();
kvm_vgic_cpu_down();
@@ -2380,7 +2380,7 @@
/*
* The stub hypercalls are now disabled, so set our local flag to
- * prevent a later re-init attempt in kvm_arch_hardware_enable().
+ * prevent a later re-init attempt in kvm_arch_enable_virtualization_cpu().
*/
__this_cpu_write(kvm_hyp_initialized, 1);
preempt_enable();
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 0eb0436..bb35c34 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -25,6 +25,8 @@
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_DEVMAP
select ARCH_HAS_PTE_SPECIAL
+ select ARCH_HAS_SET_MEMORY
+ select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_INLINE_READ_LOCK if !PREEMPTION
select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
@@ -82,6 +84,7 @@
select GENERIC_CMOS_UPDATE
select GENERIC_CPU_AUTOPROBE
select GENERIC_CPU_DEVICES
+ select GENERIC_CPU_VULNERABILITIES
select GENERIC_ENTRY
select GENERIC_GETTIMEOFDAY
select GENERIC_IOREMAP if !ARCH_IOREMAP
@@ -147,7 +150,7 @@
select HAVE_LIVEPATCH
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI
- select HAVE_OBJTOOL if AS_HAS_EXPLICIT_RELOCS && AS_HAS_THIN_ADD_SUB && !CC_IS_CLANG
+ select HAVE_OBJTOOL if AS_HAS_EXPLICIT_RELOCS && AS_HAS_THIN_ADD_SUB
select HAVE_PCI
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
@@ -267,7 +270,7 @@
def_bool $(as-instr,movfcsr2gr \$t0$(comma)\$fcsr0)
config AS_HAS_THIN_ADD_SUB
- def_bool $(cc-option,-Wa$(comma)-mthin-add-sub)
+ def_bool $(cc-option,-Wa$(comma)-mthin-add-sub) || AS_IS_LLVM
config AS_HAS_LSX_EXTENSION
def_bool $(as-instr,vld \$vr0$(comma)\$a0$(comma)0)
diff --git a/arch/loongarch/include/asm/atomic.h b/arch/loongarch/include/asm/atomic.h
index 99af8b3..c86f0ab 100644
--- a/arch/loongarch/include/asm/atomic.h
+++ b/arch/loongarch/include/asm/atomic.h
@@ -15,6 +15,7 @@
#define __LL "ll.w "
#define __SC "sc.w "
#define __AMADD "amadd.w "
+#define __AMOR "amor.w "
#define __AMAND_DB "amand_db.w "
#define __AMOR_DB "amor_db.w "
#define __AMXOR_DB "amxor_db.w "
@@ -22,6 +23,7 @@
#define __LL "ll.d "
#define __SC "sc.d "
#define __AMADD "amadd.d "
+#define __AMOR "amor.d "
#define __AMAND_DB "amand_db.d "
#define __AMOR_DB "amor_db.d "
#define __AMXOR_DB "amxor_db.d "
diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
index 16a716f..fc83bb3 100644
--- a/arch/loongarch/include/asm/cpu-features.h
+++ b/arch/loongarch/include/asm/cpu-features.h
@@ -51,6 +51,7 @@
#define cpu_has_lbt_mips cpu_opt(LOONGARCH_CPU_LBT_MIPS)
#define cpu_has_lbt (cpu_has_lbt_x86|cpu_has_lbt_arm|cpu_has_lbt_mips)
#define cpu_has_csr cpu_opt(LOONGARCH_CPU_CSR)
+#define cpu_has_iocsr cpu_opt(LOONGARCH_CPU_IOCSR)
#define cpu_has_tlb cpu_opt(LOONGARCH_CPU_TLB)
#define cpu_has_watch cpu_opt(LOONGARCH_CPU_WATCH)
#define cpu_has_vint cpu_opt(LOONGARCH_CPU_VINT)
@@ -65,6 +66,7 @@
#define cpu_has_guestid cpu_opt(LOONGARCH_CPU_GUESTID)
#define cpu_has_hypervisor cpu_opt(LOONGARCH_CPU_HYPERVISOR)
#define cpu_has_ptw cpu_opt(LOONGARCH_CPU_PTW)
+#define cpu_has_lspw cpu_opt(LOONGARCH_CPU_LSPW)
#define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
#endif /* __ASM_CPU_FEATURES_H */
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
index 843f9c4..98cf4d7 100644
--- a/arch/loongarch/include/asm/cpu.h
+++ b/arch/loongarch/include/asm/cpu.h
@@ -87,19 +87,21 @@
#define CPU_FEATURE_LBT_MIPS 12 /* CPU has MIPS Binary Translation */
#define CPU_FEATURE_TLB 13 /* CPU has TLB */
#define CPU_FEATURE_CSR 14 /* CPU has CSR */
-#define CPU_FEATURE_WATCH 15 /* CPU has watchpoint registers */
-#define CPU_FEATURE_VINT 16 /* CPU has vectored interrupts */
-#define CPU_FEATURE_CSRIPI 17 /* CPU has CSR-IPI */
-#define CPU_FEATURE_EXTIOI 18 /* CPU has EXT-IOI */
-#define CPU_FEATURE_PREFETCH 19 /* CPU has prefetch instructions */
-#define CPU_FEATURE_PMP 20 /* CPU has perfermance counter */
-#define CPU_FEATURE_SCALEFREQ 21 /* CPU supports cpufreq scaling */
-#define CPU_FEATURE_FLATMODE 22 /* CPU has flat mode */
-#define CPU_FEATURE_EIODECODE 23 /* CPU has EXTIOI interrupt pin decode mode */
-#define CPU_FEATURE_GUESTID 24 /* CPU has GuestID feature */
-#define CPU_FEATURE_HYPERVISOR 25 /* CPU has hypervisor (running in VM) */
-#define CPU_FEATURE_PTW 26 /* CPU has hardware page table walker */
-#define CPU_FEATURE_AVECINT 27 /* CPU has avec interrupt */
+#define CPU_FEATURE_IOCSR 15 /* CPU has IOCSR */
+#define CPU_FEATURE_WATCH 16 /* CPU has watchpoint registers */
+#define CPU_FEATURE_VINT 17 /* CPU has vectored interrupts */
+#define CPU_FEATURE_CSRIPI 18 /* CPU has CSR-IPI */
+#define CPU_FEATURE_EXTIOI 19 /* CPU has EXT-IOI */
+#define CPU_FEATURE_PREFETCH 20 /* CPU has prefetch instructions */
+#define CPU_FEATURE_PMP			21	/* CPU has performance counter */
+#define CPU_FEATURE_SCALEFREQ 22 /* CPU supports cpufreq scaling */
+#define CPU_FEATURE_FLATMODE 23 /* CPU has flat mode */
+#define CPU_FEATURE_EIODECODE 24 /* CPU has EXTIOI interrupt pin decode mode */
+#define CPU_FEATURE_GUESTID 25 /* CPU has GuestID feature */
+#define CPU_FEATURE_HYPERVISOR 26 /* CPU has hypervisor (running in VM) */
+#define CPU_FEATURE_PTW 27 /* CPU has hardware page table walker */
+#define CPU_FEATURE_LSPW 28 /* CPU has LSPW (lddir/ldpte instructions) */
+#define CPU_FEATURE_AVECINT 29 /* CPU has AVEC interrupt */
#define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
#define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
@@ -115,6 +117,7 @@
#define LOONGARCH_CPU_LBT_ARM BIT_ULL(CPU_FEATURE_LBT_ARM)
#define LOONGARCH_CPU_LBT_MIPS BIT_ULL(CPU_FEATURE_LBT_MIPS)
#define LOONGARCH_CPU_TLB BIT_ULL(CPU_FEATURE_TLB)
+#define LOONGARCH_CPU_IOCSR BIT_ULL(CPU_FEATURE_IOCSR)
#define LOONGARCH_CPU_CSR BIT_ULL(CPU_FEATURE_CSR)
#define LOONGARCH_CPU_WATCH BIT_ULL(CPU_FEATURE_WATCH)
#define LOONGARCH_CPU_VINT BIT_ULL(CPU_FEATURE_VINT)
@@ -128,6 +131,7 @@
#define LOONGARCH_CPU_GUESTID BIT_ULL(CPU_FEATURE_GUESTID)
#define LOONGARCH_CPU_HYPERVISOR BIT_ULL(CPU_FEATURE_HYPERVISOR)
#define LOONGARCH_CPU_PTW BIT_ULL(CPU_FEATURE_PTW)
+#define LOONGARCH_CPU_LSPW BIT_ULL(CPU_FEATURE_LSPW)
#define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
#endif /* _ASM_CPU_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 04bf1a7..2654241 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -62,6 +62,7 @@
#define LOONGARCH_CPUCFG1 0x1
#define CPUCFG1_ISGR32 BIT(0)
#define CPUCFG1_ISGR64 BIT(1)
+#define CPUCFG1_ISA GENMASK(1, 0)
#define CPUCFG1_PAGING BIT(2)
#define CPUCFG1_IOCSR BIT(3)
#define CPUCFG1_PABITS GENMASK(11, 4)
diff --git a/arch/loongarch/include/asm/mmu_context.h b/arch/loongarch/include/asm/mmu_context.h
index 9f97c34..304363b 100644
--- a/arch/loongarch/include/asm/mmu_context.h
+++ b/arch/loongarch/include/asm/mmu_context.h
@@ -49,12 +49,12 @@
/* Normal, classic get_new_mmu_context */
static inline void
-get_new_mmu_context(struct mm_struct *mm, unsigned long cpu)
+get_new_mmu_context(struct mm_struct *mm, unsigned long cpu, bool *need_flush)
{
u64 asid = asid_cache(cpu);
if (!((++asid) & cpu_asid_mask(&cpu_data[cpu])))
- local_flush_tlb_user(); /* start new asid cycle */
+ *need_flush = true; /* start new asid cycle */
cpu_context(cpu, mm) = asid_cache(cpu) = asid;
}
@@ -74,21 +74,34 @@
return 0;
}
+static inline void atomic_update_pgd_asid(unsigned long asid, unsigned long pgdl)
+{
+ __asm__ __volatile__(
+ "csrwr %[pgdl_val], %[pgdl_reg] \n\t"
+ "csrwr %[asid_val], %[asid_reg] \n\t"
+ : [asid_val] "+r" (asid), [pgdl_val] "+r" (pgdl)
+ : [asid_reg] "i" (LOONGARCH_CSR_ASID), [pgdl_reg] "i" (LOONGARCH_CSR_PGDL)
+ : "memory"
+ );
+}
+
static inline void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
struct task_struct *tsk)
{
+ bool need_flush = false;
unsigned int cpu = smp_processor_id();
/* Check if our ASID is of an older version and thus invalid */
if (!asid_valid(next, cpu))
- get_new_mmu_context(next, cpu);
-
- write_csr_asid(cpu_asid(cpu, next));
+ get_new_mmu_context(next, cpu, &need_flush);
if (next != &init_mm)
- csr_write64((unsigned long)next->pgd, LOONGARCH_CSR_PGDL);
+ atomic_update_pgd_asid(cpu_asid(cpu, next), (unsigned long)next->pgd);
else
- csr_write64((unsigned long)invalid_pg_dir, LOONGARCH_CSR_PGDL);
+ atomic_update_pgd_asid(cpu_asid(cpu, next), (unsigned long)invalid_pg_dir);
+
+ if (need_flush)
+ local_flush_tlb_user(); /* Flush tlb after update ASID */
/*
* Mark current->active_mm as not "active" anymore.
@@ -135,9 +148,15 @@
asid = read_csr_asid() & cpu_asid_mask(¤t_cpu_data);
if (asid == cpu_asid(cpu, mm)) {
+ bool need_flush = false;
+
if (!current->mm || (current->mm == mm)) {
- get_new_mmu_context(mm, cpu);
+ get_new_mmu_context(mm, cpu, &need_flush);
+
write_csr_asid(cpu_asid(cpu, mm));
+ if (need_flush)
+ local_flush_tlb_user(); /* Flush tlb after update ASID */
+
goto out;
}
}
diff --git a/arch/loongarch/include/asm/percpu.h b/arch/loongarch/include/asm/percpu.h
index 8f290e5..87be9b1 100644
--- a/arch/loongarch/include/asm/percpu.h
+++ b/arch/loongarch/include/asm/percpu.h
@@ -68,75 +68,6 @@
PERCPU_OP(or, or, |)
#undef PERCPU_OP
-static __always_inline unsigned long __percpu_read(void __percpu *ptr, int size)
-{
- unsigned long ret;
-
- switch (size) {
- case 1:
- __asm__ __volatile__ ("ldx.b %[ret], $r21, %[ptr] \n"
- : [ret] "=&r"(ret)
- : [ptr] "r"(ptr)
- : "memory");
- break;
- case 2:
- __asm__ __volatile__ ("ldx.h %[ret], $r21, %[ptr] \n"
- : [ret] "=&r"(ret)
- : [ptr] "r"(ptr)
- : "memory");
- break;
- case 4:
- __asm__ __volatile__ ("ldx.w %[ret], $r21, %[ptr] \n"
- : [ret] "=&r"(ret)
- : [ptr] "r"(ptr)
- : "memory");
- break;
- case 8:
- __asm__ __volatile__ ("ldx.d %[ret], $r21, %[ptr] \n"
- : [ret] "=&r"(ret)
- : [ptr] "r"(ptr)
- : "memory");
- break;
- default:
- ret = 0;
- BUILD_BUG();
- }
-
- return ret;
-}
-
-static __always_inline void __percpu_write(void __percpu *ptr, unsigned long val, int size)
-{
- switch (size) {
- case 1:
- __asm__ __volatile__("stx.b %[val], $r21, %[ptr] \n"
- :
- : [val] "r" (val), [ptr] "r" (ptr)
- : "memory");
- break;
- case 2:
- __asm__ __volatile__("stx.h %[val], $r21, %[ptr] \n"
- :
- : [val] "r" (val), [ptr] "r" (ptr)
- : "memory");
- break;
- case 4:
- __asm__ __volatile__("stx.w %[val], $r21, %[ptr] \n"
- :
- : [val] "r" (val), [ptr] "r" (ptr)
- : "memory");
- break;
- case 8:
- __asm__ __volatile__("stx.d %[val], $r21, %[ptr] \n"
- :
- : [val] "r" (val), [ptr] "r" (ptr)
- : "memory");
- break;
- default:
- BUILD_BUG();
- }
-}
-
static __always_inline unsigned long __percpu_xchg(void *ptr, unsigned long val, int size)
{
switch (size) {
@@ -157,6 +88,33 @@
return 0;
}
+#define __pcpu_op_1(op) op ".b "
+#define __pcpu_op_2(op) op ".h "
+#define __pcpu_op_4(op) op ".w "
+#define __pcpu_op_8(op) op ".d "
+
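+/*
+ * LoongArch reserves $r21 as the per-CPU base register; _percpu_read() and
+ * _percpu_write() below access a per-CPU variable relative to it, with the
+ * size argument (1/2/4/8) selecting the .b/.h/.w/.d form of ldx/stx.
+ */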
+#define _percpu_read(size, _pcp) \
+({ \
+ typeof(_pcp) __pcp_ret; \
+ \
+ __asm__ __volatile__( \
+ __pcpu_op_##size("ldx") "%[ret], $r21, %[ptr] \n" \
+ : [ret] "=&r"(__pcp_ret) \
+ : [ptr] "r"(&(_pcp)) \
+ : "memory"); \
+ \
+ __pcp_ret; \
+})
+
+#define _percpu_write(size, _pcp, _val) \
+do { \
+ __asm__ __volatile__( \
+ __pcpu_op_##size("stx") "%[val], $r21, %[ptr] \n" \
+ : \
+ : [val] "r"(_val), [ptr] "r"(&(_pcp)) \
+ : "memory"); \
+} while (0)
+
/* this_cpu_cmpxchg */
#define _protect_cmpxchg_local(pcp, o, n) \
({ \
@@ -167,18 +125,6 @@
__ret; \
})
-#define _percpu_read(pcp) \
-({ \
- typeof(pcp) __retval; \
- __retval = (typeof(pcp))__percpu_read(&(pcp), sizeof(pcp)); \
- __retval; \
-})
-
-#define _percpu_write(pcp, val) \
-do { \
- __percpu_write(&(pcp), (unsigned long)(val), sizeof(pcp)); \
-} while (0) \
-
#define _pcp_protect(operation, pcp, val) \
({ \
typeof(pcp) __retval; \
@@ -215,15 +161,15 @@
#define this_cpu_or_4(pcp, val) _percpu_or(pcp, val)
#define this_cpu_or_8(pcp, val) _percpu_or(pcp, val)
-#define this_cpu_read_1(pcp) _percpu_read(pcp)
-#define this_cpu_read_2(pcp) _percpu_read(pcp)
-#define this_cpu_read_4(pcp) _percpu_read(pcp)
-#define this_cpu_read_8(pcp) _percpu_read(pcp)
+#define this_cpu_read_1(pcp) _percpu_read(1, pcp)
+#define this_cpu_read_2(pcp) _percpu_read(2, pcp)
+#define this_cpu_read_4(pcp) _percpu_read(4, pcp)
+#define this_cpu_read_8(pcp) _percpu_read(8, pcp)
-#define this_cpu_write_1(pcp, val) _percpu_write(pcp, val)
-#define this_cpu_write_2(pcp, val) _percpu_write(pcp, val)
-#define this_cpu_write_4(pcp, val) _percpu_write(pcp, val)
-#define this_cpu_write_8(pcp, val) _percpu_write(pcp, val)
+#define this_cpu_write_1(pcp, val) _percpu_write(1, pcp, val)
+#define this_cpu_write_2(pcp, val) _percpu_write(2, pcp, val)
+#define this_cpu_write_4(pcp, val) _percpu_write(4, pcp, val)
+#define this_cpu_write_8(pcp, val) _percpu_write(8, pcp, val)
#define this_cpu_xchg_1(pcp, val) _percpu_xchg(pcp, val)
#define this_cpu_xchg_2(pcp, val) _percpu_xchg(pcp, val)
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 85431f2..9965f52 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -331,29 +331,23 @@
* Make sure the buddy is global too (if it's !none,
* it better already be global)
*/
+ if (pte_none(ptep_get(buddy))) {
#ifdef CONFIG_SMP
- /*
- * For SMP, multiple CPUs can race, so we need to do
- * this atomically.
- */
- unsigned long page_global = _PAGE_GLOBAL;
- unsigned long tmp;
+ /*
+ * For SMP, multiple CPUs can race, so we need
+ * to do this atomically.
+ */
+ __asm__ __volatile__(
+ __AMOR "$zero, %[global], %[buddy] \n"
+ : [buddy] "+ZB" (buddy->pte)
+ : [global] "r" (_PAGE_GLOBAL)
+ : "memory");
- __asm__ __volatile__ (
- "1:" __LL "%[tmp], %[buddy] \n"
- " bnez %[tmp], 2f \n"
- " or %[tmp], %[tmp], %[global] \n"
- __SC "%[tmp], %[buddy] \n"
- " beqz %[tmp], 1b \n"
- " nop \n"
- "2: \n"
- __WEAK_LLSC_MB
- : [buddy] "+m" (buddy->pte), [tmp] "=&r" (tmp)
- : [global] "r" (page_global));
+ DBAR(0b11000); /* o_wrw = 0b11000 */
#else /* !CONFIG_SMP */
- if (pte_none(ptep_get(buddy)))
WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
#endif /* CONFIG_SMP */
+ }
}
}
diff --git a/arch/loongarch/include/asm/set_memory.h b/arch/loongarch/include/asm/set_memory.h
new file mode 100644
index 0000000..d70505b
--- /dev/null
+++ b/arch/loongarch/include/asm/set_memory.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024 Loongson Technology Corporation Limited
+ */
+
+#ifndef _ASM_LOONGARCH_SET_MEMORY_H
+#define _ASM_LOONGARCH_SET_MEMORY_H
+
+/*
+ * Functions to change memory attributes.
+ */
+int set_memory_x(unsigned long addr, int numpages);
+int set_memory_nx(unsigned long addr, int numpages);
+int set_memory_ro(unsigned long addr, int numpages);
+int set_memory_rw(unsigned long addr, int numpages);
+
+bool kernel_page_present(struct page *page);
+int set_direct_map_default_noflush(struct page *page);
+int set_direct_map_invalid_noflush(struct page *page);
+
+#endif /* _ASM_LOONGARCH_SET_MEMORY_H */
diff --git a/arch/loongarch/include/uapi/asm/hwcap.h b/arch/loongarch/include/uapi/asm/hwcap.h
index 6955a7c..2b34e56c 100644
--- a/arch/loongarch/include/uapi/asm/hwcap.h
+++ b/arch/loongarch/include/uapi/asm/hwcap.h
@@ -17,5 +17,6 @@
#define HWCAP_LOONGARCH_LBT_ARM (1 << 11)
#define HWCAP_LOONGARCH_LBT_MIPS (1 << 12)
#define HWCAP_LOONGARCH_PTW (1 << 13)
+#define HWCAP_LOONGARCH_LSPW (1 << 14)
#endif /* _UAPI_ASM_HWCAP_H */
diff --git a/arch/loongarch/include/uapi/asm/sigcontext.h b/arch/loongarch/include/uapi/asm/sigcontext.h
index 6c22f61..5cd1212 100644
--- a/arch/loongarch/include/uapi/asm/sigcontext.h
+++ b/arch/loongarch/include/uapi/asm/sigcontext.h
@@ -9,7 +9,6 @@
#define _UAPI_ASM_SIGCONTEXT_H
#include <linux/types.h>
-#include <linux/posix_types.h>
/* FP context was used */
#define SC_USED_FP (1 << 0)
diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
index 929a497..f1a74b8 100644
--- a/arch/loongarch/kernel/acpi.c
+++ b/arch/loongarch/kernel/acpi.c
@@ -9,6 +9,7 @@
#include <linux/init.h>
#include <linux/acpi.h>
+#include <linux/efi-bgrt.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/memblock.h>
@@ -212,6 +213,9 @@
/* Do not enable ACPI SPCR console by default */
acpi_parse_spcr(earlycon_acpi_spcr_enable, false);
+ if (IS_ENABLED(CONFIG_ACPI_BGRT))
+ acpi_table_parse(ACPI_SIG_BGRT, acpi_parse_bgrt);
+
return;
fdt_earlycon:
diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
index 14f0449..cbce099 100644
--- a/arch/loongarch/kernel/cpu-probe.c
+++ b/arch/loongarch/kernel/cpu-probe.c
@@ -91,12 +91,30 @@
unsigned int config;
unsigned long asid_mask;
- c->options = LOONGARCH_CPU_CPUCFG | LOONGARCH_CPU_CSR |
- LOONGARCH_CPU_TLB | LOONGARCH_CPU_VINT | LOONGARCH_CPU_WATCH;
+ c->options = LOONGARCH_CPU_CPUCFG | LOONGARCH_CPU_CSR | LOONGARCH_CPU_VINT;
elf_hwcap = HWCAP_LOONGARCH_CPUCFG;
config = read_cpucfg(LOONGARCH_CPUCFG1);
+
+ switch (config & CPUCFG1_ISA) {
+ case 0:
+ set_isa(c, LOONGARCH_CPU_ISA_LA32R);
+ break;
+ case 1:
+ set_isa(c, LOONGARCH_CPU_ISA_LA32S);
+ break;
+ case 2:
+ set_isa(c, LOONGARCH_CPU_ISA_LA64);
+ break;
+ default:
+ pr_warn("Warning: unknown ISA level\n");
+ }
+
+ if (config & CPUCFG1_PAGING)
+ c->options |= LOONGARCH_CPU_TLB;
+ if (config & CPUCFG1_IOCSR)
+ c->options |= LOONGARCH_CPU_IOCSR;
if (config & CPUCFG1_UAL) {
c->options |= LOONGARCH_CPU_UAL;
elf_hwcap |= HWCAP_LOONGARCH_UAL;
@@ -139,6 +157,10 @@
c->options |= LOONGARCH_CPU_PTW;
elf_hwcap |= HWCAP_LOONGARCH_PTW;
}
+ if (config & CPUCFG2_LSPW) {
+ c->options |= LOONGARCH_CPU_LSPW;
+ elf_hwcap |= HWCAP_LOONGARCH_LSPW;
+ }
if (config & CPUCFG2_LVZP) {
c->options |= LOONGARCH_CPU_LVZ;
elf_hwcap |= HWCAP_LOONGARCH_LVZ;
@@ -162,22 +184,6 @@
if (config & CPUCFG6_PMP)
c->options |= LOONGARCH_CPU_PMP;
- config = iocsr_read32(LOONGARCH_IOCSR_FEATURES);
- if (config & IOCSRF_CSRIPI)
- c->options |= LOONGARCH_CPU_CSRIPI;
- if (config & IOCSRF_EXTIOI)
- c->options |= LOONGARCH_CPU_EXTIOI;
- if (config & IOCSRF_FREQSCALE)
- c->options |= LOONGARCH_CPU_SCALEFREQ;
- if (config & IOCSRF_FLATMODE)
- c->options |= LOONGARCH_CPU_FLATMODE;
- if (config & IOCSRF_EIODECODE)
- c->options |= LOONGARCH_CPU_EIODECODE;
- if (config & IOCSRF_AVEC)
- c->options |= LOONGARCH_CPU_AVECINT;
- if (config & IOCSRF_VM)
- c->options |= LOONGARCH_CPU_HYPERVISOR;
-
config = csr_read32(LOONGARCH_CSR_ASID);
config = (config & CSR_ASID_BIT) >> CSR_ASID_BIT_SHIFT;
asid_mask = GENMASK(config - 1, 0);
@@ -210,6 +216,9 @@
default:
pr_warn("Warning: unknown TLB type\n");
}
+
+ if (get_num_brps() + get_num_wrps())
+ c->options |= LOONGARCH_CPU_WATCH;
}
#define MAX_NAME_LEN 32
@@ -220,8 +229,45 @@
static inline void cpu_probe_loongson(struct cpuinfo_loongarch *c, unsigned int cpu)
{
+ uint32_t config;
uint64_t *vendor = (void *)(&cpu_full_name[VENDOR_OFFSET]);
uint64_t *cpuname = (void *)(&cpu_full_name[CPUNAME_OFFSET]);
+ const char *core_name = "Unknown";
+
+ switch (BIT(fls(c->isa_level) - 1)) {
+ case LOONGARCH_CPU_ISA_LA32R:
+ case LOONGARCH_CPU_ISA_LA32S:
+ c->cputype = CPU_LOONGSON32;
+ __cpu_family[cpu] = "Loongson-32bit";
+ break;
+ case LOONGARCH_CPU_ISA_LA64:
+ c->cputype = CPU_LOONGSON64;
+ __cpu_family[cpu] = "Loongson-64bit";
+ break;
+ }
+
+ switch (c->processor_id & PRID_SERIES_MASK) {
+ case PRID_SERIES_LA132:
+ core_name = "LA132";
+ break;
+ case PRID_SERIES_LA264:
+ core_name = "LA264";
+ break;
+ case PRID_SERIES_LA364:
+ core_name = "LA364";
+ break;
+ case PRID_SERIES_LA464:
+ core_name = "LA464";
+ break;
+ case PRID_SERIES_LA664:
+ core_name = "LA664";
+ break;
+ }
+
+ pr_info("%s Processor probed (%s Core)\n", __cpu_family[cpu], core_name);
+
+ if (!cpu_has_iocsr)
+ return;
if (!__cpu_full_name[cpu])
__cpu_full_name[cpu] = cpu_full_name;
@@ -229,43 +275,21 @@
*vendor = iocsr_read64(LOONGARCH_IOCSR_VENDOR);
*cpuname = iocsr_read64(LOONGARCH_IOCSR_CPUNAME);
- switch (c->processor_id & PRID_SERIES_MASK) {
- case PRID_SERIES_LA132:
- c->cputype = CPU_LOONGSON32;
- set_isa(c, LOONGARCH_CPU_ISA_LA32S);
- __cpu_family[cpu] = "Loongson-32bit";
- pr_info("32-bit Loongson Processor probed (LA132 Core)\n");
- break;
- case PRID_SERIES_LA264:
- c->cputype = CPU_LOONGSON64;
- set_isa(c, LOONGARCH_CPU_ISA_LA64);
- __cpu_family[cpu] = "Loongson-64bit";
- pr_info("64-bit Loongson Processor probed (LA264 Core)\n");
- break;
- case PRID_SERIES_LA364:
- c->cputype = CPU_LOONGSON64;
- set_isa(c, LOONGARCH_CPU_ISA_LA64);
- __cpu_family[cpu] = "Loongson-64bit";
- pr_info("64-bit Loongson Processor probed (LA364 Core)\n");
- break;
- case PRID_SERIES_LA464:
- c->cputype = CPU_LOONGSON64;
- set_isa(c, LOONGARCH_CPU_ISA_LA64);
- __cpu_family[cpu] = "Loongson-64bit";
- pr_info("64-bit Loongson Processor probed (LA464 Core)\n");
- break;
- case PRID_SERIES_LA664:
- c->cputype = CPU_LOONGSON64;
- set_isa(c, LOONGARCH_CPU_ISA_LA64);
- __cpu_family[cpu] = "Loongson-64bit";
- pr_info("64-bit Loongson Processor probed (LA664 Core)\n");
- break;
- default: /* Default to 64 bit */
- c->cputype = CPU_LOONGSON64;
- set_isa(c, LOONGARCH_CPU_ISA_LA64);
- __cpu_family[cpu] = "Loongson-64bit";
- pr_info("64-bit Loongson Processor probed (Unknown Core)\n");
- }
+ config = iocsr_read32(LOONGARCH_IOCSR_FEATURES);
+ if (config & IOCSRF_CSRIPI)
+ c->options |= LOONGARCH_CPU_CSRIPI;
+ if (config & IOCSRF_EXTIOI)
+ c->options |= LOONGARCH_CPU_EXTIOI;
+ if (config & IOCSRF_FREQSCALE)
+ c->options |= LOONGARCH_CPU_SCALEFREQ;
+ if (config & IOCSRF_FLATMODE)
+ c->options |= LOONGARCH_CPU_FLATMODE;
+ if (config & IOCSRF_EIODECODE)
+ c->options |= LOONGARCH_CPU_EIODECODE;
+ if (config & IOCSRF_AVEC)
+ c->options |= LOONGARCH_CPU_AVECINT;
+ if (config & IOCSRF_VM)
+ c->options |= LOONGARCH_CPU_HYPERVISOR;
}
#ifdef CONFIG_64BIT
diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
index 0d33cbc..6ce46d9 100644
--- a/arch/loongarch/kernel/proc.c
+++ b/arch/loongarch/kernel/proc.c
@@ -31,6 +31,7 @@
static int show_cpuinfo(struct seq_file *m, void *v)
{
unsigned long n = (unsigned long) v - 1;
+ unsigned int isa = cpu_data[n].isa_level;
unsigned int version = cpu_data[n].processor_id & 0xff;
unsigned int fp_version = cpu_data[n].fpu_vers;
struct proc_cpuinfo_notifier_args proc_cpuinfo_notifier_args;
@@ -64,9 +65,11 @@
cpu_pabits + 1, cpu_vabits + 1);
seq_printf(m, "ISA\t\t\t:");
- if (cpu_has_loongarch32)
- seq_printf(m, " loongarch32");
- if (cpu_has_loongarch64)
+ if (isa & LOONGARCH_CPU_ISA_LA32R)
+ seq_printf(m, " loongarch32r");
+ if (isa & LOONGARCH_CPU_ISA_LA32S)
+ seq_printf(m, " loongarch32s");
+ if (isa & LOONGARCH_CPU_ISA_LA64)
seq_printf(m, " loongarch64");
seq_printf(m, "\n");
@@ -81,6 +84,7 @@
if (cpu_has_complex) seq_printf(m, " complex");
if (cpu_has_crypto) seq_printf(m, " crypto");
if (cpu_has_ptw) seq_printf(m, " ptw");
+ if (cpu_has_lspw) seq_printf(m, " lspw");
if (cpu_has_lvz) seq_printf(m, " lvz");
if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
diff --git a/arch/loongarch/kernel/syscall.c b/arch/loongarch/kernel/syscall.c
index ba5d093..168bd97 100644
--- a/arch/loongarch/kernel/syscall.c
+++ b/arch/loongarch/kernel/syscall.c
@@ -79,7 +79,3 @@
syscall_exit_to_user_mode(regs);
}
-
-#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
-STACK_FRAME_NON_STANDARD(do_syscall);
-#endif
diff --git a/arch/loongarch/kvm/main.c b/arch/loongarch/kvm/main.c
index 844736b..27e9b94 100644
--- a/arch/loongarch/kvm/main.c
+++ b/arch/loongarch/kvm/main.c
@@ -261,7 +261,7 @@
return -ENOIOCTLCMD;
}
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
{
unsigned long env, gcfg = 0;
@@ -300,7 +300,7 @@
return 0;
}
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
{
write_csr_gcfg(0);
write_csr_gstat(0);
diff --git a/arch/loongarch/mm/Makefile b/arch/loongarch/mm/Makefile
index e4d1e58..278be2c 100644
--- a/arch/loongarch/mm/Makefile
+++ b/arch/loongarch/mm/Makefile
@@ -4,7 +4,8 @@
#
obj-y += init.o cache.o tlb.o tlbex.o extable.o \
- fault.o ioremap.o maccess.o mmap.o pgtable.o page.o
+ fault.o ioremap.o maccess.o mmap.o pgtable.o \
+ page.o pageattr.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_KASAN) += kasan_init.o
diff --git a/arch/loongarch/mm/fault.c b/arch/loongarch/mm/fault.c
index 97b40de..deefd96 100644
--- a/arch/loongarch/mm/fault.c
+++ b/arch/loongarch/mm/fault.c
@@ -31,11 +31,52 @@
int show_unhandled_signals = 1;
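+/*
+ * A kernel fault is "spurious" if the current kernel page tables already
+ * permit the access (e.g. because permissions were relaxed via
+ * set_memory_*()); in that case nothing needs to be done beyond retrying.
+ */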
+static int __kprobes spurious_fault(unsigned long write, unsigned long address)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ if (!(address & __UA_LIMIT))
+ return 0;
+
+ pgd = pgd_offset_k(address);
+ if (!pgd_present(pgdp_get(pgd)))
+ return 0;
+
+ p4d = p4d_offset(pgd, address);
+ if (!p4d_present(p4dp_get(p4d)))
+ return 0;
+
+ pud = pud_offset(p4d, address);
+ if (!pud_present(pudp_get(pud)))
+ return 0;
+
+ pmd = pmd_offset(pud, address);
+ if (!pmd_present(pmdp_get(pmd)))
+ return 0;
+
+ if (pmd_leaf(*pmd)) {
+ return write ? pmd_write(pmdp_get(pmd)) : 1;
+ } else {
+ pte = pte_offset_kernel(pmd, address);
+ if (!pte_present(ptep_get(pte)))
+ return 0;
+
+ return write ? pte_write(ptep_get(pte)) : 1;
+ }
+}
+
static void __kprobes no_context(struct pt_regs *regs,
unsigned long write, unsigned long address)
{
const int field = sizeof(unsigned long) * 2;
+ if (spurious_fault(write, address))
+ return;
+
/* Are we prepared to handle this kernel fault? */
if (fixup_exception(regs))
return;
diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
new file mode 100644
index 0000000..ffd8d76
--- /dev/null
+++ b/arch/loongarch/mm/pageattr.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Loongson Technology Corporation Limited
+ */
+
+#include <linux/pagewalk.h>
+#include <linux/pgtable.h>
+#include <asm/set_memory.h>
+#include <asm/tlbflush.h>
+
+struct pageattr_masks {
+ pgprot_t set_mask;
+ pgprot_t clear_mask;
+};
+
+static unsigned long set_pageattr_masks(unsigned long val, struct mm_walk *walk)
+{
+ unsigned long new_val = val;
+ struct pageattr_masks *masks = walk->private;
+
+ new_val &= ~(pgprot_val(masks->clear_mask));
+ new_val |= (pgprot_val(masks->set_mask));
+
+ return new_val;
+}
+
+static int pageattr_pgd_entry(pgd_t *pgd, unsigned long addr,
+ unsigned long next, struct mm_walk *walk)
+{
+ pgd_t val = pgdp_get(pgd);
+
+ if (pgd_leaf(val)) {
+ val = __pgd(set_pageattr_masks(pgd_val(val), walk));
+ set_pgd(pgd, val);
+ }
+
+ return 0;
+}
+
+static int pageattr_p4d_entry(p4d_t *p4d, unsigned long addr,
+ unsigned long next, struct mm_walk *walk)
+{
+ p4d_t val = p4dp_get(p4d);
+
+ if (p4d_leaf(val)) {
+ val = __p4d(set_pageattr_masks(p4d_val(val), walk));
+ set_p4d(p4d, val);
+ }
+
+ return 0;
+}
+
+static int pageattr_pud_entry(pud_t *pud, unsigned long addr,
+ unsigned long next, struct mm_walk *walk)
+{
+ pud_t val = pudp_get(pud);
+
+ if (pud_leaf(val)) {
+ val = __pud(set_pageattr_masks(pud_val(val), walk));
+ set_pud(pud, val);
+ }
+
+ return 0;
+}
+
+static int pageattr_pmd_entry(pmd_t *pmd, unsigned long addr,
+ unsigned long next, struct mm_walk *walk)
+{
+ pmd_t val = pmdp_get(pmd);
+
+ if (pmd_leaf(val)) {
+ val = __pmd(set_pageattr_masks(pmd_val(val), walk));
+ set_pmd(pmd, val);
+ }
+
+ return 0;
+}
+
+static int pageattr_pte_entry(pte_t *pte, unsigned long addr,
+ unsigned long next, struct mm_walk *walk)
+{
+ pte_t val = ptep_get(pte);
+
+ val = __pte(set_pageattr_masks(pte_val(val), walk));
+ set_pte(pte, val);
+
+ return 0;
+}
+
+static int pageattr_pte_hole(unsigned long addr, unsigned long next,
+ int depth, struct mm_walk *walk)
+{
+ return 0;
+}
+
+static const struct mm_walk_ops pageattr_ops = {
+ .pgd_entry = pageattr_pgd_entry,
+ .p4d_entry = pageattr_p4d_entry,
+ .pud_entry = pageattr_pud_entry,
+ .pmd_entry = pageattr_pmd_entry,
+ .pte_entry = pageattr_pte_entry,
+ .pte_hole = pageattr_pte_hole,
+ .walk_lock = PGWALK_RDLOCK,
+};
+
+static int __set_memory(unsigned long addr, int numpages, pgprot_t set_mask, pgprot_t clear_mask)
+{
+ int ret;
+ unsigned long start = addr;
+ unsigned long end = start + PAGE_SIZE * numpages;
+ struct pageattr_masks masks = {
+ .set_mask = set_mask,
+ .clear_mask = clear_mask
+ };
+
+ if (!numpages)
+ return 0;
+
+ mmap_write_lock(&init_mm);
+ ret = walk_page_range_novma(&init_mm, start, end, &pageattr_ops, NULL, &masks);
+ mmap_write_unlock(&init_mm);
+
+ flush_tlb_kernel_range(start, end);
+
+ return ret;
+}
+
+int set_memory_x(unsigned long addr, int numpages)
+{
+ if (addr < vm_map_base)
+ return 0;
+
+ return __set_memory(addr, numpages, __pgprot(0), __pgprot(_PAGE_NO_EXEC));
+}
+
+int set_memory_nx(unsigned long addr, int numpages)
+{
+ if (addr < vm_map_base)
+ return 0;
+
+ return __set_memory(addr, numpages, __pgprot(_PAGE_NO_EXEC), __pgprot(0));
+}
+
+int set_memory_ro(unsigned long addr, int numpages)
+{
+ if (addr < vm_map_base)
+ return 0;
+
+ return __set_memory(addr, numpages, __pgprot(0), __pgprot(_PAGE_WRITE | _PAGE_DIRTY));
+}
+
+int set_memory_rw(unsigned long addr, int numpages)
+{
+ if (addr < vm_map_base)
+ return 0;
+
+ return __set_memory(addr, numpages, __pgprot(_PAGE_WRITE | _PAGE_DIRTY), __pgprot(0));
+}
+
+bool kernel_page_present(struct page *page)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ unsigned long addr = (unsigned long)page_address(page);
+
+ if (addr < vm_map_base)
+ return true;
+
+ pgd = pgd_offset_k(addr);
+ if (pgd_none(pgdp_get(pgd)))
+ return false;
+ if (pgd_leaf(pgdp_get(pgd)))
+ return true;
+
+ p4d = p4d_offset(pgd, addr);
+ if (p4d_none(p4dp_get(p4d)))
+ return false;
+ if (p4d_leaf(p4dp_get(p4d)))
+ return true;
+
+ pud = pud_offset(p4d, addr);
+ if (pud_none(pudp_get(pud)))
+ return false;
+ if (pud_leaf(pudp_get(pud)))
+ return true;
+
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(pmdp_get(pmd)))
+ return false;
+ if (pmd_leaf(pmdp_get(pmd)))
+ return true;
+
+ pte = pte_offset_kernel(pmd, addr);
+ return pte_present(ptep_get(pte));
+}
+
+int set_direct_map_default_noflush(struct page *page)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+
+ if (addr < vm_map_base)
+ return 0;
+
+ return __set_memory(addr, 1, PAGE_KERNEL, __pgprot(0));
+}
+
+int set_direct_map_invalid_noflush(struct page *page)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+
+ if (addr < vm_map_base)
+ return 0;
+
+ return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
+}
diff --git a/arch/loongarch/pci/acpi.c b/arch/loongarch/pci/acpi.c
index 365f7de..1da4dc4 100644
--- a/arch/loongarch/pci/acpi.c
+++ b/arch/loongarch/pci/acpi.c
@@ -225,6 +225,7 @@
if (bus) {
memcpy(bus->sysdata, info->cfg, sizeof(struct pci_config_window));
kfree(info);
+ kfree(root_ops);
} else {
struct pci_bus *child;
diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S
index 7e86a50..c2733e6 100644
--- a/arch/loongarch/vdso/vgetrandom-chacha.S
+++ b/arch/loongarch/vdso/vgetrandom-chacha.S
@@ -9,23 +9,11 @@
.text
-/* Salsa20 quarter-round */
-.macro QR a b c d
- add.w \a, \a, \b
- xor \d, \d, \a
- rotri.w \d, \d, 16
-
- add.w \c, \c, \d
- xor \b, \b, \c
- rotri.w \b, \b, 20
-
- add.w \a, \a, \b
- xor \d, \d, \a
- rotri.w \d, \d, 24
-
- add.w \c, \c, \d
- xor \b, \b, \c
- rotri.w \b, \b, 25
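+/*
+ * Apply one read-modify-write instruction to four destination/source
+ * register pairs, i.e. step four ChaCha quarter-rounds at a time.
+ */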
+.macro OP_4REG op d0 d1 d2 d3 s0 s1 s2 s3
+ \op \d0, \d0, \s0
+ \op \d1, \d1, \s1
+ \op \d2, \d2, \s2
+ \op \d3, \d3, \s3
.endm
/*
@@ -74,6 +62,23 @@
/* Reuse i as copy3 */
#define copy3 i
+/* Packs to be used with OP_4REG */
+#define line0 state0, state1, state2, state3
+#define line1 state4, state5, state6, state7
+#define line2 state8, state9, state10, state11
+#define line3 state12, state13, state14, state15
+
+#define line1_perm state5, state6, state7, state4
+#define line2_perm state10, state11, state8, state9
+#define line3_perm state15, state12, state13, state14
+
+#define copy copy0, copy1, copy2, copy3
+
+#define _16 16, 16, 16, 16
+#define _20 20, 20, 20, 20
+#define _24 24, 24, 24, 24
+#define _25 25, 25, 25, 25
+
/*
* The ABI requires s0-s9 saved, and sp aligned to 16-byte.
* This does not violate the stack-less requirement: no sensitive data
@@ -126,16 +131,38 @@
li.w i, 10
.Lpermute:
/* odd round */
- QR state0, state4, state8, state12
- QR state1, state5, state9, state13
- QR state2, state6, state10, state14
- QR state3, state7, state11, state15
+ OP_4REG add.w line0, line1
+ OP_4REG xor line3, line0
+ OP_4REG rotri.w line3, _16
+
+ OP_4REG add.w line2, line3
+ OP_4REG xor line1, line2
+ OP_4REG rotri.w line1, _20
+
+ OP_4REG add.w line0, line1
+ OP_4REG xor line3, line0
+ OP_4REG rotri.w line3, _24
+
+ OP_4REG add.w line2, line3
+ OP_4REG xor line1, line2
+ OP_4REG rotri.w line1, _25
/* even round */
- QR state0, state5, state10, state15
- QR state1, state6, state11, state12
- QR state2, state7, state8, state13
- QR state3, state4, state9, state14
+ OP_4REG add.w line0, line1_perm
+ OP_4REG xor line3_perm, line0
+ OP_4REG rotri.w line3_perm, _16
+
+ OP_4REG add.w line2_perm, line3_perm
+ OP_4REG xor line1_perm, line2_perm
+ OP_4REG rotri.w line1_perm, _20
+
+ OP_4REG add.w line0, line1_perm
+ OP_4REG xor line3_perm, line0
+ OP_4REG rotri.w line3_perm, _24
+
+ OP_4REG add.w line2_perm, line3_perm
+ OP_4REG xor line1_perm, line2_perm
+ OP_4REG rotri.w line1_perm, _25
addi.w i, i, -1
bnez i, .Lpermute
@@ -147,10 +174,7 @@
li.w copy3, 0x6b206574
/* output[0,1,2,3] = copy[0,1,2,3] + state[0,1,2,3] */
- add.w state0, state0, copy0
- add.w state1, state1, copy1
- add.w state2, state2, copy2
- add.w state3, state3, copy3
+ OP_4REG add.w line0, copy
st.w state0, output, 0
st.w state1, output, 4
st.w state2, output, 8
@@ -165,10 +189,7 @@
ld.w state3, key, 12
/* output[4,5,6,7] = state[0,1,2,3] + state[4,5,6,7] */
- add.w state4, state4, state0
- add.w state5, state5, state1
- add.w state6, state6, state2
- add.w state7, state7, state3
+ OP_4REG add.w line1, line0
st.w state4, output, 16
st.w state5, output, 20
st.w state6, output, 24
@@ -181,10 +202,7 @@
ld.w state3, key, 28
/* output[8,9,10,11] = state[0,1,2,3] + state[8,9,10,11] */
- add.w state8, state8, state0
- add.w state9, state9, state1
- add.w state10, state10, state2
- add.w state11, state11, state3
+ OP_4REG add.w line2, line0
st.w state8, output, 32
st.w state9, output, 36
st.w state10, output, 40
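
The OP_4REG rewrite only regroups the four quarter-rounds of each ChaCha20 round so that every add/xor/rotate step is issued for all four columns (or diagonals) back to back; the arithmetic is unchanged. For reference, the same double round in plain C (rotate-left form; rotl by 16/12/8/7 equals the assembly's rotri.w by 16/20/24/25):

#include <stdint.h>
#include <stdio.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* One ChaCha quarter-round; rotl 16/12/8/7 == rotri.w 16/20/24/25. */
#define QR(a, b, c, d) do {                        \
        a += b; d ^= a; d = ROTL32(d, 16);         \
        c += d; b ^= c; b = ROTL32(b, 12);         \
        a += b; d ^= a; d = ROTL32(d, 8);          \
        c += d; b ^= c; b = ROTL32(b, 7);          \
} while (0)

/* One double round over the 4x4 state: columns, then diagonals.
 * OP_4REG issues the same add/xor/rotate for all four quarter-rounds
 * in a batch instead of finishing one quarter-round at a time.
 */
static void chacha_double_round(uint32_t x[16])
{
        QR(x[0], x[4], x[8],  x[12]);
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);

        QR(x[0], x[5], x[10], x[15]);
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
}

int main(void)
{
        uint32_t x[16] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
        int i;

        for (i = 0; i < 10; i++)        /* 10 double rounds = ChaCha20 */
                chacha_double_round(x);
        printf("%08x\n", x[0]);
        return 0;
}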
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 6743a57..f7222eb 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -728,8 +728,8 @@
int (*handle_fpe)(struct kvm_vcpu *vcpu);
int (*handle_msa_disabled)(struct kvm_vcpu *vcpu);
int (*handle_guest_exit)(struct kvm_vcpu *vcpu);
- int (*hardware_enable)(void);
- void (*hardware_disable)(void);
+ int (*enable_virtualization_cpu)(void);
+ void (*disable_virtualization_cpu)(void);
int (*check_extension)(struct kvm *kvm, long ext);
int (*vcpu_init)(struct kvm_vcpu *vcpu);
void (*vcpu_uninit)(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index b5de770..60b43ea 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -125,14 +125,14 @@
return 1;
}
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
{
- return kvm_mips_callbacks->hardware_enable();
+ return kvm_mips_callbacks->enable_virtualization_cpu();
}
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
{
- kvm_mips_callbacks->hardware_disable();
+ kvm_mips_callbacks->disable_virtualization_cpu();
}
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
diff --git a/arch/mips/kvm/vz.c b/arch/mips/kvm/vz.c
index 99d5a71..ccab4d7 100644
--- a/arch/mips/kvm/vz.c
+++ b/arch/mips/kvm/vz.c
@@ -2869,7 +2869,7 @@
return ret + 1;
}
-static int kvm_vz_hardware_enable(void)
+static int kvm_vz_enable_virtualization_cpu(void)
{
unsigned int mmu_size, guest_mmu_size, ftlb_size;
u64 guest_cvmctl, cvmvmconfig;
@@ -2983,7 +2983,7 @@
return 0;
}
-static void kvm_vz_hardware_disable(void)
+static void kvm_vz_disable_virtualization_cpu(void)
{
u64 cvmvmconfig;
unsigned int mmu_size;
@@ -3280,8 +3280,8 @@
.handle_msa_disabled = kvm_trap_vz_handle_msa_disabled,
.handle_guest_exit = kvm_trap_vz_handle_guest_exit,
- .hardware_enable = kvm_vz_hardware_enable,
- .hardware_disable = kvm_vz_hardware_disable,
+ .enable_virtualization_cpu = kvm_vz_enable_virtualization_cpu,
+ .disable_virtualization_cpu = kvm_vz_disable_virtualization_cpu,
.check_extension = kvm_vz_check_extension,
.vcpu_init = kvm_vz_vcpu_init,
.vcpu_uninit = kvm_vz_vcpu_uninit,
diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index b0f0816..5e8e37a 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -466,7 +466,6 @@
}
static const struct file_operations perf_fops = {
- .llseek = no_llseek,
.read = perf_read,
.write = perf_write,
.unlocked_ioctl = perf_ioctl,
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index bab2ec3..f3427f6 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,7 +20,7 @@
return -EINVAL;
}
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
{
csr_write(CSR_HEDELEG, KVM_HEDELEG_DEFAULT);
csr_write(CSR_HIDELEG, KVM_HIDELEG_DEFAULT);
@@ -35,7 +35,7 @@
return 0;
}
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
{
kvm_riscv_aia_disable();
diff --git a/arch/s390/configs/debug_defconfig b/arch/s390/configs/debug_defconfig
index 7ec1b8c..9b57add 100644
--- a/arch/s390/configs/debug_defconfig
+++ b/arch/s390/configs/debug_defconfig
@@ -59,6 +59,7 @@
CONFIG_APPLDATA_BASE=y
CONFIG_S390_HYPFS_FS=y
CONFIG_KVM=m
+CONFIG_KVM_S390_UCONTROL=y
CONFIG_S390_UNWIND_SELFTEST=m
CONFIG_S390_KPROBES_SANITY_TEST=m
CONFIG_S390_MODULES_SANITY_TEST=m
diff --git a/arch/s390/hypfs/hypfs_dbfs.c b/arch/s390/hypfs/hypfs_dbfs.c
index 0e855c5..5d9effb 100644
--- a/arch/s390/hypfs/hypfs_dbfs.c
+++ b/arch/s390/hypfs/hypfs_dbfs.c
@@ -76,7 +76,6 @@
static const struct file_operations dbfs_ops = {
.read = dbfs_read,
- .llseek = no_llseek,
.unlocked_ioctl = dbfs_ioctl,
};
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 858beaf..d428635 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -443,7 +443,6 @@
.release = hypfs_release,
.read_iter = hypfs_read_iter,
.write_iter = hypfs_write_iter,
- .llseek = no_llseek,
};
static struct file_system_type hypfs_type = {
diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c
index bce50ca..e62bea9 100644
--- a/arch/s390/kernel/debug.c
+++ b/arch/s390/kernel/debug.c
@@ -163,7 +163,6 @@
.write = debug_input,
.open = debug_open,
.release = debug_close,
- .llseek = no_llseek,
};
static struct dentry *debug_debugfs_root_entry;
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 18b0d02..e2e0aa4 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -1698,7 +1698,6 @@
.release = cfset_release,
.unlocked_ioctl = cfset_ioctl,
.compat_ioctl = cfset_ioctl,
- .llseek = no_llseek
};
static struct miscdevice cfset_dev = {
diff --git a/arch/s390/kernel/sysinfo.c b/arch/s390/kernel/sysinfo.c
index 2be30a9..88055f5 100644
--- a/arch/s390/kernel/sysinfo.c
+++ b/arch/s390/kernel/sysinfo.c
@@ -498,7 +498,6 @@
.open = stsi_open_##fc##_##s1##_##s2, \
.release = stsi_release, \
.read = stsi_read, \
- .llseek = no_llseek, \
};
static int stsi_release(struct inode *inode, struct file *file)
diff --git a/arch/s390/kernel/vdso64/vdso_user_wrapper.S b/arch/s390/kernel/vdso64/vdso_user_wrapper.S
index e26e686..aa06c85 100644
--- a/arch/s390/kernel/vdso64/vdso_user_wrapper.S
+++ b/arch/s390/kernel/vdso64/vdso_user_wrapper.S
@@ -13,10 +13,7 @@
* for details.
*/
.macro vdso_func func
- .globl __kernel_\func
- .type __kernel_\func,@function
- __ALIGN
-__kernel_\func:
+SYM_FUNC_START(__kernel_\func)
CFI_STARTPROC
aghi %r15,-STACK_FRAME_VDSO_OVERHEAD
CFI_DEF_CFA_OFFSET (STACK_FRAME_USER_OVERHEAD + STACK_FRAME_VDSO_OVERHEAD)
@@ -32,7 +29,7 @@
CFI_RESTORE 15
br %r14
CFI_ENDPROC
- .size __kernel_\func,.-__kernel_\func
+SYM_FUNC_END(__kernel_\func)
.endm
vdso_func gettimeofday
@@ -41,16 +38,13 @@
vdso_func getcpu
.macro vdso_syscall func,syscall
- .globl __kernel_\func
- .type __kernel_\func,@function
- __ALIGN
-__kernel_\func:
+SYM_FUNC_START(__kernel_\func)
CFI_STARTPROC
svc \syscall
/* Make sure we notice when a syscall returns, which shouldn't happen */
.word 0
CFI_ENDPROC
- .size __kernel_\func,.-__kernel_\func
+SYM_FUNC_END(__kernel_\func)
.endm
vdso_syscall restart_syscall,__NR_restart_syscall
diff --git a/arch/s390/kernel/vdso64/vgetrandom-chacha.S b/arch/s390/kernel/vdso64/vgetrandom-chacha.S
index d802b0a9..09c034c 100644
--- a/arch/s390/kernel/vdso64/vgetrandom-chacha.S
+++ b/arch/s390/kernel/vdso64/vgetrandom-chacha.S
@@ -1,7 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/stringify.h>
#include <linux/linkage.h>
#include <asm/alternative.h>
+#include <asm/dwarf.h>
#include <asm/fpu-insn.h>
#define STATE0 %v0
@@ -12,9 +14,6 @@
#define COPY1 %v5
#define COPY2 %v6
#define COPY3 %v7
-#define PERM4 %v16
-#define PERM8 %v17
-#define PERM12 %v18
#define BEPERM %v19
#define TMP0 %v20
#define TMP1 %v21
@@ -23,13 +22,11 @@
.section .rodata
- .balign 128
-.Lconstants:
+ .balign 32
+SYM_DATA_START_LOCAL(chacha20_constants)
.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 # endian-neutral
- .long 0x04050607,0x08090a0b,0x0c0d0e0f,0x00010203 # rotl 4 bytes
- .long 0x08090a0b,0x0c0d0e0f,0x00010203,0x04050607 # rotl 8 bytes
- .long 0x0c0d0e0f,0x00010203,0x04050607,0x08090a0b # rotl 12 bytes
.long 0x03020100,0x07060504,0x0b0a0908,0x0f0e0d0c # byte swap
+SYM_DATA_END(chacha20_constants)
.text
/*
@@ -43,13 +40,14 @@
* size_t nblocks)
*/
SYM_FUNC_START(__arch_chacha20_blocks_nostack)
- larl %r1,.Lconstants
+ CFI_STARTPROC
+ larl %r1,chacha20_constants
/* COPY0 = "expand 32-byte k" */
VL COPY0,0,,%r1
- /* PERM4-PERM12,BEPERM = byte selectors for VPERM */
- VLM PERM4,BEPERM,16,%r1
+ /* BEPERM = byte selectors for VPERM */
+ ALTERNATIVE __stringify(VL BEPERM,16,,%r1), "brcl 0,0", ALT_FACILITY(148)
/* COPY1,COPY2 = key */
VLM COPY1,COPY2,0,%r3
@@ -89,11 +87,11 @@
VERLLF STATE1,STATE1,7
/* STATE1[0,1,2,3] = STATE1[1,2,3,0] */
- VPERM STATE1,STATE1,STATE1,PERM4
+ VSLDB STATE1,STATE1,STATE1,4
/* STATE2[0,1,2,3] = STATE2[2,3,0,1] */
- VPERM STATE2,STATE2,STATE2,PERM8
+ VSLDB STATE2,STATE2,STATE2,8
/* STATE3[0,1,2,3] = STATE3[3,0,1,2] */
- VPERM STATE3,STATE3,STATE3,PERM12
+ VSLDB STATE3,STATE3,STATE3,12
/* STATE0 += STATE1, STATE3 = rotl32(STATE3 ^ STATE0, 16) */
VAF STATE0,STATE0,STATE1
@@ -116,32 +114,38 @@
VERLLF STATE1,STATE1,7
/* STATE1[0,1,2,3] = STATE1[3,0,1,2] */
- VPERM STATE1,STATE1,STATE1,PERM12
+ VSLDB STATE1,STATE1,STATE1,12
/* STATE2[0,1,2,3] = STATE2[2,3,0,1] */
- VPERM STATE2,STATE2,STATE2,PERM8
+ VSLDB STATE2,STATE2,STATE2,8
/* STATE3[0,1,2,3] = STATE3[1,2,3,0] */
- VPERM STATE3,STATE3,STATE3,PERM4
+ VSLDB STATE3,STATE3,STATE3,4
brctg %r0,.Ldoubleround
- /* OUTPUT0 = STATE0 + STATE0 */
+ /* OUTPUT0 = STATE0 + COPY0 */
VAF STATE0,STATE0,COPY0
- /* OUTPUT1 = STATE1 + STATE1 */
+ /* OUTPUT1 = STATE1 + COPY1 */
VAF STATE1,STATE1,COPY1
- /* OUTPUT2 = STATE2 + STATE2 */
+ /* OUTPUT2 = STATE2 + COPY2 */
VAF STATE2,STATE2,COPY2
- /* OUTPUT2 = STATE3 + STATE3 */
+ /* OUTPUT3 = STATE3 + COPY3 */
VAF STATE3,STATE3,COPY3
- /*
- * 32 bit wise little endian store to OUTPUT. If the vector
- * enhancement facility 2 is not installed use the slow path.
- */
- ALTERNATIVE "brc 0xf,.Lstoreslow", "nop", ALT_FACILITY(148)
- VSTBRF STATE0,0,,%r2
- VSTBRF STATE1,16,,%r2
- VSTBRF STATE2,32,,%r2
- VSTBRF STATE3,48,,%r2
-.Lstoredone:
+ ALTERNATIVE \
+ __stringify( \
+ /* Convert STATE to little endian and store to OUTPUT */\
+ VPERM TMP0,STATE0,STATE0,BEPERM; \
+ VPERM TMP1,STATE1,STATE1,BEPERM; \
+ VPERM TMP2,STATE2,STATE2,BEPERM; \
+ VPERM TMP3,STATE3,STATE3,BEPERM; \
+ VSTM TMP0,TMP3,0,%r2), \
+ __stringify( \
+ /* 32 bit wise little endian store to OUTPUT */ \
+ VSTBRF STATE0,0,,%r2; \
+ VSTBRF STATE1,16,,%r2; \
+ VSTBRF STATE2,32,,%r2; \
+ VSTBRF STATE3,48,,%r2; \
+ brcl 0,0), \
+ ALT_FACILITY(148)
/* ++COPY3.COUNTER */
/* alsih %r3,1 */
@@ -173,13 +177,5 @@
VZERO TMP3
br %r14
-
-.Lstoreslow:
- /* Convert STATE to little endian format and store to OUTPUT */
- VPERM TMP0,STATE0,STATE0,BEPERM
- VPERM TMP1,STATE1,STATE1,BEPERM
- VPERM TMP2,STATE2,STATE2,BEPERM
- VPERM TMP3,STATE3,STATE3,BEPERM
- VSTM TMP0,TMP3,0,%r2
- j .Lstoredone
+ CFI_ENDPROC
SYM_FUNC_END(__arch_chacha20_blocks_nostack)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0fd9686..bb7134f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -348,20 +348,29 @@
return cc == 0;
}
-static __always_inline void __insn32_query(unsigned int opcode, u8 *query)
+static __always_inline void __sortl_query(u8 (*query)[32])
{
asm volatile(
" lghi 0,0\n"
- " lgr 1,%[query]\n"
+ " la 1,%[query]\n"
/* Parameter registers are ignored */
- " .insn rrf,%[opc] << 16,2,4,6,0\n"
+ " .insn rre,0xb9380000,2,4\n"
+ : [query] "=R" (*query)
:
- : [query] "d" ((unsigned long)query), [opc] "i" (opcode)
- : "cc", "memory", "0", "1");
+ : "cc", "0", "1");
}
-#define INSN_SORTL 0xb938
-#define INSN_DFLTCC 0xb939
+static __always_inline void __dfltcc_query(u8 (*query)[32])
+{
+ asm volatile(
+ " lghi 0,0\n"
+ " la 1,%[query]\n"
+ /* Parameter registers are ignored */
+ " .insn rrf,0xb9390000,2,4,6,0\n"
+ : [query] "=R" (*query)
+ :
+ : "cc", "0", "1");
+}
static void __init kvm_s390_cpu_feat_init(void)
{
@@ -415,10 +424,10 @@
kvm_s390_available_subfunc.kdsa);
if (test_facility(150)) /* SORTL */
- __insn32_query(INSN_SORTL, kvm_s390_available_subfunc.sortl);
+ __sortl_query(&kvm_s390_available_subfunc.sortl);
if (test_facility(151)) /* DFLTCC */
- __insn32_query(INSN_DFLTCC, kvm_s390_available_subfunc.dfltcc);
+ __dfltcc_query(&kvm_s390_available_subfunc.dfltcc);
if (MACHINE_HAS_ESOP)
allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index ee90a91..6f55a59 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -657,7 +657,6 @@
.release = clp_misc_release,
.unlocked_ioctl = clp_misc_ioctl,
.compat_ioctl = clp_misc_ioctl,
- .llseek = no_llseek,
};
static struct miscdevice clp_misc_device = {
diff --git a/arch/sh/include/asm/irq.h b/arch/sh/include/asm/irq.h
index 0f384b1..53fc18a 100644
--- a/arch/sh/include/asm/irq.h
+++ b/arch/sh/include/asm/irq.h
@@ -14,12 +14,6 @@
#define NO_IRQ_IGNORE ((unsigned int)-1)
/*
- * Simple Mask Register Support
- */
-extern void make_maskreg_irq(unsigned int irq);
-extern unsigned short *irq_mask_register;
-
-/*
* PINT IRQs
*/
void make_imask_irq(unsigned int irq);
diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index dca84fd..c89575d 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -11,7 +11,6 @@
select ARCH_HAS_KCOV
select ARCH_HAS_STRNCPY_FROM_USER
select ARCH_HAS_STRNLEN_USER
- select ARCH_NO_PREEMPT_DYNAMIC
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
diff --git a/arch/um/drivers/harddog_kern.c b/arch/um/drivers/harddog_kern.c
index 99a7144..819aabb 100644
--- a/arch/um/drivers/harddog_kern.c
+++ b/arch/um/drivers/harddog_kern.c
@@ -164,7 +164,6 @@
.compat_ioctl = compat_ptr_ioctl,
.open = harddog_open,
.release = harddog_release,
- .llseek = no_llseek,
};
static struct miscdevice harddog_miscdev = {
diff --git a/arch/um/drivers/hostaudio_kern.c b/arch/um/drivers/hostaudio_kern.c
index c42b793..9d22887 100644
--- a/arch/um/drivers/hostaudio_kern.c
+++ b/arch/um/drivers/hostaudio_kern.c
@@ -291,7 +291,6 @@
static const struct file_operations hostaudio_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.read = hostaudio_read,
.write = hostaudio_write,
.poll = hostaudio_poll,
@@ -304,7 +303,6 @@
static const struct file_operations hostmixer_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.unlocked_ioctl = hostmixer_ioctl_mixdev,
.open = hostmixer_open_mixdev,
.release = hostmixer_release,
diff --git a/arch/um/drivers/vector_kern.c b/arch/um/drivers/vector_kern.c
index 2d47328..c992da8 100644
--- a/arch/um/drivers/vector_kern.c
+++ b/arch/um/drivers/vector_kern.c
@@ -22,6 +22,7 @@
#include <linux/interrupt.h>
#include <linux/firmware.h>
#include <linux/fs.h>
+#include <asm/atomic.h>
#include <uapi/linux/filter.h>
#include <init.h>
#include <irq_kern.h>
@@ -102,18 +103,33 @@
static void vector_reset_stats(struct vector_private *vp)
{
+ /* We reuse the existing queue locks for stats */
+
+ /* RX stats are modified with RX head_lock held
+ * in vector_poll.
+ */
+
+ spin_lock(&vp->rx_queue->head_lock);
vp->estats.rx_queue_max = 0;
vp->estats.rx_queue_running_average = 0;
- vp->estats.tx_queue_max = 0;
- vp->estats.tx_queue_running_average = 0;
vp->estats.rx_encaps_errors = 0;
+ vp->estats.sg_ok = 0;
+ vp->estats.sg_linearized = 0;
+ spin_unlock(&vp->rx_queue->head_lock);
+
+ /* TX stats are modified with TX head_lock held
+ * in vector_send.
+ */
+
+ spin_lock(&vp->tx_queue->head_lock);
vp->estats.tx_timeout_count = 0;
vp->estats.tx_restart_queue = 0;
vp->estats.tx_kicks = 0;
vp->estats.tx_flow_control_xon = 0;
vp->estats.tx_flow_control_xoff = 0;
- vp->estats.sg_ok = 0;
- vp->estats.sg_linearized = 0;
+ vp->estats.tx_queue_max = 0;
+ vp->estats.tx_queue_running_average = 0;
+ spin_unlock(&vp->tx_queue->head_lock);
}
static int get_mtu(struct arglist *def)
@@ -232,12 +248,6 @@
static char *drop_buffer;
-/* Array backed queues optimized for bulk enqueue/dequeue and
- * 1:N (small values of N) or 1:1 enqueuer/dequeuer ratios.
- * For more details and full design rationale see
- * http://foswiki.cambridgegreys.com/Main/EatYourTailAndEnjoyIt
- */
-
/*
* Advance the mmsg queue head by n = advance. Resets the queue to
@@ -247,27 +257,13 @@
static int vector_advancehead(struct vector_queue *qi, int advance)
{
- int queue_depth;
-
qi->head =
(qi->head + advance)
% qi->max_depth;
- spin_lock(&qi->tail_lock);
- qi->queue_depth -= advance;
-
- /* we are at 0, use this to
- * reset head and tail so we can use max size vectors
- */
-
- if (qi->queue_depth == 0) {
- qi->head = 0;
- qi->tail = 0;
- }
- queue_depth = qi->queue_depth;
- spin_unlock(&qi->tail_lock);
- return queue_depth;
+ atomic_sub(advance, &qi->queue_depth);
+ return atomic_read(&qi->queue_depth);
}
/* Advance the queue tail by n = advance.
@@ -277,16 +273,11 @@
static int vector_advancetail(struct vector_queue *qi, int advance)
{
- int queue_depth;
-
qi->tail =
(qi->tail + advance)
% qi->max_depth;
- spin_lock(&qi->head_lock);
- qi->queue_depth += advance;
- queue_depth = qi->queue_depth;
- spin_unlock(&qi->head_lock);
- return queue_depth;
+ atomic_add(advance, &qi->queue_depth);
+ return atomic_read(&qi->queue_depth);
}
static int prep_msg(struct vector_private *vp,
@@ -339,9 +330,7 @@
int iov_count;
spin_lock(&qi->tail_lock);
- spin_lock(&qi->head_lock);
- queue_depth = qi->queue_depth;
- spin_unlock(&qi->head_lock);
+ queue_depth = atomic_read(&qi->queue_depth);
if (skb)
packet_len = skb->len;
@@ -360,6 +349,7 @@
mmsg_vector->msg_hdr.msg_iovlen = iov_count;
mmsg_vector->msg_hdr.msg_name = vp->fds->remote_addr;
mmsg_vector->msg_hdr.msg_namelen = vp->fds->remote_addr_size;
+ wmb(); /* Make the packet visible to the NAPI poll thread */

queue_depth = vector_advancetail(qi, 1);
} else
goto drop;
@@ -398,7 +388,7 @@
}
/*
- * Generic vector deque via sendmmsg with support for forming headers
+ * Generic vector dequeue via sendmmsg with support for forming headers
* using transport specific callback. Allows GRE, L2TPv3, RAW and
* other transports to use a common dequeue procedure in vector mode
*/
@@ -408,69 +398,64 @@
{
struct vector_private *vp = netdev_priv(qi->dev);
struct mmsghdr *send_from;
- int result = 0, send_len, queue_depth = qi->max_depth;
+ int result = 0, send_len;
if (spin_trylock(&qi->head_lock)) {
- if (spin_trylock(&qi->tail_lock)) {
- /* update queue_depth to current value */
- queue_depth = qi->queue_depth;
- spin_unlock(&qi->tail_lock);
- while (queue_depth > 0) {
- /* Calculate the start of the vector */
- send_len = queue_depth;
- send_from = qi->mmsg_vector;
- send_from += qi->head;
- /* Adjust vector size if wraparound */
- if (send_len + qi->head > qi->max_depth)
- send_len = qi->max_depth - qi->head;
- /* Try to TX as many packets as possible */
- if (send_len > 0) {
- result = uml_vector_sendmmsg(
- vp->fds->tx_fd,
- send_from,
- send_len,
- 0
- );
- vp->in_write_poll =
- (result != send_len);
- }
- /* For some of the sendmmsg error scenarios
- * we may end being unsure in the TX success
- * for all packets. It is safer to declare
- * them all TX-ed and blame the network.
+ /* update queue_depth to current value */
+ while (atomic_read(&qi->queue_depth) > 0) {
+ /* Calculate the start of the vector */
+ send_len = atomic_read(&qi->queue_depth);
+ send_from = qi->mmsg_vector;
+ send_from += qi->head;
+ /* Adjust vector size if wraparound */
+ if (send_len + qi->head > qi->max_depth)
+ send_len = qi->max_depth - qi->head;
+ /* Try to TX as many packets as possible */
+ if (send_len > 0) {
+ result = uml_vector_sendmmsg(
+ vp->fds->tx_fd,
+ send_from,
+ send_len,
+ 0
+ );
+ vp->in_write_poll =
+ (result != send_len);
+ }
+ /* For some of the sendmmsg error scenarios
+ * we may end up being unsure of the TX success
+ * for all packets. It is safer to declare
+ * them all TX-ed and blame the network.
+ */
+ if (result < 0) {
+ if (net_ratelimit())
+ netdev_err(vp->dev, "sendmmsg err=%i\n",
+ result);
+ vp->in_error = true;
+ result = send_len;
+ }
+ if (result > 0) {
+ consume_vector_skbs(qi, result);
+ /* This is equivalent to a TX IRQ.
+ * Restart the upper layers to feed us
+ * more packets.
*/
- if (result < 0) {
- if (net_ratelimit())
- netdev_err(vp->dev, "sendmmsg err=%i\n",
- result);
- vp->in_error = true;
- result = send_len;
- }
- if (result > 0) {
- queue_depth =
- consume_vector_skbs(qi, result);
- /* This is equivalent to an TX IRQ.
- * Restart the upper layers to feed us
- * more packets.
- */
- if (result > vp->estats.tx_queue_max)
- vp->estats.tx_queue_max = result;
- vp->estats.tx_queue_running_average =
- (vp->estats.tx_queue_running_average + result) >> 1;
- }
- netif_wake_queue(qi->dev);
- /* if TX is busy, break out of the send loop,
- * poll write IRQ will reschedule xmit for us
- */
- if (result != send_len) {
- vp->estats.tx_restart_queue++;
- break;
- }
+ if (result > vp->estats.tx_queue_max)
+ vp->estats.tx_queue_max = result;
+ vp->estats.tx_queue_running_average =
+ (vp->estats.tx_queue_running_average + result) >> 1;
+ }
+ netif_wake_queue(qi->dev);
+ /* if TX is busy, break out of the send loop,
+ * poll write IRQ will reschedule xmit for us.
+ */
+ if (result != send_len) {
+ vp->estats.tx_restart_queue++;
+ break;
}
}
spin_unlock(&qi->head_lock);
}
- return queue_depth;
+ return atomic_read(&qi->queue_depth);
}
/* Queue destructor. Deliberately stateless so we can use
@@ -589,7 +574,7 @@
}
spin_lock_init(&result->head_lock);
spin_lock_init(&result->tail_lock);
- result->queue_depth = 0;
+ atomic_set(&result->queue_depth, 0);
result->head = 0;
result->tail = 0;
return result;
@@ -668,18 +653,27 @@
}
-/* Prepare queue for recvmmsg one-shot rx - fill with fresh sk_buffs*/
+/* Prepare queue for recvmmsg one-shot rx - fill with fresh sk_buffs */
static void prep_queue_for_rx(struct vector_queue *qi)
{
struct vector_private *vp = netdev_priv(qi->dev);
struct mmsghdr *mmsg_vector = qi->mmsg_vector;
void **skbuff_vector = qi->skbuff_vector;
- int i;
+ int i, queue_depth;
- if (qi->queue_depth == 0)
+ queue_depth = atomic_read(&qi->queue_depth);
+
+ if (queue_depth == 0)
return;
- for (i = 0; i < qi->queue_depth; i++) {
+
+ /* RX is always emptied 100% during each cycle, so we do not
+ * have to do the tail wraparound math for it.
+ */
+
+ qi->head = qi->tail = 0;
+
+ for (i = 0; i < queue_depth; i++) {
/* it is OK if allocation fails - recvmmsg with NULL data in
* iov argument still performs an RX, just drops the packet
* This allows us stop faffing around with a "drop buffer"
@@ -689,7 +683,7 @@
skbuff_vector++;
mmsg_vector++;
}
- qi->queue_depth = 0;
+ atomic_set(&qi->queue_depth, 0);
}
static struct vector_device *find_device(int n)
@@ -972,7 +966,7 @@
budget = qi->max_depth;
packet_count = uml_vector_recvmmsg(
- vp->fds->rx_fd, qi->mmsg_vector, qi->max_depth, 0);
+ vp->fds->rx_fd, qi->mmsg_vector, budget, 0);
if (packet_count < 0)
vp->in_error = true;
@@ -985,7 +979,7 @@
* many do we need to prep the next time prep_queue_for_rx() is called.
*/
- qi->queue_depth = packet_count;
+ atomic_add(packet_count, &qi->queue_depth);
for (i = 0; i < packet_count; i++) {
skb = (*skbuff_vector);
@@ -1172,6 +1166,7 @@
if ((vp->options & VECTOR_TX) != 0)
tx_enqueued = (vector_send(vp->tx_queue) > 0);
+ spin_lock(&vp->rx_queue->head_lock);
if ((vp->options & VECTOR_RX) > 0)
err = vector_mmsg_rx(vp, budget);
else {
@@ -1179,12 +1174,13 @@
if (err > 0)
err = 1;
}
+ spin_unlock(&vp->rx_queue->head_lock);
if (err > 0)
work_done += err;
if (tx_enqueued || err > 0)
napi_schedule(napi);
- if (work_done < budget)
+ if (work_done <= budget)
napi_complete_done(napi, work_done);
return work_done;
}
@@ -1225,7 +1221,7 @@
vp->rx_header_size,
MAX_IOV_SIZE
);
- vp->rx_queue->queue_depth = get_depth(vp->parsed);
+ atomic_set(&vp->rx_queue->queue_depth, get_depth(vp->parsed));
} else {
vp->header_rxbuffer = kmalloc(
vp->rx_header_size,
@@ -1467,7 +1463,17 @@
{
struct vector_private *vp = netdev_priv(dev);
+ /* Stats are modified in the dequeue portions of
+ * rx/tx, which are protected by the head locks;
+ * grabbing these locks here ensures they are up
+ * to date.
+ */
+
+ spin_lock(&vp->tx_queue->head_lock);
+ spin_lock(&vp->rx_queue->head_lock);
memcpy(tmp_stats, &vp->estats, sizeof(struct vector_estats));
+ spin_unlock(&vp->rx_queue->head_lock);
+ spin_unlock(&vp->tx_queue->head_lock);
}
static int vector_get_coalesce(struct net_device *netdev,
diff --git a/arch/um/drivers/vector_kern.h b/arch/um/drivers/vector_kern.h
index 806df55..4178347 100644
--- a/arch/um/drivers/vector_kern.h
+++ b/arch/um/drivers/vector_kern.h
@@ -14,6 +14,7 @@
#include <linux/ctype.h>
#include <linux/workqueue.h>
#include <linux/interrupt.h>
+#include <asm/atomic.h>
#include "vector_user.h"
@@ -44,7 +45,8 @@
struct net_device *dev;
spinlock_t head_lock;
spinlock_t tail_lock;
- int queue_depth, head, tail, max_depth, max_iov_frags;
+ atomic_t queue_depth;
+ int head, tail, max_depth, max_iov_frags;
short options;
};
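
With queue_depth converted to atomic_t, the producer side publishes work by bumping the depth after a wmb() while the consumer side under head_lock drains it with atomic_sub; neither side needs the other's lock merely to read the depth. A standalone C11 analogue of that single-producer/single-consumer counter (the demo_* names are illustrative, not the driver's):

#include <stdatomic.h>
#include <stdio.h>

#define MAX_DEPTH 64

struct demo_queue {
        int head, tail;         /* each index is touched by one side only */
        atomic_int depth;       /* the only state shared between the sides */
};

/* Producer: publish one descriptor, then make it visible via the depth. */
static int demo_advancetail(struct demo_queue *q)
{
        q->tail = (q->tail + 1) % MAX_DEPTH;
        /* The kernel patch pairs this with an explicit wmb() before the add. */
        atomic_fetch_add(&q->depth, 1);
        return atomic_load(&q->depth);
}

/* Consumer: retire 'advance' descriptors that have been transmitted. */
static int demo_advancehead(struct demo_queue *q, int advance)
{
        q->head = (q->head + advance) % MAX_DEPTH;
        atomic_fetch_sub(&q->depth, advance);
        return atomic_load(&q->depth);
}

int main(void)
{
        struct demo_queue q = { .head = 0, .tail = 0, .depth = 0 };

        demo_advancetail(&q);
        demo_advancetail(&q);
        printf("depth after enqueue: %d\n", atomic_load(&q.depth));
        demo_advancehead(&q, 2);
        printf("depth after dequeue: %d\n", atomic_load(&q.depth));
        return 0;
}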
diff --git a/arch/um/drivers/vector_user.c b/arch/um/drivers/vector_user.c
index b16a5e5..2ea67e6 100644
--- a/arch/um/drivers/vector_user.c
+++ b/arch/um/drivers/vector_user.c
@@ -46,6 +46,9 @@
#define TRANS_FD "fd"
#define TRANS_FD_LEN strlen(TRANS_FD)
+#define TRANS_VDE "vde"
+#define TRANS_VDE_LEN strlen(TRANS_VDE)
+
#define VNET_HDR_FAIL "could not enable vnet headers on fd %d"
#define TUN_GET_F_FAIL "tapraw: TUNGETFEATURES failed: %s"
#define L2TPV3_BIND_FAIL "l2tpv3_open : could not bind socket err=%i"
@@ -434,6 +437,84 @@
return NULL;
}
+/* enough chars to store an int type in decimal */
+#define ENOUGH(type) ((CHAR_BIT * sizeof(type) - 1) / 3 + 2)
+#define ENOUGH_OCTAL(type) ((CHAR_BIT * sizeof(type) + 2) / 3)
+/* vde_plug --descr xx --port2 xx --mod2 xx --group2 xx seqpacket://NN vnl (NULL) */
+#define VDE_MAX_ARGC 12
+#define VDE_SEQPACKET_HEAD "seqpacket://"
+#define VDE_SEQPACKET_HEAD_LEN (sizeof(VDE_SEQPACKET_HEAD) - 1)
+#define VDE_DEFAULT_DESCRIPTION "UML"
+
+static struct vector_fds *user_init_vde_fds(struct arglist *ifspec)
+{
+ char seqpacketvnl[VDE_SEQPACKET_HEAD_LEN + ENOUGH(int) + 1];
+ char *argv[VDE_MAX_ARGC] = {"vde_plug"};
+ int argc = 1;
+ int rv;
+ int sv[2];
+ struct vector_fds *result = NULL;
+
+ char *vnl = uml_vector_fetch_arg(ifspec,"vnl");
+ char *descr = uml_vector_fetch_arg(ifspec,"descr");
+ char *port = uml_vector_fetch_arg(ifspec,"port");
+ char *mode = uml_vector_fetch_arg(ifspec,"mode");
+ char *group = uml_vector_fetch_arg(ifspec,"group");
+ if (descr == NULL) descr = VDE_DEFAULT_DESCRIPTION;
+
+ argv[argc++] = "--descr";
+ argv[argc++] = descr;
+ if (port != NULL) {
+ argv[argc++] = "--port2";
+ argv[argc++] = port;
+ }
+ if (mode != NULL) {
+ argv[argc++] = "--mod2";
+ argv[argc++] = mode;
+ }
+ if (group != NULL) {
+ argv[argc++] = "--group2";
+ argv[argc++] = group;
+ }
+ argv[argc++] = seqpacketvnl;
+ argv[argc++] = vnl;
+ argv[argc++] = NULL;
+
+ rv = socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv);
+ if (rv < 0) {
+ printk(UM_KERN_ERR "vde: seqpacket socketpair err %d", -errno);
+ return NULL;
+ }
+ rv = os_set_exec_close(sv[0]);
+ if (rv < 0) {
+ printk(UM_KERN_ERR "vde: seqpacket socketpair cloexec err %d", -errno);
+ goto vde_cleanup_sv;
+ }
+ snprintf(seqpacketvnl, sizeof(seqpacketvnl), VDE_SEQPACKET_HEAD "%d", sv[1]);
+
+ run_helper(NULL, NULL, argv);
+
+ close(sv[1]);
+
+ result = uml_kmalloc(sizeof(struct vector_fds), UM_GFP_KERNEL);
+ if (result == NULL) {
+ printk(UM_KERN_ERR "fd open: allocation failed");
+ goto vde_cleanup;
+ }
+
+ result->rx_fd = sv[0];
+ result->tx_fd = sv[0];
+ result->remote_addr_size = 0;
+ result->remote_addr = NULL;
+ return result;
+
+vde_cleanup_sv:
+ close(sv[1]);
+vde_cleanup:
+ close(sv[0]);
+ return NULL;
+}
+
static struct vector_fds *user_init_raw_fds(struct arglist *ifspec)
{
int rxfd = -1, txfd = -1;
@@ -673,6 +754,8 @@
return user_init_unix_fds(parsed, ID_BESS);
if (strncmp(transport, TRANS_FD, TRANS_FD_LEN) == 0)
return user_init_fd_fds(parsed);
+ if (strncmp(transport, TRANS_VDE, TRANS_VDE_LEN) == 0)
+ return user_init_vde_fds(parsed);
return NULL;
}
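
user_init_vde_fds() wires the interface to a VDE switch by creating a SOCK_SEQPACKET socketpair, spawning vde_plug with a seqpacket://<fd> endpoint plus the user-supplied vnl, and keeping the other end as the rx/tx descriptor. A hedged standalone illustration of that fd hand-off (plain fork/exec instead of UML's run_helper(); error handling trimmed):

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
        int sv[2];
        char url[64];
        pid_t pid;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <vnl>\n", argv[0]);
                return 1;
        }

        /* One end stays with us, the other is inherited by vde_plug. */
        if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0) {
                perror("socketpair");
                return 1;
        }
        snprintf(url, sizeof(url), "seqpacket://%d", sv[1]);

        pid = fork();
        if (pid == 0) {
                /* child: plug our fd into the VDE switch given by <vnl> */
                execlp("vde_plug", "vde_plug", "--descr", "demo", url,
                       argv[1], (char *)NULL);
                perror("execlp");
                _exit(127);
        }
        close(sv[1]);           /* parent keeps sv[0] as its rx/tx fd */

        /* sv[0] can now carry whole frames via send()/recv(). */
        printf("talking to vde_plug pid %d via fd %d\n", pid, sv[0]);
        close(sv[0]);
        waitpid(pid, NULL, 0);
        return 0;
}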
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index 5bb397b..83373c9 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -359,11 +359,4 @@
return pte;
}
-/* Clear a kernel PTE and flush it from the TLB */
-#define kpte_clear_flush(ptep, vaddr) \
-do { \
- pte_clear(&init_mm, (vaddr), (ptep)); \
- __flush_tlb_one((vaddr)); \
-} while (0)
-
#endif
diff --git a/arch/um/include/asm/processor-generic.h b/arch/um/include/asm/processor-generic.h
index 5a7c052..bce4595 100644
--- a/arch/um/include/asm/processor-generic.h
+++ b/arch/um/include/asm/processor-generic.h
@@ -28,20 +28,10 @@
struct arch_thread arch;
jmp_buf switch_buf;
struct {
- int op;
- union {
- struct {
- int pid;
- } fork, exec;
- struct {
- int (*proc)(void *);
- void *arg;
- } thread;
- struct {
- void (*proc)(void *);
- void *arg;
- } cb;
- } u;
+ struct {
+ int (*proc)(void *);
+ void *arg;
+ } thread;
} request;
};
@@ -51,7 +41,7 @@
.fault_addr = NULL, \
.prev_sched = NULL, \
.arch = INIT_ARCH_THREAD, \
- .request = { 0 } \
+ .request = { } \
}
/*
diff --git a/arch/um/include/asm/sysrq.h b/arch/um/include/asm/sysrq.h
deleted file mode 100644
index 8fc8c65..0000000
--- a/arch/um/include/asm/sysrq.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __UM_SYSRQ_H
-#define __UM_SYSRQ_H
-
-struct task_struct;
-extern void show_trace(struct task_struct* task, unsigned long *stack);
-
-#endif
diff --git a/arch/um/include/shared/skas/mm_id.h b/arch/um/include/shared/skas/mm_id.h
index 1e76ba4..140388c 100644
--- a/arch/um/include/shared/skas/mm_id.h
+++ b/arch/um/include/shared/skas/mm_id.h
@@ -7,10 +7,7 @@
#define __MM_ID_H
struct mm_id {
- union {
- int mm_fd;
- int pid;
- } u;
+ int pid;
unsigned long stack;
int syscall_data_len;
};
diff --git a/arch/um/include/shared/skas/skas.h b/arch/um/include/shared/skas/skas.h
index ebaa116..85c5012 100644
--- a/arch/um/include/shared/skas/skas.h
+++ b/arch/um/include/shared/skas/skas.h
@@ -10,10 +10,8 @@
extern int userspace_pid[];
-extern int user_thread(unsigned long stack, int flags);
extern void new_thread_handler(void);
extern void handle_syscall(struct uml_pt_regs *regs);
-extern long execute_syscall_skas(void *r);
extern unsigned long current_stub_stack(void);
extern struct mm_id *current_mm_id(void);
extern void current_mm_sync(void);
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index 2c15bb2..cb8b5cd 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -35,8 +35,5 @@
PT_REGS_IP(regs) = eip;
PT_REGS_SP(regs) = esp;
clear_thread_flag(TIF_SINGLESTEP);
-#ifdef SUBARCH_EXECVE1
- SUBARCH_EXECVE1(regs->regs);
-#endif
}
EXPORT_SYMBOL(start_thread);
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index f36b63f..be2856a 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -109,8 +109,8 @@
schedule_tail(current->thread.prev_sched);
current->thread.prev_sched = NULL;
- fn = current->thread.request.u.thread.proc;
- arg = current->thread.request.u.thread.arg;
+ fn = current->thread.request.thread.proc;
+ arg = current->thread.request.thread.arg;
/*
* callback returns only if the kernel thread execs a process
@@ -158,8 +158,8 @@
arch_copy_thread(&current->thread.arch, &p->thread.arch);
} else {
get_safe_registers(p->thread.regs.regs.gp, p->thread.regs.regs.fp);
- p->thread.request.u.thread.proc = args->fn;
- p->thread.request.u.thread.arg = args->fn_arg;
+ p->thread.request.thread.proc = args->fn;
+ p->thread.request.thread.arg = args->fn_arg;
handler = new_thread_handler;
}
diff --git a/arch/um/kernel/reboot.c b/arch/um/kernel/reboot.c
index 3736bca..680bce4 100644
--- a/arch/um/kernel/reboot.c
+++ b/arch/um/kernel/reboot.c
@@ -29,7 +29,7 @@
t = find_lock_task_mm(p);
if (!t)
continue;
- pid = t->mm->context.id.u.pid;
+ pid = t->mm->context.id.pid;
task_unlock(t);
os_kill_ptraced_process(pid, 1);
}
diff --git a/arch/um/kernel/skas/mmu.c b/arch/um/kernel/skas/mmu.c
index 47f98d8..886ed5e 100644
--- a/arch/um/kernel/skas/mmu.c
+++ b/arch/um/kernel/skas/mmu.c
@@ -32,11 +32,11 @@
new_id->stack = stack;
block_signals_trace();
- new_id->u.pid = start_userspace(stack);
+ new_id->pid = start_userspace(stack);
unblock_signals_trace();
- if (new_id->u.pid < 0) {
- ret = new_id->u.pid;
+ if (new_id->pid < 0) {
+ ret = new_id->pid;
goto out_free;
}
@@ -83,12 +83,12 @@
* whole UML suddenly dying. Also, cover negative and
* 1 cases, since they shouldn't happen either.
*/
- if (mmu->id.u.pid < 2) {
+ if (mmu->id.pid < 2) {
printk(KERN_ERR "corrupt mm_context - pid = %d\n",
- mmu->id.u.pid);
+ mmu->id.pid);
return;
}
- os_kill_ptraced_process(mmu->id.u.pid, 1);
+ os_kill_ptraced_process(mmu->id.pid, 1);
free_pages(mmu->id.stack, ilog2(STUB_DATA_PAGES));
}
diff --git a/arch/um/kernel/skas/process.c b/arch/um/kernel/skas/process.c
index 5f9c1c5..68657988 100644
--- a/arch/um/kernel/skas/process.c
+++ b/arch/um/kernel/skas/process.c
@@ -39,8 +39,8 @@
init_new_thread_signals();
- init_task.thread.request.u.thread.proc = start_kernel_proc;
- init_task.thread.request.u.thread.arg = NULL;
+ init_task.thread.request.thread.proc = start_kernel_proc;
+ init_task.thread.request.thread.arg = NULL;
return start_idle_thread(task_stack_page(&init_task),
&init_task.thread.switch_buf);
}
diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
index 9ee19e5..b09e852 100644
--- a/arch/um/kernel/skas/syscall.c
+++ b/arch/um/kernel/skas/syscall.c
@@ -12,23 +12,13 @@
#include <sysdep/syscalls.h>
#include <linux/time-internal.h>
#include <asm/unistd.h>
+#include <asm/delay.h>
void handle_syscall(struct uml_pt_regs *r)
{
struct pt_regs *regs = container_of(r, struct pt_regs, regs);
int syscall;
- /*
- * If we have infinite CPU resources, then make every syscall also a
- * preemption point, since we don't have any other preemption in this
- * case, and kernel threads would basically never run until userspace
- * went to sleep, even if said userspace interacts with the kernel in
- * various ways.
- */
- if (time_travel_mode == TT_MODE_INFCPU ||
- time_travel_mode == TT_MODE_EXTERNAL)
- schedule();
-
/* Initialize the syscall number and default return value. */
UPT_SYSCALL_NR(r) = PT_SYSCALL_NR(r->gp);
PT_REGS_SET_SYSCALL_RETURN(regs, -ENOSYS);
@@ -41,9 +31,25 @@
goto out;
syscall = UPT_SYSCALL_NR(r);
- if (syscall >= 0 && syscall < __NR_syscalls)
- PT_REGS_SET_SYSCALL_RETURN(regs,
- EXECUTE_SYSCALL(syscall, regs));
+ if (syscall >= 0 && syscall < __NR_syscalls) {
+ unsigned long ret = EXECUTE_SYSCALL(syscall, regs);
+
+ PT_REGS_SET_SYSCALL_RETURN(regs, ret);
+
+ /*
+ * An error value here can be some form of -ERESTARTSYS
+ * and then we'd just loop. Make any failing syscall take
+ * some time, so that we don't just loop if something is
+ * not ready, and hopefully other things will make some
+ * progress.
+ */
+ if (IS_ERR_VALUE(ret) &&
+ (time_travel_mode == TT_MODE_INFCPU ||
+ time_travel_mode == TT_MODE_EXTERNAL)) {
+ um_udelay(1);
+ schedule();
+ }
+ }
out:
syscall_trace_leave(regs);
diff --git a/arch/um/kernel/sysrq.c b/arch/um/kernel/sysrq.c
index 7467153..4bb8622 100644
--- a/arch/um/kernel/sysrq.c
+++ b/arch/um/kernel/sysrq.c
@@ -11,7 +11,6 @@
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
-#include <asm/sysrq.h>
#include <asm/stacktrace.h>
#include <os.h>
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 47b9f5e..29b27b9 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -839,7 +839,7 @@
if (get_current()->mm != NULL)
{
/* userspace - relay signal, results in correct userspace timers */
- os_alarm_process(get_current()->mm->context.id.u.pid);
+ os_alarm_process(get_current()->mm->context.id.pid);
}
(*timer_clockevent.event_handler)(&timer_clockevent);
diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
index 44c6fc6..548af31 100644
--- a/arch/um/kernel/tlb.c
+++ b/arch/um/kernel/tlb.c
@@ -82,16 +82,12 @@
(x ? UM_PROT_EXEC : 0));
if (pte_newpage(*pte)) {
if (pte_present(*pte)) {
- if (pte_newpage(*pte)) {
- __u64 offset;
- unsigned long phys =
- pte_val(*pte) & PAGE_MASK;
- int fd = phys_mapping(phys, &offset);
+ __u64 offset;
+ unsigned long phys = pte_val(*pte) & PAGE_MASK;
+ int fd = phys_mapping(phys, &offset);
- ret = ops->mmap(ops->mm_idp, addr,
- PAGE_SIZE, prot, fd,
- offset);
- }
+ ret = ops->mmap(ops->mm_idp, addr, PAGE_SIZE,
+ prot, fd, offset);
} else
ret = ops->unmap(ops->mm_idp, addr, PAGE_SIZE);
} else if (pte_newprot(*pte))
diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c
index 5adf8f6..f1d03cf 100644
--- a/arch/um/os-Linux/file.c
+++ b/arch/um/os-Linux/file.c
@@ -528,7 +528,8 @@
ssize_t os_rcv_fd_msg(int fd, int *fds, unsigned int n_fds,
void *data, size_t data_len)
{
- char buf[CMSG_SPACE(sizeof(*fds) * n_fds)];
+#define MAX_RCV_FDS 2
+ char buf[CMSG_SPACE(sizeof(*fds) * MAX_RCV_FDS)];
struct cmsghdr *cmsg;
struct iovec iov = {
.iov_base = data,
@@ -538,10 +539,13 @@
.msg_iov = &iov,
.msg_iovlen = 1,
.msg_control = buf,
- .msg_controllen = sizeof(buf),
+ .msg_controllen = CMSG_SPACE(sizeof(*fds) * n_fds),
};
int n;
+ if (n_fds > MAX_RCV_FDS)
+ return -EINVAL;
+
n = recvmsg(fd, &msg, 0);
if (n < 0)
return -errno;
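
os_rcv_fd_msg() now sizes its control buffer for a fixed MAX_RCV_FDS rather than for the caller-supplied n_fds, so the on-stack CMSG_SPACE() buffer has a compile-time bound while msg_controllen still reflects what the caller asked for. A standalone sketch of the same SCM_RIGHTS receive pattern (demo_rcv_fd_msg() is illustrative, not the UML helper):

#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/socket.h>

#define MAX_RCV_FDS 2

/* Receive up to n_fds descriptors plus data_len bytes of payload. */
static ssize_t demo_rcv_fd_msg(int sock, int *fds, unsigned int n_fds,
                               void *data, size_t data_len)
{
        char buf[CMSG_SPACE(sizeof(*fds) * MAX_RCV_FDS)];
        struct iovec iov = { .iov_base = data, .iov_len = data_len };
        struct msghdr msg = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = buf,
                /* only expose the space the caller actually asked for */
                .msg_controllen = CMSG_SPACE(sizeof(*fds) * n_fds),
        };
        struct cmsghdr *cmsg;
        ssize_t n;

        if (n_fds > MAX_RCV_FDS)
                return -EINVAL;

        n = recvmsg(sock, &msg, 0);
        if (n < 0)
                return -errno;

        cmsg = CMSG_FIRSTHDR(&msg);
        if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS)
                return n;       /* payload only, no descriptors attached */

        memcpy(fds, CMSG_DATA(cmsg), cmsg->cmsg_len - CMSG_LEN(0));
        return n;
}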
diff --git a/arch/um/os-Linux/skas/mem.c b/arch/um/os-Linux/skas/mem.c
index c554307..9a13ac2 100644
--- a/arch/um/os-Linux/skas/mem.c
+++ b/arch/um/os-Linux/skas/mem.c
@@ -78,7 +78,7 @@
{
struct stub_data *proc_data = (void *)mm_idp->stack;
int n, i;
- int err, pid = mm_idp->u.pid;
+ int err, pid = mm_idp->pid;
n = ptrace_setregs(pid, syscall_regs);
if (n < 0) {
diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index f708834..b6f656b 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -588,5 +588,5 @@
void __switch_mm(struct mm_id *mm_idp)
{
- userspace_pid[0] = mm_idp->u.pid;
+ userspace_pid[0] = mm_idp->pid;
}
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66d..327c45c 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 8db2ec4..1f650b4 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -163,20 +163,18 @@
}
#define arch_atomic64_dec_return arch_atomic64_dec_return
-static __always_inline s64 arch_atomic64_add(s64 i, atomic64_t *v)
+static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v)
{
__alternative_atomic64(add, add_return,
ASM_OUTPUT2("+A" (i), "+c" (v)),
ASM_NO_INPUT_CLOBBER("memory"));
- return i;
}
-static __always_inline s64 arch_atomic64_sub(s64 i, atomic64_t *v)
+static __always_inline void arch_atomic64_sub(s64 i, atomic64_t *v)
{
__alternative_atomic64(sub, sub_return,
ASM_OUTPUT2("+A" (i), "+c" (v)),
ASM_NO_INPUT_CLOBBER("memory"));
- return i;
}
static __always_inline void arch_atomic64_inc(atomic64_t *v)
diff --git a/arch/x86/include/asm/cpuid.h b/arch/x86/include/asm/cpuid.h
index 80cc638..ca42433 100644
--- a/arch/x86/include/asm/cpuid.h
+++ b/arch/x86/include/asm/cpuid.h
@@ -179,6 +179,7 @@
case 0x1d:
case 0x1e:
case 0x1f:
+ case 0x24:
case 0x8000001d:
return true;
}
diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
index 44949f9..1a42f82 100644
--- a/arch/x86/include/asm/intel-family.h
+++ b/arch/x86/include/asm/intel-family.h
@@ -135,6 +135,8 @@
#define INTEL_LUNARLAKE_M IFM(6, 0xBD)
+#define INTEL_PANTHERLAKE_L IFM(6, 0xCC)
+
/* "Small Core" Processors (Atom/E-Core) */
#define INTEL_ATOM_BONNELL IFM(6, 0x1C) /* Diamondville, Pineview */
@@ -178,4 +180,7 @@
#define INTEL_FAM5_QUARK_X1000 0x09 /* Quark X1000 SoC */
#define INTEL_QUARK_X1000 IFM(5, 0x09) /* Quark X1000 SoC */
+/* Family 19 */
+#define INTEL_PANTHERCOVE_X IFM(19, 0x01) /* Diamond Rapids */
+
#endif /* _ASM_X86_INTEL_FAMILY_H */
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 68ad4f9..861d080 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -14,8 +14,8 @@
* be __static_call_return0.
*/
KVM_X86_OP(check_processor_compatibility)
-KVM_X86_OP(hardware_enable)
-KVM_X86_OP(hardware_disable)
+KVM_X86_OP(enable_virtualization_cpu)
+KVM_X86_OP(disable_virtualization_cpu)
KVM_X86_OP(hardware_unsetup)
KVM_X86_OP(has_emulated_msr)
KVM_X86_OP(vcpu_after_set_cpuid)
@@ -125,7 +125,7 @@
KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
KVM_X86_OP_OPTIONAL(guest_memory_reclaimed)
-KVM_X86_OP(get_msr_feature)
+KVM_X86_OP(get_feature_msr)
KVM_X86_OP(check_emulate_instruction)
KVM_X86_OP(apic_init_signal_blocked)
KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a68cb3..6d9f763 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
#include <asm/kvm_page_track.h>
#include <asm/kvm_vcpu_regs.h>
#include <asm/hyperv-tlfs.h>
+#include <asm/reboot.h>
#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
@@ -211,6 +212,7 @@
EXIT_FASTPATH_NONE,
EXIT_FASTPATH_REENTER_GUEST,
EXIT_FASTPATH_EXIT_HANDLED,
+ EXIT_FASTPATH_EXIT_USERSPACE,
};
typedef enum exit_fastpath_completion fastpath_t;
@@ -280,10 +282,6 @@
#define PFERR_PRIVATE_ACCESS BIT_ULL(49)
#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
-#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
- PFERR_WRITE_MASK | \
- PFERR_PRESENT_MASK)
-
/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
/*
@@ -1629,8 +1627,10 @@
int (*check_processor_compatibility)(void);
- int (*hardware_enable)(void);
- void (*hardware_disable)(void);
+ int (*enable_virtualization_cpu)(void);
+ void (*disable_virtualization_cpu)(void);
+ cpu_emergency_virt_cb *emergency_disable_virtualization_cpu;
+
void (*hardware_unsetup)(void);
bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
@@ -1727,6 +1727,8 @@
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+
+ const bool x2apic_icr_is_split;
const unsigned long required_apicv_inhibits;
bool allow_apicv_in_x2apic_without_x2apic_virtualization;
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
@@ -1806,7 +1808,7 @@
int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
void (*guest_memory_reclaimed)(struct kvm *kvm);
- int (*get_msr_feature)(struct kvm_msr_entry *entry);
+ int (*get_feature_msr)(u32 msr, u64 *data);
int (*check_emulate_instruction)(struct kvm_vcpu *vcpu, int emul_type,
void *insn, int insn_len);
@@ -2060,6 +2062,8 @@
void kvm_enable_efer_bits(u64);
bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
+int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
@@ -2136,7 +2140,15 @@
void kvm_update_dr7(struct kvm_vcpu *vcpu);
-int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
+bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+ bool always_retry);
+
+static inline bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu,
+ gpa_t cr2_or_gpa)
+{
+ return __kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa, false);
+}
+
void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
ulong roots_to_free);
void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
@@ -2254,6 +2266,7 @@
int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
int kvm_cpu_has_extint(struct kvm_vcpu *v);
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
+int kvm_cpu_get_extint(struct kvm_vcpu *v);
int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
@@ -2345,7 +2358,8 @@
KVM_X86_QUIRK_OUT_7E_INC_RIP | \
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
- KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS)
+ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
+ KVM_X86_QUIRK_SLOT_ZAP_ALL)
/*
* KVM previously used a u32 field in kvm_run to indicate the hypercall was
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c06a4..3ae84c3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -36,6 +36,20 @@
#define EFER_FFXSR (1<<_EFER_FFXSR)
#define EFER_AUTOIBRS (1<<_EFER_AUTOIBRS)
+/*
+ * Architectural memory types that are common to MTRRs, PAT, VMX MSRs, etc.
+ * Most MSRs support/allow only a subset of memory types, but the values
+ * themselves are common across all relevant MSRs.
+ */
+#define X86_MEMTYPE_UC 0ull /* Uncacheable, a.k.a. Strong Uncacheable */
+#define X86_MEMTYPE_WC 1ull /* Write Combining */
+/* RESERVED 2 */
+/* RESERVED 3 */
+#define X86_MEMTYPE_WT 4ull /* Write Through */
+#define X86_MEMTYPE_WP 5ull /* Write Protected */
+#define X86_MEMTYPE_WB 6ull /* Write Back */
+#define X86_MEMTYPE_UC_MINUS 7ull /* Weak Uncacheable (PAT only) */
+
/* FRED MSRs */
#define MSR_IA32_FRED_RSP0 0x1cc /* Level 0 stack pointer */
#define MSR_IA32_FRED_RSP1 0x1cd /* Level 1 stack pointer */
@@ -365,6 +379,12 @@
#define MSR_IA32_CR_PAT 0x00000277
+#define PAT_VALUE(p0, p1, p2, p3, p4, p5, p6, p7) \
+ ((X86_MEMTYPE_ ## p0) | (X86_MEMTYPE_ ## p1 << 8) | \
+ (X86_MEMTYPE_ ## p2 << 16) | (X86_MEMTYPE_ ## p3 << 24) | \
+ (X86_MEMTYPE_ ## p4 << 32) | (X86_MEMTYPE_ ## p5 << 40) | \
+ (X86_MEMTYPE_ ## p6 << 48) | (X86_MEMTYPE_ ## p7 << 56))
+
#define MSR_IA32_DEBUGCTLMSR 0x000001d9
#define MSR_IA32_LASTBRANCHFROMIP 0x000001db
#define MSR_IA32_LASTBRANCHTOIP 0x000001dc
@@ -1159,15 +1179,6 @@
#define MSR_IA32_VMX_VMFUNC 0x00000491
#define MSR_IA32_VMX_PROCBASED_CTLS3 0x00000492
-/* VMX_BASIC bits and bitmasks */
-#define VMX_BASIC_VMCS_SIZE_SHIFT 32
-#define VMX_BASIC_TRUE_CTLS (1ULL << 55)
-#define VMX_BASIC_64 0x0001000000000000LLU
-#define VMX_BASIC_MEM_TYPE_SHIFT 50
-#define VMX_BASIC_MEM_TYPE_MASK 0x003c000000000000LLU
-#define VMX_BASIC_MEM_TYPE_WB 6LLU
-#define VMX_BASIC_INOUT 0x0040000000000000LLU
-
/* Resctrl MSRs: */
/* - Intel: */
#define MSR_IA32_L3_QOS_CFG 0xc81
@@ -1185,11 +1196,6 @@
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
-/* MSR_IA32_VMX_MISC bits */
-#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
-#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
-#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
-
/* AMD-V MSRs */
#define MSR_VM_CR 0xc0010114
#define MSR_VM_IGNNE 0xc0010115
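
PAT_VALUE() packs eight X86_MEMTYPE_* codes into the 64-bit MSR_IA32_CR_PAT image, one memory type per byte. A standalone check of the packing; the MT_* constants mirror the values added above, and the chosen layout is the architectural reset value (WB, WT, UC-, UC repeated), which should come out as 0x0007040600070406:

#include <stdint.h>
#include <stdio.h>

/* Mirrors the X86_MEMTYPE_* encodings added to msr-index.h. */
#define MT_UC           0ull
#define MT_WC           1ull
#define MT_WT           4ull
#define MT_WP           5ull
#define MT_WB           6ull
#define MT_UC_MINUS     7ull

#define PAT_VALUE(p0, p1, p2, p3, p4, p5, p6, p7)               \
        ((MT_ ## p0) | (MT_ ## p1 << 8) |                       \
         (MT_ ## p2 << 16) | (MT_ ## p3 << 24) |                \
         (MT_ ## p4 << 32) | (MT_ ## p5 << 40) |                \
         (MT_ ## p6 << 48) | (MT_ ## p7 << 56))

int main(void)
{
        /* Architectural PAT reset value: WB, WT, UC-, UC, WB, WT, UC-, UC */
        uint64_t pat = PAT_VALUE(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC);

        printf("MSR_IA32_CR_PAT image: 0x%016llx\n",
               (unsigned long long)pat);        /* 0x0007040600070406 */
        return 0;
}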
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 7e9db77..d1426b6 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -270,5 +270,26 @@
#include <asm/pgtable-invert.h>
-#endif /* !__ASSEMBLY__ */
+#else /* __ASSEMBLY__ */
+
+#define l4_index(x) (((x) >> 39) & 511)
+#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+
+L4_PAGE_OFFSET = l4_index(__PAGE_OFFSET_BASE_L4)
+L4_START_KERNEL = l4_index(__START_KERNEL_map)
+
+L3_START_KERNEL = pud_index(__START_KERNEL_map)
+
+#define SYM_DATA_START_PAGE_ALIGNED(name) \
+ SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
+
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT) \
+ i = 0 ; \
+ .rept (COUNT) ; \
+ .quad (START) + (i << PMD_SHIFT) + (PERM) ; \
+ i = i + 1 ; \
+ .endr
+
+#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_PGTABLE_64_H */
diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 6536873..d0ef2a6 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,8 +25,8 @@
#define MRR_BIOS 0
#define MRR_APM 1
-#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
typedef void (cpu_emergency_virt_cb)(void);
+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_disable_virtualization(void);
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index f0dea37..2b59b99 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -516,6 +516,20 @@
u32 ghcb_usage;
} __packed;
+struct vmcb {
+ struct vmcb_control_area control;
+ union {
+ struct vmcb_save_area save;
+
+ /*
+ * For SEV-ES VMs, the save area in the VMCB is used only to
+ * save/load host state. Guest state resides in a separate
+ * page, the aptly named VM Save Area (VMSA), that is encrypted
+ * with the guest's private key.
+ */
+ struct sev_es_save_area host_sev_es_save;
+ };
+} __packed;
#define EXPECTED_VMCB_SAVE_AREA_SIZE 744
#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
@@ -532,6 +546,7 @@
BUILD_BUG_ON(sizeof(struct ghcb_save_area) != EXPECTED_GHCB_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
+ BUILD_BUG_ON(offsetof(struct vmcb, save) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
/* Check offsets of reserved fields */
@@ -568,11 +583,6 @@
BUILD_BUG_RESERVED_OFFSET(ghcb, 0xff0);
}
-struct vmcb {
- struct vmcb_control_area control;
- struct vmcb_save_area save;
-} __packed;
-
#define SVM_CPUID_FUNC 0x8000000a
#define SVM_SELECTOR_S_SHIFT 4
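
The relocated struct vmcb overlays the legacy save area and the SEV-ES host save area in one union directly after the control area, and the new BUILD_BUG_ON() pins that offset. A simplified standalone analogue of that layout check (the demo_* struct sizes are made up; only the offset relationship matters):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the real VMCB areas (sizes invented). */
struct demo_control_area { uint8_t bytes[1024]; };
struct demo_save_area    { uint8_t bytes[744];  };
struct demo_sev_es_save  { uint8_t bytes[1648]; };

/* Same trick as the struct vmcb change: one slot, two interpretations. */
struct demo_vmcb {
        struct demo_control_area control;
        union {
                struct demo_save_area save;               /* legacy host/guest state */
                struct demo_sev_es_save host_sev_es_save; /* SEV-ES host state */
        };
} __attribute__((packed));

/* Mirrors BUILD_BUG_ON(offsetof(struct vmcb, save) !=
 * EXPECTED_VMCB_CONTROL_AREA_SIZE) from the hunk above.
 */
static_assert(offsetof(struct demo_vmcb, save) ==
              sizeof(struct demo_control_area),
              "save area must start right after the control area");

int main(void) { return 0; }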
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index d77a310..f7fd436 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -122,19 +122,17 @@
#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
-#define VMX_MISC_PREEMPTION_TIMER_RATE_MASK 0x0000001f
-#define VMX_MISC_SAVE_EFER_LMA 0x00000020
-#define VMX_MISC_ACTIVITY_HLT 0x00000040
-#define VMX_MISC_ACTIVITY_WAIT_SIPI 0x00000100
-#define VMX_MISC_ZERO_LEN_INS 0x40000000
-#define VMX_MISC_MSR_LIST_MULTIPLIER 512
-
/* VMFUNC functions */
#define VMFUNC_CONTROL_BIT(x) BIT((VMX_FEATURE_##x & 0x1f) - 28)
#define VMX_VMFUNC_EPTP_SWITCHING VMFUNC_CONTROL_BIT(EPTP_SWITCHING)
#define VMFUNC_EPTP_ENTRIES 512
+#define VMX_BASIC_32BIT_PHYS_ADDR_ONLY BIT_ULL(48)
+#define VMX_BASIC_DUAL_MONITOR_TREATMENT BIT_ULL(49)
+#define VMX_BASIC_INOUT BIT_ULL(54)
+#define VMX_BASIC_TRUE_CTLS BIT_ULL(55)
+
static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
{
return vmx_basic & GENMASK_ULL(30, 0);
@@ -145,9 +143,30 @@
return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
}
+static inline u32 vmx_basic_vmcs_mem_type(u64 vmx_basic)
+{
+ return (vmx_basic & GENMASK_ULL(53, 50)) >> 50;
+}
+
+static inline u64 vmx_basic_encode_vmcs_info(u32 revision, u16 size, u8 memtype)
+{
+ return revision | ((u64)size << 32) | ((u64)memtype << 50);
+}
+
+#define VMX_MISC_SAVE_EFER_LMA BIT_ULL(5)
+#define VMX_MISC_ACTIVITY_HLT BIT_ULL(6)
+#define VMX_MISC_ACTIVITY_SHUTDOWN BIT_ULL(7)
+#define VMX_MISC_ACTIVITY_WAIT_SIPI BIT_ULL(8)
+#define VMX_MISC_INTEL_PT BIT_ULL(14)
+#define VMX_MISC_RDMSR_IN_SMM BIT_ULL(15)
+#define VMX_MISC_VMXOFF_BLOCK_SMI BIT_ULL(28)
+#define VMX_MISC_VMWRITE_SHADOW_RO_FIELDS BIT_ULL(29)
+#define VMX_MISC_ZERO_LEN_INS BIT_ULL(30)
+#define VMX_MISC_MSR_LIST_MULTIPLIER 512
+
static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
{
- return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+ return vmx_misc & GENMASK_ULL(4, 0);
}
static inline int vmx_misc_cr3_count(u64 vmx_misc)
@@ -508,9 +527,10 @@
#define VMX_EPTP_PWL_4 0x18ull
#define VMX_EPTP_PWL_5 0x20ull
#define VMX_EPTP_AD_ENABLE_BIT (1ull << 6)
+/* The EPTP memtype is encoded in bits 2:0, i.e. doesn't need to be shifted. */
#define VMX_EPTP_MT_MASK 0x7ull
-#define VMX_EPTP_MT_WB 0x6ull
-#define VMX_EPTP_MT_UC 0x0ull
+#define VMX_EPTP_MT_WB X86_MEMTYPE_WB
+#define VMX_EPTP_MT_UC X86_MEMTYPE_UC
#define VMX_EPT_READABLE_MASK 0x1ull
#define VMX_EPT_WRITABLE_MASK 0x2ull
#define VMX_EPT_EXECUTABLE_MASK 0x4ull
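
The replacement VMX_BASIC accessors carve IA32_VMX_BASIC into revision ID (bits 30:0), VMCS size (bits 44:32) and VMCS memory type (bits 53:50), with vmx_basic_encode_vmcs_info() as their inverse. A standalone round-trip check of that bit layout (GENMASK_ULL() re-derived here for userspace; memtype 6 is X86_MEMTYPE_WB from the msr-index.h hunk):

#include <stdint.h>
#include <stdio.h>

#define GENMASK_ULL(h, l) \
        (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

static uint32_t vmcs_revision_id(uint64_t basic) { return basic & GENMASK_ULL(30, 0); }
static uint32_t vmcs_size(uint64_t basic)        { return (basic & GENMASK_ULL(44, 32)) >> 32; }
static uint32_t vmcs_mem_type(uint64_t basic)    { return (basic & GENMASK_ULL(53, 50)) >> 50; }

static uint64_t encode_vmcs_info(uint32_t revision, uint16_t size, uint8_t memtype)
{
        return revision | ((uint64_t)size << 32) | ((uint64_t)memtype << 50);
}

int main(void)
{
        uint64_t basic = encode_vmcs_info(0x12, 0x1000, 6);

        printf("rev=%#x size=%#x memtype=%u\n",
               vmcs_revision_id(basic), vmcs_size(basic), vmcs_mem_type(basic));
        return 0;
}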
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index bf57a82..a8debbf 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -439,6 +439,7 @@
#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5)
#define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
+#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_STATE_NESTED_FORMAT_VMX 0
#define KVM_STATE_NESTED_FORMAT_SVM 1
diff --git a/arch/x86/kernel/cpu/mce/dev-mcelog.c b/arch/x86/kernel/cpu/mce/dev-mcelog.c
index a3aa019..af44fd5 100644
--- a/arch/x86/kernel/cpu/mce/dev-mcelog.c
+++ b/arch/x86/kernel/cpu/mce/dev-mcelog.c
@@ -331,7 +331,6 @@
.poll = mce_chrdev_poll,
.unlocked_ioctl = mce_chrdev_ioctl,
.compat_ioctl = compat_ptr_ioctl,
- .llseek = no_llseek,
};
static struct miscdevice mce_chrdev_device = {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2a2fc14..989d368 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -55,6 +55,12 @@
#include "mtrr.h"
+static_assert(X86_MEMTYPE_UC == MTRR_TYPE_UNCACHABLE);
+static_assert(X86_MEMTYPE_WC == MTRR_TYPE_WRCOMB);
+static_assert(X86_MEMTYPE_WT == MTRR_TYPE_WRTHROUGH);
+static_assert(X86_MEMTYPE_WP == MTRR_TYPE_WRPROT);
+static_assert(X86_MEMTYPE_WB == MTRR_TYPE_WRBACK);
+
/* arch_phys_wc_add returns an MTRR register index plus this offset. */
#define MTRR_TO_PHYS_WC_OFFSET 1000
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index e69489d..972e6b6 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1567,7 +1567,6 @@
static const struct file_operations pseudo_lock_dev_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.read = NULL,
.write = NULL,
.open = pseudo_lock_dev_open,
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 330922b..16752b8 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -32,13 +32,6 @@
* We are not able to switch in one step to the final KERNEL ADDRESS SPACE
* because we need identity-mapped pages.
*/
-#define l4_index(x) (((x) >> 39) & 511)
-#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
-
-L4_PAGE_OFFSET = l4_index(__PAGE_OFFSET_BASE_L4)
-L4_START_KERNEL = l4_index(__START_KERNEL_map)
-
-L3_START_KERNEL = pud_index(__START_KERNEL_map)
__HEAD
.code64
@@ -577,9 +570,6 @@
SYM_CODE_END(vc_no_ghcb)
#endif
-#define SYM_DATA_START_PAGE_ALIGNED(name) \
- SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
-
#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
/*
* Each PGD needs to be 8k long and 8k aligned. We do not
@@ -601,14 +591,6 @@
#define PTI_USER_PGD_FILL 0
#endif
-/* Automate the creation of 1 to 1 mapping pmd entries */
-#define PMDS(START, PERM, COUNT) \
- i = 0 ; \
- .rept (COUNT) ; \
- .quad (START) + (i << PMD_SHIFT) + (PERM) ; \
- i = i + 1 ; \
- .endr
-
__INITDATA
.balign 4
@@ -708,8 +690,6 @@
.endr
SYM_DATA_END(level1_fixmap_pgt)
-#undef PMDS
-
.data
.align 16
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2617be5..41786b8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -705,7 +705,7 @@
kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
- F(AMX_COMPLEX)
+ F(AMX_COMPLEX) | F(AVX10)
);
kvm_cpu_cap_init_kvm_defined(CPUID_7_2_EDX,
@@ -721,6 +721,10 @@
SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
);
+ kvm_cpu_cap_init_kvm_defined(CPUID_24_0_EBX,
+ F(AVX10_128) | F(AVX10_256) | F(AVX10_512)
+ );
+
kvm_cpu_cap_mask(CPUID_8000_0001_ECX,
F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
@@ -949,7 +953,7 @@
switch (function) {
case 0:
/* Limited to the highest leaf implemented in KVM. */
- entry->eax = min(entry->eax, 0x1fU);
+ entry->eax = min(entry->eax, 0x24U);
break;
case 1:
cpuid_entry_override(entry, CPUID_1_EDX);
@@ -1174,6 +1178,28 @@
break;
}
break;
+ case 0x24: {
+ u8 avx10_version;
+
+ if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
+ entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+ break;
+ }
+
+ /*
+ * The AVX10 version is encoded in EBX[7:0]. Note, the version
+ * is guaranteed to be >=1 if AVX10 is supported. Note #2, the
+ * version needs to be captured before overriding EBX features!
+ */
+ avx10_version = min_t(u8, entry->ebx & 0xff, 1);
+ cpuid_entry_override(entry, CPUID_24_0_EBX);
+ entry->ebx |= avx10_version;
+
+ entry->eax = 0;
+ entry->ecx = 0;
+ entry->edx = 0;
+ break;
+ }
case KVM_CPUID_SIGNATURE: {
const u32 *sigptr = (const u32 *)KVM_SIGNATURE;
entry->eax = KVM_CPUID_FEATURES;
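The new 0x24 leaf handling captures the AVX10 version from EBX[7:0] before the feature-bit override clobbers EBX, caps it, and ORs it back in afterwards. A rough userspace sketch of that capture/override/restore ordering; the supported-feature mask and the cap at version 1 are assumptions for illustration, not KVM's exact policy:

    #include <stdint.h>
    #include <stdio.h>

    /* Pretend these are the only EBX feature bits the hypervisor exposes. */
    #define SUPPORTED_EBX_FEATURES ((1u << 16) | (1u << 17) | (1u << 18))

    static uint32_t adjust_leaf_0x24_ebx(uint32_t hw_ebx)
    {
        /* Capture the version field (EBX[7:0]) before masking the features. */
        uint8_t version = hw_ebx & 0xff;

        if (version > 1)
            version = 1;   /* cap at AVX10.1 for this sketch */

        /* Override the feature bits, then restore the version field. */
        return (hw_ebx & SUPPORTED_EBX_FEATURES) | version;
    }

    int main(void)
    {
        printf("%#x\n", (unsigned)adjust_leaf_0x24_ebx(0x00070002));  /* 0x70001 */
        return 0;
    }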
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 3d7eb11..63f66c5 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -108,7 +108,7 @@
* Read pending interrupt(from non-APIC source)
* vector and intack.
*/
-static int kvm_cpu_get_extint(struct kvm_vcpu *v)
+int kvm_cpu_get_extint(struct kvm_vcpu *v)
{
if (!kvm_cpu_has_extint(v)) {
WARN_ON(!lapic_in_kernel(v));
@@ -131,6 +131,7 @@
} else
return kvm_pic_read_irq(v->kvm); /* PIC */
}
+EXPORT_SYMBOL_GPL(kvm_cpu_get_extint);
/*
* Read pending interrupt vector and intack.
@@ -141,9 +142,12 @@
if (vector != -1)
return vector; /* PIC */
- return kvm_get_apic_interrupt(v); /* APIC */
+ vector = kvm_apic_has_interrupt(v); /* APIC */
+ if (vector != -1)
+ kvm_apic_ack_interrupt(v, vector);
+
+ return vector;
}
-EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
{
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5bb481a..2098dc6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1944,7 +1944,7 @@
u64 ns = 0;
ktime_t expire;
struct kvm_vcpu *vcpu = apic->vcpu;
- unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
+ u32 this_tsc_khz = vcpu->arch.virtual_tsc_khz;
unsigned long flags;
ktime_t now;
@@ -2453,6 +2453,43 @@
}
EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
+#define X2APIC_ICR_RESERVED_BITS (GENMASK_ULL(31, 20) | GENMASK_ULL(17, 16) | BIT(13))
+
+int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
+{
+ if (data & X2APIC_ICR_RESERVED_BITS)
+ return 1;
+
+ /*
+ * The BUSY bit is reserved on both Intel and AMD in x2APIC mode, but
+ * only AMD requires it to be zero, Intel essentially just ignores the
+ * bit. And if IPI virtualization (Intel) or x2AVIC (AMD) is enabled,
+ * the CPU performs the reserved bits checks, i.e. the underlying CPU
+ * behavior will "win". Arbitrarily clear the BUSY bit, as there is no
+ * sane way to provide consistent behavior with respect to hardware.
+ */
+ data &= ~APIC_ICR_BUSY;
+
+ kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
+ if (kvm_x86_ops.x2apic_icr_is_split) {
+ kvm_lapic_set_reg(apic, APIC_ICR, data);
+ kvm_lapic_set_reg(apic, APIC_ICR2, data >> 32);
+ } else {
+ kvm_lapic_set_reg64(apic, APIC_ICR, data);
+ }
+ trace_kvm_apic_write(APIC_ICR, data);
+ return 0;
+}
+
+static u64 kvm_x2apic_icr_read(struct kvm_lapic *apic)
+{
+ if (kvm_x86_ops.x2apic_icr_is_split)
+ return (u64)kvm_lapic_get_reg(apic, APIC_ICR) |
+ (u64)kvm_lapic_get_reg(apic, APIC_ICR2) << 32;
+
+ return kvm_lapic_get_reg64(apic, APIC_ICR);
+}
+
/* emulate APIC access in a trap manner */
void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
{
@@ -2470,7 +2507,7 @@
* maybe-unnecessary write, and both are in the noise anyways.
*/
if (apic_x2apic_mode(apic) && offset == APIC_ICR)
- kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR));
+ WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_x2apic_icr_read(apic)));
else
kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
}
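The new x2apic_icr_is_split hook lets the ICR accessors pick between two storage layouts: a single 64-bit register, or ICR/ICR2 kept as separate 32-bit halves. A self-contained sketch of the two representations; the struct and flag below are stand-ins, not KVM's kvm_lapic:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct fake_apic {
        bool icr_is_split;      /* split ICR/ICR2 layout vs. one 64-bit register */
        uint32_t icr, icr2;     /* used when split */
        uint64_t icr64;         /* used when not split */
    };

    static void icr_write(struct fake_apic *apic, uint64_t data)
    {
        if (apic->icr_is_split) {
            apic->icr  = (uint32_t)data;          /* low 32 bits */
            apic->icr2 = (uint32_t)(data >> 32);  /* destination in the high half */
        } else {
            apic->icr64 = data;
        }
    }

    static uint64_t icr_read(const struct fake_apic *apic)
    {
        if (apic->icr_is_split)
            return (uint64_t)apic->icr | ((uint64_t)apic->icr2 << 32);
        return apic->icr64;
    }

    int main(void)
    {
        struct fake_apic a = { .icr_is_split = true };
        icr_write(&a, 0x00000002000040fdULL);
        printf("%#llx\n", (unsigned long long)icr_read(&a));  /* round-trips intact */
        return 0;
    }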
@@ -2922,14 +2959,13 @@
}
}
-int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
+void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector)
{
- int vector = kvm_apic_has_interrupt(vcpu);
struct kvm_lapic *apic = vcpu->arch.apic;
u32 ppr;
- if (vector == -1)
- return -1;
+ if (WARN_ON_ONCE(vector < 0 || !apic))
+ return;
/*
* We get here even with APIC virtualization enabled, if doing
@@ -2957,8 +2993,8 @@
__apic_update_ppr(apic, &ppr);
}
- return vector;
}
+EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt);
static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s, bool set)
@@ -2990,18 +3026,22 @@
/*
* In x2APIC mode, the LDR is fixed and based on the id. And
- * ICR is internally a single 64-bit register, but needs to be
- * split to ICR+ICR2 in userspace for backwards compatibility.
+ * if the ICR is _not_ split, ICR is internally a single 64-bit
+ * register, but needs to be split to ICR+ICR2 in userspace for
+ * backwards compatibility.
*/
- if (set) {
+ if (set)
*ldr = kvm_apic_calc_x2apic_ldr(x2apic_id);
- icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
- (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
- __kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
- } else {
- icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
- __kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
+ if (!kvm_x86_ops.x2apic_icr_is_split) {
+ if (set) {
+ icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
+ (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
+ __kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
+ } else {
+ icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
+ __kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
+ }
}
}
@@ -3194,22 +3234,12 @@
return 0;
}
-int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
-{
- data &= ~APIC_ICR_BUSY;
-
- kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
- kvm_lapic_set_reg64(apic, APIC_ICR, data);
- trace_kvm_apic_write(APIC_ICR, data);
- return 0;
-}
-
static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
{
u32 low;
if (reg == APIC_ICR) {
- *data = kvm_lapic_get_reg64(apic, APIC_ICR);
+ *data = kvm_x2apic_icr_read(apic);
return 0;
}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 7ef8ae7..1b8ef98 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -88,15 +88,14 @@
void kvm_free_lapic(struct kvm_vcpu *vcpu);
int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
+void kvm_apic_ack_interrupt(struct kvm_vcpu *vcpu, int vector);
int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
-int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
-u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
void kvm_recalculate_apic_map(struct kvm *kvm);
void kvm_apic_set_version(struct kvm_vcpu *vcpu);
void kvm_apic_after_set_mcg_cap(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 4341e0e..9dc5dd4 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -223,8 +223,6 @@
bool kvm_mmu_may_ignore_guest_pat(void);
-int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
-
int kvm_mmu_post_init_vm(struct kvm *kvm);
void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7813d28..e52f990 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -614,32 +614,6 @@
return __get_spte_lockless(sptep);
}
-/* Returns the Accessed status of the PTE and resets it at the same time. */
-static bool mmu_spte_age(u64 *sptep)
-{
- u64 spte = mmu_spte_get_lockless(sptep);
-
- if (!is_accessed_spte(spte))
- return false;
-
- if (spte_ad_enabled(spte)) {
- clear_bit((ffs(shadow_accessed_mask) - 1),
- (unsigned long *)sptep);
- } else {
- /*
- * Capture the dirty status of the page, so that it doesn't get
- * lost when the SPTE is marked for access tracking.
- */
- if (is_writable_pte(spte))
- kvm_set_pfn_dirty(spte_to_pfn(spte));
-
- spte = mark_spte_for_access_track(spte);
- mmu_spte_update_no_track(sptep, spte);
- }
-
- return true;
-}
-
static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
{
return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
@@ -938,6 +912,7 @@
* in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
* pte_list_desc containing more mappings.
*/
+#define KVM_RMAP_MANY BIT(0)
/*
* Returns the number of pointers in the rmap chain, not counting the new one.
@@ -950,16 +925,16 @@
if (!rmap_head->val) {
rmap_head->val = (unsigned long)spte;
- } else if (!(rmap_head->val & 1)) {
+ } else if (!(rmap_head->val & KVM_RMAP_MANY)) {
desc = kvm_mmu_memory_cache_alloc(cache);
desc->sptes[0] = (u64 *)rmap_head->val;
desc->sptes[1] = spte;
desc->spte_count = 2;
desc->tail_count = 0;
- rmap_head->val = (unsigned long)desc | 1;
+ rmap_head->val = (unsigned long)desc | KVM_RMAP_MANY;
++count;
} else {
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY);
count = desc->tail_count + desc->spte_count;
/*
@@ -968,10 +943,10 @@
*/
if (desc->spte_count == PTE_LIST_EXT) {
desc = kvm_mmu_memory_cache_alloc(cache);
- desc->more = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ desc->more = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY);
desc->spte_count = 0;
desc->tail_count = count;
- rmap_head->val = (unsigned long)desc | 1;
+ rmap_head->val = (unsigned long)desc | KVM_RMAP_MANY;
}
desc->sptes[desc->spte_count++] = spte;
}
@@ -982,7 +957,7 @@
struct kvm_rmap_head *rmap_head,
struct pte_list_desc *desc, int i)
{
- struct pte_list_desc *head_desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ struct pte_list_desc *head_desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY);
int j = head_desc->spte_count - 1;
/*
@@ -1011,7 +986,7 @@
if (!head_desc->more)
rmap_head->val = 0;
else
- rmap_head->val = (unsigned long)head_desc->more | 1;
+ rmap_head->val = (unsigned long)head_desc->more | KVM_RMAP_MANY;
mmu_free_pte_list_desc(head_desc);
}
@@ -1024,13 +999,13 @@
if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_head->val, kvm))
return;
- if (!(rmap_head->val & 1)) {
+ if (!(rmap_head->val & KVM_RMAP_MANY)) {
if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_head->val != spte, kvm))
return;
rmap_head->val = 0;
} else {
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY);
while (desc) {
for (i = 0; i < desc->spte_count; ++i) {
if (desc->sptes[i] == spte) {
@@ -1063,12 +1038,12 @@
if (!rmap_head->val)
return false;
- if (!(rmap_head->val & 1)) {
+ if (!(rmap_head->val & KVM_RMAP_MANY)) {
mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val);
goto out;
}
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY);
for (; desc; desc = next) {
for (i = 0; i < desc->spte_count; i++)
@@ -1088,10 +1063,10 @@
if (!rmap_head->val)
return 0;
- else if (!(rmap_head->val & 1))
+ else if (!(rmap_head->val & KVM_RMAP_MANY))
return 1;
- desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY);
return desc->tail_count + desc->spte_count;
}
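The KVM_RMAP_MANY rename spells out the long-standing pointer-tagging trick: bit 0 of rmap_head->val says whether the value is a single SPTE pointer stored inline or a tagged pointer to a descriptor holding many. A minimal userspace illustration of the same tagging; the types are invented and much simpler than pte_list_desc:

    #include <stdio.h>

    #define MANY_BIT 0x1UL   /* bit 0 is free: pointers are at least word aligned */

    struct desc { int nr; void *items[4]; };
    struct head { unsigned long val; };  /* item pointer, or desc pointer | MANY_BIT */

    static int head_count(const struct head *h)
    {
        if (!h->val)
            return 0;
        if (!(h->val & MANY_BIT))
            return 1;   /* single item stored inline, no descriptor allocated */
        return ((struct desc *)(h->val & ~MANY_BIT))->nr;
    }

    int main(void)
    {
        static struct desc d = { .nr = 3 };
        struct head one  = { .val = (unsigned long)&d };             /* untagged */
        struct head many = { .val = (unsigned long)&d | MANY_BIT };  /* tagged */
        printf("%d %d\n", head_count(&one), head_count(&many));      /* 1 3 */
        return 0;
    }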
@@ -1153,13 +1128,13 @@
if (!rmap_head->val)
return NULL;
- if (!(rmap_head->val & 1)) {
+ if (!(rmap_head->val & KVM_RMAP_MANY)) {
iter->desc = NULL;
sptep = (u64 *)rmap_head->val;
goto out;
}
- iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+ iter->desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY);
iter->pos = 0;
sptep = iter->desc->sptes[iter->pos];
out:
@@ -1307,15 +1282,6 @@
return flush;
}
-/**
- * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
- * @kvm: kvm instance
- * @slot: slot to protect
- * @gfn_offset: start of the BITS_PER_LONG pages we care about
- * @mask: indicates which pages we should protect
- *
- * Used when we do not need to care about huge page mappings.
- */
static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
@@ -1339,16 +1305,6 @@
}
}
-/**
- * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages, or write
- * protect the page if the D-bit isn't supported.
- * @kvm: kvm instance
- * @slot: slot to clear D-bit
- * @gfn_offset: start of the BITS_PER_LONG pages we care about
- * @mask: indicates which pages we should clear D-bit
- *
- * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
- */
static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
@@ -1372,24 +1328,16 @@
}
}
-/**
- * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
- * PT level pages.
- *
- * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
- * enable dirty logging for them.
- *
- * We need to care about huge page mappings: e.g. during dirty logging we may
- * have such mappings.
- */
void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
{
/*
- * Huge pages are NOT write protected when we start dirty logging in
- * initially-all-set mode; must write protect them here so that they
- * are split to 4K on the first write.
+ * If the slot was assumed to be "initially all dirty", write-protect
+ * huge pages to ensure they are split to 4KiB on the first write (KVM
+ * dirty logs at 4KiB granularity). If eager page splitting is enabled,
+ * immediately try to split huge pages, e.g. so that vCPUs don't get
+ * saddled with the cost of splitting.
*
* The gfn_offset is guaranteed to be aligned to 64, but the base_gfn
* of memslot has no such restriction, so the range can cross two large
@@ -1411,7 +1359,16 @@
PG_LEVEL_2M);
}
- /* Now handle 4K PTEs. */
+ /*
+ * (Re)Enable dirty logging for all 4KiB SPTEs that map the GFNs in
+ * mask. If PML is enabled and the GFN doesn't need to be write-
+ * protected for other reasons, e.g. shadow paging, clear the Dirty bit.
+ * Otherwise clear the Writable bit.
+ *
+ * Note that kvm_mmu_clear_dirty_pt_masked() is called whenever PML is
+ * enabled but it chooses between clearing the Dirty bit and Writable
+ * bit based on the context.
+ */
if (kvm_x86_ops.cpu_dirty_log_size)
kvm_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset, mask);
else
@@ -1453,18 +1410,12 @@
return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn, PG_LEVEL_4K);
}
-static bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- const struct kvm_memory_slot *slot)
+static bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ const struct kvm_memory_slot *slot)
{
return kvm_zap_all_rmap_sptes(kvm, rmap_head);
}
-static bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- struct kvm_memory_slot *slot, gfn_t gfn, int level)
-{
- return __kvm_zap_rmap(kvm, rmap_head, slot);
-}
-
struct slot_rmap_walk_iterator {
/* input fields. */
const struct kvm_memory_slot *slot;
@@ -1513,7 +1464,7 @@
static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
{
while (++iterator->rmap <= iterator->end_rmap) {
- iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
+ iterator->gfn += KVM_PAGES_PER_HPAGE(iterator->level);
if (iterator->rmap->val)
return;
@@ -1534,23 +1485,71 @@
slot_rmap_walk_okay(_iter_); \
slot_rmap_walk_next(_iter_))
-typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- struct kvm_memory_slot *slot, gfn_t gfn,
- int level);
+/* The return value indicates if tlb flush on all vcpus is needed. */
+typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
+ struct kvm_rmap_head *rmap_head,
+ const struct kvm_memory_slot *slot);
-static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
- struct kvm_gfn_range *range,
- rmap_handler_t handler)
+static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
+ const struct kvm_memory_slot *slot,
+ slot_rmaps_handler fn,
+ int start_level, int end_level,
+ gfn_t start_gfn, gfn_t end_gfn,
+ bool can_yield, bool flush_on_yield,
+ bool flush)
{
struct slot_rmap_walk_iterator iterator;
- bool ret = false;
- for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
- range->start, range->end - 1, &iterator)
- ret |= handler(kvm, iterator.rmap, range->slot, iterator.gfn,
- iterator.level);
+ lockdep_assert_held_write(&kvm->mmu_lock);
- return ret;
+ for_each_slot_rmap_range(slot, start_level, end_level, start_gfn,
+ end_gfn, &iterator) {
+ if (iterator.rmap)
+ flush |= fn(kvm, iterator.rmap, slot);
+
+ if (!can_yield)
+ continue;
+
+ if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
+ if (flush && flush_on_yield) {
+ kvm_flush_remote_tlbs_range(kvm, start_gfn,
+ iterator.gfn - start_gfn + 1);
+ flush = false;
+ }
+ cond_resched_rwlock_write(&kvm->mmu_lock);
+ }
+ }
+
+ return flush;
+}
+
+static __always_inline bool walk_slot_rmaps(struct kvm *kvm,
+ const struct kvm_memory_slot *slot,
+ slot_rmaps_handler fn,
+ int start_level, int end_level,
+ bool flush_on_yield)
+{
+ return __walk_slot_rmaps(kvm, slot, fn, start_level, end_level,
+ slot->base_gfn, slot->base_gfn + slot->npages - 1,
+ true, flush_on_yield, false);
+}
+
+static __always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
+ const struct kvm_memory_slot *slot,
+ slot_rmaps_handler fn,
+ bool flush_on_yield)
+{
+ return walk_slot_rmaps(kvm, slot, fn, PG_LEVEL_4K, PG_LEVEL_4K, flush_on_yield);
+}
+
+static bool __kvm_rmap_zap_gfn_range(struct kvm *kvm,
+ const struct kvm_memory_slot *slot,
+ gfn_t start, gfn_t end, bool can_yield,
+ bool flush)
+{
+ return __walk_slot_rmaps(kvm, slot, kvm_zap_rmap,
+ PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
+ start, end - 1, can_yield, true, flush);
}
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
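__walk_slot_rmaps collapses the old per-gfn handler plumbing into one walker: call a handler per rmap head, accumulate a pending-TLB-flush flag, and optionally flush-and-yield when the lock is contended. A schematic userspace version of that control flow; the bucket type, yield hook, and flush call are placeholders:

    #include <stdbool.h>
    #include <stdio.h>

    struct bucket { int dirty; };

    typedef bool (*bucket_fn)(struct bucket *b);  /* returns "flush needed" */

    static bool need_yield(void) { return false; }  /* stand-in for need_resched() */
    static void flush_range(int first, int last) { printf("flush %d..%d\n", first, last); }

    static bool walk_buckets(struct bucket *b, int n, bucket_fn fn, bool flush_on_yield)
    {
        bool flush = false;

        for (int i = 0; i < n; i++) {
            flush |= fn(&b[i]);

            if (need_yield()) {
                if (flush && flush_on_yield) {
                    flush_range(0, i);  /* flush what has been visited so far */
                    flush = false;      /* so the caller doesn't flush it again */
                }
                /* cond_resched()/lock-drop equivalent would go here */
            }
        }
        return flush;  /* caller performs the final flush if one is still pending */
    }

    static bool zap_bucket(struct bucket *b) { bool f = b->dirty; b->dirty = 0; return f; }

    int main(void)
    {
        struct bucket v[3] = { {1}, {0}, {1} };
        printf("final flush: %d\n", walk_buckets(v, 3, zap_bucket, true));
        return 0;
    }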
@@ -1558,7 +1557,9 @@
bool flush = false;
if (kvm_memslots_have_rmaps(kvm))
- flush = kvm_handle_gfn_range(kvm, range, kvm_zap_rmap);
+ flush = __kvm_rmap_zap_gfn_range(kvm, range->slot,
+ range->start, range->end,
+ range->may_block, flush);
if (tdp_mmu_enabled)
flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
@@ -1570,31 +1571,6 @@
return flush;
}
-static bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- struct kvm_memory_slot *slot, gfn_t gfn, int level)
-{
- u64 *sptep;
- struct rmap_iterator iter;
- int young = 0;
-
- for_each_rmap_spte(rmap_head, &iter, sptep)
- young |= mmu_spte_age(sptep);
-
- return young;
-}
-
-static bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
- struct kvm_memory_slot *slot, gfn_t gfn, int level)
-{
- u64 *sptep;
- struct rmap_iterator iter;
-
- for_each_rmap_spte(rmap_head, &iter, sptep)
- if (is_accessed_spte(*sptep))
- return true;
- return false;
-}
-
#define RMAP_RECYCLE_THRESHOLD 1000
static void __rmap_add(struct kvm *kvm,
@@ -1629,12 +1605,52 @@
__rmap_add(vcpu->kvm, cache, slot, spte, gfn, access);
}
+static bool kvm_rmap_age_gfn_range(struct kvm *kvm,
+ struct kvm_gfn_range *range, bool test_only)
+{
+ struct slot_rmap_walk_iterator iterator;
+ struct rmap_iterator iter;
+ bool young = false;
+ u64 *sptep;
+
+ for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
+ range->start, range->end - 1, &iterator) {
+ for_each_rmap_spte(iterator.rmap, &iter, sptep) {
+ u64 spte = *sptep;
+
+ if (!is_accessed_spte(spte))
+ continue;
+
+ if (test_only)
+ return true;
+
+ if (spte_ad_enabled(spte)) {
+ clear_bit((ffs(shadow_accessed_mask) - 1),
+ (unsigned long *)sptep);
+ } else {
+ /*
+ * Capture the dirty status of the page, so that
+ * it doesn't get lost when the SPTE is marked
+ * for access tracking.
+ */
+ if (is_writable_pte(spte))
+ kvm_set_pfn_dirty(spte_to_pfn(spte));
+
+ spte = mark_spte_for_access_track(spte);
+ mmu_spte_update_no_track(sptep, spte);
+ }
+ young = true;
+ }
+ }
+ return young;
+}
+
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
bool young = false;
if (kvm_memslots_have_rmaps(kvm))
- young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
+ young = kvm_rmap_age_gfn_range(kvm, range, false);
if (tdp_mmu_enabled)
young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
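kvm_rmap_age_gfn_range merges the old age/test-age handlers into one walk with a test_only switch: in test mode it can return on the first young entry, otherwise it clears the accessed state and keeps scanning. A compact sketch of that shape; the SPTE layout and bit position are purely illustrative:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define ACCESSED_BIT (1ULL << 5)   /* made-up bit position for the sketch */

    static bool age_entries(uint64_t *spte, int n, bool test_only)
    {
        bool young = false;

        for (int i = 0; i < n; i++) {
            if (!(spte[i] & ACCESSED_BIT))
                continue;

            if (test_only)
                return true;            /* caller only wants a yes/no answer */

            spte[i] &= ~ACCESSED_BIT;   /* clear and keep scanning the range */
            young = true;
        }
        return young;
    }

    int main(void)
    {
        uint64_t sptes[3] = { 0, ACCESSED_BIT, ACCESSED_BIT };
        printf("%d ", age_entries(sptes, 3, true));   /* 1, nothing cleared */
        printf("%d ", age_entries(sptes, 3, false));  /* 1, bits cleared */
        printf("%d\n", age_entries(sptes, 3, true));  /* 0 now */
        return 0;
    }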
@@ -1647,7 +1663,7 @@
bool young = false;
if (kvm_memslots_have_rmaps(kvm))
- young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
+ young = kvm_rmap_age_gfn_range(kvm, range, true);
if (tdp_mmu_enabled)
young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
@@ -2713,36 +2729,49 @@
write_unlock(&kvm->mmu_lock);
}
-int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
+bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+ bool always_retry)
{
- struct kvm_mmu_page *sp;
+ struct kvm *kvm = vcpu->kvm;
LIST_HEAD(invalid_list);
- int r;
+ struct kvm_mmu_page *sp;
+ gpa_t gpa = cr2_or_gpa;
+ bool r = false;
- r = 0;
- write_lock(&kvm->mmu_lock);
- for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn) {
- r = 1;
- kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+ /*
+ * Bail early if there aren't any write-protected shadow pages to avoid
+ * unnecessarily taking mmu_lock, e.g. if the gfn is write-tracked
+ * by a third party. Reading indirect_shadow_pages without holding
+ * mmu_lock is safe, as this is purely an optimization, i.e. a false
+ * positive is benign, and a false negative will simply result in KVM
+ * skipping the unprotect+retry path, which is also an optimization.
+ */
+ if (!READ_ONCE(kvm->arch.indirect_shadow_pages))
+ goto out;
+
+ if (!vcpu->arch.mmu->root_role.direct) {
+ gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
+ if (gpa == INVALID_GPA)
+ goto out;
}
+
+ write_lock(&kvm->mmu_lock);
+ for_each_gfn_valid_sp_with_gptes(kvm, sp, gpa_to_gfn(gpa))
+ kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+
+ /*
+ * Snapshot the result before zapping, as zapping will remove all list
+ * entries, i.e. checking the list later would yield a false negative.
+ */
+ r = !list_empty(&invalid_list);
kvm_mmu_commit_zap_page(kvm, &invalid_list);
write_unlock(&kvm->mmu_lock);
- return r;
-}
-
-static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
-{
- gpa_t gpa;
- int r;
-
- if (vcpu->arch.mmu->root_role.direct)
- return 0;
-
- gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-
- r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
-
+out:
+ if (r || always_retry) {
+ vcpu->arch.last_retry_eip = kvm_rip_read(vcpu);
+ vcpu->arch.last_retry_addr = cr2_or_gpa;
+ }
return r;
}
@@ -2914,10 +2943,8 @@
trace_kvm_mmu_set_spte(level, gfn, sptep);
}
- if (wrprot) {
- if (write_fault)
- ret = RET_PF_EMULATE;
- }
+ if (wrprot && write_fault)
+ ret = RET_PF_WRITE_PROTECTED;
if (flush)
kvm_flush_remote_tlbs_gfn(vcpu->kvm, gfn, level);
@@ -4549,7 +4576,7 @@
return RET_PF_RETRY;
if (page_fault_handle_page_track(vcpu, fault))
- return RET_PF_EMULATE;
+ return RET_PF_WRITE_PROTECTED;
r = fast_page_fault(vcpu, fault);
if (r != RET_PF_INVALID)
@@ -4618,8 +4645,6 @@
if (!flags) {
trace_kvm_page_fault(vcpu, fault_address, error_code);
- if (kvm_event_needs_reinjection(vcpu))
- kvm_mmu_unprotect_page_virt(vcpu, fault_address);
r = kvm_mmu_page_fault(vcpu, fault_address, error_code, insn,
insn_len);
} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
@@ -4642,7 +4667,7 @@
int r;
if (page_fault_handle_page_track(vcpu, fault))
- return RET_PF_EMULATE;
+ return RET_PF_WRITE_PROTECTED;
r = fast_page_fault(vcpu, fault);
if (r != RET_PF_INVALID)
@@ -4719,6 +4744,7 @@
switch (r) {
case RET_PF_FIXED:
case RET_PF_SPURIOUS:
+ case RET_PF_WRITE_PROTECTED:
return 0;
case RET_PF_EMULATE:
@@ -5963,6 +5989,106 @@
write_unlock(&vcpu->kvm->mmu_lock);
}
+static bool is_write_to_guest_page_table(u64 error_code)
+{
+ const u64 mask = PFERR_GUEST_PAGE_MASK | PFERR_WRITE_MASK | PFERR_PRESENT_MASK;
+
+ return (error_code & mask) == mask;
+}
+
+static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+ u64 error_code, int *emulation_type)
+{
+ bool direct = vcpu->arch.mmu->root_role.direct;
+
+ /*
+ * Do not try to unprotect and retry if the vCPU re-faulted on the same
+ * RIP with the same address that was previously unprotected, as doing
+ * so will likely put the vCPU into an infinite. E.g. if the vCPU uses
+ * a non-page-table modifying instruction on the PDE that points to the
+ * instruction, then unprotecting the gfn will unmap the instruction's
+ * code, i.e. make it impossible for the instruction to ever complete.
+ */
+ if (vcpu->arch.last_retry_eip == kvm_rip_read(vcpu) &&
+ vcpu->arch.last_retry_addr == cr2_or_gpa)
+ return RET_PF_EMULATE;
+
+ /*
+ * Reset the unprotect+retry values that guard against infinite loops.
+ * The values will be refreshed if KVM explicitly unprotects a gfn and
+ * retries, in all other cases it's safe to retry in the future even if
+ * the next page fault happens on the same RIP+address.
+ */
+ vcpu->arch.last_retry_eip = 0;
+ vcpu->arch.last_retry_addr = 0;
+
+ /*
+ * It should be impossible to reach this point with an MMIO cache hit,
+ * as RET_PF_WRITE_PROTECTED is returned if and only if there's a valid,
+ * writable memslot, and creating a memslot should invalidate the MMIO
+ * cache by way of changing the memslot generation. WARN and disallow
+ * retry if MMIO is detected, as retrying MMIO emulation is pointless
+ * and could put the vCPU into an infinite loop because the processor
+ * will keep faulting on the non-existent MMIO address.
+ */
+ if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa, direct)))
+ return RET_PF_EMULATE;
+
+ /*
+ * Before emulating the instruction, check to see if the access was due
+ * to a read-only violation while the CPU was walking non-nested NPT
+ * page tables, i.e. for a direct MMU, for _guest_ page tables in L1.
+ * If L1 is sharing (a subset of) its page tables with L2, e.g. by
+ * having nCR3 share lower level page tables with hCR3, then when KVM
+ * (L0) write-protects the nested NPTs, i.e. npt12 entries, KVM is also
+ * unknowingly write-protecting L1's guest page tables, which KVM isn't
+ * shadowing.
+ *
+ * Because the CPU (by default) walks NPT page tables using a write
+ * access (to ensure the CPU can do A/D updates), page walks in L1 can
+ * trigger write faults for the above case even when L1 isn't modifying
+ * PTEs. As a result, KVM will unnecessarily emulate (or at least, try
+ * to emulate) an excessive number of L1 instructions; because L1's MMU
+ * isn't shadowed by KVM, there is no need to write-protect L1's gPTEs
+ * and thus no need to emulate in order to guarantee forward progress.
+ *
+ * Try to unprotect the gfn, i.e. zap any shadow pages, so that L1 can
+ * proceed without triggering emulation. If one or more shadow pages
+ * was zapped, skip emulation and resume L1 to let it natively execute
+ * the instruction. If no shadow pages were zapped, then the write-
+ * fault is due to something else entirely, i.e. KVM needs to emulate,
+ * as resuming the guest will put it into an infinite loop.
+ *
+ * Note, this code also applies to Intel CPUs, even though it is *very*
+ * unlikely that an L1 will share its page tables (IA32/PAE/paging64
+ * format) with L2's page tables (EPT format).
+ *
+ * For indirect MMUs, i.e. if KVM is shadowing the current MMU, try to
+ * unprotect the gfn and retry if an event is awaiting reinjection. If
+ * KVM emulates multiple instructions before completing event injection,
+ * the event could be delayed beyond what is architecturally allowed,
+ * e.g. KVM could inject an IRQ after the TPR has been raised.
+ */
+ if (((direct && is_write_to_guest_page_table(error_code)) ||
+ (!direct && kvm_event_needs_reinjection(vcpu))) &&
+ kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa))
+ return RET_PF_RETRY;
+
+ /*
+ * The gfn is write-protected, but if KVM detects it's emulating an
+ * instruction that is unlikely to be used to modify page tables, or if
+ * emulation fails, KVM can try to unprotect the gfn and let the CPU
+ * re-execute the instruction that caused the page fault. Do not allow
+ * retrying an instruction from a nested guest as KVM is only explicitly
+ * shadowing L1's page tables, i.e. unprotecting something for L1 isn't
+ * going to magically fix whatever issue caused L2 to fail.
+ */
+ if (!is_guest_mode(vcpu))
+ *emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
+
+ return RET_PF_EMULATE;
+}
+
int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len)
{
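kvm_mmu_write_protect_fault guards the unprotect-and-retry path with the last RIP/address pair so a vCPU that keeps faulting on the same instruction and address falls back to emulation instead of spinning forever. A stripped-down sketch of that guard; the field names echo the diff but the surrounding types are invented:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct vcpu_state { uint64_t last_retry_rip, last_retry_addr; };

    enum action { ACTION_RETRY, ACTION_EMULATE };

    static enum action handle_wp_fault(struct vcpu_state *v, uint64_t rip,
                                       uint64_t addr, bool unprotected_something)
    {
        /* Same RIP and same address as the last unprotect: retrying would loop. */
        if (v->last_retry_rip == rip && v->last_retry_addr == addr)
            return ACTION_EMULATE;

        if (unprotected_something) {
            /* Remember where we retried so a repeat fault breaks the cycle. */
            v->last_retry_rip = rip;
            v->last_retry_addr = addr;
            return ACTION_RETRY;
        }
        return ACTION_EMULATE;
    }

    int main(void)
    {
        struct vcpu_state v = {0};
        printf("%d\n", handle_wp_fault(&v, 0x1000, 0xfee0, true));  /* 0: retry */
        printf("%d\n", handle_wp_fault(&v, 0x1000, 0xfee0, true));  /* 1: emulate */
        return 0;
    }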
@@ -6008,6 +6134,10 @@
if (r < 0)
return r;
+ if (r == RET_PF_WRITE_PROTECTED)
+ r = kvm_mmu_write_protect_fault(vcpu, cr2_or_gpa, error_code,
+ &emulation_type);
+
if (r == RET_PF_FIXED)
vcpu->stat.pf_fixed++;
else if (r == RET_PF_EMULATE)
@@ -6018,32 +6148,6 @@
if (r != RET_PF_EMULATE)
return 1;
- /*
- * Before emulating the instruction, check if the error code
- * was due to a RO violation while translating the guest page.
- * This can occur when using nested virtualization with nested
- * paging in both guests. If true, we simply unprotect the page
- * and resume the guest.
- */
- if (vcpu->arch.mmu->root_role.direct &&
- (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
- kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
- return 1;
- }
-
- /*
- * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we can still
- * optimistically try to just unprotect the page and let the processor
- * re-execute the instruction that caused the page fault. Do not allow
- * retrying MMIO emulation, as it's not only pointless but could also
- * cause us to enter an infinite loop because the processor will keep
- * faulting on the non-existent MMIO address. Retrying an instruction
- * from a nested guest is also pointless and dangerous as we are only
- * explicitly shadowing L1's page tables, i.e. unprotecting something
- * for L1 isn't going to magically fix whatever issue cause L2 to fail.
- */
- if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
- emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
emulate:
return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
insn_len);
@@ -6202,59 +6306,6 @@
}
EXPORT_SYMBOL_GPL(kvm_configure_mmu);
-/* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
- struct kvm_rmap_head *rmap_head,
- const struct kvm_memory_slot *slot);
-
-static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
- const struct kvm_memory_slot *slot,
- slot_rmaps_handler fn,
- int start_level, int end_level,
- gfn_t start_gfn, gfn_t end_gfn,
- bool flush_on_yield, bool flush)
-{
- struct slot_rmap_walk_iterator iterator;
-
- lockdep_assert_held_write(&kvm->mmu_lock);
-
- for_each_slot_rmap_range(slot, start_level, end_level, start_gfn,
- end_gfn, &iterator) {
- if (iterator.rmap)
- flush |= fn(kvm, iterator.rmap, slot);
-
- if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
- if (flush && flush_on_yield) {
- kvm_flush_remote_tlbs_range(kvm, start_gfn,
- iterator.gfn - start_gfn + 1);
- flush = false;
- }
- cond_resched_rwlock_write(&kvm->mmu_lock);
- }
- }
-
- return flush;
-}
-
-static __always_inline bool walk_slot_rmaps(struct kvm *kvm,
- const struct kvm_memory_slot *slot,
- slot_rmaps_handler fn,
- int start_level, int end_level,
- bool flush_on_yield)
-{
- return __walk_slot_rmaps(kvm, slot, fn, start_level, end_level,
- slot->base_gfn, slot->base_gfn + slot->npages - 1,
- flush_on_yield, false);
-}
-
-static __always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
- const struct kvm_memory_slot *slot,
- slot_rmaps_handler fn,
- bool flush_on_yield)
-{
- return walk_slot_rmaps(kvm, slot, fn, PG_LEVEL_4K, PG_LEVEL_4K, flush_on_yield);
-}
-
static void free_mmu_pages(struct kvm_mmu *mmu)
{
if (!tdp_enabled && mmu->pae_root)
@@ -6528,9 +6579,8 @@
if (WARN_ON_ONCE(start >= end))
continue;
- flush = __walk_slot_rmaps(kvm, memslot, __kvm_zap_rmap,
- PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
- start, end - 1, true, flush);
+ flush = __kvm_rmap_zap_gfn_range(kvm, memslot, start,
+ end, true, flush);
}
}
@@ -6818,7 +6868,7 @@
*/
for (level = KVM_MAX_HUGEPAGE_LEVEL; level > target_level; level--)
__walk_slot_rmaps(kvm, slot, shadow_mmu_try_split_huge_pages,
- level, level, start, end - 1, true, false);
+ level, level, start, end - 1, true, true, false);
}
/* Must be called with the mmu_lock held in write-mode. */
@@ -6997,10 +7047,42 @@
kvm_mmu_zap_all(kvm);
}
+/*
+ * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
+ *
+ * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
+ * case scenario we'll have unused shadow pages lying around until they
+ * are recycled due to age or when the VM is destroyed.
+ */
+static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ struct kvm_gfn_range range = {
+ .slot = slot,
+ .start = slot->base_gfn,
+ .end = slot->base_gfn + slot->npages,
+ .may_block = true,
+ };
+
+ write_lock(&kvm->mmu_lock);
+ if (kvm_unmap_gfn_range(kvm, &range))
+ kvm_flush_remote_tlbs_memslot(kvm, slot);
+
+ write_unlock(&kvm->mmu_lock);
+}
+
+static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
+{
+ return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+ kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+}
+
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
- kvm_mmu_zap_all_fast(kvm);
+ if (kvm_memslot_flush_zap_all(kvm))
+ kvm_mmu_zap_all_fast(kvm);
+ else
+ kvm_mmu_zap_memslot_leafs(kvm, slot);
}
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
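kvm_arch_flush_shadow_memslot now chooses between the legacy "zap everything" behavior and the narrower "zap only this slot's leaf SPTEs" path based on the VM type and the new KVM_X86_QUIRK_SLOT_ZAP_ALL quirk. A tiny sketch of that gating; the struct and the quirk representation are placeholders for kvm_check_has_quirk():

    #include <stdbool.h>
    #include <stdio.h>

    #define QUIRK_SLOT_ZAP_ALL (1u << 7)
    #define VM_TYPE_DEFAULT 0

    struct vm { int type; unsigned int active_quirks; };

    static bool flush_should_zap_all(const struct vm *vm)
    {
        /* Legacy behavior only for default VMs that still have the quirk active. */
        return vm->type == VM_TYPE_DEFAULT &&
               (vm->active_quirks & QUIRK_SLOT_ZAP_ALL);
    }

    int main(void)
    {
        struct vm legacy    = { VM_TYPE_DEFAULT, QUIRK_SLOT_ZAP_ALL };
        struct vm opted_out = { VM_TYPE_DEFAULT, 0 };
        printf("%d %d\n", flush_should_zap_all(&legacy),
                          flush_should_zap_all(&opted_out));  /* 1 0 */
        return 0;
    }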
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1721d97..c988278 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -258,6 +258,8 @@
* RET_PF_CONTINUE: So far, so good, keep handling the page fault.
* RET_PF_RETRY: let CPU fault again on the address.
* RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
+ * RET_PF_WRITE_PROTECTED: the gfn is write-protected, either unprotect the
+ * gfn and retry, or emulate the instruction directly.
* RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
* RET_PF_FIXED: The faulting entry has been fixed.
* RET_PF_SPURIOUS: The faulting entry was already fixed, e.g. by another vCPU.
@@ -274,6 +276,7 @@
RET_PF_CONTINUE = 0,
RET_PF_RETRY,
RET_PF_EMULATE,
+ RET_PF_WRITE_PROTECTED,
RET_PF_INVALID,
RET_PF_FIXED,
RET_PF_SPURIOUS,
@@ -349,8 +352,6 @@
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
-void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
-
void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 195d98b..f35a830ce 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -57,6 +57,7 @@
TRACE_DEFINE_ENUM(RET_PF_CONTINUE);
TRACE_DEFINE_ENUM(RET_PF_RETRY);
TRACE_DEFINE_ENUM(RET_PF_EMULATE);
+TRACE_DEFINE_ENUM(RET_PF_WRITE_PROTECTED);
TRACE_DEFINE_ENUM(RET_PF_INVALID);
TRACE_DEFINE_ENUM(RET_PF_FIXED);
TRACE_DEFINE_ENUM(RET_PF_SPURIOUS);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 69941ce..ae7d39f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -646,10 +646,10 @@
* really care if it changes underneath us after this point).
*/
if (FNAME(gpte_changed)(vcpu, gw, top_level))
- goto out_gpte_changed;
+ return RET_PF_RETRY;
if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
- goto out_gpte_changed;
+ return RET_PF_RETRY;
/*
* Load a new root and retry the faulting instruction in the extremely
@@ -659,7 +659,7 @@
*/
if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) {
kvm_make_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu);
- goto out_gpte_changed;
+ return RET_PF_RETRY;
}
for_each_shadow_entry(vcpu, fault->addr, it) {
@@ -674,34 +674,38 @@
sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn,
false, access);
- if (sp != ERR_PTR(-EEXIST)) {
- /*
- * We must synchronize the pagetable before linking it
- * because the guest doesn't need to flush tlb when
- * the gpte is changed from non-present to present.
- * Otherwise, the guest may use the wrong mapping.
- *
- * For PG_LEVEL_4K, kvm_mmu_get_page() has already
- * synchronized it transiently via kvm_sync_page().
- *
- * For higher level pagetable, we synchronize it via
- * the slower mmu_sync_children(). If it needs to
- * break, some progress has been made; return
- * RET_PF_RETRY and retry on the next #PF.
- * KVM_REQ_MMU_SYNC is not necessary but it
- * expedites the process.
- */
- if (sp->unsync_children &&
- mmu_sync_children(vcpu, sp, false))
- return RET_PF_RETRY;
- }
+ /*
+ * Synchronize the new page before linking it, as the CPU (KVM)
+ * is architecturally disallowed from inserting non-present
+ * entries into the TLB, i.e. the guest isn't required to flush
+ * the TLB when changing the gPTE from non-present to present.
+ *
+ * For PG_LEVEL_4K, kvm_mmu_find_shadow_page() has already
+ * synchronized the page via kvm_sync_page().
+ *
+ * For higher level pages, which cannot be unsync themselves
+ * but can have unsync children, synchronize via the slower
+ * mmu_sync_children(). If KVM needs to drop mmu_lock due to
+ * contention or to reschedule, instruct the caller to retry
+ * the #PF (mmu_sync_children() ensures forward progress will
+ * be made).
+ */
+ if (sp != ERR_PTR(-EEXIST) && sp->unsync_children &&
+ mmu_sync_children(vcpu, sp, false))
+ return RET_PF_RETRY;
/*
- * Verify that the gpte in the page we've just write
- * protected is still there.
+ * Verify that the gpte in the page, which is now either
+ * write-protected or unsync, wasn't modified between the fault
+ * and acquiring mmu_lock. This needs to be done even when
+ * reusing an existing shadow page to ensure the information
+ * gathered by the walker matches the information stored in the
+ * shadow page (which could have been modified by a different
+ * vCPU even if the page was already linked). Holding mmu_lock
+ * prevents the shadow page from changing after this point.
*/
if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
- goto out_gpte_changed;
+ return RET_PF_RETRY;
if (sp != ERR_PTR(-EEXIST))
link_shadow_page(vcpu, it.sptep, sp);
@@ -755,9 +759,6 @@
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
return ret;
-
-out_gpte_changed:
- return RET_PF_RETRY;
}
/*
@@ -805,7 +806,7 @@
if (page_fault_handle_page_track(vcpu, fault)) {
shadow_page_table_clear_flood(vcpu, fault->addr);
- return RET_PF_EMULATE;
+ return RET_PF_WRITE_PROTECTED;
}
r = mmu_topup_memory_caches(vcpu, true);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3c55955..3b996c1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1046,10 +1046,8 @@
* protected, emulation is needed. If the emulation was skipped,
* the vCPU would have the same fault again.
*/
- if (wrprot) {
- if (fault->write)
- ret = RET_PF_EMULATE;
- }
+ if (wrprot && fault->write)
+ ret = RET_PF_WRITE_PROTECTED;
/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
if (unlikely(is_mmio_spte(vcpu->kvm, new_spte))) {
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 2f4e155..0d17d6b 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -17,6 +17,7 @@
CPUID_8000_0007_EDX,
CPUID_8000_0022_EAX,
CPUID_7_2_EDX,
+ CPUID_24_0_EBX,
NR_KVM_CPU_CAPS,
NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
@@ -46,6 +47,7 @@
#define X86_FEATURE_AVX_NE_CONVERT KVM_X86_FEATURE(CPUID_7_1_EDX, 5)
#define X86_FEATURE_AMX_COMPLEX KVM_X86_FEATURE(CPUID_7_1_EDX, 8)
#define X86_FEATURE_PREFETCHITI KVM_X86_FEATURE(CPUID_7_1_EDX, 14)
+#define X86_FEATURE_AVX10 KVM_X86_FEATURE(CPUID_7_1_EDX, 19)
/* Intel-defined sub-features, CPUID level 0x00000007:2 (EDX) */
#define X86_FEATURE_INTEL_PSFD KVM_X86_FEATURE(CPUID_7_2_EDX, 0)
@@ -55,6 +57,11 @@
#define KVM_X86_FEATURE_BHI_CTRL KVM_X86_FEATURE(CPUID_7_2_EDX, 4)
#define X86_FEATURE_MCDT_NO KVM_X86_FEATURE(CPUID_7_2_EDX, 5)
+/* Intel-defined sub-features, CPUID level 0x00000024:0 (EBX) */
+#define X86_FEATURE_AVX10_128 KVM_X86_FEATURE(CPUID_24_0_EBX, 16)
+#define X86_FEATURE_AVX10_256 KVM_X86_FEATURE(CPUID_24_0_EBX, 17)
+#define X86_FEATURE_AVX10_512 KVM_X86_FEATURE(CPUID_24_0_EBX, 18)
+
/* CPUID level 0x80000007 (EDX). */
#define KVM_X86_FEATURE_CONSTANT_TSC KVM_X86_FEATURE(CPUID_8000_0007_EDX, 8)
@@ -90,6 +97,7 @@
[CPUID_8000_0021_EAX] = {0x80000021, 0, CPUID_EAX},
[CPUID_8000_0022_EAX] = {0x80000022, 0, CPUID_EAX},
[CPUID_7_2_EDX] = { 7, 2, CPUID_EDX},
+ [CPUID_24_0_EBX] = { 0x24, 0, CPUID_EBX},
};
/*
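The reverse_cpuid additions register the synthetic CPUID_24_0_EBX word by mapping it back to (leaf 0x24, sub-leaf 0, EBX), so feature bits defined against that word can be located in raw CPUID output. A toy version of such a lookup table; the word indices and entries are illustrative, not KVM's full table:

    #include <stdint.h>
    #include <stdio.h>

    enum cpuid_reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

    struct cpuid_word { uint32_t leaf; uint32_t subleaf; enum cpuid_reg reg; };

    /* Synthetic word indices used by a hypervisor's capability bitmaps. */
    enum { WORD_7_1_EDX, WORD_7_2_EDX, WORD_24_0_EBX, NR_WORDS };

    static const struct cpuid_word reverse_cpuid[NR_WORDS] = {
        [WORD_7_1_EDX]  = { 0x7,  1, REG_EDX },
        [WORD_7_2_EDX]  = { 0x7,  2, REG_EDX },
        [WORD_24_0_EBX] = { 0x24, 0, REG_EBX },  /* the newly added word */
    };

    int main(void)
    {
        const struct cpuid_word *w = &reverse_cpuid[WORD_24_0_EBX];
        printf("leaf=%#x subleaf=%u reg=%d\n",
               (unsigned)w->leaf, (unsigned)w->subleaf, (int)w->reg);
        return 0;
    }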
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index 00e3c27..85241c0 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -624,17 +624,31 @@
#endif
/*
- * Give leave_smm() a chance to make ISA-specific changes to the vCPU
- * state (e.g. enter guest mode) before loading state from the SMM
- * state-save area.
+ * FIXME: When resuming L2 (a.k.a. guest mode), the transition to guest
+ * mode should happen _after_ loading state from SMRAM. However, KVM
+ * piggybacks the nested VM-Enter flows (which is wrong for many other
+ * reasons), and so nSVM/nVMX would clobber state that is loaded from
+ * SMRAM and from the VMCS/VMCB.
*/
if (kvm_x86_call(leave_smm)(vcpu, &smram))
return X86EMUL_UNHANDLEABLE;
#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
- return rsm_load_state_64(ctxt, &smram.smram64);
+ ret = rsm_load_state_64(ctxt, &smram.smram64);
else
#endif
- return rsm_load_state_32(ctxt, &smram.smram32);
+ ret = rsm_load_state_32(ctxt, &smram.smram32);
+
+ /*
+ * If RSM fails and triggers shutdown, architecturally the shutdown
+ * occurs *before* the transition to guest mode. But due to KVM's
+ * flawed handling of RSM to L2 (see above), the vCPU may already be
+ * in_guest_mode(). Force the vCPU out of guest mode before delivering
+ * the shutdown, so that L1 enters shutdown instead of seeing a VM-Exit
+ * that architecturally shouldn't be possible.
+ */
+ if (ret != X86EMUL_CONTINUE && is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+ return ret;
}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6f704c1..d5314cb 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1693,8 +1693,8 @@
return -EINVAL;
ret = -ENOMEM;
- ctl = kzalloc(sizeof(*ctl), GFP_KERNEL_ACCOUNT);
- save = kzalloc(sizeof(*save), GFP_KERNEL_ACCOUNT);
+ ctl = kzalloc(sizeof(*ctl), GFP_KERNEL);
+ save = kzalloc(sizeof(*save), GFP_KERNEL);
if (!ctl || !save)
goto out_free;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5ab2c92..9df3e1e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -573,7 +573,7 @@
static __always_inline struct sev_es_save_area *sev_es_host_save_area(struct svm_cpu_data *sd)
{
- return page_address(sd->save_area) + 0x400;
+ return &sd->save_area->host_sev_es_save;
}
static inline void kvm_cpu_svm_disable(void)
@@ -592,14 +592,14 @@
}
}
-static void svm_emergency_disable(void)
+static void svm_emergency_disable_virtualization_cpu(void)
{
kvm_rebooting = true;
kvm_cpu_svm_disable();
}
-static void svm_hardware_disable(void)
+static void svm_disable_virtualization_cpu(void)
{
/* Make sure we clean up behind us */
if (tsc_scaling)
@@ -610,7 +610,7 @@
amd_pmu_disable_virt();
}
-static int svm_hardware_enable(void)
+static int svm_enable_virtualization_cpu(void)
{
struct svm_cpu_data *sd;
@@ -696,7 +696,7 @@
return;
kfree(sd->sev_vmcbs);
- __free_page(sd->save_area);
+ __free_page(__sme_pa_to_page(sd->save_area_pa));
sd->save_area_pa = 0;
sd->save_area = NULL;
}
@@ -704,23 +704,24 @@
static int svm_cpu_init(int cpu)
{
struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
+ struct page *save_area_page;
int ret = -ENOMEM;
memset(sd, 0, sizeof(struct svm_cpu_data));
- sd->save_area = snp_safe_alloc_page_node(cpu_to_node(cpu), GFP_KERNEL);
- if (!sd->save_area)
+ save_area_page = snp_safe_alloc_page_node(cpu_to_node(cpu), GFP_KERNEL);
+ if (!save_area_page)
return ret;
ret = sev_cpu_init(sd);
if (ret)
goto free_save_area;
- sd->save_area_pa = __sme_page_pa(sd->save_area);
+ sd->save_area = page_address(save_area_page);
+ sd->save_area_pa = __sme_page_pa(save_area_page);
return 0;
free_save_area:
- __free_page(sd->save_area);
- sd->save_area = NULL;
+ __free_page(save_area_page);
return ret;
}
@@ -1124,8 +1125,7 @@
for_each_possible_cpu(cpu)
svm_cpu_uninit(cpu);
- __free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT),
- get_order(IOPM_SIZE));
+ __free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));
iopm_base = 0;
}
@@ -1301,7 +1301,7 @@
if (!kvm_hlt_in_guest(vcpu->kvm))
svm_set_intercept(svm, INTERCEPT_HLT);
- control->iopm_base_pa = __sme_set(iopm_base);
+ control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
control->int_ctl = V_INTR_MASKING_MASK;
@@ -1503,7 +1503,7 @@
sev_free_vcpu(vcpu);
- __free_page(pfn_to_page(__sme_clr(svm->vmcb01.pa) >> PAGE_SHIFT));
+ __free_page(__sme_pa_to_page(svm->vmcb01.pa));
__free_pages(virt_to_page(svm->msrpm), get_order(MSRPM_SIZE));
}
@@ -1533,7 +1533,7 @@
* TSC_AUX is always virtualized for SEV-ES guests when the feature is
* available. The user return MSR support is not required in this case
* because TSC_AUX is restored on #VMEXIT from the host save area
- * (which has been initialized in svm_hardware_enable()).
+ * (which has been initialized in svm_enable_virtualization_cpu()).
*/
if (likely(tsc_aux_uret_slot >= 0) &&
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
@@ -2825,17 +2825,17 @@
return kvm_complete_insn_gp(vcpu, ret);
}
-static int svm_get_msr_feature(struct kvm_msr_entry *msr)
+static int svm_get_feature_msr(u32 msr, u64 *data)
{
- msr->data = 0;
+ *data = 0;
- switch (msr->index) {
+ switch (msr) {
case MSR_AMD64_DE_CFG:
if (cpu_feature_enabled(X86_FEATURE_LFENCE_RDTSC))
- msr->data |= MSR_AMD64_DE_CFG_LFENCE_SERIALIZE;
+ *data |= MSR_AMD64_DE_CFG_LFENCE_SERIALIZE;
break;
default:
- return KVM_MSR_RET_INVALID;
+ return KVM_MSR_RET_UNSUPPORTED;
}
return 0;
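svm_get_feature_msr() now takes the MSR index and an output pointer directly and reports the bits KVM supports for that feature MSR; the later DE_CFG hunk reuses it to reject unknown bits and guest-initiated changes. A rough userspace sketch of that pattern; only the MSR index and the LFENCE bit mirror real values, the rest is scaffolding:

    #include <stdint.h>
    #include <stdio.h>

    #define MSR_DE_CFG               0xc0011029u
    #define DE_CFG_LFENCE_SERIALIZE  (1ULL << 1)
    #define ERR_UNSUPPORTED          1

    static int get_feature_msr(uint32_t msr, uint64_t *data)
    {
        *data = 0;

        switch (msr) {
        case MSR_DE_CFG:
            *data |= DE_CFG_LFENCE_SERIALIZE;  /* the only bit this sketch "knows" */
            return 0;
        default:
            return ERR_UNSUPPORTED;
        }
    }

    static int set_de_cfg(uint64_t current_val, uint64_t new_val, int host_initiated)
    {
        uint64_t supported;

        if (get_feature_msr(MSR_DE_CFG, &supported))
            return 1;
        if (new_val & ~supported)                  /* reject unknown bits */
            return 1;
        if (!host_initiated && new_val != current_val)
            return 1;                              /* guest may not change the value */
        return 0;
    }

    int main(void)
    {
        printf("%d\n", set_de_cfg(DE_CFG_LFENCE_SERIALIZE, DE_CFG_LFENCE_SERIALIZE, 0)); /* 0 */
        printf("%d\n", set_de_cfg(DE_CFG_LFENCE_SERIALIZE, 0, 0));                       /* 1 */
        return 0;
    }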
@@ -3144,7 +3144,7 @@
* feature is available. The user return MSR support is not
* required in this case because TSC_AUX is restored on #VMEXIT
* from the host save area (which has been initialized in
- * svm_hardware_enable()).
+ * svm_enable_virtualization_cpu()).
*/
if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) && sev_es_guest(vcpu->kvm))
break;
@@ -3191,18 +3191,21 @@
kvm_pr_unimpl_wrmsr(vcpu, ecx, data);
break;
case MSR_AMD64_DE_CFG: {
- struct kvm_msr_entry msr_entry;
+ u64 supported_de_cfg;
- msr_entry.index = msr->index;
- if (svm_get_msr_feature(&msr_entry))
+ if (svm_get_feature_msr(ecx, &supported_de_cfg))
return 1;
- /* Check the supported bits */
- if (data & ~msr_entry.data)
+ if (data & ~supported_de_cfg)
return 1;
- /* Don't allow the guest to change a bit, #GP */
- if (!msr->host_initiated && (data ^ msr_entry.data))
+ /*
+ * Don't let the guest change the host-programmed value. The
+ * MSR is very model specific, i.e. contains multiple bits that
+ * are completely unknown to KVM, and the one bit known to KVM
+ * is simply a reflection of hardware capabilities.
+ */
+ if (!msr->host_initiated && data != svm->msr_decfg)
return 1;
svm->msr_decfg = data;
@@ -4156,12 +4159,21 @@
static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
if (is_guest_mode(vcpu))
return EXIT_FASTPATH_NONE;
- if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_MSR &&
- to_svm(vcpu)->vmcb->control.exit_info_1)
+ switch (svm->vmcb->control.exit_code) {
+ case SVM_EXIT_MSR:
+ if (!svm->vmcb->control.exit_info_1)
+ break;
return handle_fastpath_set_msr_irqoff(vcpu);
+ case SVM_EXIT_HLT:
+ return handle_fastpath_hlt(vcpu);
+ default:
+ break;
+ }
return EXIT_FASTPATH_NONE;
}
@@ -4992,8 +5004,9 @@
.check_processor_compatibility = svm_check_processor_compat,
.hardware_unsetup = svm_hardware_unsetup,
- .hardware_enable = svm_hardware_enable,
- .hardware_disable = svm_hardware_disable,
+ .enable_virtualization_cpu = svm_enable_virtualization_cpu,
+ .disable_virtualization_cpu = svm_disable_virtualization_cpu,
+ .emergency_disable_virtualization_cpu = svm_emergency_disable_virtualization_cpu,
.has_emulated_msr = svm_has_emulated_msr,
.vcpu_create = svm_vcpu_create,
@@ -5011,7 +5024,7 @@
.vcpu_unblocking = avic_vcpu_unblocking,
.update_exception_bitmap = svm_update_exception_bitmap,
- .get_msr_feature = svm_get_msr_feature,
+ .get_feature_msr = svm_get_feature_msr,
.get_msr = svm_get_msr,
.set_msr = svm_set_msr,
.get_segment_base = svm_get_segment_base,
@@ -5062,6 +5075,8 @@
.enable_nmi_window = svm_enable_nmi_window,
.enable_irq_window = svm_enable_irq_window,
.update_cr8_intercept = svm_update_cr8_intercept,
+
+ .x2apic_icr_is_split = true,
.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
.apicv_post_state_restore = avic_apicv_post_state_restore,
@@ -5266,7 +5281,7 @@
iopm_va = page_address(iopm_pages);
memset(iopm_va, 0xff, PAGE_SIZE * (1 << order));
- iopm_base = page_to_pfn(iopm_pages) << PAGE_SHIFT;
+ iopm_base = __sme_page_pa(iopm_pages);
init_msrpm_offsets();
@@ -5425,8 +5440,6 @@
static void __svm_exit(void)
{
kvm_x86_vendor_exit();
-
- cpu_emergency_unregister_virt_callback(svm_emergency_disable);
}
static int __init svm_init(void)
@@ -5442,8 +5455,6 @@
if (r)
return r;
- cpu_emergency_register_virt_callback(svm_emergency_disable);
-
/*
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 76107c7..43fa6a1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -25,7 +25,21 @@
#include "cpuid.h"
#include "kvm_cache_regs.h"
-#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
+/*
+ * Helpers to convert to/from physical addresses for pages whose address is
+ * consumed directly by hardware. Even though it's a physical address, SVM
+ * often restricts the address to the natural width, hence 'unsigned long'
+ * instead of 'hpa_t'.
+ */
+static inline unsigned long __sme_page_pa(struct page *page)
+{
+ return __sme_set(page_to_pfn(page) << PAGE_SHIFT);
+}
+
+static inline struct page *__sme_pa_to_page(unsigned long pa)
+{
+ return pfn_to_page(__sme_clr(pa) >> PAGE_SHIFT);
+}
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2
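The new __sme_page_pa()/__sme_pa_to_page() helpers make the conversion symmetric: set the memory-encryption bit when handing a physical address to hardware, clear it when turning the address back into a struct page. A userspace mock of the same round-trip; the C-bit position and page size are arbitrary stand-ins:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define SME_C_BIT  (1ULL << 47)   /* illustrative encryption-bit position */

    static uint64_t pfn_to_hw_pa(uint64_t pfn)
    {
        /* Physical address as consumed by hardware: C-bit set. */
        return (pfn << PAGE_SHIFT) | SME_C_BIT;
    }

    static uint64_t hw_pa_to_pfn(uint64_t pa)
    {
        /* Strip the C-bit before converting back to a page frame number. */
        return (pa & ~SME_C_BIT) >> PAGE_SHIFT;
    }

    int main(void)
    {
        uint64_t pfn = 0x1234;
        uint64_t pa  = pfn_to_hw_pa(pfn);
        printf("pa=%#llx pfn=%#llx\n",
               (unsigned long long)pa, (unsigned long long)hw_pa_to_pfn(pa));
        return 0;
    }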
@@ -321,7 +335,7 @@
u32 next_asid;
u32 min_asid;
- struct page *save_area;
+ struct vmcb *save_area;
unsigned long save_area_pa;
struct vmcb *current_vmcb;
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index a0c8eb3..2ed80ae 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -209,10 +209,8 @@
7: vmload %_ASM_AX
8:
-#ifdef CONFIG_MITIGATION_RETPOLINE
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
- FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
-#endif
+ FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
/* Clobbers RAX, RCX, RDX. */
RESTORE_HOST_SPEC_CTRL
@@ -348,10 +346,8 @@
2: cli
-#ifdef CONFIG_MITIGATION_RETPOLINE
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
- FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
-#endif
+ FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
/* Clobbers RAX, RCX, RDX, consumes RDI (@svm) and RSI (@spec_ctrl_intercepted). */
RESTORE_HOST_SPEC_CTRL
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533..cb65882 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -54,9 +54,7 @@
};
struct vmcs_config {
- int size;
- u32 basic_cap;
- u32 revision_id;
+ u64 basic;
u32 pin_based_exec_ctrl;
u32 cpu_based_exec_ctrl;
u32 cpu_based_2nd_exec_ctrl;
@@ -76,7 +74,7 @@
static inline bool cpu_has_vmx_basic_inout(void)
{
- return (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
+ return vmcs_config.basic & VMX_BASIC_INOUT;
}
static inline bool cpu_has_virtual_nmis(void)
@@ -225,7 +223,7 @@
static inline bool cpu_has_vmx_shadow_vmcs(void)
{
/* check if the cpu supports writing r/o exit information fields */
- if (!(vmcs_config.misc & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
+ if (!(vmcs_config.misc & VMX_MISC_VMWRITE_SHADOW_RO_FIELDS))
return false;
return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -367,7 +365,7 @@
static inline bool cpu_has_vmx_intel_pt(void)
{
- return (vmcs_config.misc & MSR_IA32_VMX_MISC_INTEL_PT) &&
+ return (vmcs_config.misc & VMX_MISC_INTEL_PT) &&
(vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_PT_USE_GPA) &&
(vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_RTIT_CTL);
}
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 0bf35eb..7668e2f 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -23,8 +23,10 @@
.hardware_unsetup = vmx_hardware_unsetup,
- .hardware_enable = vmx_hardware_enable,
- .hardware_disable = vmx_hardware_disable,
+ .enable_virtualization_cpu = vmx_enable_virtualization_cpu,
+ .disable_virtualization_cpu = vmx_disable_virtualization_cpu,
+ .emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu,
+
.has_emulated_msr = vmx_has_emulated_msr,
.vm_size = sizeof(struct kvm_vmx),
@@ -41,7 +43,7 @@
.vcpu_put = vmx_vcpu_put,
.update_exception_bitmap = vmx_update_exception_bitmap,
- .get_msr_feature = vmx_get_msr_feature,
+ .get_feature_msr = vmx_get_feature_msr,
.get_msr = vmx_get_msr,
.set_msr = vmx_set_msr,
.get_segment_base = vmx_get_segment_base,
@@ -89,6 +91,8 @@
.enable_nmi_window = vmx_enable_nmi_window,
.enable_irq_window = vmx_enable_irq_window,
.update_cr8_intercept = vmx_update_cr8_intercept,
+
+ .x2apic_icr_is_split = false,
.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2392a7e..a8e7bc0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -981,7 +981,7 @@
__func__, i, e.index, e.reserved);
goto fail;
}
- if (kvm_set_msr(vcpu, e.index, e.value)) {
+ if (kvm_set_msr_with_filter(vcpu, e.index, e.value)) {
pr_debug_ratelimited(
"%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
__func__, i, e.index, e.value);
@@ -1017,7 +1017,7 @@
}
}
- if (kvm_get_msr(vcpu, msr_index, data)) {
+ if (kvm_get_msr_with_filter(vcpu, msr_index, data)) {
pr_debug_ratelimited("%s cannot read MSR (0x%x)\n", __func__,
msr_index);
return false;
@@ -1112,9 +1112,9 @@
/*
* Emulated VMEntry does not fail here. Instead a less
* accurate value will be returned by
- * nested_vmx_get_vmexit_msr_value() using kvm_get_msr()
- * instead of reading the value from the vmcs02 VMExit
- * MSR-store area.
+ * nested_vmx_get_vmexit_msr_value() by reading KVM's
+ * internal MSR state instead of reading the value from
+ * the vmcs02 VMExit MSR-store area.
*/
pr_warn_ratelimited(
"Not enough msr entries in msr_autostore. Can't add msr %x\n",
@@ -1251,21 +1251,32 @@
static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
{
- const u64 feature_and_reserved =
- /* feature (except bit 48; see below) */
- BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
- /* reserved */
- BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+ const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
+ VMX_BASIC_INOUT |
+ VMX_BASIC_TRUE_CTLS;
+
+ const u64 reserved_bits = GENMASK_ULL(63, 56) |
+ GENMASK_ULL(47, 45) |
+ BIT_ULL(31);
+
u64 vmx_basic = vmcs_config.nested.basic;
- if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
+ BUILD_BUG_ON(feature_bits & reserved_bits);
+
+ /*
+ * Except for 32BIT_PHYS_ADDR_ONLY, which is an anti-feature bit (has
+ * inverted polarity), the incoming value must not set feature bits or
+ * reserved bits that aren't allowed/supported by KVM. Fields, i.e.
+ * multi-bit values, are explicitly checked below.
+ */
+ if (!is_bitwise_subset(vmx_basic, data, feature_bits | reserved_bits))
return -EINVAL;
/*
* KVM does not emulate a version of VMX that constrains physical
* addresses of VMX structures (e.g. VMCS) to 32-bits.
*/
- if (data & BIT_ULL(48))
+ if (data & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
return -EINVAL;
if (vmx_basic_vmcs_revision_id(vmx_basic) !=
@@ -1334,16 +1345,29 @@
static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
{
- const u64 feature_and_reserved_bits =
- /* feature */
- BIT_ULL(5) | GENMASK_ULL(8, 6) | BIT_ULL(14) | BIT_ULL(15) |
- BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30) |
- /* reserved */
- GENMASK_ULL(13, 9) | BIT_ULL(31);
+ const u64 feature_bits = VMX_MISC_SAVE_EFER_LMA |
+ VMX_MISC_ACTIVITY_HLT |
+ VMX_MISC_ACTIVITY_SHUTDOWN |
+ VMX_MISC_ACTIVITY_WAIT_SIPI |
+ VMX_MISC_INTEL_PT |
+ VMX_MISC_RDMSR_IN_SMM |
+ VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
+ VMX_MISC_VMXOFF_BLOCK_SMI |
+ VMX_MISC_ZERO_LEN_INS;
+
+ const u64 reserved_bits = BIT_ULL(31) | GENMASK_ULL(13, 9);
+
u64 vmx_misc = vmx_control_msr(vmcs_config.nested.misc_low,
vmcs_config.nested.misc_high);
- if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
+ BUILD_BUG_ON(feature_bits & reserved_bits);
+
+ /*
+ * The incoming value must not set feature bits or reserved bits that
+ * aren't allowed/supported by KVM. Fields, i.e. multi-bit values, are
+ * explicitly checked below.
+ */
+ if (!is_bitwise_subset(vmx_misc, data, feature_bits | reserved_bits))
return -EINVAL;
if ((vmx->nested.msrs.pinbased_ctls_high &
@@ -2317,10 +2341,12 @@
/* Posted interrupts setting is only taken from vmcs12. */
vmx->nested.pi_pending = false;
- if (nested_cpu_has_posted_intr(vmcs12))
+ if (nested_cpu_has_posted_intr(vmcs12)) {
vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
- else
+ } else {
+ vmx->nested.posted_intr_nv = -1;
exec_control &= ~PIN_BASED_POSTED_INTR;
+ }
pin_controls_set(vmx, exec_control);
/*
@@ -2470,6 +2496,7 @@
if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
+
vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
vmcs_write16(GUEST_SS_SELECTOR, vmcs12->guest_ss_selector);
@@ -2507,7 +2534,7 @@
vmcs_writel(GUEST_GDTR_BASE, vmcs12->guest_gdtr_base);
vmcs_writel(GUEST_IDTR_BASE, vmcs12->guest_idtr_base);
- vmx->segment_cache.bitmask = 0;
+ vmx_segment_cache_clear(vmx);
}
if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
@@ -4284,11 +4311,52 @@
}
if (kvm_cpu_has_interrupt(vcpu) && !vmx_interrupt_blocked(vcpu)) {
+ int irq;
+
if (block_nested_events)
return -EBUSY;
if (!nested_exit_on_intr(vcpu))
goto no_vmexit;
- nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+
+ if (!nested_exit_intr_ack_set(vcpu)) {
+ nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+ return 0;
+ }
+
+ irq = kvm_cpu_get_extint(vcpu);
+ if (irq != -1) {
+ nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
+ INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
+ return 0;
+ }
+
+ irq = kvm_apic_has_interrupt(vcpu);
+ if (WARN_ON_ONCE(irq < 0))
+ goto no_vmexit;
+
+ /*
+ * If the IRQ is L2's PI notification vector, process posted
+ * interrupts for L2 instead of injecting VM-Exit, as the
+ * detection/morphing architecturally occurs when the IRQ is
+ * delivered to the CPU. Note, only interrupts that are routed
+ * through the local APIC trigger posted interrupt processing,
+ * and enabling posted interrupts requires ACK-on-exit.
+ */
+ if (irq == vmx->nested.posted_intr_nv) {
+ vmx->nested.pi_pending = true;
+ kvm_apic_clear_irr(vcpu, irq);
+ goto no_vmexit;
+ }
+
+ nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
+ INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
+
+ /*
+ * ACK the interrupt _after_ emulating VM-Exit, as the IRQ must
+ * be marked as in-service in vmcs01.GUEST_INTERRUPT_STATUS.SVI
+ * if APICv is active.
+ */
+ kvm_apic_ack_interrupt(vcpu, irq);
return 0;
}
@@ -4806,7 +4874,7 @@
goto vmabort;
}
- if (kvm_set_msr(vcpu, h.index, h.value)) {
+ if (kvm_set_msr_with_filter(vcpu, h.index, h.value)) {
pr_debug_ratelimited(
"%s WRMSR failed (%u, 0x%x, 0x%llx)\n",
__func__, j, h.index, h.value);
@@ -4969,14 +5037,6 @@
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
if (likely(!vmx->fail)) {
- if ((u16)vm_exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
- nested_exit_intr_ack_set(vcpu)) {
- int irq = kvm_cpu_get_interrupt(vcpu);
- WARN_ON(irq < 0);
- vmcs12->vm_exit_intr_info = irq |
- INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
- }
-
if (vm_exit_reason != -1)
trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
vmcs12->exit_qualification,
@@ -7051,7 +7111,7 @@
{
msrs->misc_low = (u32)vmcs_conf->misc & VMX_MISC_SAVE_EFER_LMA;
msrs->misc_low |=
- MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
+ VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
VMX_MISC_ACTIVITY_HLT |
VMX_MISC_ACTIVITY_WAIT_SIPI;
@@ -7066,12 +7126,10 @@
* guest, and the VMCS structure we give it - not about the
* VMX support of the underlying hardware.
*/
- msrs->basic =
- VMCS12_REVISION |
- VMX_BASIC_TRUE_CTLS |
- ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
- (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
+ msrs->basic = vmx_basic_encode_vmcs_info(VMCS12_REVISION, VMCS12_SIZE,
+ X86_MEMTYPE_WB);
+ msrs->basic |= VMX_BASIC_TRUE_CTLS;
if (cpu_has_vmx_basic_inout())
msrs->basic |= VMX_BASIC_INOUT;
}
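
The vmx_restore_vmx_basic() and vmx_restore_vmx_misc() checks above lean on
is_bitwise_subset(), which is outside this hunk. A hedged stand-alone equivalent
of the idea (not the kernel's helper; names invented): within the checked mask,
userspace may only clear feature bits relative to what KVM reports, never set
new ones.

#include <stdbool.h>
#include <stdint.h>

/* Every bit of @data inside @mask must also be set in @allowed. */
static bool bits_are_subset(uint64_t allowed, uint64_t data, uint64_t mask)
{
	return (data & mask & ~allowed) == 0;
}

/* Example: reject a restore that sets an unsupported feature or reserved bit. */
static int check_restore(uint64_t host_val, uint64_t user_val,
			 uint64_t feature_bits, uint64_t reserved_bits)
{
	if (!bits_are_subset(host_val, user_val, feature_bits | reserved_bits))
		return -1;	/* -EINVAL in the kernel */
	return 0;
}
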
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index cce4e2a..2c296b6 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -39,11 +39,17 @@
static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
{
+ lockdep_assert_once(lockdep_is_held(&vcpu->mutex) ||
+ !refcount_read(&vcpu->kvm->users_count));
+
return to_vmx(vcpu)->nested.cached_vmcs12;
}
static inline struct vmcs12 *get_shadow_vmcs12(struct kvm_vcpu *vcpu)
{
+ lockdep_assert_once(lockdep_is_held(&vcpu->mutex) ||
+ !refcount_read(&vcpu->kvm->users_count));
+
return to_vmx(vcpu)->nested.cached_shadow_vmcs12;
}
@@ -109,7 +115,7 @@
static inline bool nested_cpu_has_vmwrite_any_field(struct kvm_vcpu *vcpu)
{
return to_vmx(vcpu)->nested.msrs.misc_low &
- MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS;
+ VMX_MISC_VMWRITE_SHADOW_RO_FIELDS;
}
static inline bool nested_cpu_has_zero_length_injection(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 6fef01e..a3c3d2a 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -274,7 +274,7 @@
* simultaneously set SGX_ATTR_PROVISIONKEY to bypass the check to
* enforce restriction of access to the PROVISIONKEY.
*/
- contents = (struct sgx_secs *)__get_free_page(GFP_KERNEL_ACCOUNT);
+ contents = (struct sgx_secs *)__get_free_page(GFP_KERNEL);
if (!contents)
return -ENOMEM;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 733a0c4..1a44383 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -525,10 +525,6 @@
VMX_SEGMENT_FIELD(LDTR),
};
-static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
-{
- vmx->segment_cache.bitmask = 0;
-}
static unsigned long host_idt_base;
@@ -755,7 +751,7 @@
return -EIO;
}
-static void vmx_emergency_disable(void)
+void vmx_emergency_disable_virtualization_cpu(void)
{
int cpu = raw_smp_processor_id();
struct loaded_vmcs *v;
@@ -1998,15 +1994,15 @@
return !(msr->data & ~valid_bits);
}
-int vmx_get_msr_feature(struct kvm_msr_entry *msr)
+int vmx_get_feature_msr(u32 msr, u64 *data)
{
- switch (msr->index) {
+ switch (msr) {
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
if (!nested)
return 1;
- return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
+ return vmx_get_vmx_msr(&vmcs_config.nested, msr, data);
default:
- return KVM_MSR_RET_INVALID;
+ return KVM_MSR_RET_UNSUPPORTED;
}
}
@@ -2605,13 +2601,13 @@
static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
struct vmx_capability *vmx_cap)
{
- u32 vmx_msr_low, vmx_msr_high;
u32 _pin_based_exec_control = 0;
u32 _cpu_based_exec_control = 0;
u32 _cpu_based_2nd_exec_control = 0;
u64 _cpu_based_3rd_exec_control = 0;
u32 _vmexit_control = 0;
u32 _vmentry_control = 0;
+ u64 basic_msr;
u64 misc_msr;
int i;
@@ -2734,29 +2730,29 @@
_vmexit_control &= ~x_ctrl;
}
- rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);
+ rdmsrl(MSR_IA32_VMX_BASIC, basic_msr);
/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
- if ((vmx_msr_high & 0x1fff) > PAGE_SIZE)
+ if (vmx_basic_vmcs_size(basic_msr) > PAGE_SIZE)
return -EIO;
#ifdef CONFIG_X86_64
- /* IA-32 SDM Vol 3B: 64-bit CPUs always have VMX_BASIC_MSR[48]==0. */
- if (vmx_msr_high & (1u<<16))
+ /*
+ * KVM expects to be able to shove all legal physical addresses into
+ * VMCS fields for 64-bit kernels, and per the SDM, "This bit is always
+ * 0 for processors that support Intel 64 architecture".
+ */
+ if (basic_msr & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
return -EIO;
#endif
/* Require Write-Back (WB) memory type for VMCS accesses. */
- if (((vmx_msr_high >> 18) & 15) != 6)
+ if (vmx_basic_vmcs_mem_type(basic_msr) != X86_MEMTYPE_WB)
return -EIO;
rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
- vmcs_conf->size = vmx_msr_high & 0x1fff;
- vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
-
- vmcs_conf->revision_id = vmx_msr_low;
-
+ vmcs_conf->basic = basic_msr;
vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
@@ -2844,7 +2840,7 @@
return -EFAULT;
}
-int vmx_hardware_enable(void)
+int vmx_enable_virtualization_cpu(void)
{
int cpu = raw_smp_processor_id();
u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
@@ -2881,7 +2877,7 @@
__loaded_vmcs_clear(v);
}
-void vmx_hardware_disable(void)
+void vmx_disable_virtualization_cpu(void)
{
vmclear_local_loaded_vmcss();
@@ -2903,13 +2899,13 @@
if (!pages)
return NULL;
vmcs = page_address(pages);
- memset(vmcs, 0, vmcs_config.size);
+ memset(vmcs, 0, vmx_basic_vmcs_size(vmcs_config.basic));
/* KVM supports Enlightened VMCS v1 only */
if (kvm_is_using_evmcs())
vmcs->hdr.revision_id = KVM_EVMCS_VERSION;
else
- vmcs->hdr.revision_id = vmcs_config.revision_id;
+ vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
if (shadow)
vmcs->hdr.shadow_vmcs = 1;
@@ -3002,7 +2998,7 @@
* physical CPU.
*/
if (kvm_is_using_evmcs())
- vmcs->hdr.revision_id = vmcs_config.revision_id;
+ vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
per_cpu(vmxarea, cpu) = vmcs;
}
@@ -4219,6 +4215,13 @@
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ /*
+ * DO NOT query the vCPU's vmcs12, as vmcs12 is dynamically allocated
+ * and freed, and must not be accessed outside of vcpu->mutex. The
+	 * vCPU's cached PI NV is valid if and only if posted interrupts are
+ * enabled in its vmcs12, i.e. checking the vector also checks that
+ * L1 has enabled posted interrupts for L2.
+ */
if (is_guest_mode(vcpu) &&
vector == vmx->nested.posted_intr_nv) {
/*
@@ -5804,8 +5807,9 @@
error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
? PFERR_PRESENT_MASK : 0;
- error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) != 0 ?
- PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
+	if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
+ error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
+ PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
/*
* Check that the GPA doesn't exceed physical memory limits, as that is
@@ -7265,6 +7269,8 @@
return handle_fastpath_set_msr_irqoff(vcpu);
case EXIT_REASON_PREEMPTION_TIMER:
return handle_fastpath_preemption_timer(vcpu, force_immediate_exit);
+ case EXIT_REASON_HLT:
+ return handle_fastpath_hlt(vcpu);
default:
return EXIT_FASTPATH_NONE;
}
@@ -7965,6 +7971,7 @@
kvm_cpu_cap_clear(X86_FEATURE_SGX_LC);
kvm_cpu_cap_clear(X86_FEATURE_SGX1);
kvm_cpu_cap_clear(X86_FEATURE_SGX2);
+ kvm_cpu_cap_clear(X86_FEATURE_SGX_EDECCSSA);
}
if (vmx_umip_emulated())
@@ -8515,7 +8522,7 @@
u64 use_timer_freq = 5000ULL * 1000 * 1000;
cpu_preemption_timer_multi =
- vmcs_config.misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+ vmx_misc_preemption_timer_rate(vmcs_config.misc);
if (tsc_khz)
use_timer_freq = (u64)tsc_khz * 1000;
@@ -8582,8 +8589,6 @@
{
allow_smaller_maxphyaddr = false;
- cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
-
vmx_cleanup_l1d_flush();
}
@@ -8630,8 +8635,6 @@
pi_init_cpu(cpu);
}
- cpu_emergency_register_virt_callback(vmx_emergency_disable);
-
vmx_check_vmcs12_offsets();
/*
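
setup_vmcs_config() now reads MSR_IA32_VMX_BASIC as one 64-bit value and decodes
it through vmx_basic_vmcs_size(), vmx_basic_vmcs_revision_id() and
vmx_basic_vmcs_mem_type(), whose definitions are not part of this diff. A sketch
of that decoding, assuming the layout implied by the removed open-coded masks
(revision ID in bits 30:0, VMCS size in bits 44:32, the 32-bit-only flag in bit
48, memory type in bits 53:50); the names below are invented:

#include <stdint.h>

#define BASIC_REVISION_MASK	0x7fffffffULL	/* bits 30:0, bit 31 reserved */
#define BASIC_VMCS_SIZE_SHIFT	32		/* bits 44:32 */
#define BASIC_32BIT_PHYS_ONLY	(1ULL << 48)
#define BASIC_MEM_TYPE_SHIFT	50		/* bits 53:50 */

static inline uint32_t basic_revision_id(uint64_t basic)
{
	return basic & BASIC_REVISION_MASK;
}

static inline uint32_t basic_vmcs_size(uint64_t basic)
{
	return (basic >> BASIC_VMCS_SIZE_SHIFT) & 0x1fff;
}

static inline uint8_t basic_vmcs_mem_type(uint64_t basic)
{
	return (basic >> BASIC_MEM_TYPE_SHIFT) & 0xf;	/* 6 == write-back */
}
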
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 42498fa..2325f77 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -17,10 +17,6 @@
#include "run_flags.h"
#include "../mmu.h"
-#define MSR_TYPE_R 1
-#define MSR_TYPE_W 2
-#define MSR_TYPE_RW 3
-
#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
#ifdef CONFIG_X86_64
@@ -756,4 +752,9 @@
return lapic_in_kernel(vcpu) && enable_ipiv;
}
+static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
+{
+ vmx->segment_cache.bitmask = 0;
+}
+
#endif /* __KVM_X86_VMX_H */
diff --git a/arch/x86/kvm/vmx/vmx_onhyperv.h b/arch/x86/kvm/vmx/vmx_onhyperv.h
index eb48153..bba24ed 100644
--- a/arch/x86/kvm/vmx/vmx_onhyperv.h
+++ b/arch/x86/kvm/vmx/vmx_onhyperv.h
@@ -104,6 +104,14 @@
struct hv_vp_assist_page *vp_ap =
hv_get_vp_assist_page(smp_processor_id());
+ /*
+ * When enabling eVMCS, KVM verifies that every CPU has a valid hv_vp_assist_page()
+	 * and aborts enabling the feature otherwise. The CPU onlining path also
+	 * performs this check, in vmx_hardware_enable().
+ */
+ if (KVM_BUG_ON(!vp_ap, kvm_get_running_vcpu()->kvm))
+ return;
+
if (current_evmcs->hv_enlightenments_control.nested_flush_hypercall)
vp_ap->nested_control.features.directhypercall = 1;
vp_ap->current_nested_vmcs = phys_addr;
diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
index 8060e5f..93e020d 100644
--- a/arch/x86/kvm/vmx/vmx_ops.h
+++ b/arch/x86/kvm/vmx/vmx_ops.h
@@ -47,7 +47,7 @@
BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001,
"16-bit accessor invalid for 64-bit high field");
BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x4000,
- "16-bit accessor invalid for 32-bit high field");
+ "16-bit accessor invalid for 32-bit field");
BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x6000,
"16-bit accessor invalid for natural width field");
}
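
The corrected message reflects how VMCS field encodings are classified: bits
14:13 give the width (0x0000 16-bit, 0x2000 64-bit, 0x4000 32-bit, 0x6000
natural width) and bit 0 selects the "high" access type, which only exists for
64-bit fields, so "32-bit high field" was a misnomer. A small classifier along
those lines (a sketch, not kernel code):

#include <stdio.h>

enum vmcs_width {
	WIDTH_16	= 0,	/* bits 14:13 == 00 */
	WIDTH_64	= 1,	/* bits 14:13 == 01 */
	WIDTH_32	= 2,	/* bits 14:13 == 10 */
	WIDTH_NATURAL	= 3,	/* bits 14:13 == 11 */
};

static enum vmcs_width vmcs_field_width(unsigned long field)
{
	return (field >> 13) & 0x3;
}

static int field_is_64bit_high(unsigned long field)
{
	return vmcs_field_width(field) == WIDTH_64 && (field & 1);
}

int main(void)
{
	/* 0x2001 is a 64-bit "high" encoding, 0x4000 a plain 32-bit field. */
	printf("%d %d\n", field_is_64bit_high(0x2001), field_is_64bit_high(0x4000));
	return 0;
}
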
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index ce3221c..a55981c 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -13,8 +13,9 @@
void vmx_hardware_unsetup(void);
int vmx_check_processor_compat(void);
-int vmx_hardware_enable(void);
-void vmx_hardware_disable(void);
+int vmx_enable_virtualization_cpu(void);
+void vmx_disable_virtualization_cpu(void);
+void vmx_emergency_disable_virtualization_cpu(void);
int vmx_vm_init(struct kvm *kvm);
void vmx_vm_destroy(struct kvm *kvm);
int vmx_vcpu_precreate(struct kvm *kvm);
@@ -56,7 +57,7 @@
void vmx_msr_filter_changed(struct kvm_vcpu *vcpu);
void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
-int vmx_get_msr_feature(struct kvm_msr_entry *msr);
+int vmx_get_feature_msr(u32 msr, u64 *data);
int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg);
void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c983c8e..83fe0a7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -305,24 +305,237 @@
static struct kmem_cache *x86_emulator_cache;
/*
- * When called, it means the previous get/set msr reached an invalid msr.
- * Return true if we want to ignore/silent this failed msr access.
+ * The three MSR lists (msrs_to_save, emulated_msrs, msr_based_features) track
+ * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
+ * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
+ * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
+ * MSRs that KVM emulates without strictly requiring host support.
+ * msr_based_features holds MSRs that enumerate features, i.e. are effectively
+ * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
+ * msrs_to_save and emulated_msrs.
*/
-static bool kvm_msr_ignored_check(u32 msr, u64 data, bool write)
-{
- const char *op = write ? "wrmsr" : "rdmsr";
- if (ignore_msrs) {
- if (report_ignored_msrs)
- kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n",
- op, msr, data);
- /* Mask the error */
+static const u32 msrs_to_save_base[] = {
+ MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
+ MSR_STAR,
+#ifdef CONFIG_X86_64
+ MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
+#endif
+ MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+ MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+ MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
+ MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
+ MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
+ MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
+ MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
+ MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
+ MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+ MSR_IA32_UMWAIT_CONTROL,
+
+ MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+};
+
+static const u32 msrs_to_save_pmu[] = {
+ MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
+ MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+ MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
+ MSR_CORE_PERF_GLOBAL_CTRL,
+ MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
+
+ /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
+ MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
+ MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
+ MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
+ MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
+ MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
+
+ MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
+ MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
+
+ /* This part of MSRs should match KVM_MAX_NR_AMD_GP_COUNTERS. */
+ MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
+ MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
+ MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
+ MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
+
+ MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+};
+
+static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
+ ARRAY_SIZE(msrs_to_save_pmu)];
+static unsigned num_msrs_to_save;
+
+static const u32 emulated_msrs_all[] = {
+ MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+ MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
+
+#ifdef CONFIG_KVM_HYPERV
+ HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+ HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
+ HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
+ HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
+ HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
+ HV_X64_MSR_RESET,
+ HV_X64_MSR_VP_INDEX,
+ HV_X64_MSR_VP_RUNTIME,
+ HV_X64_MSR_SCONTROL,
+ HV_X64_MSR_STIMER0_CONFIG,
+ HV_X64_MSR_VP_ASSIST_PAGE,
+ HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
+ HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
+ HV_X64_MSR_SYNDBG_OPTIONS,
+ HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
+ HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
+ HV_X64_MSR_SYNDBG_PENDING_BUFFER,
+#endif
+
+ MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
+
+ MSR_IA32_TSC_ADJUST,
+ MSR_IA32_TSC_DEADLINE,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+ MSR_IA32_MISC_ENABLE,
+ MSR_IA32_MCG_STATUS,
+ MSR_IA32_MCG_CTL,
+ MSR_IA32_MCG_EXT_CTL,
+ MSR_IA32_SMBASE,
+ MSR_SMI_COUNT,
+ MSR_PLATFORM_INFO,
+ MSR_MISC_FEATURES_ENABLES,
+ MSR_AMD64_VIRT_SPEC_CTRL,
+ MSR_AMD64_TSC_RATIO,
+ MSR_IA32_POWER_CTL,
+ MSR_IA32_UCODE_REV,
+
+ /*
+ * KVM always supports the "true" VMX control MSRs, even if the host
+ * does not. The VMX MSRs as a whole are considered "emulated" as KVM
+ * doesn't strictly require them to exist in the host (ignoring that
+ * KVM would refuse to load in the first place if the core set of MSRs
+ * aren't supported).
+ */
+ MSR_IA32_VMX_BASIC,
+ MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+ MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
+ MSR_IA32_VMX_TRUE_EXIT_CTLS,
+ MSR_IA32_VMX_TRUE_ENTRY_CTLS,
+ MSR_IA32_VMX_MISC,
+ MSR_IA32_VMX_CR0_FIXED0,
+ MSR_IA32_VMX_CR4_FIXED0,
+ MSR_IA32_VMX_VMCS_ENUM,
+ MSR_IA32_VMX_PROCBASED_CTLS2,
+ MSR_IA32_VMX_EPT_VPID_CAP,
+ MSR_IA32_VMX_VMFUNC,
+
+ MSR_K7_HWCR,
+ MSR_KVM_POLL_CONTROL,
+};
+
+static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
+static unsigned num_emulated_msrs;
+
+/*
+ * List of MSRs that control the existence of MSR-based features, i.e. MSRs
+ * that are effectively CPUID leafs. VMX MSRs are also included in the set of
+ * feature MSRs, but are handled separately to allow expedited lookups.
+ */
+static const u32 msr_based_features_all_except_vmx[] = {
+ MSR_AMD64_DE_CFG,
+ MSR_IA32_UCODE_REV,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+};
+
+static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
+ (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
+static unsigned int num_msr_based_features;
+
+/*
+ * All feature MSRs except uCode revID, which tracks the currently loaded uCode
+ * patch, are immutable once the vCPU model is defined.
+ */
+static bool kvm_is_immutable_feature_msr(u32 msr)
+{
+ int i;
+
+ if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
return true;
- } else {
- kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
- op, msr, data);
- return false;
+
+ for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
+ if (msr == msr_based_features_all_except_vmx[i])
+ return msr != MSR_IA32_UCODE_REV;
}
+
+ return false;
+}
+
+static bool kvm_is_advertised_msr(u32 msr_index)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_msrs_to_save; i++) {
+ if (msrs_to_save[i] == msr_index)
+ return true;
+ }
+
+ for (i = 0; i < num_emulated_msrs; i++) {
+ if (emulated_msrs[i] == msr_index)
+ return true;
+ }
+
+ return false;
+}
+
+typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated);
+
+static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
+ u64 *data, bool host_initiated,
+ enum kvm_msr_access rw,
+ msr_access_t msr_access_fn)
+{
+ const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
+ int ret;
+
+ BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
+
+ /*
+ * Zero the data on read failures to avoid leaking stack data to the
+ * guest and/or userspace, e.g. if the failure is ignored below.
+ */
+ ret = msr_access_fn(vcpu, msr, data, host_initiated);
+ if (ret && rw == MSR_TYPE_R)
+ *data = 0;
+
+ if (ret != KVM_MSR_RET_UNSUPPORTED)
+ return ret;
+
+ /*
+ * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
+ * advertises to userspace, even if an MSR isn't fully supported.
+ * Simply check that @data is '0', which covers both the write '0' case
+ * and all reads (in which case @data is zeroed on failure; see above).
+ */
+ if (host_initiated && !*data && kvm_is_advertised_msr(msr))
+ return 0;
+
+ if (!ignore_msrs) {
+ kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
+ op, msr, *data);
+ return ret;
+ }
+
+ if (report_ignored_msrs)
+ kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
+
+ return 0;
}
static struct kmem_cache *kvm_alloc_emulator_cache(void)
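
The relocated comment spells out how msrs_to_save, emulated_msrs and
msr_based_features reach userspace through KVM_GET_MSR_INDEX_LIST and
KVM_GET_MSRS. A rough userspace sketch of walking that list, assuming the
standard uAPI from linux/kvm.h (error handling trimmed; 4096 is an arbitrary
over-allocation, KVM writes back the real count):

#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

int main(void)
{
	struct kvm_msr_list *list;
	unsigned int i;
	int kvm = open("/dev/kvm", O_RDWR);

	if (kvm < 0)
		return 1;

	list = calloc(1, sizeof(*list) + 4096 * sizeof(__u32));
	list->nmsrs = 4096;
	if (ioctl(kvm, KVM_GET_MSR_INDEX_LIST, list) < 0)
		return 1;

	for (i = 0; i < list->nmsrs; i++)
		printf("MSR 0x%x\n", list->indices[i]);
	return 0;
}
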
@@ -355,7 +568,7 @@
/*
* Disabling irqs at this point since the following code could be
- * interrupted and executed through kvm_arch_hardware_disable()
+ * interrupted and executed through kvm_arch_disable_virtualization_cpu()
*/
local_irq_save(flags);
if (msrs->registered) {
@@ -413,8 +626,7 @@
static void kvm_user_return_msr_cpu_online(void)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
u64 value;
int i;
@@ -621,12 +833,6 @@
ex->payload = payload;
}
-/* Forcibly leave the nested mode in cases like a vCPU reset */
-static void kvm_leave_nested(struct kvm_vcpu *vcpu)
-{
- kvm_x86_ops.nested_ops->leave_nested(vcpu);
-}
-
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
unsigned nr, bool has_error, u32 error_code,
bool has_payload, unsigned long payload, bool reinject)
@@ -1412,178 +1618,6 @@
EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc);
/*
- * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
- * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
- * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
- * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
- * MSRs that KVM emulates without strictly requiring host support.
- * msr_based_features holds MSRs that enumerate features, i.e. are effectively
- * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
- * msrs_to_save and emulated_msrs.
- */
-
-static const u32 msrs_to_save_base[] = {
- MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
- MSR_STAR,
-#ifdef CONFIG_X86_64
- MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
-#endif
- MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
- MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
- MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
- MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
- MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
- MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
- MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
- MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
- MSR_IA32_UMWAIT_CONTROL,
-
- MSR_IA32_XFD, MSR_IA32_XFD_ERR,
-};
-
-static const u32 msrs_to_save_pmu[] = {
- MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
- MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
- MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
- MSR_CORE_PERF_GLOBAL_CTRL,
- MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
-
- /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
- MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
- MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
- MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
- MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
- MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
- MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
- MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
- MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
-
- MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
- MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
-
- /* This part of MSRs should match KVM_MAX_NR_AMD_GP_COUNTERS. */
- MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
- MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
- MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
- MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
-
- MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
-};
-
-static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
- ARRAY_SIZE(msrs_to_save_pmu)];
-static unsigned num_msrs_to_save;
-
-static const u32 emulated_msrs_all[] = {
- MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
- MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-
-#ifdef CONFIG_KVM_HYPERV
- HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
- HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
- HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
- HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
- HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
- HV_X64_MSR_RESET,
- HV_X64_MSR_VP_INDEX,
- HV_X64_MSR_VP_RUNTIME,
- HV_X64_MSR_SCONTROL,
- HV_X64_MSR_STIMER0_CONFIG,
- HV_X64_MSR_VP_ASSIST_PAGE,
- HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
- HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
- HV_X64_MSR_SYNDBG_OPTIONS,
- HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
- HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
- HV_X64_MSR_SYNDBG_PENDING_BUFFER,
-#endif
-
- MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
- MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
-
- MSR_IA32_TSC_ADJUST,
- MSR_IA32_TSC_DEADLINE,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
- MSR_IA32_MISC_ENABLE,
- MSR_IA32_MCG_STATUS,
- MSR_IA32_MCG_CTL,
- MSR_IA32_MCG_EXT_CTL,
- MSR_IA32_SMBASE,
- MSR_SMI_COUNT,
- MSR_PLATFORM_INFO,
- MSR_MISC_FEATURES_ENABLES,
- MSR_AMD64_VIRT_SPEC_CTRL,
- MSR_AMD64_TSC_RATIO,
- MSR_IA32_POWER_CTL,
- MSR_IA32_UCODE_REV,
-
- /*
- * KVM always supports the "true" VMX control MSRs, even if the host
- * does not. The VMX MSRs as a whole are considered "emulated" as KVM
- * doesn't strictly require them to exist in the host (ignoring that
- * KVM would refuse to load in the first place if the core set of MSRs
- * aren't supported).
- */
- MSR_IA32_VMX_BASIC,
- MSR_IA32_VMX_TRUE_PINBASED_CTLS,
- MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
- MSR_IA32_VMX_TRUE_EXIT_CTLS,
- MSR_IA32_VMX_TRUE_ENTRY_CTLS,
- MSR_IA32_VMX_MISC,
- MSR_IA32_VMX_CR0_FIXED0,
- MSR_IA32_VMX_CR4_FIXED0,
- MSR_IA32_VMX_VMCS_ENUM,
- MSR_IA32_VMX_PROCBASED_CTLS2,
- MSR_IA32_VMX_EPT_VPID_CAP,
- MSR_IA32_VMX_VMFUNC,
-
- MSR_K7_HWCR,
- MSR_KVM_POLL_CONTROL,
-};
-
-static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
-static unsigned num_emulated_msrs;
-
-/*
- * List of MSRs that control the existence of MSR-based features, i.e. MSRs
- * that are effectively CPUID leafs. VMX MSRs are also included in the set of
- * feature MSRs, but are handled separately to allow expedited lookups.
- */
-static const u32 msr_based_features_all_except_vmx[] = {
- MSR_AMD64_DE_CFG,
- MSR_IA32_UCODE_REV,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
-};
-
-static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
- (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
-static unsigned int num_msr_based_features;
-
-/*
- * All feature MSRs except uCode revID, which tracks the currently loaded uCode
- * patch, are immutable once the vCPU model is defined.
- */
-static bool kvm_is_immutable_feature_msr(u32 msr)
-{
- int i;
-
- if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
- return true;
-
- for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
- if (msr == msr_based_features_all_except_vmx[i])
- return msr != MSR_IA32_UCODE_REV;
- }
-
- return false;
-}
-
-/*
* Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
* does not yet virtualize. These include:
* 10 - MISC_PACKAGE_CTRLS
@@ -1660,40 +1694,31 @@
return data;
}
-static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
+static int kvm_get_feature_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
{
- switch (msr->index) {
+ WARN_ON_ONCE(!host_initiated);
+
+ switch (index) {
case MSR_IA32_ARCH_CAPABILITIES:
- msr->data = kvm_get_arch_capabilities();
+ *data = kvm_get_arch_capabilities();
break;
case MSR_IA32_PERF_CAPABILITIES:
- msr->data = kvm_caps.supported_perf_cap;
+ *data = kvm_caps.supported_perf_cap;
break;
case MSR_IA32_UCODE_REV:
- rdmsrl_safe(msr->index, &msr->data);
+ rdmsrl_safe(index, data);
break;
default:
- return kvm_x86_call(get_msr_feature)(msr);
+ return kvm_x86_call(get_feature_msr)(index, data);
}
return 0;
}
-static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
{
- struct kvm_msr_entry msr;
- int r;
-
- /* Unconditionally clear the output for simplicity */
- msr.data = 0;
- msr.index = index;
- r = kvm_get_msr_feature(&msr);
-
- if (r == KVM_MSR_RET_INVALID && kvm_msr_ignored_check(index, 0, false))
- r = 0;
-
- *data = msr.data;
-
- return r;
+ return kvm_do_msr_access(vcpu, index, data, true, MSR_TYPE_R,
+ kvm_get_feature_msr);
}
static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
@@ -1880,16 +1905,17 @@
return kvm_x86_call(set_msr)(vcpu, &msr);
}
+static int _kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ return __kvm_set_msr(vcpu, index, *data, host_initiated);
+}
+
static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
u32 index, u64 data, bool host_initiated)
{
- int ret = __kvm_set_msr(vcpu, index, data, host_initiated);
-
- if (ret == KVM_MSR_RET_INVALID)
- if (kvm_msr_ignored_check(index, data, true))
- ret = 0;
-
- return ret;
+ return kvm_do_msr_access(vcpu, index, &data, host_initiated, MSR_TYPE_W,
+ _kvm_set_msr);
}
/*
@@ -1928,31 +1954,25 @@
static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
u32 index, u64 *data, bool host_initiated)
{
- int ret = __kvm_get_msr(vcpu, index, data, host_initiated);
-
- if (ret == KVM_MSR_RET_INVALID) {
- /* Unconditionally clear *data for simplicity */
- *data = 0;
- if (kvm_msr_ignored_check(index, 0, false))
- ret = 0;
- }
-
- return ret;
+ return kvm_do_msr_access(vcpu, index, data, host_initiated, MSR_TYPE_R,
+ __kvm_get_msr);
}
-static int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
{
if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
return KVM_MSR_RET_FILTERED;
return kvm_get_msr_ignored_check(vcpu, index, data, false);
}
+EXPORT_SYMBOL_GPL(kvm_get_msr_with_filter);
-static int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
+int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
{
if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
return KVM_MSR_RET_FILTERED;
return kvm_set_msr_ignored_check(vcpu, index, data, false);
}
+EXPORT_SYMBOL_GPL(kvm_set_msr_with_filter);
int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
{
@@ -1999,7 +2019,7 @@
static u64 kvm_msr_reason(int r)
{
switch (r) {
- case KVM_MSR_RET_INVALID:
+ case KVM_MSR_RET_UNSUPPORTED:
return KVM_MSR_EXIT_REASON_UNKNOWN;
case KVM_MSR_RET_FILTERED:
return KVM_MSR_EXIT_REASON_FILTER;
@@ -2162,31 +2182,34 @@
{
u32 msr = kvm_rcx_read(vcpu);
u64 data;
- fastpath_t ret = EXIT_FASTPATH_NONE;
+ fastpath_t ret;
+ bool handled;
kvm_vcpu_srcu_read_lock(vcpu);
switch (msr) {
case APIC_BASE_MSR + (APIC_ICR >> 4):
data = kvm_read_edx_eax(vcpu);
- if (!handle_fastpath_set_x2apic_icr_irqoff(vcpu, data)) {
- kvm_skip_emulated_instruction(vcpu);
- ret = EXIT_FASTPATH_EXIT_HANDLED;
- }
+ handled = !handle_fastpath_set_x2apic_icr_irqoff(vcpu, data);
break;
case MSR_IA32_TSC_DEADLINE:
data = kvm_read_edx_eax(vcpu);
- if (!handle_fastpath_set_tscdeadline(vcpu, data)) {
- kvm_skip_emulated_instruction(vcpu);
- ret = EXIT_FASTPATH_REENTER_GUEST;
- }
+ handled = !handle_fastpath_set_tscdeadline(vcpu, data);
break;
default:
+ handled = false;
break;
}
- if (ret != EXIT_FASTPATH_NONE)
+ if (handled) {
+ if (!kvm_skip_emulated_instruction(vcpu))
+ ret = EXIT_FASTPATH_EXIT_USERSPACE;
+ else
+ ret = EXIT_FASTPATH_REENTER_GUEST;
trace_kvm_msr_write(msr, data);
+ } else {
+ ret = EXIT_FASTPATH_NONE;
+ }
kvm_vcpu_srcu_read_unlock(vcpu);
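
Both fastpath cases above pull the WRMSR payload out of the guest's EDX:EAX pair
via kvm_read_edx_eax() before deciding whether to re-enter the guest or bounce
to userspace; the combination itself is just the usual MSR register convention,
e.g. (illustrative helpers, not KVM code):

#include <stdint.h>

/* WRMSR supplies the 64-bit payload in EDX:EAX; RDMSR returns it the same way. */
static inline uint64_t msr_data_from_edx_eax(uint32_t eax, uint32_t edx)
{
	return ((uint64_t)edx << 32) | eax;
}

static inline void msr_data_to_edx_eax(uint64_t data, uint32_t *eax, uint32_t *edx)
{
	*eax = (uint32_t)data;
	*edx = (uint32_t)(data >> 32);
}
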
@@ -3746,18 +3769,6 @@
mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
-static bool kvm_is_msr_to_save(u32 msr_index)
-{
- unsigned int i;
-
- for (i = 0; i < num_msrs_to_save; i++) {
- if (msrs_to_save[i] == msr_index)
- return true;
- }
-
- return false;
-}
-
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u32 msr = msr_info->index;
@@ -4139,15 +4150,7 @@
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
- /*
- * Userspace is allowed to write '0' to MSRs that KVM reports
- * as to-be-saved, even if an MSRs isn't fully supported.
- */
- if (msr_info->host_initiated && !data &&
- kvm_is_msr_to_save(msr))
- break;
-
- return KVM_MSR_RET_INVALID;
+ return KVM_MSR_RET_UNSUPPORTED;
}
return 0;
}
@@ -4498,17 +4501,7 @@
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info);
- /*
- * Userspace is allowed to read MSRs that KVM reports as
- * to-be-saved, even if an MSR isn't fully supported.
- */
- if (msr_info->host_initiated &&
- kvm_is_msr_to_save(msr_info->index)) {
- msr_info->data = 0;
- break;
- }
-
- return KVM_MSR_RET_INVALID;
+ return KVM_MSR_RET_UNSUPPORTED;
}
return 0;
}
@@ -4946,7 +4939,7 @@
break;
}
case KVM_GET_MSRS:
- r = msr_io(NULL, argp, do_get_msr_feature, 1);
+ r = msr_io(NULL, argp, do_get_feature_msr, 1);
break;
#ifdef CONFIG_KVM_HYPERV
case KVM_GET_SUPPORTED_HV_CPUID:
@@ -7383,11 +7376,9 @@
static void kvm_probe_feature_msr(u32 msr_index)
{
- struct kvm_msr_entry msr = {
- .index = msr_index,
- };
+ u64 data;
- if (kvm_get_msr_feature(&msr))
+ if (kvm_get_feature_msr(NULL, msr_index, &data, true))
return;
msr_based_features[num_msr_based_features++] = msr_index;
@@ -8865,60 +8856,13 @@
return 1;
}
-static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- int emulation_type)
+static bool kvm_unprotect_and_retry_on_failure(struct kvm_vcpu *vcpu,
+ gpa_t cr2_or_gpa,
+ int emulation_type)
{
- gpa_t gpa = cr2_or_gpa;
- kvm_pfn_t pfn;
-
if (!(emulation_type & EMULTYPE_ALLOW_RETRY_PF))
return false;
- if (WARN_ON_ONCE(is_guest_mode(vcpu)) ||
- WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
- return false;
-
- if (!vcpu->arch.mmu->root_role.direct) {
- /*
- * Write permission should be allowed since only
- * write access need to be emulated.
- */
- gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
-
- /*
- * If the mapping is invalid in guest, let cpu retry
- * it to generate fault.
- */
- if (gpa == INVALID_GPA)
- return true;
- }
-
- /*
- * Do not retry the unhandleable instruction if it faults on the
- * readonly host memory, otherwise it will goto a infinite loop:
- * retry instruction -> write #PF -> emulation fail -> retry
- * instruction -> ...
- */
- pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
-
- /*
- * If the instruction failed on the error pfn, it can not be fixed,
- * report the error to userspace.
- */
- if (is_error_noslot_pfn(pfn))
- return false;
-
- kvm_release_pfn_clean(pfn);
-
- /*
- * If emulation may have been triggered by a write to a shadowed page
- * table, unprotect the gfn (zap any relevant SPTEs) and re-enter the
- * guest to let the CPU re-execute the instruction in the hope that the
- * CPU can cleanly execute the instruction that KVM failed to emulate.
- */
- if (vcpu->kvm->arch.indirect_shadow_pages)
- kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
-
/*
* If the failed instruction faulted on an access to page tables that
* are used to translate any part of the instruction, KVM can't resolve
@@ -8929,54 +8873,24 @@
* then zap the SPTE to unprotect the gfn, and then do it all over
* again. Report the error to userspace.
*/
- return !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
-}
-
-static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
- gpa_t cr2_or_gpa, int emulation_type)
-{
- struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
- unsigned long last_retry_eip, last_retry_addr, gpa = cr2_or_gpa;
-
- last_retry_eip = vcpu->arch.last_retry_eip;
- last_retry_addr = vcpu->arch.last_retry_addr;
+ if (emulation_type & EMULTYPE_WRITE_PF_TO_SP)
+ return false;
/*
- * If the emulation is caused by #PF and it is non-page_table
- * writing instruction, it means the VM-EXIT is caused by shadow
- * page protected, we can zap the shadow page and retry this
- * instruction directly.
- *
- * Note: if the guest uses a non-page-table modifying instruction
- * on the PDE that points to the instruction, then we will unmap
- * the instruction and go to an infinite loop. So, we cache the
- * last retried eip and the last fault address, if we meet the eip
- * and the address again, we can break out of the potential infinite
- * loop.
+ * If emulation may have been triggered by a write to a shadowed page
+ * table, unprotect the gfn (zap any relevant SPTEs) and re-enter the
+ * guest to let the CPU re-execute the instruction in the hope that the
+ * CPU can cleanly execute the instruction that KVM failed to emulate.
*/
- vcpu->arch.last_retry_eip = vcpu->arch.last_retry_addr = 0;
+ __kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa, true);
- if (!(emulation_type & EMULTYPE_ALLOW_RETRY_PF))
- return false;
-
- if (WARN_ON_ONCE(is_guest_mode(vcpu)) ||
- WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
- return false;
-
- if (x86_page_table_writing_insn(ctxt))
- return false;
-
- if (ctxt->eip == last_retry_eip && last_retry_addr == cr2_or_gpa)
- return false;
-
- vcpu->arch.last_retry_eip = ctxt->eip;
- vcpu->arch.last_retry_addr = cr2_or_gpa;
-
- if (!vcpu->arch.mmu->root_role.direct)
- gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
-
- kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
-
+ /*
+ * Retry even if _this_ vCPU didn't unprotect the gfn, as it's possible
+ * all SPTEs were already zapped by a different task. The alternative
+ * is to report the error to userspace and likely terminate the guest,
+ * and the last_retry_{eip,addr} checks will prevent retrying the page
+ * fault indefinitely, i.e. there's nothing to lose by retrying.
+ */
return true;
}
@@ -9176,6 +9090,11 @@
struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
bool writeback = true;
+ if ((emulation_type & EMULTYPE_ALLOW_RETRY_PF) &&
+ (WARN_ON_ONCE(is_guest_mode(vcpu)) ||
+ WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF))))
+ emulation_type &= ~EMULTYPE_ALLOW_RETRY_PF;
+
r = kvm_check_emulate_insn(vcpu, emulation_type, insn, insn_len);
if (r != X86EMUL_CONTINUE) {
if (r == X86EMUL_RETRY_INSTR || r == X86EMUL_PROPAGATE_FAULT)
@@ -9206,8 +9125,8 @@
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
- if (reexecute_instruction(vcpu, cr2_or_gpa,
- emulation_type))
+ if (kvm_unprotect_and_retry_on_failure(vcpu, cr2_or_gpa,
+ emulation_type))
return 1;
if (ctxt->have_exception &&
@@ -9254,7 +9173,15 @@
return 1;
}
- if (retry_instruction(ctxt, cr2_or_gpa, emulation_type))
+ /*
+ * If emulation was caused by a write-protection #PF on a non-page_table
+ * writing instruction, try to unprotect the gfn, i.e. zap shadow pages,
+ * and retry the instruction, as the vCPU is likely no longer using the
+ * gfn as a page table.
+ */
+ if ((emulation_type & EMULTYPE_ALLOW_RETRY_PF) &&
+ !x86_page_table_writing_insn(ctxt) &&
+ kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa))
return 1;
/* this is needed for vmware backdoor interface to work since it
@@ -9285,7 +9212,8 @@
return 1;
if (r == EMULATION_FAILED) {
- if (reexecute_instruction(vcpu, cr2_or_gpa, emulation_type))
+ if (kvm_unprotect_and_retry_on_failure(vcpu, cr2_or_gpa,
+ emulation_type))
return 1;
return handle_emulation_failure(vcpu, emulation_type);
@@ -9753,7 +9681,7 @@
guard(mutex)(&vendor_module_lock);
- if (kvm_x86_ops.hardware_enable) {
+ if (kvm_x86_ops.enable_virtualization_cpu) {
pr_err("already loaded vendor module '%s'\n", kvm_x86_ops.name);
return -EEXIST;
}
@@ -9880,7 +9808,7 @@
return 0;
out_unwind_ops:
- kvm_x86_ops.hardware_enable = NULL;
+ kvm_x86_ops.enable_virtualization_cpu = NULL;
kvm_x86_call(hardware_unsetup)();
out_mmu_exit:
kvm_mmu_vendor_module_exit();
@@ -9921,56 +9849,11 @@
WARN_ON(static_branch_unlikely(&kvm_xen_enabled.key));
#endif
mutex_lock(&vendor_module_lock);
- kvm_x86_ops.hardware_enable = NULL;
+ kvm_x86_ops.enable_virtualization_cpu = NULL;
mutex_unlock(&vendor_module_lock);
}
EXPORT_SYMBOL_GPL(kvm_x86_vendor_exit);
-static int __kvm_emulate_halt(struct kvm_vcpu *vcpu, int state, int reason)
-{
- /*
- * The vCPU has halted, e.g. executed HLT. Update the run state if the
- * local APIC is in-kernel, the run loop will detect the non-runnable
- * state and halt the vCPU. Exit to userspace if the local APIC is
- * managed by userspace, in which case userspace is responsible for
- * handling wake events.
- */
- ++vcpu->stat.halt_exits;
- if (lapic_in_kernel(vcpu)) {
- vcpu->arch.mp_state = state;
- return 1;
- } else {
- vcpu->run->exit_reason = reason;
- return 0;
- }
-}
-
-int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu)
-{
- return __kvm_emulate_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
-}
-EXPORT_SYMBOL_GPL(kvm_emulate_halt_noskip);
-
-int kvm_emulate_halt(struct kvm_vcpu *vcpu)
-{
- int ret = kvm_skip_emulated_instruction(vcpu);
- /*
- * TODO: we might be squashing a GUESTDBG_SINGLESTEP-triggered
- * KVM_EXIT_DEBUG here.
- */
- return kvm_emulate_halt_noskip(vcpu) && ret;
-}
-EXPORT_SYMBOL_GPL(kvm_emulate_halt);
-
-int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
-{
- int ret = kvm_skip_emulated_instruction(vcpu);
-
- return __kvm_emulate_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD,
- KVM_EXIT_AP_RESET_HOLD) && ret;
-}
-EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
-
#ifdef CONFIG_X86_64
static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
unsigned long clock_type)
@@ -11207,6 +11090,9 @@
if (vcpu->arch.apic_attention)
kvm_lapic_sync_from_vapic(vcpu);
+ if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
+ return 0;
+
r = kvm_x86_call(handle_exit)(vcpu, exit_fastpath);
return r;
@@ -11220,6 +11106,67 @@
return r;
}
+static bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
+{
+ return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
+ !vcpu->arch.apf.halted);
+}
+
+static bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
+{
+ if (!list_empty_careful(&vcpu->async_pf.done))
+ return true;
+
+ if (kvm_apic_has_pending_init_or_sipi(vcpu) &&
+ kvm_apic_init_sipi_allowed(vcpu))
+ return true;
+
+ if (vcpu->arch.pv.pv_unhalted)
+ return true;
+
+ if (kvm_is_exception_pending(vcpu))
+ return true;
+
+ if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
+ (vcpu->arch.nmi_pending &&
+ kvm_x86_call(nmi_allowed)(vcpu, false)))
+ return true;
+
+#ifdef CONFIG_KVM_SMM
+ if (kvm_test_request(KVM_REQ_SMI, vcpu) ||
+ (vcpu->arch.smi_pending &&
+ kvm_x86_call(smi_allowed)(vcpu, false)))
+ return true;
+#endif
+
+ if (kvm_test_request(KVM_REQ_PMI, vcpu))
+ return true;
+
+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
+ if (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_has_interrupt(vcpu))
+ return true;
+
+ if (kvm_hv_has_stimer_pending(vcpu))
+ return true;
+
+ if (is_guest_mode(vcpu) &&
+ kvm_x86_ops.nested_ops->has_events &&
+ kvm_x86_ops.nested_ops->has_events(vcpu, false))
+ return true;
+
+ if (kvm_xen_has_pending_events(vcpu))
+ return true;
+
+ return false;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
+{
+ return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
+}
+
/* Called within kvm->srcu read side. */
static inline int vcpu_block(struct kvm_vcpu *vcpu)
{
@@ -11291,12 +11238,6 @@
return 1;
}
-static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
-{
- return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
- !vcpu->arch.apf.halted);
-}
-
/* Called within kvm->srcu read side. */
static int vcpu_run(struct kvm_vcpu *vcpu)
{
@@ -11348,6 +11289,98 @@
return r;
}
+static int __kvm_emulate_halt(struct kvm_vcpu *vcpu, int state, int reason)
+{
+ /*
+ * The vCPU has halted, e.g. executed HLT. Update the run state if the
+ * local APIC is in-kernel, the run loop will detect the non-runnable
+ * state and halt the vCPU. Exit to userspace if the local APIC is
+ * managed by userspace, in which case userspace is responsible for
+ * handling wake events.
+ */
+ ++vcpu->stat.halt_exits;
+ if (lapic_in_kernel(vcpu)) {
+ if (kvm_vcpu_has_events(vcpu))
+ vcpu->arch.pv.pv_unhalted = false;
+ else
+ vcpu->arch.mp_state = state;
+ return 1;
+ } else {
+ vcpu->run->exit_reason = reason;
+ return 0;
+ }
+}
+
+int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu)
+{
+ return __kvm_emulate_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
+}
+EXPORT_SYMBOL_GPL(kvm_emulate_halt_noskip);
+
+int kvm_emulate_halt(struct kvm_vcpu *vcpu)
+{
+ int ret = kvm_skip_emulated_instruction(vcpu);
+ /*
+ * TODO: we might be squashing a GUESTDBG_SINGLESTEP-triggered
+ * KVM_EXIT_DEBUG here.
+ */
+ return kvm_emulate_halt_noskip(vcpu) && ret;
+}
+EXPORT_SYMBOL_GPL(kvm_emulate_halt);
+
+fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu)
+{
+ int ret;
+
+ kvm_vcpu_srcu_read_lock(vcpu);
+ ret = kvm_emulate_halt(vcpu);
+ kvm_vcpu_srcu_read_unlock(vcpu);
+
+ if (!ret)
+ return EXIT_FASTPATH_EXIT_USERSPACE;
+
+ if (kvm_vcpu_running(vcpu))
+ return EXIT_FASTPATH_REENTER_GUEST;
+
+ return EXIT_FASTPATH_EXIT_HANDLED;
+}
+EXPORT_SYMBOL_GPL(handle_fastpath_hlt);
+
+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
+{
+ int ret = kvm_skip_emulated_instruction(vcpu);
+
+ return __kvm_emulate_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD,
+ KVM_EXIT_AP_RESET_HOLD) && ret;
+}
+EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
+
+bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
+{
+ return kvm_vcpu_apicv_active(vcpu) &&
+ kvm_x86_call(dy_apicv_has_pending_interrupt)(vcpu);
+}
+
+bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.preempted_in_kernel;
+}
+
+bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
+{
+ if (READ_ONCE(vcpu->arch.pv.pv_unhalted))
+ return true;
+
+ if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
+#ifdef CONFIG_KVM_SMM
+ kvm_test_request(KVM_REQ_SMI, vcpu) ||
+#endif
+ kvm_test_request(KVM_REQ_EVENT, vcpu))
+ return true;
+
+ return kvm_arch_dy_has_pending_interrupt(vcpu);
+}
+
static inline int complete_emulated_io(struct kvm_vcpu *vcpu)
{
return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
@@ -12264,8 +12297,6 @@
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
- vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
-
kvm_async_pf_hash_reset(vcpu);
vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;
@@ -12431,6 +12462,8 @@
if (!init_event) {
vcpu->arch.smbase = 0x30000;
+ vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
+
vcpu->arch.msr_misc_features_enables = 0;
vcpu->arch.ia32_misc_enable_msr = MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
@@ -12516,7 +12549,17 @@
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
-int kvm_arch_hardware_enable(void)
+void kvm_arch_enable_virtualization(void)
+{
+ cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
+}
+
+void kvm_arch_disable_virtualization(void)
+{
+ cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
+}
+
+int kvm_arch_enable_virtualization_cpu(void)
{
struct kvm *kvm;
struct kvm_vcpu *vcpu;
@@ -12532,7 +12575,7 @@
if (ret)
return ret;
- ret = kvm_x86_call(hardware_enable)();
+ ret = kvm_x86_call(enable_virtualization_cpu)();
if (ret != 0)
return ret;
@@ -12612,9 +12655,9 @@
return 0;
}
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
{
- kvm_x86_call(hardware_disable)();
+ kvm_x86_call(disable_virtualization_cpu)();
drop_user_return_notifiers();
}
@@ -13162,87 +13205,6 @@
kvm_arch_free_memslot(kvm, old);
}
-static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
-{
- if (!list_empty_careful(&vcpu->async_pf.done))
- return true;
-
- if (kvm_apic_has_pending_init_or_sipi(vcpu) &&
- kvm_apic_init_sipi_allowed(vcpu))
- return true;
-
- if (vcpu->arch.pv.pv_unhalted)
- return true;
-
- if (kvm_is_exception_pending(vcpu))
- return true;
-
- if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
- (vcpu->arch.nmi_pending &&
- kvm_x86_call(nmi_allowed)(vcpu, false)))
- return true;
-
-#ifdef CONFIG_KVM_SMM
- if (kvm_test_request(KVM_REQ_SMI, vcpu) ||
- (vcpu->arch.smi_pending &&
- kvm_x86_call(smi_allowed)(vcpu, false)))
- return true;
-#endif
-
- if (kvm_test_request(KVM_REQ_PMI, vcpu))
- return true;
-
- if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
- return true;
-
- if (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_has_interrupt(vcpu))
- return true;
-
- if (kvm_hv_has_stimer_pending(vcpu))
- return true;
-
- if (is_guest_mode(vcpu) &&
- kvm_x86_ops.nested_ops->has_events &&
- kvm_x86_ops.nested_ops->has_events(vcpu, false))
- return true;
-
- if (kvm_xen_has_pending_events(vcpu))
- return true;
-
- return false;
-}
-
-int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
-{
- return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
-}
-
-bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
-{
- return kvm_vcpu_apicv_active(vcpu) &&
- kvm_x86_call(dy_apicv_has_pending_interrupt)(vcpu);
-}
-
-bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.preempted_in_kernel;
-}
-
-bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
-{
- if (READ_ONCE(vcpu->arch.pv.pv_unhalted))
- return true;
-
- if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
-#ifdef CONFIG_KVM_SMM
- kvm_test_request(KVM_REQ_SMI, vcpu) ||
-#endif
- kvm_test_request(KVM_REQ_EVENT, vcpu))
- return true;
-
- return kvm_arch_dy_has_pending_interrupt(vcpu);
-}
-
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.guest_state_protected)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 50596f6..a84c48e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -103,11 +103,18 @@
return max(val, min);
}
-#define MSR_IA32_CR_PAT_DEFAULT 0x0007040600070406ULL
+#define MSR_IA32_CR_PAT_DEFAULT \
+ PAT_VALUE(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC)
void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
int kvm_check_nested_events(struct kvm_vcpu *vcpu);
+/* Forcibly leave the nested mode in cases like a vCPU reset */
+static inline void kvm_leave_nested(struct kvm_vcpu *vcpu)
+{
+ kvm_x86_ops.nested_ops->leave_nested(vcpu);
+}
+
static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
{
return vcpu->arch.last_vmentry_cpu != -1;
@@ -334,6 +341,7 @@
int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
extern struct kvm_caps kvm_caps;
extern struct kvm_host_values kvm_host;
@@ -504,13 +512,26 @@
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
+enum kvm_msr_access {
+ MSR_TYPE_R = BIT(0),
+ MSR_TYPE_W = BIT(1),
+ MSR_TYPE_RW = MSR_TYPE_R | MSR_TYPE_W,
+};
+
/*
* Internal error codes that are used to indicate that MSR emulation encountered
- * an error that should result in #GP in the guest, unless userspace
- * handles it.
+ * an error that should result in #GP in the guest, unless userspace handles it.
+ * Note, '1', '0', and negative numbers are off limits, as they are used by KVM
+ * as part of KVM's lightly documented internal KVM_RUN return codes.
+ *
+ * UNSUPPORTED - The MSR isn't supported, either because it is completely
+ * unknown to KVM, or because the MSR should not exist according
+ * to the vCPU model.
+ *
+ * FILTERED - Access to the MSR is denied by a userspace MSR filter.
*/
-#define KVM_MSR_RET_INVALID 2 /* in-kernel MSR emulation #GP condition */
-#define KVM_MSR_RET_FILTERED 3 /* #GP due to userspace MSR filter */
+#define KVM_MSR_RET_UNSUPPORTED 2
+#define KVM_MSR_RET_FILTERED 3
#define __cr4_reserved_bits(__cpu_has, __c) \
({ \
diff --git a/arch/x86/lib/atomic64_cx8_32.S b/arch/x86/lib/atomic64_cx8_32.S
index 90afb48..b2eff07 100644
--- a/arch/x86/lib/atomic64_cx8_32.S
+++ b/arch/x86/lib/atomic64_cx8_32.S
@@ -16,6 +16,11 @@
cmpxchg8b (\reg)
.endm
+.macro read64_nonatomic reg
+ movl (\reg), %eax
+ movl 4(\reg), %edx
+.endm
+
SYM_FUNC_START(atomic64_read_cx8)
read64 %ecx
RET
@@ -51,7 +56,7 @@
movl %edx, %edi
movl %ecx, %ebp
- read64 %ecx
+ read64_nonatomic %ecx
1:
movl %eax, %ebx
movl %edx, %ecx
@@ -79,7 +84,7 @@
SYM_FUNC_START(atomic64_\func\()_return_cx8)
pushl %ebx
- read64 %esi
+ read64_nonatomic %esi
1:
movl %eax, %ebx
movl %edx, %ecx
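
read64_nonatomic is tolerable in the add/sub return paths because the cmpxchg8b
loop that follows re-validates the full 64-bit value: a torn initial read merely
costs an extra iteration. The same pattern in portable C with the GCC/Clang
__atomic builtins (a sketch of the idea, not the kernel's implementation):

#include <stdint.h>

static uint64_t atomic64_add_return_sketch(uint64_t delta, uint64_t *v)
{
	uint64_t old = *v;	/* non-atomic snapshot, like read64_nonatomic */
	uint64_t new;

	do {
		new = old + delta;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!__atomic_compare_exchange_n(v, &old, new, 1,
					      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
	return new;
}
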
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index f73b5ce..feb8cc6 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -176,15 +176,6 @@
}
#endif
-enum {
- PAT_UC = 0, /* uncached */
- PAT_WC = 1, /* Write combining */
- PAT_WT = 4, /* Write Through */
- PAT_WP = 5, /* Write Protected */
- PAT_WB = 6, /* Write Back (default) */
- PAT_UC_MINUS = 7, /* UC, but can be overridden by MTRR */
-};
-
#define CM(c) (_PAGE_CACHE_MODE_ ## c)
static enum page_cache_mode __init pat_get_cache_mode(unsigned int pat_val,
@@ -194,13 +185,13 @@
char *cache_mode;
switch (pat_val) {
- case PAT_UC: cache = CM(UC); cache_mode = "UC "; break;
- case PAT_WC: cache = CM(WC); cache_mode = "WC "; break;
- case PAT_WT: cache = CM(WT); cache_mode = "WT "; break;
- case PAT_WP: cache = CM(WP); cache_mode = "WP "; break;
- case PAT_WB: cache = CM(WB); cache_mode = "WB "; break;
- case PAT_UC_MINUS: cache = CM(UC_MINUS); cache_mode = "UC- "; break;
- default: cache = CM(WB); cache_mode = "WB "; break;
+ case X86_MEMTYPE_UC: cache = CM(UC); cache_mode = "UC "; break;
+ case X86_MEMTYPE_WC: cache = CM(WC); cache_mode = "WC "; break;
+ case X86_MEMTYPE_WT: cache = CM(WT); cache_mode = "WT "; break;
+ case X86_MEMTYPE_WP: cache = CM(WP); cache_mode = "WP "; break;
+ case X86_MEMTYPE_WB: cache = CM(WB); cache_mode = "WB "; break;
+ case X86_MEMTYPE_UC_MINUS: cache = CM(UC_MINUS); cache_mode = "UC- "; break;
+ default: cache = CM(WB); cache_mode = "WB "; break;
}
memcpy(msg, cache_mode, 4);
@@ -257,12 +248,6 @@
void __init pat_bp_init(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
-#define PAT(p0, p1, p2, p3, p4, p5, p6, p7) \
- (((u64)PAT_ ## p0) | ((u64)PAT_ ## p1 << 8) | \
- ((u64)PAT_ ## p2 << 16) | ((u64)PAT_ ## p3 << 24) | \
- ((u64)PAT_ ## p4 << 32) | ((u64)PAT_ ## p5 << 40) | \
- ((u64)PAT_ ## p6 << 48) | ((u64)PAT_ ## p7 << 56))
-
if (!IS_ENABLED(CONFIG_X86_PAT))
pr_info_once("x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.\n");
@@ -293,7 +278,7 @@
* NOTE: When WC or WP is used, it is redirected to UC- per
* the default setup in __cachemode2pte_tbl[].
*/
- pat_msr_val = PAT(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC);
+ pat_msr_val = PAT_VALUE(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC);
}
/*
@@ -328,7 +313,7 @@
* NOTE: When WT or WP is used, it is redirected to UC- per
* the default setup in __cachemode2pte_tbl[].
*/
- pat_msr_val = PAT(WB, WC, UC_MINUS, UC, WB, WC, UC_MINUS, UC);
+ pat_msr_val = PAT_VALUE(WB, WC, UC_MINUS, UC, WB, WC, UC_MINUS, UC);
} else {
/*
* Full PAT support. We put WT in slot 7 to improve
@@ -356,13 +341,12 @@
* The reserved slots are unused, but mapped to their
* corresponding types in the presence of PAT errata.
*/
- pat_msr_val = PAT(WB, WC, UC_MINUS, UC, WB, WP, UC_MINUS, WT);
+ pat_msr_val = PAT_VALUE(WB, WC, UC_MINUS, UC, WB, WP, UC_MINUS, WT);
}
memory_caching_control |= CACHE_PAT;
init_cache_modes(pat_msr_val);
-#undef PAT
}
static DEFINE_SPINLOCK(memtype_lock); /* protects memtype accesses */
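The removed local PAT() macro is replaced by a shared PAT_VALUE() helper, which presumably packs eight X86_MEMTYPE_* values into the 64-bit IA32_PAT MSR, one byte per slot. A standalone illustration of that packing using the slot values from the enum removed above (UC=0, WC=1, WT=4, WP=5, WB=6, UC-=7); PACK_PAT() here is a local re-derivation, not the header's definition:

#include <stdint.h>
#include <stdio.h>

enum { UC = 0, WC = 1, WT = 4, WP = 5, WB = 6, UC_MINUS = 7 };

/* Each PAT slot is one byte of the IA32_PAT MSR: slot n lives at bits 8n..8n+2. */
#define PACK_PAT(p0, p1, p2, p3, p4, p5, p6, p7)			\
	(((uint64_t)(p0))       | ((uint64_t)(p1) <<  8) |		\
	 ((uint64_t)(p2) << 16) | ((uint64_t)(p3) << 24) |		\
	 ((uint64_t)(p4) << 32) | ((uint64_t)(p5) << 40) |		\
	 ((uint64_t)(p6) << 48) | ((uint64_t)(p7) << 56))

int main(void)
{
	/* Full-PAT layout chosen by pat_bp_init(): WB WC UC- UC WB WP UC- WT */
	uint64_t pat = PACK_PAT(WB, WC, UC_MINUS, UC, WB, WP, UC_MINUS, WT);

	printf("IA32_PAT = 0x%016llx\n", (unsigned long long)pat);
	return 0;
}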
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index f7235ef8..64fca49 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -7,6 +7,7 @@
.code32
.text
#define _pa(x) ((x) - __START_KERNEL_map)
+#define rva(x) ((x) - pvh_start_xen)
#include <linux/elfnote.h>
#include <linux/init.h>
@@ -15,6 +16,7 @@
#include <asm/segment.h>
#include <asm/asm.h>
#include <asm/boot.h>
+#include <asm/pgtable.h>
#include <asm/processor-flags.h>
#include <asm/msr.h>
#include <asm/nospec-branch.h>
@@ -54,7 +56,25 @@
UNWIND_HINT_END_OF_STACK
cld
- lgdt (_pa(gdt))
+ /*
+ * See the comment for startup_32 for more details. We need to
+ * execute a call to get the execution address to be position
+ * independent, but we don't have a stack. Save and restore the
+ * magic field of start_info in ebx, and use that as the stack.
+ */
+ mov (%ebx), %eax
+ leal 4(%ebx), %esp
+ ANNOTATE_INTRA_FUNCTION_CALL
+ call 1f
+1: popl %ebp
+ mov %eax, (%ebx)
+ subl $rva(1b), %ebp
+ movl $0, %esp
+
+ leal rva(gdt)(%ebp), %eax
+ leal rva(gdt_start)(%ebp), %ecx
+ movl %ecx, 2(%eax)
+ lgdt (%eax)
mov $PVH_DS_SEL,%eax
mov %eax,%ds
@@ -62,14 +82,14 @@
mov %eax,%ss
/* Stash hvm_start_info. */
- mov $_pa(pvh_start_info), %edi
+ leal rva(pvh_start_info)(%ebp), %edi
mov %ebx, %esi
- mov _pa(pvh_start_info_sz), %ecx
+ movl rva(pvh_start_info_sz)(%ebp), %ecx
shr $2,%ecx
rep
movsl
- mov $_pa(early_stack_end), %esp
+ leal rva(early_stack_end)(%ebp), %esp
/* Enable PAE mode. */
mov %cr4, %eax
@@ -83,31 +103,86 @@
btsl $_EFER_LME, %eax
wrmsr
+ mov %ebp, %ebx
+ subl $_pa(pvh_start_xen), %ebx /* offset */
+ jz .Lpagetable_done
+
+ /* Fixup page-tables for relocation. */
+ leal rva(pvh_init_top_pgt)(%ebp), %edi
+ movl $PTRS_PER_PGD, %ecx
+2:
+ testl $_PAGE_PRESENT, 0x00(%edi)
+ jz 1f
+ addl %ebx, 0x00(%edi)
+1:
+ addl $8, %edi
+ decl %ecx
+ jnz 2b
+
+ /* L3 ident has a single entry. */
+ leal rva(pvh_level3_ident_pgt)(%ebp), %edi
+ addl %ebx, 0x00(%edi)
+
+ leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
+ addl %ebx, (PAGE_SIZE - 16)(%edi)
+ addl %ebx, (PAGE_SIZE - 8)(%edi)
+
+ /* pvh_level2_ident_pgt is fine - large pages */
+
+ /* pvh_level2_kernel_pgt needs adjustment - large pages */
+ leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
+ movl $PTRS_PER_PMD, %ecx
+2:
+ testl $_PAGE_PRESENT, 0x00(%edi)
+ jz 1f
+ addl %ebx, 0x00(%edi)
+1:
+ addl $8, %edi
+ decl %ecx
+ jnz 2b
+
+.Lpagetable_done:
/* Enable pre-constructed page tables. */
- mov $_pa(init_top_pgt), %eax
+ leal rva(pvh_init_top_pgt)(%ebp), %eax
mov %eax, %cr3
mov $(X86_CR0_PG | X86_CR0_PE), %eax
mov %eax, %cr0
/* Jump to 64-bit mode. */
- ljmp $PVH_CS_SEL, $_pa(1f)
+ pushl $PVH_CS_SEL
+ leal rva(1f)(%ebp), %eax
+ pushl %eax
+ lretl
/* 64-bit entry point. */
.code64
1:
+ UNWIND_HINT_END_OF_STACK
+
/* Set base address in stack canary descriptor. */
mov $MSR_GS_BASE,%ecx
- mov $_pa(canary), %eax
+ leal canary(%rip), %eax
xor %edx, %edx
wrmsr
+ /*
+ * Calculate load offset and store in phys_base. __pa() needs
+ * phys_base set to calculate the hypercall page in xen_pvh_init().
+ */
+ movq %rbp, %rbx
+ subq $_pa(pvh_start_xen), %rbx
+ movq %rbx, phys_base(%rip)
call xen_prepare_pvh
+ /*
+ * Clear phys_base. __startup_64 will *add* to its value,
+ * so reset to 0.
+ */
+ xor %rbx, %rbx
+ movq %rbx, phys_base(%rip)
/* startup_64 expects boot_params in %rsi. */
- mov $_pa(pvh_bootparams), %rsi
- mov $_pa(startup_64), %rax
- ANNOTATE_RETPOLINE_SAFE
- jmp *%rax
+ lea pvh_bootparams(%rip), %rsi
+ jmp startup_64
#else /* CONFIG_X86_64 */
@@ -143,7 +218,7 @@
.balign 8
SYM_DATA_START_LOCAL(gdt)
.word gdt_end - gdt_start
- .long _pa(gdt_start)
+ .long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
.word 0
SYM_DATA_END(gdt)
SYM_DATA_START_LOCAL(gdt_start)
@@ -163,5 +238,67 @@
.fill BOOT_STACK_SIZE, 1, 0
SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
+#ifdef CONFIG_X86_64
+/*
+ * Xen PVH needs a set of identity mapped and kernel high mapping
+ * page tables. pvh_start_xen starts running on the identity mapped
+ * page tables, but xen_prepare_pvh calls into the high mapping.
+ * These page tables need to be relocatable and are only used until
+ * startup_64 transitions to init_top_pgt.
+ */
+SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
+ .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .org pvh_init_top_pgt + L4_PAGE_OFFSET * 8, 0
+ .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .org pvh_init_top_pgt + L4_START_KERNEL * 8, 0
+ /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+ .quad pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+SYM_DATA_END(pvh_init_top_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
+ .quad pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .fill 511, 8, 0
+SYM_DATA_END(pvh_level3_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
+ /*
+ * Since I easily can, map the first 1G.
+ * Don't set NX because code runs from these pages.
+ *
+	 * Note: This sets _PAGE_GLOBAL regardless of whether
+	 * the CPU supports it or has it enabled; the CPU
+	 * should simply ignore the bit in that case.
+ */
+ PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+SYM_DATA_END(pvh_level2_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt)
+ .fill L3_START_KERNEL, 8, 0
+ /* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
+ .quad pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .quad 0 /* no fixmap */
+SYM_DATA_END(pvh_level3_kernel_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt)
+ /*
+ * Kernel high mapping.
+ *
+ * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
+ * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
+ * 512 MiB otherwise.
+ *
+ * (NOTE: after that starts the module area, see MODULES_VADDR.)
+ *
+ * This table is eventually used by the kernel during normal runtime.
+ * Care must be taken to clear out undesired bits later, like _PAGE_RW
+ * or _PAGE_GLOBAL in some cases.
+ */
+ PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE / PMD_SIZE)
+SYM_DATA_END(pvh_level2_kernel_pgt)
+
+ ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_RELOC,
+ .long CONFIG_PHYSICAL_ALIGN;
+ .long LOAD_PHYSICAL_ADDR;
+ .long KERNEL_IMAGE_SIZE - 1)
+#endif
+
ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
_ASM_PTR (pvh_start_xen - __START_KERNEL_map))
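The added 32-bit code computes the runtime load offset (%ebp minus the linked address of pvh_start_xen) and adds it to every present entry of the PVH page tables, so the physical addresses they reference stay valid when the kernel is loaded at a relocated address. A hedged C restatement of that fixup loop, purely for illustration; the names and constants mirror the assembly:

#include <stdint.h>
#include <stddef.h>

#define PTRS_PER_PGD	512
#define PAGE_PRESENT	0x1ULL	/* _PAGE_PRESENT */

/* Equivalent of the "Fixup page-tables for relocation" loops in pvh_start_xen:
 * every present entry holds a physical address linked for a fixed load
 * address, so the runtime load offset must be added to it.
 */
static void relocate_pgt(uint64_t *pgt, size_t nr_entries, uint64_t offset)
{
	for (size_t i = 0; i < nr_entries; i++) {
		if (pgt[i] & PAGE_PRESENT)
			pgt[i] += offset;	/* shift the referenced physical address */
	}
}

/* e.g. relocate_pgt(pvh_init_top_pgt, PTRS_PER_PGD, offset); */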
diff --git a/arch/x86/um/sysrq_32.c b/arch/x86/um/sysrq_32.c
index f238348..a1ee415 100644
--- a/arch/x86/um/sysrq_32.c
+++ b/arch/x86/um/sysrq_32.c
@@ -9,7 +9,6 @@
#include <linux/sched/debug.h>
#include <linux/kallsyms.h>
#include <asm/ptrace.h>
-#include <asm/sysrq.h>
/* This is declared by <linux/sched.h> */
void show_regs(struct pt_regs *regs)
diff --git a/arch/x86/um/sysrq_64.c b/arch/x86/um/sysrq_64.c
index 0bf6de4..340d8a2 100644
--- a/arch/x86/um/sysrq_64.c
+++ b/arch/x86/um/sysrq_64.c
@@ -12,7 +12,6 @@
#include <linux/utsname.h>
#include <asm/current.h>
#include <asm/ptrace.h>
-#include <asm/sysrq.h>
void show_regs(struct pt_regs *regs)
{
diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 728a436..bf68c32 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
#include <linux/mm.h>
#include <xen/hvc-console.h>
+#include <xen/acpi.h>
#include <asm/bootparam.h>
#include <asm/io_apic.h>
@@ -28,6 +29,28 @@
bool __ro_after_init xen_pvh;
EXPORT_SYMBOL_GPL(xen_pvh);
+#ifdef CONFIG_XEN_DOM0
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+ int ret;
+ struct physdev_setup_gsi setup_gsi;
+
+ setup_gsi.gsi = gsi;
+ setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+ setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+ ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+ if (ret == -EEXIST) {
+ xen_raw_printk("Already setup the GSI :%d\n", gsi);
+ ret = 0;
+ } else if (ret)
+ xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+#endif
+
/*
* Reserve e820 UNUSABLE regions to inflate the memory balloon.
*
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index e3a7c2a..d67f63d 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -451,7 +451,7 @@
config ACPI_BGRT
bool "Boottime Graphics Resource Table support"
- depends on EFI && (X86 || ARM64)
+ depends on EFI && (X86 || ARM64 || LOONGARCH)
help
This driver adds support for exposing the ACPI Boottime Graphics
Resource Table, which allows the operating system to obtain
diff --git a/drivers/acpi/apei/einj-cxl.c b/drivers/acpi/apei/einj-cxl.c
index 8b8be0c..4f81a11 100644
--- a/drivers/acpi/apei/einj-cxl.c
+++ b/drivers/acpi/apei/einj-cxl.c
@@ -7,9 +7,9 @@
*
* Author: Ben Cheatham <benjamin.cheatham@amd.com>
*/
-#include <linux/einj-cxl.h>
#include <linux/seq_file.h>
#include <linux/pci.h>
+#include <cxl/einj.h>
#include "apei-internal.h"
diff --git a/drivers/acpi/apei/erst-dbg.c b/drivers/acpi/apei/erst-dbg.c
index 8bc71cd..2460763 100644
--- a/drivers/acpi/apei/erst-dbg.c
+++ b/drivers/acpi/apei/erst-dbg.c
@@ -199,7 +199,6 @@
.read = erst_dbg_read,
.write = erst_dbg_write,
.unlocked_ioctl = erst_dbg_ioctl,
- .llseek = no_llseek,
};
static struct miscdevice erst_dbg_dev = {
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 623cc0c..ada93cf 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -27,7 +27,6 @@
#include <linux/timer.h>
#include <linux/cper.h>
#include <linux/cleanup.h>
-#include <linux/cxl-event.h>
#include <linux/platform_device.h>
#include <linux/mutex.h>
#include <linux/ratelimit.h>
@@ -50,6 +49,7 @@
#include <acpi/apei.h>
#include <asm/fixmap.h>
#include <asm/tlbflush.h>
+#include <cxl/event.h>
#include <ras/ras_event.h>
#include "apei-internal.h"
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30cec..630fe0a 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@
}
#endif /* CONFIG_X86_IO_APIC */
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
{
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3328a6f..a4aedf7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2256,10 +2256,15 @@
static unsigned int ata_msense_control_spgt2(struct ata_device *dev, u8 *buf,
u8 spg)
{
- u8 *b, *cdl = dev->cdl->desc_log_buf, *desc;
+ u8 *b, *cdl, *desc;
u32 policy;
int i;
+ if (!(dev->flags & ATA_DFLAG_CDL) || !dev->cdl)
+ return 0;
+
+ cdl = dev->cdl->desc_log_buf;
+
/*
* Fill the subpage. The first four bytes of the T2A/T2B mode pages
* are a header. The PAGE LENGTH field is the size of the page
@@ -2356,7 +2361,7 @@
case ALL_SUB_MPAGES:
n = ata_msense_control_spg0(dev, buf, changeable);
n += ata_msense_control_spgt2(dev, buf + n, CDL_T2A_SUB_MPAGE);
- n += ata_msense_control_spgt2(dev, buf + n, CDL_T2A_SUB_MPAGE);
+ n += ata_msense_control_spgt2(dev, buf + n, CDL_T2B_SUB_MPAGE);
n += ata_msense_control_ata_feature(dev, buf + n);
return n;
default:
diff --git a/drivers/auxdisplay/charlcd.c b/drivers/auxdisplay/charlcd.c
index bb94638..19b6193 100644
--- a/drivers/auxdisplay/charlcd.c
+++ b/drivers/auxdisplay/charlcd.c
@@ -526,7 +526,6 @@
.write = charlcd_write,
.open = charlcd_open,
.release = charlcd_release,
- .llseek = no_llseek,
};
static struct miscdevice charlcd_dev = {
diff --git a/drivers/base/attribute_container.c b/drivers/base/attribute_container.c
index 01ef796..b6f941a 100644
--- a/drivers/base/attribute_container.c
+++ b/drivers/base/attribute_container.c
@@ -346,8 +346,7 @@
* @fn: the function to execute for each classdev.
*
* This function is for executing a trigger when you need to know both
- * the container and the classdev. If you only care about the
- * container, then use attribute_container_trigger() instead.
+ * the container and the classdev.
*/
void
attribute_container_device_trigger(struct device *dev,
@@ -379,33 +378,6 @@
}
/**
- * attribute_container_trigger - trigger a function for each matching container
- *
- * @dev: The generic device to activate the trigger for
- * @fn: the function to trigger
- *
- * This routine triggers a function that only needs to know the
- * matching containers (not the classdev) associated with a device.
- * It is more lightweight than attribute_container_device_trigger, so
- * should be used in preference unless the triggering function
- * actually needs to know the classdev.
- */
-void
-attribute_container_trigger(struct device *dev,
- int (*fn)(struct attribute_container *,
- struct device *))
-{
- struct attribute_container *cont;
-
- mutex_lock(&attribute_container_mutex);
- list_for_each_entry(cont, &attribute_container_list, node) {
- if (cont->match(cont, dev))
- fn(cont, dev);
- }
- mutex_unlock(&attribute_container_mutex);
-}
-
-/**
* attribute_container_add_attrs - add attributes
*
* @classdev: The class device
@@ -459,24 +431,6 @@
}
/**
- * attribute_container_add_class_device_adapter - simple adapter for triggers
- *
- * @cont: the container to register.
- * @dev: the generic device to activate the trigger for
- * @classdev: the class device to add
- *
- * This function is identical to attribute_container_add_class_device except
- * that it is designed to be called from the triggers
- */
-int
-attribute_container_add_class_device_adapter(struct attribute_container *cont,
- struct device *dev,
- struct device *classdev)
-{
- return attribute_container_add_class_device(classdev);
-}
-
-/**
* attribute_container_remove_attrs - remove any attribute files
*
* @classdev: The class device to remove the files from
diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c
index 54b9283..7823888 100644
--- a/drivers/base/auxiliary.c
+++ b/drivers/base/auxiliary.c
@@ -352,7 +352,7 @@
*/
struct auxiliary_device *auxiliary_find_device(struct device *start,
const void *data,
- int (*match)(struct device *dev, const void *data))
+ device_match_t match)
{
struct device *dev;
diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0b53593..8cf04a5 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -145,7 +145,7 @@
static inline void auxiliary_bus_init(void) { }
#endif
-struct kobject *virtual_device_parent(struct device *dev);
+struct kobject *virtual_device_parent(void);
int bus_add_device(struct device *dev);
void bus_probe_device(struct device *dev);
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index ffea072..657c93c 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -152,7 +152,8 @@
{
struct bus_attribute *bus_attr = to_bus_attr(attr);
struct subsys_private *subsys_priv = to_subsys_private(kobj);
- ssize_t ret = 0;
+ /* return -EIO for reading a bus attribute without show() */
+ ssize_t ret = -EIO;
if (bus_attr->show)
ret = bus_attr->show(subsys_priv->bus, buf);
@@ -164,7 +165,8 @@
{
struct bus_attribute *bus_attr = to_bus_attr(attr);
struct subsys_private *subsys_priv = to_subsys_private(kobj);
- ssize_t ret = 0;
+ /* return -EIO for writing a bus attribute without store() */
+ ssize_t ret = -EIO;
if (bus_attr->store)
ret = bus_attr->store(subsys_priv->bus, buf, count);
@@ -389,7 +391,7 @@
*/
struct device *bus_find_device(const struct bus_type *bus,
struct device *start, const void *data,
- int (*match)(struct device *dev, const void *data))
+ device_match_t match)
{
struct subsys_private *sp = bus_to_subsys(bus);
struct klist_iter i;
@@ -920,6 +922,8 @@
bus_remove_file(bus, &bus_attr_uevent);
bus_uevent_fail:
kset_unregister(&priv->subsys);
+ /* Above kset_unregister() will kfree @priv */
+ priv = NULL;
out:
kfree(priv);
return retval;
@@ -1294,7 +1298,7 @@
{
struct kobject *virtual_dir;
- virtual_dir = virtual_device_parent(NULL);
+ virtual_dir = virtual_device_parent();
if (!virtual_dir)
return -ENOMEM;
@@ -1385,8 +1389,13 @@
return -ENOMEM;
system_kset = kset_create_and_add("system", NULL, &devices_kset->kobj);
- if (!system_kset)
+ if (!system_kset) {
+		/* Do error handling here as devices_init() does */
+ kset_unregister(bus_kset);
+ bus_kset = NULL;
+ pr_err("%s: failed to create and add kset 'bus'\n", __func__);
return -ENOMEM;
+ }
return 0;
}
diff --git a/drivers/base/class.c b/drivers/base/class.c
index 7b38fdf..cb53592 100644
--- a/drivers/base/class.c
+++ b/drivers/base/class.c
@@ -183,6 +183,17 @@
pr_debug("device class '%s': registering\n", cls->name);
+ if (cls->ns_type && !cls->namespace) {
+ pr_err("%s: class '%s' does not have namespace\n",
+ __func__, cls->name);
+ return -EINVAL;
+ }
+ if (!cls->ns_type && cls->namespace) {
+ pr_err("%s: class '%s' does not have ns_type\n",
+ __func__, cls->name);
+ return -EINVAL;
+ }
+
cp = kzalloc(sizeof(*cp), GFP_KERNEL);
if (!cp)
return -ENOMEM;
@@ -433,8 +444,7 @@
* code. There's no locking restriction.
*/
struct device *class_find_device(const struct class *class, const struct device *start,
- const void *data,
- int (*match)(struct device *, const void *))
+ const void *data, device_match_t match)
{
struct subsys_private *sp = class_to_subsys(class);
struct class_dev_iter iter;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 8c0733d..a4c8534 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -9,29 +9,30 @@
*/
#include <linux/acpi.h>
+#include <linux/blkdev.h>
+#include <linux/cleanup.h>
#include <linux/cpufreq.h>
#include <linux/device.h>
+#include <linux/dma-map-ops.h> /* for dma_default_coherent */
#include <linux/err.h>
#include <linux/fwnode.h>
#include <linux/init.h>
+#include <linux/kdev_t.h>
#include <linux/kstrtox.h>
#include <linux/module.h>
-#include <linux/slab.h>
-#include <linux/kdev_t.h>
+#include <linux/mutex.h>
+#include <linux/netdevice.h>
#include <linux/notifier.h>
#include <linux/of.h>
#include <linux/of_device.h>
-#include <linux/blkdev.h>
-#include <linux/mutex.h>
#include <linux/pm_runtime.h>
-#include <linux/netdevice.h>
#include <linux/rcupdate.h>
-#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
#include <linux/string_helpers.h>
#include <linux/swiotlb.h>
#include <linux/sysfs.h>
-#include <linux/dma-map-ops.h> /* for dma_default_coherent */
#include "base.h"
#include "physical_location.h"
@@ -97,12 +98,9 @@
int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup,
u8 flags)
{
- int ret;
+ guard(mutex)(&fwnode_link_lock);
- mutex_lock(&fwnode_link_lock);
- ret = __fwnode_link_add(con, sup, flags);
- mutex_unlock(&fwnode_link_lock);
- return ret;
+ return __fwnode_link_add(con, sup, flags);
}
/**
@@ -143,10 +141,10 @@
{
struct fwnode_link *link, *tmp;
- mutex_lock(&fwnode_link_lock);
+ guard(mutex)(&fwnode_link_lock);
+
list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook)
__fwnode_link_del(link);
- mutex_unlock(&fwnode_link_lock);
}
/**
@@ -159,10 +157,10 @@
{
struct fwnode_link *link, *tmp;
- mutex_lock(&fwnode_link_lock);
+ guard(mutex)(&fwnode_link_lock);
+
list_for_each_entry_safe(link, tmp, &fwnode->consumers, s_hook)
__fwnode_link_del(link);
- mutex_unlock(&fwnode_link_lock);
}
/**
@@ -563,20 +561,11 @@
static int devlink_add_symlinks(struct device *dev)
{
+ char *buf_con __free(kfree) = NULL, *buf_sup __free(kfree) = NULL;
int ret;
- size_t len;
struct device_link *link = to_devlink(dev);
struct device *sup = link->supplier;
struct device *con = link->consumer;
- char *buf;
-
- len = max(strlen(dev_bus_name(sup)) + strlen(dev_name(sup)),
- strlen(dev_bus_name(con)) + strlen(dev_name(con)));
- len += strlen(":");
- len += strlen("supplier:") + 1;
- buf = kzalloc(len, GFP_KERNEL);
- if (!buf)
- return -ENOMEM;
ret = sysfs_create_link(&link->link_dev.kobj, &sup->kobj, "supplier");
if (ret)
@@ -586,58 +575,64 @@
if (ret)
goto err_con;
- snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
- ret = sysfs_create_link(&sup->kobj, &link->link_dev.kobj, buf);
+ buf_con = kasprintf(GFP_KERNEL, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
+ if (!buf_con) {
+ ret = -ENOMEM;
+ goto err_con_dev;
+ }
+
+ ret = sysfs_create_link(&sup->kobj, &link->link_dev.kobj, buf_con);
if (ret)
goto err_con_dev;
- snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup));
- ret = sysfs_create_link(&con->kobj, &link->link_dev.kobj, buf);
+ buf_sup = kasprintf(GFP_KERNEL, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup));
+ if (!buf_sup) {
+ ret = -ENOMEM;
+ goto err_sup_dev;
+ }
+
+ ret = sysfs_create_link(&con->kobj, &link->link_dev.kobj, buf_sup);
if (ret)
goto err_sup_dev;
goto out;
err_sup_dev:
- snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
- sysfs_remove_link(&sup->kobj, buf);
+ sysfs_remove_link(&sup->kobj, buf_con);
err_con_dev:
sysfs_remove_link(&link->link_dev.kobj, "consumer");
err_con:
sysfs_remove_link(&link->link_dev.kobj, "supplier");
out:
- kfree(buf);
return ret;
}
static void devlink_remove_symlinks(struct device *dev)
{
+ char *buf_con __free(kfree) = NULL, *buf_sup __free(kfree) = NULL;
struct device_link *link = to_devlink(dev);
- size_t len;
struct device *sup = link->supplier;
struct device *con = link->consumer;
- char *buf;
sysfs_remove_link(&link->link_dev.kobj, "consumer");
sysfs_remove_link(&link->link_dev.kobj, "supplier");
- len = max(strlen(dev_bus_name(sup)) + strlen(dev_name(sup)),
- strlen(dev_bus_name(con)) + strlen(dev_name(con)));
- len += strlen(":");
- len += strlen("supplier:") + 1;
- buf = kzalloc(len, GFP_KERNEL);
- if (!buf) {
- WARN(1, "Unable to properly free device link symlinks!\n");
- return;
+ if (device_is_registered(con)) {
+ buf_sup = kasprintf(GFP_KERNEL, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup));
+ if (!buf_sup)
+ goto out;
+ sysfs_remove_link(&con->kobj, buf_sup);
}
- if (device_is_registered(con)) {
- snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup));
- sysfs_remove_link(&con->kobj, buf);
- }
- snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
- sysfs_remove_link(&sup->kobj, buf);
- kfree(buf);
+ buf_con = kasprintf(GFP_KERNEL, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
+ if (!buf_con)
+ goto out;
+ sysfs_remove_link(&sup->kobj, buf_con);
+
+ return;
+
+out:
+ WARN(1, "Unable to properly free device link symlinks!\n");
}
static struct class_interface devlink_class_intf = {
@@ -678,6 +673,9 @@
* @supplier: Supplier end of the link.
* @flags: Link flags.
*
+ * Return: On success, a device_link struct will be returned.
+ * On error or invalid flag settings, NULL will be returned.
+ *
* The caller is responsible for the proper synchronization of the link creation
* with runtime PM. First, setting the DL_FLAG_PM_RUNTIME flag will cause the
* runtime PM framework to take the link into account. Second, if the
@@ -1061,20 +1059,16 @@
* Device waiting for supplier to become available is not allowed to
* probe.
*/
- mutex_lock(&fwnode_link_lock);
- sup_fw = fwnode_links_check_suppliers(dev->fwnode);
- if (sup_fw) {
- if (!dev_is_best_effort(dev)) {
- fwnode_ret = -EPROBE_DEFER;
- dev_err_probe(dev, -EPROBE_DEFER,
- "wait for supplier %pfwf\n", sup_fw);
- } else {
- fwnode_ret = -EAGAIN;
+ scoped_guard(mutex, &fwnode_link_lock) {
+ sup_fw = fwnode_links_check_suppliers(dev->fwnode);
+ if (sup_fw) {
+ if (dev_is_best_effort(dev))
+ fwnode_ret = -EAGAIN;
+ else
+ return dev_err_probe(dev, -EPROBE_DEFER,
+ "wait for supplier %pfwf\n", sup_fw);
}
}
- mutex_unlock(&fwnode_link_lock);
- if (fwnode_ret == -EPROBE_DEFER)
- return fwnode_ret;
device_links_write_lock();
@@ -1093,10 +1087,8 @@
}
device_links_missing_supplier(dev);
- dev_err_probe(dev, -EPROBE_DEFER,
- "supplier %s not ready\n",
- dev_name(link->supplier));
- ret = -EPROBE_DEFER;
+ ret = dev_err_probe(dev, -EPROBE_DEFER,
+ "supplier %s not ready\n", dev_name(link->supplier));
break;
}
WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
@@ -1249,9 +1241,8 @@
bool val;
device_lock(dev);
- mutex_lock(&fwnode_link_lock);
- val = !!fwnode_links_check_suppliers(dev->fwnode);
- mutex_unlock(&fwnode_link_lock);
+ scoped_guard(mutex, &fwnode_link_lock)
+ val = !!fwnode_links_check_suppliers(dev->fwnode);
device_unlock(dev);
return sysfs_emit(buf, "%u\n", val);
}
@@ -1324,13 +1315,15 @@
*/
if (dev->fwnode && dev->fwnode->dev == dev) {
struct fwnode_handle *child;
+
fwnode_links_purge_suppliers(dev->fwnode);
- mutex_lock(&fwnode_link_lock);
+
+ guard(mutex)(&fwnode_link_lock);
+
fwnode_for_each_available_child_node(dev->fwnode, child)
__fw_devlink_pickup_dangling_consumers(child,
dev->fwnode);
__fw_devlink_link_to_consumers(dev);
- mutex_unlock(&fwnode_link_lock);
}
device_remove_file(dev, &dev_attr_waiting_for_supplier);
@@ -2339,10 +2332,10 @@
fw_devlink_parse_fwtree(fwnode);
- mutex_lock(&fwnode_link_lock);
+ guard(mutex)(&fwnode_link_lock);
+
__fw_devlink_link_to_consumers(dev);
__fw_devlink_link_to_suppliers(dev, fwnode);
- mutex_unlock(&fwnode_link_lock);
}
/* Device links support end. */
@@ -2591,7 +2584,7 @@
const struct device *dev = kobj_to_dev(kobj);
const void *ns = NULL;
- if (dev->class && dev->class->ns_type)
+ if (dev->class && dev->class->namespace)
ns = dev->class->namespace(dev);
return ns;
@@ -3170,7 +3163,7 @@
}
EXPORT_SYMBOL_GPL(device_initialize);
-struct kobject *virtual_device_parent(struct device *dev)
+struct kobject *virtual_device_parent(void)
{
static struct kobject *virtual_dir = NULL;
@@ -3248,7 +3241,7 @@
* in a "glue" directory to prevent namespace collisions.
*/
if (parent == NULL)
- parent_kobj = virtual_device_parent(dev);
+ parent_kobj = virtual_device_parent();
else if (parent->class && !dev->class->ns_type) {
subsys_put(sp);
return &parent->kobj;
@@ -4003,7 +3996,7 @@
struct device *child;
int error = 0;
- if (!parent->p)
+ if (!parent || !parent->p)
return 0;
klist_iter_init(&parent->p->klist_children, &i);
@@ -4033,7 +4026,7 @@
struct device *child;
int error = 0;
- if (!parent->p)
+ if (!parent || !parent->p)
return 0;
klist_iter_init(&parent->p->klist_children, &i);
@@ -4067,7 +4060,7 @@
struct klist_iter i;
struct device *child;
- if (!parent)
+ if (!parent || !parent->p)
return NULL;
klist_iter_init(&parent->p->klist_children, &i);
@@ -4515,9 +4508,11 @@
*/
int device_rename(struct device *dev, const char *new_name)
{
+ struct subsys_private *sp = NULL;
struct kobject *kobj = &dev->kobj;
char *old_device_name = NULL;
int error;
+ bool is_link_renamed = false;
dev = get_device(dev);
if (!dev)
@@ -4532,7 +4527,7 @@
}
if (dev->class) {
- struct subsys_private *sp = class_to_subsys(dev->class);
+ sp = class_to_subsys(dev->class);
if (!sp) {
error = -EINVAL;
@@ -4541,16 +4536,19 @@
error = sysfs_rename_link_ns(&sp->subsys.kobj, kobj, old_device_name,
new_name, kobject_namespace(kobj));
- subsys_put(sp);
if (error)
goto out;
+
+ is_link_renamed = true;
}
error = kobject_rename(kobj, new_name);
- if (error)
- goto out;
-
out:
+ if (error && is_link_renamed)
+ sysfs_rename_link_ns(&sp->subsys.kobj, kobj, new_name,
+ old_device_name, kobject_namespace(kobj));
+ subsys_put(sp);
+
put_device(dev);
kfree(old_device_name);
@@ -4872,7 +4870,7 @@
else
return;
- strscpy(dev_info->subsystem, subsys, sizeof(dev_info->subsystem));
+ strscpy(dev_info->subsystem, subsys);
/*
* Add device identifier DEVICE=:
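Several of the drivers/base/core.c hunks above convert open-coded mutex_lock()/mutex_unlock() and kzalloc()/kfree() pairs to the scope-based helpers from <linux/cleanup.h>: guard(mutex) drops the lock when the variable leaves scope, and a pointer declared with __free(kfree) is freed on every return path. A compact sketch of the idiom; the function, lock, and string below are placeholders, not code from the patch:

#include <linux/cleanup.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/string.h>

static DEFINE_MUTEX(example_lock);

/* Placeholder showing the guard()/__free() pattern used by the devlink
 * symlink and fwnode_link changes above.
 */
static int example_locked_alloc(const char *name)
{
	char *buf __free(kfree) = NULL;		/* kfree'd automatically on return */

	guard(mutex)(&example_lock);		/* unlocked when the scope ends */

	buf = kasprintf(GFP_KERNEL, "consumer:%s", name);
	if (!buf)
		return -ENOMEM;			/* no explicit unlock or kfree needed */

	/* ... use buf under the lock ... */
	return 0;
}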
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9641113..f0e4b4ab 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -248,7 +248,7 @@
list_for_each_entry(curr, &deferred_probe_pending_list, deferred_probe)
seq_printf(s, "%s\t%s", dev_name(curr->device),
- curr->device->p->deferred_probe_reason ?: "\n");
+ curr->deferred_probe_reason ?: "\n");
mutex_unlock(&deferred_probe_mutex);
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index a2ce0ea..2152eec 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -1231,6 +1231,6 @@
* devm_free_pages() does.
*/
WARN_ON(devres_release(dev, devm_percpu_release, devm_percpu_match,
- (__force void *)pdata));
+ (void *)(__force unsigned long)pdata));
}
EXPORT_SYMBOL_GPL(devm_free_percpu);
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 88c6fd1..b4eb5b8 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -150,7 +150,7 @@
*/
struct device *driver_find_device(const struct device_driver *drv,
struct device *start, const void *data,
- int (*match)(struct device *dev, const void *data))
+ device_match_t match)
{
struct klist_iter i;
struct device *dev;
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index a03ee4b..324a9a3 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -849,6 +849,26 @@
{}
#endif
+/*
+ * Reject firmware file names with ".." path components.
+ * There are drivers that construct firmware file names from device-supplied
+ * strings, and we don't want some device to be able to tell us "I would like to
+ * be sent my firmware from ../../../etc/shadow, please".
+ *
+ * Search for ".." surrounded by either '/' or start/end of string.
+ *
+ * This intentionally only looks at the firmware name, not at the firmware base
+ * directory or at symlink contents.
+ */
+static bool name_contains_dotdot(const char *name)
+{
+ size_t name_len = strlen(name);
+
+ return strcmp(name, "..") == 0 || strncmp(name, "../", 3) == 0 ||
+ strstr(name, "/../") != NULL ||
+ (name_len >= 3 && strcmp(name+name_len-3, "/..") == 0);
+}
+
/* called from request_firmware() and request_firmware_work_func() */
static int
_request_firmware(const struct firmware **firmware_p, const char *name,
@@ -869,6 +889,14 @@
goto out;
}
+ if (name_contains_dotdot(name)) {
+ dev_warn(device,
+ "Firmware load for '%s' refused, path contains '..' component\n",
+ name);
+ ret = -EINVAL;
+ goto out;
+ }
+
ret = _request_firmware_prepare(&fw, name, device, buf, size,
offset, opt_flags);
if (ret <= 0) /* error or already assigned */
@@ -946,6 +974,8 @@
* @name will be used as $FIRMWARE in the uevent environment and
* should be distinctive enough not to be confused with any other
* firmware image for this or any other device.
+ * It must not contain any ".." path components - "foo/bar..bin" is
+ * allowed, but "foo/../bar.bin" is not.
*
* Caller must hold the reference count of @device.
*
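name_contains_dotdot() only rejects ".." when it forms a complete path component, so "foo/bar..bin" remains legal while "foo/../bar.bin" does not, as the updated kernel-doc states. A small standalone re-implementation with a few illustrative checks (the test strings are examples, not taken from the patch):

#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same logic as name_contains_dotdot(): ".." must be bounded by '/' or by
 * the start/end of the string to count as a path component.
 */
static bool contains_dotdot(const char *name)
{
	size_t len = strlen(name);

	return strcmp(name, "..") == 0 || strncmp(name, "../", 3) == 0 ||
	       strstr(name, "/../") != NULL ||
	       (len >= 3 && strcmp(name + len - 3, "/..") == 0);
}

int main(void)
{
	assert(contains_dotdot(".."));
	assert(contains_dotdot("../fw.bin"));
	assert(contains_dotdot("vendor/../../etc/shadow"));
	assert(contains_dotdot("vendor/fw/.."));
	assert(!contains_dotdot("foo/bar..bin"));	/* ".." is not a component here */
	assert(!contains_dotdot("vendor/fw.bin"));
	return 0;
}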
diff --git a/drivers/base/module.c b/drivers/base/module.c
index f742ad2..c4eaa11 100644
--- a/drivers/base/module.c
+++ b/drivers/base/module.c
@@ -66,27 +66,31 @@
driver_name = make_driver_name(drv);
if (!driver_name) {
ret = -ENOMEM;
- goto out;
+ goto out_remove_kobj;
}
module_create_drivers_dir(mk);
if (!mk->drivers_dir) {
ret = -EINVAL;
- goto out;
+ goto out_free_driver_name;
}
ret = sysfs_create_link(mk->drivers_dir, &drv->p->kobj, driver_name);
if (ret)
- goto out;
+ goto out_remove_drivers_dir;
kfree(driver_name);
return 0;
-out:
- sysfs_remove_link(&drv->p->kobj, "module");
+
+out_remove_drivers_dir:
sysfs_remove_link(mk->drivers_dir, driver_name);
+
+out_free_driver_name:
kfree(driver_name);
+out_remove_kobj:
+ sysfs_remove_link(&drv->p->kobj, "module");
return ret;
}
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 4c3ee65..6f2a337 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1474,7 +1474,7 @@
USE_PLATFORM_PM_SLEEP_OPS
};
-struct bus_type platform_bus_type = {
+const struct bus_type platform_bus_type = {
.name = "platform",
.dev_groups = platform_dev_groups,
.match = platform_match,
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 11901f2..223faa9 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -2259,14 +2259,12 @@
.owner = THIS_MODULE,
.open = simple_open,
.read = mtip_hw_read_registers,
- .llseek = no_llseek,
};
static const struct file_operations mtip_flags_fops = {
.owner = THIS_MODULE,
.open = simple_open,
.read = mtip_hw_read_flags,
- .llseek = no_llseek,
};
static void mtip_hw_debugfs_init(struct driver_data *dd)
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 3edb37a..499c110 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2835,7 +2835,6 @@
.compat_ioctl = pkt_ctl_compat_ioctl,
#endif
.owner = THIS_MODULE,
- .llseek = no_llseek,
};
static struct miscdevice pkt_misc = {
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index bca06bf..a6c8e5c 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1983,7 +1983,6 @@
.owner = THIS_MODULE,
.open = ublk_ch_open,
.release = ublk_ch_release,
- .llseek = no_llseek,
.read_iter = ublk_ch_read_iter,
.write_iter = ublk_ch_write_iter,
.uring_cmd = ublk_ch_uring_cmd,
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c3d2456..ad9c9bc 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2115,8 +2115,10 @@
zram->num_active_comps--;
}
- for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
- kfree(zram->comp_algs[prio]);
+ for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
+ /* Do not free statically defined compression algorithms */
+ if (zram->comp_algs[prio] != default_compressor)
+ kfree(zram->comp_algs[prio]);
zram->comp_algs[prio] = NULL;
}
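The zram hunk now starts the loop at ZRAM_PRIMARY_COMP and skips kfree() for slots that still point at the statically defined default_compressor string, which must never be freed. A tiny user-space sketch of the same pointer-comparison guard; the default name below is a stand-in, not zram's actual configuration:

#include <stdlib.h>

static char default_name[] = "zstd";	/* stand-in for zram's default_compressor */

/* Mirrors the zram reset fix: a slot either points at the static default
 * string or at a heap-allocated override; only the override may be freed.
 */
static void reset_comp_algs(char *algs[], int n)
{
	for (int i = 0; i < n; i++) {
		if (algs[i] && algs[i] != default_name)
			free(algs[i]);		/* heap-allocated override */
		algs[i] = NULL;			/* static default left untouched */
	}
}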
diff --git a/drivers/bluetooth/hci_vhci.c b/drivers/bluetooth/hci_vhci.c
index 43e9ac5..aa6af351 100644
--- a/drivers/bluetooth/hci_vhci.c
+++ b/drivers/bluetooth/hci_vhci.c
@@ -679,7 +679,6 @@
.poll = vhci_poll,
.open = vhci_open,
.release = vhci_release,
- .llseek = no_llseek,
};
static struct miscdevice vhci_miscdev = {
diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index dd68b819..930d8a3 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -309,7 +309,7 @@
ATTRIBUTE_GROUPS(fsl_mc_bus);
-struct bus_type fsl_mc_bus_type = {
+const struct bus_type fsl_mc_bus_type = {
.name = "fsl-mc",
.match = fsl_mc_bus_match,
.uevent = fsl_mc_bus_uevent,
diff --git a/drivers/bus/moxtet.c b/drivers/bus/moxtet.c
index 8412406..6276551 100644
--- a/drivers/bus/moxtet.c
+++ b/drivers/bus/moxtet.c
@@ -484,7 +484,6 @@
.owner = THIS_MODULE,
.open = moxtet_debug_open,
.read = input_read,
- .llseek = no_llseek,
};
static ssize_t output_read(struct file *file, char __user *buf, size_t len,
@@ -549,7 +548,6 @@
.open = moxtet_debug_open,
.read = output_read,
.write = output_write,
- .llseek = no_llseek,
};
static int moxtet_register_debugfs(struct moxtet *moxtet)
diff --git a/drivers/char/applicom.c b/drivers/char/applicom.c
index 6931453..9fed970 100644
--- a/drivers/char/applicom.c
+++ b/drivers/char/applicom.c
@@ -111,7 +111,6 @@
static const struct file_operations ac_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.read = ac_read,
.write = ac_write,
.unlocked_ioctl = ac_ioctl,
diff --git a/drivers/char/ds1620.c b/drivers/char/ds1620.c
index a4f4291..44a1cdb 100644
--- a/drivers/char/ds1620.c
+++ b/drivers/char/ds1620.c
@@ -353,7 +353,6 @@
.open = ds1620_open,
.read = ds1620_read,
.unlocked_ioctl = ds1620_unlocked_ioctl,
- .llseek = no_llseek,
};
static struct miscdevice ds1620_miscdev = {
diff --git a/drivers/char/dtlk.c b/drivers/char/dtlk.c
index 5a1a733..27f5f9d 100644
--- a/drivers/char/dtlk.c
+++ b/drivers/char/dtlk.c
@@ -107,7 +107,6 @@
.unlocked_ioctl = dtlk_ioctl,
.open = dtlk_open,
.release = dtlk_release,
- .llseek = no_llseek,
};
/* local prototypes */
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 3dadc4a..e904e47 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -700,7 +700,6 @@
static const struct file_operations hpet_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.read = hpet_read,
.poll = hpet_poll,
.unlocked_ioctl = hpet_ioctl,
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index 9a45925..335eea8 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -903,7 +903,6 @@
.open = ipmi_open,
.release = ipmi_close,
.fasync = ipmi_fasync,
- .llseek = no_llseek,
};
static struct miscdevice ipmi_wdog_miscdev = {
diff --git a/drivers/char/pc8736x_gpio.c b/drivers/char/pc8736x_gpio.c
index c39a836..5f46968 100644
--- a/drivers/char/pc8736x_gpio.c
+++ b/drivers/char/pc8736x_gpio.c
@@ -235,7 +235,6 @@
.open = pc8736x_gpio_open,
.write = nsc_gpio_write,
.read = nsc_gpio_read,
- .llseek = no_llseek,
};
static void __init pc8736x_init_shadow(void)
diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c
index eaff98d..d1dfbd8 100644
--- a/drivers/char/ppdev.c
+++ b/drivers/char/ppdev.c
@@ -786,7 +786,6 @@
static const struct file_operations pp_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.read = pp_read,
.write = pp_write,
.poll = pp_poll,
diff --git a/drivers/char/scx200_gpio.c b/drivers/char/scx200_gpio.c
index 9f701dc..700e6af 100644
--- a/drivers/char/scx200_gpio.c
+++ b/drivers/char/scx200_gpio.c
@@ -68,7 +68,6 @@
.read = nsc_gpio_read,
.open = scx200_gpio_open,
.release = scx200_gpio_release,
- .llseek = no_llseek,
};
static struct cdev scx200_gpio_cdev; /* use 1 cdev for all pins */
diff --git a/drivers/char/sonypi.c b/drivers/char/sonypi.c
index bb5115b..0f8185e 100644
--- a/drivers/char/sonypi.c
+++ b/drivers/char/sonypi.c
@@ -1054,7 +1054,6 @@
.release = sonypi_misc_release,
.fasync = sonypi_misc_fasync,
.unlocked_ioctl = sonypi_misc_ioctl,
- .llseek = no_llseek,
};
static struct miscdevice sonypi_misc_device = {
diff --git a/drivers/char/tpm/tpm-dev.c b/drivers/char/tpm/tpm-dev.c
index e2c0baa..97c94b5 100644
--- a/drivers/char/tpm/tpm-dev.c
+++ b/drivers/char/tpm/tpm-dev.c
@@ -59,7 +59,6 @@
const struct file_operations tpm_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.open = tpm_open,
.read = tpm_common_read,
.write = tpm_common_write,
diff --git a/drivers/char/tpm/tpm_vtpm_proxy.c b/drivers/char/tpm/tpm_vtpm_proxy.c
index 11c5020..8fe4a01 100644
--- a/drivers/char/tpm/tpm_vtpm_proxy.c
+++ b/drivers/char/tpm/tpm_vtpm_proxy.c
@@ -243,7 +243,6 @@
static const struct file_operations vtpm_proxy_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.read = vtpm_proxy_fops_read,
.write = vtpm_proxy_fops_write,
.poll = vtpm_proxy_fops_poll,
diff --git a/drivers/char/tpm/tpmrm-dev.c b/drivers/char/tpm/tpmrm-dev.c
index eef0fb0..c25df7e 100644
--- a/drivers/char/tpm/tpmrm-dev.c
+++ b/drivers/char/tpm/tpmrm-dev.c
@@ -46,7 +46,6 @@
const struct file_operations tpmrm_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.open = tpmrm_open,
.read = tpm_common_read,
.write = tpm_common_write,
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index de7d720..99a7f24 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1093,7 +1093,6 @@
.poll = port_fops_poll,
.release = port_fops_release,
.fasync = port_fops_fasync,
- .llseek = no_llseek,
};
/*
diff --git a/drivers/counter/counter-chrdev.c b/drivers/counter/counter-chrdev.c
index afc94d0..3ee75e1 100644
--- a/drivers/counter/counter-chrdev.c
+++ b/drivers/counter/counter-chrdev.c
@@ -454,7 +454,6 @@
static const struct file_operations counter_fops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.read = counter_chrdev_read,
.poll = counter_chrdev_poll,
.unlocked_ioctl = counter_chrdev_ioctl,
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index bb83867..ef1621d 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -9,13 +9,12 @@
#include "cxlmem.h"
#include "core.h"
#include "cxl.h"
-#include "core.h"
struct dsmas_entry {
struct range dpa_range;
u8 handle;
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
-
+ struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
int entries;
int qos_class;
};
@@ -163,7 +162,7 @@
val = cdat_normalize(le16_to_cpu(le_val), le64_to_cpu(le_base),
dslbis->data_type);
- cxl_access_coordinate_set(dent->coord, dslbis->data_type, val);
+ cxl_access_coordinate_set(dent->cdat_coord, dslbis->data_type, val);
return 0;
}
@@ -220,7 +219,7 @@
xa_for_each(dsmas_xa, index, dent) {
int qos_class;
- cxl_coordinates_combine(dent->coord, dent->coord, ep_c);
+ cxl_coordinates_combine(dent->coord, dent->cdat_coord, ep_c);
dent->entries = 1;
rc = cxl_root->ops->qos_class(cxl_root,
&dent->coord[ACCESS_COORDINATE_CPU],
@@ -241,8 +240,10 @@
static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
struct cxl_dpa_perf *dpa_perf)
{
- for (int i = 0; i < ACCESS_COORDINATE_MAX; i++)
+ for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
dpa_perf->coord[i] = dent->coord[i];
+ dpa_perf->cdat_coord[i] = dent->cdat_coord[i];
+ }
dpa_perf->dpa_range = dent->dpa_range;
dpa_perf->qos_class = dent->qos_class;
dev_dbg(dev,
@@ -546,19 +547,37 @@
MODULE_IMPORT_NS(CXL);
-void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
- struct cxl_endpoint_decoder *cxled)
+static void cxl_bandwidth_add(struct access_coordinate *coord,
+ struct access_coordinate *c1,
+ struct access_coordinate *c2)
+{
+ for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
+ coord[i].read_bandwidth = c1[i].read_bandwidth +
+ c2[i].read_bandwidth;
+ coord[i].write_bandwidth = c1[i].write_bandwidth +
+ c2[i].write_bandwidth;
+ }
+}
+
+static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
+ struct resource *dpa_res)
+{
+ struct range dpa = {
+ .start = dpa_res->start,
+ .end = dpa_res->end,
+ };
+
+ return range_contains(&perf->dpa_range, &dpa);
+}
+
+static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxled,
+ enum cxl_decoder_mode mode)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- struct range dpa = {
- .start = cxled->dpa_res->start,
- .end = cxled->dpa_res->end,
- };
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
struct cxl_dpa_perf *perf;
- switch (cxlr->mode) {
+ switch (mode) {
case CXL_DECODER_RAM:
perf = &mds->ram_perf;
break;
@@ -566,12 +585,473 @@
perf = &mds->pmem_perf;
break;
default:
- return;
+ return ERR_PTR(-EINVAL);
}
+ if (!dpa_perf_contains(perf, cxled->dpa_res))
+ return ERR_PTR(-EINVAL);
+
+ return perf;
+}
+
+/*
+ * Transient context holding the bandwidth being calculated while walking the
+ * port hierarchy to account for shared upstream links.
+ */
+struct cxl_perf_ctx {
+ struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+ struct cxl_port *port;
+};
+
+/**
+ * cxl_endpoint_gather_bandwidth - collect all the endpoint bandwidth in an xarray
+ * @cxlr: CXL region for the bandwidth calculation
+ * @cxled: endpoint decoder to start on
+ * @usp_xa: (output) the xarray that collects all the bandwidth coordinates
+ * indexed by the upstream device with data of 'struct cxl_perf_ctx'.
+ * @gp_is_root: (output) bool of whether the grandparent is cxl root.
+ *
+ * Return: 0 for success or -errno
+ *
+ * Collects aggregated endpoint bandwidth and stores it in an xarray indexed
+ * by the upstream device of the switch or the RP device. Each endpoint's
+ * bandwidth is the minimum of the bandwidth from DSLBIS in the endpoint CDAT,
+ * the endpoint upstream link bandwidth, and the bandwidth from the SSLBIS of
+ * the switch CDAT for the switch upstream port to the downstream port that's
+ * associated with the endpoint. If the device is directly connected to an RP,
+ * then no SSLBIS is involved.
+ */
+static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
+ struct cxl_endpoint_decoder *cxled,
+ struct xarray *usp_xa,
+ bool *gp_is_root)
+{
+ struct cxl_port *endpoint = to_cxl_port(cxled->cxld.dev.parent);
+ struct cxl_port *parent_port = to_cxl_port(endpoint->dev.parent);
+ struct cxl_port *gp_port = to_cxl_port(parent_port->dev.parent);
+ struct access_coordinate pci_coord[ACCESS_COORDINATE_MAX];
+ struct access_coordinate sw_coord[ACCESS_COORDINATE_MAX];
+ struct access_coordinate ep_coord[ACCESS_COORDINATE_MAX];
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+ struct cxl_perf_ctx *perf_ctx;
+ struct cxl_dpa_perf *perf;
+ unsigned long index;
+ void *ptr;
+ int rc;
+
+ if (cxlds->rcd)
+ return -ENODEV;
+
+ perf = cxled_get_dpa_perf(cxled, cxlr->mode);
+ if (IS_ERR(perf))
+ return PTR_ERR(perf);
+
+ *gp_is_root = is_cxl_root(gp_port);
+
+ /*
+ * If the grandparent is cxl root, then index is the root port,
+ * otherwise it's the parent switch upstream device.
+ */
+ if (*gp_is_root)
+ index = (unsigned long)endpoint->parent_dport->dport_dev;
+ else
+ index = (unsigned long)parent_port->uport_dev;
+
+ perf_ctx = xa_load(usp_xa, index);
+ if (!perf_ctx) {
+ struct cxl_perf_ctx *c __free(kfree) =
+ kzalloc(sizeof(*perf_ctx), GFP_KERNEL);
+
+ if (!c)
+ return -ENOMEM;
+ ptr = xa_store(usp_xa, index, c, GFP_KERNEL);
+ if (xa_is_err(ptr))
+ return xa_err(ptr);
+ perf_ctx = no_free_ptr(c);
+ perf_ctx->port = parent_port;
+ }
+
+ /* Direct upstream link from EP bandwidth */
+ rc = cxl_pci_get_bandwidth(pdev, pci_coord);
+ if (rc < 0)
+ return rc;
+
+ /*
+ * Min of upstream link bandwidth and Endpoint CDAT bandwidth from
+ * DSLBIS.
+ */
+ cxl_coordinates_combine(ep_coord, pci_coord, perf->cdat_coord);
+
+ /*
+ * If grandparent port is root, then there's no switch involved and
+ * the endpoint is connected to a root port.
+ */
+ if (!*gp_is_root) {
+ /*
+ * Retrieve the switch SSLBIS for switch downstream port
+ * associated with the endpoint bandwidth.
+ */
+ rc = cxl_port_get_switch_dport_bandwidth(endpoint, sw_coord);
+ if (rc)
+ return rc;
+
+ /*
+ * Min of the earlier coordinates with the switch SSLBIS
+ * bandwidth
+ */
+ cxl_coordinates_combine(ep_coord, ep_coord, sw_coord);
+ }
+
+ /*
+ * Aggregate the computed bandwidth with the current aggregated bandwidth
+ * of the endpoints with the same switch upstream device or RP.
+ */
+ cxl_bandwidth_add(perf_ctx->coord, perf_ctx->coord, ep_coord);
+
+ return 0;
+}
+
+static void free_perf_xa(struct xarray *xa)
+{
+ struct cxl_perf_ctx *ctx;
+ unsigned long index;
+
+ if (!xa)
+ return;
+
+ xa_for_each(xa, index, ctx)
+ kfree(ctx);
+ xa_destroy(xa);
+ kfree(xa);
+}
+DEFINE_FREE(free_perf_xa, struct xarray *, if (_T) free_perf_xa(_T))
+
+/**
+ * cxl_switch_gather_bandwidth - collect all the bandwidth at switch level in an xarray
+ * @cxlr: The region being operated on
+ * @input_xa: xarray indexed by upstream device of a switch with data of 'struct
+ * cxl_perf_ctx'
+ * @gp_is_root: (output) bool of whether the grandparent is cxl root.
+ *
+ * Return: a xarray of resulting cxl_perf_ctx per parent switch or root port
+ * or ERR_PTR(-errno)
+ *
+ * Iterate through the xarray. Take the minimum of the downstream calculated
+ * bandwidth, the upstream link bandwidth, and the SSLBIS of the upstream
+ * switch if it exists. Sum the resulting bandwidth under the switch upstream
+ * device or an RP device. The function can be called repeatedly, once per
+ * level of switches, if multiple switch levels are present.
+ */
+static struct xarray *cxl_switch_gather_bandwidth(struct cxl_region *cxlr,
+ struct xarray *input_xa,
+ bool *gp_is_root)
+{
+ struct xarray *res_xa __free(free_perf_xa) =
+ kzalloc(sizeof(*res_xa), GFP_KERNEL);
+ struct access_coordinate coords[ACCESS_COORDINATE_MAX];
+ struct cxl_perf_ctx *ctx, *us_ctx;
+ unsigned long index, us_index;
+ int dev_count = 0;
+ int gp_count = 0;
+ void *ptr;
+ int rc;
+
+ if (!res_xa)
+ return ERR_PTR(-ENOMEM);
+ xa_init(res_xa);
+
+ xa_for_each(input_xa, index, ctx) {
+ struct device *dev = (struct device *)index;
+ struct cxl_port *port = ctx->port;
+ struct cxl_port *parent_port = to_cxl_port(port->dev.parent);
+ struct cxl_port *gp_port = to_cxl_port(parent_port->dev.parent);
+ struct cxl_dport *dport = port->parent_dport;
+ bool is_root = false;
+
+ dev_count++;
+ if (is_cxl_root(gp_port)) {
+ is_root = true;
+ gp_count++;
+ }
+
+ /*
+ * If the grandparent is cxl root, then index is the root port,
+ * otherwise it's the parent switch upstream device.
+ */
+ if (is_root)
+ us_index = (unsigned long)port->parent_dport->dport_dev;
+ else
+ us_index = (unsigned long)parent_port->uport_dev;
+
+ us_ctx = xa_load(res_xa, us_index);
+ if (!us_ctx) {
+ struct cxl_perf_ctx *n __free(kfree) =
+ kzalloc(sizeof(*n), GFP_KERNEL);
+
+ if (!n)
+ return ERR_PTR(-ENOMEM);
+
+ ptr = xa_store(res_xa, us_index, n, GFP_KERNEL);
+ if (xa_is_err(ptr))
+ return ERR_PTR(xa_err(ptr));
+ us_ctx = no_free_ptr(n);
+ us_ctx->port = parent_port;
+ }
+
+ /*
+ * If the device isn't an upstream PCIe port, there's something
+ * wrong with the topology.
+ */
+ if (!dev_is_pci(dev))
+ return ERR_PTR(-EINVAL);
+
+ /* Retrieve the upstream link bandwidth */
+ rc = cxl_pci_get_bandwidth(to_pci_dev(dev), coords);
+ if (rc)
+ return ERR_PTR(-ENXIO);
+
+ /*
+ * Take the min of downstream bandwidth and the upstream link
+ * bandwidth.
+ */
+ cxl_coordinates_combine(coords, coords, ctx->coord);
+
+ /*
+		 * Take the min of the calculated bandwidth and the upstream
+		 * switch SSLBIS bandwidth if there's a parent switch.
+ */
+ if (!is_root)
+ cxl_coordinates_combine(coords, coords, dport->coord);
+
+ /*
+ * Aggregate the calculated bandwidth common to an upstream
+ * switch.
+ */
+ cxl_bandwidth_add(us_ctx->coord, us_ctx->coord, coords);
+ }
+
+ /* Asymmetric topology detected. */
+ if (gp_count) {
+ if (gp_count != dev_count) {
+ dev_dbg(&cxlr->dev,
+ "Asymmetric hierarchy detected, bandwidth not updated\n");
+ return ERR_PTR(-EOPNOTSUPP);
+ }
+ *gp_is_root = true;
+ }
+
+ return no_free_ptr(res_xa);
+}
+
+/**
+ * cxl_rp_gather_bandwidth - handle the root port level bandwidth collection
+ * @xa: the xarray that holds the cxl_perf_ctx that has the bandwidth calculated
+ * below each root port device.
+ *
+ * Return: xarray that holds cxl_perf_ctx per host bridge or ERR_PTR(-errno)
+ */
+static struct xarray *cxl_rp_gather_bandwidth(struct xarray *xa)
+{
+ struct xarray *hb_xa __free(free_perf_xa) =
+ kzalloc(sizeof(*hb_xa), GFP_KERNEL);
+ struct cxl_perf_ctx *ctx;
+ unsigned long index;
+
+ if (!hb_xa)
+ return ERR_PTR(-ENOMEM);
+ xa_init(hb_xa);
+
+ xa_for_each(xa, index, ctx) {
+ struct cxl_port *port = ctx->port;
+ unsigned long hb_index = (unsigned long)port->uport_dev;
+ struct cxl_perf_ctx *hb_ctx;
+ void *ptr;
+
+ hb_ctx = xa_load(hb_xa, hb_index);
+ if (!hb_ctx) {
+ struct cxl_perf_ctx *n __free(kfree) =
+ kzalloc(sizeof(*n), GFP_KERNEL);
+
+ if (!n)
+ return ERR_PTR(-ENOMEM);
+ ptr = xa_store(hb_xa, hb_index, n, GFP_KERNEL);
+ if (xa_is_err(ptr))
+ return ERR_PTR(xa_err(ptr));
+ hb_ctx = no_free_ptr(n);
+ hb_ctx->port = port;
+ }
+
+ cxl_bandwidth_add(hb_ctx->coord, hb_ctx->coord, ctx->coord);
+ }
+
+ return no_free_ptr(hb_xa);
+}
+
+/**
+ * cxl_hb_gather_bandwidth - handle the host bridge level bandwidth collection
+ * @xa: the xarray that holds the cxl_perf_ctx that has the bandwidth calculated
+ * below each host bridge.
+ *
+ * Return: xarray that holds cxl_perf_ctx per ACPI0017 device or ERR_PTR(-errno)
+ */
+static struct xarray *cxl_hb_gather_bandwidth(struct xarray *xa)
+{
+ struct xarray *mw_xa __free(free_perf_xa) =
+ kzalloc(sizeof(*mw_xa), GFP_KERNEL);
+ struct cxl_perf_ctx *ctx;
+ unsigned long index;
+
+ if (!mw_xa)
+ return ERR_PTR(-ENOMEM);
+ xa_init(mw_xa);
+
+ xa_for_each(xa, index, ctx) {
+ struct cxl_port *port = ctx->port;
+ struct cxl_port *parent_port;
+ struct cxl_perf_ctx *mw_ctx;
+ struct cxl_dport *dport;
+ unsigned long mw_index;
+ void *ptr;
+
+ parent_port = to_cxl_port(port->dev.parent);
+ mw_index = (unsigned long)parent_port->uport_dev;
+
+ mw_ctx = xa_load(mw_xa, mw_index);
+ if (!mw_ctx) {
+ struct cxl_perf_ctx *n __free(kfree) =
+ kzalloc(sizeof(*n), GFP_KERNEL);
+
+ if (!n)
+ return ERR_PTR(-ENOMEM);
+ ptr = xa_store(mw_xa, mw_index, n, GFP_KERNEL);
+ if (xa_is_err(ptr))
+ return ERR_PTR(xa_err(ptr));
+ mw_ctx = no_free_ptr(n);
+ }
+
+ dport = port->parent_dport;
+ cxl_coordinates_combine(ctx->coord, ctx->coord, dport->coord);
+ cxl_bandwidth_add(mw_ctx->coord, mw_ctx->coord, ctx->coord);
+ }
+
+ return no_free_ptr(mw_xa);
+}
+
+/**
+ * cxl_region_update_bandwidth - Update the bandwidth access coordinates of a region
+ * @cxlr: The region being operated on
+ * @input_xa: xarray holding cxl_perf_ctx with the calculated bandwidth per ACPI0017 instance
+ */
+static void cxl_region_update_bandwidth(struct cxl_region *cxlr,
+ struct xarray *input_xa)
+{
+ struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+ struct cxl_perf_ctx *ctx;
+ unsigned long index;
+
+ memset(coord, 0, sizeof(coord));
+ xa_for_each(input_xa, index, ctx)
+ cxl_bandwidth_add(coord, coord, ctx->coord);
+
+ for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
+ cxlr->coord[i].read_bandwidth = coord[i].read_bandwidth;
+ cxlr->coord[i].write_bandwidth = coord[i].write_bandwidth;
+ }
+}
+
+/**
+ * cxl_region_shared_upstream_bandwidth_update - Recalculate the bandwidth for
+ * the region
+ * @cxlr: the cxl region to recalculate
+ *
+ * The function walks the topology from the bottom up and calculates the bandwidth. It
+ * starts at the endpoints, processes the switch levels if any, then the root port
+ * level and the host bridge level, and finally aggregates the result at the region.
+ */
+void cxl_region_shared_upstream_bandwidth_update(struct cxl_region *cxlr)
+{
+ struct xarray *working_xa;
+ int root_count = 0;
+ bool is_root;
+ int rc;
+
lockdep_assert_held(&cxl_dpa_rwsem);
- if (!range_contains(&perf->dpa_range, &dpa))
+ struct xarray *usp_xa __free(free_perf_xa) =
+ kzalloc(sizeof(*usp_xa), GFP_KERNEL);
+
+ if (!usp_xa)
+ return;
+
+ xa_init(usp_xa);
+
+ /* Collect bandwidth data from all the endpoints. */
+ for (int i = 0; i < cxlr->params.nr_targets; i++) {
+ struct cxl_endpoint_decoder *cxled = cxlr->params.targets[i];
+
+ is_root = false;
+ rc = cxl_endpoint_gather_bandwidth(cxlr, cxled, usp_xa, &is_root);
+ if (rc)
+ return;
+ root_count += is_root;
+ }
+
+ /* Detect asymmetric hierarchy with some direct attached endpoints. */
+ if (root_count && root_count != cxlr->params.nr_targets) {
+ dev_dbg(&cxlr->dev,
+ "Asymmetric hierarchy detected, bandwidth not updated\n");
+ return;
+ }
+
+ /*
+ * Walk up one or more switches to deal with the bandwidth of the
+ * switches if they exist. Endpoints directly attached to RPs skip
+ * over this part.
+ */
+ if (!root_count) {
+ do {
+ working_xa = cxl_switch_gather_bandwidth(cxlr, usp_xa,
+ &is_root);
+ if (IS_ERR(working_xa))
+ return;
+ free_perf_xa(usp_xa);
+ usp_xa = working_xa;
+ } while (!is_root);
+ }
+
+ /* Handle the bandwidth at the root port of the hierarchy */
+ working_xa = cxl_rp_gather_bandwidth(usp_xa);
+ if (IS_ERR(working_xa))
+ return;
+ free_perf_xa(usp_xa);
+ usp_xa = working_xa;
+
+ /* Handle the bandwidth at the host bridge of the hierarchy */
+ working_xa = cxl_hb_gather_bandwidth(usp_xa);
+ if (IS_ERR(working_xa))
+ return;
+ free_perf_xa(usp_xa);
+ usp_xa = working_xa;
+
+ /*
+ * Aggregate all the bandwidth collected per CFMWS (ACPI0017) and
+ * update the region bandwidth with the final calculated values.
+ */
+ cxl_region_update_bandwidth(cxlr, usp_xa);
+}
+
+void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
+ struct cxl_endpoint_decoder *cxled)
+{
+ struct cxl_dpa_perf *perf;
+
+ lockdep_assert_held(&cxl_dpa_rwsem);
+
+ perf = cxled_get_dpa_perf(cxled, cxlr->mode);
+ if (IS_ERR(perf))
return;
for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
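Minimal sketch of the per-level gather pattern the hunks above use (not the in-tree helpers): bandwidth is bucketed by the index of the upstream device in an xarray, and contributions of siblings that share that upstream port are summed. "struct perf_ctx" and "gather_one()" are illustrative stand-ins for cxl_perf_ctx and the cxl_*_gather_bandwidth() helpers.

#include <linux/xarray.h>
#include <linux/slab.h>

struct perf_ctx {
	u64 read_bw;	/* MB/s */
	u64 write_bw;	/* MB/s */
};

static int gather_one(struct xarray *xa, unsigned long upstream_index,
		      u64 read_bw, u64 write_bw)
{
	struct perf_ctx *ctx = xa_load(xa, upstream_index);

	if (!ctx) {
		void *old;

		ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
		if (!ctx)
			return -ENOMEM;

		old = xa_store(xa, upstream_index, ctx, GFP_KERNEL);
		if (xa_is_err(old)) {
			kfree(ctx);
			return xa_err(old);
		}
	}

	/* devices sharing an upstream port are summed into one bucket */
	ctx->read_bw += read_bw;
	ctx->write_bw += write_bw;
	return 0;
}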
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 72a506c..0c62b40 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -103,9 +103,11 @@
};
long cxl_pci_get_latency(struct pci_dev *pdev);
-
+int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
enum access_coordinate_class access);
bool cxl_need_node_perf_attrs_update(int nid);
+int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
+ struct access_coordinate *c);
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index e5cdeaf..946f8e44 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -225,7 +225,7 @@
/**
* cxl_internal_send_cmd() - Kernel internal interface to send a mailbox command
- * @mds: The driver data for the operation
+ * @cxl_mbox: CXL mailbox context
* @mbox_cmd: initialized command to execute
*
* Context: Any context.
@@ -241,19 +241,19 @@
* error. While this distinction can be useful for commands from userspace, the
* kernel will only be able to use results when both are successful.
*/
-int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
+int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *mbox_cmd)
{
size_t out_size, min_out;
int rc;
- if (mbox_cmd->size_in > mds->payload_size ||
- mbox_cmd->size_out > mds->payload_size)
+ if (mbox_cmd->size_in > cxl_mbox->payload_size ||
+ mbox_cmd->size_out > cxl_mbox->payload_size)
return -E2BIG;
out_size = mbox_cmd->size_out;
min_out = mbox_cmd->min_out;
- rc = mds->mbox_send(mds, mbox_cmd);
+ rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
/*
* EIO is reserved for a payload size mismatch and mbox_send()
* may not return this error.
@@ -353,6 +353,7 @@
struct cxl_memdev_state *mds, u16 opcode,
size_t in_size, size_t out_size, u64 in_payload)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
*mbox = (struct cxl_mbox_cmd) {
.opcode = opcode,
.size_in = in_size,
@@ -374,7 +375,7 @@
/* Prepare to handle a full payload for variable sized output */
if (out_size == CXL_VARIABLE_PAYLOAD)
- mbox->size_out = mds->payload_size;
+ mbox->size_out = cxl_mbox->payload_size;
else
mbox->size_out = out_size;
@@ -398,6 +399,8 @@
const struct cxl_send_command *send_cmd,
struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
if (send_cmd->raw.rsvd)
return -EINVAL;
@@ -406,7 +409,7 @@
* gets passed along without further checking, so it must be
* validated here.
*/
- if (send_cmd->out.size > mds->payload_size)
+ if (send_cmd->out.size > cxl_mbox->payload_size)
return -EINVAL;
if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))
@@ -494,6 +497,7 @@
struct cxl_memdev_state *mds,
const struct cxl_send_command *send_cmd)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mem_command mem_cmd;
int rc;
@@ -505,7 +509,7 @@
* supports, but output can be arbitrarily large (simply write out as
* much data as the hardware provides).
*/
- if (send_cmd->in.size > mds->payload_size)
+ if (send_cmd->in.size > cxl_mbox->payload_size)
return -EINVAL;
/* Sanitize and construct a cxl_mem_command */
@@ -542,7 +546,7 @@
return put_user(ARRAY_SIZE(cxl_mem_commands), &q->n_commands);
/*
- * otherwise, return max(n_commands, total commands) cxl_command_info
+ * otherwise, return min(n_commands, total commands) cxl_command_info
* structures.
*/
cxl_for_each_cmd(cmd) {
@@ -591,6 +595,7 @@
u64 out_payload, s32 *size_out,
u32 *retval)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct device *dev = mds->cxlds.dev;
int rc;
@@ -601,7 +606,7 @@
cxl_mem_opcode_to_name(mbox_cmd->opcode),
mbox_cmd->opcode, mbox_cmd->size_in);
- rc = mds->mbox_send(mds, mbox_cmd);
+ rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
if (rc)
goto out;
@@ -659,11 +664,12 @@
static int cxl_xfer_log(struct cxl_memdev_state *mds, uuid_t *uuid,
u32 *size, u8 *out)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
u32 remaining = *size;
u32 offset = 0;
while (remaining) {
- u32 xfer_size = min_t(u32, remaining, mds->payload_size);
+ u32 xfer_size = min_t(u32, remaining, cxl_mbox->payload_size);
struct cxl_mbox_cmd mbox_cmd;
struct cxl_mbox_get_log log;
int rc;
@@ -682,7 +688,7 @@
.payload_out = out,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
/*
* The output payload length that indicates the number
@@ -752,22 +758,23 @@
static struct cxl_mbox_get_supported_logs *cxl_get_gsl(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_supported_logs *ret;
struct cxl_mbox_cmd mbox_cmd;
int rc;
- ret = kvmalloc(mds->payload_size, GFP_KERNEL);
+ ret = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
if (!ret)
return ERR_PTR(-ENOMEM);
mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = CXL_MBOX_OP_GET_SUPPORTED_LOGS,
- .size_out = mds->payload_size,
+ .size_out = cxl_mbox->payload_size,
.payload_out = ret,
/* At least the record number field must be valid */
.min_out = 2,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
kvfree(ret);
return ERR_PTR(rc);
@@ -910,6 +917,7 @@
enum cxl_event_log_type log,
struct cxl_get_event_payload *get_pl)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_clear_event_payload *payload;
u16 total = le16_to_cpu(get_pl->record_count);
u8 max_handles = CXL_CLEAR_EVENT_MAX_HANDLES;
@@ -920,8 +928,8 @@
int i;
/* Payload size may limit the max handles */
- if (pl_size > mds->payload_size) {
- max_handles = (mds->payload_size - sizeof(*payload)) /
+ if (pl_size > cxl_mbox->payload_size) {
+ max_handles = (cxl_mbox->payload_size - sizeof(*payload)) /
sizeof(__le16);
pl_size = struct_size(payload, handles, max_handles);
}
@@ -955,7 +963,7 @@
if (i == max_handles) {
payload->nr_recs = i;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto free_pl;
i = 0;
@@ -966,7 +974,7 @@
if (i) {
payload->nr_recs = i;
mbox_cmd.size_in = struct_size(payload, handles, i);
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto free_pl;
}
@@ -979,6 +987,7 @@
static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
enum cxl_event_log_type type)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
struct device *dev = mds->cxlds.dev;
struct cxl_get_event_payload *payload;
@@ -995,11 +1004,11 @@
.payload_in = &log_type,
.size_in = sizeof(log_type),
.payload_out = payload,
- .size_out = mds->payload_size,
+ .size_out = cxl_mbox->payload_size,
.min_out = struct_size(payload, records, 0),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc) {
dev_err_ratelimited(dev,
"Event log '%d': Failed to query event records : %d",
@@ -1070,6 +1079,7 @@
*/
static int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_partition_info pi;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -1079,7 +1089,7 @@
.size_out = sizeof(pi),
.payload_out = &pi,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
return rc;
@@ -1106,6 +1116,7 @@
*/
int cxl_dev_state_identify(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
/* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
struct cxl_mbox_identify id;
struct cxl_mbox_cmd mbox_cmd;
@@ -1120,7 +1131,7 @@
.size_out = sizeof(id),
.payload_out = &id,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
@@ -1148,6 +1159,7 @@
static int __cxl_mem_sanitize(struct cxl_memdev_state *mds, u16 cmd)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
int rc;
u32 sec_out = 0;
struct cxl_get_security_output {
@@ -1159,14 +1171,13 @@
.size_out = sizeof(out),
};
struct cxl_mbox_cmd mbox_cmd = { .opcode = cmd };
- struct cxl_dev_state *cxlds = &mds->cxlds;
if (cmd != CXL_MBOX_OP_SANITIZE && cmd != CXL_MBOX_OP_SECURE_ERASE)
return -EINVAL;
- rc = cxl_internal_send_cmd(mds, &sec_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &sec_cmd);
if (rc < 0) {
- dev_err(cxlds->dev, "Failed to get security state : %d", rc);
+ dev_err(cxl_mbox->host, "Failed to get security state : %d", rc);
return rc;
}
@@ -1183,9 +1194,9 @@
sec_out & CXL_PMEM_SEC_STATE_LOCKED)
return -EINVAL;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
- dev_err(cxlds->dev, "Failed to sanitize device : %d", rc);
+ dev_err(cxl_mbox->host, "Failed to sanitize device : %d", rc);
return rc;
}
@@ -1214,7 +1225,7 @@
int rc;
/* synchronize with cxl_mem_probe() and decoder write operations */
- device_lock(&cxlmd->dev);
+ guard(device)(&cxlmd->dev);
endpoint = cxlmd->endpoint;
down_read(&cxl_region_rwsem);
/*
@@ -1226,7 +1237,6 @@
else
rc = -EBUSY;
up_read(&cxl_region_rwsem);
- device_unlock(&cxlmd->dev);
return rc;
}
@@ -1300,6 +1310,7 @@
int cxl_set_timestamp(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
struct cxl_mbox_set_timestamp_in pi;
int rc;
@@ -1311,7 +1322,7 @@
.payload_in = &pi,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
/*
* Command is optional. Devices may have another way of providing
* a timestamp, or may return all 0s in timestamp fields.
@@ -1328,6 +1339,7 @@
struct cxl_region *cxlr)
{
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_mbox_poison_out *po;
struct cxl_mbox_poison_in pi;
int nr_records = 0;
@@ -1346,12 +1358,12 @@
.opcode = CXL_MBOX_OP_GET_POISON,
.size_in = sizeof(pi),
.payload_in = &pi,
- .size_out = mds->payload_size,
+ .size_out = cxl_mbox->payload_size,
.payload_out = po,
.min_out = struct_size(po, record, 0),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
break;
@@ -1382,7 +1394,9 @@
/* Get Poison List output buffer is protected by mds->poison.lock */
static int cxl_poison_alloc_buf(struct cxl_memdev_state *mds)
{
- mds->poison.list_out = kvmalloc(mds->payload_size, GFP_KERNEL);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
+ mds->poison.list_out = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
if (!mds->poison.list_out)
return -ENOMEM;
@@ -1408,6 +1422,19 @@
}
EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
+int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
+{
+ if (!cxl_mbox || !host)
+ return -EINVAL;
+
+ cxl_mbox->host = host;
+ mutex_init(&cxl_mbox->mbox_mutex);
+ rcuwait_init(&cxl_mbox->mbox_wait);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, CXL);
+
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
{
struct cxl_memdev_state *mds;
@@ -1418,7 +1445,6 @@
return ERR_PTR(-ENOMEM);
}
- mutex_init(&mds->mbox_mutex);
mutex_init(&mds->event.log_lock);
mds->cxlds.dev = dev;
mds->cxlds.reg_map.host = dev;
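The mbox.c conversion above routes internal commands through the generic struct cxl_mailbox embedded in cxl_dev_state, with payload sizing also moving to the mailbox. A minimal caller sketch under that assumption; the opcode and payload are illustrative only and error handling is trimmed.

#include "cxlmem.h"	/* pulls in <cxl/mailbox.h> per the cxlmem.h hunk below */

static int example_get_timestamp(struct cxl_memdev_state *mds, __le64 *ts)
{
	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
	struct cxl_mbox_cmd mbox_cmd = {
		.opcode = CXL_MBOX_OP_GET_TIMESTAMP,
		.size_out = sizeof(*ts),
		.payload_out = ts,
	};

	/* commands are now sized against the mailbox, not the memdev state */
	if (mbox_cmd.size_out > cxl_mbox->payload_size)
		return -E2BIG;

	return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}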
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 0277726..84fefb7 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -58,7 +58,7 @@
if (!mds)
return sysfs_emit(buf, "\n");
- return sysfs_emit(buf, "%zu\n", mds->payload_size);
+ return sysfs_emit(buf, "%zu\n", cxlds->cxl_mbox.payload_size);
}
static DEVICE_ATTR_RO(payload_max);
@@ -124,15 +124,16 @@
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
unsigned long state = mds->security.state;
int rc = 0;
/* sync with latest submission state */
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (mds->security.sanitize_active)
rc = sysfs_emit(buf, "sanitize\n");
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
if (rc)
return rc;
@@ -277,7 +278,7 @@
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
{
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_mbox_inject_poison inject;
struct cxl_poison_record record;
struct cxl_mbox_cmd mbox_cmd;
@@ -307,13 +308,13 @@
.size_in = sizeof(inject),
.payload_in = &inject,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto out;
cxlr = cxl_dpa_to_region(cxlmd, dpa);
if (cxlr)
- dev_warn_once(mds->cxlds.dev,
+ dev_warn_once(cxl_mbox->host,
"poison inject dpa:%#llx region: %s\n", dpa,
dev_name(&cxlr->dev));
@@ -332,7 +333,7 @@
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
{
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_mbox_clear_poison clear;
struct cxl_poison_record record;
struct cxl_mbox_cmd mbox_cmd;
@@ -371,13 +372,13 @@
.payload_in = &clear,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto out;
cxlr = cxl_dpa_to_region(cxlmd, dpa);
if (cxlr)
- dev_warn_once(mds->cxlds.dev,
+ dev_warn_once(cxl_mbox->host,
"poison clear dpa:%#llx region: %s\n", dpa,
dev_name(&cxlr->dev));
@@ -714,6 +715,7 @@
*/
static int cxl_mem_get_fw_info(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_fw_info info;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -724,7 +726,7 @@
.payload_out = &info,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
@@ -748,6 +750,7 @@
*/
static int cxl_mem_activate_fw(struct cxl_memdev_state *mds, int slot)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_activate_fw activate;
struct cxl_mbox_cmd mbox_cmd;
@@ -764,7 +767,7 @@
activate.action = CXL_FW_ACTIVATE_OFFLINE;
activate.slot = slot;
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
/**
@@ -779,6 +782,7 @@
*/
static int cxl_mem_abort_fw_xfer(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_transfer_fw *transfer;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -798,7 +802,7 @@
transfer->action = CXL_FW_TRANSFER_ACTION_ABORT;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
kfree(transfer);
return rc;
}
@@ -829,12 +833,13 @@
{
struct cxl_memdev_state *mds = fwl->dd_handle;
struct cxl_mbox_transfer_fw *transfer;
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
if (!size)
return FW_UPLOAD_ERR_INVALID_SIZE;
mds->fw.oneshot = struct_size(transfer, data, size) <
- mds->payload_size;
+ cxl_mbox->payload_size;
if (cxl_mem_get_fw_info(mds))
return FW_UPLOAD_ERR_HW_ERROR;
@@ -854,6 +859,7 @@
{
struct cxl_memdev_state *mds = fwl->dd_handle;
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
struct cxl_memdev *cxlmd = cxlds->cxlmd;
struct cxl_mbox_transfer_fw *transfer;
struct cxl_mbox_cmd mbox_cmd;
@@ -877,7 +883,7 @@
* sizeof(*transfer) is 128. These constraints imply that @cur_size
* will always be 128b aligned.
*/
- cur_size = min_t(size_t, size, mds->payload_size - sizeof(*transfer));
+ cur_size = min_t(size_t, size, cxl_mbox->payload_size - sizeof(*transfer));
remaining = size - cur_size;
size_in = struct_size(transfer, data, cur_size);
@@ -921,7 +927,7 @@
.poll_count = 30,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
rc = FW_UPLOAD_ERR_RW_ERROR;
goto out_free;
@@ -1059,16 +1065,17 @@
static void sanitize_teardown_notifier(void *data)
{
struct cxl_memdev_state *mds = data;
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct kernfs_node *state;
/*
* Prevent new irq triggered invocations of the workqueue and
* flush inflight invocations.
*/
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
state = mds->security.sanitize_node;
mds->security.sanitize_node = NULL;
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
cancel_delayed_work_sync(&mds->security.poll_dwork);
sysfs_put(state);
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 51132a5..5b46bc4 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -211,37 +211,6 @@
}
EXPORT_SYMBOL_NS_GPL(cxl_await_media_ready, CXL);
-static int wait_for_valid(struct pci_dev *pdev, int d)
-{
- u32 val;
- int rc;
-
- /*
- * Memory_Info_Valid: When set, indicates that the CXL Range 1 Size high
- * and Size Low registers are valid. Must be set within 1 second of
- * deassertion of reset to CXL device. Likely it is already set by the
- * time this runs, but otherwise give a 1.5 second timeout in case of
- * clock skew.
- */
- rc = pci_read_config_dword(pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(0), &val);
- if (rc)
- return rc;
-
- if (val & CXL_DVSEC_MEM_INFO_VALID)
- return 0;
-
- msleep(1500);
-
- rc = pci_read_config_dword(pdev, d + CXL_DVSEC_RANGE_SIZE_LOW(0), &val);
- if (rc)
- return rc;
-
- if (val & CXL_DVSEC_MEM_INFO_VALID)
- return 0;
-
- return -ETIMEDOUT;
-}
-
static int cxl_set_mem_enable(struct cxl_dev_state *cxlds, u16 val)
{
struct pci_dev *pdev = to_pci_dev(cxlds->dev);
@@ -322,11 +291,13 @@
return devm_add_action_or_reset(host, disable_hdm, cxlhdm);
}
-int cxl_dvsec_rr_decode(struct device *dev, int d,
+int cxl_dvsec_rr_decode(struct device *dev, struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info)
{
struct pci_dev *pdev = to_pci_dev(dev);
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
int hdm_count, rc, i, ranges = 0;
+ int d = cxlds->cxl_dvsec;
u16 cap, ctrl;
if (!d) {
@@ -353,12 +324,6 @@
if (!hdm_count || hdm_count > 2)
return -EINVAL;
- rc = wait_for_valid(pdev, d);
- if (rc) {
- dev_dbg(dev, "Failure awaiting MEM_INFO_VALID (%d)\n", rc);
- return rc;
- }
-
/*
* The current DVSEC values are moot if the memory capability is
* disabled, and they will remain moot after the HDM Decoder
@@ -376,6 +341,10 @@
u64 base, size;
u32 temp;
+ rc = cxl_dvsec_mem_range_valid(cxlds, i);
+ if (rc)
+ return rc;
+
rc = pci_read_config_dword(
pdev, d + CXL_DVSEC_RANGE_SIZE_HIGH(i), &temp);
if (rc)
@@ -390,10 +359,6 @@
size |= temp & CXL_DVSEC_MEM_SIZE_LOW_MASK;
if (!size) {
- info->dvsec_range[i] = (struct range) {
- .start = 0,
- .end = CXL_RESOURCE_NONE,
- };
continue;
}
@@ -411,12 +376,10 @@
base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
- info->dvsec_range[i] = (struct range) {
+ info->dvsec_range[ranges++] = (struct range) {
.start = base,
.end = base + size - 1
};
-
- ranges++;
}
info->ranges = ranges;
@@ -463,7 +426,15 @@
return -ENODEV;
}
- for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
+ if (!info->mem_enabled) {
+ rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
+ if (rc)
+ return rc;
+
+ return devm_cxl_enable_mem(&port->dev, cxlds);
+ }
+
+ for (i = 0, allowed = 0; i < info->ranges; i++) {
struct device *cxld_dev;
cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
@@ -477,7 +448,7 @@
allowed++;
}
- if (!allowed && info->mem_enabled) {
+ if (!allowed) {
dev_err(dev, "Range register decodes outside platform defined CXL ranges.\n");
return -ENXIO;
}
@@ -491,14 +462,7 @@
* match. If at least one DVSEC range is enabled and allowed, skip HDM
* Decoder Capability Enable.
*/
- if (info->mem_enabled)
- return 0;
-
- rc = devm_cxl_enable_hdm(&port->dev, cxlhdm);
- if (rc)
- return rc;
-
- return devm_cxl_enable_mem(&port->dev, cxlds);
+ return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, CXL);
@@ -772,22 +736,20 @@
static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
{
- struct cxl_rcrb_info *ri = &dport->rcrb;
- void __iomem *dport_aer = NULL;
resource_size_t aer_phys;
struct device *host;
+ u16 aer_cap;
- if (dport->rch && ri->aer_cap) {
+ aer_cap = cxl_rcrb_to_aer(dport->dport_dev, dport->rcrb.base);
+ if (aer_cap) {
host = dport->reg_map.host;
- aer_phys = ri->aer_cap + ri->base;
- dport_aer = devm_cxl_iomap_block(host, aer_phys,
- sizeof(struct aer_capability_regs));
+ aer_phys = aer_cap + dport->rcrb.base;
+ dport->regs.dport_aer = devm_cxl_iomap_block(host, aer_phys,
+ sizeof(struct aer_capability_regs));
}
-
- dport->regs.dport_aer = dport_aer;
}
-static void cxl_dport_map_regs(struct cxl_dport *dport)
+static void cxl_dport_map_ras(struct cxl_dport *dport)
{
struct cxl_register_map *map = &dport->reg_map;
struct device *dev = dport->dport_dev;
@@ -797,22 +759,16 @@
else if (cxl_map_component_regs(map, &dport->regs.component,
BIT(CXL_CM_CAP_CAP_ID_RAS)))
dev_dbg(dev, "Failed to map RAS capability.\n");
-
- if (dport->rch)
- cxl_dport_map_rch_aer(dport);
}
static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
{
void __iomem *aer_base = dport->regs.dport_aer;
- struct pci_host_bridge *bridge;
u32 aer_cmd_mask, aer_cmd;
if (!aer_base)
return;
- bridge = to_pci_host_bridge(dport->dport_dev);
-
/*
* Disable RCH root port command interrupts.
* CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors
@@ -821,34 +777,35 @@
* the root cmd register's interrupts is required. But, PCI spec
* shows these are disabled by default on reset.
*/
- if (bridge->native_aer) {
- aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
- PCI_ERR_ROOT_CMD_NONFATAL_EN |
- PCI_ERR_ROOT_CMD_FATAL_EN);
- aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
- aer_cmd &= ~aer_cmd_mask;
- writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
- }
+ aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
+ PCI_ERR_ROOT_CMD_NONFATAL_EN |
+ PCI_ERR_ROOT_CMD_FATAL_EN);
+ aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND);
+ aer_cmd &= ~aer_cmd_mask;
+ writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
}
-void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport)
+/**
+ * cxl_dport_init_ras_reporting - Set up CXL RAS reporting on this dport
+ * @dport: the cxl_dport that needs to be initialized
+ * @host: host device for devm operations
+ */
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
{
- struct device *dport_dev = dport->dport_dev;
+ dport->reg_map.host = host;
+ cxl_dport_map_ras(dport);
if (dport->rch) {
- struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport_dev);
+ struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
- if (host_bridge->native_aer)
- dport->rcrb.aer_cap = cxl_rcrb_to_aer(dport_dev, dport->rcrb.base);
- }
+ if (!host_bridge->native_aer)
+ return;
- dport->reg_map.host = host;
- cxl_dport_map_regs(dport);
-
- if (dport->rch)
+ cxl_dport_map_rch_aer(dport);
cxl_disable_rch_root_ints(dport);
+ }
}
-EXPORT_SYMBOL_NS_GPL(cxl_setup_parent_dport, CXL);
+EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
struct cxl_dport *dport)
@@ -915,15 +872,13 @@
struct pci_dev *pdev = to_pci_dev(cxlds->dev);
struct aer_capability_regs aer_regs;
struct cxl_dport *dport;
- struct cxl_port *port;
int severity;
- port = cxl_pci_find_port(pdev, &dport);
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
if (!port)
return;
- put_device(&port->dev);
-
if (!cxl_rch_get_aer_info(dport->regs.dport_aer, &aer_regs))
return;
@@ -1076,3 +1031,26 @@
__cxl_endpoint_decoder_reset_detected);
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, CXL);
+
+int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
+{
+ int speed, bw;
+ u16 lnksta;
+ u32 width;
+
+ speed = pcie_link_speed_mbps(pdev);
+ if (speed < 0)
+ return speed;
+ speed /= BITS_PER_BYTE;
+
+ pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
+ width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta);
+ bw = speed * width;
+
+ for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
+ c[i].read_bandwidth = bw;
+ c[i].write_bandwidth = bw;
+ }
+
+ return 0;
+}
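Worked example of the cxl_pci_get_bandwidth() arithmetic above: the per-lane rate from pcie_link_speed_mbps() is in Mb/s, dividing by BITS_PER_BYTE gives MB/s per lane, and the result scales by the negotiated width. The ~15754 Mb/s figure for a 16 GT/s lane (128b/130b encoding) is approximate and used purely for illustration.

#include <stdio.h>

int main(void)
{
	int speed_mbps = 15754;			/* ~16 GT/s lane */
	int width = 8;				/* negotiated x8 link */
	int bw = (speed_mbps / 8) * width;	/* MB/s */

	printf("~%d MB/s read and write bandwidth\n", bw);	/* ~15752 */
	return 0;
}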
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 1d5007e..e666ec6 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -3,7 +3,6 @@
#include <linux/platform_device.h>
#include <linux/memregion.h>
#include <linux/workqueue.h>
-#include <linux/einj-cxl.h>
#include <linux/debugfs.h>
#include <linux/device.h>
#include <linux/module.h>
@@ -11,6 +10,7 @@
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/node.h>
+#include <cxl/einj.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <cxl.h>
@@ -828,27 +828,20 @@
&cxl_einj_inject_fops);
}
-static struct cxl_port *__devm_cxl_add_port(struct device *host,
- struct device *uport_dev,
- resource_size_t component_reg_phys,
- struct cxl_dport *parent_dport)
+static int cxl_port_add(struct cxl_port *port,
+ resource_size_t component_reg_phys,
+ struct cxl_dport *parent_dport)
{
- struct cxl_port *port;
- struct device *dev;
+ struct device *dev __free(put_device) = &port->dev;
int rc;
- port = cxl_port_alloc(uport_dev, parent_dport);
- if (IS_ERR(port))
- return port;
-
- dev = &port->dev;
- if (is_cxl_memdev(uport_dev)) {
- struct cxl_memdev *cxlmd = to_cxl_memdev(uport_dev);
+ if (is_cxl_memdev(port->uport_dev)) {
+ struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
rc = dev_set_name(dev, "endpoint%d", port->id);
if (rc)
- goto err;
+ return rc;
/*
* The endpoint driver already enumerated the component and RAS
@@ -861,19 +854,41 @@
} else if (parent_dport) {
rc = dev_set_name(dev, "port%d", port->id);
if (rc)
- goto err;
+ return rc;
rc = cxl_port_setup_regs(port, component_reg_phys);
if (rc)
- goto err;
- } else
+ return rc;
+ } else {
rc = dev_set_name(dev, "root%d", port->id);
- if (rc)
- goto err;
+ if (rc)
+ return rc;
+ }
rc = device_add(dev);
if (rc)
- goto err;
+ return rc;
+
+ /* Inhibit the __free(put_device) cleanup from being invoked */
+ dev = NULL;
+ return 0;
+}
+
+static struct cxl_port *__devm_cxl_add_port(struct device *host,
+ struct device *uport_dev,
+ resource_size_t component_reg_phys,
+ struct cxl_dport *parent_dport)
+{
+ struct cxl_port *port;
+ int rc;
+
+ port = cxl_port_alloc(uport_dev, parent_dport);
+ if (IS_ERR(port))
+ return port;
+
+ rc = cxl_port_add(port, component_reg_phys, parent_dport);
+ if (rc)
+ return ERR_PTR(rc);
rc = devm_add_action_or_reset(host, unregister_port, port);
if (rc)
@@ -891,10 +906,6 @@
port->pci_latency = cxl_pci_get_latency(to_pci_dev(uport_dev));
return port;
-
-err:
- put_device(dev);
- return ERR_PTR(rc);
}
/**
@@ -941,7 +952,7 @@
port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
if (IS_ERR(port))
- return (struct cxl_root *)port;
+ return ERR_CAST(port);
cxl_root = to_cxl_root(port);
cxl_root->ops = ops;
@@ -1258,18 +1269,13 @@
static int add_ep(struct cxl_ep *new)
{
struct cxl_port *port = new->dport->port;
- int rc;
- device_lock(&port->dev);
- if (port->dead) {
- device_unlock(&port->dev);
+ guard(device)(&port->dev);
+ if (port->dead)
return -ENXIO;
- }
- rc = xa_insert(&port->endpoints, (unsigned long)new->ep, new,
- GFP_KERNEL);
- device_unlock(&port->dev);
- return rc;
+ return xa_insert(&port->endpoints, (unsigned long)new->ep,
+ new, GFP_KERNEL);
}
/**
@@ -1393,14 +1399,14 @@
struct cxl_port *endpoint = cxlmd->endpoint;
struct device *host = endpoint_host(endpoint);
- device_lock(host);
- if (host->driver && !endpoint->dead) {
- devm_release_action(host, cxl_unlink_parent_dport, endpoint);
- devm_release_action(host, cxl_unlink_uport, endpoint);
- devm_release_action(host, unregister_port, endpoint);
+ scoped_guard(device, host) {
+ if (host->driver && !endpoint->dead) {
+ devm_release_action(host, cxl_unlink_parent_dport, endpoint);
+ devm_release_action(host, cxl_unlink_uport, endpoint);
+ devm_release_action(host, unregister_port, endpoint);
+ }
+ cxlmd->endpoint = NULL;
}
- cxlmd->endpoint = NULL;
- device_unlock(host);
put_device(&endpoint->dev);
put_device(host);
}
@@ -1477,12 +1483,11 @@
.cxlmd = cxlmd,
.depth = i,
};
- struct device *dev;
struct cxl_ep *ep;
bool died = false;
- dev = bus_find_device(&cxl_bus_type, NULL, &ctx,
- port_has_memdev);
+ struct device *dev __free(put_device) =
+ bus_find_device(&cxl_bus_type, NULL, &ctx, port_has_memdev);
if (!dev)
continue;
port = to_cxl_port(dev);
@@ -1512,7 +1517,6 @@
dev_name(&port->dev));
delete_switch_port(port);
}
- put_device(&port->dev);
device_unlock(&parent_port->dev);
}
}
@@ -1540,7 +1544,6 @@
struct device *dport_dev)
{
struct device *dparent = grandparent(dport_dev);
- struct cxl_port *port, *parent_port = NULL;
struct cxl_dport *dport, *parent_dport;
resource_size_t component_reg_phys;
int rc;
@@ -1556,50 +1559,52 @@
return -ENXIO;
}
- parent_port = find_cxl_port(dparent, &parent_dport);
+ struct cxl_port *parent_port __free(put_cxl_port) =
+ find_cxl_port(dparent, &parent_dport);
if (!parent_port) {
/* iterate to create this parent_port */
return -EAGAIN;
}
- device_lock(&parent_port->dev);
- if (!parent_port->dev.driver) {
- dev_warn(&cxlmd->dev,
- "port %s:%s disabled, failed to enumerate CXL.mem\n",
- dev_name(&parent_port->dev), dev_name(uport_dev));
- port = ERR_PTR(-ENXIO);
- goto out;
- }
-
- port = find_cxl_port_at(parent_port, dport_dev, &dport);
- if (!port) {
- component_reg_phys = find_component_registers(uport_dev);
- port = devm_cxl_add_port(&parent_port->dev, uport_dev,
- component_reg_phys, parent_dport);
- /* retry find to pick up the new dport information */
- if (!IS_ERR(port))
- port = find_cxl_port_at(parent_port, dport_dev, &dport);
- }
-out:
- device_unlock(&parent_port->dev);
-
- if (IS_ERR(port))
- rc = PTR_ERR(port);
- else {
- dev_dbg(&cxlmd->dev, "add to new port %s:%s\n",
- dev_name(&port->dev), dev_name(port->uport_dev));
- rc = cxl_add_ep(dport, &cxlmd->dev);
- if (rc == -EBUSY) {
- /*
- * "can't" happen, but this error code means
- * something to the caller, so translate it.
- */
- rc = -ENXIO;
+ /*
+ * Declared with __free() here so the reference on the port's device is
+ * dropped before the parent_port reference is released.
+ */
+ struct cxl_port *port __free(put_cxl_port) = NULL;
+ scoped_guard(device, &parent_port->dev) {
+ if (!parent_port->dev.driver) {
+ dev_warn(&cxlmd->dev,
+ "port %s:%s disabled, failed to enumerate CXL.mem\n",
+ dev_name(&parent_port->dev), dev_name(uport_dev));
+ return -ENXIO;
}
- put_device(&port->dev);
+
+ port = find_cxl_port_at(parent_port, dport_dev, &dport);
+ if (!port) {
+ component_reg_phys = find_component_registers(uport_dev);
+ port = devm_cxl_add_port(&parent_port->dev, uport_dev,
+ component_reg_phys, parent_dport);
+ if (IS_ERR(port))
+ return PTR_ERR(port);
+
+ /* retry find to pick up the new dport information */
+ port = find_cxl_port_at(parent_port, dport_dev, &dport);
+ if (!port)
+ return -ENXIO;
+ }
}
- put_device(&parent_port->dev);
+ dev_dbg(&cxlmd->dev, "add to new port %s:%s\n",
+ dev_name(&port->dev), dev_name(port->uport_dev));
+ rc = cxl_add_ep(dport, &cxlmd->dev);
+ if (rc == -EBUSY) {
+ /*
+ * "can't" happen, but this error code means
+ * something to the caller, so translate it.
+ */
+ rc = -ENXIO;
+ }
+
return rc;
}
@@ -1630,7 +1635,6 @@
struct device *dport_dev = grandparent(iter);
struct device *uport_dev;
struct cxl_dport *dport;
- struct cxl_port *port;
/*
* The terminal "grandparent" in PCI is NULL and @platform_bus
@@ -1649,7 +1653,8 @@
dev_dbg(dev, "scan: iter: %s dport_dev: %s parent: %s\n",
dev_name(iter), dev_name(dport_dev),
dev_name(uport_dev));
- port = find_cxl_port(dport_dev, &dport);
+ struct cxl_port *port __free(put_cxl_port) =
+ find_cxl_port(dport_dev, &dport);
if (port) {
dev_dbg(&cxlmd->dev,
"found already registered port %s:%s\n",
@@ -1664,18 +1669,13 @@
* the parent_port lock as the current port may be being
* reaped.
*/
- if (rc && rc != -EBUSY) {
- put_device(&port->dev);
+ if (rc && rc != -EBUSY)
return rc;
- }
/* Any more ports to add between this one and the root? */
- if (!dev_is_cxl_root_child(&port->dev)) {
- put_device(&port->dev);
+ if (!dev_is_cxl_root_child(&port->dev))
continue;
- }
- put_device(&port->dev);
return 0;
}
@@ -1983,7 +1983,6 @@
int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map)
{
struct cxl_port *port;
- int rc;
if (WARN_ON_ONCE(!cxld))
return -EINVAL;
@@ -1993,11 +1992,8 @@
port = to_cxl_port(cxld->dev.parent);
- device_lock(&port->dev);
- rc = cxl_decoder_add_locked(cxld, target_map);
- device_unlock(&port->dev);
-
- return rc;
+ guard(device)(&port->dev);
+ return cxl_decoder_add_locked(cxld, target_map);
}
EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
@@ -2241,6 +2237,26 @@
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_get_perf_coordinates, CXL);
+int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
+ struct access_coordinate *c)
+{
+ struct cxl_dport *dport = port->parent_dport;
+
+ /* Check this port is connected to a switch DSP and not an RP */
+ if (parent_port_is_cxl_root(to_cxl_port(port->dev.parent)))
+ return -ENODEV;
+
+ if (!coordinates_valid(dport->coord))
+ return -EINVAL;
+
+ for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
+ c[i].read_bandwidth = dport->coord[i].read_bandwidth;
+ c[i].write_bandwidth = dport->coord[i].write_bandwidth;
+ }
+
+ return 0;
+}
+
/* for user tooling to ensure port disable work has completed */
static ssize_t flush_store(const struct bus_type *bus, const char *buf, size_t count)
{
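The port.c rework above replaces open-coded device_lock()/put_device() pairs with the scope-based helpers from linux/cleanup.h. A minimal sketch of the two patterns it leans on; the functions are illustrative and not part of the driver.

#include <linux/cleanup.h>
#include <linux/device.h>

/* guard(device)(dev) takes device_lock(dev) and releases it at scope exit */
static int example_check_bound(struct device *dev)
{
	guard(device)(dev);

	if (!dev->driver)
		return -ENXIO;	/* early return still unlocks */
	return 0;
}

/* __free(put_device) drops the reference automatically when 'dev' goes out of scope */
static void example_borrow_ref(struct device *target)
{
	struct device *dev __free(put_device) = get_device(target);

	if (!dev)
		return;
	/* use dev; put_device() runs when this scope is left */
}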
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 21ad5f2..e701e4b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1983,6 +1983,7 @@
* then the region is already committed.
*/
p->state = CXL_CONFIG_COMMIT;
+ cxl_region_shared_upstream_bandwidth_update(cxlr);
return 0;
}
@@ -2004,6 +2005,7 @@
if (rc)
return rc;
p->state = CXL_CONFIG_ACTIVE;
+ cxl_region_shared_upstream_bandwidth_update(cxlr);
}
cxled->cxld.interleave_ways = p->interleave_ways;
@@ -2313,8 +2315,6 @@
struct cxl_region_params *p = &cxlr->params;
int i;
- unregister_memory_notifier(&cxlr->memory_notifier);
- unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
device_del(&cxlr->dev);
/*
@@ -2391,18 +2391,6 @@
return true;
}
-static int cxl_region_nid(struct cxl_region *cxlr)
-{
- struct cxl_region_params *p = &cxlr->params;
- struct resource *res;
-
- guard(rwsem_read)(&cxl_region_rwsem);
- res = p->res;
- if (!res)
- return NUMA_NO_NODE;
- return phys_to_target_node(res->start);
-}
-
static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
unsigned long action, void *arg)
{
@@ -2415,7 +2403,11 @@
if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
return NOTIFY_DONE;
- region_nid = cxl_region_nid(cxlr);
+ /*
+ * No need to hold cxl_region_rwsem; region parameters are stable
+ * within the cxl_region driver.
+ */
+ region_nid = phys_to_target_node(cxlr->params.res->start);
if (nid != region_nid)
return NOTIFY_DONE;
@@ -2434,7 +2426,11 @@
int *adist = data;
int region_nid;
- region_nid = cxl_region_nid(cxlr);
+ /*
+ * No need to hold cxl_region_rwsem; region parameters are stable
+ * within the cxl_region driver.
+ */
+ region_nid = phys_to_target_node(cxlr->params.res->start);
if (nid != region_nid)
return NOTIFY_OK;
@@ -2484,14 +2480,6 @@
if (rc)
goto err;
- cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
- cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
- register_memory_notifier(&cxlr->memory_notifier);
-
- cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
- cxlr->adist_notifier.priority = 100;
- register_mt_adistance_algorithm(&cxlr->adist_notifier);
-
rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
if (rc)
return ERR_PTR(rc);
@@ -3094,11 +3082,11 @@
struct cxl_region *cxlr = _cxlr;
struct cxl_nvdimm_bridge *cxl_nvb = cxlr->cxl_nvb;
- device_lock(&cxl_nvb->dev);
- if (cxlr->cxlr_pmem)
- devm_release_action(&cxl_nvb->dev, cxlr_pmem_unregister,
- cxlr->cxlr_pmem);
- device_unlock(&cxl_nvb->dev);
+ scoped_guard(device, &cxl_nvb->dev) {
+ if (cxlr->cxlr_pmem)
+ devm_release_action(&cxl_nvb->dev, cxlr_pmem_unregister,
+ cxlr->cxlr_pmem);
+ }
cxlr->cxl_nvb = NULL;
put_device(&cxl_nvb->dev);
}
@@ -3134,13 +3122,14 @@
dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
dev_name(dev));
- device_lock(&cxl_nvb->dev);
- if (cxl_nvb->dev.driver)
- rc = devm_add_action_or_reset(&cxl_nvb->dev,
- cxlr_pmem_unregister, cxlr_pmem);
- else
- rc = -ENXIO;
- device_unlock(&cxl_nvb->dev);
+ scoped_guard(device, &cxl_nvb->dev) {
+ if (cxl_nvb->dev.driver)
+ rc = devm_add_action_or_reset(&cxl_nvb->dev,
+ cxlr_pmem_unregister,
+ cxlr_pmem);
+ else
+ rc = -ENXIO;
+ }
if (rc)
goto err_bridge;
@@ -3386,6 +3375,14 @@
return 1;
}
+static void shutdown_notifiers(void *_cxlr)
+{
+ struct cxl_region *cxlr = _cxlr;
+
+ unregister_memory_notifier(&cxlr->memory_notifier);
+ unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
+}
+
static int cxl_region_probe(struct device *dev)
{
struct cxl_region *cxlr = to_cxl_region(dev);
@@ -3421,6 +3418,18 @@
if (rc)
return rc;
+ cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
+ cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
+ register_memory_notifier(&cxlr->memory_notifier);
+
+ cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
+ cxlr->adist_notifier.priority = 100;
+ register_mt_adistance_algorithm(&cxlr->adist_notifier);
+
+ rc = devm_add_action_or_reset(&cxlr->dev, shutdown_notifiers, cxlr);
+ if (rc)
+ return rc;
+
switch (cxlr->mode) {
case CXL_DECODER_PMEM:
return devm_cxl_add_pmem_region(cxlr);
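The region.c change above moves notifier registration into cxl_region_probe() and undoes it with a devm action rather than an explicit unregister path. A hedged sketch of that register-in-probe / devm-teardown shape; the helper names and the zero priority are illustrative.

#include <linux/device.h>
#include <linux/memory.h>
#include <linux/notifier.h>

static void example_unregister(void *data)
{
	unregister_memory_notifier(data);
}

static int example_register(struct device *dev, struct notifier_block *nb,
			    notifier_fn_t cb)
{
	nb->notifier_call = cb;
	nb->priority = 0;
	register_memory_notifier(nb);

	/* runs example_unregister() automatically when @dev is unbound */
	return devm_add_action_or_reset(dev, example_unregister, nb);
}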
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9afb407..0d8b810 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -744,6 +744,7 @@
void put_cxl_root(struct cxl_root *cxl_root);
DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))
+DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
void cxl_bus_rescan(void);
void cxl_bus_drain(void);
@@ -762,9 +763,10 @@
#ifdef CONFIG_PCIEAER_CXL
void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport);
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
#else
-static inline void cxl_setup_parent_dport(struct device *host,
- struct cxl_dport *dport) { }
+static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
+ struct device *host) { }
#endif
struct cxl_decoder *to_cxl_decoder(struct device *dev);
@@ -809,7 +811,7 @@
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
int devm_cxl_add_passthrough_decoder(struct cxl_port *port);
-int cxl_dvsec_rr_decode(struct device *dev, int dvsec,
+int cxl_dvsec_rr_decode(struct device *dev, struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info);
bool is_cxl_region(struct device *dev);
@@ -889,6 +891,7 @@
struct access_coordinate *coord);
void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled);
+void cxl_region_shared_upstream_bandwidth_update(struct cxl_region *cxlr);
void cxl_memdev_update_perf(struct cxl_memdev *cxlmd);
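Hedged illustration of the DEFINE_FREE()/__free() pairing that the new put_cxl_port helper above relies on (linux/cleanup.h): DEFINE_FREE(name, type, expr) generates the cleanup hook, __free(name) attaches it to a local so the expression runs when the variable leaves scope, and return_ptr()/no_free_ptr() transfer ownership out. The kfree_buf name and example_alloc() are made up for this sketch.

#include <linux/cleanup.h>
#include <linux/slab.h>

DEFINE_FREE(kfree_buf, char *, if (_T) kfree(_T))

static char *example_alloc(size_t len)
{
	char *buf __free(kfree_buf) = kzalloc(len, GFP_KERNEL);

	if (!buf)
		return NULL;

	/* on any early return above, kfree() would run automatically */
	return_ptr(buf);	/* hand ownership to the caller, skip the free */
}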
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index afb53d0..2a25d19 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -3,11 +3,12 @@
#ifndef __CXL_MEM_H__
#define __CXL_MEM_H__
#include <uapi/linux/cxl_mem.h>
+#include <linux/pci.h>
#include <linux/cdev.h>
#include <linux/uuid.h>
-#include <linux/rcuwait.h>
-#include <linux/cxl-event.h>
#include <linux/node.h>
+#include <cxl/event.h>
+#include <cxl/mailbox.h>
#include "cxl.h"
/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -397,11 +398,13 @@
* struct cxl_dpa_perf - DPA performance property entry
* @dpa_range: range for DPA address
* @coord: QoS performance data (i.e. latency, bandwidth)
+ * @cdat_coord: raw QoS performance data from CDAT
* @qos_class: QoS Class cookies
*/
struct cxl_dpa_perf {
struct range dpa_range;
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+ struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
int qos_class;
};
@@ -424,6 +427,7 @@
* @ram_res: Active Volatile memory capacity configuration
* @serial: PCIe Device Serial Number
* @type: Generic Memory Class device or Vendor Specific Memory device
+ * @cxl_mbox: CXL mailbox context
*/
struct cxl_dev_state {
struct device *dev;
@@ -438,8 +442,14 @@
struct resource ram_res;
u64 serial;
enum cxl_devtype type;
+ struct cxl_mailbox cxl_mbox;
};
+static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
+{
+ return dev_get_drvdata(cxl_mbox->host);
+}
+
/**
* struct cxl_memdev_state - Generic Type-3 Memory Device Class driver data
*
@@ -448,11 +458,8 @@
* the functionality related to that like Identify Memory Device and Get
* Partition Info
* @cxlds: Core driver state common across Type-2 and Type-3 devices
- * @payload_size: Size of space for payload
- * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
* @lsa_size: Size of Label Storage Area
* (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
- * @mbox_mutex: Mutex to synchronize mailbox access.
* @firmware_version: Firmware version for the memory device.
* @enabled_cmds: Hardware commands found enabled in CEL.
* @exclusive_cmds: Commands that are kernel-internal only
@@ -470,17 +477,13 @@
* @poison: poison driver state info
* @security: security driver state info
* @fw: firmware upload / activation state
- * @mbox_wait: RCU wait for mbox send completely
- * @mbox_send: @dev specific transport for transmitting mailbox commands
*
* See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
* details on capacity parameters.
*/
struct cxl_memdev_state {
struct cxl_dev_state cxlds;
- size_t payload_size;
size_t lsa_size;
- struct mutex mbox_mutex; /* Protects device mailbox and firmware */
char firmware_version[0x10];
DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
@@ -500,10 +503,6 @@
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
-
- struct rcuwait mbox_wait;
- int (*mbox_send)(struct cxl_memdev_state *mds,
- struct cxl_mbox_cmd *cmd);
};
static inline struct cxl_memdev_state *
@@ -814,7 +813,7 @@
CXL_PMEM_SEC_PASS_USER,
};
-int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
+int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd);
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 7de232e..a9fd5cd 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -109,7 +109,6 @@
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct device *endpoint_parent;
- struct cxl_port *parent_port;
struct cxl_dport *dport;
struct dentry *dentry;
int rc;
@@ -146,7 +145,8 @@
if (rc)
return rc;
- parent_port = cxl_mem_find_port(cxlmd, &dport);
+ struct cxl_port *parent_port __free(put_cxl_port) =
+ cxl_mem_find_port(cxlmd, &dport);
if (!parent_port) {
dev_err(dev, "CXL port topology not found\n");
return -ENXIO;
@@ -166,23 +166,20 @@
else
endpoint_parent = &parent_port->dev;
- cxl_setup_parent_dport(dev, dport);
+ cxl_dport_init_ras_reporting(dport, dev);
- device_lock(endpoint_parent);
- if (!endpoint_parent->driver) {
- dev_err(dev, "CXL port topology %s not enabled\n",
- dev_name(endpoint_parent));
- rc = -ENXIO;
- goto unlock;
+ scoped_guard(device, endpoint_parent) {
+ if (!endpoint_parent->driver) {
+ dev_err(dev, "CXL port topology %s not enabled\n",
+ dev_name(endpoint_parent));
+ return -ENXIO;
+ }
+
+ rc = devm_cxl_add_endpoint(endpoint_parent, cxlmd, dport);
+ if (rc)
+ return rc;
}
- rc = devm_cxl_add_endpoint(endpoint_parent, cxlmd, dport);
-unlock:
- device_unlock(endpoint_parent);
- put_device(&parent_port->dev);
- if (rc)
- return rc;
-
/*
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4be35dc..3716417 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -11,6 +11,7 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/io.h>
+#include <cxl/mailbox.h>
#include "cxlmem.h"
#include "cxlpci.h"
#include "cxl.h"
@@ -124,6 +125,7 @@
u16 opcode;
struct cxl_dev_id *dev_id = id;
struct cxl_dev_state *cxlds = dev_id->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
if (!cxl_mbox_background_complete(cxlds))
@@ -132,13 +134,13 @@
reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
if (opcode == CXL_MBOX_OP_SANITIZE) {
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (mds->security.sanitize_node)
mod_delayed_work(system_wq, &mds->security.poll_dwork, 0);
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
} else {
/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
- rcuwait_wake_up(&mds->mbox_wait);
+ rcuwait_wake_up(&cxl_mbox->mbox_wait);
}
return IRQ_HANDLED;
@@ -152,8 +154,9 @@
struct cxl_memdev_state *mds =
container_of(work, typeof(*mds), security.poll_dwork.work);
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (cxl_mbox_background_complete(cxlds)) {
mds->security.poll_tmo_secs = 0;
if (mds->security.sanitize_node)
@@ -167,12 +170,12 @@
mds->security.poll_tmo_secs = min(15 * 60, timeout);
schedule_delayed_work(&mds->security.poll_dwork, timeout * HZ);
}
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
}
/**
* __cxl_pci_mbox_send_cmd() - Execute a mailbox command
- * @mds: The memory device driver data
+ * @cxl_mbox: CXL mailbox context
* @mbox_cmd: Command to send to the memory device.
*
* Context: Any context. Expects mbox_mutex to be held.
@@ -192,17 +195,18 @@
* not need to coordinate with each other. The driver only uses the primary
* mailbox.
*/
-static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
+static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *mbox_cmd)
{
- struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_dev_state *cxlds = mbox_to_cxlds(cxl_mbox);
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
void __iomem *payload = cxlds->regs.mbox + CXLDEV_MBOX_PAYLOAD_OFFSET;
struct device *dev = cxlds->dev;
u64 cmd_reg, status_reg;
size_t out_len;
int rc;
- lockdep_assert_held(&mds->mbox_mutex);
+ lockdep_assert_held(&cxl_mbox->mbox_mutex);
/*
* Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
@@ -315,10 +319,10 @@
timeout = mbox_cmd->poll_interval_ms;
for (i = 0; i < mbox_cmd->poll_count; i++) {
- if (rcuwait_wait_event_timeout(&mds->mbox_wait,
- cxl_mbox_background_complete(cxlds),
- TASK_UNINTERRUPTIBLE,
- msecs_to_jiffies(timeout)) > 0)
+ if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
+ cxl_mbox_background_complete(cxlds),
+ TASK_UNINTERRUPTIBLE,
+ msecs_to_jiffies(timeout)) > 0)
break;
}
@@ -360,7 +364,7 @@
*/
size_t n;
- n = min3(mbox_cmd->size_out, mds->payload_size, out_len);
+ n = min3(mbox_cmd->size_out, cxl_mbox->payload_size, out_len);
memcpy_fromio(mbox_cmd->payload_out, payload, n);
mbox_cmd->size_out = n;
} else {
@@ -370,14 +374,14 @@
return 0;
}
-static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
+static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd)
{
int rc;
- mutex_lock_io(&mds->mbox_mutex);
- rc = __cxl_pci_mbox_send_cmd(mds, cmd);
- mutex_unlock(&mds->mbox_mutex);
+ mutex_lock_io(&cxl_mbox->mbox_mutex);
+ rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
return rc;
}
@@ -385,6 +389,7 @@
static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
struct device *dev = cxlds->dev;
unsigned long timeout;
@@ -417,8 +422,8 @@
return -ETIMEDOUT;
}
- mds->mbox_send = cxl_pci_mbox_send;
- mds->payload_size =
+ cxl_mbox->mbox_send = cxl_pci_mbox_send;
+ cxl_mbox->payload_size =
1 << FIELD_GET(CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK, cap);
/*
@@ -428,16 +433,15 @@
* there's no point in going forward. If the size is too large, there's
* no harm is soft limiting it.
*/
- mds->payload_size = min_t(size_t, mds->payload_size, SZ_1M);
- if (mds->payload_size < 256) {
+ cxl_mbox->payload_size = min_t(size_t, cxl_mbox->payload_size, SZ_1M);
+ if (cxl_mbox->payload_size < 256) {
dev_err(dev, "Mailbox is too small (%zub)",
- mds->payload_size);
+ cxl_mbox->payload_size);
return -ENXIO;
}
- dev_dbg(dev, "Mailbox payload sized %zu", mds->payload_size);
+ dev_dbg(dev, "Mailbox payload sized %zu", cxl_mbox->payload_size);
- rcuwait_init(&mds->mbox_wait);
INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mbox_sanitize_work);
/* background command interrupts are optional */
@@ -473,7 +477,6 @@
static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
struct cxl_register_map *map)
{
- struct cxl_port *port;
struct cxl_dport *dport;
resource_size_t component_reg_phys;
@@ -482,14 +485,12 @@
.resource = CXL_RESOURCE_NONE,
};
- port = cxl_pci_find_port(pdev, &dport);
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
if (!port)
return -EPROBE_DEFER;
component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
-
- put_device(&port->dev);
-
if (component_reg_phys == CXL_RESOURCE_NONE)
return -ENXIO;
@@ -578,9 +579,10 @@
*/
static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_get_event_payload *buf;
- buf = kvmalloc(mds->payload_size, GFP_KERNEL);
+ buf = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
if (!buf)
return -ENOMEM;
mds->event.buf = buf;
@@ -653,6 +655,7 @@
static int cxl_event_get_int_policy(struct cxl_memdev_state *mds,
struct cxl_event_interrupt_policy *policy)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd = {
.opcode = CXL_MBOX_OP_GET_EVT_INT_POLICY,
.payload_out = policy,
@@ -660,7 +663,7 @@
};
int rc;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
dev_err(mds->cxlds.dev,
"Failed to get event interrupt policy : %d", rc);
@@ -671,6 +674,7 @@
static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
struct cxl_event_interrupt_policy *policy)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -687,7 +691,7 @@
.size_in = sizeof(*policy),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
dev_err(mds->cxlds.dev, "Failed to set event interrupt policy : %d",
rc);
@@ -786,6 +790,23 @@
return 0;
}
+static int cxl_pci_type3_init_mailbox(struct cxl_dev_state *cxlds)
+{
+ int rc;
+
+ /*
+ * Fail the init if there's no mailbox. For a type3 this is out of spec.
+ */
+ if (!cxlds->reg_map.device_map.mbox.valid)
+ return -ENODEV;
+
+ rc = cxl_mailbox_init(&cxlds->cxl_mbox, cxlds->dev);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
@@ -846,6 +867,10 @@
if (rc)
dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
+ rc = cxl_pci_type3_init_mailbox(cxlds);
+ if (rc)
+ return rc;
+
rc = cxl_await_media_ready(cxlds);
if (rc == 0)
cxlds->media_ready = true;
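Sketch of the bring-up contract the new type3 mailbox init relies on (illustrative helper, not driver code): the mailbox's @host is the device whose drvdata points back at cxl_dev_state, which is what lets mbox_to_cxlds() from the cxlmem.h hunk recover the owning state from the mailbox alone. cxl_pci establishes the drvdata via pci_set_drvdata(pdev, cxlds); dev_set_drvdata() below stands in for that.

/* assumes cxlmem.h / <cxl/mailbox.h> context */
static int example_bringup(struct cxl_dev_state *cxlds)
{
	int rc;

	rc = cxl_mailbox_init(&cxlds->cxl_mbox, cxlds->dev);
	if (rc)
		return rc;

	dev_set_drvdata(cxlds->dev, cxlds);	/* lets mbox_to_cxlds() work */
	return 0;
}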
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
index 4ef93da..a6538a5 100644
--- a/drivers/cxl/pmem.c
+++ b/drivers/cxl/pmem.c
@@ -102,13 +102,15 @@
struct nd_cmd_get_config_size *cmd,
unsigned int buf_len)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
if (sizeof(*cmd) > buf_len)
return -EINVAL;
*cmd = (struct nd_cmd_get_config_size){
.config_size = mds->lsa_size,
.max_xfer =
- mds->payload_size - sizeof(struct cxl_mbox_set_lsa),
+ cxl_mbox->payload_size - sizeof(struct cxl_mbox_set_lsa),
};
return 0;
@@ -118,6 +120,7 @@
struct nd_cmd_get_config_data_hdr *cmd,
unsigned int buf_len)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_lsa get_lsa;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -139,7 +142,7 @@
.payload_out = cmd->out_buf,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
cmd->status = 0;
return rc;
@@ -149,6 +152,7 @@
struct nd_cmd_set_config_hdr *cmd,
unsigned int buf_len)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_set_lsa *set_lsa;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -175,7 +179,7 @@
.size_in = struct_size(set_lsa, data, cmd->in_length),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
/*
* Set "firmware" status (4-packed bytes at the end of the input
@@ -233,15 +237,13 @@
if (!is_cxl_nvdimm(dev))
return 0;
- device_lock(dev);
- if (!dev->driver)
- goto out;
-
- cxl_nvd = to_cxl_nvdimm(dev);
- if (cxl_nvd->cxlmd && cxl_nvd->cxlmd->cxl_nvb == data)
- release = true;
-out:
- device_unlock(dev);
+ scoped_guard(device, dev) {
+ if (dev->driver) {
+ cxl_nvd = to_cxl_nvdimm(dev);
+ if (cxl_nvd->cxlmd && cxl_nvd->cxlmd->cxl_nvb == data)
+ release = true;
+ }
+ }
if (release)
device_release_driver(dev);
return 0;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index d7d5d98..861dde6 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -98,7 +98,7 @@
struct cxl_port *root;
int rc;
- rc = cxl_dvsec_rr_decode(cxlds->dev, cxlds->cxl_dvsec, &info);
+ rc = cxl_dvsec_rr_decode(cxlds->dev, port, &info);
if (rc < 0)
return rc;
diff --git a/drivers/cxl/security.c b/drivers/cxl/security.c
index 21856a3..452d1a9 100644
--- a/drivers/cxl/security.c
+++ b/drivers/cxl/security.c
@@ -14,6 +14,7 @@
{
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
unsigned long security_flags = 0;
struct cxl_get_security_output {
@@ -29,7 +30,7 @@
.payload_out = &out,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return 0;
@@ -70,7 +71,7 @@
{
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
struct cxl_set_pass set_pass;
@@ -87,7 +88,7 @@
.payload_in = &set_pass,
};
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
static int __cxl_pmem_security_disable(struct nvdimm *nvdimm,
@@ -96,7 +97,7 @@
{
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_disable_pass dis_pass;
struct cxl_mbox_cmd mbox_cmd;
@@ -112,7 +113,7 @@
.payload_in = &dis_pass,
};
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
static int cxl_pmem_security_disable(struct nvdimm *nvdimm,
@@ -131,12 +132,12 @@
{
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_mbox_cmd mbox_cmd = {
.opcode = CXL_MBOX_OP_FREEZE_SECURITY,
};
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
static int cxl_pmem_security_unlock(struct nvdimm *nvdimm,
@@ -144,7 +145,7 @@
{
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
u8 pass[NVDIMM_PASSPHRASE_LEN];
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -156,7 +157,7 @@
.payload_in = pass,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
@@ -169,7 +170,7 @@
{
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
struct cxl_pass_erase erase;
int rc;
@@ -185,7 +186,7 @@
.payload_in = &erase,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c
index 518eaa0..b360dca 100644
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -1911,7 +1911,6 @@
const struct file_operations fw_device_ops = {
.owner = THIS_MODULE,
- .llseek = no_llseek,
.open = fw_device_op_open,
.read = fw_device_op_read,
.unlocked_ioctl = fw_device_op_ioctl,
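This and the following hunks drop ".llseek = no_llseek" initializers: the helper was removed upstream, and leaving .llseek unset is now how a non-seekable file is expressed. A minimal hedged sketch of a post-removal fops table (the ops are illustrative):

    #include <linux/fs.h>
    #include <linux/module.h>

    static const struct file_operations example_fops = {
            .owner  = THIS_MODULE,
            .open   = simple_open,	/* generic open helper from <linux/fs.h> */
            /* .llseek deliberately left unset: the device is not seekable,
             * which is what the removed no_llseek helper used to signal. */
    };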
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 69c1513..88c5c4f 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -2886,7 +2886,6 @@
static const struct file_operations fops_reset_counts = {
.owner = THIS_MODULE,
.open = simple_open,
- .llseek = no_llseek,
.write = reset_all_on_write,
};
diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c
index 130d13e..9e89a6a 100644
--- a/drivers/firmware/arm_scmi/raw_mode.c
+++ b/drivers/firmware/arm_scmi/raw_mode.c
@@ -950,7 +950,6 @@
.open = scmi_dbg_raw_mode_open,
.release = scmi_dbg_raw_mode_release,
.write = scmi_dbg_raw_mode_reset_write,
- .llseek = no_llseek,
.owner = THIS_MODULE,
};
@@ -960,7 +959,6 @@
.read = scmi_dbg_raw_mode_message_read,
.write = scmi_dbg_raw_mode_message_write,
.poll = scmi_dbg_raw_mode_message_poll,
- .llseek = no_llseek,
.owner = THIS_MODULE,
};
@@ -977,7 +975,6 @@
.read = scmi_dbg_raw_mode_message_read,
.write = scmi_dbg_raw_mode_message_async_write,
.poll = scmi_dbg_raw_mode_message_poll,
- .llseek = no_llseek,
.owner = THIS_MODULE,
};
@@ -1001,7 +998,6 @@
.release = scmi_dbg_raw_mode_release,
.read = scmi_test_dbg_raw_mode_notif_read,
.poll = scmi_test_dbg_raw_mode_notif_poll,
- .llseek = no_llseek,
.owner = THIS_MODULE,
};
@@ -1025,7 +1021,6 @@
.release = scmi_dbg_raw_mode_release,
.read = scmi_test_dbg_raw_mode_errors_read,
.poll = scmi_test_dbg_raw_mode_errors_poll,
- .llseek = no_llseek,
.owner = THIS_MODULE,
};
diff --git a/drivers/firmware/efi/capsule-loader.c b/drivers/firmware/efi/capsule-loader.c
index 97bafb5..0c17bdd 100644
--- a/drivers/firmware/efi/capsule-loader.c
+++ b/drivers/firmware/efi/capsule-loader.c
@@ -309,7 +309,6 @@
.open = efi_capsule_open,
.write = efi_capsule_write,
.release = efi_capsule_release,
- .llseek = no_llseek,
};
static struct miscdevice efi_capsule_misc = {
diff --git a/drivers/firmware/efi/test/efi_test.c b/drivers/firmware/efi/test/efi_test.c
index 47d67bb..9e26287 100644
--- a/drivers/firmware/efi/test/efi_test.c
+++ b/drivers/firmware/efi/test/efi_test.c
@@ -750,7 +750,6 @@
.unlocked_ioctl = efi_test_ioctl,
.open = efi_test_open,
.release = efi_test_close,
- .llseek = no_llseek,
};
static struct miscdevice efi_test_dev = {
diff --git a/drivers/firmware/turris-mox-rwtm.c b/drivers/firmware/turris-mox-rwtm.c
index 525ebdc..f3bc0d4 100644
--- a/drivers/firmware/turris-mox-rwtm.c
+++ b/drivers/firmware/turris-mox-rwtm.c
@@ -386,7 +386,6 @@
.open = rwtm_debug_open,
.read = do_sign_read,
.write = do_sign_write,
- .llseek = no_llseek,
};
static void rwtm_debugfs_release(void *root)
diff --git a/drivers/gnss/core.c b/drivers/gnss/core.c
index 48f2ee0..883ef86 100644
--- a/drivers/gnss/core.c
+++ b/drivers/gnss/core.c
@@ -206,7 +206,6 @@
.read = gnss_read,
.write = gnss_write,
.poll = gnss_poll,
- .llseek = no_llseek,
};
static struct class *gnss_class;
diff --git a/drivers/gpio/gpio-mockup.c b/drivers/gpio/gpio-mockup.c
index 455eecf..d39c661 100644
--- a/drivers/gpio/gpio-mockup.c
+++ b/drivers/gpio/gpio-mockup.c
@@ -347,7 +347,6 @@
.open = gpio_mockup_debugfs_open,
.read = gpio_mockup_debugfs_read,
.write = gpio_mockup_debugfs_write,
- .llseek = no_llseek,
.release = single_release,
};
diff --git a/drivers/gpio/gpio-sloppy-logic-analyzer.c b/drivers/gpio/gpio-sloppy-logic-analyzer.c
index aed6d1f6..07e0d71 100644
--- a/drivers/gpio/gpio-sloppy-logic-analyzer.c
+++ b/drivers/gpio/gpio-sloppy-logic-analyzer.c
@@ -217,7 +217,6 @@
.owner = THIS_MODULE,
.open = trigger_open,
.write = trigger_write,
- .llseek = no_llseek,
.release = single_release,
};
diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index 5aac59d..78c9d9e 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -2842,7 +2842,6 @@
.poll = lineinfo_watch_poll,
.read = lineinfo_watch_read,
.owner = THIS_MODULE,
- .llseek = no_llseek,
.unlocked_ioctl = gpio_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = gpio_ioctl_compat,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index dcd5904..9b1e0ede 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1083,10 +1083,6 @@
struct amdgpu_virt virt;
- /* link all shadow bo */
- struct list_head shadow_list;
- struct mutex shadow_list_lock;
-
/* record hw reset is performed */
bool has_hw_reset;
u8 reset_magic[AMDGPU_RESET_MAGIC_NUM];
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index 57bda66..2ca1271 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
@@ -511,7 +511,7 @@
return -EINVAL;
}
- /* udpate aca bank to aca source error_cache first */
+ /* update aca bank to aca source error_cache first */
ret = aca_banks_update(adev, smu_type, handler_aca_log_bank_error, qctx, NULL);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 1254a43..3bc0cbf45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -950,28 +950,30 @@
* @inst: xcc's instance number on a multi-XCC setup
*/
static void get_wave_count(struct amdgpu_device *adev, int queue_idx,
- int *wave_cnt, int *vmid, uint32_t inst)
+ struct kfd_cu_occupancy *queue_cnt, uint32_t inst)
{
int pipe_idx;
int queue_slot;
unsigned int reg_val;
-
+ unsigned int wave_cnt;
/*
* Program GRBM with appropriate MEID, PIPEID, QUEUEID and VMID
* parameters to read out waves in flight. Get VMID if there are
* non-zero waves in flight.
*/
- *vmid = 0xFF;
- *wave_cnt = 0;
pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
- soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0, inst);
- reg_val = RREG32_SOC15_IP(GC, SOC15_REG_OFFSET(GC, inst, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
- queue_slot);
- *wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
- if (*wave_cnt != 0)
- *vmid = (RREG32_SOC15(GC, inst, mmCP_HQD_VMID) &
- CP_HQD_VMID__VMID_MASK) >> CP_HQD_VMID__VMID__SHIFT;
+ soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0, GET_INST(GC, inst));
+ reg_val = RREG32_SOC15_IP(GC, SOC15_REG_OFFSET(GC, GET_INST(GC, inst),
+ mmSPI_CSQ_WF_ACTIVE_COUNT_0) + queue_slot);
+ wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
+ if (wave_cnt != 0) {
+ queue_cnt->wave_cnt += wave_cnt;
+ queue_cnt->doorbell_off =
+ (RREG32_SOC15(GC, GET_INST(GC, inst), mmCP_HQD_PQ_DOORBELL_CONTROL) &
+ CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET_MASK) >>
+ CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
+ }
}
/**
@@ -981,9 +983,8 @@
* or more queues running and submitting waves to compute units.
*
* @adev: Handle of device from which to get number of waves in flight
- * @pasid: Identifies the process for which this query call is invoked
- * @pasid_wave_cnt: Output parameter updated with number of waves in flight that
- * belong to process with given pasid
+ * @cu_occupancy: Array that gets filled with wave_cnt and doorbell offset
+ * for comparison later.
* @max_waves_per_cu: Output parameter updated with maximum number of waves
* possible per Compute Unit
* @inst: xcc's instance number on a multi-XCC setup
@@ -1011,34 +1012,28 @@
* number of waves that are in flight for the queue at specified index. The
* index ranges from 0 to 7.
*
- * If non-zero waves are in flight, read CP_HQD_VMID register to obtain VMID
- * of the wave(s).
+ * If non-zero waves are in flight, store the corresponding doorbell offset
+ * of the queue, along with the wave count.
*
- * Determine if VMID from above step maps to pasid provided as parameter. If
- * it matches agrregate the wave count. That the VMID will not match pasid is
- * a normal condition i.e. a device is expected to support multiple queues
- * from multiple proceses.
+ * Determine if the queue belongs to the process by comparing the doorbell
+ * offset against the process's queues. If it matches, aggregate the wave
+ * count for the process.
*
* Reading registers referenced above involves programming GRBM appropriately
*/
-void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device *adev, int pasid,
- int *pasid_wave_cnt, int *max_waves_per_cu, uint32_t inst)
+void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device *adev,
+ struct kfd_cu_occupancy *cu_occupancy,
+ int *max_waves_per_cu, uint32_t inst)
{
int qidx;
- int vmid;
int se_idx;
- int sh_idx;
int se_cnt;
- int sh_cnt;
- int wave_cnt;
int queue_map;
- int pasid_tmp;
int max_queue_cnt;
- int vmid_wave_cnt = 0;
DECLARE_BITMAP(cp_queue_bitmap, AMDGPU_MAX_QUEUES);
lock_spi_csq_mutexes(adev);
- soc15_grbm_select(adev, 1, 0, 0, 0, inst);
+ soc15_grbm_select(adev, 1, 0, 0, 0, GET_INST(GC, inst));
/*
* Iterate through the shader engines and arrays of the device
@@ -1048,51 +1043,38 @@
AMDGPU_MAX_QUEUES);
max_queue_cnt = adev->gfx.mec.num_pipe_per_mec *
adev->gfx.mec.num_queue_per_pipe;
- sh_cnt = adev->gfx.config.max_sh_per_se;
se_cnt = adev->gfx.config.max_shader_engines;
for (se_idx = 0; se_idx < se_cnt; se_idx++) {
- for (sh_idx = 0; sh_idx < sh_cnt; sh_idx++) {
+ amdgpu_gfx_select_se_sh(adev, se_idx, 0, 0xffffffff, inst);
+ queue_map = RREG32_SOC15(GC, GET_INST(GC, inst), mmSPI_CSQ_WF_ACTIVE_STATUS);
- amdgpu_gfx_select_se_sh(adev, se_idx, sh_idx, 0xffffffff, inst);
- queue_map = RREG32_SOC15(GC, inst, mmSPI_CSQ_WF_ACTIVE_STATUS);
-
- /*
- * Assumption: queue map encodes following schema: four
- * pipes per each micro-engine, with each pipe mapping
- * eight queues. This schema is true for GFX9 devices
- * and must be verified for newer device families
+ /*
+ * Assumption: queue map encodes following schema: four
+ * pipes per each micro-engine, with each pipe mapping
+ * eight queues. This schema is true for GFX9 devices
+ * and must be verified for newer device families
+ */
+ for (qidx = 0; qidx < max_queue_cnt; qidx++) {
+ /* Skip queues that are not associated with
+ * compute functions
*/
- for (qidx = 0; qidx < max_queue_cnt; qidx++) {
+ if (!test_bit(qidx, cp_queue_bitmap))
+ continue;
- /* Skip qeueus that are not associated with
- * compute functions
- */
- if (!test_bit(qidx, cp_queue_bitmap))
- continue;
+ if (!(queue_map & (1 << qidx)))
+ continue;
- if (!(queue_map & (1 << qidx)))
- continue;
-
- /* Get number of waves in flight and aggregate them */
- get_wave_count(adev, qidx, &wave_cnt, &vmid,
- inst);
- if (wave_cnt != 0) {
- pasid_tmp =
- RREG32(SOC15_REG_OFFSET(OSSSYS, inst,
- mmIH_VMID_0_LUT) + vmid);
- if (pasid_tmp == pasid)
- vmid_wave_cnt += wave_cnt;
- }
- }
+ /* Get number of waves in flight and aggregate them */
+ get_wave_count(adev, qidx, &cu_occupancy[qidx],
+ inst);
}
}
amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, inst);
- soc15_grbm_select(adev, 0, 0, 0, 0, inst);
+ soc15_grbm_select(adev, 0, 0, 0, 0, GET_INST(GC, inst));
unlock_spi_csq_mutexes(adev);
/* Update the output parameters and return */
- *pasid_wave_cnt = vmid_wave_cnt;
*max_waves_per_cu = adev->gfx.cu_info.simd_per_cu *
adev->gfx.cu_info.max_waves_per_simd;
}
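With the VMID/PASID lookup gone, the GPU side now only reports (wave_cnt, doorbell_off) pairs per active queue; attributing them to a process happens by matching doorbell offsets against the process's own queues. A hedged sketch of that consumer-side aggregation, using the field names from the hunk but otherwise illustrative names and types:

    /* Sum the waves of only those sampled queues whose doorbell offset
     * belongs to the process; samples with wave_cnt == 0 never match. */
    static int example_sum_process_waves(const struct kfd_cu_occupancy *samples,
                                         int nr_samples,
                                         const u32 *proc_doorbells, int nr_queues)
    {
            int i, j, total = 0;

            for (i = 0; i < nr_samples; i++) {
                    if (!samples[i].wave_cnt)
                            continue;
                    for (j = 0; j < nr_queues; j++) {
                            if (samples[i].doorbell_off == proc_doorbells[j]) {
                                    total += samples[i].wave_cnt;
                                    break;
                            }
                    }
            }
            return total;
    }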
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index 988c50a..b6a91a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -52,8 +52,9 @@
uint8_t vmid, uint16_t *p_pasid);
void kgd_gfx_v9_set_vm_context_page_table_base(struct amdgpu_device *adev,
uint32_t vmid, uint64_t page_table_base);
-void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device *adev, int pasid,
- int *pasid_wave_cnt, int *max_waves_per_cu, uint32_t inst);
+void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device *adev,
+ struct kfd_cu_occupancy *cu_occupancy,
+ int *max_waves_per_cu, uint32_t inst);
void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
uint32_t inst);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 4afef5b..ce5ca30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1499,7 +1499,7 @@
}
}
- ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
+ ret = amdgpu_bo_pin(bo, domain);
if (ret)
pr_err("Error in Pinning BO to domain: %d\n", domain);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
index 42e64bc..45affc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
@@ -87,8 +87,9 @@
* part of the system bios. On boot, the system bios puts a
* copy of the igp rom at the start of vram if a discrete card is
* present.
+ * For SR-IOV, the vbios image is also put in VRAM in the VF.
*/
-static bool igp_read_bios_from_vram(struct amdgpu_device *adev)
+static bool amdgpu_read_bios_from_vram(struct amdgpu_device *adev)
{
uint8_t __iomem *bios;
resource_size_t vram_base;
@@ -284,10 +285,6 @@
acpi_status status;
bool found = false;
- /* ATRM is for the discrete card only */
- if (adev->flags & AMD_IS_APU)
- return false;
-
/* ATRM is for on-platform devices only */
if (dev_is_removable(&adev->pdev->dev))
return false;
@@ -343,11 +340,8 @@
static bool amdgpu_read_disabled_bios(struct amdgpu_device *adev)
{
- if (adev->flags & AMD_IS_APU)
- return igp_read_bios_from_vram(adev);
- else
- return (!adev->asic_funcs || !adev->asic_funcs->read_disabled_bios) ?
- false : amdgpu_asic_read_disabled_bios(adev);
+ return (!adev->asic_funcs || !adev->asic_funcs->read_disabled_bios) ?
+ false : amdgpu_asic_read_disabled_bios(adev);
}
#ifdef CONFIG_ACPI
@@ -414,7 +408,36 @@
}
#endif
-bool amdgpu_get_bios(struct amdgpu_device *adev)
+static bool amdgpu_get_bios_apu(struct amdgpu_device *adev)
+{
+ if (amdgpu_acpi_vfct_bios(adev)) {
+ dev_info(adev->dev, "Fetched VBIOS from VFCT\n");
+ goto success;
+ }
+
+ if (amdgpu_read_bios_from_vram(adev)) {
+ dev_info(adev->dev, "Fetched VBIOS from VRAM BAR\n");
+ goto success;
+ }
+
+ if (amdgpu_read_bios(adev)) {
+ dev_info(adev->dev, "Fetched VBIOS from ROM BAR\n");
+ goto success;
+ }
+
+ if (amdgpu_read_platform_bios(adev)) {
+ dev_info(adev->dev, "Fetched VBIOS from platform\n");
+ goto success;
+ }
+
+ dev_err(adev->dev, "Unable to locate a BIOS ROM\n");
+ return false;
+
+success:
+ return true;
+}
+
+static bool amdgpu_get_bios_dgpu(struct amdgpu_device *adev)
{
if (amdgpu_atrm_get_bios(adev)) {
dev_info(adev->dev, "Fetched VBIOS from ATRM\n");
@@ -426,7 +449,8 @@
goto success;
}
- if (igp_read_bios_from_vram(adev)) {
+ /* this is required for SR-IOV */
+ if (amdgpu_read_bios_from_vram(adev)) {
dev_info(adev->dev, "Fetched VBIOS from VRAM BAR\n");
goto success;
}
@@ -455,10 +479,24 @@
return false;
success:
- adev->is_atom_fw = adev->asic_type >= CHIP_VEGA10;
return true;
}
+bool amdgpu_get_bios(struct amdgpu_device *adev)
+{
+ bool found;
+
+ if (adev->flags & AMD_IS_APU)
+ found = amdgpu_get_bios_apu(adev);
+ else
+ found = amdgpu_get_bios_dgpu(adev);
+
+ if (found)
+ adev->is_atom_fw = adev->asic_type >= CHIP_VEGA10;
+
+ return found;
+}
+
/* helper function for soc15 and onwards to read bios from rom */
bool amdgpu_soc15_read_bios_from_rom(struct amdgpu_device *adev,
u8 *bios, u32 length_bytes)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f462841..c2394c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4107,9 +4107,6 @@
spin_lock_init(&adev->mm_stats.lock);
spin_lock_init(&adev->wb.lock);
- INIT_LIST_HEAD(&adev->shadow_list);
- mutex_init(&adev->shadow_list_lock);
-
INIT_LIST_HEAD(&adev->reset_list);
INIT_LIST_HEAD(&adev->ras_list);
@@ -5030,80 +5027,6 @@
}
/**
- * amdgpu_device_recover_vram - Recover some VRAM contents
- *
- * @adev: amdgpu_device pointer
- *
- * Restores the contents of VRAM buffers from the shadows in GTT. Used to
- * restore things like GPUVM page tables after a GPU reset where
- * the contents of VRAM might be lost.
- *
- * Returns:
- * 0 on success, negative error code on failure.
- */
-static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
-{
- struct dma_fence *fence = NULL, *next = NULL;
- struct amdgpu_bo *shadow;
- struct amdgpu_bo_vm *vmbo;
- long r = 1, tmo;
-
- if (amdgpu_sriov_runtime(adev))
- tmo = msecs_to_jiffies(8000);
- else
- tmo = msecs_to_jiffies(100);
-
- dev_info(adev->dev, "recover vram bo from shadow start\n");
- mutex_lock(&adev->shadow_list_lock);
- list_for_each_entry(vmbo, &adev->shadow_list, shadow_list) {
- /* If vm is compute context or adev is APU, shadow will be NULL */
- if (!vmbo->shadow)
- continue;
- shadow = vmbo->shadow;
-
- /* No need to recover an evicted BO */
- if (!shadow->tbo.resource ||
- shadow->tbo.resource->mem_type != TTM_PL_TT ||
- shadow->tbo.resource->start == AMDGPU_BO_INVALID_OFFSET ||
- shadow->parent->tbo.resource->mem_type != TTM_PL_VRAM)
- continue;
-
- r = amdgpu_bo_restore_shadow(shadow, &next);
- if (r)
- break;
-
- if (fence) {
- tmo = dma_fence_wait_timeout(fence, false, tmo);
- dma_fence_put(fence);
- fence = next;
- if (tmo == 0) {
- r = -ETIMEDOUT;
- break;
- } else if (tmo < 0) {
- r = tmo;
- break;
- }
- } else {
- fence = next;
- }
- }
- mutex_unlock(&adev->shadow_list_lock);
-
- if (fence)
- tmo = dma_fence_wait_timeout(fence, false, tmo);
- dma_fence_put(fence);
-
- if (r < 0 || tmo <= 0) {
- dev_err(adev->dev, "recover vram bo from shadow failed, r is %ld, tmo is %ld\n", r, tmo);
- return -EIO;
- }
-
- dev_info(adev->dev, "recover vram bo from shadow done\n");
- return 0;
-}
-
-
-/**
* amdgpu_device_reset_sriov - reset ASIC for SR-IOV vf
*
* @adev: amdgpu_device pointer
@@ -5165,12 +5088,8 @@
if (r)
return r;
- if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
+ if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST)
amdgpu_inc_vram_lost(adev);
- r = amdgpu_device_recover_vram(adev);
- }
- if (r)
- return r;
/* need to be called during full access so we can't do it later like
* bare-metal does.
@@ -5569,9 +5488,7 @@
}
}
- if (!r)
- r = amdgpu_device_recover_vram(tmp_adev);
- else
+ if (r)
tmp_adev->asic_reset_res = r;
}
@@ -6189,7 +6106,7 @@
p2p_addressable = !(adev->gmc.aper_base & address_mask ||
aper_limit & address_mask);
}
- return is_large_bar && p2p_access && p2p_addressable;
+ return pcie_p2p && is_large_bar && p2p_access && p2p_addressable;
#else
return false;
#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 092ec11..b119d27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -233,6 +233,7 @@
}
if (!adev->enable_virtual_display) {
+ new_abo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(new_abo,
amdgpu_display_supported_domains(adev, new_abo->flags));
if (unlikely(r != 0)) {
@@ -1474,7 +1475,7 @@
if ((!(mode->flags & DRM_MODE_FLAG_INTERLACE)) &&
((amdgpu_encoder->underscan_type == UNDERSCAN_ON) ||
((amdgpu_encoder->underscan_type == UNDERSCAN_AUTO) &&
- connector->display_info.is_hdmi &&
+ connector && connector->display_info.is_hdmi &&
amdgpu_display_is_hdtv_mode(mode)))) {
if (amdgpu_encoder->underscan_hborder != 0)
amdgpu_crtc->h_border = amdgpu_encoder->underscan_hborder;
@@ -1759,6 +1760,7 @@
r = amdgpu_bo_reserve(aobj, true);
if (r == 0) {
+ aobj->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(aobj, AMDGPU_GEM_DOMAIN_VRAM);
if (r != 0)
dev_err(adev->dev, "Failed to pin cursor BO (%d)\n", r);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f57411e..81d9877 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -117,9 +117,10 @@
* - 3.56.0 - Update IB start address and size alignment for decode and encode
* - 3.57.0 - Compute tunneling on GFX10+
* - 3.58.0 - Add GFX12 DCC support
+ * - 3.59.0 - Cleared VRAM
*/
#define KMS_DRIVER_MAJOR 3
-#define KMS_DRIVER_MINOR 58
+#define KMS_DRIVER_MINOR 59
#define KMS_DRIVER_PATCHLEVEL 0
/*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 0e617df..1a5df8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -43,8 +43,6 @@
#include "amdgpu_hmm.h"
#include "amdgpu_xgmi.h"
-static const struct drm_gem_object_funcs amdgpu_gem_object_funcs;
-
static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf)
{
struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
@@ -87,11 +85,11 @@
static void amdgpu_gem_object_free(struct drm_gem_object *gobj)
{
- struct amdgpu_bo *robj = gem_to_amdgpu_bo(gobj);
+ struct amdgpu_bo *aobj = gem_to_amdgpu_bo(gobj);
- if (robj) {
- amdgpu_hmm_unregister(robj);
- amdgpu_bo_unref(&robj);
+ if (aobj) {
+ amdgpu_hmm_unregister(aobj);
+ ttm_bo_put(&aobj->tbo);
}
}
@@ -126,7 +124,6 @@
bo = &ubo->bo;
*obj = &bo->tbo.base;
- (*obj)->funcs = &amdgpu_gem_object_funcs;
return 0;
}
@@ -295,7 +292,7 @@
return drm_gem_ttm_mmap(obj, vma);
}
-static const struct drm_gem_object_funcs amdgpu_gem_object_funcs = {
+const struct drm_gem_object_funcs amdgpu_gem_object_funcs = {
.free = amdgpu_gem_object_free,
.open = amdgpu_gem_object_open,
.close = amdgpu_gem_object_close,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h
index f302647..3a8f579 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h
@@ -33,6 +33,8 @@
#define AMDGPU_GEM_DOMAIN_MAX 0x3
#define gem_to_amdgpu_bo(gobj) container_of((gobj), struct amdgpu_bo, tbo.base)
+extern const struct drm_gem_object_funcs amdgpu_gem_object_funcs;
+
unsigned long amdgpu_gem_timeout(uint64_t timeout_ns);
/*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ad6bf5d..16f2605 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -107,8 +107,11 @@
/*
* Do the coredump immediately after a job timeout to get a very
* close dump/snapshot/representation of GPU's current error status
+ * Skip it for SRIOV, since VF FLR will be triggered by host driver
+ * before job timeout
*/
- amdgpu_job_core_dump(adev, job);
+ if (!amdgpu_sriov_vf(adev))
+ amdgpu_job_core_dump(adev, job);
if (amdgpu_gpu_recovery &&
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e32161f..44819cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -77,24 +77,6 @@
amdgpu_bo_destroy(tbo);
}
-static void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo)
-{
- struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
- struct amdgpu_bo *shadow_bo = ttm_to_amdgpu_bo(tbo), *bo;
- struct amdgpu_bo_vm *vmbo;
-
- bo = shadow_bo->parent;
- vmbo = to_amdgpu_bo_vm(bo);
- /* in case amdgpu_device_recover_vram got NULL of bo->parent */
- if (!list_empty(&vmbo->shadow_list)) {
- mutex_lock(&adev->shadow_list_lock);
- list_del_init(&vmbo->shadow_list);
- mutex_unlock(&adev->shadow_list_lock);
- }
-
- amdgpu_bo_destroy(tbo);
-}
-
/**
* amdgpu_bo_is_amdgpu_bo - check if the buffer object is an &amdgpu_bo
* @bo: buffer object to be checked
@@ -108,8 +90,7 @@
bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo)
{
if (bo->destroy == &amdgpu_bo_destroy ||
- bo->destroy == &amdgpu_bo_user_destroy ||
- bo->destroy == &amdgpu_bo_vm_destroy)
+ bo->destroy == &amdgpu_bo_user_destroy)
return true;
return false;
@@ -583,6 +564,7 @@
if (bo == NULL)
return -ENOMEM;
drm_gem_private_object_init(adev_to_drm(adev), &bo->tbo.base, size);
+ bo->tbo.base.funcs = &amdgpu_gem_object_funcs;
bo->vm_bo = NULL;
bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
bp->domain;
@@ -723,52 +705,6 @@
}
/**
- * amdgpu_bo_add_to_shadow_list - add a BO to the shadow list
- *
- * @vmbo: BO that will be inserted into the shadow list
- *
- * Insert a BO to the shadow list.
- */
-void amdgpu_bo_add_to_shadow_list(struct amdgpu_bo_vm *vmbo)
-{
- struct amdgpu_device *adev = amdgpu_ttm_adev(vmbo->bo.tbo.bdev);
-
- mutex_lock(&adev->shadow_list_lock);
- list_add_tail(&vmbo->shadow_list, &adev->shadow_list);
- vmbo->shadow->parent = amdgpu_bo_ref(&vmbo->bo);
- vmbo->shadow->tbo.destroy = &amdgpu_bo_vm_destroy;
- mutex_unlock(&adev->shadow_list_lock);
-}
-
-/**
- * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
- *
- * @shadow: &amdgpu_bo shadow to be restored
- * @fence: dma_fence associated with the operation
- *
- * Copies a buffer object's shadow content back to the object.
- * This is used for recovering a buffer from its shadow in case of a gpu
- * reset where vram context may be lost.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
- */
-int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence)
-
-{
- struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
- uint64_t shadow_addr, parent_addr;
-
- shadow_addr = amdgpu_bo_gpu_offset(shadow);
- parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
-
- return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
- amdgpu_bo_size(shadow), NULL, fence,
- true, false, 0);
-}
-
-/**
* amdgpu_bo_kmap - map an &amdgpu_bo buffer object
* @bo: &amdgpu_bo buffer object to be mapped
* @ptr: kernel virtual address to be returned
@@ -851,7 +787,7 @@
if (bo == NULL)
return NULL;
- ttm_bo_get(&bo->tbo);
+ drm_gem_object_get(&bo->tbo.base);
return bo;
}
@@ -863,40 +799,30 @@
*/
void amdgpu_bo_unref(struct amdgpu_bo **bo)
{
- struct ttm_buffer_object *tbo;
-
if ((*bo) == NULL)
return;
- tbo = &((*bo)->tbo);
- ttm_bo_put(tbo);
+ drm_gem_object_put(&(*bo)->tbo.base);
*bo = NULL;
}
/**
- * amdgpu_bo_pin_restricted - pin an &amdgpu_bo buffer object
+ * amdgpu_bo_pin - pin an &amdgpu_bo buffer object
* @bo: &amdgpu_bo buffer object to be pinned
* @domain: domain to be pinned to
- * @min_offset: the start of requested address range
- * @max_offset: the end of requested address range
*
- * Pins the buffer object according to requested domain and address range. If
- * the memory is unbound gart memory, binds the pages into gart table. Adjusts
- * pin_count and pin_size accordingly.
+ * Pins the buffer object according to requested domain. If the memory is
+ * unbound gart memory, binds the pages into gart table. Adjusts pin_count and
+ * pin_size accordingly.
*
* Pinning means to lock pages in memory along with keeping them at a fixed
* offset. It is required when a buffer can not be moved, for example, when
* a display buffer is being scanned out.
*
- * Compared with amdgpu_bo_pin(), this function gives more flexibility on
- * where to pin a buffer if there are specific restrictions on where a buffer
- * must be located.
- *
* Returns:
* 0 for success or a negative error code on failure.
*/
-int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
- u64 min_offset, u64 max_offset)
+int amdgpu_bo_pin(struct amdgpu_bo *bo, u32 domain)
{
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct ttm_operation_ctx ctx = { false, false };
@@ -905,9 +831,6 @@
if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
return -EPERM;
- if (WARN_ON_ONCE(min_offset > max_offset))
- return -EINVAL;
-
/* Check domain to be pinned to against preferred domains */
if (bo->preferred_domains & domain)
domain = bo->preferred_domains & domain;
@@ -933,14 +856,6 @@
return -EINVAL;
ttm_bo_pin(&bo->tbo);
-
- if (max_offset != 0) {
- u64 domain_start = amdgpu_ttm_domain_start(adev,
- mem_type);
- WARN_ON_ONCE(max_offset <
- (amdgpu_bo_gpu_offset(bo) - domain_start));
- }
-
return 0;
}
@@ -957,17 +872,6 @@
bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
amdgpu_bo_placement_from_domain(bo, domain);
for (i = 0; i < bo->placement.num_placement; i++) {
- unsigned int fpfn, lpfn;
-
- fpfn = min_offset >> PAGE_SHIFT;
- lpfn = max_offset >> PAGE_SHIFT;
-
- if (fpfn > bo->placements[i].fpfn)
- bo->placements[i].fpfn = fpfn;
- if (!bo->placements[i].lpfn ||
- (lpfn && lpfn < bo->placements[i].lpfn))
- bo->placements[i].lpfn = lpfn;
-
if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
bo->placements[i].mem_type == TTM_PL_VRAM)
bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
@@ -994,24 +898,6 @@
}
/**
- * amdgpu_bo_pin - pin an &amdgpu_bo buffer object
- * @bo: &amdgpu_bo buffer object to be pinned
- * @domain: domain to be pinned to
- *
- * A simple wrapper to amdgpu_bo_pin_restricted().
- * Provides a simpler API for buffers that do not have any strict restrictions
- * on where a buffer must be located.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
- */
-int amdgpu_bo_pin(struct amdgpu_bo *bo, u32 domain)
-{
- bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
- return amdgpu_bo_pin_restricted(bo, domain, 0, 0);
-}
-
-/**
* amdgpu_bo_unpin - unpin an &amdgpu_bo buffer object
* @bo: &amdgpu_bo buffer object to be unpinned
*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index d7e2795..717e47b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -136,8 +136,6 @@
struct amdgpu_bo_vm {
struct amdgpu_bo bo;
- struct amdgpu_bo *shadow;
- struct list_head shadow_list;
struct amdgpu_vm_bo_base entries[];
};
@@ -275,22 +273,6 @@
return bo->flags & AMDGPU_GEM_CREATE_ENCRYPTED;
}
-/**
- * amdgpu_bo_shadowed - check if the BO is shadowed
- *
- * @bo: BO to be tested.
- *
- * Returns:
- * NULL if not shadowed or else return a BO pointer.
- */
-static inline struct amdgpu_bo *amdgpu_bo_shadowed(struct amdgpu_bo *bo)
-{
- if (bo->tbo.type == ttm_bo_type_kernel)
- return to_amdgpu_bo_vm(bo)->shadow;
-
- return NULL;
-}
-
bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo);
void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain);
@@ -322,8 +304,6 @@
struct amdgpu_bo *amdgpu_bo_ref(struct amdgpu_bo *bo);
void amdgpu_bo_unref(struct amdgpu_bo **bo);
int amdgpu_bo_pin(struct amdgpu_bo *bo, u32 domain);
-int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
- u64 min_offset, u64 max_offset);
void amdgpu_bo_unpin(struct amdgpu_bo *bo);
int amdgpu_bo_init(struct amdgpu_device *adev);
void amdgpu_bo_fini(struct amdgpu_device *adev);
@@ -349,9 +329,6 @@
u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo);
void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
struct amdgpu_mem_stats *stats);
-void amdgpu_bo_add_to_shadow_list(struct amdgpu_bo_vm *vmbo);
-int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
- struct dma_fence **fence);
uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
uint32_t domain);
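A consequence of removing amdgpu_bo_pin_restricted() is visible in the display and cursor hunks further down: callers that need physically contiguous VRAM now request it through the BO flag before the plain pin call. A hedged sketch of that caller pattern (function name illustrative):

    static int example_pin_scanout_bo(struct amdgpu_bo *abo)
    {
            int r;

            r = amdgpu_bo_reserve(abo, false);
            if (r)
                    return r;

            /* Contiguity is requested via the BO flag now, not via the
             * removed min/max offset arguments. */
            abo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
            r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);

            amdgpu_bo_unreserve(abo);
            return r;
    }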
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 189574d..0b28b2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2853,7 +2853,7 @@
if (ret)
return ret;
- /* Start rlc autoload after psp recieved all the gfx firmware */
+ /* Start rlc autoload after psp received all the gfx firmware */
if (psp->autoload_supported && ucode->ucode_id == (amdgpu_sriov_vf(adev) ?
adev->virt.autoload_ucode_id : AMDGPU_UCODE_ID_RLC_G)) {
ret = psp_rlc_autoload_start(psp);
@@ -3425,9 +3425,11 @@
const struct psp_firmware_header_v1_2 *sos_hdr_v1_2;
const struct psp_firmware_header_v1_3 *sos_hdr_v1_3;
const struct psp_firmware_header_v2_0 *sos_hdr_v2_0;
- int err = 0;
+ const struct psp_firmware_header_v2_1 *sos_hdr_v2_1;
+ int fw_index, fw_bin_count, start_index = 0;
+ const struct psp_fw_bin_desc *fw_bin;
uint8_t *ucode_array_start_addr;
- int fw_index = 0;
+ int err = 0;
err = amdgpu_ucode_request(adev, &adev->psp.sos_fw, "amdgpu/%s_sos.bin", chip_name);
if (err)
@@ -3478,15 +3480,30 @@
case 2:
sos_hdr_v2_0 = (const struct psp_firmware_header_v2_0 *)adev->psp.sos_fw->data;
- if (le32_to_cpu(sos_hdr_v2_0->psp_fw_bin_count) >= UCODE_MAX_PSP_PACKAGING) {
+ fw_bin_count = le32_to_cpu(sos_hdr_v2_0->psp_fw_bin_count);
+
+ if (fw_bin_count >= UCODE_MAX_PSP_PACKAGING) {
dev_err(adev->dev, "packed SOS count exceeds maximum limit\n");
err = -EINVAL;
goto out;
}
- for (fw_index = 0; fw_index < le32_to_cpu(sos_hdr_v2_0->psp_fw_bin_count); fw_index++) {
- err = parse_sos_bin_descriptor(psp,
- &sos_hdr_v2_0->psp_fw_bin[fw_index],
+ if (sos_hdr_v2_0->header.header_version_minor == 1) {
+ sos_hdr_v2_1 = (const struct psp_firmware_header_v2_1 *)adev->psp.sos_fw->data;
+
+ fw_bin = sos_hdr_v2_1->psp_fw_bin;
+
+ if (psp_is_aux_sos_load_required(psp))
+ start_index = le32_to_cpu(sos_hdr_v2_1->psp_aux_fw_bin_index);
+ else
+ fw_bin_count -= le32_to_cpu(sos_hdr_v2_1->psp_aux_fw_bin_index);
+
+ } else {
+ fw_bin = sos_hdr_v2_0->psp_fw_bin;
+ }
+
+ for (fw_index = start_index; fw_index < fw_bin_count; fw_index++) {
+ err = parse_sos_bin_descriptor(psp, fw_bin + fw_index,
sos_hdr_v2_0);
if (err)
goto out;
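The v2.1 header keeps the regular and auxiliary SOS descriptors in one psp_fw_bin[] array, with psp_aux_fw_bin_index marking where the aux set starts. A worked example with illustrative numbers, assuming the aux set mirrors the regular one entry-for-entry as the index arithmetic implies:

    /* psp_fw_bin_count = 8, psp_aux_fw_bin_index = 4:
     *   descriptors 0-3 -> regular SOS set, 4-7 -> aux SOS set
     *
     *   psp_is_aux_sos_load_required() == true:
     *       start_index = 4, the loop parses descriptors 4..7
     *   psp_is_aux_sos_load_required() == false:
     *       fw_bin_count -= 4, the loop parses descriptors 0..3
     */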
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
index 74a9651..e8abbbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h
@@ -138,6 +138,7 @@
int (*vbflash_stat)(struct psp_context *psp);
int (*fatal_error_recovery_quirk)(struct psp_context *psp);
bool (*get_ras_capability)(struct psp_context *psp);
+ bool (*is_aux_sos_load_required)(struct psp_context *psp);
};
struct ta_funcs {
@@ -464,6 +465,9 @@
((psp)->funcs->fatal_error_recovery_quirk ? \
(psp)->funcs->fatal_error_recovery_quirk((psp)) : 0)
+#define psp_is_aux_sos_load_required(psp) \
+ ((psp)->funcs->is_aux_sos_load_required ? (psp)->funcs->is_aux_sos_load_required((psp)) : 0)
+
extern const struct amd_ip_funcs psp_ip_funcs;
extern const struct amdgpu_ip_block_version psp_v3_1_ip_block;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 61a2f38..1a1395c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -882,7 +882,7 @@
if (ret)
return ret;
- /* gfx block ras dsiable cmd must send to ras-ta */
+ /* gfx block ras disable cmd must send to ras-ta */
if (head->block == AMDGPU_RAS_BLOCK__GFX)
con->features |= BIT(head->block);
@@ -3468,6 +3468,11 @@
/* aca is disabled by default */
adev->aca.is_enabled = false;
+
+ /* bad page feature is not applicable to specific app platform */
+ if (adev->gmc.is_app_apu &&
+ amdgpu_ip_version(adev, UMC_HWIP, 0) == IP_VERSION(12, 0, 0))
+ amdgpu_bad_page_threshold = 0;
}
static void amdgpu_ras_counte_dw(struct work_struct *work)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index aab8077..f28f6b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -58,7 +58,7 @@
#define EEPROM_I2C_MADDR_4 0x40000
/*
- * The 2 macros bellow represent the actual size in bytes that
+ * The 2 macros below represent the actual size in bytes that
* those entities occupy in the EEPROM memory.
* RAS_TABLE_RECORD_SIZE is different than sizeof(eeprom_table_record) which
* uses uint64 to store 6b fields such as retired_page.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index bdf1ef8..c586ab4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -260,6 +260,36 @@
return 0;
}
+/**
+ * amdgpu_sync_kfd - sync to KFD fences
+ *
+ * @sync: sync object to add KFD fences to
+ * @resv: reservation object with KFD fences
+ *
+ * Extract all KFD fences and add them to the sync object.
+ */
+int amdgpu_sync_kfd(struct amdgpu_sync *sync, struct dma_resv *resv)
+{
+ struct dma_resv_iter cursor;
+ struct dma_fence *f;
+ int r = 0;
+
+ dma_resv_iter_begin(&cursor, resv, DMA_RESV_USAGE_BOOKKEEP);
+ dma_resv_for_each_fence_unlocked(&cursor, f) {
+ void *fence_owner = amdgpu_sync_get_owner(f);
+
+ if (fence_owner != AMDGPU_FENCE_OWNER_KFD)
+ continue;
+
+ r = amdgpu_sync_fence(sync, f);
+ if (r)
+ break;
+ }
+ dma_resv_iter_end(&cursor);
+
+ return r;
+}
+
/* Free the entry back to the slab */
static void amdgpu_sync_entry_free(struct amdgpu_sync_entry *e)
{
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
index cf1e9e8..e3272dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
@@ -51,6 +51,7 @@
int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync,
struct dma_resv *resv, enum amdgpu_sync_mode mode,
void *owner);
+int amdgpu_sync_kfd(struct amdgpu_sync *sync, struct dma_resv *resv);
struct dma_fence *amdgpu_sync_peek_fence(struct amdgpu_sync *sync,
struct amdgpu_ring *ring);
struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync);
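The new amdgpu_sync_kfd() pulls only fences owned by AMDGPU_FENCE_OWNER_KFD out of a reservation object; the VM code below uses it before page-table updates on a KFD BO. A hedged usage sketch built from the existing sync helpers (function name illustrative):

    static int example_wait_kfd_fences(struct amdgpu_bo *bo)
    {
            struct amdgpu_sync sync;
            int r;

            amdgpu_sync_create(&sync);

            /* Collect only the KFD-owned fences attached to this BO. */
            r = amdgpu_sync_kfd(&sync, bo->tbo.base.resv);
            if (!r)
                    r = amdgpu_sync_wait(&sync, true);	/* interruptible wait */

            amdgpu_sync_free(&sync);
            return r;
    }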
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b8bc7fa..74adb983 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1970,7 +1970,7 @@
DRM_INFO("amdgpu: %uM of GTT memory ready.\n",
(unsigned int)(gtt_size / (1024 * 1024)));
- /* Initiailize doorbell pool on PCI BAR */
+ /* Initialize doorbell pool on PCI BAR */
r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_DOORBELL, adev->doorbell.size / PAGE_SIZE);
if (r) {
DRM_ERROR("Failed initializing doorbell heap.\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
index 5bc37ac..4e23419 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
@@ -136,6 +136,14 @@
struct psp_fw_bin_desc psp_fw_bin[];
};
+/* version_major=2, version_minor=1 */
+struct psp_firmware_header_v2_1 {
+ struct common_firmware_header header;
+ uint32_t psp_fw_bin_count;
+ uint32_t psp_aux_fw_bin_index;
+ struct psp_fw_bin_desc psp_fw_bin[];
+};
+
/* version_major=1, version_minor=0 */
struct ta_firmware_header_v1_0 {
struct common_firmware_header header;
@@ -426,6 +434,7 @@
struct psp_firmware_header_v1_1 psp_v1_1;
struct psp_firmware_header_v1_3 psp_v1_3;
struct psp_firmware_header_v2_0 psp_v2_0;
+ struct psp_firmware_header_v2_0 psp_v2_1;
struct ta_firmware_header_v1_0 ta;
struct ta_firmware_header_v2_0 ta_v2_0;
struct gfx_firmware_header_v1_0 gfx;
@@ -447,7 +456,7 @@
uint8_t raw[0x100];
};
-#define UCODE_MAX_PSP_PACKAGING ((sizeof(union amdgpu_firmware_header) - sizeof(struct common_firmware_header) - 4) / sizeof(struct psp_fw_bin_desc))
+#define UCODE_MAX_PSP_PACKAGING (((sizeof(union amdgpu_firmware_header) - sizeof(struct common_firmware_header) - 4) / sizeof(struct psp_fw_bin_desc)) * 2)
/*
* fw loading support
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index e5f508d..d4c2afa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -338,6 +338,7 @@
else
domain = AMDGPU_GEM_DOMAIN_VRAM;
+ rbo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(rbo, domain);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 2452dfa..6005280 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -465,7 +465,6 @@
{
uint64_t new_vm_generation = amdgpu_vm_generation(adev, vm);
struct amdgpu_vm_bo_base *bo_base;
- struct amdgpu_bo *shadow;
struct amdgpu_bo *bo;
int r;
@@ -486,16 +485,10 @@
spin_unlock(&vm->status_lock);
bo = bo_base->bo;
- shadow = amdgpu_bo_shadowed(bo);
r = validate(param, bo);
if (r)
return r;
- if (shadow) {
- r = validate(param, shadow);
- if (r)
- return r;
- }
if (bo->tbo.type != ttm_bo_type_kernel) {
amdgpu_vm_bo_moved(bo_base);
@@ -1176,6 +1169,12 @@
AMDGPU_SYNC_EQ_OWNER, vm);
if (r)
goto error_free;
+ if (bo) {
+ r = amdgpu_sync_kfd(&sync, bo->tbo.base.resv);
+ if (r)
+ goto error_free;
+ }
+
} else {
struct drm_gem_object *obj = &bo->tbo.base;
@@ -2149,10 +2148,6 @@
{
struct amdgpu_vm_bo_base *bo_base;
- /* shadow bo doesn't have bo base, its validation needs its parent */
- if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo))
- bo = bo->parent;
-
for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
struct amdgpu_vm *vm = bo_base->vm;
@@ -2482,7 +2477,6 @@
root_bo = amdgpu_bo_ref(&root->bo);
r = amdgpu_bo_reserve(root_bo, true);
if (r) {
- amdgpu_bo_unref(&root->shadow);
amdgpu_bo_unref(&root_bo);
goto error_free_delayed;
}
@@ -2575,11 +2569,6 @@
vm->last_update = dma_fence_get_stub();
vm->is_compute_context = true;
- /* Free the shadow bo for compute VM */
- amdgpu_bo_unref(&to_amdgpu_bo_vm(vm->root.bo)->shadow);
-
- goto unreserve_bo;
-
unreserve_bo:
amdgpu_bo_unreserve(vm->root.bo);
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index a076f43..f78a043 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -383,14 +383,6 @@
if (r)
return r;
- if (vmbo->shadow) {
- struct amdgpu_bo *shadow = vmbo->shadow;
-
- r = ttm_bo_validate(&shadow->tbo, &shadow->placement, &ctx);
- if (r)
- return r;
- }
-
if (!drm_dev_enter(adev_to_drm(adev), &idx))
return -ENODEV;
@@ -448,10 +440,7 @@
int32_t xcp_id)
{
struct amdgpu_bo_param bp;
- struct amdgpu_bo *bo;
- struct dma_resv *resv;
unsigned int num_entries;
- int r;
memset(&bp, 0, sizeof(bp));
@@ -484,42 +473,7 @@
if (vm->root.bo)
bp.resv = vm->root.bo->tbo.base.resv;
- r = amdgpu_bo_create_vm(adev, &bp, vmbo);
- if (r)
- return r;
-
- bo = &(*vmbo)->bo;
- if (vm->is_compute_context || (adev->flags & AMD_IS_APU)) {
- (*vmbo)->shadow = NULL;
- return 0;
- }
-
- if (!bp.resv)
- WARN_ON(dma_resv_lock(bo->tbo.base.resv,
- NULL));
- resv = bp.resv;
- memset(&bp, 0, sizeof(bp));
- bp.size = amdgpu_vm_pt_size(adev, level);
- bp.domain = AMDGPU_GEM_DOMAIN_GTT;
- bp.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC;
- bp.type = ttm_bo_type_kernel;
- bp.resv = bo->tbo.base.resv;
- bp.bo_ptr_size = sizeof(struct amdgpu_bo);
- bp.xcp_id_plus1 = xcp_id + 1;
-
- r = amdgpu_bo_create(adev, &bp, &(*vmbo)->shadow);
-
- if (!resv)
- dma_resv_unlock(bo->tbo.base.resv);
-
- if (r) {
- amdgpu_bo_unref(&bo);
- return r;
- }
-
- amdgpu_bo_add_to_shadow_list(*vmbo);
-
- return 0;
+ return amdgpu_bo_create_vm(adev, &bp, vmbo);
}
/**
@@ -569,7 +523,6 @@
return 0;
error_free_pt:
- amdgpu_bo_unref(&pt->shadow);
amdgpu_bo_unref(&pt_bo);
return r;
}
@@ -581,17 +534,10 @@
*/
static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
{
- struct amdgpu_bo *shadow;
-
if (!entry->bo)
return;
entry->bo->vm_bo = NULL;
- shadow = amdgpu_bo_shadowed(entry->bo);
- if (shadow) {
- ttm_bo_set_bulk_move(&shadow->tbo, NULL);
- amdgpu_bo_unref(&shadow);
- }
ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
spin_lock(&entry->vm->status_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 4772fba..46d9fb4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -35,16 +35,7 @@
*/
static int amdgpu_vm_sdma_map_table(struct amdgpu_bo_vm *table)
{
- int r;
-
- r = amdgpu_ttm_alloc_gart(&table->bo.tbo);
- if (r)
- return r;
-
- if (table->shadow)
- r = amdgpu_ttm_alloc_gart(&table->shadow->tbo);
-
- return r;
+ return amdgpu_ttm_alloc_gart(&table->bo.tbo);
}
/* Allocate a new job for @count PTE updates */
@@ -265,17 +256,13 @@
if (!p->pages_addr) {
/* set page commands needed */
- if (vmbo->shadow)
- amdgpu_vm_sdma_set_ptes(p, vmbo->shadow, pe, addr,
- count, incr, flags);
amdgpu_vm_sdma_set_ptes(p, bo, pe, addr, count,
incr, flags);
return 0;
}
/* copy commands needed */
- ndw -= p->adev->vm_manager.vm_pte_funcs->copy_pte_num_dw *
- (vmbo->shadow ? 2 : 1);
+ ndw -= p->adev->vm_manager.vm_pte_funcs->copy_pte_num_dw;
/* for padding */
ndw -= 7;
@@ -290,8 +277,6 @@
pte[i] |= flags;
}
- if (vmbo->shadow)
- amdgpu_vm_sdma_copy_ptes(p, vmbo->shadow, pe, nptes);
amdgpu_vm_sdma_copy_ptes(p, bo, pe, nptes);
pe += nptes * 8;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
index 90138bc..3277526 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h
@@ -180,6 +180,6 @@
#define for_each_xcp(xcp_mgr, xcp, i) \
for (i = 0, xcp = amdgpu_get_next_xcp(xcp_mgr, &i); xcp; \
- xcp = amdgpu_get_next_xcp(xcp_mgr, &i))
+ ++i, xcp = amdgpu_get_next_xcp(xcp_mgr, &i))
#endif
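Without the added "++i" the iterator keeps handing back the same partition, since amdgpu_get_next_xcp() starts its search at the caller-supplied index. A hedged usage sketch of the fixed macro (function name illustrative):

    static void example_log_partitions(struct amdgpu_device *adev)
    {
            struct amdgpu_xcp *xcp;
            int i;

            if (!adev->xcp_mgr)
                    return;

            /* Visits each valid partition exactly once now that the index
             * advances past the one just returned. */
            for_each_xcp(adev->xcp_mgr, xcp, i)
                    dev_dbg(adev->dev, "partition index %d is present\n", i);
    }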
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index 26e2188..5e8833e 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -94,8 +94,6 @@
case AMDGPU_RING_TYPE_VCN_ENC:
case AMDGPU_RING_TYPE_VCN_JPEG:
ip_blk = AMDGPU_XCP_VCN;
- if (aqua_vanjaram_xcp_vcn_shared(adev))
- inst_mask = 1 << (inst_idx * 2);
break;
default:
DRM_ERROR("Not support ring type %d!", ring->funcs->type);
@@ -105,6 +103,8 @@
for (xcp_id = 0; xcp_id < adev->xcp_mgr->num_xcps; xcp_id++) {
if (adev->xcp_mgr->xcp[xcp_id].ip[ip_blk].inst_mask & inst_mask) {
ring->xcp_id = xcp_id;
+ dev_dbg(adev->dev, "ring:%s xcp_id :%u", ring->name,
+ ring->xcp_id);
if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
adev->gfx.enforce_isolation[xcp_id].xcp_id = xcp_id;
break;
@@ -394,38 +394,31 @@
struct amdgpu_xcp_ip *ip)
{
struct amdgpu_device *adev = xcp_mgr->adev;
+ int num_sdma, num_vcn, num_shared_vcn, num_xcp;
int num_xcc_xcp, num_sdma_xcp, num_vcn_xcp;
- int num_sdma, num_vcn;
num_sdma = adev->sdma.num_instances;
num_vcn = adev->vcn.num_vcn_inst;
+ num_shared_vcn = 1;
+
+ num_xcc_xcp = adev->gfx.num_xcc_per_xcp;
+ num_xcp = NUM_XCC(adev->gfx.xcc_mask) / num_xcc_xcp;
switch (xcp_mgr->mode) {
case AMDGPU_SPX_PARTITION_MODE:
- num_sdma_xcp = num_sdma;
- num_vcn_xcp = num_vcn;
- break;
case AMDGPU_DPX_PARTITION_MODE:
- num_sdma_xcp = num_sdma / 2;
- num_vcn_xcp = num_vcn / 2;
- break;
case AMDGPU_TPX_PARTITION_MODE:
- num_sdma_xcp = num_sdma / 3;
- num_vcn_xcp = num_vcn / 3;
- break;
case AMDGPU_QPX_PARTITION_MODE:
- num_sdma_xcp = num_sdma / 4;
- num_vcn_xcp = num_vcn / 4;
- break;
case AMDGPU_CPX_PARTITION_MODE:
- num_sdma_xcp = 2;
- num_vcn_xcp = num_vcn ? 1 : 0;
+ num_sdma_xcp = DIV_ROUND_UP(num_sdma, num_xcp);
+ num_vcn_xcp = DIV_ROUND_UP(num_vcn, num_xcp);
break;
default:
return -EINVAL;
}
- num_xcc_xcp = adev->gfx.num_xcc_per_xcp;
+ if (num_vcn && num_xcp > num_vcn)
+ num_shared_vcn = num_xcp / num_vcn;
switch (ip_id) {
case AMDGPU_XCP_GFXHUB:
@@ -441,7 +434,8 @@
ip->ip_funcs = &sdma_v4_4_2_xcp_funcs;
break;
case AMDGPU_XCP_VCN:
- ip->inst_mask = XCP_INST_MASK(num_vcn_xcp, xcp_id);
+ ip->inst_mask =
+ XCP_INST_MASK(num_vcn_xcp, xcp_id / num_shared_vcn);
/* TODO : Assign IP funcs */
break;
default:
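The per-partition instance counts are now derived uniformly from the number of partitions instead of being hard-coded per mode. A worked example with illustrative instance counts:

    /* Example: 8 XCCs, 8 SDMA instances, 4 VCN instances, CPX mode with
     * num_xcc_per_xcp == 1:
     *
     *   num_xcp        = 8 / 1              = 8
     *   num_sdma_xcp   = DIV_ROUND_UP(8, 8) = 1
     *   num_vcn_xcp    = DIV_ROUND_UP(4, 8) = 1
     *   num_shared_vcn = 8 / 4              = 2   (num_xcp > num_vcn)
     *
     * Each VCN instance therefore serves two partitions, and the VCN
     * inst_mask is chosen with XCP_INST_MASK(1, xcp_id / 2).
     */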
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
index 742adbc..70c1399 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
@@ -1881,6 +1881,7 @@
return r;
if (!atomic) {
+ abo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
if (unlikely(r != 0)) {
amdgpu_bo_unreserve(abo);
@@ -2401,6 +2402,7 @@
return ret;
}
+ aobj->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
ret = amdgpu_bo_pin(aobj, AMDGPU_GEM_DOMAIN_VRAM);
amdgpu_bo_unreserve(aobj);
if (ret) {
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
index 8d46eba..f154c24 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
@@ -1931,6 +1931,7 @@
return r;
if (!atomic) {
+ abo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
if (unlikely(r != 0)) {
amdgpu_bo_unreserve(abo);
@@ -2485,6 +2486,7 @@
return ret;
}
+ aobj->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
ret = amdgpu_bo_pin(aobj, AMDGPU_GEM_DOMAIN_VRAM);
amdgpu_bo_unreserve(aobj);
if (ret) {
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
index f08dc6a..a7fcb13 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
@@ -1861,6 +1861,7 @@
return r;
if (!atomic) {
+ abo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
if (unlikely(r != 0)) {
amdgpu_bo_unreserve(abo);
@@ -2321,6 +2322,7 @@
return ret;
}
+ aobj->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
ret = amdgpu_bo_pin(aobj, AMDGPU_GEM_DOMAIN_VRAM);
amdgpu_bo_unreserve(aobj);
if (ret) {
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
index a6a3adf..77ac3f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
@@ -1828,6 +1828,7 @@
return r;
if (!atomic) {
+ abo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
if (unlikely(r != 0)) {
amdgpu_bo_unreserve(abo);
@@ -2320,6 +2321,7 @@
return ret;
}
+ aobj->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
ret = amdgpu_bo_pin(aobj, AMDGPU_GEM_DOMAIN_VRAM);
amdgpu_bo_unreserve(aobj);
if (ret) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index d1357c0..47b47d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
@@ -202,12 +202,16 @@
SOC15_REG_ENTRY_STR(GC, 0, regCP_IB1_BUFSZ)
};
-static const struct soc15_reg_golden golden_settings_gc_12_0[] = {
+static const struct soc15_reg_golden golden_settings_gc_12_0_rev0[] = {
SOC15_REG_GOLDEN_VALUE(GC, 0, regDB_MEM_CONFIG, 0x0000000f, 0x0000000f),
SOC15_REG_GOLDEN_VALUE(GC, 0, regCB_HW_CONTROL_1, 0x03000000, 0x03000000),
SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL5, 0x00000070, 0x00000020)
};
+static const struct soc15_reg_golden golden_settings_gc_12_0[] = {
+ SOC15_REG_GOLDEN_VALUE(GC, 0, regDB_MEM_CONFIG, 0x00008000, 0x00008000),
+};
+
#define DEFAULT_SH_MEM_CONFIG \
((SH_MEM_ADDRESS_MODE_64 << SH_MEM_CONFIG__ADDRESS_MODE__SHIFT) | \
(SH_MEM_ALIGNMENT_MODE_UNALIGNED << SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) | \
@@ -3495,10 +3499,14 @@
switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
case IP_VERSION(12, 0, 0):
case IP_VERSION(12, 0, 1):
+ soc15_program_register_sequence(adev,
+ golden_settings_gc_12_0,
+ (const u32)ARRAY_SIZE(golden_settings_gc_12_0));
+
if (adev->rev_id == 0)
soc15_program_register_sequence(adev,
- golden_settings_gc_12_0,
- (const u32)ARRAY_SIZE(golden_settings_gc_12_0));
+ golden_settings_gc_12_0_rev0,
+ (const u32)ARRAY_SIZE(golden_settings_gc_12_0_rev0));
break;
default:
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 408e560..c100845 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -1701,7 +1701,15 @@
WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regCP_MEC_CNTL, 0);
} else {
WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regCP_MEC_CNTL,
- (CP_MEC_CNTL__MEC_ME1_HALT_MASK | CP_MEC_CNTL__MEC_ME2_HALT_MASK));
+ (CP_MEC_CNTL__MEC_INVALIDATE_ICACHE_MASK |
+ CP_MEC_CNTL__MEC_ME1_PIPE0_RESET_MASK |
+ CP_MEC_CNTL__MEC_ME1_PIPE1_RESET_MASK |
+ CP_MEC_CNTL__MEC_ME1_PIPE2_RESET_MASK |
+ CP_MEC_CNTL__MEC_ME1_PIPE3_RESET_MASK |
+ CP_MEC_CNTL__MEC_ME2_PIPE0_RESET_MASK |
+ CP_MEC_CNTL__MEC_ME2_PIPE1_RESET_MASK |
+ CP_MEC_CNTL__MEC_ME1_HALT_MASK |
+ CP_MEC_CNTL__MEC_ME2_HALT_MASK));
adev->gfx.kiq[xcc_id].ring.sched.ready = false;
}
udelay(50);
@@ -2240,6 +2248,8 @@
r = gfx_v9_4_3_xcc_cp_compute_load_microcode(adev, xcc_id);
if (r)
return r;
+ } else {
+ gfx_v9_4_3_xcc_cp_compute_enable(adev, false, xcc_id);
}
r = gfx_v9_4_3_xcc_kiq_resume(adev, xcc_id);
@@ -2299,12 +2309,6 @@
return 0;
}
-static void gfx_v9_4_3_xcc_cp_enable(struct amdgpu_device *adev, bool enable,
- int xcc_id)
-{
- gfx_v9_4_3_xcc_cp_compute_enable(adev, enable, xcc_id);
-}
-
static void gfx_v9_4_3_xcc_fini(struct amdgpu_device *adev, int xcc_id)
{
if (amdgpu_gfx_disable_kcq(adev, xcc_id))
@@ -2336,7 +2340,7 @@
}
gfx_v9_4_3_xcc_kcq_fini_register(adev, xcc_id);
- gfx_v9_4_3_xcc_cp_enable(adev, false, xcc_id);
+ gfx_v9_4_3_xcc_cp_compute_enable(adev, false, xcc_id);
}
static int gfx_v9_4_3_hw_init(void *handle)
diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c b/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c
index 6c18918..d4f72e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c
@@ -153,7 +153,7 @@
WREG32_SOC15(GC, 0, regGFX_IMU_C2PMSG_16, imu_reg_val);
}
- //disble imu Rtavfs, SmsRepair, DfllBTC, and ClkB
+ //disable imu Rtavfs, SmsRepair, DfllBTC, and ClkB
imu_reg_val = RREG32_SOC15(GC, 0, regGFX_IMU_SCRATCH_10);
imu_reg_val |= 0x10007;
WREG32_SOC15(GC, 0, regGFX_IMU_SCRATCH_10, imu_reg_val);
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index ee91ff9..231a3d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -161,7 +161,7 @@
int api_status_off)
{
union MESAPI__QUERY_MES_STATUS mes_status_pkt;
- signed long timeout = 3000000; /* 3000 ms */
+ signed long timeout = 2100000; /* 2100 ms */
struct amdgpu_device *adev = mes->adev;
struct amdgpu_ring *ring = &mes->ring[0];
struct MES_API_STATUS *api_status;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index e499b28..8d27421 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -146,7 +146,7 @@
int api_status_off)
{
union MESAPI__QUERY_MES_STATUS mes_status_pkt;
- signed long timeout = 3000000; /* 3000 ms */
+ signed long timeout = 2100000; /* 2100 ms */
struct amdgpu_device *adev = mes->adev;
struct amdgpu_ring *ring = &mes->ring[pipe];
spinlock_t *ring_lock = &mes->ring_lock[pipe];
@@ -479,6 +479,11 @@
union MESAPI__MISC misc_pkt;
int pipe;
+ if (mes->adev->enable_uni_mes)
+ pipe = AMDGPU_MES_KIQ_PIPE;
+ else
+ pipe = AMDGPU_MES_SCHED_PIPE;
+
memset(&misc_pkt, 0, sizeof(misc_pkt));
misc_pkt.header.type = MES_API_TYPE_SCHEDULER;
@@ -513,6 +518,7 @@
misc_pkt.wait_reg_mem.reg_offset2 = input->wrm_reg.reg1;
break;
case MES_MISC_OP_SET_SHADER_DEBUGGER:
+ pipe = AMDGPU_MES_SCHED_PIPE;
misc_pkt.opcode = MESAPI_MISC__SET_SHADER_DEBUGGER;
misc_pkt.set_shader_debugger.process_context_addr =
input->set_shader_debugger.process_context_addr;
@@ -530,11 +536,6 @@
return -EINVAL;
}
- if (mes->adev->enable_uni_mes)
- pipe = AMDGPU_MES_KIQ_PIPE;
- else
- pipe = AMDGPU_MES_SCHED_PIPE;
-
return mes_v12_0_submit_pkt_and_poll_completion(mes, pipe,
&misc_pkt, sizeof(misc_pkt),
offsetof(union MESAPI__MISC, api_status));
@@ -608,6 +609,7 @@
mes_set_hw_res_pkt.disable_mes_log = 1;
mes_set_hw_res_pkt.use_different_vmid_compute = 1;
mes_set_hw_res_pkt.enable_reg_active_poll = 1;
+ mes_set_hw_res_pkt.enable_level_process_quantum_check = 1;
/*
* Keep oversubscribe timer for sdma . When we have unmapped doorbell
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
index fa479df..739fce4 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
@@ -365,7 +365,7 @@
data &= ~PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK;
} else {
- /* Disbale ASPM L1 */
+ /* Disable ASPM L1 */
data &= ~PCIE_LC_CNTL__LC_L1_INACTIVITY_MASK;
/* Disable ASPM TxL0s */
data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK;
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
index 1251ee3..51e470e 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
@@ -81,6 +81,8 @@
/* memory training timeout define */
#define MEM_TRAIN_SEND_MSG_TIMEOUT_US 3000000
+#define regMP1_PUB_SCRATCH0 0x3b10090
+
static int psp_v13_0_init_microcode(struct psp_context *psp)
{
struct amdgpu_device *adev = psp->adev;
@@ -807,6 +809,20 @@
}
}
+static bool psp_v13_0_is_aux_sos_load_required(struct psp_context *psp)
+{
+ struct amdgpu_device *adev = psp->adev;
+ u32 pmfw_ver;
+
+ if (amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 6))
+ return false;
+
+ /* load 4e version of sos if pmfw version less than 85.115.0 */
+ pmfw_ver = RREG32(regMP1_PUB_SCRATCH0 / 4);
+
+ return (pmfw_ver < 0x557300);
+}
+
static const struct psp_funcs psp_v13_0_funcs = {
.init_microcode = psp_v13_0_init_microcode,
.wait_for_bootloader = psp_v13_0_wait_for_bootloader_steady_state,
@@ -830,6 +846,7 @@
.vbflash_stat = psp_v13_0_vbflash_status,
.fatal_error_recovery_quirk = psp_v13_0_fatal_error_recovery_quirk,
.get_ras_capability = psp_v13_0_get_ras_capability,
+ .is_aux_sos_load_required = psp_v13_0_is_aux_sos_load_required,
};
void psp_v13_0_set_psp_funcs(struct psp_context *psp)
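
The new psp_v13_0_is_aux_sos_load_required() hook reads an MP1 scratch register and compares it against 0x557300, which the in-code comment equates to PMFW 85.115.0. Below is a minimal standalone sketch of that comparison; it assumes the version word packs major/minor/patch as one byte each, which is inferred from the comment and not spelled out by the diff.

/*
 * Illustrative only: assumes the PMFW version word is 0xMMmmpp
 * (major/minor/patch one byte each), so 85.115.0 == 0x557300.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t pack_pmfw_ver(uint8_t major, uint8_t minor, uint8_t patch)
{
	return ((uint32_t)major << 16) | ((uint32_t)minor << 8) | patch;
}

/* Mirrors the "< 0x557300" test used to decide on the auxiliary SOS. */
static bool aux_sos_load_required(uint32_t pmfw_ver)
{
	return pmfw_ver < pack_pmfw_ver(85, 115, 0);	/* 0x557300 */
}

int main(void)
{
	printf("84.200.0 -> %d\n", aux_sos_load_required(pack_pmfw_ver(84, 200, 0)));	/* 1 */
	printf("85.115.0 -> %d\n", aux_sos_load_required(pack_pmfw_ver(85, 115, 0)));	/* 0 */
	return 0;
}
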
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index aa63754..e65194f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -710,7 +710,7 @@
upper_32_bits(wptr_gpu_addr));
wptr_poll_cntl = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_CNTL + sdma_offsets[i]);
if (ring->use_pollmem) {
- /*wptr polling is not enogh fast, directly clean the wptr register */
+ /*wptr polling is not enough fast, directly clean the wptr register */
WREG32(mmSDMA0_GFX_RB_WPTR + sdma_offsets[i], 0);
wptr_poll_cntl = REG_SET_FIELD(wptr_poll_cntl,
SDMA0_GFX_RB_WPTR_POLL_CNTL,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
index cfd8e18..a876349 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
@@ -1080,13 +1080,16 @@
unsigned bytes = count * 8;
ib->ptr[ib->length_dw++] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_COPY) |
- SDMA_PKT_COPY_LINEAR_HEADER_SUB_OP(SDMA_SUBOP_COPY_LINEAR);
+ SDMA_PKT_COPY_LINEAR_HEADER_SUB_OP(SDMA_SUBOP_COPY_LINEAR) |
+ SDMA_PKT_COPY_LINEAR_HEADER_CPV(1);
+
ib->ptr[ib->length_dw++] = bytes - 1;
ib->ptr[ib->length_dw++] = 0; /* src/dst endian swap */
ib->ptr[ib->length_dw++] = lower_32_bits(src);
ib->ptr[ib->length_dw++] = upper_32_bits(src);
ib->ptr[ib->length_dw++] = lower_32_bits(pe);
ib->ptr[ib->length_dw++] = upper_32_bits(pe);
+ ib->ptr[ib->length_dw++] = 0;
}
@@ -1744,7 +1747,7 @@
}
static const struct amdgpu_vm_pte_funcs sdma_v7_0_vm_pte_funcs = {
- .copy_pte_num_dw = 7,
+ .copy_pte_num_dw = 8,
.copy_pte = sdma_v7_0_vm_copy_pte,
.write_pte = sdma_v7_0_vm_write_pte,
.set_pte_pde = sdma_v7_0_vm_set_pte_pde,
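
The sdma_v7_0 hunks set the CPV bit in the COPY_LINEAR header and append one extra dword to the PTE-copy packet, so .copy_pte_num_dw moves from 7 to 8. The toy emitter below (placeholder encodings, not the real SDMA packet format) simply counts the dwords it writes, showing why the bookkeeping constant has to change together with the emitter.

/*
 * Minimal sketch, not kernel code: emits the same dword layout as the
 * hunk above -- header, byte count, swap dword, src lo/hi, dst lo/hi,
 * plus the trailing dword required with CPV. Field encodings are
 * placeholders.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MOCK_HDR_COPY_LINEAR	0x00000001u		/* placeholder opcode */
#define MOCK_HDR_CPV(x)		((uint32_t)(x) << 28)	/* placeholder bit */

static unsigned int emit_copy_pte(uint32_t *ib, uint64_t src, uint64_t pe,
				  unsigned int count)
{
	unsigned int n = 0;

	ib[n++] = MOCK_HDR_COPY_LINEAR | MOCK_HDR_CPV(1);
	ib[n++] = count * 8 - 1;		/* bytes - 1 */
	ib[n++] = 0;				/* src/dst endian swap */
	ib[n++] = (uint32_t)src;
	ib[n++] = (uint32_t)(src >> 32);
	ib[n++] = (uint32_t)pe;
	ib[n++] = (uint32_t)(pe >> 32);
	ib[n++] = 0;				/* extra dword that comes with CPV */

	return n;
}

int main(void)
{
	uint32_t ib[8];
	unsigned int dw = emit_copy_pte(ib, 0x1000, 0x2000, 4);

	assert(dw == 8);	/* must match .copy_pte_num_dw */
	printf("copy_pte packet: %u dwords\n", dw);
	return 0;
}
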
diff --git a/drivers/gpu/drm/amd/amdgpu/smuio_v9_0.c b/drivers/gpu/drm/amd/amdgpu/smuio_v9_0.c
index e4e30b9..c04fdd2 100644
--- a/drivers/gpu/drm/amd/amdgpu/smuio_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/smuio_v9_0.c
@@ -60,7 +60,7 @@
{
u32 data;
- /* CGTT_ROM_CLK_CTRL0 is not availabe for APUs */
+ /* CGTT_ROM_CLK_CTRL0 is not available for APUs */
if (adev->flags & AMD_IS_APU)
return;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c
index b0c3678..fd4c3d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc24.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
@@ -250,13 +250,6 @@
adev->nbio.funcs->program_aspm(adev);
}
-static void soc24_enable_doorbell_aperture(struct amdgpu_device *adev,
- bool enable)
-{
- adev->nbio.funcs->enable_doorbell_aperture(adev, enable);
- adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, enable);
-}
-
const struct amdgpu_ip_block_version soc24_common_ip_block = {
.type = AMD_IP_BLOCK_TYPE_COMMON,
.major = 1,
@@ -454,6 +447,11 @@
if (amdgpu_sriov_vf(adev))
xgpu_nv_mailbox_get_irq(adev);
+ /* Enable selfring doorbell aperture late because doorbell BAR
+ * aperture will change if resize BAR successfully in gmc sw_init.
+ */
+ adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, true);
+
return 0;
}
@@ -491,7 +489,7 @@
adev->df.funcs->hw_init(adev);
/* enable the doorbell aperture */
- soc24_enable_doorbell_aperture(adev, true);
+ adev->nbio.funcs->enable_doorbell_aperture(adev, true);
return 0;
}
@@ -500,8 +498,13 @@
{
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
- /* disable the doorbell aperture */
- soc24_enable_doorbell_aperture(adev, false);
+ /* Disable the doorbell aperture and selfring doorbell aperture
+ * separately in hw_fini because soc21_enable_doorbell_aperture
+ * has been removed and there is no need to delay disabling
+ * selfring doorbell.
+ */
+ adev->nbio.funcs->enable_doorbell_aperture(adev, false);
+ adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
if (amdgpu_sriov_vf(adev))
xgpu_nv_mailbox_put_irq(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index b1fd226..9d4f535 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -1395,170 +1395,6 @@
}
}
-static int vcn_v4_0_5_limit_sched(struct amdgpu_cs_parser *p,
- struct amdgpu_job *job)
-{
- struct drm_gpu_scheduler **scheds;
-
- /* The create msg must be in the first IB submitted */
- if (atomic_read(&job->base.entity->fence_seq))
- return -EINVAL;
-
- /* if VCN0 is harvested, we can't support AV1 */
- if (p->adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0)
- return -EINVAL;
-
- scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC]
- [AMDGPU_RING_PRIO_0].sched;
- drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
- return 0;
-}
-
-static int vcn_v4_0_5_dec_msg(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
- uint64_t addr)
-{
- struct ttm_operation_ctx ctx = { false, false };
- struct amdgpu_bo_va_mapping *map;
- uint32_t *msg, num_buffers;
- struct amdgpu_bo *bo;
- uint64_t start, end;
- unsigned int i;
- void *ptr;
- int r;
-
- addr &= AMDGPU_GMC_HOLE_MASK;
- r = amdgpu_cs_find_mapping(p, addr, &bo, &map);
- if (r) {
- DRM_ERROR("Can't find BO for addr 0x%08llx\n", addr);
- return r;
- }
-
- start = map->start * AMDGPU_GPU_PAGE_SIZE;
- end = (map->last + 1) * AMDGPU_GPU_PAGE_SIZE;
- if (addr & 0x7) {
- DRM_ERROR("VCN messages must be 8 byte aligned!\n");
- return -EINVAL;
- }
-
- bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
- amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
- r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
- if (r) {
- DRM_ERROR("Failed validating the VCN message BO (%d)!\n", r);
- return r;
- }
-
- r = amdgpu_bo_kmap(bo, &ptr);
- if (r) {
- DRM_ERROR("Failed mapping the VCN message (%d)!\n", r);
- return r;
- }
-
- msg = ptr + addr - start;
-
- /* Check length */
- if (msg[1] > end - addr) {
- r = -EINVAL;
- goto out;
- }
-
- if (msg[3] != RDECODE_MSG_CREATE)
- goto out;
-
- num_buffers = msg[2];
- for (i = 0, msg = &msg[6]; i < num_buffers; ++i, msg += 4) {
- uint32_t offset, size, *create;
-
- if (msg[0] != RDECODE_MESSAGE_CREATE)
- continue;
-
- offset = msg[1];
- size = msg[2];
-
- if (offset + size > end) {
- r = -EINVAL;
- goto out;
- }
-
- create = ptr + addr + offset - start;
-
- /* H264, HEVC and VP9 can run on any instance */
- if (create[0] == 0x7 || create[0] == 0x10 || create[0] == 0x11)
- continue;
-
- r = vcn_v4_0_5_limit_sched(p, job);
- if (r)
- goto out;
- }
-
-out:
- amdgpu_bo_kunmap(bo);
- return r;
-}
-
-#define RADEON_VCN_ENGINE_TYPE_ENCODE (0x00000002)
-#define RADEON_VCN_ENGINE_TYPE_DECODE (0x00000003)
-
-#define RADEON_VCN_ENGINE_INFO (0x30000001)
-#define RADEON_VCN_ENGINE_INFO_MAX_OFFSET 16
-
-#define RENCODE_ENCODE_STANDARD_AV1 2
-#define RENCODE_IB_PARAM_SESSION_INIT 0x00000003
-#define RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET 64
-
-/* return the offset in ib if id is found, -1 otherwise
- * to speed up the searching we only search upto max_offset
- */
-static int vcn_v4_0_5_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t id, int max_offset)
-{
- int i;
-
- for (i = 0; i < ib->length_dw && i < max_offset && ib->ptr[i] >= 8; i += ib->ptr[i]/4) {
- if (ib->ptr[i + 1] == id)
- return i;
- }
- return -1;
-}
-
-static int vcn_v4_0_5_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
- struct amdgpu_job *job,
- struct amdgpu_ib *ib)
-{
- struct amdgpu_ring *ring = amdgpu_job_ring(job);
- struct amdgpu_vcn_decode_buffer *decode_buffer;
- uint64_t addr;
- uint32_t val;
- int idx;
-
- /* The first instance can decode anything */
- if (!ring->me)
- return 0;
-
- /* RADEON_VCN_ENGINE_INFO is at the top of ib block */
- idx = vcn_v4_0_5_enc_find_ib_param(ib, RADEON_VCN_ENGINE_INFO,
- RADEON_VCN_ENGINE_INFO_MAX_OFFSET);
- if (idx < 0) /* engine info is missing */
- return 0;
-
- val = amdgpu_ib_get_value(ib, idx + 2); /* RADEON_VCN_ENGINE_TYPE */
- if (val == RADEON_VCN_ENGINE_TYPE_DECODE) {
- decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[idx + 6];
-
- if (!(decode_buffer->valid_buf_flag & 0x1))
- return 0;
-
- addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 |
- decode_buffer->msg_buffer_address_lo;
- return vcn_v4_0_5_dec_msg(p, job, addr);
- } else if (val == RADEON_VCN_ENGINE_TYPE_ENCODE) {
- idx = vcn_v4_0_5_enc_find_ib_param(ib, RENCODE_IB_PARAM_SESSION_INIT,
- RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET);
- if (idx >= 0 && ib->ptr[idx + 2] == RENCODE_ENCODE_STANDARD_AV1)
- return vcn_v4_0_5_limit_sched(p, job);
- }
- return 0;
-}
-
static const struct amdgpu_ring_funcs vcn_v4_0_5_unified_ring_vm_funcs = {
.type = AMDGPU_RING_TYPE_VCN_ENC,
.align_mask = 0x3f,
@@ -1566,7 +1402,6 @@
.get_rptr = vcn_v4_0_5_unified_ring_get_rptr,
.get_wptr = vcn_v4_0_5_unified_ring_get_wptr,
.set_wptr = vcn_v4_0_5_unified_ring_set_wptr,
- .patch_cs_in_place = vcn_v4_0_5_ring_patch_cs_in_place,
.emit_frame_size =
SOC15_FLUSH_GPU_TLB_NUM_WREG * 3 +
SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 4 +
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 71b465f..648f400 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3540,6 +3540,30 @@
return debug_map_and_unlock(dqm);
}
+bool kfd_dqm_is_queue_in_process(struct device_queue_manager *dqm,
+ struct qcm_process_device *qpd,
+ int doorbell_off, u32 *queue_format)
+{
+ struct queue *q;
+ bool r = false;
+
+ if (!queue_format)
+ return r;
+
+ dqm_lock(dqm);
+
+ list_for_each_entry(q, &qpd->queues_list, list) {
+ if (q->properties.doorbell_off == doorbell_off) {
+ *queue_format = q->properties.format;
+ r = true;
+ goto out;
+ }
+ }
+
+out:
+ dqm_unlock(dqm);
+ return r;
+}
#if defined(CONFIG_DEBUG_FS)
static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 08b4082..09ab36f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -324,6 +324,9 @@
int debug_lock_and_unmap(struct device_queue_manager *dqm);
int debug_map_and_unlock(struct device_queue_manager *dqm);
int debug_refresh_runlist(struct device_queue_manager *dqm);
+bool kfd_dqm_is_queue_in_process(struct device_queue_manager *dqm,
+ struct qcm_process_device *qpd,
+ int doorbell_off, u32 *queue_format);
static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
{
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
index bb8cbfc..37b69fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
@@ -306,23 +306,8 @@
client_id == SOC15_IH_CLIENTID_UTCL2) {
struct kfd_vm_fault_info info = {0};
uint16_t ring_id = SOC15_RING_ID_FROM_IH_ENTRY(ih_ring_entry);
- uint32_t node_id = SOC15_NODEID_FROM_IH_ENTRY(ih_ring_entry);
- uint32_t vmid_type = SOC15_VMID_TYPE_FROM_IH_ENTRY(ih_ring_entry);
- int hub_inst = 0;
struct kfd_hsa_memory_exception_data exception_data;
- /* gfxhub */
- if (!vmid_type && dev->adev->gfx.funcs->ih_node_to_logical_xcc) {
- hub_inst = dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
- node_id);
- if (hub_inst < 0)
- hub_inst = 0;
- }
-
- /* mmhub */
- if (vmid_type && client_id == SOC15_IH_CLIENTID_VMC)
- hub_inst = node_id / 4;
-
info.vmid = vmid;
info.mc_id = client_id;
info.page_addr = ih_ring_entry[4] |
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
index d163d92..2b72d5b4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v12.c
@@ -341,6 +341,10 @@
m->sdmax_rlcx_doorbell_offset =
q->doorbell_off << SDMA0_QUEUE0_DOORBELL_OFFSET__OFFSET__SHIFT;
+ m->sdmax_rlcx_sched_cntl = (amdgpu_sdma_phase_quantum
+ << SDMA0_QUEUE0_SCHEDULE_CNTL__CONTEXT_QUANTUM__SHIFT)
+ & SDMA0_QUEUE0_SCHEDULE_CNTL__CONTEXT_QUANTUM_MASK;
+
m->sdma_engine_id = q->sdma_engine_id;
m->sdma_queue_id = q->sdma_queue_id;
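
The kfd_mqd_manager_v12 hunk programs the SDMA context quantum with the usual (value << SHIFT) & MASK idiom, so an oversized quantum cannot spill into neighbouring bits of the register. A generic sketch of that packing follows; the SHIFT/MASK values are made up and are not the real SDMA0_QUEUE0_SCHEDULE_CNTL encoding.

#include <stdint.h>
#include <stdio.h>

#define CONTEXT_QUANTUM_SHIFT	0
#define CONTEXT_QUANTUM_MASK	0x000000ffu	/* placeholder field width */

static uint32_t pack_context_quantum(uint32_t quantum)
{
	/* Shift into position, then mask so the value stays inside the field. */
	return (quantum << CONTEXT_QUANTUM_SHIFT) & CONTEXT_QUANTUM_MASK;
}

int main(void)
{
	printf("0x%08x\n", (unsigned int)pack_context_quantum(32));	/* 0x00000020 */
	printf("0x%08x\n", (unsigned int)pack_context_quantum(0x1ff));	/* 0x000000ff */
	return 0;
}
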
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index a902950..d07acf1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -270,6 +270,11 @@
struct kfd_node *dev = NULL;
struct kfd_process *proc = NULL;
struct kfd_process_device *pdd = NULL;
+ int i;
+ struct kfd_cu_occupancy cu_occupancy[AMDGPU_MAX_QUEUES];
+ u32 queue_format;
+
+ memset(cu_occupancy, 0x0, sizeof(cu_occupancy));
pdd = container_of(attr, struct kfd_process_device, attr_cu_occupancy);
dev = pdd->dev;
@@ -287,8 +292,29 @@
/* Collect wave count from device if it supports */
wave_cnt = 0;
max_waves_per_cu = 0;
- dev->kfd2kgd->get_cu_occupancy(dev->adev, proc->pasid, &wave_cnt,
- &max_waves_per_cu, 0);
+
+ /*
+ * For GFX 9.4.3, fetch the CU occupancy from the first XCC in the partition.
+ * For AQL queues, because of cooperative dispatch we multiply the wave count
+ * by number of XCCs in the partition to get the total wave counts across all
+ * XCCs in the partition.
+ * For PM4 queues, there is no cooperative dispatch so wave_cnt stay as it is.
+ */
+ dev->kfd2kgd->get_cu_occupancy(dev->adev, cu_occupancy,
+ &max_waves_per_cu, ffs(dev->xcc_mask) - 1);
+
+ for (i = 0; i < AMDGPU_MAX_QUEUES; i++) {
+ if (cu_occupancy[i].wave_cnt != 0 &&
+ kfd_dqm_is_queue_in_process(dev->dqm, &pdd->qpd,
+ cu_occupancy[i].doorbell_off,
+ &queue_format)) {
+ if (unlikely(queue_format == KFD_QUEUE_FORMAT_PM4))
+ wave_cnt += cu_occupancy[i].wave_cnt;
+ else
+ wave_cnt += (NUM_XCC(dev->xcc_mask) *
+ cu_occupancy[i].wave_cnt);
+ }
+ }
/* Translate wave count to number of compute units */
cu_cnt = (wave_cnt + (max_waves_per_cu - 1)) / max_waves_per_cu;
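
The kfd_process.c hunk changes how per-queue occupancy is folded into the process total: AQL queues use cooperative dispatch, so their wave counts are scaled by the number of XCCs in the partition, while PM4 queues are counted as-is; the sum is then converted to CUs with the existing ceiling divide. A small sketch of that arithmetic with invented numbers:

/*
 * Illustrative aggregation only; queue formats, XCC count and wave
 * numbers below are made up to show the arithmetic from the hunk.
 */
#include <stdio.h>

enum queue_format { FMT_AQL, FMT_PM4 };

struct occupancy {
	enum queue_format format;
	unsigned int wave_cnt;
};

int main(void)
{
	const unsigned int num_xcc = 4;			/* XCCs in the partition */
	const unsigned int max_waves_per_cu = 32;
	struct occupancy q[] = {
		{ FMT_AQL, 10 },	/* cooperative dispatch: scaled by num_xcc */
		{ FMT_PM4,  6 },	/* counted as-is */
	};
	unsigned int wave_cnt = 0;

	for (unsigned int i = 0; i < sizeof(q) / sizeof(q[0]); i++)
		wave_cnt += (q[i].format == FMT_PM4) ?
			    q[i].wave_cnt : num_xcc * q[i].wave_cnt;

	/* Ceiling divide, as in the existing cu_cnt computation. */
	unsigned int cu_cnt = (wave_cnt + max_waves_per_cu - 1) / max_waves_per_cu;

	printf("waves=%u cus=%u\n", wave_cnt, cu_cnt);	/* waves=46 cus=2 */
	return 0;
}
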
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index b439d4d..01b960b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -517,7 +517,6 @@
if (retval)
goto err_destroy_queue;
- kfd_procfs_del_queue(pqn->q);
dqm = pqn->q->device->dqm;
retval = dqm->ops.destroy_queue(dqm, &pdd->qpd, pqn->q);
if (retval) {
@@ -527,6 +526,7 @@
if (retval != -ETIME)
goto err_destroy_queue;
}
+ kfd_procfs_del_queue(pqn->q);
kfd_queue_release_buffers(pdd, &pqn->q->properties);
pqm_clean_queue_resource(pqm, pqn);
uninit_queue(pqn->q);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 0cff667..6e79028 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -808,6 +808,20 @@
}
/**
+ * dmub_hpd_sense_callback - DMUB HPD sense processing callback.
+ * @adev: amdgpu_device pointer
+ * @notify: dmub notification structure
+ *
+ * HPD sense changes can occur during low power states and need to be
+ * notified from firmware to driver.
+ */
+static void dmub_hpd_sense_callback(struct amdgpu_device *adev,
+ struct dmub_notification *notify)
+{
+ DRM_DEBUG_DRIVER("DMUB HPD SENSE callback.\n");
+}
+
+/**
* register_dmub_notify_callback - Sets callback for DMUB notify
* @adev: amdgpu_device pointer
* @type: Type of dmub notification
@@ -1757,25 +1771,41 @@
static enum dmub_ips_disable_type dm_get_default_ips_mode(
struct amdgpu_device *adev)
{
- /*
- * On DCN35 systems with Z8 enabled, it's possible for IPS2 + Z8 to
- * cause a hard hang. A fix exists for newer PMFW.
- *
- * As a workaround, for non-fixed PMFW, force IPS1+RCG as the deepest
- * IPS state in all cases, except for s0ix and all displays off (DPMS),
- * where IPS2 is allowed.
- *
- * When checking pmfw version, use the major and minor only.
- */
- if (amdgpu_ip_version(adev, DCE_HWIP, 0) == IP_VERSION(3, 5, 0) &&
- (adev->pm.fw_version & 0x00FFFF00) < 0x005D6300)
- return DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ enum dmub_ips_disable_type ret = DMUB_IPS_ENABLE;
- if (amdgpu_ip_version(adev, DCE_HWIP, 0) >= IP_VERSION(3, 5, 0))
- return DMUB_IPS_ENABLE;
+ switch (amdgpu_ip_version(adev, DCE_HWIP, 0)) {
+ case IP_VERSION(3, 5, 0):
+ /*
+ * On DCN35 systems with Z8 enabled, it's possible for IPS2 + Z8 to
+ * cause a hard hang. A fix exists for newer PMFW.
+ *
+ * As a workaround, for non-fixed PMFW, force IPS1+RCG as the deepest
+ * IPS state in all cases, except for s0ix and all displays off (DPMS),
+ * where IPS2 is allowed.
+ *
+ * When checking pmfw version, use the major and minor only.
+ */
+ if ((adev->pm.fw_version & 0x00FFFF00) < 0x005D6300)
+ ret = DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ else if (amdgpu_ip_version(adev, GC_HWIP, 0) > IP_VERSION(11, 5, 0))
+ /*
+ * Other ASICs with DCN35 that have residency issues with
+ * IPS2 in idle.
+ * We want them to use IPS2 only in display off cases.
+ */
+ ret = DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ break;
+ case IP_VERSION(3, 5, 1):
+ ret = DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+ break;
+ default:
+ /* ASICs older than DCN35 do not have IPSs */
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) < IP_VERSION(3, 5, 0))
+ ret = DMUB_IPS_DISABLE_ALL;
+ break;
+ }
- /* ASICs older than DCN35 do not have IPSs */
- return DMUB_IPS_DISABLE_ALL;
+ return ret;
}
static int amdgpu_dm_init(struct amdgpu_device *adev)
@@ -3808,6 +3838,12 @@
DRM_ERROR("amdgpu: fail to register dmub hpd callback");
return -EINVAL;
}
+
+ if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD_SENSE_NOTIFY,
+ dmub_hpd_sense_callback, true)) {
+ DRM_ERROR("amdgpu: fail to register dmub hpd sense callback");
+ return -EINVAL;
+ }
}
list_for_each_entry(connector,
@@ -4449,6 +4485,7 @@
#define AMDGPU_DM_DEFAULT_MIN_BACKLIGHT 12
#define AMDGPU_DM_DEFAULT_MAX_BACKLIGHT 255
+#define AMDGPU_DM_MIN_SPREAD ((AMDGPU_DM_DEFAULT_MAX_BACKLIGHT - AMDGPU_DM_DEFAULT_MIN_BACKLIGHT) / 2)
#define AUX_BL_DEFAULT_TRANSITION_TIME_MS 50
static void amdgpu_dm_update_backlight_caps(struct amdgpu_display_manager *dm,
@@ -4463,6 +4500,21 @@
return;
amdgpu_acpi_get_backlight_caps(&caps);
+
+ /* validate the firmware value is sane */
+ if (caps.caps_valid) {
+ int spread = caps.max_input_signal - caps.min_input_signal;
+
+ if (caps.max_input_signal > AMDGPU_DM_DEFAULT_MAX_BACKLIGHT ||
+ caps.min_input_signal < 0 ||
+ spread > AMDGPU_DM_DEFAULT_MAX_BACKLIGHT ||
+ spread < AMDGPU_DM_MIN_SPREAD) {
+ DRM_DEBUG_KMS("DM: Invalid backlight caps: min=%d, max=%d\n",
+ caps.min_input_signal, caps.max_input_signal);
+ caps.caps_valid = false;
+ }
+ }
+
if (caps.caps_valid) {
dm->backlight_caps[bl_idx].caps_valid = true;
if (caps.aux_support)
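
The backlight hunk above rejects ACPI-reported caps whose maximum exceeds the driver default (255), whose minimum is negative, or whose spread is wider than the default maximum or narrower than half the default range (121). A standalone sketch of the same check; the example inputs are hypothetical.

/*
 * Sketch of the range check added above, using the same default
 * constants; the inputs in main() are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

#define DEFAULT_MIN	12
#define DEFAULT_MAX	255
#define MIN_SPREAD	((DEFAULT_MAX - DEFAULT_MIN) / 2)	/* 121 */

static bool backlight_caps_sane(int min_input, int max_input)
{
	int spread = max_input - min_input;

	return max_input <= DEFAULT_MAX &&
	       min_input >= 0 &&
	       spread <= DEFAULT_MAX &&
	       spread >= MIN_SPREAD;
}

int main(void)
{
	printf("min=0   max=255 -> %d\n", backlight_caps_sane(0, 255));	/* 1 */
	printf("min=200 max=255 -> %d\n", backlight_caps_sane(200, 255));	/* 0: spread 55 < 121 */
	return 0;
}
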
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index 2d7755e..15d4690 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -50,7 +50,7 @@
#define AMDGPU_DM_MAX_NUM_EDP 2
-#define AMDGPU_DMUB_NOTIFICATION_MAX 6
+#define AMDGPU_DMUB_NOTIFICATION_MAX 7
#define HDMI_AMD_VENDOR_SPECIFIC_DATA_BLOCK_IEEE_REGISTRATION_ID 0x00001A
#define AMD_VSDB_VERSION_3_FEATURECAP_REPLAYMODE 0x40
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index c0c61c0..83a31b9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1147,7 +1147,7 @@
params[count].num_slices_v = aconnector->dsc_settings.dsc_num_slices_v;
params[count].bpp_overwrite = aconnector->dsc_settings.dsc_bits_per_pixel;
params[count].compression_possible = stream->sink->dsc_caps.dsc_dec_caps.is_dsc_supported;
- dc_dsc_get_policy_for_timing(params[count].timing, 0, &dsc_policy);
+ dc_dsc_get_policy_for_timing(params[count].timing, 0, &dsc_policy, dc_link_get_highest_encoding_format(stream->link));
if (!dc_dsc_compute_bandwidth_range(
stream->sink->ctx->dc->res_pool->dscs[0],
stream->sink->ctx->dc->debug.dsc_min_slice_height_override,
@@ -1681,7 +1681,7 @@
{
struct dc_dsc_policy dsc_policy = {0};
- dc_dsc_get_policy_for_timing(&stream->timing, 0, &dsc_policy);
+ dc_dsc_get_policy_for_timing(&stream->timing, 0, &dsc_policy, dc_link_get_highest_encoding_format(stream->link));
dc_dsc_compute_bandwidth_range(stream->sink->ctx->dc->res_pool->dscs[0],
stream->sink->ctx->dc->debug.dsc_min_slice_height_override,
dsc_policy.min_target_bpp * 16,
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 25f63b2..495e3cd 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -961,6 +961,7 @@
else
domain = AMDGPU_GEM_DOMAIN_VRAM;
+ rbo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(rbo, domain);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
index 08c494a..0d5fefb 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c
@@ -114,6 +114,7 @@
domain = amdgpu_display_supported_domains(adev, rbo->flags);
+ rbo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
r = amdgpu_bo_pin(rbo, domain);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
diff --git a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
index e47e9db..6817994 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
@@ -569,7 +569,7 @@
break;
}
data->lb_partitions[i] = bw_floor2(bw_div(data->lb_size_per_component[i], data->lb_line_pitch), bw_int_to_fixed(1));
- /*clamp the partitions to the maxium number supported by the lb*/
+ /* clamp the partitions to the maximum number supported by the lb */
if ((surface_type[i] != bw_def_graphics || dceip->graphics_lb_nodownscaling_multi_line_prefetching == 1)) {
data->lb_partitions_max[i] = bw_int_to_fixed(10);
}
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
index f770828..0e243f4 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
@@ -59,6 +59,7 @@
display_count = 0;
for (i = 0; i < context->stream_count; i++) {
const struct dc_stream_state *stream = context->streams[i];
+ const struct dc_stream_status *stream_status = &context->stream_status[i];
/* Don't count SubVP phantom pipes as part of active
* display count
@@ -66,13 +67,7 @@
if (dc_state_get_stream_subvp_type(context, stream) == SUBVP_PHANTOM)
continue;
- /*
- * Only notify active stream or virtual stream.
- * Need to notify virtual stream to work around
- * headless case. HPD does not fire when system is in
- * S0i2.
- */
- if (!stream->dpms_off || stream->signal == SIGNAL_TYPE_VIRTUAL)
+ if (!stream->dpms_off || (stream_status && stream_status->plane_count))
display_count++;
}
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
index 97164b5..b46a3af 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
@@ -1222,6 +1222,12 @@
ctx->dc->debug.disable_dpp_power_gate = false;
ctx->dc->debug.disable_hubp_power_gate = false;
ctx->dc->debug.disable_dsc_power_gate = false;
+
+ /* Disable dynamic IPS2 in older PMFW (93.12) for Z8 interop. */
+ if (ctx->dc->config.disable_ips == DMUB_IPS_ENABLE &&
+ ctx->dce_version == DCN_VERSION_3_5 &&
+ ((clk_mgr->base.smu_ver & 0x00FFFFFF) <= 0x005d0c00))
+ ctx->dc->config.disable_ips = DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
} else {
/*let's reset the config control flag*/
ctx->dc->config.disable_ips = DMUB_IPS_DISABLE_ALL; /*pmfw not support it, disable it all*/
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index ae78815..5c39390 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1767,7 +1767,7 @@
if (crtc_timing->pix_clk_100hz != pix_clk_100hz)
return false;
- if (!se->funcs->dp_get_pixel_format)
+ if (!se || !se->funcs->dp_get_pixel_format)
return false;
if (!se->funcs->dp_get_pixel_format(
@@ -2376,7 +2376,7 @@
return false;
}
-static enum surface_update_type get_plane_info_update_type(const struct dc_surface_update *u)
+static enum surface_update_type get_plane_info_update_type(const struct dc *dc, const struct dc_surface_update *u)
{
union surface_update_flags *update_flags = &u->surface->update_flags;
enum surface_update_type update_type = UPDATE_TYPE_FAST;
@@ -2455,7 +2455,7 @@
/* todo: below are HW dependent, we should add a hook to
* DCE/N resource and validated there.
*/
- if (u->plane_info->tiling_info.gfx9.swizzle != DC_SW_LINEAR) {
+ if (!dc->debug.skip_full_updated_if_possible) {
/* swizzled mode requires RQ to be setup properly,
* thus need to run DML to calculate RQ settings
*/
@@ -2547,7 +2547,7 @@
update_flags->raw = 0; // Reset all flags
- type = get_plane_info_update_type(u);
+ type = get_plane_info_update_type(dc, u);
elevate_update_type(&overall_type, type);
type = get_scaling_info_update_type(dc, u);
@@ -2596,6 +2596,12 @@
elevate_update_type(&overall_type, UPDATE_TYPE_MED);
}
+ if (u->sdr_white_level_nits)
+ if (u->sdr_white_level_nits != u->surface->sdr_white_level_nits) {
+ update_flags->bits.sdr_white_level_nits = 1;
+ elevate_update_type(&overall_type, UPDATE_TYPE_FULL);
+ }
+
if (u->cm2_params) {
if ((u->cm2_params->component_settings.shaper_3dlut_setting
!= u->surface->mcm_shaper_3dlut_setting)
@@ -2876,6 +2882,10 @@
surface->hdr_mult =
srf_update->hdr_mult;
+ if (srf_update->sdr_white_level_nits)
+ surface->sdr_white_level_nits =
+ srf_update->sdr_white_level_nits;
+
if (srf_update->blend_tf)
memcpy(&surface->blend_tf, srf_update->blend_tf,
sizeof(surface->blend_tf));
@@ -4679,6 +4689,8 @@
srf_updates[i].scaling_info ||
(srf_updates[i].hdr_mult.value &&
srf_updates[i].hdr_mult.value != srf_updates->surface->hdr_mult.value) ||
+ (srf_updates[i].sdr_white_level_nits &&
+ srf_updates[i].sdr_white_level_nits != srf_updates->surface->sdr_white_level_nits) ||
srf_updates[i].in_transfer_func ||
srf_updates[i].func_shaper ||
srf_updates[i].lut3d_func ||
@@ -5744,6 +5756,27 @@
}
/**
+ * dc_process_dmub_dpia_set_tps_notification - Submits tps notification
+ *
+ * @dc: [in] dc structure
+ * @link_index: [in] link index
+ * @tps: [in] request tps
+ *
+ * Submits set_tps_notification command to dmub via inbox message
+ */
+void dc_process_dmub_dpia_set_tps_notification(const struct dc *dc, uint32_t link_index, uint8_t tps)
+{
+ union dmub_rb_cmd cmd = {0};
+
+ cmd.set_tps_notification.header.type = DMUB_CMD__DPIA;
+ cmd.set_tps_notification.header.sub_type = DMUB_CMD__DPIA_SET_TPS_NOTIFICATION;
+ cmd.set_tps_notification.tps_notification.instance = dc->links[link_index]->ddc_hw_inst;
+ cmd.set_tps_notification.tps_notification.tps = tps;
+
+ dc_wake_and_execute_dmub_cmd(dc->ctx, &cmd, DM_DMUB_WAIT_TYPE_WAIT);
+}
+
+/**
* dc_process_dmub_dpia_hpd_int_enable - Submits DPIA DPD interruption
*
* @dc: [in] dc structure
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 4c94dd3..3992ad7 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -55,7 +55,7 @@
struct set_config_cmd_payload;
struct dmub_notification;
-#define DC_VER "3.2.299"
+#define DC_VER "3.2.301"
#define MAX_SURFACES 3
#define MAX_PLANES 6
@@ -462,6 +462,7 @@
bool support_edp0_on_dp1;
unsigned int enable_fpo_flicker_detection;
bool disable_hbr_audio_dp2;
+ bool consolidated_dpia_dp_lt;
};
enum visual_confirm {
@@ -762,7 +763,8 @@
uint32_t disable_mst_dsc_work_around:1; /* bit 3 */
uint32_t enable_force_tbt3_work_around:1; /* bit 4 */
uint32_t disable_usb4_pm_support:1; /* bit 5 */
- uint32_t reserved:26;
+ uint32_t enable_consolidated_dpia_dp_lt:1; /* bit 6 */
+ uint32_t reserved:25;
} bits;
uint32_t raw;
};
@@ -1056,6 +1058,9 @@
unsigned int force_lls;
bool notify_dpia_hr_bw;
bool enable_ips_visual_confirm;
+ unsigned int sharpen_policy;
+ unsigned int scale_to_sharpness_policy;
+ bool skip_full_updated_if_possible;
};
@@ -1269,6 +1274,7 @@
uint32_t tmz_changed:1;
uint32_t mcm_transfer_function_enable_change:1; /* disable or enable MCM transfer func */
uint32_t full_update:1;
+ uint32_t sdr_white_level_nits:1;
} bits;
uint32_t raw;
@@ -1351,6 +1357,7 @@
bool adaptive_sharpness_en;
int sharpness_level;
enum linear_light_scaling linear_light_scaling;
+ unsigned int sdr_white_level_nits;
};
struct dc_plane_info {
@@ -1508,6 +1515,7 @@
*/
struct dc_cm2_parameters *cm2_params;
const struct dc_csc_transform *cursor_csc_color_matrix;
+ unsigned int sdr_white_level_nits;
};
/*
@@ -2520,6 +2528,8 @@
uint8_t mst_alloc_slots,
uint8_t *mst_slots_in_use);
+void dc_process_dmub_dpia_set_tps_notification(const struct dc *dc, uint32_t link_index, uint8_t tps);
+
void dc_process_dmub_dpia_hpd_int_enable(const struct dc *dc,
uint32_t hpd_int_enable);
diff --git a/drivers/gpu/drm/amd/display/dc/dc_dp_types.h b/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
index 519c3df..41bd95e 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
@@ -969,6 +969,14 @@
uint8_t raw;
};
+union dpcd_max_uncompressed_pixel_rate_cap {
+ struct {
+ uint16_t max_uncompressed_pixel_rate_cap :15;
+ uint16_t valid :1;
+ } bits;
+ uint8_t raw[2];
+};
+
union dp_fec_capability1 {
struct {
uint8_t AGGREGATED_ERROR_COUNTERS_CAPABLE :1;
@@ -1170,6 +1178,7 @@
struct dc_lttpr_caps lttpr_caps;
struct adaptive_sync_caps adaptive_sync_caps;
struct dpcd_usb4_dp_tunneling_info usb4_dp_tun_info;
+ union dpcd_max_uncompressed_pixel_rate_cap max_uncompressed_pixel_rate_cap;
union dp_128b_132b_supported_link_rates dp_128b_132b_supported_link_rates;
union dp_main_line_channel_coding_cap channel_coding_cap;
@@ -1340,6 +1349,9 @@
#ifndef DP_CABLE_ATTRIBUTES_UPDATED_BY_DPTX
#define DP_CABLE_ATTRIBUTES_UPDATED_BY_DPTX 0x110
#endif
+#ifndef DPCD_MAX_UNCOMPRESSED_PIXEL_RATE_CAP
+#define DPCD_MAX_UNCOMPRESSED_PIXEL_RATE_CAP 0x221c
+#endif
#ifndef DP_REPEATER_CONFIGURATION_AND_STATUS_SIZE
#define DP_REPEATER_CONFIGURATION_AND_STATUS_SIZE 0x50
#endif
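
The dc_dp_types.h hunk adds a two-byte capability with a 15-bit maximum uncompressed pixel rate and a valid bit, read starting at the new 0x221c DPCD address. Below is a parsing sketch under the assumption that the two raw bytes are little-endian; the union's bitfield layout implies this on little-endian hosts, but the diff does not state it.

/*
 * Sketch of unpacking the two raw DPCD bytes into the 15-bit rate
 * plus 1-bit valid layout declared above. Byte order is an assumption.
 */
#include <stdint.h>
#include <stdio.h>

struct max_uncompressed_pixel_rate_cap {
	uint16_t rate;		/* bits 14:0 */
	uint8_t valid;		/* bit 15 */
};

static struct max_uncompressed_pixel_rate_cap
parse_pixel_rate_cap(const uint8_t raw[2])
{
	uint16_t v = (uint16_t)(raw[0] | (raw[1] << 8));

	return (struct max_uncompressed_pixel_rate_cap){
		.rate = (uint16_t)(v & 0x7fff),
		.valid = (uint8_t)(v >> 15),
	};
}

int main(void)
{
	const uint8_t raw[2] = { 0x34, 0x92 };	/* hypothetical DPCD readback */
	struct max_uncompressed_pixel_rate_cap cap = parse_pixel_rate_cap(raw);

	printf("valid=%u rate=%u\n", (unsigned)cap.valid, (unsigned)cap.rate);	/* valid=1 rate=4660 */
	return 0;
}
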
diff --git a/drivers/gpu/drm/amd/display/dc/dc_dsc.h b/drivers/gpu/drm/amd/display/dc/dc_dsc.h
index fe3078b..9014c24 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dsc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_dsc.h
@@ -59,6 +59,7 @@
uint32_t max_target_bpp_limit_override_x16;
uint32_t slice_height_granularity;
uint32_t dsc_force_odm_hslice_override;
+ bool force_dsc_when_not_needed;
};
bool dc_dsc_parse_dsc_dpcd(const struct dc *dc,
@@ -100,7 +101,8 @@
*/
void dc_dsc_get_policy_for_timing(const struct dc_crtc_timing *timing,
uint32_t max_target_bpp_limit_override_x16,
- struct dc_dsc_policy *policy);
+ struct dc_dsc_policy *policy,
+ const enum dc_link_encoding_format link_encoding);
void dc_dsc_policy_set_max_target_bpp_limit(uint32_t limit);
diff --git a/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c b/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c
index cd6de93..603552d 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c
+++ b/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c
@@ -186,19 +186,17 @@
spl_in->h_active = pipe_ctx->plane_res.scl_data.h_active;
spl_in->v_active = pipe_ctx->plane_res.scl_data.v_active;
+
+ spl_in->debug.sharpen_policy = (enum sharpen_policy)pipe_ctx->stream->ctx->dc->debug.sharpen_policy;
+ spl_in->debug.scale_to_sharpness_policy =
+ (enum scale_to_sharpness_policy)pipe_ctx->stream->ctx->dc->debug.scale_to_sharpness_policy;
+
/* Check if it is stream is in fullscreen and if its HDR.
* Use this to determine sharpness levels
*/
spl_in->is_fullscreen = dm_helpers_is_fullscreen(pipe_ctx->stream->ctx, pipe_ctx->stream);
spl_in->is_hdr_on = dm_helpers_is_hdr_on(pipe_ctx->stream->ctx, pipe_ctx->stream);
- spl_in->hdr_multx100 = 0;
- if (spl_in->is_hdr_on) {
- spl_in->hdr_multx100 = (uint32_t)dc_fixpt_floor(dc_fixpt_mul(plane_state->hdr_mult,
- dc_fixpt_from_int(100)));
- /* Disable sharpness for HDR Mult > 6.0 */
- if (spl_in->hdr_multx100 > 600)
- spl_in->adaptive_sharpness.enable = false;
- }
+ spl_in->sdr_white_level_nits = plane_state->sdr_white_level_nits;
}
/// @brief Translate SPL output parameters to pipe context
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
index e7019c95b..4fce64a 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
@@ -313,9 +313,6 @@
if (swath_height_c > 0)
log2_swath_height_c = dml_log2(swath_height_c);
-
- if (req128_c && log2_swath_height_c > 0)
- log2_swath_height_c -= 1;
}
rq_param->dlg.rq_l.swath_height = 1 << log2_swath_height_l;
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c
index ae52510..3fa9a5d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c
@@ -313,9 +313,6 @@
if (swath_height_c > 0)
log2_swath_height_c = dml_log2(swath_height_c);
-
- if (req128_c && log2_swath_height_c > 0)
- log2_swath_height_c -= 1;
}
rq_param->dlg.rq_l.swath_height = 1 << log2_swath_height_l;
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
index 0b132ce..2b275e6 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
@@ -1924,15 +1924,6 @@
*PixelPTEReqWidth = 32768.0 / BytePerPixel;
*PTERequestSize = 64;
FractionOfPTEReturnDrop = 0;
- } else if (MacroTileSizeBytes == 4096) {
- PixelPTEReqHeightPTEs = 1;
- *PixelPTEReqHeight = MacroTileHeight;
- *PixelPTEReqWidth = 8 * *MacroTileWidth;
- *PTERequestSize = 64;
- if (ScanDirection != dm_vert)
- FractionOfPTEReturnDrop = 0;
- else
- FractionOfPTEReturnDrop = 7.0 / 8;
} else if (GPUVMMinPageSize == 4 && MacroTileSizeBytes > 4096) {
PixelPTEReqHeightPTEs = 16;
*PixelPTEReqHeight = 16 * BlockHeight256Bytes;
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c
index 547dfcc..d851c08 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c
@@ -8926,7 +8926,7 @@
// The prefetch scheduling should only be calculated once as per AllowForPStateChangeOrStutterInVBlank requirement
// If the AllowForPStateChangeOrStutterInVBlank requirement is not strict (i.e. only try those power saving feature
- // if possible, then will try to program for the best power saving features in order of diffculty (dram, fclk, stutter)
+ // if possible, then will try to program for the best power saving features in order of difficulty (dram, fclk, stutter)
s->iteration = 0;
s->MaxTotalRDBandwidth = 0;
s->AllPrefetchModeTested = false;
@@ -9977,7 +9977,7 @@
dml_print("DML_DLG: %s: GPUVMMinPageSizeKBytes = %u\n", __func__, GPUVMMinPageSizeKBytes);
#endif
- // just suppluy with enough parameters to calculate meta and dte
+ // just supply with enough parameters to calculate meta and dte
CalculateVMAndRowBytes(
0, // dml_bool_t ViewportStationary,
1, // dml_bool_t DCCEnable,
@@ -10110,7 +10110,7 @@
/// Note: In this function, it is assumed that DCFCLK, SOCCLK freq are the state values, and mode_program will just use the DML calculated DPPCLK and DISPCLK
/// @param mode_lib mode_lib data struct that house all the input/output/bbox and calculation values.
/// @param state_idx Power state idx chosen
-/// @param display_cfg Display Congiuration
+/// @param display_cfg Display Configuration
/// @param call_standalone Calling mode_programming without calling mode support. Some of the "support" struct member will be pre-calculated before doing mode programming
/// TODO: Add clk_cfg input, could be useful for standalone mode
dml_bool_t dml_mode_programming(
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c
index b0d9aed..8697eac 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c
@@ -858,7 +858,9 @@
plane->immediate_flip = plane_state->flip_immediate;
- plane->composition.rect_out_height_spans_vactive = plane_state->dst_rect.height >= stream->timing.v_addressable;
+ plane->composition.rect_out_height_spans_vactive =
+ plane_state->dst_rect.height >= stream->timing.v_addressable &&
+ stream->dst.height >= stream->timing.v_addressable;
}
//TODO : Could be possibly moved to a common helper layer.
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
index d63558e..1cf9015 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
@@ -940,9 +940,11 @@
/* find synchronizable timing groups */
for (j = i + 1; j < display_config->display_config.num_streams; j++) {
if (memcmp(master_timing,
- &display_config->display_config.stream_descriptors[j].timing,
- sizeof(struct dml2_timing_cfg)) == 0 &&
- display_config->display_config.stream_descriptors[i].output.output_encoder == display_config->display_config.stream_descriptors[j].output.output_encoder) {
+ &display_config->display_config.stream_descriptors[j].timing,
+ sizeof(struct dml2_timing_cfg)) == 0 &&
+ display_config->display_config.stream_descriptors[i].output.output_encoder == display_config->display_config.stream_descriptors[j].output.output_encoder &&
+ (display_config->display_config.stream_descriptors[i].output.output_encoder != dml2_hdmi || //hdmi requires formats match
+ display_config->display_config.stream_descriptors[i].output.output_format == display_config->display_config.stream_descriptors[j].output.output_format)) {
set_bit_in_bitfield(&pmo->scratch.pmo_dcn4.synchronized_timing_group_masks[timing_group_idx], j);
set_bit_in_bitfield(&stream_mapped_mask, j);
}
diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
index a1727e5..ebd5df1 100644
--- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
+++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
@@ -668,6 +668,7 @@
*/
static bool decide_dsc_target_bpp_x16(
const struct dc_dsc_policy *policy,
+ const struct dc_dsc_config_options *options,
const struct dsc_enc_caps *dsc_common_caps,
const int target_bandwidth_kbps,
const struct dc_crtc_timing *timing,
@@ -682,7 +683,7 @@
if (decide_dsc_bandwidth_range(policy->min_target_bpp * 16, policy->max_target_bpp * 16,
num_slices_h, dsc_common_caps, timing, link_encoding, &range)) {
if (target_bandwidth_kbps >= range.stream_kbps) {
- if (policy->enable_dsc_when_not_needed)
+ if (policy->enable_dsc_when_not_needed || options->force_dsc_when_not_needed)
/* enable max bpp even dsc is not needed */
*target_bpp_x16 = range.max_target_bpp_x16;
} else if (target_bandwidth_kbps >= range.max_kbps) {
@@ -882,7 +883,7 @@
memset(dsc_cfg, 0, sizeof(struct dc_dsc_config));
- dc_dsc_get_policy_for_timing(timing, options->max_target_bpp_limit_override_x16, &policy);
+ dc_dsc_get_policy_for_timing(timing, options->max_target_bpp_limit_override_x16, &policy, link_encoding);
pic_width = timing->h_addressable + timing->h_border_left + timing->h_border_right;
pic_height = timing->v_addressable + timing->v_border_top + timing->v_border_bottom;
@@ -1080,6 +1081,7 @@
if (target_bandwidth_kbps > 0) {
is_dsc_possible = decide_dsc_target_bpp_x16(
&policy,
+ options,
&dsc_common_caps,
target_bandwidth_kbps,
timing,
@@ -1171,7 +1173,8 @@
void dc_dsc_get_policy_for_timing(const struct dc_crtc_timing *timing,
uint32_t max_target_bpp_limit_override_x16,
- struct dc_dsc_policy *policy)
+ struct dc_dsc_policy *policy,
+ const enum dc_link_encoding_format link_encoding)
{
uint32_t bpc = 0;
@@ -1235,10 +1238,7 @@
policy->max_target_bpp = max_target_bpp_limit_override_x16 / 16;
/* enable DSC when not needed, default false */
- if (dsc_policy_enable_dsc_when_not_needed)
- policy->enable_dsc_when_not_needed = dsc_policy_enable_dsc_when_not_needed;
- else
- policy->enable_dsc_when_not_needed = false;
+ policy->enable_dsc_when_not_needed = dsc_policy_enable_dsc_when_not_needed;
}
void dc_dsc_policy_set_max_target_bpp_limit(uint32_t limit)
@@ -1267,4 +1267,5 @@
options->dsc_force_odm_hslice_override = dc->debug.force_odm_combine;
options->max_target_bpp_limit_override_x16 = 0;
options->slice_height_granularity = 1;
+ options->force_dsc_when_not_needed = false;
}
diff --git a/drivers/gpu/drm/amd/display/dc/hubbub/dcn35/dcn35_hubbub.c b/drivers/gpu/drm/amd/display/dc/hubbub/dcn35/dcn35_hubbub.c
index 6293173..5eb3da8 100644
--- a/drivers/gpu/drm/amd/display/dc/hubbub/dcn35/dcn35_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/hubbub/dcn35/dcn35_hubbub.c
@@ -545,6 +545,7 @@
DCHUBBUB_ARB_MAX_REQ_OUTSTAND, 256,
DCHUBBUB_ARB_MIN_REQ_OUTSTAND, 256);
+ memset(&hubbub2->watermarks.a.cstate_pstate, 0, sizeof(hubbub2->watermarks.a.cstate_pstate));
}
/*static void hubbub35_set_request_limit(struct hubbub *hubbub,
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index d52ce58..4fbed02 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -57,6 +57,7 @@
#include "panel_cntl.h"
#include "dc_state_priv.h"
#include "dpcd_defs.h"
+#include "dsc.h"
/* include DCE11 register header files */
#include "dce/dce_11_0_d.h"
#include "dce/dce_11_0_sh_mask.h"
@@ -1823,6 +1824,48 @@
}
}
+static void clean_up_dsc_blocks(struct dc *dc)
+{
+ struct display_stream_compressor *dsc = NULL;
+ struct timing_generator *tg = NULL;
+ struct stream_encoder *se = NULL;
+ struct dccg *dccg = dc->res_pool->dccg;
+ struct pg_cntl *pg_cntl = dc->res_pool->pg_cntl;
+ int i;
+
+ if (dc->ctx->dce_version != DCN_VERSION_3_5 &&
+ dc->ctx->dce_version != DCN_VERSION_3_51)
+ return;
+
+ for (i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
+ struct dcn_dsc_state s = {0};
+
+ dsc = dc->res_pool->dscs[i];
+ dsc->funcs->dsc_read_state(dsc, &s);
+ if (s.dsc_fw_en) {
+ /* disable DSC in OPTC */
+ if (i < dc->res_pool->timing_generator_count) {
+ tg = dc->res_pool->timing_generators[i];
+ tg->funcs->set_dsc_config(tg, OPTC_DSC_DISABLED, 0, 0);
+ }
+ /* disable DSC in stream encoder */
+ if (i < dc->res_pool->stream_enc_count) {
+ se = dc->res_pool->stream_enc[i];
+ se->funcs->dp_set_dsc_config(se, OPTC_DSC_DISABLED, 0, 0);
+ se->funcs->dp_set_dsc_pps_info_packet(se, false, NULL, true);
+ }
+ /* disable DSC block */
+ if (dccg->funcs->set_ref_dscclk)
+ dccg->funcs->set_ref_dscclk(dccg, dsc->inst);
+ dsc->funcs->dsc_disable(dsc);
+
+ /* power down DSC */
+ if (pg_cntl != NULL)
+ pg_cntl->funcs->dsc_pg_control(pg_cntl, dsc->inst, false);
+ }
+ }
+}
+
/*
* When ASIC goes from VBIOS/VGA mode to driver/accelerated mode we need:
* 1. Power down all DC HW blocks
@@ -1927,6 +1970,13 @@
clk_mgr_exit_optimized_pwr_state(dc, dc->clk_mgr);
power_down_all_hw_blocks(dc);
+
+ /* DSC could be enabled on eDP during VBIOS post.
+ * To clean up dsc blocks if eDP is in link but not active.
+ */
+ if (edp_link_with_sink && (edp_stream_num == 0))
+ clean_up_dsc_blocks(dc);
+
disable_vga_and_power_gate_all_controllers(dc);
if (edp_link_with_sink && !keep_edp_vdd_on)
dc->hwss.edp_power_control(edp_link_with_sink, false);
@@ -2046,13 +2096,20 @@
* as well.
*/
for (i = 0; i < num_pipes; i++) {
- pipe_ctx[i]->stream_res.tg->funcs->set_drr(
- pipe_ctx[i]->stream_res.tg, &params);
+ /* dc_state_destruct() might null the stream resources, so fetch tg
+ * here first to avoid a race condition. The lifetime of the pointee
+ * itself (the timing_generator object) is not a problem here.
+ */
+ struct timing_generator *tg = pipe_ctx[i]->stream_res.tg;
- if (adjust.v_total_max != 0 && adjust.v_total_min != 0)
- pipe_ctx[i]->stream_res.tg->funcs->set_static_screen_control(
- pipe_ctx[i]->stream_res.tg,
- event_triggers, num_frames);
+ if ((tg != NULL) && tg->funcs) {
+ if (tg->funcs->set_drr)
+ tg->funcs->set_drr(tg, &params);
+ if (adjust.v_total_max != 0 && adjust.v_total_min != 0)
+ if (tg->funcs->set_static_screen_control)
+ tg->funcs->set_static_screen_control(
+ tg, event_triggers, num_frames);
+ }
}
}
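
The set_drr hunk in dce110_hwseq.c reads pipe_ctx[i]->stream_res.tg into a local once and checks it (and its funcs) before calling, because dc_state_destruct() can null the field concurrently. A simplified, self-contained sketch of that snapshot-then-check pattern with stand-in types:

/*
 * Stand-in types only; this is not the real dc pipe context. The point
 * is the single snapshot of the possibly concurrently cleared pointer
 * followed by NULL checks before the indirect call.
 */
#include <stddef.h>
#include <stdio.h>

struct drr_params { int v_total_min, v_total_max; };

struct tg_funcs { void (*set_drr)(void *tg, const struct drr_params *p); };
struct timing_generator { const struct tg_funcs *funcs; };
struct stream_res { struct timing_generator *tg; };
struct pipe_ctx { struct stream_res stream_res; };

static void demo_set_drr(void *tg, const struct drr_params *p)
{
	(void)tg;
	printf("set_drr: %d..%d\n", p->v_total_min, p->v_total_max);
}

static void apply_drr(struct pipe_ctx *pipe, const struct drr_params *params)
{
	/* One snapshot of the field; if teardown nulls it after this read,
	 * we still hold the object we validated for the call below. */
	struct timing_generator *tg = pipe->stream_res.tg;

	if (tg && tg->funcs && tg->funcs->set_drr)
		tg->funcs->set_drr(tg, params);
}

int main(void)
{
	const struct tg_funcs funcs = { .set_drr = demo_set_drr };
	struct timing_generator tg = { .funcs = &funcs };
	struct pipe_ctx pipe = { .stream_res = { .tg = &tg } };
	const struct drr_params params = { 1000, 1100 };

	apply_drr(&pipe, &params);	/* calls set_drr */
	pipe.stream_res.tg = NULL;
	apply_drr(&pipe, &params);	/* safely skipped */
	return 0;
}
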
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
index 42c5228..bded335 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
@@ -455,7 +455,7 @@
struct mcif_wb *mcif_wb;
struct mcif_warmup_params warmup_params = {0};
unsigned int i, i_buf;
- /*make sure there is no active DWB eanbled */
+ /* make sure there is no active DWB enabled */
for (i = 0; i < num_dwb; i++) {
dwb = dc->res_pool->dwbc[wb_info[i].dwb_pipe_inst];
if (dwb->dwb_is_efc_transition || dwb->dwb_is_drc) {
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
index a36e116..2e8c9f7 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
@@ -1032,6 +1032,20 @@
struct dsc_config dsc_cfg;
struct dsc_optc_config dsc_optc_cfg = {0};
enum optc_dsc_mode optc_dsc_mode;
+ struct dcn_dsc_state dsc_state = {0};
+
+ if (!dsc) {
+ DC_LOG_DSC("DSC is NULL for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+
+ if (dsc->funcs->dsc_read_state) {
+ dsc->funcs->dsc_read_state(dsc, &dsc_state);
+ if (!dsc_state.dsc_fw_en) {
+ DC_LOG_DSC("DSC has been disabled for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+ }
/* Enable DSC hw block */
dsc_cfg.pic_width = (stream->timing.h_addressable + stream->timing.h_border_left + stream->timing.h_border_right) / opp_cnt;
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
index 479fd3e..bd309db 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
@@ -334,7 +334,20 @@
struct dsc_config dsc_cfg;
struct dsc_optc_config dsc_optc_cfg = {0};
enum optc_dsc_mode optc_dsc_mode;
+ struct dcn_dsc_state dsc_state = {0};
+ if (!dsc) {
+ DC_LOG_DSC("DSC is NULL for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+
+ if (dsc->funcs->dsc_read_state) {
+ dsc->funcs->dsc_read_state(dsc, &dsc_state);
+ if (!dsc_state.dsc_fw_en) {
+ DC_LOG_DSC("DSC has been disabled for tg instance %d:", pipe_ctx->stream_res.tg->inst);
+ return;
+ }
+ }
/* Enable DSC hw block */
dsc_cfg.pic_width = (stream->timing.h_addressable + stream->timing.h_border_left + stream->timing.h_border_right) / opp_cnt;
dsc_cfg.pic_height = stream->timing.v_addressable + stream->timing.v_border_top + stream->timing.v_border_bottom;
diff --git a/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dpia.c b/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dpia.c
index 46fb364..6499807 100644
--- a/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dpia.c
+++ b/drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dpia.c
@@ -50,8 +50,31 @@
DC_LOG_MST("dpia : status[%d]: alloc_slots[%d]: used_slots[%d]\n",
status, mst_alloc_slots, prev_mst_slots_in_use);
- ASSERT(link_enc);
- link_enc->funcs->update_mst_stream_allocation_table(link_enc, table);
+ if (link_enc)
+ link_enc->funcs->update_mst_stream_allocation_table(link_enc, table);
+}
+
+static void set_dio_dpia_link_test_pattern(struct dc_link *link,
+ const struct link_resource *link_res,
+ struct encoder_set_dp_phy_pattern_param *tp_params)
+{
+ if (tp_params->dp_phy_pattern != DP_TEST_PATTERN_VIDEO_MODE)
+ return;
+
+ struct link_encoder *link_enc = link_enc_cfg_get_link_enc(link);
+
+ if (!link_enc)
+ return;
+
+ link_enc->funcs->dp_set_phy_pattern(link_enc, tp_params);
+ link->dc->link_srv->dp_trace_source_sequence(link, DPCD_SOURCE_SEQ_AFTER_SET_SOURCE_PATTERN);
+}
+
+static void set_dio_dpia_lane_settings(struct dc_link *link,
+ const struct link_resource *link_res,
+ const struct dc_link_settings *link_settings,
+ const struct dc_lane_settings lane_settings[LANE_COUNT_DP_MAX])
+{
}
static const struct link_hwss dpia_link_hwss = {
@@ -65,8 +88,8 @@
.ext = {
.set_throttled_vcp_size = set_dio_throttled_vcp_size,
.enable_dp_link_output = enable_dio_dp_link_output,
- .set_dp_link_test_pattern = set_dio_dp_link_test_pattern,