pci-v5.7-changes
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl6GTQMUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vy3PhAAmqpYBRobOsG8QbmKDjoJEFtkqdvD
z6+4zf/R+hF11RyXjMDwihIe8d+tkQ4eAaYu6Oh5PrTyanz0G0PgeCrivZeytULk
thqQIWzDQMVA5vN/2/Vy8s5s+3HzP8z/MZOFScJ7+xA1MndXptPRTNmFUbjx+GAv
x8/pTp0u9AF6m7itX65DxXvwkzjWamt+Ar4Yx2IcuKAU/M5RtfuZO3PpDnqn7/wk
JFlkRoYeFB6qNnnkPdeyPHl9dALhuhzgdTyklQEnKVW3nf3xThYDhcEwdh6kBQgl
0dH8lL5LXy7PKGN8RES4wB0Vqndw/HlsCF5O4wkkfItbnbJxGJtS139e5973m0ud
sgWvF4yJAT2jCKhIeNz34sePQJMyWALhv0XzZCsJ0YeGHsrV1jrHELkwUT1+eIsT
3UV0iZ6aL06zQJDyKUbbIcQzEQ/wwBC+x9VgsyL54K1quCQZ1N1Nl/dvrb4cRG9m
m9EhJK/brDf4c0uFlOmMTSxV1t5J+z6ZSQnh1ShD/o5yBsxqN6q5brDT6LEs+jbM
LsIkA18jJOd4OyiDs98YiFKvIfFQbQ0LEBQpJwhF0snvfBFMMbUYN/T/NYneWON/
F0TpkFoP7PXDuq55iNaLdnObfzrpC9kdzUyWvePUvjxIl55bkf+/qtUny+H48t4L
dNggvW052d7BHes=
=deWu
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)
- Add more 32 GT/s link speed decoding and improve the implementation
(Yicong Yang)
Resource management:
- Add support for sizing programmable host bridge apertures and fix a
related alpha Nautilus regression (Ivan Kokshaysky)
Interrupts:
- Add boot interrupt quirk mechanism for Xeon chipsets and document
boot interrupts (Sean V Kelley)
PCIe native device hotplug:
- When possible, disable in-band presence detect and use PDS
(Alexandru Gagniuc)
- Add DMI table for devices that don't use in-band presence detection
but don't advertise that correctly (Stuart Hayes)
- Fix hang when powering slots up/down via sysfs (Lukas Wunner)
- Fix an MSI interrupt race (Stuart Hayes)
Virtualization:
- Add ACS quirks for Zhaoxin devices (Raymond Pang)
Error handling:
- Add Error Disconnect Recover (EDR) support so firmware can report
devices disconnected via DPC and we can try to recover (Kuppuswamy
Sathyanarayanan)
Peer-to-peer DMA:
- Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
Maier)
ASPM:
- Reduce severity of common clock config message (Chris Packham)
- Clear the correct bits when enabling L1 substates, so we don't go
to the wrong state (Yicong Yang)
Endpoint framework:
- Replace EPF linkup ops with notifier call chain and improve locking
(Kishon Vijay Abraham I)
- Fix concurrent memory allocation in OB address region (Kishon Vijay
Abraham I)
- Move PF function number assignment to EPC core to support multiple
function creation methods (Kishon Vijay Abraham I)
- Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)
- Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
Offset (Kishon Vijay Abraham I)
- Add support for testing DMA transfers (Kishon Vijay Abraham I)
- Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)
- Add support for tests to clear IRQ (Kishon Vijay Abraham I)
- Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)
Amlogic Meson PCIe controller driver:
- Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
Pommarel)
- Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
Pommarel)
Cadence PCIe controller driver:
- Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
Abraham I)
Intel VMD host bridge driver:
- Add two VMD Device IDs that require bus restriction mode (Sushma
Kalakota)
Mobiveil PCIe controller driver:
- Refactor and modularize mobiveil driver (Hou Zhiqiang)
- Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)
Microsoft Hyper-V host bridge driver:
- Add support for Hyper-V PCI protocol version 1.3 and
PCI_BUS_RELATIONS2 (Long Li)
- Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
Feng)
- Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)
NVIDIA Tegra PCIe controller driver:
- Use pci_parse_request_of_pci_ranges() (Rob Herring)
- Add support for endpoint mode and related DT updates (Vidya Sagar)
- Reduce -EPROBE_DEFER error message log level (Thierry Reding)
Qualcomm PCIe controller driver:
- Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)
Synopsys DesignWare PCIe controller driver:
- Refactor core initialization code for endpoint mode (Vidya Sagar)
- Fix endpoint MSI-X to use correct table address (Kishon Vijay
Abraham I)
TI DRA7xx PCIe controller driver:
- Fix MSI IRQ handling (Vignesh Raghavendra)
TI Keystone PCIe controller driver:
- Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)
Miscellaneous:
- Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
Feng)
- Use ioremap(), not phys_to_virt(), for platform ROM to fix video
ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"
* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
PCI: tegra: Print -EPROBE_DEFER error message at debug level
misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
tools: PCI: Add 'e' to clear IRQ
misc: pci_endpoint_test: Add ioctl to clear IRQ
misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
misc: pci_endpoint_test: Add support to get DMA option from userspace
tools: PCI: Add 'd' command line option to support DMA
misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
PCI: endpoint: functions/pci-epf-test: Print throughput information
PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
PCI: pciehp: Fix MSI interrupt race
PCI: pciehp: Fix indefinite wait on sysfs requests
PCI: endpoint: Fix clearing start entry in configfs
PCI: tegra: Add support for PCIe endpoint mode in Tegra194
PCI: sysfs: Revert "rescan" file renames
...
This commit is contained in:
commit
86f26a77cb
88 changed files with 4836 additions and 1635 deletions
155
Documentation/PCI/boot-interrupts.rst
Normal file
155
Documentation/PCI/boot-interrupts.rst
Normal file
|
|
@ -0,0 +1,155 @@
|
|||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
Boot Interrupts
|
||||
===============
|
||||
|
||||
:Author: - Sean V Kelley <sean.v.kelley@linux.intel.com>
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
On PCI Express, interrupts are represented with either MSI or inbound
|
||||
interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a
|
||||
given Core IO converts the legacy interrupt messages from PCI Express to
|
||||
MSI interrupts. If the IO-APIC is disabled (via the mask bits in the
|
||||
IO-APIC table entries), the messages are routed to the legacy PCH. This
|
||||
in-band interrupt mechanism was traditionally necessary for systems that
|
||||
did not support the IO-APIC and for boot. Intel in the past has used the
|
||||
term "boot interrupts" to describe this mechanism. Further, the PCI Express
|
||||
protocol describes this in-band legacy wire-interrupt INTx mechanism for
|
||||
I/O devices to signal PCI-style level interrupts. The subsequent paragraphs
|
||||
describe problems with the Core IO handling of INTx message routing to the
|
||||
PCH and mitigation within BIOS and the OS.
|
||||
|
||||
|
||||
Issue
|
||||
=====
|
||||
|
||||
When in-band legacy INTx messages are forwarded to the PCH, they in turn
|
||||
trigger a new interrupt for which the OS likely lacks a handler. When an
|
||||
interrupt goes unhandled over time, they are tracked by the Linux kernel as
|
||||
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
|
||||
reaches a specific count with the error "nobody cared". This disabled IRQ
|
||||
now prevents valid usage by an existing interrupt which may happen to share
|
||||
the IRQ line.
|
||||
|
||||
irq 19: nobody cared (try booting with the "irqpoll" option)
|
||||
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
|
||||
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
|
||||
Call Trace:
|
||||
<IRQ>
|
||||
? dump_stack+0x46/0x5e
|
||||
? __report_bad_irq+0x2e/0xb0
|
||||
? note_interrupt+0x242/0x290
|
||||
? nNIKAL100_memoryRead16+0x8/0x10 [nikal]
|
||||
? handle_irq_event_percpu+0x55/0x70
|
||||
? handle_irq_event+0x4f/0x80
|
||||
? handle_fasteoi_irq+0x81/0x180
|
||||
? handle_irq+0x1c/0x30
|
||||
? do_IRQ+0x41/0xd0
|
||||
? common_interrupt+0x84/0x84
|
||||
</IRQ>
|
||||
|
||||
handlers:
|
||||
irq_default_primary_handler threaded usb_hcd_irq
|
||||
Disabling IRQ #19
|
||||
|
||||
|
||||
Conditions
|
||||
==========
|
||||
|
||||
The use of threaded interrupts is the most likely condition to trigger
|
||||
this problem today. Threaded interrupts may not be reenabled after the IRQ
|
||||
handler wakes. These "one shot" conditions mean that the threaded interrupt
|
||||
needs to keep the interrupt line masked until the threaded handler has run.
|
||||
Especially when dealing with high data rate interrupts, the thread needs to
|
||||
run to completion; otherwise some handlers will end up in stack overflows
|
||||
since the interrupt of the issuing device is still active.
|
||||
|
||||
Affected Chipsets
|
||||
=================
|
||||
|
||||
The legacy interrupt forwarding mechanism exists today in a number of
|
||||
devices including but not limited to chipsets from AMD/ATI, Broadcom, and
|
||||
Intel. Changes made through the mitigations below have been applied to
|
||||
drivers/pci/quirks.c
|
||||
|
||||
Starting with ICX there are no longer any IO-APICs in the Core IO's
|
||||
devices. IO-APIC is only in the PCH. Devices connected to the Core IO's
|
||||
PCIe Root Ports will use native MSI/MSI-X mechanisms.
|
||||
|
||||
Mitigations
|
||||
===========
|
||||
|
||||
The mitigations take the form of PCI quirks. The preference has been to
|
||||
first identify and make use of a means to disable the routing to the PCH.
|
||||
In such a case a quirk to disable boot interrupt generation can be
|
||||
added.[1]
|
||||
|
||||
Intel® 6300ESB I/O Controller Hub
|
||||
Alternate Base Address Register:
|
||||
BIE: Boot Interrupt Enable
|
||||
0 = Boot interrupt is enabled.
|
||||
1 = Boot interrupt is disabled.
|
||||
|
||||
Intel® Sandy Bridge through Sky Lake based Xeon servers:
|
||||
Coherent Interface Protocol Interrupt Control
|
||||
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
|
||||
When this bit is set. Local INTx messages received from the
|
||||
Intel® Quick Data DMA/PCI Express ports are not routed to legacy
|
||||
PCH - they are either converted into MSI via the integrated IO-APIC
|
||||
(if the IO-APIC mask bit is clear in the appropriate entries)
|
||||
or cause no further action (when mask bit is set)
|
||||
|
||||
In the absence of a way to directly disable the routing, another approach
|
||||
has been to make use of PCI Interrupt pin to INTx routing tables for
|
||||
purposes of redirecting the interrupt handler to the rerouted interrupt
|
||||
line by default. Therefore, on chipsets where this INTx routing cannot be
|
||||
disabled, the Linux kernel will reroute the valid interrupt to its legacy
|
||||
interrupt. This redirection of the handler will prevent the occurrence of
|
||||
the spurious interrupt detection which would ordinarily disable the IRQ
|
||||
line due to excessive unhandled counts.[2]
|
||||
|
||||
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
|
||||
disable) the redirection of the interrupt handler to the PCH interrupt
|
||||
line. The option can be overridden by either pci=ioapicreroute or
|
||||
pci=noioapicreroute.[3]
|
||||
|
||||
|
||||
More Documentation
|
||||
==================
|
||||
|
||||
There is an overview of the legacy interrupt handling in several datasheets
|
||||
(6300ESB and 6700PXH below). While largely the same, it provides insight
|
||||
into the evolution of its handling with chipsets.
|
||||
|
||||
Example of disabling of the boot interrupt
|
||||
------------------------------------------
|
||||
|
||||
Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
|
||||
5.7.3 Boot Interrupt
|
||||
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
|
||||
|
||||
Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
|
||||
Datasheet - Volume 2: Registers (Document # 330784-003)
|
||||
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
|
||||
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
|
||||
|
||||
Example of handler rerouting
|
||||
----------------------------
|
||||
|
||||
Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
|
||||
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
|
||||
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
|
||||
|
||||
|
||||
If you have any legacy PCI interrupt questions that aren't answered, email me.
|
||||
|
||||
Cheers,
|
||||
Sean V Kelley
|
||||
sean.v.kelley@linux.intel.com
|
||||
|
||||
[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
|
||||
[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
|
||||
[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
|
||||
|
|
@ -16,3 +16,4 @@ Linux PCI Bus Subsystem
|
|||
pci-error-recovery
|
||||
pcieaer-howto
|
||||
endpoint/index
|
||||
boot-interrupts
|
||||
|
|
|
|||
|
|
@ -156,12 +156,6 @@ default reset_link function, but different upstream ports might
|
|||
have different specifications to reset pci express link, so all
|
||||
upstream ports should provide their own reset_link functions.
|
||||
|
||||
In struct pcie_port_service_driver, a new pointer, reset_link, is
|
||||
added.
|
||||
::
|
||||
|
||||
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
|
||||
|
||||
Section 3.2.2.2 provides more detailed info on when to call
|
||||
reset_link.
|
||||
|
||||
|
|
@ -212,15 +206,10 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within
|
|||
a hierarchy in question. Then, performing link reset at upstream is
|
||||
necessary. As different kinds of devices might use different approaches
|
||||
to reset link, AER port service driver is required to provide the
|
||||
function to reset link. Firstly, kernel looks for if the upstream
|
||||
component has an aer driver. If it has, kernel uses the reset_link
|
||||
callback of the aer driver. If the upstream component has no aer driver
|
||||
and the port is downstream port, we will perform a hot reset as the
|
||||
default by setting the Secondary Bus Reset bit of the Bridge Control
|
||||
register associated with the downstream port. As for upstream ports,
|
||||
they should provide their own aer service drivers with reset_link
|
||||
function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
|
||||
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
|
||||
function to reset link via callback parameter of pcie_do_recovery()
|
||||
function. If reset_link is not NULL, recovery function will use it
|
||||
to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER
|
||||
and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
|
||||
to mmio_enabled.
|
||||
|
||||
helper functions
|
||||
|
|
@ -243,9 +232,9 @@ messages to root port when an error is detected.
|
|||
|
||||
::
|
||||
|
||||
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
|
||||
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);`
|
||||
|
||||
pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
|
||||
pci_aer_clear_nonfatal_status clears non-fatal errors in the uncorrectable
|
||||
error status register.
|
||||
|
||||
Frequent Asked Questions
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue