pci-v5.7-changes

-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl6GTQMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vy3PhAAmqpYBRobOsG8QbmKDjoJEFtkqdvD
 z6+4zf/R+hF11RyXjMDwihIe8d+tkQ4eAaYu6Oh5PrTyanz0G0PgeCrivZeytULk
 thqQIWzDQMVA5vN/2/Vy8s5s+3HzP8z/MZOFScJ7+xA1MndXptPRTNmFUbjx+GAv
 x8/pTp0u9AF6m7itX65DxXvwkzjWamt+Ar4Yx2IcuKAU/M5RtfuZO3PpDnqn7/wk
 JFlkRoYeFB6qNnnkPdeyPHl9dALhuhzgdTyklQEnKVW3nf3xThYDhcEwdh6kBQgl
 0dH8lL5LXy7PKGN8RES4wB0Vqndw/HlsCF5O4wkkfItbnbJxGJtS139e5973m0ud
 sgWvF4yJAT2jCKhIeNz34sePQJMyWALhv0XzZCsJ0YeGHsrV1jrHELkwUT1+eIsT
 3UV0iZ6aL06zQJDyKUbbIcQzEQ/wwBC+x9VgsyL54K1quCQZ1N1Nl/dvrb4cRG9m
 m9EhJK/brDf4c0uFlOmMTSxV1t5J+z6ZSQnh1ShD/o5yBsxqN6q5brDT6LEs+jbM
 LsIkA18jJOd4OyiDs98YiFKvIfFQbQ0LEBQpJwhF0snvfBFMMbUYN/T/NYneWON/
 F0TpkFoP7PXDuq55iNaLdnObfzrpC9kdzUyWvePUvjxIl55bkf+/qtUny+H48t4L
 dNggvW052d7BHes=
 =deWu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)

   - Add more 32 GT/s link speed decoding and improve the implementation
     (Yicong Yang)

  Resource management:

   - Add support for sizing programmable host bridge apertures and fix a
     related alpha Nautilus regression (Ivan Kokshaysky)

  Interrupts:

   - Add boot interrupt quirk mechanism for Xeon chipsets and document
     boot interrupts (Sean V Kelley)

  PCIe native device hotplug:

   - When possible, disable in-band presence detect and use PDS
     (Alexandru Gagniuc)

   - Add DMI table for devices that don't use in-band presence detection
     but don't advertise that correctly (Stuart Hayes)

   - Fix hang when powering slots up/down via sysfs (Lukas Wunner)

   - Fix an MSI interrupt race (Stuart Hayes)

  Virtualization:

   - Add ACS quirks for Zhaoxin devices (Raymond Pang)

  Error handling:

   - Add Error Disconnect Recover (EDR) support so firmware can report
     devices disconnected via DPC and we can try to recover (Kuppuswamy
     Sathyanarayanan)

  Peer-to-peer DMA:

   - Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
     Maier)

  ASPM:

   - Reduce severity of common clock config message (Chris Packham)

   - Clear the correct bits when enabling L1 substates, so we don't go
     to the wrong state (Yicong Yang)

  Endpoint framework:

   - Replace EPF linkup ops with notifier call chain and improve locking
     (Kishon Vijay Abraham I)

   - Fix concurrent memory allocation in OB address region (Kishon Vijay
     Abraham I)

   - Move PF function number assignment to EPC core to support multiple
     function creation methods (Kishon Vijay Abraham I)

   - Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)

   - Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
     Offset (Kishon Vijay Abraham I)

   - Add support for testing DMA transfers (Kishon Vijay Abraham I)

   - Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)

   - Add support for tests to clear IRQ (Kishon Vijay Abraham I)

   - Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)

  Amlogic Meson PCIe controller driver:

   - Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
     Pommarel)

   - Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
     Pommarel)

  Cadence PCIe controller driver:

   - Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
     Abraham I)

  Intel VMD host bridge driver:

   - Add two VMD Device IDs that require bus restriction mode (Sushma
     Kalakota)

  Mobiveil PCIe controller driver:

   - Refactor and modularize mobiveil driver (Hou Zhiqiang)

   - Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)

  Microsoft Hyper-V host bridge driver:

   - Add support for Hyper-V PCI protocol version 1.3 and
     PCI_BUS_RELATIONS2 (Long Li)

   - Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
     Feng)

   - Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)

  NVIDIA Tegra PCIe controller driver:

   - Use pci_parse_request_of_pci_ranges() (Rob Herring)

   - Add support for endpoint mode and related DT updates (Vidya Sagar)

   - Reduce -EPROBE_DEFER error message log level (Thierry Reding)

  Qualcomm PCIe controller driver:

   - Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)

  Synopsys DesignWare PCIe controller driver:

   - Refactor core initialization code for endpoint mode (Vidya Sagar)

   - Fix endpoint MSI-X to use correct table address (Kishon Vijay
     Abraham I)

  TI DRA7xx PCIe controller driver:

   - Fix MSI IRQ handling (Vignesh Raghavendra)

  TI Keystone PCIe controller driver:

   - Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)

  Miscellaneous:

   - Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
     Feng)

   - Use ioremap(), not phys_to_virt(), for platform ROM to fix video
     ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"

* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
  misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
  PCI: tegra: Print -EPROBE_DEFER error message at debug level
  misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
  misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
  tools: PCI: Add 'e' to clear IRQ
  misc: pci_endpoint_test: Add ioctl to clear IRQ
  misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
  PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
  PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
  PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
  misc: pci_endpoint_test: Add support to get DMA option from userspace
  tools: PCI: Add 'd' command line option to support DMA
  misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
  PCI: endpoint: functions/pci-epf-test: Print throughput information
  PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
  PCI: pciehp: Fix MSI interrupt race
  PCI: pciehp: Fix indefinite wait on sysfs requests
  PCI: endpoint: Fix clearing start entry in configfs
  PCI: tegra: Add support for PCIe endpoint mode in Tegra194
  PCI: sysfs: Revert "rescan" file renames
  ...
This commit is contained in:
Linus Torvalds 2020-04-03 14:25:02 -07:00
commit 86f26a77cb
88 changed files with 4836 additions and 1635 deletions

View file

@ -0,0 +1,155 @@
.. SPDX-License-Identifier: GPL-2.0
===============
Boot Interrupts
===============
:Author: - Sean V Kelley <sean.v.kelley@linux.intel.com>
Overview
========
On PCI Express, interrupts are represented with either MSI or inbound
interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a
given Core IO converts the legacy interrupt messages from PCI Express to
MSI interrupts. If the IO-APIC is disabled (via the mask bits in the
IO-APIC table entries), the messages are routed to the legacy PCH. This
in-band interrupt mechanism was traditionally necessary for systems that
did not support the IO-APIC and for boot. Intel in the past has used the
term "boot interrupts" to describe this mechanism. Further, the PCI Express
protocol describes this in-band legacy wire-interrupt INTx mechanism for
I/O devices to signal PCI-style level interrupts. The subsequent paragraphs
describe problems with the Core IO handling of INTx message routing to the
PCH and mitigation within BIOS and the OS.
Issue
=====
When in-band legacy INTx messages are forwarded to the PCH, they in turn
trigger a new interrupt for which the OS likely lacks a handler. When an
interrupt goes unhandled over time, they are tracked by the Linux kernel as
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
reaches a specific count with the error "nobody cared". This disabled IRQ
now prevents valid usage by an existing interrupt which may happen to share
the IRQ line.
irq 19: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
Call Trace:
<IRQ>
? dump_stack+0x46/0x5e
? __report_bad_irq+0x2e/0xb0
? note_interrupt+0x242/0x290
? nNIKAL100_memoryRead16+0x8/0x10 [nikal]
? handle_irq_event_percpu+0x55/0x70
? handle_irq_event+0x4f/0x80
? handle_fasteoi_irq+0x81/0x180
? handle_irq+0x1c/0x30
? do_IRQ+0x41/0xd0
? common_interrupt+0x84/0x84
</IRQ>
handlers:
irq_default_primary_handler threaded usb_hcd_irq
Disabling IRQ #19
Conditions
==========
The use of threaded interrupts is the most likely condition to trigger
this problem today. Threaded interrupts may not be reenabled after the IRQ
handler wakes. These "one shot" conditions mean that the threaded interrupt
needs to keep the interrupt line masked until the threaded handler has run.
Especially when dealing with high data rate interrupts, the thread needs to
run to completion; otherwise some handlers will end up in stack overflows
since the interrupt of the issuing device is still active.
Affected Chipsets
=================
The legacy interrupt forwarding mechanism exists today in a number of
devices including but not limited to chipsets from AMD/ATI, Broadcom, and
Intel. Changes made through the mitigations below have been applied to
drivers/pci/quirks.c
Starting with ICX there are no longer any IO-APICs in the Core IO's
devices. IO-APIC is only in the PCH. Devices connected to the Core IO's
PCIe Root Ports will use native MSI/MSI-X mechanisms.
Mitigations
===========
The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be
added.[1]
Intel® 6300ESB I/O Controller Hub
Alternate Base Address Register:
BIE: Boot Interrupt Enable
0 = Boot interrupt is enabled.
1 = Boot interrupt is disabled.
Intel® Sandy Bridge through Sky Lake based Xeon servers:
Coherent Interface Protocol Interrupt Control
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
When this bit is set. Local INTx messages received from the
Intel® Quick Data DMA/PCI Express ports are not routed to legacy
PCH - they are either converted into MSI via the integrated IO-APIC
(if the IO-APIC mask bit is clear in the appropriate entries)
or cause no further action (when mask bit is set)
In the absence of a way to directly disable the routing, another approach
has been to make use of PCI Interrupt pin to INTx routing tables for
purposes of redirecting the interrupt handler to the rerouted interrupt
line by default. Therefore, on chipsets where this INTx routing cannot be
disabled, the Linux kernel will reroute the valid interrupt to its legacy
interrupt. This redirection of the handler will prevent the occurrence of
the spurious interrupt detection which would ordinarily disable the IRQ
line due to excessive unhandled counts.[2]
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute.[3]
More Documentation
==================
There is an overview of the legacy interrupt handling in several datasheets
(6300ESB and 6700PXH below). While largely the same, it provides insight
into the evolution of its handling with chipsets.
Example of disabling of the boot interrupt
------------------------------------------
Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
5.7.3 Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
Datasheet - Volume 2: Registers (Document # 330784-003)
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Example of handler rerouting
----------------------------
Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
If you have any legacy PCI interrupt questions that aren't answered, email me.
Cheers,
Sean V Kelley
sean.v.kelley@linux.intel.com
[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/

View file

@ -16,3 +16,4 @@ Linux PCI Bus Subsystem
pci-error-recovery
pcieaer-howto
endpoint/index
boot-interrupts

View file

@ -156,12 +156,6 @@ default reset_link function, but different upstream ports might
have different specifications to reset pci express link, so all
upstream ports should provide their own reset_link functions.
In struct pcie_port_service_driver, a new pointer, reset_link, is
added.
::
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
Section 3.2.2.2 provides more detailed info on when to call
reset_link.
@ -212,15 +206,10 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within
a hierarchy in question. Then, performing link reset at upstream is
necessary. As different kinds of devices might use different approaches
to reset link, AER port service driver is required to provide the
function to reset link. Firstly, kernel looks for if the upstream
component has an aer driver. If it has, kernel uses the reset_link
callback of the aer driver. If the upstream component has no aer driver
and the port is downstream port, we will perform a hot reset as the
default by setting the Secondary Bus Reset bit of the Bridge Control
register associated with the downstream port. As for upstream ports,
they should provide their own aer service drivers with reset_link
function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
function to reset link via callback parameter of pcie_do_recovery()
function. If reset_link is not NULL, recovery function will use it
to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER
and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
to mmio_enabled.
helper functions
@ -243,9 +232,9 @@ messages to root port when an error is detected.
::
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);`
pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
pci_aer_clear_nonfatal_status clears non-fatal errors in the uncorrectable
error status register.
Frequent Asked Questions