kernel-ark/drivers/pci
Yinghai Lu ac205b7bb7 PCI: make sriov work with hotplug remove
When hot removing a pci express module that has a pcie switch and supports
SRIOV, we got:

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain🚌device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain🚌dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]--
 |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
 |           |                               |            +-00.1 Device
 |           |                               |            +-00.2 Device
 |           |                               |            \-00.3 Device
 |           |                               \-03.0-[b3]--+-00.0 Device
 |           |                                            +-00.1 Device
 |           |                                            +-00.2 Device
 |           |                                            \-00.3 Device

root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
end devices  are b2:00.0 and b3.00.0.
VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3

Root cause: when doing pci_stop_bus_device() with phys fn, it will stop
virt fn and remove the fn, so
	list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.

Solution is just replacing list_for_each_safe() with
list_for_each_prev_safe().  This will make sure we can get valid n pointer
to PF instead of the freed VF pointer (because newly added devices are
inserted to the bus->devices list tail).

During reviewing the patch, Bjorn said:
|   The PCI hot-remove path calls pci_stop_bus_devices() via
|   pci_remove_bus_device().
|
|   pci_stop_bus_devices() traverses the bus->devices list (point A below),
|   stopping each device in turn, which calls the driver remove() method.  When
|   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
|   also uses pci_remove_bus_device() to remove the VF devices from the
|   bus->devices list (point B).
|
|       pci_remove_bus_device
|         pci_stop_bus_device
|           pci_stop_bus_devices(subordinate)
|             list_for_each(bus->devices)             <-- A
|               pci_stop_bus_device(PF)
|                 ...
|                   driver->remove
|                     pci_disable_sriov
|                       ...
|                         pci_remove_bus_device(VF)
|                             <remove from bus_list>  <-- B
|
|   At B, we're changing the same list we're iterating through at A, so when
|   the driver remove() method returns, the pci_stop_bus_devices() iterator has
|   a pointer to a list entry that has already been freed.

Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141
				 https://lkml.org/lkml/2012/1/23/360

-v5: According to Linus to make remove more robust, Change to
     list_for_each_prev_safe instead. That is more reasonable, because
     those devices are added to tail of the list before.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-02-14 08:44:59 -08:00
..
hotplug PCI: drivers/pci/hotplug/ibmphp_ebda.c: add missing iounmap 2012-02-14 08:44:47 -08:00
pcie module_param: make bool parameters really bool (drivers & misc) 2012-01-13 09:32:20 +10:30
.gitignore
access.c PCI: Introduce INTx check & mask API 2012-01-06 12:10:34 -08:00
ats.c Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci 2012-01-11 18:50:26 -08:00
bus.c PCI: add helpers for building PCI bus resource lists 2012-01-06 12:10:50 -08:00
hotplug-pci.c pci: Fix files needing export.h for EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:22 -04:00
hotplug.c
htirq.c pci: Fix files needing export.h for EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:22 -04:00
ioapic.c pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86 2011-12-06 09:21:05 +01:00
iov.c PCI: set pci sriov page size before reading SRIOV BAR 2012-02-10 12:01:56 -08:00
irq.c pci: Fix files needing export.h for EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:22 -04:00
Kconfig pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86 2011-12-06 09:21:05 +01:00
Makefile PCI: Move ATS implementation into own file 2011-10-14 09:05:33 -07:00
msi.c x86/PCI: Expand the x86_msi_ops to have a restore MSIs. 2012-01-06 14:02:26 -08:00
msi.h
of.c PCI: OF: Don't crash when bridge parent is NULL. 2011-08-19 08:51:37 -07:00
pci-acpi.c PCI/ACPI/PM: Avoid resuming devices that don't signal PME 2012-01-06 12:10:29 -08:00
pci-driver.c PCI: pci_has_legacy_pm_support add driver and device to WARN 2012-01-06 12:10:30 -08:00
pci-label.c switch ->is_visible() to returning umode_t 2012-01-03 22:54:55 -05:00
pci-stub.c
pci-sysfs.c PCI: Make rescan bus increase bridge resource size if needed 2012-02-14 08:44:53 -08:00
pci.c PCI: Introduce __pci_reset_function_locked to be used when holding device_lock. 2012-02-14 08:44:48 -08:00
pci.h PCI: Enable ATS at the device state restore 2012-01-06 12:11:18 -08:00
probe.c PCI: Make pci_rescan_bus handle add_list 2012-02-14 08:44:53 -08:00
proc.c
quirks.c pci: Fix files needing export.h for EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:22 -04:00
remove.c PCI: make sriov work with hotplug remove 2012-02-14 08:44:59 -08:00
rom.c pci: Fix files needing export.h for EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:22 -04:00
search.c
setup-bus.c PCI: remove add_to_failed_list() 2012-02-14 08:44:58 -08:00
setup-irq.c PCI: Make the struct pci_dev * argument of pci_fixup_irqs const. 2011-07-22 08:26:06 -07:00
setup-res.c PCI: Move pdev_sort_resources() to setup-bus.c 2012-02-14 08:44:54 -08:00
slot.c pci: add module.h to files implicitly relying on its presence. 2011-10-31 19:31:23 -04:00
syscall.c
vpd.c pci: Fix files needing export.h for EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:22 -04:00
xen-pcifront.c Xen: consolidate and simplify struct xenbus_driver instantiation 2012-01-04 17:01:17 -05:00