The original post: /r/debian by /u/sinisterpisces on 2024-12-25 17:18:39.

Hello,

I just saw this in my console, using a Thunderbolt 3 (Aquantia-based) NIC. I know these aren’t ideal, but this looks like it might be a power setting that I could tweak. The card is rock-solid when it’s plugged in, and was only down for ~2 seconds. OTOH, since it came back up, it’s not visible in lspci anymore. :P

tl;dr the retimer controlling the bus this device is on resets and takes the device down with it. Is there some flag I can set to keep this from happening?

vectorsigma ~% uname -a [TrueNAS 24.10.1]
Linux vectorsigma 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Mon Dec 16 20:59:32 UTC 2024 x86_64 GNU/Linux

Meanwhile in the log:

Dec 25 08:28:38 vectorsigma kernel: thunderbolt 1-0:1.1: retimer disconnected

Dec 25 08:28:38 vectorsigma kernel: thunderbolt 1-1: device disconnected

Dec 25 08:28:38 vectorsigma kernel: pcieport 0000:00:07.2: pciehp: Slot(5): Link Down

Dec 25 08:28:38 vectorsigma kernel: pcieport 0000:00:07.2: pciehp: Slot(5): Card not present

Dec 25 08:28:38 vectorsigma kernel: atlantic 0000:2e:00.0 enp46s0: failed to kill vid 0081/0

Dec 25 08:28:39 vectorsigma kernel: pci_bus 0000:2e: busn_res: [bus 2e] is released

Dec 25 08:28:39 vectorsigma kernel: pci_bus 0000:2d: busn_res: [bus 2d-2e] is released

Dec 25 08:28:40 vectorsigma kernel: thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 device=0x15ee

Dec 25 08:28:41 vectorsigma kernel: thunderbolt 1-1: new device found, vendor=0x56 device=0x10d2

Dec 25 08:28:41 vectorsigma kernel: thunderbolt 1-1: QNAP Systems, Inc. QNA-T310G1S
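To pull just this sequence out of a busy log, a small filter like the one below may help. It is only a sketch: the function name tb_link_events is made up for illustration, and the driver names in the pattern are simply the ones that appear in the lines above.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines from the drivers involved in the
# unplug/replug sequence (thunderbolt, pcieport, atlantic, pci_bus).
# Reads log text on stdin and writes the matching lines to stdout.
tb_link_events() {
    grep -E ' (thunderbolt|pcieport|atlantic|pci_bus) '
}
```

For example, `journalctl -k | tb_link_events` would feed it the kernel messages from the journal.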

Additional diagnostic information:

8087:15ee appears to be an Intel retimer, so I think that’s actually part of the motherboard’s Thunderbolt implementation. So, it’s not clear to me if this is a problem with the settings on the TB3 NIC, or if the motherboard’s settings are the issue.

It looks like the retimer is getting disconnected, and then takes the NIC down with it, but I’m just guessing at this point.
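One way to check the power-setting theory would be to read the runtime power management state of the PCIe port named in the log (0000:00:07.2). The sketch below only reads the standard power/control and power/runtime_status sysfs attributes. The function name show_runtime_pm is hypothetical, and whether runtime PM has anything to do with the retimer reset is an assumption, not something the log proves.

```shell
#!/bin/sh
# Hypothetical helper: print the runtime PM attributes of one PCI device.
#   $1 = PCI address, e.g. 0000:00:07.2 (the pcieport from the log)
#   $2 = sysfs root, defaults to /sys/bus/pci/devices (overridable for testing)
show_runtime_pm() {
    dev="$1"
    pci_root="${2:-/sys/bus/pci/devices}"
    for attr in control runtime_status; do
        f="$pci_root/$dev/power/$attr"
        # Skip attributes that are missing or unreadable.
        if [ -r "$f" ]; then
            printf '%s %s=%s\n' "$dev" "$attr" "$(cat "$f")"
        fi
    done
}
```

Calling `show_runtime_pm 0000:00:07.2` prints one line per attribute. A control value of "auto" means the kernel may suspend the device at runtime, while "on" keeps it powered.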

TrueNAS has locked their OS down enough that I can’t set a udev rule to just auto-authorize every device on the bus, so I have to manually re-authorize the NIC via the device tree every time this happens:

root@vectorsigma:/sys/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1# echo 1 > authorized
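Without a udev rule, the same write could in principle be scripted. The sketch below walks the Thunderbolt bus in sysfs and writes 1 to every authorized attribute that currently reads 0. The function name reauthorize_thunderbolt is made up for illustration, and it assumes there is some way to run it as root on a schedule or at boot, which I have not confirmed on TrueNAS.

```shell
#!/bin/sh
# Hypothetical helper: re-authorize every Thunderbolt device that is currently
# unauthorized. Must run as root on a real system.
#   $1 = sysfs root, defaults to /sys/bus/thunderbolt/devices (overridable for testing)
reauthorize_thunderbolt() {
    tb_root="${1:-/sys/bus/thunderbolt/devices}"
    for attr in "$tb_root"/*/authorized; do
        # Skip when the glob matched nothing (no Thunderbolt devices present).
        [ -e "$attr" ] || continue
        if [ "$(cat "$attr")" = "0" ]; then
            # Same write as the manual "echo 1 > authorized" above.
            echo 1 > "$attr" && echo "authorized ${attr%/authorized}"
        fi
    done
}
```

Run with no arguments, it blindly authorizes everything on the bus, so it only makes sense on a machine where every attached Thunderbolt device is trusted.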

Yes, this is ridiculous. Worse, it makes the NIC unusable for a remote system, as I can’t exactly SSH in to fix the NIC if I need the NIC to be working to SSH in.