#1060706 linux-image-6.1.0-17-amd64: intel i225 NIC loses PCIe link, network becomes unusable

Package:
src:linux
Source:
src:linux
Submitter:
Arno Lehmann
Date:
2024-02-12 12:36:05 UTC
Severity:
normal
Tags:
#1060706#5
Date:
2024-01-13 10:45:29 UTC
From:
To:
Dear Maintainer,


After the computer has been running for a while, the network connection is
lost because the NIC detaches from PCIe. I suspect this is related to power
management, but I am not really sure.

As this seemed to be a known problem, I added pcie_aspm=off to the kernel
command line.

The problem happens more or less randomly, the computer is usually running 24/7:

# journalctl --grep 'PCIe link lost' --quiet | cat
Sep 20 14:21:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 06 05:44:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 07 16:39:10 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
Okt 23 18:31:25 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 30 11:16:06 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 31 13:50:06 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
Nov 22 18:59:11 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Nov 23 15:45:49 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Dez 19 07:33:02 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jan 01 09:57:40 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jan 10 16:15:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
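
The spacing between these events can be estimated from the timestamps alone.
The following is a minimal Python sketch, not part of the original report. It
assumes the entries from Sep to Dez belong to 2023 and the Jan entries to 2024
(the short journal format omits the year, so the years are an assumption), and
it maps the German month abbreviations that appear in the log. The names
parse_stamp and gaps are hypothetical helpers.

```python
from datetime import datetime

# Month abbreviations as they appear in the log above (German locale).
# Abbreviations not present in the log are included as an assumption.
MONTHS = {"Jan": 1, "Feb": 2, "Mär": 3, "Apr": 4, "Mai": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Okt": 10, "Nov": 11, "Dez": 12}

def parse_stamp(stamp, year):
    """Turn 'Okt 06 05:44:20' plus an assumed year into a datetime."""
    month, day, clock = stamp.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(year, MONTHS[month], int(day), hour, minute, second)

def gaps(events):
    """Return the timedeltas between consecutive (stamp, year) events."""
    times = [parse_stamp(stamp, year) for stamp, year in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]

# Timestamps copied from the journal excerpt; the years are assumed.
events = [
    ("Sep 20 14:21:17", 2023), ("Okt 06 05:44:20", 2023),
    ("Okt 07 16:39:10", 2023), ("Okt 23 18:31:25", 2023),
    ("Okt 30 11:16:06", 2023), ("Okt 31 13:50:06", 2023),
    ("Nov 22 18:59:11", 2023), ("Nov 23 15:45:49", 2023),
    ("Dez 19 07:33:02", 2023), ("Jan 01 09:57:40", 2024),
    ("Jan 10 16:15:20", 2024), ("Jan 13 11:16:31", 2024),
]

intervals = gaps(events)
print("shortest:", min(intervals))
print("longest: ", max(intervals))
```

Under those assumptions the shortest interval is a little under 21 hours and
the longest a little under 26 days.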


This is what I find in the kernel or system log:

Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jan 13 11:16:31 Zwerg kernel: ------------[ cut here ]------------
Jan 13 11:16:31 Zwerg kernel: igc: Failed to read reg 0xc030!
Jan 13 11:16:31 Zwerg kernel: WARNING: CPU: 18 PID: 6389 at drivers/net/ethernet/intel/igc/igc_main.c:6482 igc_rd32+0x91/0xa0 [igc]
Jan 13 11:16:31 Zwerg kernel: Modules linked in: rfcomm cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative nfsv3 nfs_acl rpcs>
Jan 13 11:16:31 Zwerg kernel:  configfs efivarfs ip_tables x_tables autofs4 xfs libcrc32c crc32c_generic dm_crypt dm_mod hid_generic amdgpu crc32_pc>
Jan 13 11:16:31 Zwerg kernel: CPU: 18 PID: 6389 Comm: kworker/18:1 Not tainted 6.1.0-17-amd64 #1  Debian 6.1.69-1
Jan 13 11:16:31 Zwerg kernel: Hardware name: ASUS System Product Name/ROG STRIX X670E-A GAMING WIFI, BIOS 1410 04/28/2023
Jan 13 11:16:31 Zwerg kernel: Workqueue: events igc_watchdog_task [igc]
Jan 13 11:16:31 Zwerg kernel: RIP: 0010:igc_rd32+0x91/0xa0 [igc]
Jan 13 11:16:31 Zwerg kernel: Code: 48 c7 c6 d0 55 56 c0 e8 0b 7d 6c f8 48 8b bd 28 ff ff ff e8 31 c7 23 f8 84 c0 74 b4 89 de 48 c7 c7 f8 55 56 c0 e>
Jan 13 11:16:31 Zwerg kernel: RSP: 0018:ffffac56d5f13df0 EFLAGS: 00010286
Jan 13 11:16:31 Zwerg kernel: RAX: 0000000000000000 RBX: 000000000000c030 RCX: 0000000000000027
Jan 13 11:16:31 Zwerg kernel: RDX: ffffa046f85a03a8 RSI: 0000000000000001 RDI: ffffa046f85a03a0
Jan 13 11:16:31 Zwerg kernel: RBP: ffffa03f45710c28 R08: 0000000000000000 R09: ffffac56d5f13c68
Jan 13 11:16:31 Zwerg kernel: R10: 0000000000000003 R11: ffffa04717f7ffe8 R12: ffffa03f45710000
Jan 13 11:16:31 Zwerg kernel: R13: 0000000000000000 R14: ffffa03f456efd40 R15: 000000000000c030
Jan 13 11:16:31 Zwerg kernel: FS:  0000000000000000(0000) GS:ffffa046f8580000(0000) knlGS:0000000000000000
Jan 13 11:16:31 Zwerg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 11:16:31 Zwerg kernel: CR2: 00007f1fc894f000 CR3: 00000008a8538000 CR4: 0000000000750ee0
Jan 13 11:16:31 Zwerg kernel: PKRU: 55555554
Jan 13 11:16:31 Zwerg kernel: Call Trace:
Jan 13 11:16:31 Zwerg kernel:  <TASK>


Obviously, the kernel parameter to disable PCIe power management did not solve this problem.

The way to recover is to restart the computer.

#1060706#10
Date:
2024-01-13 11:48:28 UTC
From:
To:
Possibly not related, but there's BIOS 1807 available.
#1060706#15
Date:
2024-01-13 12:47:04 UTC
From:
To:
Control: tags -1 + moreinfo

Just to be clear, can you confirm whether this is or is not a regression
from a previously running 6.1.y kernel? I'm asking because I suspect that
this is similar to
https://lore.kernel.org/intel-wired-lan/20221031170535.77be0eb5@kernel.org/
and never worked reliably with your hardware?

Regards,
Salvatore

#1060706#22
Date:
2024-01-13 15:45:37 UTC
From:
To:
On 13.01.2024 at 12:48, Diederik de Haas wrote:

I'll definitely give that a try -- when I'm physically close to the box!
Thanks for reminding me!

Arno

#1060706#27
Date:
2024-01-13 15:39:51 UTC
From:
To:
Hi Salvatore,

On 13.01.2024 at 13:47, Salvatore Bonaccorso wrote:

On this hardware, the network issues appeared right from the start.

The first time I encountered it was with the Debian installation some time
last year, and that's where my research led me to turn off PCIe power
management.

Actually I don't even know which was the first kernel version I had on
this host, but it's been on Bookworm for all its lifetime.

The symptoms sound quite different to me. But I can't claim to know
anything interesting about the different functionalities of PCIe or the
Linux way to use them...

Cheers,

Arno

#1060706#32
Date:
2024-01-13 16:13:37 UTC
From:
To:
Via https://snapshot.debian.org/package/linux-signed-amd64/ you have easy
access to previous (6.1) kernels uploaded to Debian with which you can check
if the problem was present in early 6.1 kernels.

#1060706#37
Date:
2024-01-13 19:22:39 UTC
From:
To:
Hi Diederik,

On 13.01.2024 at 17:13, Diederik de Haas wrote:
...
6.1.0-11-amd64

As I usually keep this box updated, and the problem happens only
randomly, I think the best way forward might be to try with a kernel
that did *not* show this problem.

Does that look reasonable?

So, I have:
# journalctl --grep PCIe\ link\ lost
-- Boot 86e1a04baba04a409c34796c0fb079ff --
Sep 20 14:21:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot da6a00d9278a422686ca46d80e2f3ca6 --
-- Boot 28fcdfe079c446c6b184bb5b6407da73 --
Okt 06 05:44:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
Okt 07 16:39:10 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
-- Boot 51e3605887764b60b6d0130d4f6356c0 --
-- Boot ce944a4bbffc45b38c1357d3e822cd46 --
Okt 23 18:31:25 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot e6d80407cab74d0b9e28b74642b544c0 --
Okt 30 11:16:06 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
Okt 31 13:50:06 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
-- Boot 452f25ce23fe4d569490fbc42683ecd6 --
Nov 22 18:59:11 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot f1add031e2fa495aba569ab9c374ce65 --
Nov 23 15:45:49 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot f766dabb981e4aa49f0922d7794dea76 --
-- Boot 6d7c91a86ab44da1973f5ca716dad105 --
-- Boot 3ba3df042e0648a1aebfa4fcea5499bf --
Dez 19 07:33:02 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot a4aea30bb33747e7853abec194a2a395 --
Jan 01 09:57:40 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot 377a326561dc4909b45c55cffcd1a94d --
Jan 10 16:15:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot 50c5a6a9cc34496984fe3cde6ba8b72a --
Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
-- Boot a3c69838cab4426992a2f518a72a5e2b --


So I conclude I should look at something earlier than what was used with
boot 86e1a04baba04a409c34796c0fb079ff, i.e.

journalctl --boot 86e1a04baba04a409c34796c0fb079ff  | head -n 1
Aug 30 18:16:18 Zwerg kernel: Linux version 6.1.0-11-amd64
(debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU
ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian
6.1.38-4 (2023-08-08)

correct?
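
The kernel banner quoted above carries both the kernel release name and the
Debian package version, which is what has to be matched against the snapshot
archive. The following is a minimal Python sketch, not part of the original
mail, that pulls both out of such a line. It assumes the mail's line wrapping
has been undone so the banner is a single line; parse_banner is a hypothetical
helper name.

```python
import re

# Kernel release, Debian package version and build date from a boot banner.
# The greedy ".*" skips the compiler and linker details in between.
BANNER = re.compile(
    r"Linux version (?P<release>\S+) .* Debian (?P<package>\S+) "
    r"\((?P<date>\d{4}-\d{2}-\d{2})\)"
)

def parse_banner(line):
    """Return (release, package version, build date), or None if no match."""
    match = BANNER.search(line)
    if match is None:
        return None
    return match.group("release"), match.group("package"), match.group("date")

# Banner from the mail above, rejoined into a single line.
banner = (
    "Aug 30 18:16:18 Zwerg kernel: Linux version 6.1.0-11-amd64 "
    "(debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU "
    "ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian "
    "6.1.38-4 (2023-08-08)"
)

print(parse_banner(banner))
```

Running the same function over the first kernel line of every boot would give
one release and package version per boot ID.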

Via the page you reference, I find a kernel package
linux-image-6.1.0-1-amd64 6.1.4-1 which might be worth a try.

I'll need some time to sort out how to install such a package...

Thanks for your suggestion,

Arno

#1060706#42
Date:
2024-01-13 22:42:52 UTC
From:
To:
Yes

https://snapshot.debian.org/package/linux-signed-amd64/6.1.4%2B1/#linux-image-6.1.0-1-amd64_6.1.4-1

It should be as simple as downloading that .deb file and installing it via
``dpkg -i <deb-file>`` or
``apt install ./<deb-file>``

If you also have custom kernel modules via dkms, then you'd also need the
corresponding linux-headers package.
https://snapshot.debian.org/package/linux/6.1.4-1/#linux-headers-6.1.0-1-amd64_6.1.4-1

You could also try version 6.1~rc3+1~exp1, but if it's present in 6.1.4-1,
then I guess it's safe to say the issue is present in the whole 6.1 series
and it probably has never worked (as Salvatore thought).

#1060706#47
Date:
2024-01-18 21:12:33 UTC
From:
To:
Hi,

This "feels" like it's probably not really a regression, thus the
similarity (though not an identical case to the referenced thread).

What about newer kernels? Do 6.6.11-1 or 6.7-1~exp1 taken from
unstable (resp. experimental) show the same problem?

If yes, then it might be an idea to bring it upstream.

Regards,
Salvatore

#1060706#52
Date:
2024-01-18 21:22:06 UTC
From:
To:
Hello,

On 18.01.2024 at 22:12, Salvatore Bonaccorso wrote:

Well, tricky... at this stage, we're guessing what will tell us more --
a newer kernel or an older one. And then we'll need to wait for a while to
see what happens.

Well, tomorrow morning I'll be on site and can then install another
kernel and reboot.

Cheers,

Arno

#1060706#57
Date:
2024-01-19 13:35:40 UTC
From:
To:
Hi all,

I have now installed an early 6.1 kernel:

$ uname -a
Linux Zwerg 6.1.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.4-1
(2023-01-07) x86_64 GNU/Linux

and have not updated anything else. I am also still running with the
non-default PCIe power management setting:

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.1.0-1-amd64 root=/dev/mapper/Zwerg--vg-root ro
pcie_aspm=off quiet
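
Whether the parameter is really present on the running kernel's command line
can also be checked programmatically. The following is a minimal Python
sketch, not part of the original mail. It splits on whitespace only, so
quoted values containing spaces are not handled; parse_cmdline and
aspm_disabled are hypothetical helper names.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "ro" or "quiet" map to None.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

def aspm_disabled(text):
    """True if the command line contains pcie_aspm=off."""
    return parse_cmdline(text).get("pcie_aspm") == "off"

# Command line as reported above, rejoined into a single line.
reported = (
    "BOOT_IMAGE=/vmlinuz-6.1.0-1-amd64 root=/dev/mapper/Zwerg--vg-root ro "
    "pcie_aspm=off quiet"
)

print(aspm_disabled(reported))
```

On a live system the input would be the contents of /proc/cmdline. This only
confirms that the parameter was passed, not what the hardware negotiated.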


Let's see how long this works :-) Or, rather, how much patience I have.
Failures were between a few hours and up to four weeks apart...

Cheers,

Arno

#1060706#62
Date:
2024-01-24 16:08:27 UTC
From:
To:
Some news, but unfortunately not helping me to understand what we see :-)

Network link was lost during the day.

dmesg shows this:
[Tue Jan 23 06:54:24 2024] igc 0000:0a:00.0 eno1: NIC Link is Up 1000
Mbps Full Duplex, Flow Control: RX
[Tue Jan 23 16:24:13 2024] [drm:retrieve_link_cap [amdgpu]] *ERROR*
retrieve_link_cap: Read receiver caps dpcd data failed.
[Tue Jan 23 23:09:16 2024] igc 0000:0a:00.0 eno1: NIC Link is Down
[Tue Jan 23 23:09:19 2024] igc 0000:0a:00.0 eno1: NIC Link is Up 1000
Mbps Full Duplex, Flow Control: RX
[Wed Jan 24 12:00:23 2024] systemd-journald[750]:<irrelevant>
[Wed Jan 24 14:46:17 2024] nfs: server <redacted> not responding, timed out
[Wed Jan 24 14:46:17 2024] nfs: server <redacted> not responding, timed out
[Wed Jan 24 17:00:09 2024] nfs: server <redacted> not responding, timed out

Here, I rmmod'ed the igc module and modprobe'd it immediately.

[Wed Jan 24 17:00:36 2024] igc 0000:0a:00.0 eno1: PHC removed
[Wed Jan 24 17:00:42 2024] Intel(R) 2.5G Ethernet Linux Driver
[Wed Jan 24 17:00:42 2024] Copyright(c) 2018 Intel Corporation.
[Wed Jan 24 17:00:42 2024] igc 0000:0a:00.0: PCIe PTM not supported by
PCIe bus/controller
[Wed Jan 24 17:00:42 2024] pps pps0: new PPS source ptp0
[Wed Jan 24 17:00:42 2024] igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PHC added
[Wed Jan 24 17:00:42 2024] igc 0000:0a:00.0: 4.000 Gb/s available PCIe
bandwidth (5.0 GT/s PCIe x1 link)
[Wed Jan 24 17:00:42 2024] igc 0000:0a:00.0 eth0: MAC: c8:7f:54:67:6d:cc
[Wed Jan 24 17:00:42 2024] igc 0000:0a:00.0 eno1: renamed from eth0
[Wed Jan 24 17:00:45 2024] igc 0000:0a:00.0 eno1: NIC Link is Up 1000
Mbps Full Duplex, Flow Control: RX
[Wed Jan 24 17:00:45 2024] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link
becomes ready


So, we have a case of the NIC becoming unresponsive for some reason, but
I can not see or even guess the reason. I'll leave the system as it is
for a few more days, I think, and then try a much newer kernel.

Or -- any better suggestions?

Cheers,

Arno

#1060706#67
Date:
2024-01-27 09:06:28 UTC
From:
To:
Hi all,

newest developments:

some time shortly after I left this computer yesterday evening, (at
least) the NVMe storage disappeared from the kernel's view. The console
showed messages about inaccessible files from journald all the time.

Reset using the hardware switch resulted in the UEFI interface, as the
firmware could find no storage. Power cycling fixed that.

Obviously, there is no useful log available.


I rebooted into the recent kernel, but that lost the network nearly
immediately. rmmod / modprobe took a few seconds, then also nearly
immediately the same:

# journalctl -b --grep 'PCIe link lost' --quiet | cat
Jan 27 09:44:53 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
Jan 27 09:48:05 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached

Looks like I can repeat that as much as I like now:

[Sa Jan 27 09:48:45 2024] igc: probe of 0000:0a:00.0 failed with error -13
[Sa Jan 27 09:52:15 2024] Intel(R) 2.5G Ethernet Linux Driver
[Sa Jan 27 09:52:15 2024] Copyright(c) 2018 Intel Corporation.
[Sa Jan 27 09:52:15 2024] igc 0000:0a:00.0: PCIe PTM not supported by
PCIe bus/controller
[Sa Jan 27 09:52:15 2024] igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
[Sa Jan 27 09:52:15 2024] ------------[ cut here ]------------
[Sa Jan 27 09:52:15 2024] igc: Failed to read reg 0x10!
[Sa Jan 27 09:52:15 2024] WARNING: CPU: 19 PID: 4334 at
drivers/net/ethernet/intel/igc/igc_main.c:6482 igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024] Modules linked in: igc(+) rfcomm
cpufreq_userspace cpufreq_powersave cpufreq_ondemand
cpufreq_conservative nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs qrtr overlay cmac algif_hash
algif_skcipher af_alg bnep sunrpc binfmt_misc nls_ascii nls_cp437 vfat
fat ext4 mbcache jbd2 btusb btrtl btbcm btintel btmtk bluetooth
jitterentropy_rng intel_rapl_msr intel_rapl_common uvcvideo edac_mce_amd
videobuf2_vmalloc drbg snd_hda_codec_hdmi videobuf2_memops ansi_cprng
videobuf2_v4l2 snd_usb_audio snd_hda_intel kvm_amd eeepc_wmi asus_nb_wmi
videobuf2_common ecdh_generic snd_intel_dspcfg asus_wmi ecc
snd_usbmidi_lib snd_intel_sdw_acpi crc16 snd_rawmidi battery videodev
snd_seq_device snd_hda_codec platform_profile kvm snd_hda_core
sparse_keymap snd_hwdep mc ledtrig_audio irqbypass snd_pcm rfkill rapl
snd_timer sp5100_tco wmi_bmof ccp snd k10temp watchdog pcspkr soundcore
joydev sg acpi_cpufreq evdev msr parport_pc ppdev lp parport fuse loop
[Sa Jan 27 09:52:15 2024]  efi_pstore configfs efivarfs ip_tables
x_tables autofs4 xfs libcrc32c crc32c_generic dm_crypt dm_mod
hid_generic amdgpu usbhid crc32_pclmul hid crc32c_intel sr_mod gpu_sched
cdrom drm_buddy i2c_algo_bit ghash_clmulni_intel drm_display_helper
sha512_ssse3 cec sha512_generic rc_core drm_ttm_helper sha256_ssse3 ttm
ahci sha1_ssse3 libahci xhci_pci drm_kms_helper nvme xhci_hcd libata
nvme_core drm aesni_intel usbcore t10_pi scsi_mod crc64_rocksoft_generic
crypto_simd crc64_rocksoft cryptd crc_t10dif crct10dif_generic
crct10dif_pclmul i2c_piix4 crc64 crct10dif_common scsi_common usb_common
video wmi gpio_amdpt gpio_generic button [last unloaded: igc]
[Sa Jan 27 09:52:15 2024] CPU: 19 PID: 4334 Comm: modprobe Tainted: G
  W          6.1.0-17-amd64 #1  Debian 6.1.69-1
[Sa Jan 27 09:52:15 2024] Hardware name: ASUS System Product Name/ROG
STRIX X670E-A GAMING WIFI, BIOS 1410 04/28/2023
[Sa Jan 27 09:52:15 2024] RIP: 0010:igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024] Code: 48 c7 c6 d0 c5 83 c0 e8 0b 0d 9f ca 48
8b bd 28 ff ff ff e8 31 57 56 ca 84 c0 74 b4 89 de 48 c7 c7 f8 c5 83 c0
e8 df 08 07 ca <0f> 0b eb a2 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
00 00 41 56
[Sa Jan 27 09:52:15 2024] RSP: 0018:ffffb41e9535bbc8 EFLAGS: 00010282
[Sa Jan 27 09:52:15 2024] RAX: 0000000000000000 RBX: 0000000000000010
RCX: 0000000000000027
[Sa Jan 27 09:52:15 2024] RDX: ffff9b1e785e03a8 RSI: 0000000000000001
RDI: ffff9b1e785e03a0
[Sa Jan 27 09:52:15 2024] RBP: ffff9b1714610c28 R08: 0000000000000000
R09: ffffb41e9535ba40
[Sa Jan 27 09:52:15 2024] R10: 0000000000000003 R11: ffff9b1e97f7ffe8
R12: ffff9b1714610000
[Sa Jan 27 09:52:15 2024] R13: ffff9b1714610980 R14: ffff9b1714610000
R15: ffff9b1714610c28
[Sa Jan 27 09:52:15 2024] FS:  00007fe52dcb3040(0000)
GS:ffff9b1e785c0000(0000) knlGS:0000000000000000
[Sa Jan 27 09:52:15 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sa Jan 27 09:52:15 2024] CR2: 00007fe52d5af1f4 CR3: 000000011d41e000
CR4: 0000000000750ee0
[Sa Jan 27 09:52:15 2024] PKRU: 55555554
[Sa Jan 27 09:52:15 2024] Call Trace:
[Sa Jan 27 09:52:15 2024]  <TASK>
[Sa Jan 27 09:52:15 2024]  ? __warn+0x7d/0xc0
[Sa Jan 27 09:52:15 2024]  ? igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024]  ? report_bug+0xe2/0x150
[Sa Jan 27 09:52:15 2024]  ? handle_bug+0x41/0x70
[Sa Jan 27 09:52:15 2024]  ? exc_invalid_op+0x13/0x60
[Sa Jan 27 09:52:15 2024]  ? asm_exc_invalid_op+0x16/0x20
[Sa Jan 27 09:52:15 2024]  ? igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024]  ? igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024]  igc_get_invariants_base+0xb5/0x260 [igc]
[Sa Jan 27 09:52:15 2024]  igc_probe+0x2b9/0x8d0 [igc]
[Sa Jan 27 09:52:15 2024]  local_pci_probe+0x41/0x80
[Sa Jan 27 09:52:15 2024]  pci_device_probe+0xc3/0x240
[Sa Jan 27 09:52:15 2024]  really_probe+0xde/0x380
[Sa Jan 27 09:52:15 2024]  ? pm_runtime_barrier+0x50/0x90
[Sa Jan 27 09:52:15 2024]  __driver_probe_device+0x78/0x120
[Sa Jan 27 09:52:15 2024]  driver_probe_device+0x1f/0x90
[Sa Jan 27 09:52:15 2024]  __driver_attach+0xce/0x1c0
[Sa Jan 27 09:52:15 2024]  ? __device_attach_driver+0x110/0x110
[Sa Jan 27 09:52:15 2024]  bus_for_each_dev+0x87/0xd0
[Sa Jan 27 09:52:15 2024]  bus_add_driver+0x1ae/0x200
[Sa Jan 27 09:52:15 2024]  driver_register+0x89/0xe0
[Sa Jan 27 09:52:15 2024]  ? 0xffffffffc1174000
[Sa Jan 27 09:52:15 2024]  do_one_initcall+0x59/0x220
[Sa Jan 27 09:52:15 2024]  do_init_module+0x4a/0x1f0
[Sa Jan 27 09:52:15 2024]  __do_sys_finit_module+0xac/0x120
[Sa Jan 27 09:52:15 2024]  do_syscall_64+0x5b/0xc0
[Sa Jan 27 09:52:15 2024]  ? do_syscall_64+0x67/0xc0
[Sa Jan 27 09:52:15 2024]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[Sa Jan 27 09:52:15 2024] RIP: 0033:0x7fe52d720559
[Sa Jan 27 09:52:15 2024] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00
00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 8 4c 8b
4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 77 08 0d 00 f7 d8
64 89 01 48
[Sa Jan 27 09:52:15 2024] RSP: 002b:00007ffd885b6ab8 EFLAGS: 00000246
ORIG_RAX: 0000000000000139
[Sa Jan 27 09:52:15 2024] RAX: ffffffffffffffda RBX: 0000559d3674ec30
RCX: 00007fe52d720559
[Sa Jan 27 09:52:15 2024] RDX: 0000000000000000 RSI: 0000559d35c644a0
RDI: 0000000000000003
[Sa Jan 27 09:52:15 2024] RBP: 0000559d35c644a0 R08: 0000000000000000
R09: 0000559d367513f0
[Sa Jan 27 09:52:15 2024] R10: 0000000000000003 R11: 0000000000000246
R12: 0000000000040000
[Sa Jan 27 09:52:15 2024] R13: 0000000000000000 R14: 0000559d3674edc0
R15: 0000000000000000
[Sa Jan 27 09:52:15 2024]  </TASK>
[Sa Jan 27 09:52:15 2024] ---[ end trace 0000000000000000 ]---


I'll now reboot into the old kernel and see if I can send this message
then :-)

...

And:

# uname -a
Linux Zwerg 6.1.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.4-1
(2023-01-07) x86_64 GNU/Linux

so far... ok, I'll give this kernel another try, but next round will
then be a backported near-bleeding-edge one, I guess.

Cheers,

Arno

#1060706#72
Date:
2024-02-01 14:30:48 UTC
From:
To:
Another one:

[Do Feb 1 04:19:21 2024] igc 0000:0a:00.0 eno1: PCIe link lost, device
now detached
[Do Feb 1 04:19:21 2024] ------------[ cut here ]------------
[Do Feb 1 04:19:21 2024] igc: Failed to read reg 0xc030!
[Do Feb 1 04:19:21 2024] WARNING: CPU: 6 PID: 90291 at
drivers/net/ethernet/intel/igc/igc_main.c:6384 igc_rd32+0x91/0xa0 [igc]
[Do Feb 1 04:19:21 2024] Modules linked in: rfcomm cpufreq_userspace
cpufreq_powersave cpufreq_ondemand cpufreq_conservative nfsv3 nfs_acl
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
netfs qrtr overlay cmac algif_hash algif_skcipher af_alg bnep sunrpc
binfmt_misc nls_ascii nls_cp437 vfat fat ext4 mbcache jbd2
intel_rapl_msr intel_rapl_common btusb btrtl btbcm btintel btmtk
bluetooth edac_mce_amd jitterentropy_rng kvm_amd snd_hda_codec_hdmi
uvcvideo drbg snd_hda_intel kvm eeepc_wmi videobuf2_vmalloc ansi_cprng
videobuf2_memops snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio
snd_hda_codec videobuf2_v4l2 asus_wmi platform_profile videobuf2_common
battery snd_usbmidi_lib ccp sparse_keymap irqbypass ecdh_generic
sp5100_tco snd_hda_core snd_rawmidi ledtrig_audio ecc crc16 rapl rfkill
videodev pcspkr wmi_bmof snd_seq_device rng_core watchdog k10temp
snd_hwdep mc snd_pcm snd_timer joydev snd soundcore sg acpi_cpufreq
evdev msr parport_pc ppdev lp parport fuse loop efi_pstore
[Do Feb 1 04:19:21 2024] configfs efivarfs ip_tables x_tables autofs4
xfs libcrc32c crc32c_generic dm_crypt dm_mod hid_generic amdgpu
gpu_sched drm_buddy i2c_algo_bit drm_display_helper usbhid hid cec
sr_mod rc_core cdrom drm_ttm_helper ttm crc32_pclmul crc32c_intel
drm_kms_helper ahci ghash_clmulni_intel sha512_ssse3 libahci xhci_pci
sha512_generic libata xhci_hcd nvme drm nvme_core aesni_intel usbcore
igc scsi_mod t10_pi crypto_simd crc64_rocksoft_generic cryptd
crc64_rocksoft i2c_piix4 crc_t10dif ptp crct10dif_generic
crct10dif_pclmul crc64 pps_core crct10dif_common usb_common scsi_common
video wmi gpio_amdpt gpio_generic button
[Do Feb 1 04:19:21 2024] CPU: 6 PID: 90291 Comm: kworker/6:2 Not tainted
6.1.0-1-amd64 #1 Debian 6.1.4-1
[Do Feb 1 04:19:21 2024] Hardware name: ASUS System Product Name/ROG
STRIX X670E-A GAMING WIFI, BIOS 1410 04/28/2023
[Do Feb 1 04:19:21 2024] Workqueue: events igc_watchdog_task [igc]
[Do Feb 1 04:19:21 2024] RIP: 0010:igc_rd32+0x91/0xa0 [igc]
[Do Feb 1 04:19:21 2024] Code: 48 c7 c6 f8 b4 71 c0 e8 78 08 90 f0 48 8b
bd 28 ff ff ff e8 d1 50 48 f0 84 c0 74 b4 89 de 48 c7 c7 20 b5 71 c0 e8
b3 34 8c f0 <0f> 0b eb a2 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 41 56
[Do Feb 1 04:19:21 2024] RSP: 0018:ffffafa297007df0 EFLAGS: 00010282
[Do Feb 1 04:19:21 2024] RAX: 0000000000000000 RBX: 000000000000c030
RCX: 0000000000000000
[Do Feb 1 04:19:21 2024] RDX: 0000000000000002 RSI: ffffffffb193289e
RDI: 00000000ffffffff
[Do Feb 1 04:19:21 2024] RBP: ffff988bd7b88c20 R08: 0000000000000000
R09: ffffafa297007c78
[Do Feb 1 04:19:21 2024] R10: 0000000000000003 R11: ffff989b17f7ffe8
R12: ffff988bd7b88000
[Do Feb 1 04:19:21 2024] R13: 0000000000000000 R14: ffff989345341d40
R15: 000000000000c030
[Do Feb 1 04:19:21 2024] FS: 0000000000000000(0000)
GS:ffff989af8400000(0000) knlGS:0000000000000000
[Do Feb 1 04:19:21 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Do Feb 1 04:19:21 2024] CR2: 00007f6ce7d94000 CR3: 00000008850cc000
CR4: 0000000000750ee0
[Do Feb 1 04:19:21 2024] PKRU: 55555554
[Do Feb 1 04:19:21 2024] Call Trace:
[Do Feb 1 04:19:21 2024] <TASK>
[Do Feb 1 04:19:21 2024] igc_update_stats+0x86/0x6c0 [igc]
[Do Feb 1 04:19:21 2024] igc_watchdog_task+0xa3/0x2c0 [igc]
[Do Feb 1 04:19:21 2024] process_one_work+0x1c7/0x380
[Do Feb 1 04:19:21 2024] worker_thread+0x4d/0x380
[Do Feb 1 04:19:21 2024] ? _raw_spin_lock_irqsave+0x23/0x50
[Do Feb 1 04:19:21 2024] ? rescuer_thread+0x3a0/0x3a0
[Do Feb 1 04:19:21 2024] kthread+0xe9/0x110
[Do Feb 1 04:19:21 2024] ? kthread_complete_and_exit+0x20/0x20
[Do Feb 1 04:19:21 2024] ret_from_fork+0x22/0x30
[Do Feb 1 04:19:21 2024] </TASK>
[Do Feb 1 04:19:21 2024] ---[ end trace 0000000000000000 ]---


next round: trying a more bleeding-edge kernel from backports...

# uname -a
Linux Zwerg 6.5.0-0.deb12.4-amd64 #1 SMP PREEMPT_DYNAMIC Debian
6.5.10-1~bpo12+1 (2023-11-23) x86_64 GNU/Linux

is what I just booted into.

Now -- we wait.

Cheers,

Arno

#1060706#77
Date:
2024-02-02 23:02:42 UTC
From:
To:
FWIW I'm having the same problems. Granted, this NIC is in a
Thunderbolt dock, so one can't exclude this as a factor, but the
errors are identical. This is even the case running the 6.7.1-1~exp1
kernel from experimental.

#1060706#82
Date:
2024-02-08 18:29:10 UTC
From:
To:
Hi all,

so, latest news.

System lost access to the NVMe again and could recover from that only
after powercycling. Pings, until that powercycle, worked so I assume the
NIC and software above it were still functional.

Rebooted into the 6.5 backported kernel, downloaded the newest BIOS,
noticed the NIC getting lost, wrote the BIOS image to USB key, rebooted
into the UEFI / BIOS control tool, flashed the newest firmware, set all
defaults and conservative power saving settings and booted into Debian
again.

Kernel is
# uname -a
Linux Zwerg 6.5.0-0.deb12.4-amd64 #1 SMP PREEMPT_DYNAMIC Debian
6.5.10-1~bpo12+1 (2023-11-23) x86_64 GNU/Linux

These are the latest such events:
Jan 27 09:44:53 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
Jan 27 09:48:05 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
Jan 27 09:52:16 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
Feb 01 04:19:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
Feb 01 14:43:03 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
Feb 08 18:33:38 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
Feb 08 19:00:32 Zwerg kernel: igc 0000:0b:00.0 eno1: PCIe link lost,
device now detached
Feb 08 19:02:38 Zwerg kernel: igc 0000:0b:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
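
These events can be grouped by PCI address and interface state to see at a
glance what changed between them. The following is a minimal Python sketch,
not part of the original mail. It assumes the mail's line wrapping has been
undone so that each event is a single line; summarize_link_lost is a
hypothetical helper name.

```python
import re
from collections import Counter

# Matches the igc message and captures the PCI address and the device label.
LINK_LOST = re.compile(
    r"igc (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<netdev>.+?): PCIe link lost, device now detached"
)

def summarize_link_lost(lines):
    """Count link-loss events per (PCI address, device label)."""
    counts = Counter()
    for line in lines:
        match = LINK_LOST.search(line)
        if match:
            counts[(match.group("pci"), match.group("netdev"))] += 1
    return counts

TAIL = ": PCIe link lost, device now detached"
UNINIT = "(unnamed net_device) (uninitialized)"

# The eight events listed above, rejoined into single lines.
events = [
    "Jan 27 09:44:53 Zwerg kernel: igc 0000:0a:00.0 eno1" + TAIL,
    "Jan 27 09:48:05 Zwerg kernel: igc 0000:0a:00.0 " + UNINIT + TAIL,
    "Jan 27 09:52:16 Zwerg kernel: igc 0000:0a:00.0 " + UNINIT + TAIL,
    "Feb 01 04:19:17 Zwerg kernel: igc 0000:0a:00.0 eno1" + TAIL,
    "Feb 01 14:43:03 Zwerg kernel: igc 0000:0a:00.0 " + UNINIT + TAIL,
    "Feb 08 18:33:38 Zwerg kernel: igc 0000:0a:00.0 eno1" + TAIL,
    "Feb 08 19:00:32 Zwerg kernel: igc 0000:0b:00.0 eno1" + TAIL,
    "Feb 08 19:02:38 Zwerg kernel: igc 0000:0b:00.0 " + UNINIT + TAIL,
]

for key, count in sorted(summarize_link_lost(events).items()):
    print(key, count)
```

For the list above this makes it visible that the last two events are
reported for 0000:0b:00.0 rather than 0000:0a:00.0.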

I think it's safe to say that the actual kernel version does not have an
effect on those events.

Naturally, the NVMe connectivity losses are not logged but I believe it
might be an interesting thing to see if I can capture that. Perhaps
sending system logs to USB storage might work. However, I think it would
be important to understand if this ticket's topic is a matter of the igc
module, or perhaps about the power or PCIe management functionality (of
which I know even less).

The big question: What can I do to help pinpoint this problem further?

Thanks,

Arno

#1060706#87
Date:
2024-02-09 12:39:23 UTC
From:
To:
And another instance, and this time I remembered to capture the messages
from an attempted reload of the igc module.




[Fr Feb  9 13:25:08 2024] igc 0000:0b:00.0 eno1: PCIe link lost, device
now detached
[Fr Feb  9 13:25:08 2024] ------------[ cut here ]------------
[Fr Feb  9 13:25:08 2024] igc: Failed to read reg 0xc030!
[Fr Feb  9 13:25:08 2024] WARNING: CPU: 20 PID: 84300 at
drivers/net/ethernet/intel/igc/igc_main.c:6583 igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:25:08 2024] Modules linked in: exfat rfcomm
cpufreq_userspace cpufreq_powersave cpufreq_ondemand
cpufreq_conservative nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs qrtr overlay cmac algif_hash
algif_skcipher af_alg bnep sunrpc binfmt_misc nls_ascii nls_cp437 vfat
fat ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common btusb btrtl btbcm
btintel btmtk bluetooth snd_hda_codec_hdmi mt7921e mt7921_common
edac_mce_amd mt76_connac_lib snd_hda_intel uvcvideo snd_intel_dspcfg
mt76 sha3_generic snd_intel_sdw_acpi videobuf2_vmalloc snd_usb_audio
kvm_amd snd_hda_codec jitterentropy_rng uvc snd_usbmidi_lib
videobuf2_memops mac80211 snd_hda_core drbg videobuf2_v4l2 libarc4
snd_rawmidi eeepc_wmi asus_nb_wmi kvm videodev snd_hwdep snd_seq_device
ansi_cprng asus_wmi cfg80211 snd_pcm videobuf2_common battery irqbypass
ecdh_generic ledtrig_audio ecc sparse_keymap sp5100_tco mc crc16 ccp
snd_timer platform_profile rapl wmi_bmof watchdog pcspkr k10temp snd
rfkill soundcore joydev sg evdev msr
[Fr Feb  9 13:25:08 2024] CPU: 20 PID: 84300 Comm: kworker/20:0 Not
tainted 6.5.0-0.deb12.4-amd64 #1  Debian 6.5.10-1~bpo12+1
[Fr Feb  9 13:25:08 2024] Hardware name: ASUS System Product Name/ROG
STRIX X670E-A GAMING WIFI, BIOS 1904 01/29/2024
[Fr Feb  9 13:25:08 2024] Workqueue: events igc_watchdog_task [igc]
[Fr Feb  9 13:25:08 2024] RIP: 0010:igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:25:08 2024] Code: 48 c7 c6 10 36 3a c0 e8 81 aa dd e6 48
8b bb 28 ff ff ff e8 05 12 b4 e6 84 c0 74 bc 89 ee 48 c7 c7 38 36 3a c0
e8 c3 2e 53 e6 <0f> 0b eb aa b8 ff ff ff ff e9 15 0f 04 e7 0f 1f 44 00
00 90 90 90
[Fr Feb  9 13:25:08 2024] RSP: 0018:ffffb034cc61bdd8 EFLAGS: 00010282
[Fr Feb  9 13:25:08 2024] RAX: 0000000000000000 RBX: ffff97078f882cb8
RCX: 0000000000000027
[Fr Feb  9 13:25:08 2024] RDX: ffff97169e7213c8 RSI: 0000000000000001
RDI: ffff97169e7213c0
[Fr Feb  9 13:25:08 2024] RBP: 000000000000c030 R08: 0000000000000000
R09: ffffb034cc61bc68
[Fr Feb  9 13:25:08 2024] R10: 0000000000000003 R11: ffff9716dde3ac28
R12: ffff97078f882000
[Fr Feb  9 13:25:08 2024] R13: 0000000000000000 R14: ffff970784592d40
R15: 000000000000c030
[Fr Feb  9 13:25:08 2024] FS:  0000000000000000(0000)
GS:ffff97169e700000(0000) knlGS:0000000000000000
[Fr Feb  9 13:25:08 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fr Feb  9 13:25:08 2024] CR2: 00007f5271155f80 CR3: 0000000434bc6000
CR4: 0000000000750ee0
[Fr Feb  9 13:25:08 2024] PKRU: 55555554
[Fr Feb  9 13:25:08 2024] Call Trace:
[Fr Feb  9 13:25:08 2024]  <TASK>
[Fr Feb  9 13:25:08 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:25:08 2024]  ? __warn+0x81/0x130
[Fr Feb  9 13:25:08 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:25:08 2024]  ? report_bug+0x171/0x1a0
[Fr Feb  9 13:25:08 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Fr Feb  9 13:25:08 2024]  ? prb_read_valid+0x1b/0x30
[Fr Feb  9 13:25:08 2024]  ? handle_bug+0x41/0x70
[Fr Feb  9 13:25:08 2024]  ? exc_invalid_op+0x17/0x70
[Fr Feb  9 13:25:08 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Fr Feb  9 13:25:08 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:25:08 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:25:08 2024]  igc_update_stats+0x8a/0x6d0 [igc]
[Fr Feb  9 13:25:08 2024]  igc_watchdog_task+0x9d/0x4a0 [igc]
[Fr Feb  9 13:25:08 2024]  process_one_work+0x1df/0x3e0
[Fr Feb  9 13:25:08 2024]  worker_thread+0x51/0x390
[Fr Feb  9 13:25:08 2024]  ? __pfx_worker_thread+0x10/0x10
[Fr Feb  9 13:25:08 2024]  kthread+0xe5/0x120
[Fr Feb  9 13:25:08 2024]  ? __pfx_kthread+0x10/0x10
[Fr Feb  9 13:25:08 2024]  ret_from_fork+0x31/0x50
[Fr Feb  9 13:25:08 2024]  ? __pfx_kthread+0x10/0x10
[Fr Feb  9 13:25:08 2024]  ret_from_fork_asm+0x1b/0x30
[Fr Feb  9 13:25:08 2024]  </TASK>
[Fr Feb  9 13:25:08 2024] ---[ end trace 0000000000000000 ]---

A subsequent rmmod igc && modprobe igc got me:

[Fr Feb  9 13:27:09 2024] igc 0000:0b:00.0 eno1: PHC removed
[Fr Feb  9 13:27:17 2024] Intel(R) 2.5G Ethernet Linux Driver
[Fr Feb  9 13:27:17 2024] Copyright(c) 2018 Intel Corporation.
[Fr Feb  9 13:27:17 2024] igc 0000:0b:00.0: enabling device (0000 -> 0002)
[Fr Feb  9 13:27:17 2024] igc 0000:0b:00.0: PCIe PTM not supported by
PCIe bus/controller
[Fr Feb  9 13:27:17 2024] igc 0000:0b:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
[Fr Feb  9 13:27:17 2024] ------------[ cut here ]------------
[Fr Feb  9 13:27:17 2024] igc: Failed to read reg 0x10!
[Fr Feb  9 13:27:17 2024] WARNING: CPU: 3 PID: 84566 at
drivers/net/ethernet/intel/igc/igc_main.c:6583 igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:27:17 2024] Modules linked in: igc(+) exfat rfcomm
cpufreq_userspace cpufreq_powersave cpufreq_ondemand
cpufreq_conservative nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs qrtr overlay cmac algif_hash
algif_skcipher af_alg bnep sunrpc binfmt_misc nls_ascii nls_cp437 vfat
fat ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common btusb btrtl btbcm
btintel btmtk bluetooth snd_hda_codec_hdmi mt7921e mt7921_common
edac_mce_amd mt76_connac_lib snd_hda_intel uvcvideo snd_intel_dspcfg
mt76 sha3_generic snd_intel_sdw_acpi videobuf2_vmalloc snd_usb_audio
kvm_amd snd_hda_codec jitterentropy_rng uvc snd_usbmidi_lib
videobuf2_memops mac80211 snd_hda_core drbg videobuf2_v4l2 libarc4
snd_rawmidi eeepc_wmi asus_nb_wmi kvm videodev snd_hwdep snd_seq_device
ansi_cprng asus_wmi cfg80211 snd_pcm videobuf2_common battery irqbypass
ecdh_generic ledtrig_audio ecc sparse_keymap sp5100_tco mc crc16 ccp
snd_timer platform_profile rapl wmi_bmof watchdog pcspkr k10temp snd
rfkill soundcore joydev sg evdev
[Fr Feb  9 13:27:17 2024]  msr parport_pc ppdev lp parport fuse loop
efi_pstore configfs efivarfs ip_tables x_tables autofs4 xfs libcrc32c
crc32c_generic sd_mod dm_crypt dm_mod uas usb_storage hid_generic amdgpu
amdxcp drm_buddy gpu_sched i2c_algo_bit drm_suballoc_helper usbhid
drm_display_helper hid sr_mod cec cdrom rc_core drm_ttm_helper ttm
crc32_pclmul crc32c_intel drm_kms_helper ghash_clmulni_intel ahci
sha512_ssse3 libahci xhci_pci sha512_generic libata xhci_hcd nvme drm
nvme_core aesni_intel scsi_mod t10_pi usbcore crypto_simd cryptd
crc64_rocksoft_generic i2c_piix4 crc64_rocksoft crc_t10dif
crct10dif_generic crct10dif_pclmul scsi_common crc64 crct10dif_common
usb_common video wmi gpio_amdpt gpio_generic button [last unloaded: igc]
[Fr Feb  9 13:27:17 2024] CPU: 3 PID: 84566 Comm: modprobe Tainted: G
     W          6.5.0-0.deb12.4-amd64 #1  Debian 6.5.10-1~bpo12+1
[Fr Feb  9 13:27:17 2024] Hardware name: ASUS System Product Name/ROG
STRIX X670E-A GAMING WIFI, BIOS 1904 01/29/2024
[Fr Feb  9 13:27:17 2024] RIP: 0010:igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:27:17 2024] Code: 48 c7 c6 10 36 3a c0 e8 81 aa dd e6 48
8b bb 28 ff ff ff e8 05 12 b4 e6 84 c0 74 bc 89 ee 48 c7 c7 38 36 3a c0
e8 c3 2e 53 e6 <0f> 0b eb aa b8 ff ff ff ff e9 15 0f 04 e7 0f 1f 44 00
00 90 90 90
[Fr Feb  9 13:27:17 2024] RSP: 0018:ffffb034ccb2baa0 EFLAGS: 00010286
[Fr Feb  9 13:27:17 2024] RAX: 0000000000000000 RBX: ffff9707a086ecb8
RCX: 0000000000000027
[Fr Feb  9 13:27:17 2024] RDX: ffff97169e2e13c8 RSI: 0000000000000001
RDI: ffff97169e2e13c0
[Fr Feb  9 13:27:17 2024] RBP: 0000000000000010 R08: 0000000000000000
R09: ffffb034ccb2b930
[Fr Feb  9 13:27:17 2024] R10: 0000000000000003 R11: ffff9716dde3ac28
R12: ffff9707a086e000
[Fr Feb  9 13:27:17 2024] R13: ffff9707a086e9c0 R14: ffff9707a086e000
R15: ffff9707a086ecb8
[Fr Feb  9 13:27:17 2024] FS:  00007f709389c040(0000)
GS:ffff97169e2c0000(0000) knlGS:0000000000000000
[Fr Feb  9 13:27:17 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fr Feb  9 13:27:17 2024] CR2: 0000559154b5b188 CR3: 00000004e4972000
CR4: 0000000000750ee0
[Fr Feb  9 13:27:17 2024] PKRU: 55555554
[Fr Feb  9 13:27:17 2024] Call Trace:
[Fr Feb  9 13:27:17 2024]  <TASK>
[Fr Feb  9 13:27:17 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:27:17 2024]  ? __warn+0x81/0x130
[Fr Feb  9 13:27:17 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:27:17 2024]  ? report_bug+0x171/0x1a0
[Fr Feb  9 13:27:17 2024]  ? prb_read_valid+0x1b/0x30
[Fr Feb  9 13:27:17 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Fr Feb  9 13:27:17 2024]  ? handle_bug+0x41/0x70
[Fr Feb  9 13:27:17 2024]  ? exc_invalid_op+0x17/0x70
[Fr Feb  9 13:27:17 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Fr Feb  9 13:27:17 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[Fr Feb  9 13:27:17 2024]  igc_get_invariants_base+0xb9/0x260 [igc]
[Fr Feb  9 13:27:17 2024]  igc_probe+0x2ed/0x970 [igc]
[Fr Feb  9 13:27:17 2024]  local_pci_probe+0x42/0xa0
[Fr Feb  9 13:27:17 2024]  pci_device_probe+0xc7/0x240
[Fr Feb  9 13:27:17 2024]  really_probe+0x19f/0x400
[Fr Feb  9 13:27:17 2024]  ? __pfx___driver_attach+0x10/0x10
[Fr Feb  9 13:27:17 2024]  __driver_probe_device+0x78/0x160
[Fr Feb  9 13:27:17 2024]  driver_probe_device+0x1f/0x90
[Fr Feb  9 13:27:17 2024]  __driver_attach+0xd2/0x1c0
[Fr Feb  9 13:27:17 2024]  bus_for_each_dev+0x85/0xd0
[Fr Feb  9 13:27:17 2024]  bus_add_driver+0x116/0x220
[Fr Feb  9 13:27:17 2024]  driver_register+0x59/0x100
[Fr Feb  9 13:27:17 2024]  ? __pfx_igc_init_module+0x10/0x10 [igc]
[Fr Feb  9 13:27:17 2024]  do_one_initcall+0x5a/0x320
[Fr Feb  9 13:27:17 2024]  do_init_module+0x60/0x240
[Fr Feb  9 13:27:17 2024]  init_module_from_file+0x86/0xc0
[Fr Feb  9 13:27:17 2024]  idempotent_init_module+0x120/0x2b0
[Fr Feb  9 13:27:17 2024]  __x64_sys_finit_module+0x5e/0xb0
[Fr Feb  9 13:27:17 2024]  do_syscall_64+0x5c/0xc0
[Fr Feb  9 13:27:17 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Fr Feb  9 13:27:17 2024]  ? ksys_mmap_pgoff+0xec/0x1f0
[Fr Feb  9 13:27:17 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Fr Feb  9 13:27:17 2024]  ? exit_to_user_mode_prepare+0x40/0x1e0
[Fr Feb  9 13:27:17 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Fr Feb  9 13:27:17 2024]  ? syscall_exit_to_user_mode+0x2b/0x40
[Fr Feb  9 13:27:17 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Fr Feb  9 13:27:17 2024]  ? do_syscall_64+0x6b/0xc0
[Fr Feb  9 13:27:17 2024]  ? do_syscall_64+0x6b/0xc0
[Fr Feb  9 13:27:17 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Fr Feb  9 13:27:17 2024]  ? exit_to_user_mode_prepare+0x40/0x1e0
[Fr Feb  9 13:27:17 2024]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Fr Feb  9 13:27:17 2024] RIP: 0033:0x7f709399e719
[Fr Feb  9 13:27:17 2024] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00
00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b
4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8
64 89 01 48
[Fr Feb  9 13:27:17 2024] RSP: 002b:00007ffd5ebd1f78 EFLAGS: 00000246
ORIG_RAX: 0000000000000139
[Fr Feb  9 13:27:17 2024] RAX: ffffffffffffffda RBX: 0000563800fbbc30
RCX: 00007f709399e719
[Fr Feb  9 13:27:17 2024] RDX: 0000000000000000 RSI: 00005637fff544a0
RDI: 0000000000000003
[Fr Feb  9 13:27:17 2024] RBP: 00005637fff544a0 R08: 0000000000000000
R09: 0000563800fbe650
[Fr Feb  9 13:27:17 2024] R10: 0000000000000003 R11: 0000000000000246
R12: 0000000000040000
[Fr Feb  9 13:27:17 2024] R13: 0000000000000000 R14: 0000563800fbbdc0
R15: 0000000000000000
[Fr Feb  9 13:27:17 2024]  </TASK>
[Fr Feb  9 13:27:17 2024] ---[ end trace 0000000000000000 ]---
[Fr Feb  9 13:27:57 2024] igc: probe of 0000:0b:00.0 failed with error -13


Can anybody suggest what information I can provide to tackle this?

Thanks,

Arno

#1060706#92
Date:
2024-02-09 13:43:07 UTC
From:
To:
I see you have (now) an up-to-date BIOS. Good.

$ scripts/get_maintainer.pl drivers/net/ethernet/intel/igc/ returned this:
Jesse Brandeburg <jesse.brandeburg@intel.com> (supporter:INTEL ETHERNET DRIVERS)
Tony Nguyen <anthony.l.nguyen@intel.com> (supporter:INTEL ETHERNET DRIVERS)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING DRIVERS)
Eric Dumazet <edumazet@google.com> (maintainer:NETWORKING DRIVERS)
Jakub Kicinski <kuba@kernel.org> (maintainer:NETWORKING DRIVERS)
Paolo Abeni <pabeni@redhat.com> (maintainer:NETWORKING DRIVERS)
intel-wired-lan@lists.osuosl.org (moderated list:INTEL ETHERNET DRIVERS)
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
linux-kernel@vger.kernel.org (open list)
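
If you want to turn output like the above into recipient lists, the following is a minimal sketch. It assumes every entry has the shape "address (role)" or "Name <address> (role)", as in the output shown; the function name parse_get_maintainer is made up for this example and is not part of the kernel scripts.

```python
import re

# One get_maintainer.pl entry: optional display name, an address that is
# either bare or wrapped in <...>, then the role in parentheses.
ENTRY_RE = re.compile(r"^(?:.*?<([^>]+)>|(\S+@\S+))\s*\(([^)]*)\)\s*$")


def parse_get_maintainer(output):
    """Split get_maintainer.pl output into (people, lists) address lists.

    Hypothetical helper: an entry counts as a mailing list when the part
    of its role before the first ':' ends in 'list' (e.g. 'open list').
    Lines that do not look like entries are ignored.
    """
    people, lists = [], []
    for line in output.splitlines():
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue
        address = match.group(1) or match.group(2)
        role_kind = match.group(3).split(":", 1)[0].strip()
        if role_kind.endswith("list"):
            lists.append(address)
        else:
            people.append(address)
    return people, lists


if __name__ == "__main__":
    sample = """\
Jesse Brandeburg <jesse.brandeburg@intel.com> (supporter:INTEL ETHERNET DRIVERS)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING DRIVERS)
intel-wired-lan@lists.osuosl.org (moderated list:INTEL ETHERNET DRIVERS)
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
"""
    people, lists = parse_get_maintainer(sample)
    print("To:", ", ".join(lists))
    print("Cc:", ", ".join(people))
```

Whether the lists go in To: and the people in Cc: is a matter of taste; the sketch only separates the two groups.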

To report this upstream, I'd certainly send an email to netdev@vger.kernel.org,
as that is the relevant mailing list. You can add others from the list above too.
In that email I recommend including the following info:
- Description of the problems: I'd focus on the NIC stuff, but do also mention
  the issue you encountered with NVMe.
- A list or table of the kernel versions you have seen the problem with.
  Try to give the upstream version, as the Debian version (6.1.0-17) is
  often not that useful for the upstream maintainers. `uname -a` will show
  both. Via https://tracker.debian.org/pkg/linux I found that 6.1.0-17 is
  upstream version 6.1.69, as the 6.1.69-1 upload had "Bump ABI to 17" at the
  end of the changelog.
  IIUC this is not a regression; mention that too.
- The stack trace(s) you got. These usually allow the upstream maintainers
  to pinpoint where the problem lies.
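
To illustrate the version point: the Debian kernel strings seen in this report carry both the ABI name and the package version, and the upstream version is the package version without its Debian revision. Below is a minimal sketch, assuming the string contains a release like "6.5.0-0.deb12.4-amd64" followed later by "Debian <package version>" with no epoch; the function name is invented for this example.

```python
import re


def split_debian_kernel_version(text):
    """Return (abi_name, package_version, upstream_version) or None.

    Hypothetical helper. Assumes `text` looks like the kernel banner in a
    trace or like `uname -a` output on Debian, i.e. it contains the kernel
    release first and 'Debian <package version>' somewhere after it.
    """
    release = re.search(r"\b\d+\.\d+\.\d+-\S+", text)
    package = re.search(r"\bDebian\s+(\S+)", text)
    if not release or not package:
        return None
    package_version = package.group(1)
    # The Debian revision is whatever follows the last hyphen of the
    # package version; the part before it is the upstream version.
    upstream_version = package_version.rsplit("-", 1)[0]
    return release.group(0), package_version, upstream_version


if __name__ == "__main__":
    # Banner taken from the stack trace in this report.
    banner = "6.5.0-0.deb12.4-amd64 #1  Debian 6.5.10-1~bpo12+1"
    print(split_debian_kernel_version(banner))
    # String assembled from the values mentioned in the list above.
    print(split_debian_kernel_version("6.1.0-17-amd64 #1 Debian 6.1.69-1"))
```

For the two strings above this yields the upstream versions 6.5.10 and 6.1.69, which are the numbers upstream will want to see.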

HTH

#1060706#97
Date:
2024-02-12 11:56:45 UTC
From:
To:
Reported upstream, see

https://lore.kernel.org/netdev/3179622f-7090-4a57-98ba-9042809a0d2a@its-lehmann.de/T/#u

Keep your fingers crossed that this gets someone interested ;-)

Cheers,

Arno

#1060706#102
Date:
2024-02-12 12:32:47 UTC
From:
To:
Control: forwarded -1 https://lore.kernel.org/netdev/3179622f-7090-4a57-98ba-9042809a0d2a@its-lehmann.de/

Excellent, thanks