When I try to bring eth0 up followed by eth0.3 the machine hangs.

/etc/network/interfaces looks like:

    auto lo
    iface lo inet loopback

    #auto eth0
    iface eth0 inet loopback

    #auto eth0.3
    iface eth0.3 inet static
        address 10.4.4.202
        netmask 255.255.255.0
        gateway 10.4.4.1

If I:

    'ifup eth0'
    wait a few seconds until 'link becomes ready'
    'ifup eth0.3'

then everything is fine.

If I:

    'ifup eth0;ifup eth0.3'

then I get a stack trace followed by the machine locking up (unless the interfaces have been previously up. In that case doing 'rmmod tg3;modprobe tg3;ifdown eth0;ifdown eth0.3' before ifup will cause it to panic).

The machine is a HS20 (Type 8832) blade in an IBM bladecenter and unfortunately I've not had any success with Serial Over Lan so I can't currently get a backtrace. I will try to see if I can adjust the resolution sufficiently to take a screenshot of the whole backtrace though.

lspci:

    00:00.0 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset) (rev 33)
    00:00.1 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset)
    00:00.2 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset)
    00:01.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
    00:0f.0 Host bridge: Broadcom CSB6 South Bridge (rev b0)
    00:0f.1 IDE interface: Broadcom CSB6 RAID/IDE Controller (rev b0)
    00:0f.2 USB Controller: Broadcom CSB6 OHCI USB Controller (rev 05)
    00:0f.3 ISA bridge: Broadcom GCLE-2 Host Bridge
    00:10.0 Host bridge: Broadcom CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
    00:10.2 Host bridge: Broadcom CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
    01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 02)
    01:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 02)

Please ask for any extra information that would be useful.

Thanks,
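The ordering that works in the report (ifup eth0, wait for the link, then ifup eth0.3) could be scripted roughly as below. This is only a workaround sketch and not a fix for the crash. The function name wait_for_carrier, the SYSFS_NET variable and the 30-second timeout are assumptions made for illustration. It relies on the kernel's /sys/class/net/<iface>/carrier attribute, which reads 1 once the link is up.

```shell
#!/bin/sh
# Sketch: wait for link on a parent interface before bringing up a VLAN on it.
# SYSFS_NET is a hypothetical override so the loop can be tried without real hardware.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# wait_for_carrier IFACE TIMEOUT_SECONDS
# Returns 0 once the carrier file contains 1, or 1 if the timeout expires first.
wait_for_carrier() {
    iface=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        # Reading carrier fails while the interface is administratively down,
        # so errors are discarded and treated as "no link yet".
        if [ "$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Hypothetical usage matching the report (run as root):
#   ifup eth0 && wait_for_carrier eth0 30 && ifup eth0.3
```

The usage line is left as a comment so that sourcing the file does not touch any interface; the VLAN is only brought up if the link appeared within the timeout.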
Hi,

On Wednesday, 30 June 2010, you wrote:

"loopback" is only for the loopback interface. I guess you don't want to have L3 addresses on eth0. Please try:

    iface eth0 inet static
        up ip l s eth0 up
        down ip l s eth0 down

Greetings
Timo
Hi,

On Wednesday, 30 June 2010, I wrote:

Sorry, I meant "manual" instead of "static".

Greetings
Timo
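Putting the suggestion and its correction together, the eth0 stanza would presumably read as follows. The abbreviated "ip l s" is written out as "ip link set" here, and the comment line is added for illustration. This only addresses the misuse of "loopback" on eth0; it is not claimed to fix the hang.

```
# eth0 carries no address of its own; it is only brought up and down
iface eth0 inet manual
    up ip link set eth0 up
    down ip link set eth0 down
```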
Sorry, I just replied directly instead of to the bug report:

>> "loopback" is only for the loopback interface. I guess you don't want to
>> have l3 addresses on eth0. Please try:
>>
>> iface eth0 inet static
> Sorry, I meant "manual" instead of "static".
>> up ip l s eth0 up
>> down ip l s eth0 down

Thanks, but it does the same thing. I will look at it more in the morning. I didn't think about the loopback statement as a possible cause. I'm also looking into kexec as a possible way to get some debug information, but it's an uphill struggle!

Thanks,
Ian
Hello,

It's obviously a kernel bug. I can't help you with that, so I have to reassign it. The probable cause is the handling of the up/down callbacks: vlan cannot handle them and leaves NULL pointers assigned, which tg3 then calls without checking.

I can advise you to try building your own kernel. The 2.6.33 series is pretty stable *IFF* you leave out bridging vlans on bonds on bnx2 :-). (Those work again in 2.6.34 though, with bonds being the culprit.)

ipmisol? It's pretty easy to set up, but your motherboard must be new enough. The Linux-based BMCs can handle ipmisol pretty well, but the older non-Linux-based BMCs really s...k at doing their job. BTW: I only have experience with DELL and supermicro ;-).

G200eW WPCM450
Okay, I've tried various other combinations of eth0 config with no luck. I haven't been able to get more information on the actual crash, but I've attached a screenshot, just in case it's of any use.

Could this bug be related to 585770? That bug report uses the same kernel and also the tg3 driver.

I will try changing the kernel version to see if that resolves it.

Thanks,
Ian
Hello,

It's definitely the same bug. Well, a vanilla kernel you compile yourself will not have

    bugfix/all/vlan-macvlan-propagate-transmission-state-to-upper-layer.patch

in it, which introduced the bug. They probably left out the parts where the propagation has to check for NULL pointers to see whether it is supported ;-).

Hmmm, I actually reported/fixed the same type of bug in 2.6.29:

    vlan-macvlan-fix-null-pointer-dereferences-in-ethtool-handlers.patch

But then again, a rewrite took place after that to group those functions better, which makes layering of those devices elegant, fast and possible. Actually, the rewrite was stable/finished around 2.6.34 ;-). If you don't need bonding I would check out 2.6.33.5 ;-).
Downgrading to linux-image-2.6.26-2-686 seems to fix the problem. I can't find a newer prebuilt kernel to test with apart from the one in experimental (doesn't seem to be installable). 2.6.26-2 is an acceptable fix for me but I can test later kernels if requested. Does this bug need moving to the kernel? Thanks, Ian
Hi, Yes, I need to reassign it to the debian-kernel maintainers. It's not a problem with vanilla kernels I guess, since I have a lot of servers with vlan tagging, and I only use vanilla ;-). (Something about the taste maybe). But I should be the one mv-ing it, else I am a bad, bad maintainer ;-)