#587666 Hangs when loading vlan interface immediately after main interface

#587666#5
Date:
2010-06-30 17:58:53 UTC
From:
To:
When I try to bring eth0 up followed by eth0.3 the machine hangs.

/etc/network/interfaces looks like:

	auto lo
	iface lo inet loopback

	#auto eth0
	iface eth0 inet loopback

	#auto eth0.3
	iface eth0.3 inet static
		address 10.4.4.202
		netmask 255.255.255.0
		gateway 10.4.4.1

If I:
	'ifup eth0'
	wait a few seconds until 'link becomes ready'
	'ifup eth0.3'

then everything is fine.
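
The manual wait in the working sequence above could be scripted roughly as follows. This is only a sketch of a workaround, not a fix for the hang. The helper name `wait_for_carrier` and its arguments are made up for illustration, and it assumes that the sysfs `carrier` flag reading `1` corresponds to the 'link becomes ready' message.

```shell
#!/bin/sh
# wait_for_carrier IFACE [TIMEOUT_SECONDS] [SYSFS_ROOT]
# Hypothetical helper: polls the interface's sysfs carrier flag once per
# second and returns 0 as soon as it reads "1", or 1 after the timeout.
# SYSFS_ROOT defaults to /sys/class/net and is only a parameter so the
# function can be tried against a scratch directory.
wait_for_carrier() {
    wfc_iface=$1
    wfc_timeout=${2:-10}
    wfc_root=${3:-/sys/class/net}
    wfc_elapsed=0
    while [ "$wfc_elapsed" -lt "$wfc_timeout" ]; do
        # Reading carrier fails while the interface is down, so errors
        # are discarded and treated the same as "no link yet".
        if [ "$(cat "$wfc_root/$wfc_iface/carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        sleep 1
        wfc_elapsed=$((wfc_elapsed + 1))
    done
    return 1
}
```

With that defined, 'ifup eth0 && wait_for_carrier eth0 15 && ifup eth0.3' would only bring up the vlan interface once eth0 reports a link, or give up after 15 seconds.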

If I:
	'ifup eth0;ifup eth0.3'

then I get a stack trace followed by the machine locking up, unless the interfaces have previously been up. In that case, running 'rmmod tg3;modprobe tg3;ifdown eth0;ifdown eth0.3' before the ifup will cause it to panic.

The machine is an HS20 (Type 8832) blade in an IBM BladeCenter, and unfortunately I've not had any success with Serial over LAN, so I can't currently get a backtrace. I will try to adjust the resolution enough to take a screenshot of the whole backtrace though.

lspci:

00:00.0 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset) (rev 33)
00:00.1 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset)
00:00.2 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset)
00:01.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 Host bridge: Broadcom CSB6 South Bridge (rev b0)
00:0f.1 IDE interface: Broadcom CSB6 RAID/IDE Controller (rev b0)
00:0f.2 USB Controller: Broadcom CSB6 OHCI USB Controller (rev 05)
00:0f.3 ISA bridge: Broadcom GCLE-2 Host Bridge
00:10.0 Host bridge: Broadcom CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
00:10.2 Host bridge: Broadcom CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 02)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 02)

Please ask for any extra information that would be useful.

Thanks,

#587666#10
Date:
2010-06-30 19:44:34 UTC
From:
To:
Hi,

On Wednesday, 30 June 2010, you wrote:
"loopback" is only for the loopback interface. I guess you don't want to have
l3 addresses on eth0. Please try:

iface eth0 inet static
	up ip l s eth0 up
	down ip l s eth0 down


Greetings
Timo

#587666#15
Date:
2010-06-30 20:25:07 UTC
From:
To:
Hi,

On Wednesday, 30 June 2010, I wrote:
Sorry, I meant "manual" instead of "static".
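
Combined with the earlier suggestion, the corrected stanza would presumably read as follows. This is a sketch: `ip l s` is written out as `ip link set`, and the eth0.3 stanza is assumed to stay as in the original report.

```
iface eth0 inet manual
	up ip link set eth0 up
	down ip link set eth0 down
```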


Greetings
Timo

#587666#20
Date:
2010-06-30 21:50:34 UTC
From:
To:
Sorry, I just replied directly instead of to the bug report:

 >> "loopback" is only for the loopback interface. I guess you don't want to
 >> have l3 addresses on eth0. Please try:
 >>
 >> iface eth0 inet static
 > Sorry, I meant "manual" instead of "static".
 >>     up ip l s eth0 up
 >>     down ip l s eth0 down

Thanks, but it does the same thing.

I will look at it more in the morning. I didn't think about the loopback
statement as a possible cause.

I'm also looking into kexec as a possible way to get some debug
information, but it's an uphill struggle!

Thanks,

Ian

#587666#25
Date:
2010-07-01 08:50:18 UTC
From:
To:
Hello,

It's obviously a kernel bug. I can't help you with that, so I have
to reassign it. The probable cause is the handling of up/down
callbacks: vlan cannot handle them and leaves NULL pointers assigned,
and tg3 calls them without checking.
I can advise you to try building your own kernel. The 2.6.33 series
is pretty stable *IFF* you leave out bridging vlans on bonds on
bnx2 :-). (Those work again in 2.6.34 though, with bonds being the
culprit).
ipmisol? It's pretty easy to set up, but your motherboard must be
new enough. The Linux-based BMCs can handle ipmisol pretty well,
but the older non-Linux-based BMCs really s...k at doing their
job.

BTW: I only have experience with DELL and supermicro ;-).
G200eW WPCM450

#587666#30
Date:
2010-07-01 14:50:50 UTC
From:
To:
Okay, I've tried various other combinations of eth0 config with no luck.
I haven't been able to get more information on the actual crash but I've
attached a screenshot, just in case it's of any use.

Could this bug be related to 585770? That bug report uses the same
kernel and the tg3 driver also.

I will try changing kernel version to see if that resolves it.

Thanks,

Ian

#587666#35
Date:
2010-07-01 16:19:41 UTC
From:
To:
Hello,

It's definitely the same bug.

Well, a vanilla kernel will not have the patch
bugfix/all/vlan-macvlan-propagate-transmission-state-to-upper-layer.patch
in it, which introduced the bug. They probably left out the parts
where the propagation has to check for NULL pointers to see if it
is supported ;-).

Hmmm, I actually reported/fixed the same type of bug in 2.6.29:
vlan-macvlan-fix-null-pointer-dereferences-in-ethtool-handlers.patch

But then again, a rewrite took place after that to group those
functions better, which makes layering of those devices elegant,
fast and possible.
Actually, the rewrite was stable/finished around 2.6.34 ;-). If you
don't need bonding, I would check out 2.6.33.5 ;-).

#587666#40
Date:
2010-07-01 17:18:25 UTC
From:
To:
Downgrading to linux-image-2.6.26-2-686 seems to fix the problem. I
can't find a newer prebuilt kernel to test with apart from the one in
experimental (doesn't seem to be installable).

2.6.26-2 is an acceptable fix for me but I can test later kernels if
requested.

Does this bug need moving to the kernel?

Thanks,

Ian

#587666#45
Date:
2010-07-02 10:43:15 UTC
From:
To:
Hi,
Yes, I need to reassign it to the debian-kernel maintainers. It's
not a problem with vanilla kernels I guess, since I have a lot of
servers with vlan tagging, and I only use vanilla ;-). (Something
about the taste maybe).

But I should be the one mv-ing it, else I am a bad, bad maintainer
;-)