#608173 openafs module crash when shutting down afs service

#608173#5
Date:
2010-12-28 09:40:17 UTC
From:
To:
When shutting down a freshly installed openafs-client machine the
openafs module crashed. Details from the syslog are below. The
particular circumstances were as follows:

1) Client machine was freshly installed with squeeze.
   Only basic packages were installed, no desktop, fileserver etc.
   Afs services configured to start at boot time.
2) Some files were copied from an afs server which acts both as
   dbserver and fileserver. The file server runs lenny with openafs
   packages from backports.
3) Afs server was shut down (hardware problems).
4) Afs client was shut down without problem.
5) Afs client was started, afs server was not.
6) Afs client machine worked normally.
7) Afs client machine was shut down. Afs module crashed:


Dec 27 12:10:04 tv init: Switching to runlevel: 0
Dec 27 12:10:05 tv kernel: [ 8287.213136] COLD shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... md: md2 still in use.
Dec 27 12:10:05 tv kernel: [ 8287.301475] md: md1 still in use.
Dec 27 12:10:05 tv kernel: [ 8287.301548] md: md0 still in use.
Dec 27 12:10:05 tv kernel: [ 8287.301749] md: md2 still in use.
Dec 27 12:10:05 tv kernel: [ 8287.301855] md: md1 still in use.
Dec 27 12:10:05 tv kernel: [ 8287.301929] md: md0 still in use.
Dec 27 12:10:06 tv kernel: [ 8287.716024] UnmaskRxkSignals... RxListener...
Dec 27 12:10:06 tv kernel: [ 8287.716403] WARNING: not all blocks freed: large -1 small -6
Dec 27 12:10:06 tv kernel: [ 8287.716406]  ALL allocated tables
Dec 27 12:10:06 tv kernel: [ 8287.785471] ------------[ cut here ]------------
Dec 27 12:10:06 tv kernel: [ 8287.786488] kernel BUG at /build/buildd-linux-2.6_2.6.32-29-i386-Of6Yt1/linux-2.6-2.6.32/debian/build/source_i386_none/fs/fs-writeback.c:156!
Dec 27 12:10:06 tv kernel: [ 8287.787528] invalid opcode: 0000 [#1] SMP
Dec 27 12:10:06 tv kernel: [ 8287.788567] last sysfs file: /sys/devices/virtual/block/md0/md/level
Dec 27 12:10:06 tv kernel: [ 8287.788948] Modules linked in: loop firewire_sbp2 tda9887 tda8290 tda827x tda10023 tuner_simple tuner_types wm8775 tuner cx25840 dvb_usb_dib0700 budget_ci dib7000p dib7000m nouveau dib0070 ir_common ivtv cx2341x dvb_usb budget_core v4l2_common saa7146 videodev dib3000mc ttm dib8000 v4l1_compat psmouse snd_intel8x0 drm_kms_helper snd_ac97_codec dibx000_common ttpci_eeprom tveeprom serio_raw pcspkr ac97_bus drm snd_pcm evdev dvb_core snd_timer i2c_i801 i2c_algo_bit snd i2c_core soundcore shpchp rng_core button processor snd_page_alloc pci_hotplug ext3 jbd mbcache raid1 md_mod sg sr_mod cdrom sd_mod crc_t10dif ata_generic usbhid hid uhci_hcd ata_piix firewire_ohci ehci_hcd sata_promise libata e1000 firewire_core crc_itu_t thermal scsi_mod usbcore nls_base thermal_sys [last unloaded: openafs]
Dec 27 12:10:06 tv kernel: [ 8287.788948]
Dec 27 12:10:06 tv kernel: [ 8287.788948] Pid: 1994, comm: sync Tainted: P           (2.6.32-5-686 #1) To Be Filled By O.E.M.
Dec 27 12:10:06 tv kernel: [ 8287.788948] EIP: 0060:[<c10c8b31>] EFLAGS: 00010246 CPU: 0
Dec 27 12:10:06 tv kernel: [ 8287.788948] EIP is at bdi_queue_work+0x14/0x8f
Dec 27 12:10:06 tv kernel: [ 8287.788948] EAX: 00000000 EBX: 0000000e ECX: 00000000 EDX: f66a1b00
Dec 27 12:10:06 tv kernel: [ 8287.788948] ESI: f4482100 EDI: f66a1b00 EBP: f64ce000 ESP: f64cff84
Dec 27 12:10:06 tv kernel: [ 8287.788948]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 27 12:10:06 tv kernel: [ 8287.788948] Process sync (pid: 1994, ti=f64ce000 task=f66a8000 task.ti=f64ce000)
Dec 27 12:10:06 tv kernel: [ 8287.788948] Stack:
Dec 27 12:10:06 tv kernel: [ 8287.788948]  0000000e 00000001 f4482100 c10c8f42 0000000e 00000000 00000000 00000000
Dec 27 12:10:06 tv kernel: [ 8287.788948] <0> bfc2f0b4 00000000 c10cc1c8 c10030fb bfc2f0b4 00000001 00000000 00000001
Dec 27 12:10:06 tv kernel: [ 8287.788948] <0> 00000000 bfc2f008 00000024 0000007b 0000007b 00000000 00000033 00000024
Dec 27 12:10:06 tv kernel: [ 8287.788948] Call Trace:
Dec 27 12:10:06 tv kernel: [ 8287.788948]  [<c10c8f42>] ? wakeup_flusher_threads+0x55/0x6b
Dec 27 12:10:06 tv kernel: [ 8287.788948]  [<c10cc1c8>] ? sys_sync+0x7/0x29
Dec 27 12:10:06 tv kernel: [ 8287.788948]  [<c10030fb>] ? sysenter_do_call+0x12/0x28
Dec 27 12:10:06 tv kernel: [ 8287.788948] Code: c1 ba 08 00 00 00 6a 02 e8 4c ff ff ff 5b 83 c4 2c 89 f8 5b 5e 5f c3 57 89 d7 56 89 c6 53 8b 80 d0 00 00 00 85 c0 89 42 10 75 04 <0f> 0b eb fe 8b 86 d4 00 00 00 89 42 14 83 be d4 00 00 00 00 75
Dec 27 12:10:06 tv kernel: [ 8287.788948] EIP: [<c10c8b31>] bdi_queue_work+0x14/0x8f SS:ESP 0068:f64cff84
Dec 27 12:10:06 tv kernel: [ 8287.825082] ---[ end trace 9bdd764202d12871 ]---
Dec 27 12:10:06 tv ntpd[1595]: ntpd exiting on signal 15
Dec 27 12:10:06 tv acpid: exiting
Dec 27 12:10:06 tv rpc.statd[941]: Caught signal 15, un-registering and exiting
Dec 27 12:10:06 tv kernel: Kernel logging (proc) stopped.
Dec 27 12:10:06 tv rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="1093" x-info="http://www.rsyslog.com"] exiting on signal 15.


The problem seems to be repeatable. After this crash I started
the afs client and tried to stop afs services with
/etc/init.d/openafs-client stop
and the machine froze with a message on the screen which was similar to
the one above. It was not in the logs but I photographed it and can provide
it if deemed useful.

#608173#10
Date:
2010-12-28 11:18:26 UTC
From:
To:
Here is the information about the crash when stopping afs services:

root@tv:~# /etc/init.d/openafs-client stop
Stopping AFS services:afsd: Shutting down all afs processes and afs state
 openafs.
root@tv:~# [  350.188180] bdi f6616c00/<NULL> is not registred!
[  350.190355] bdi f6616b00/<NULL> is not registred!
[  350.192895] bdi f6616400/<NULL> is not registred!
[  350.195445] bdi f6616900/<NULL> is not registred!
[  350.197538] BUG: unable to handle kernel NULL pointer dereference at (null)
[  350.199738] IP: [<c1097df8>] bdi_forker_task+0x11b/0x26b
[  350.201525] *pde = 00000000
[  350.201525] Oops: 0000 [#1] SMP
[  350.201525] last sysfs file: /sys/devices/virtual/bdi/afs/uevent
[  350.201525] Modules linked in: loop firewire_sbp2 tda9887 tda8290 tda827x tda10023 tuner_simple tuner_types wm8775 tuner cx25840 dvb_usb_dib0700 dib7000p dib7000m snd_intel8x0 budget_ci dib0070 ir_common dvb_usb snd_ac97_codec budget_core ivtv dib3000mc dib8000 ac97_bus cx2341x dibx000_common saa7146 v4l2_common snd_pcm nouveau videodev snd_timer ttpci_eeprom dvb_core v4l1_compat ttm snd psmouse drm_kms_helper serio_raw drm tveeprom evdev soundcore i2c_algo_bit rng_core i2c_i801 pcspkr shpchp i2c_core button snd_page_alloc pci_hotplug processor ext3 jbd mbcache raid1 md_mod sg sd_mod crc_t10dif sr_mod cdrom ata_generic usbhid hid uhci_hcd ata_piix sata_promise thermal ehci_hcd libata firewire_ohci thermal_sys firewire_core crc_itu_t e1000 scsi_mod usbcore nls_base [last unloaded: openafs]
[  350.201525]
[  350.201525] Pid: 17, comm: bdi-default Tainted: P        W  (2.6.32-5-686 #1) To Be Filled By O.E.M.
[  350.201525] EIP: 0060:[<c1097df8>] EFLAGS: 00010286 CPU: 0
[  350.201525] EIP is at bdi_forker_task+0x11b/0x26b
[  350.201525] EAX: 0000003c EBX: 00000000 ECX: f6cd5f64 EDX: c12fcb3c
[  350.201525] ESI: 00000000 EDI: c1395878 EBP: f6c33300 ESP: f6cd5f6c
[  350.201525]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  350.201525] Process bdi-default (pid: 17, ti=f6cd4000 task=f6c33300 task.ti=f6cd4000)
[  350.201525] Stack:
[  350.201525]  c1395858 f6cd5f84 c2a08100 c1395880 c1395878 c1395870 00000001 c1030669
[  350.201525] <0> 00000000 c2a036ac f6c334bc fffedb30 f6c23f74 00000296 00000000 00000000
[  350.201525] <0> 00000000 f6c23f70 c1395858 c1097cdd 00000000 c10439c0 00000000 00000000
[  350.201525] Call Trace:
[  350.201525]  [<c1020669>] ? __wake_up_common+0x34/0x59
[  350.201525]  [<c1097cdd>] ? bdi_forker_task+0x0/0x26b
[  350.201525]  [<c10439c0>] ? kthread+0x61/0x66
[  350.201525]  [<c104395f>] ? kthread+0x0/0x66
[  350.201525]  [<c1003d47>] ? kernel_thread_helper+0x7/0x10
[  350.201525] Code: c1 e8 a0 41 1d 00 83 c4 0c eb 1e 8b 53 04 8b 03 89 50 04 89 02 8d 43 08 ba 95 76 09 c1 c7 43 04 00 02 20 00 e8 ec 6e fd ff 89 f3 <8b> 36 81 fb b0 58 39 c1 0f 85 5b ff ff ff b8 01 00 00 00 87 45
[  350.201525] EIP: [<c1097df8>] bdi_forker_task+0x11b/0x26b
[  350.201525] CR2: 0000000000000000
[  350.275036] ---[ end trace 420bafa9739498f6 ]---
[  350.277290] Kernel panic - not syncing: Fatal exception in interrupt
[  350.277296] Pid: 17, comm: bdi-default Tainted: P      D W  2.6.32-5-686 #1
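
When comparing traces like the ones in this report, it can help to pull out the faulting function automatically. The following is only a minimal sketch: the regular expression and the helper name `faulting_symbol` are illustrative assumptions, written to match the "EIP is at" (i386) and "RIP: ...:[<...>]  [<...>]" (amd64) line shapes quoted in this bug, and other kernel versions may format these lines differently.

```python
import re

# Assumption: line shapes are those quoted in this report, e.g.
#   "EIP is at bdi_forker_task+0x11b/0x26b"                          (i386)
#   "RIP: 0010:[<addr>]  [<addr>] bdi_forker_task+0x13f/0x2bd"       (amd64)
FAULT_RE = re.compile(
    r"(?:EIP is at|RIP: [0-9a-f]{4}:\[<[0-9a-f]+>\]\s+\[<[0-9a-f]+>\])\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def faulting_symbol(lines):
    """Return (symbol, offset, size) for the first matching line, or None."""
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            return (
                match.group("symbol"),
                int(match.group("offset"), 16),
                int(match.group("size"), 16),
            )
    return None


if __name__ == "__main__":
    sample = ["[  350.201525] EIP is at bdi_forker_task+0x11b/0x26b"]
    print(faulting_symbol(sample))
```

Feeding it a saved syslog or netconsole capture line by line is enough to see whether two crashes share the same faulting function.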

#608173#15
Date:
2011-01-06 00:39:11 UTC
From:
To:
Anders Lennartsson <deb@lennartsson.se> writes:

Hi Anders,

I'm going to forward this bug upstream.  It's been my past experience that
whenever the shutdown message says "COLD", trouble is ahead and the
shutdown often crashes.  I don't know what causes cold shutdowns instead
of the normal warm shutdowns; hopefully upstream can figure out what's
going on.

Once AFS shutdown has crashed once, you're generally hosed until you do a
clean reboot.  Have you rebooted this system since?  Can you reproduce the
problem again after a reboot?
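
For anyone who wants to check saved logs for the "COLD" marker mentioned above, here is a minimal sketch. The helper name `shutdown_kinds` is hypothetical, the pattern is taken from the kernel line quoted in the original report, and the assumption that a normal shutdown prints "WARM" in the same position is not confirmed anywhere in this bug.

```python
import re

# Assumption: the marker has the form seen in the report
# ("COLD shutting down of: CB... afs..."), and a warm shutdown is
# assumed to print "WARM" in the same position.
SHUTDOWN_RE = re.compile(r"\b(COLD|WARM) shutting down of:")


def shutdown_kinds(lines):
    """Return the shutdown kinds ("COLD" or "WARM") found in the lines, in order."""
    kinds = []
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match:
            kinds.append(match.group(1))
    return kinds


if __name__ == "__main__":
    sample = [
        "Dec 27 12:10:04 tv init: Switching to runlevel: 0",
        "Dec 27 12:10:05 tv kernel: [ 8287.213136] COLD shutting down of: CB... afs... BkG...",
    ]
    print(shutdown_kinds(sample))
```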

#608173#24
Date:
2011-01-06 06:08:18 UTC
From:
To:
Hello

Thanks for pointing out this detail with the COLD shutdown. I have not
seen that before.

Yes it was reproducible after a reboot. Crash was as in total
crash. No keyboard function, no ssh login etc. But the power
button functioned :)

The first time I noticed this was when I asked the machine to
shut down, where the afs crash stopped the shutdown sequence. After a
forced reboot I made an attempt to shut down the OpenAFS service
only, something which repeatably (well, three times in a row) crashed
the client machine totally and forced a power button induced
reboot. This was as long as the OpenAFS server was shut down.

Once new disks for the server arrived and an OS was installed,
including OpenAFS db-server and fileserver, and the server machine was
up and running, I could start and shut down OpenAFS client services on
the client machine normally. I had not changed or updated any packages
in the meantime. All the files I had backed up on the client
machine transferred back to the server over afs without problem; I
compared checksums after copying into afs.

A few more details that may (or may not) be important:

The client machine is a Pentium4 running Squeeze i386, (686 kernel).

I have a laptop that runs Squeeze amd64 and it is also an OpenAFS
client, but the client is not set to start at boot time. Once I
start afs services I normally don't shut them down each time I take
the laptop away. Occasionally I reboot, and a running afs client has
never caused any problems when it cannot contact the server, neither at
client shutdown nor at computer shutdown.

#608173#29
Date:
2011-09-09 18:45:50 UTC
From:
To:
This is my first time reporting a bug, so I am not sure if this is a new bug or if it's similar to this one.


    dpkg -l | grep openafs
	ii  openafs-client                      1.4.12.1+dfsg-4              AFS distributed filesystem client support
	ii  openafs-modules-dkms                1.4.12.1+dfsg-4              AFS distributed filesystem kernel module DKMS source

	How to reproduce
	1- install openafs-modules-dkms openafs-client
	2- service openafs-client stop
	3- change AFS_DYNROOT=false in /etc/openafs/afs.conf.client ( let openafs client fail to load root.afs)
	4- service openafs-client start
	5- service openafs-client stop
	system hangs
	================================================================
	using openafs-modules-dkms 1.4.12.1+dfsg-4

	[  719.950866] BUG: unable to handle kernel NULL pointer dereference at (null)
	[  719.950869] IP: [<ffffffff810c8e60>] bdi_forker_task+0x13f/0x2bd
	[  719.950873] PGD 0
	[  719.950874] Oops: 0000 [#1] SMP
	[  719.950876] last sysfs file: /sys/devices/virtual/bdi/afs/uevent
	[  719.950878] CPU 1
	[  719.950878] Modules linked in: netconsole configfs ext4 jbd2 crc16 snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd i2c_piix4 i2c_core soundcore parport_pc psmouse snd_page_alloc parport joydev processor button serio_raw pcspkr ac evdev ext3 jbd mbcache dm_mod sg sr_mod usbhid hid cdrom sd_mod crc_t10dif ata_generic ata_piix ahci ohci_hcd e1000 libata ehci_hcd scsi_mod thermal thermal_sys usbcore nls_base [last unloaded: openafs]
	  [  719.950892] Pid: 25, comm: bdi-default Tainted: P        W  2.6.32-5-amd64 #1 VirtualBox
	  [  719.950893] RIP: 0010:[<ffffffff810c8e60>]  [<ffffffff810c8e60>] bdi_forker_task+0x13f/0x2bd
	  [  719.950896] RSP: 0018:ffff88003fa7de50  EFLAGS: 00010202
	  [  719.950897] RAX: 0000000000000044 RBX: 0000000000000000 RCX: ffffffff8142c000
	  [  719.950898] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246
	  [  719.950899] RBP: 0000000000000000 R08: 00000000000073ca R09: ffffffff814611f0
	  [  719.950901] R10: 00000001018960c0 R11: ffffffff81027ff8 R12: ffffffff81473980
	  [  719.950902] R13: ffff88003fa33f90 R14: ffff88003fa7de60 R15: ffffffff814739d0
	  [  719.950903] FS:  0000000000000000(0000) GS:ffff880001880000(0000) knlGS:0000000000000000
	  [  719.950905] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
	  [  719.950906] CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
	  [  719.950913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	  [  719.950914] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
	  [  719.950916] Process bdi-default (pid: 25, threadinfo ffff88003fa7c000, task ffff88003fa33f90)
	  [  719.950917] Stack:
	  [  719.950918]  ffffffff814739c0 ffffffff814739b0 ffff88003fa33f90 ffff88003fa34288
	  [  719.950919] <0> 000000023f9afcd8 0000000000000000 ffff88003fa34288 ffffffff8103aa76
	  [  719.950921] <0> 00000003cc1c13c4 ffff88003f9afcc8 ffffffff81473980 ffff88003f9afcd0
	  [  719.950923] Call Trace:
	  [  719.950925]  [<ffffffff8103aa76>] ? __wake_up_common+0x44/0x72
	  [  719.950927]  [<ffffffff810c8d21>] ? bdi_forker_task+0x0/0x2bd
	  [  719.950929]  [<ffffffff81064c4d>] ? kthread+0x79/0x81
	  [  719.950931]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
	  [  719.950933]  [<ffffffff81064bd4>] ? kthread+0x0/0x81
	  [  719.950934]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
	  [  719.950935] Code: 8b 53 08 48 8d 7b 10 48 c7 c6 9b 85 0c 81 48 89 50 08 48 89 02 48 b8 00 02 20 00 00 00 ad de 48 89 43 08 e8 df f2 fc ff 48 89 eb <48> 8b 6d 00 48 81 fb 30 3a 47 81 0f 85 3a ff ff ff b8 01 00 00
	  [  719.950945] RIP  [<ffffffff810c8e60>] bdi_forker_task+0x13f/0x2bd
	  [  719.950947]  RSP <ffff88003fa7de50>
	  [  719.950948] CR2: 0000000000000000
	  [  719.950950] ---[ end trace 22ba0eb7caa8e3c2 ]---
	  [  719.950951] Kernel panic - not syncing: Fatal exception in interrupt
	  [  719.950952] Pid: 25, comm: bdi-default Tainted: P      D W  2.6.32-5-amd64 #1
	  [  719.950953] Call Trace:
	  [  719.950956]  [<ffffffff812fa67a>] ? panic+0x86/0x143
	  [  719.950959]  [<ffffffff8106861c>] ? down_trylock+0x28/0x2e
	  [  719.950961]  [<ffffffff8104e71e>] ? console_unblank+0x16/0x60
	  [  719.950963]  [<ffffffff812fd3b5>] ? oops_end+0xa7/0xb4
	  [  719.950966]  [<ffffffff810323fc>] ? no_context+0x1e9/0x1f8
	  [  719.950968]  [<ffffffff810325b1>] ? __bad_area_nosemaphore+0x1a6/0x1ca
	  [  719.950970]  [<ffffffff8107747f>] ? is_module_text_address+0x5/0xc
	  [  719.950972]  [<ffffffff810c8e1a>] ? bdi_forker_task+0xf9/0x2bd
	  [  719.950974]  [<ffffffff812fc895>] ? page_fault+0x25/0x30
	  [  719.950977]  [<ffffffff81027ff8>] ? flat_send_IPI_mask+0x0/0x5
	  [  719.950979]  [<ffffffff810c8e60>] ? bdi_forker_task+0x13f/0x2bd
	  [  719.950981]  [<ffffffff8103aa76>] ? __wake_up_common+0x44/0x72
	  [  719.950983]  [<ffffffff810c8d21>] ? bdi_forker_task+0x0/0x2bd
	  [  719.950984]  [<ffffffff81064c4d>] ? kthread+0x79/0x81
	  [  719.950986]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
	  [  719.950987]  [<ffffffff81064bd4>] ? kthread+0x0/0x81
	  [  719.950988]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20


	  =====================================================================
	  using openafs-modules-dkms    1.4.14+dfsg-2~bpo60+1

	  [   70.379011] ------------[ cut here ]------------
	  [   70.379012] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_none/mm/backing-dev.c:492 bdi_forker_task+0xf9/0x2bd()
	  [   70.379014] Hardware name: VirtualBox
	  [   70.379014] Modules linked in: netconsole configfs ext4 jbd2 crc16 snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc psmouse i2c_piix4 parport_pc i2c_core parport joydev button pcspkr evdev serio_raw ac processor ext3 jbd mbcache dm_mod sg sr_mod usbhid hid cdrom sd_mod crc_t10dif ata_generic ahci ohci_hcd ehci_hcd ata_piix usbcore libata nls_base e1000 thermal thermal_sys scsi_mod [last unloaded: openafs]
		[   70.379027] Pid: 25, comm: bdi-default Tainted: P        W  2.6.32-5-amd64 #1
		[   70.379027] Call Trace:
		[   70.379029]  [<ffffffff810c8e1a>] ? bdi_forker_task+0xf9/0x2bd
		[   70.379031]  [<ffffffff810c8e1a>] ? bdi_forker_task+0xf9/0x2bd
		[   70.379032]  [<ffffffff8104df20>] ? warn_slowpath_common+0x77/0xa3
		[   70.379034]  [<ffffffff810c8e1a>] ? bdi_forker_task+0xf9/0x2bd
		[   70.379036]  [<ffffffff8103aa76>] ? __wake_up_common+0x44/0x72
		[   70.379038]  [<ffffffff810c8d21>] ? bdi_forker_task+0x0/0x2bd
		[   70.379039]  [<ffffffff81064c4d>] ? kthread+0x79/0x81
		[   70.379041]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
		[   70.379042]  [<ffffffff81064bd4>] ? kthread+0x0/0x81
		[   70.379043]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
		[   70.379044] ---[ end trace d79eb7cb9e6bff56 ]---
		[   70.379045] bdi ffff88003f34ec00/<NULL> is not registered!
		=========================================================================
		-- System Information:
		Debian Release: 6.0.2
		  APT prefers proposed-updates
		    APT policy: (500, 'proposed-updates'), (500, 'stable')
			Architecture: amd64 (x86_64)

			Kernel: Linux 2.6.32-5-amd64 (SMP w/4 CPU cores)
			Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
			Shell: /bin/sh linked to /bin/dash

			Versions of packages openafs-modules-dkms depends on:
			ii  bison                     1:2.4.1.dfsg-3 A parser generator that is compati
			ii  dkms                      2.1.1.2-5      Dynamic Kernel Module Support Fram
			ii  flex                      2.5.35-10      A fast lexical analyzer generator.
			ii  libc6-dev                 2.11.2-10      Embedded GNU C Library: Developmen

			openafs-modules-dkms recommends no packages.

			openafs-modules-dkms suggests no packages.

#608173#34
Date:
2011-09-09 19:30:53 UTC
From:
To:
Muayad <muayad_y@yahoo.com> writes:

This isn't related to the original problem in this bug; it's a separate issue.

Yes, if you don't use dynroot and don't have a root.afs volume that works,
the client won't start properly.  It's a known (minor) issue.  It may be
fixed in 1.6 at this point.
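
To check quickly whether a client is configured without dynroot, something like the following sketch could be used. It assumes /etc/openafs/afs.conf.client is plain shell-style KEY=value text, as the AFS_DYNROOT=false line in the reproduction steps suggests; the helper name `dynroot_setting` is hypothetical and the parsing is deliberately naive.

```python
def dynroot_setting(conf_text):
    """Return the last AFS_DYNROOT value in shell-style KEY=value text, or None.

    Assumption: one assignment per line, '#' starts a comment, and values
    are at most wrapped in simple quotes. No shell expansion is attempted.
    """
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if line.startswith("AFS_DYNROOT="):
            value = line.split("=", 1)[1].strip().strip("'\"")
    return value


if __name__ == "__main__":
    # Illustrative content only; on a real client, read the text from
    # /etc/openafs/afs.conf.client instead.
    sample = "AFS_CLIENT=true\nAFS_DYNROOT=false\n"
    print(dynroot_setting(sample))
```

A result of "false" means the client depends on being able to mount a working root.afs volume at startup.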