#804629 linux-image-amd64: Cannot mount LVM RAID1 file system at boot

Package:
lvm2
Source:
lvm2
Description:
Linux Logical Volume Manager
Submitter:
Olaf Meeuwissen
Date:
2023-03-03 01:30:03 UTC
Severity:
important
Tags:
#804629#5
Date:
2015-11-09 23:34:59 UTC
From:
To:
Dear maintainer(s),

I recently put one of my file systems on LVM RAID1.  All is fine when I
lvconvert or lvcreate the RAID1 mirror on a running kernel.  The problem
happens when I reboot the system.  This happens with:

 - linux-image-4.2.0-1-amd64  4.2.5-1
 - linux-image-4.1.0-2-amd64  4.1.6-1
 - linux-image-4.1.0-1-amd64  4.1.3-1

Rebooting with a linux-image-4.0.0-2-amd64 (4.0.8-2) kernel works fine.
The LVM RAID1 file system mounts automatically and is fully usable.

During the failing boots I see:

  Setting up LVM Volume Groups...  lvmetad is not active yet, using direct activation during sysinit
    device-mapper: reload ioctl on (253:8) failed: Invalid argument

and a little later:

  fsck.ext4: No such file or directory while trying to open /dev/mapper/helix-srv
  Possibly non-existent device?

Trying to activate the LV after the boot does not work either.

  # lvchange -ay helix/srv
    device-mapper: reload ioctl on (253:8) failed: Invalid argument

In my syslog I see:

  mdX: invalid bitmap file superblock: bad magic
  mdX: bitmap file superblock
           magic: 00000401

and

  mdX: failed to create bitmap (-22)
  device-mapper: table: 253:8 raid: Fail to run raid array

I have found https://bugzilla.kernel.org/show_bug.cgi?id=100491 which
mentions some of the messages I observe.  I do not get the OOPS but it
might be related.

For the time being I'll stick to 4.0.
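
A minimal sketch for checking whether a saved kernel log shows the failure signature quoted above. The function name `has_raid_bitmap_failure` is made up for illustration; it only greps for the two message strings from this report and does not touch LVM.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and succeeds (exit 0)
# only if both messages quoted in this report are present.
has_raid_bitmap_failure() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'invalid bitmap file superblock: bad magic' &&
        printf '%s\n' "$log" | grep -q 'failed to create bitmap'
}
```

Possible usage: `dmesg | has_raid_bitmap_failure && echo "looks like this bug"`.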

#804629#10
Date:
2015-12-14 04:22:11 UTC
From:
To:

This seems to be a duplicate of #795772, which I filed against src:linux months
ago. (I wonder why so little info on this bug is available. Does nobody else
use LVM's RAID1 feature?)

Debian Release: stretch/sid
  990 stable          security.debian.org
  990 stable          ftp.fi.debian.org
  990 stable          dl.google.com
  500 testing         security.debian.org
  500 testing         ftp.fi.debian.org
--- Package information. ---
Depends                       (Version) | Installed
=======================================-+-===========
linux-image-3.14-2-amd64                |


Package's Recommends field is empty.

Package's Suggests field is empty.

#804629#15
Date:
2015-12-20 15:59:51 UTC
From:
To:
I ran into a very similar issue after upgrading to a newer kernel. With the
older kernels I can still boot, but I think that is actually a bug. On the
old kernel, dmesg shows:

[  244.454698] md/raid:mdX: device dm-88 operational as raid disk 3
[  244.454810] md/raid:mdX: device dm-86 operational as raid disk 2
[  244.454914] md/raid:mdX: device dm-84 operational as raid disk 1
[  244.455043] md/raid:mdX: device dm-82 operational as raid disk 0
[  244.455977] md/raid:mdX: allocated 0kB
[  244.456192] md/raid:mdX: raid level 5 active with 4 out of 4 devices, algorithm 2
[  244.456358] RAID conf printout:
[  244.456362]  --- level:5 rd:4 wd:4
[  244.456366]  disk 0, o:1, dev:dm-82
[  244.456369]  disk 1, o:1, dev:dm-84
[  244.456373]  disk 2, o:1, dev:dm-86
[  244.456377]  disk 3, o:1, dev:dm-88
[  244.458568] mdX: invalid bitmap file superblock: bad magic
[  244.458705] created bitmap (2 pages) for device mdX
[  244.476629] mdX: bitmap initialized from disk: read 1 pages, set 0 of 2736 bits

The bitmap code detects that the superblock is invalid, but because of a bug
it simply lets the devices assemble and it "works." However, in newer versions
of the kernel this is "fixed," so if there is an invalid superblock, it
won't allow the devices to assemble and that is why booting (or assembling)
fails. To reproduce on an old kernel (3.16-3-amd64 #1 SMP Debian 3.16.5-1
(2014-10-10) x86_64 GNU/Linux, which I run because it's the only way to get
my system to boot):


lvcreate -i 3 -L4G --type raid5 -n testlv /dev/mylvm
# wait a minute
lvchange -a n mylvm/testlv
lvchange -a y mylvm/testlv

Doing that shows me the error about the invalid superblock. On some level,
everything just works, so is the check necessary on lvm logical volumes
with type raid? I think the check in the kernel to stop it on invalid
superblocks was introduced here:
https://github.com/torvalds/linux/commit/f7357273198adc86fe11c2a7be8a0816f44103bb

#804629#20
Date:
2016-01-04 12:32:22 UTC
From:
To:
But then again, I guess  md should get metadata from some backend provided by
lvm, not from the device directly? (Don't know for sure, didn't dive deep into
the driver yet.)

#804629#25
Date:
2016-01-09 08:40:49 UTC
From:
To:

Actually, after some research and having run dozens of old kernel versions I
came to the conclusion that this LVM/RAID1 shit NEVER WORKED PROPERLY.

It's just unbelievable. This RAID1 feature was added to LVM2 in 2011. dm-raid1
support was added to the kernel in v3.1, around 2011 or 2012 too. I tried
almost all the kernel versions since 3.1 and in most of them all I get at boot
is some shit about "wrong ioctl" and "bad magic" instead of activated volume.
Which volume, mind you, was created just fine in the same kernel version, and
it even works -- until reboot!

SO IN FIVE FUCKING YEARS THIS SHIT NEVER WORKED AND NOBODY GIVE A FUCK ABOUT
IT! So much for all that fucking "bazaar" and shit!

It happened to work only due to the bug introduced somewhere in 4.0, which, by
unfortunate coincidence, was exactly when some poor fuckers like me chose to
convert some of their LVM volumes to RAID1. And it worked somehow, but then -
oops! - the bug was fixed!

Sorry, I'm just frustrated as fuck. I can't believe it. "Linux is reliable",
they said. "All bugs are shallow", they said...

Five years to test. FIVE FUCKING YEARS!

Sure, you can say "Just don't use it, nobody uses it anyway". Fine, but
you know what pisses me most? You have no guarantee that some kernel feature
YOU use isn't also fucked up! Storage? Network security? Cryptography? It
takes just one fucker to "improve" something in it, and you are fucked up for
next several years, and NOBODY would know and care about it!

#804629#30
Date:
2016-01-15 00:33:45 UTC
From:
To:
Just FYI, this bug is still present with

 - linux-image-4.2.0-1-amd64 4.2.6-1
 - linux-image-4.3.0-1-amd64 4.3.3-5

I've read through the thread and will take my system off LVM RAID1.
I have submitted #811033 against lvm2, requesting this to be documented.

Hope this helps,

#804629#45
Date:
2016-03-08 15:22:50 UTC
From:
To:
Dear Maintainer,

I am affected by this bug as well.

Removing the mirroring from the failing LV allows mounting the volume.
Adding the mirroring back works as well until next reboot.

Best regards,
Tomas

#804629#50
Date:
2016-05-20 03:02:13 UTC
From:
To:
So to save everyone else the hour and a half I spent figuring out how to
fix this, here's what you need to do:

# lvconvert --mirrors -1 <lv_device> <pv_device>
# vgchange -ay
# lvconvert -m 1 <lv_device>

Where <lv_device> is the device that's having issues, and <pv_device> is the
SECONDARY device of the RAID1 mirror.

This will throw a warning, but will remove the secondary device.

vgchange should then be able to bring up the (now un-mirrored) volumes.

After vgchange, run lvconvert again to add the mirror back.  This requires
a resync.

The above has to be done after every reboot.

#804629#55
Date:
2016-06-28 18:14:01 UTC
From:
To:
In the case of a configured lvm mirror, you can follow a procedure like
this:

lvconvert -m 0 <LV Path>
lvconvert -m 1 --mirrorlog mirrored --alloc anywhere <LV Path>

For example, in my case it was

lvconvert -m 0 /dev/vg0/lv0
lvconvert -m 1 --mirrorlog mirrored --alloc anywhere /dev/vg0/lv0

#804629#60
Date:
2017-01-24 09:40:48 UTC
From:
To:
Dear maintainer(s) and users affected by this bug,

I was affected by this bug as well. The LV was created with:
lvcreate --type raid1 --name foo -L 50G bar
with kernel 3.16.36-1+deb8u2 (yesterday).

After an upgrade to 3.16.39-1 (today), the volume would not activate:

[   50.848231] md/raid1:mdX: active with 2 out of 2 mirrors
[   50.850482] mdX: invalid bitmap file superblock: bad magic
[   50.851611] mdX: bitmap file superblock:
[   50.851613]          magic: 55555555
[   50.851615]        version: 1431655765
[   50.851616]           uuid: 55555555.55555555.55555555.55555555
[   50.851618]         events: 134
[   50.851619] events cleared: 134
[   50.851620]          state: 00000000
[   50.851621]      chunksize: 524288 B
[   50.851622]   daemon sleep: 5s
[   50.851623]      sync size: 52428800 KB
[   50.851624] max write behind: 0
[   50.851627] mdX: failed to create bitmap (-22)
[   50.852809] device-mapper: table: 253:26: raid: Fail to run raid array
[   50.853940] device-mapper: ioctl: error adding target to table


There was a second LV (also --type raid1, but created earlier) that
had a similar issue, but I was able to activate it partially by using
lvchange -a y --activationmode partial bar/otherfoo

The other LV could not be activated even in degraded mode, so I
decided to rsync all files from one of the rimages and create a new one...

HTH!

regards

sebastian
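
The dumps in this thread show magic values such as 55555555, 00000000 and 00000401, whereas the kernel's md bitmap code defines BITMAP_MAGIC as 0x6d746962. A small sketch, with made-up helper names, for pulling the magic value out of a saved log and comparing it against that constant:

```shell
#!/bin/sh
# Expected md bitmap superblock magic (BITMAP_MAGIC in the kernel md bitmap code).
EXPECTED_BITMAP_MAGIC=6d746962

# Hypothetical helper: print every "magic: XXXXXXXX" value found on stdin.
# Lines that only say "bad magic" (no colon, no value) are ignored.
bitmap_magic_values() {
    sed -n 's/.*magic: *\([0-9a-fA-F]\{8\}\).*/\1/p'
}

# Hypothetical helper: succeed only if $1 equals the expected magic.
bitmap_magic_ok() {
    [ "$1" = "$EXPECTED_BITMAP_MAGIC" ]
}
```

Possible usage: `dmesg | bitmap_magic_values` lists the values the kernel read from disk; any value for which `bitmap_magic_ok` fails would trigger the "bad magic" message.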

#804629#65
Date:
2017-03-21 23:34:47 UTC
From:
To:
I was experiencing the same issue as described in this bug report and, as
pointed out by Harrison Metzger, the bug was probably introduced after
kernel 3.16-3-amd64, because it was definitely working up to that version.

I don't understand what causes the problem. It would be interesting to test
whether the problem affects only mirrors of type raid1 or also those of
type "mirror" (legacy). I also don't know whether there is a problem
while shutting down (does it make a difference whether the OS simply
reboots or the mirror is deactivated before rebooting?).

But the funniest thing is that if you activate lvmetad (set the option
"use_lvmetad = 1" in /etc/lvm/lvm.conf), the LV with mirror type raid1
starts to work again after reboot (at least as tested on squeeze,
Linux 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.13-1~bpo8+1 (2017-02-27)
x86_64 GNU/Linux). I don't know why, nor does it seem to be documented.
I hope this will save you some headaches as well.

Cheers,

risca

#804629#70
Date:
2017-03-24 10:28:01 UTC
From:
To:
Hello,

On 22.03.2017 at 00:34, SW Riccardo Scartozzi wrote:
kernel issue and forgot to report back here :-/

The issue is fixed in recent kernels. I checked back to 4.4 or 4.5 and
it was working there, so it has been fixed for some time already.

The legacy mirror is not affected, only raid1. The steps to reproduce are:
lvcreate -L 100M --type raid1 -m 1 -n test-raid
lvchange -an test-raid
lvchange -ay test-raid

and it either works or not. On 3.16.39-1+deb8u2 I'm getting:
  device-mapper: reload ioctl on  failed: Invalid argument
and in dmesg:
#v+
[1000636.943873] mdX: invalid bitmap file superblock: bad magic
[1000636.943879] mdX: bitmap file superblock:
[1000636.943882]          magic: 00000000
[1000636.943884]        version: 0
[1000636.943887]           uuid: 00000000.00000000.00000000.00000000
[1000636.943888]         events: 19
[1000636.943890] events cleared: 2
[1000636.943892]          state: 00000000
[1000636.943893]      chunksize: 524288 B
[1000636.943895]   daemon sleep: 5s
[1000636.943897]      sync size: 102400 KB
[1000636.943898] max write behind: 0
[1000636.943902] mdX: failed to create bitmap (-22)
[1000636.944003] device-mapper: table: 253:48: raid: Fail to run raid array
[1000636.944006] device-mapper: ioctl: error adding target to table
#v-

On recent kernels it just works.

I wasn't changing any lvmetad settings at all. However, if you create a
volume on a working kernel, like 4.9, it works fine on 3.16 as well.

I believe the bug can be closed now.
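
The reproduction steps in this message omit the volume group name. The sketch below wraps them in a function that takes the VG as an argument (an assumption, not part of the original steps) and only prints the commands unless RUN=1 is set, so it can be reviewed before anything is executed. The function name `repro_raid1` is made up.

```shell
#!/bin/sh
# Hypothetical wrapper around the three reproduction steps.
# $1 is the volume group name (an assumption; the steps above omit it).
# Commands are only printed unless RUN=1 is set in the environment.
repro_raid1() {
    vg=$1
    for cmd in \
        "lvcreate -L 100M --type raid1 -m 1 -n test-raid $vg" \
        "lvchange -an $vg/test-raid" \
        "lvchange -ay $vg/test-raid"
    do
        echo "$cmd"
        if [ "${RUN:-0}" = 1 ]; then
            # Word splitting is intentional: each string is one command line.
            $cmd || return 1
        fi
    done
}
```

Possible usage: `repro_raid1 vg0` to print the commands, or `RUN=1 repro_raid1 vg0` as root on a scratch volume group to run them. On an affected kernel the last step is the one expected to fail.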

#804629#75
Date:
2017-03-24 10:45:04 UTC
From:
To:
Sorry but as said in my previous post I'm still experiencing the issue
even with the kernel version 4.9 from backports.
Sorry, but the bug is still present!

As Jarek Kamiński himself said:
I experience the issue every time on system reboot (but I will make a
go).


Here is what I get in the logs (sorry for not posting earlier):

[    0.000000] Linux version 4.9.0-0.bpo.2-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.9.13-1~bpo8+1 (2017-02-27)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-0.bpo.2-amd64 root=/dev/md1 ro rootdelay=10
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-0.bpo.2-amd64 root=/dev/md1 ro rootdelay=10
[    0.600171] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[    0.976001] systemd-udevd[90]: starting version 215
[    0.976401] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    1.004407] usb usb1: Manufacturer: Linux 4.9.0-0.bpo.2-amd64 xhci-hcd
[    1.007544] usb usb2: Manufacturer: Linux 4.9.0-0.bpo.2-amd64 xhci-hcd
[    1.293276] usb usb3: Manufacturer: Linux 4.9.0-0.bpo.2-amd64 ehci_hcd
[    1.313268] usb usb4: Manufacturer: Linux 4.9.0-0.bpo.2-amd64 ehci_hcd
[   14.155760] md: raid1 personality registered for level 1
[   14.157449] md: raid0 personality registered for level 0
[   14.573040] md: raid6 personality registered for level 6
[   14.573101] md: raid5 personality registered for level 5
[   14.573160] md: raid4 personality registered for level 4
[   14.611467] md: md1 stopped.
[   14.611779] md: bind<sdb1>
[   14.611910] md: bind<sda1>
[   14.612648] md/raid1:md1: active with 2 out of 2 mirrors
[   14.612730] md1: detected capacity change from 0 to 20970405888
[   14.895923] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: (null)
[   15.936768] systemd-udevd[359]: starting version 215
[   17.755542] EXT4-fs (md1): re-mounted. Opts: errors=remount-ro
[   19.971243] md/raid1:mdX: not clean -- starting background reconstruction
[   19.972180] md/raid1:mdX: active with 2 out of 2 mirrors
[   19.977646] mdX: invalid bitmap file superblock: bad magic
[   19.978540] mdX: bitmap file superblock:
[   19.978547] mdX: failed to create bitmap (-22)
System info:
$ uname -a
Linux SWhost 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.13-1~bpo8+1 (2017-02-27) x86_64 GNU/Linux
$ cat /etc/debian_version
8.7


Thank you,

risca

#804629#80
Date:
2017-03-24 11:02:21 UTC
From:
To:
On 24.03.2017 at 11:45, SW Riccardo Scartozzi wrote:

Have you tried creating the volume on 4.9, or just activating a
previously-created volume?

#804629#85
Date:
2017-03-27 10:29:34 UTC
From:
To:
I've just had the chance to try again. Sorry, but I had to try on a
different server.
The logical volume with raid1 has to be created on kernel 4.9; if
the raid is created on a different kernel version (as in my first case),
it cannot be activated even after switching to kernel 4.9!!!

I can then also confirm that the activation of lvmetad is totally
irrelevant.

So I can confirm what Jarek Kamiński said: the bug has been fixed since 4.9.

Thank you again,

risca

#804629#90
Date:
2019-11-13 21:45:13 UTC
From:
To:
I stumbled across this issue yet again in a fresh buster installation.

Various conversions did not solve it for me, but after reading [1], I
checked and found that my initramfs indeed did not contain raid1.ko.

Adding raid1 to /etc/initramfs-tools/modules made my system boot.

This suggests /usr/share/initramfs-tools/hooks/lvm2 might be at fault here.

I have to check whether this applies to bullseye/sid as well, but I
guess it does.

Cheers,
sur5r

[1] https://unix.stackexchange.com/questions/187236/grub2-lvm2-raid1-boot
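
A minimal sketch for checking whether an initramfs contains the raid1 module, as described in this message. The helper name `listing_has_raid1` is made up; it works on a plain file listing so it can be fed from `lsinitramfs` (shipped with initramfs-tools). The initrd path in the usage line is the usual Debian location and is an assumption about the system at hand.

```shell
#!/bin/sh
# Hypothetical helper: reads an initramfs file listing on stdin
# (for example the output of lsinitramfs) and succeeds if raid1.ko is listed.
# The pattern also accepts compressed modules such as raid1.ko.xz,
# and does not match raid10.ko or dm-raid.ko.
listing_has_raid1() {
    grep -Eq '/raid1\.ko(\.[a-z0-9]+)?$'
}
```

Possible usage: `lsinitramfs "/boot/initrd.img-$(uname -r)" | listing_has_raid1 || echo "raid1.ko missing from initramfs"`.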

#804629#95
Date:
2023-03-03 01:20:21 UTC
From:
To:
On Wed, 13 Nov 2019 21:45:13 +0000 Jakob Haufe <sur5r@debian.org> wrote:
 > [...]
 > Adding raid1 to /etc/initramfs-tools/modules made my system boot.
 >
 > This suggests /usr/share/initramfs-tools/hooks/lvm2 might be at fault
 > here.
 >
 > I have to check whether this applies to bullseye/sid as well, but I
 > guess it does.


I can confirm that this is still an issue in bullseye. The workaround is
still the same: adding raid1 to /etc/initramfs-tools/modules and rebuilding
the initramfs solves the issue.
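
A sketch of the workaround as a small idempotent helper, assuming the modules file is /etc/initramfs-tools/modules (the path initramfs-tools reads on Debian) and that the initramfs is rebuilt with `update-initramfs -u`. The function name `ensure_raid1_listed` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append "raid1" to an initramfs-tools modules file
# unless a line naming the module is already there.
# $1 is the path to the modules file.
ensure_raid1_listed() {
    file=$1
    if ! grep -Eq '^[[:space:]]*raid1([[:space:]]|$)' "$file" 2>/dev/null; then
        echo raid1 >> "$file"
    fi
}
```

Possible usage, as root: `ensure_raid1_listed /etc/initramfs-tools/modules && update-initramfs -u`. Running it twice leaves a single raid1 entry in the file.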