#674033 ganeti2: Some disk operations broken in squeeze

#674033#5
Date:
2012-05-22 16:21:43 UTC
From:
To:
I ran into this a while ago but forgot to follow up by reporting a bug after
fixing all the damage.

I had upgraded to ganeti2 and then tried converting some plain disk
types to drbd, and everything broke:

2011-12-30 10:39:37,045: ganeti-masterd pid=2140/JobQueue22 INFO Op 1/1: Starting opcode CLUSTER_REPAIR_DISK_SIZES
2011-12-30 10:39:37,104: ganeti-masterd pid=2140/ClientReq13 INFO Received job poll request for 4053
2011-12-30 10:39:37,241: ganeti-masterd pid=2140/JobQueue22 INFO Disk 0 of instance FOO has mismatched size, correcting: recorded 1024, actual 0
2011-12-30 10:39:37,270: ganeti-masterd pid=2140/ClientReq13 INFO Received job poll request for 4053
2011-12-30 10:39:37,332: ganeti-masterd pid=2140/JobQueue22 WARNING Disk 1 of instance FOO did not return valid size information, ignoring
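
To check whether a cluster has hit the same symptoms, the master daemon log can be scanned for the two messages quoted above. This is a minimal sketch: the regular expressions are derived only from the two log lines in this report, and the function name `suspicious_lines` is made up for illustration. Other Ganeti versions may word these messages differently.

```python
import re

# Patterns built from the two log lines quoted in this report (an assumption:
# the wording may differ between Ganeti versions).
MISMATCH = re.compile(
    r"Disk (?P<disk>\d+) of instance (?P<inst>\S+) has mismatched size, "
    r"correcting: recorded (?P<recorded>\d+), actual (?P<actual>\d+)"
)
INVALID = re.compile(
    r"Disk (?P<disk>\d+) of instance (?P<inst>\S+) "
    r"did not return valid size information"
)


def suspicious_lines(lines):
    """Yield (instance, disk, reason) for lines matching the symptoms above."""
    for line in lines:
        m = MISMATCH.search(line)
        # A recorded size being "corrected" to 0 is the suspicious case.
        if m and int(m.group("actual")) == 0:
            yield (m.group("inst"), int(m.group("disk")), "size corrected to 0")
            continue
        m = INVALID.search(line)
        if m:
            yield (m.group("inst"), int(m.group("disk")), "no valid size")
```

It can be fed any iterable of lines, for example an open log file; the location of the master daemon log depends on the installation.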

Then when the disk conversion failed things like this started happening:

2011-12-30 10:40:11,046: ganeti-masterd pid=2140/JobQueue19 WARNING Could not prepare block device sda on node BAR (is_primary=False, pass=1): Error while assembling disk: Can't activate lv /dev/xenvg/0c44e017-a28f-4e56-a6bb-990ce83c2116.sda:   One or more specified logical volume(s) not found.

The fix is already in the upstream version; backporting it onto the
stable package makes everything happy:

http://git.ganeti.org/?p=ganeti.git;a=commitdiff;h=e50d88078e1dbfe3d78aa174b760aa6142f54b6c

    commit e50d88078e1dbfe3d78aa174b760aa6142f54b6c
    Author: Iustin Pop <iustin@google.com>
    Date:   Tue Feb 15 14:39:44 2011 +0100

        Fix LUClusterRepairDiskSizes and rpc result usage

        This LU was introduced before the RPC result conversion from .data to
        .payload, and it has managed to keep the old-style usage (how? it's
        the only LU that does so). Fix by changing to payload, and add some
        extra logging for easier diagnose.

        Signed-off-by: Iustin Pop <iustin@google.com>
        Reviewed-by: Stephen Shirley <diamond@google.com>
        Reviewed-by: Michael Hanselmann <hansmi@google.com>
        (cherry picked from commit 043beb38f4e10b75d0820c361c668c441c7a6980)
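
The commit message describes a caller that kept reading the old `.data` attribute after RPC results had moved to `.payload`. The following is only an illustrative sketch of that class of bug, not Ganeti's code: `FakeRpcResult`, `read_sizes_old_style` and `read_sizes_from_payload` are hypothetical names, and the way the real old-style access failed in 2.1.x is not shown in this report.

```python
class FakeRpcResult:
    """Hypothetical stand-in for an RPC result; NOT Ganeti's actual class."""

    def __init__(self, payload, fail_msg=None):
        self.payload = payload    # new-style attribute carrying the data
        self.fail_msg = fail_msg  # set when the remote call failed


def read_sizes_old_style(result):
    # Old-style access: the attribute is gone after the conversion, so the
    # caller never sees the sizes the node actually reported.
    return getattr(result, "data", None)


def read_sizes_from_payload(result, expected_count):
    """Return the list of disk sizes, or None if the result is unusable."""
    if result.fail_msg:
        return None
    sizes = result.payload
    if not isinstance(sizes, list) or len(sizes) != expected_count:
        return None
    return sizes
```

With this stand-in, `read_sizes_old_style(FakeRpcResult([1024]))` yields `None` even though the node answered correctly, while `read_sizes_from_payload(FakeRpcResult([1024]), 1)` returns the reported sizes.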


I only ran into this when I tried converting from plain to drbd, so it's
possible that most people will never have this problem.  2.1.x is also fairly
old at this point, but it is the version in stable.

#674033#10
Date:
2012-05-22 17:53:58 UTC
From:
To:
Hi,

Not sure if this is the cause. The bug fixed by the patch you quote only
breaks repair-disk-sizes, in that it wouldn't work at all; it should not
break disk activation.

We're supporting newer versions in backports; 2.4.5 is the current
squeeze-backports version.

Would that work for you? Alternatively, I could try to prepare a bugfix
for stable, though I'm not sure whether it could go in.

Thanks for the report!

iustin

#674033#15
Date:
2012-05-22 18:15:19 UTC
From:
To:
What had happened was the plain->drbd conversion failed, leaving the
logical volume half converted with the wrong name.

I just noticed that too. I might upgrade if I have further problems,
but other than this initial issue it's been rock solid.

I manually pushed out fixed files on my current cluster, and if I set up
a new one I will go with 2.4.5.  I'll be fine as long as an updated
package isn't released without the fix.  Fixed packages may benefit
other users though :-)

Thanks for Ganeti :-)