Fabre

#776214 multipath not automounting iscsi devices listed in fstab #776214

Package:: multipath-tools

Source:: multipath-tools

Description:: maintain multipath block device access

Submitter:: Christian Seiler

Date:: 2015-01-26 14:39:04 UTC

Severity:: important

#776214#5

Date:: 2015-01-19 19:32:11 UTC

From:

To:

Dear Maintainer,

tl;dr: systemd + open-iscsi = 90s hang at boot in some cases,
       and umountiscsi.sh is not called on shutdown. Attached a
       debdiff that fixes that without being too invasive.

Longer explanation: if you have the following configuration:

 - Jessie
 - systemd as init
 - open-iscsi configured to automatically log in to some iSCSI target,
   iSCSI disk /dev/sdb is then available
 - /etc/fstab containing an entry like
     /dev/sdb1           /data ext4 rw,_netdev 0 0
   or (when using LVM)
     /dev/vg_.../lv_...  /data ext4 rw,_netdev 0 0

the system boot will hang for 90s because of systemd's default timeout
when devices are not available.

The reason behind this is that open-iscsi contains the following LSB
headers:
      Required-Start:    $network $remote_fs
      Required-Stop:     $network $remote_fs sendsigs
Here, $network maps to network-online.target in systemd, that's fine,
but $remote_fs maps to remote-fs.target in systemd, that is the problem.
This is because

 a) systemd treats file systems that couldn't be mounted as hard
    failures.
and
 b) systemd's logic of mounting all remote filesystems is to mount
    all filesystems in /etc/fstab that are marked _netdev (and not
    marked noauto)

Therefore, systemd waits for the iSCSI device to appear for 90s before
timing out and proceeding with boot. Only then remote-fs.target is
reached and systemd starts the open-iscsi init script.
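
As a rough illustration of point b), the following Python sketch picks out
the /etc/fstab entries that fall under the rule described above (marked
_netdev, not marked noauto). The function name and the sample entries are
made up for this example, and real systemd also looks at other criteria
(such as network file system types), so this is a simplification and not
systemd's actual logic.

```python
def remote_mounts(fstab_text):
    """Return mount points of entries marked _netdev and not noauto."""
    selected = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, options = fields[1], fields[3].split(",")
        if "_netdev" in options and "noauto" not in options:
            selected.append(mount_point)
    return selected


# Hypothetical entries, modelled on the example in the report.
SAMPLE_FSTAB = """\
/dev/sda1  /        ext4  errors=remount-ro  0 1
/dev/sdb1  /data    ext4  rw,_netdev         0 0
/dev/sdc1  /backup  ext4  rw,_netdev,noauto  0 0
"""

print(remote_mounts(SAMPLE_FSTAB))  # ['/data']
```

Every mount point in that list is one that remote-fs.target ends up waiting
for, whether or not the underlying device can appear yet.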

That in turn will then make the devices appear. The init script will
then call a "mount -a -O _netdev" and "swapon -a -e" in its start()
routine, which will then cause the mount points to be activated.

So in the end, the boot is kind-of successful in the sense that
everything kind of works at the end of boot, with the following two caveats:

 - there is this needless 90s delay (or whatever other delay the admin
   has configured) in waiting on the iSCSI targets

 - if I want to use systemd's features to order a specific
   service after remote-fs.target to make sure that the remote file
   systems I have are mounted, maybe because the service needs the
   data on them, then this won't work consistently, because the
   file systems will only be mounted after open-iscsi is started,
   which will then be in parallel to any services I have ordered
   after remote-fs.target, for example:
      - exporting a subdirectory of an iSCSI filesystem via NFS; if
        nfs-kernel-server gets started too early, this might fail
        because the directory that is exported doesn't exist

If I modify the init script to remove $remote_fs from its LSB headers,
then booting works as expected. However, this causes two problems:

 1. I assume that $remote_fs is in there because you want to support
    NFS-based separate /usr. Removing $remote_fs from LSB headers
    would break such a configuration under sysvinit, since the
    open-iscsi tools wouldn't be able to be called.

    However, systemd in Debian currently doesn't really support a
    separate /usr that's not mounted from initrd anyway.

 2. Shutting down is racy.

Shutting down is racy because you then have the following constellation:

 - systemd tracks services' states. And while bug #732793 does not occur
   anymore because invoke-rc.d strips the .sh from umountiscsi.sh, the
   call to umountiscsi.sh stop doesn't really do anything, because
   systemd already thinks it is stopped, since it was never started.

 - OTOH, systemd will tear down remote filesystems on its own. But
   because open-iscsi is only ordered after network-online.target then,
   tearing down the remote filesystems will be done in parallel (!) to
   stopping open-iscsi.

   This has the unfortunate effect that it could be the case that the
   umount call to the filesystem is made after open-iscsi has been
   stopped. This will then cause the kernel to hang trying to umount
   the filesystem.

   I haven't been able to reproduce this race yet, i.e. I have gotten
   lucky so far, in that umount was typically faster on my system than
   stopping open-iscsi - BUT I am really not comfortable with having
   such a flimsy race in place, especially since umount will sync
   stuff to the filesystem and stopping open-iscsi too early could
   easily cause severe data loss.
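
To make the race a bit more concrete: two units only have a defined stop
order if one of them is (transitively) ordered after the other. The Python
sketch below checks that for a tiny hand-written ordering graph. The unit
name data.mount and the edges are simplified assumptions for illustration,
not a dump of what systemd really generates.

```python
def is_ordered_after(after_edges, unit, other):
    """True if `unit` is (transitively) ordered after `other`."""
    seen = set()
    stack = [unit]
    while stack:
        current = stack.pop()
        for dep in after_edges.get(current, ()):
            if dep == other:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


def stop_order_defined(after_edges, a, b):
    """Shutdown reverses start order, so it needs an ordering path."""
    return (is_ordered_after(after_edges, a, b)
            or is_ordered_after(after_edges, b, a))


# $remote_fs removed from the LSB headers: no path between the two units.
racy = {
    "open-iscsi.service": ["network-online.target"],
    "data.mount": ["remote-fs-pre.target", "network-online.target"],
}
# open-iscsi ordered before remote-fs-pre.target.
fixed = {
    "open-iscsi.service": ["network-online.target"],
    "remote-fs-pre.target": ["open-iscsi.service"],
    "data.mount": ["remote-fs-pre.target"],
}

print(stop_order_defined(racy, "data.mount", "open-iscsi.service"))   # False
print(stop_order_defined(fixed, "data.mount", "open-iscsi.service"))  # True
```

In the first graph the umount and the iSCSI logout may run in either order,
which is exactly the situation described above.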

So far for my analysis. How do we proceed from here?

 - it is quite clear that you probably don't want to change the sysvinit
   logic now, especially so late in the Jessie freeze

 - however, this bug w.r.t. systemd should definitely be fixed in my
   eyes

Therefore, I suggest that you provide a unit file specifically for
systemd. In order to be as minimally invasive as possible (especially this
late in the freeze), the unit file should ideally call the original init
script.

After Jessie one should consider redoing the entire logic for
systemd-based systems, there are a lot more features of systemd that
one can leverage to make things work better. But to fix this immediate
bug, the changes I mentioned are sufficient.

I have created a debdiff for a test package that changes the following:

 - add systemd unit that just calls the init script but has adjusted
   dependencies:
     - no more After=remote-fs.target
     - new Before=remote-fs-pre.target
 - add dh-systemd as build-dep and use dh_systemd in debian/rules
 - move #DEBHELPER# around in postinst, to make sure package upgrades
   don't break the system (dh_systemd_enable code has to come before
   the unit is first started, otherwise weird things occur)
 - do the equivalent of umountiscsi.sh start so that systemd will
   track that service as 'running' - then at shutdown the open-iscsi
   init script will be able to call the stop action of that script

I have now tested this under systemd with Jessie, in two different
configurations of Jessie running systemd:

 1. root on normal device, separate iSCSI devices mounted
 2. root on iSCSI, boot via PXE

In both cases, iSCSI now seems to work as expected. There are a couple
of caveats though:

 - as discussed before, non-initrd-mounted separate /usr on NFS
   won't work together with this constellation
       - unlikely to work well with systemd anyway, regardless of
         iSCSI, and I don't think this is something that could be
         fixed without a major redesign of the remote-fs*.target
         logic across the board

 - irrespective of systemd, while looking at it I noticed that
   umountiscsi.sh's logic is incomplete, it doesn't try to umount
   filesystems on LVM on top of iSCSI, unless they were marked with
   _netdev (it only detects direct devices).

   OTOH, this has been the case since at least Squeeze, so it can't
   be that critical.

 - the current design of using umountiscsi.sh doesn't integrate well
   with systemd's dependency logic. I don't think this is a huge issue,
   as far as I can see, stuff works as well under systemd with my patch
   as under sysvinit (except for the /usr-NFS thing), but I do think
   that you could make the whole thing a lot more robust if this is
   redesigned a bit - but I don't think that is something that should
   go to Jessie.

#776214#10

Date:: 2015-01-20 09:58:58 UTC

From:

To:

Hello Christian,

Actually, from what I know so far, systemd aggressively backgrounds any
process that is taking time. And only processes that depend on it are
put on hold, again in the background.

I think you may be missing something here. I believe devices marked
_netdev are always backgrounded. At least in sysvinit. And not having
them do so in systemd is highly unlikely.

Have you had any luck root-causing why there is the 90 sec delay?

I am willing to accept a systemd unit. But it is too late for Jessie
right now. If you have the unit ready and tested, for now, we can put it
into experimental.

I would not want to ship something for Jessie now. Ideally, systemd's
logic on handling init scripts should take care of it. It has worked for
other sysvinit scripts so far.

And introducing the systemd unit now in Jessie is late. Because it
wouldn't have had enough test cycles.

Can you please elaborate more here ? Or perhaps just file a separate bug
report. The current init scripts are designed to support LVM + iSCSI.

I agree. We need to switch to systemd. But I haven't had the time to do
it, and right now, your patch is too late. :-(


This is one reason why I keep telling most Debian (Enterprise) users to
at least keep track of testing. Because they usually end up reporting
bugs too late in the cycle.

#776214#15

Date:: 2015-01-20 15:37:44 UTC

From:

To:

Hello Ritesh,

Well, yes, in principle, but the way dependencies are expressed (both by
default and in the current Debian packaging of systemd), you can still
have serialization of things. See below.

First, if you look at sysvinit with LSB dependency-based boot (Squeeze,
Wheezy, Jessie w/ sysvinit-core). Debian does use startpar(8) to
parallelize some aspects of sysvinit boot, but there are a couple of
synchronization points. They are defined in /etc/insserv.conf and the
relevant ones are:

  $local_fs
  $remote_fs

If you look at the configuration, you will see that $remote_fs is
$local_fs and the mountnfs init script.

Also, there's the fact that all rcS scripts will have completed before any
rc[2-5] scripts are run (the way inittab + rc are set up), so that's an
additional synchronization point.

So if you have an init script with Required-Start: $local_fs, it will be
ordered after all scripts (primarily mountall) that appear for $local_fs
in /etc/insserv.conf, but (according to insserv logic) as early as
otherwise possible.

Same with Required-Start: $remote_fs: it will be ordered after $local_fs
(i.e. after mountall) and also after mountnfs.

So you have the following boot ordering

  1. anything in rcS that doesn't require $local_fs
  2. $local_fs stuff (i.e. mainly mountall)
  3. anything else in rcS that doesn't require $remote_fs
  4. $remote_fs stuff (i.e. mainly mountnfs)
  5. anything else in rcS
  6. anything in rc[2-5]

So if you have Required-Start: $remote_fs in the open-iscsi init script,
you have the following situation:

  - early boot services (1) are started
  - local file systems are mounted (2)
  - some other services started (3)
  - tries to mount remote file systems (4)
       /etc/init.d/mountnfs calls /etc/network/if-up.d/mountnfs
        (or waits until networking has called that dynamically once
         the network is up, depending on your configuration)
       /etc/network/if-up.d/mountnfs effectively does
            mount -a -O _netdev
       At this point, open-iscsi is NOT started. So mount will fail for
       all mount points on iSCSI devices. However, since mountnfs
       doesn't check the exit code of the mount command, it will happily
       continue on and pretend everything is fine.
  - services ordered after $remote_fs are started, including open-iscsi
       open-iscsi calls mount -a -O _netdev itself, which will try to
       mount the remaining filesystems again, then succeeding

So nothing is really 'backgrounded', you are just relying on the fact
that mountnfs doesn't really check any exit codes (and that sysvinit
doesn't care if init scripts that your init script depends on were
successful), you just tape over that fact by running mount again.
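
The sequence above can be imitated with a small Python sketch. It is a toy
model under stated assumptions: mount_all_netdev is a hypothetical stand-in
for "mount -a -O _netdev", and the single fstab entry is the example from
the original report.

```python
def mount_all_netdev(fstab, available_devices, mounted):
    """Toy stand-in for `mount -a -O _netdev`; False if any mount failed."""
    ok = True
    for device, mount_point in fstab:
        if mount_point in mounted:
            continue  # already mounted, nothing to do
        if device in available_devices:
            mounted.add(mount_point)
        else:
            ok = False  # device not there yet
    return ok


fstab = [("/dev/sdb1", "/data")]
devices = set()
mounted = set()

# Step 4: mountnfs runs before open-iscsi; nobody looks at the result.
first_try = mount_all_netdev(fstab, devices, mounted)

# open-iscsi starts: the login makes the device appear, then it mounts again.
devices.add("/dev/sdb1")
second_try = mount_all_netdev(fstab, devices, mounted)

print(first_try, second_try, sorted(mounted))  # False True ['/data']
```

The first call fails silently, and it is only the second call from the
open-iscsi init script that makes things look fine at the end of boot.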

This in turn means that with sysvinit you have kind of exempted
$remote_fs from being the true synchronization point. This doesn't
really matter that much for sysvinit, because there's a different
synchronization point directly after that (end of rcS execution, start of
rc[2-5] execution), but for systemd that's a different story (see
below). (But note that this COULD break for an early boot service
ordered after $remote_fs that needs the filesystems, it's just that
Jessie by default doesn't ship one.)


Now let's take systemd. systemd has so-called 'targets' which are also
used as synchronization points at boot. The two sysvinit sync points are
mapped as follows:

  $local_fs    -> local-fs.target
  $remote_fs   -> remote-fs.target
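
For illustration, this mapping can be applied to the LSB headers quoted in
the original report with a few lines of Python. This is only a sketch: the
function name is made up, the table contains just the three facilities
mentioned in this thread, and the real translation is done by systemd's
generator, not by code like this.

```python
import re

# Facility -> systemd target, as described in this bug report.
FACILITY_TO_TARGET = {
    "$local_fs": "local-fs.target",
    "$remote_fs": "remote-fs.target",
    "$network": "network-online.target",
}


def required_start_targets(script_text):
    """Map the facilities on the Required-Start line to systemd targets."""
    for line in script_text.splitlines():
        match = re.match(r"#\s*Required-Start:\s*(.*)", line)
        if match:
            names = match.group(1).split()
            return [FACILITY_TO_TARGET[n] for n in names
                    if n in FACILITY_TO_TARGET]
    return []


# Only the two header lines quoted in the report; the rest is omitted.
SAMPLE_HEADER = """\
### BEGIN INIT INFO
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs sendsigs
### END INIT INFO
"""

print(required_start_targets(SAMPLE_HEADER))
# ['network-online.target', 'remote-fs.target']
```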

Additionally, systemd knows a couple of more sync points, namely

  local-fs-pre.target
  remote-fs-pre.target

However, systemd doesn't really have a sync point for early-boot vs.
runlevel services.

The boot sequence with systemd is then as follows (only depicting a part
of it):

        early boot services (e.g. udev)
        ordered before local-fs-pre.target
                   |
                   v
          local-fs-pre.target
                   |
                   v
         mount local file systems
                   |
                   v
             local-fs.target
                   |
                   v
        early boot services ordered after local-fs.target
        but before remote-fs-pre.target
                   |
                   v
           remote-fs-pre.target
                   |
                   v
        mount remote file systems
                   |
                   v
            remote-fs.target
                   |
                   v
               the rest

Within each block, everything is of course parallel (barring other
ordering constraints, of course) - even the filesystems are mounted in
parallel.

And obviously, if something doesn't order against any targets shown
here, they will be started immediately (before or in parallel to
local-fs.target) and the targets in the middle won't wait for their
completion.
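
The ordering in the diagram can be reproduced with a topological sort. The
Python sketch below (standard library graphlib, Python 3.9 or later) uses a
deliberately tiny graph: data.mount is a hypothetical _netdev mount, the
edges follow the diagram above, and the last edge is the one that comes
from Required-Start: $remote_fs in the open-iscsi init script.

```python
from graphlib import TopologicalSorter

# unit -> set of units it is ordered after (simplified assumption)
after = {
    "local-fs.target": {"local-fs-pre.target"},
    "remote-fs-pre.target": {"local-fs.target"},
    "data.mount": {"remote-fs-pre.target"},
    "remote-fs.target": {"data.mount"},
    "open-iscsi.service": {"remote-fs.target"},
}

order = list(TopologicalSorter(after).static_order())
for unit in order:
    print(unit)
```

In this model open-iscsi.service comes out last, i.e. after the mount that
needs the device it is supposed to provide.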

On shutdown, the whole thing is done in reverse, with one important
caveat: systemd tracks the state of the system, so it looks at the
dependencies of stuff that's running, so if you start a service manually
without having it enabled at boot, its dependencies will still work
properly. (sysvinit/LSB tries to do that partially by always creating
stop links, even if the service is not enabled.)




Now you have two problems in this setup:

   - same thing as with sysvinit: open-iscsi is ordered after
     remote-fs.target, so it won't get started until remote-fs.target is
     reached

   - however, the crucial difference here is that systemd cares whether
     stuff has actually worked or not. It doesn't just call
     mount -a -O _netdev and hopes for the best, it tries to wait for
     the required devices to appear (because they might not appear
     synchronously)

        -> unfortunately, since open-iscsi won't start before
           remote-fs.target, those devices will never appear while
           systemd is waiting for them

        -> systemd has a default timeout of 90s for devices showing up
           so it will wait for 90s for these devices to show up and then
           fail

        -> only then will systemd consider remote-fs.target reached
           (btw. local-fs.target has a setting
           OnFailure=emergency.target, so that when it can't mount a
           local file system, the boot doesn't even continue, see
           Debian bug #743265 for a discussion on this; fortunately
           remote-fs.target doesn't have this setting, so boot does
           continue in this case)

        -> only then will systemd start open-iscsi

        -> that will then mount the filesystems again
           (which is actually unnecessary with systemd, because as soon
           as the devices appear, it will mount the stuff anyway)

        -> hence the 90s delay for waiting on devices that will only
           show up later

     You can actually try this easily (if you have an iSCSI target lying
     around ;-)): setup a Jessie box, install open-iscsi, configure it
     to automatically log in to your target, put an iSCSI filesystem as
     _netdev into /etc/fstab and reboot - voilà: 90s delay. It's very
     simple to reproduce, and it ALWAYS happens in that constellation.
     With rootfs on iSCSI it should also happen if you log in to
     additional targets. (Otherwise, rootfs on iSCSI is not affected.)

   - on shutdown, things are also messy, since systemd tries to shut down
     stuff much more in parallel than sysvinit does

        - open-iscsi is an early-boot ("runlevel S") service, i.e. with
          sysvinit those always get stopped after all services of the
          current runlevel (e.g. 2) are stopped

        - with systemd, it just cares about explicit dependencies, so
          it will try to stop open-iscsi as early as possible (since
          by default nothing is ordered after it)

        -> this has the consequence that stuff that's using remote
           filesystems might still be running while open-iscsi is
           terminating and it can't unmount them

        -> the open-iscsi service will then (try to) logout of the
           sessions even though stuff is still active.

                -> very, very bad
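
To put numbers on the boot-time half of the problem, here is a toy timeline
in Python. It is only a sketch: the function name is invented, the 90s is
systemd's default device timeout mentioned above, and the 2s login time is
an assumed value, not a measurement.

```python
def boot_timeline(iscsi_before_mounts, login_seconds=2, device_timeout=90):
    """Return (time, event) pairs for the remote file system part of boot."""
    events = []
    t = 0
    if iscsi_before_mounts:
        # Login happens first, so the device exists when the mount is tried.
        t += login_seconds
        events.append((t, "open-iscsi logged in, device present"))
        events.append((t, "mount succeeds, remote-fs.target reached"))
    else:
        # Login is ordered after remote-fs.target, so the wait times out.
        t += device_timeout
        events.append((t, "device wait timed out, remote-fs.target reached"))
        t += login_seconds
        events.append((t, "open-iscsi logged in, init script mounts"))
    return events


print(boot_timeline(False)[-1][0])  # 92
print(boot_timeline(True)[-1][0])   # 2
```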

As I said in the original report, on the test system I've used so far
for Jessie I haven't actually seen this race condition (i.e. shutdown
always worked anyway), since nothing was really using the remote
filesystems on my test box, and it might be the case that it doesn't
always occur, but it will at least sometimes.

I hope this reply can make it a bit clearer as to where the problem lies
and why my diagnosis is correct.

Note that I have spent probably 10-12 hours on this problem, first
trying to figure out what the problem was and then trying to come up
with a solution that changes as little as possible (because of the
freeze) and testing that against a lot of different scenarios:

  - I only noticed that I needed to move #DEBHELPER# around because of
    testing partial upgrades

  - I don't use rootfs in iSCSI myself, so I set up a test system to
    check that nothing broke (which the first version I wanted to send
    did, so I fixed that before reporting this)

  - I rebooted test boxes quite a lot to see if there was any trouble.

systemd's logic of handling it won't take care of it, because it's
already kind-of broken on sysvinit, but a lot of specific details in
sysvinit that systemd doesn't emulate quite that way mitigate that.

The changes required to make systemd support this in the same way as
sysvinit would be far more invasive to the current systemd code base than
fixing a couple of dependencies here.



I'm going to explain how systemd currently handles unit files, because
then it becomes clear why the unit file I have provided is not really
experimental at all.


systemd does not support init scripts directly from PID1 anymore (this
was different in very old versions). systemd's PID1 only understands
systemd unit files. Instead, systemd now has a concept called
'generators', which are small programs (sometimes even scripts) that
are run

  - at boot
  - every time systemd re-reads its configuration

The job of a generator is to read some aspect of the system
configuration (init scripts, /etc/fstab, /etc/crypttab, ...) and
generate native systemd units from that.

If you boot a systemd Jessie system and look in /run/systemd/generator
and /run/systemd/generator.late, you will see the units that were
generated by these generators. Each line in /etc/fstab becomes a .mount
unit, each sysvinit script becomes a .service file.

Of course, the generator responsible for init scripts doesn't magically
convert a sysvinit file completely into a service file (that's not
really possible to do automatically in the general case), but the
service file it generates just contains the necessary metadata.
Additionally, it sets ExecStart=/etc/init.d/$SCRIPT start and
ExecStop=/etc/init.d/$SCRIPT stop in the service file, so that the
original init script is actually called.

For example, if I take /etc/init.d/kbd, the systemd-sysv-generator will
produce the following service file in
/run/systemd/generator.late/kbd.service:
-----------------------------------------------------------
# Automatically generated by systemd-sysv-generator

[Unit]
SourcePath=/etc/init.d/kbd
Description=LSB: Prepare console
DefaultDependencies=no
Before=sysinit.target
After=remote-fs.target

[Service]
Type=forking
Restart=no
TimeoutSec=0
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
SysVStartPriority=18
ExecStart=/etc/init.d/kbd start
ExecStop=/etc/init.d/kbd stop
-----------------------------------------------------------
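
As a rough imitation of what such a generated wrapper looks like in
general, the Python sketch below renders a unit with the same shape as the
kbd example. It is not the real systemd-sysv-generator: the function name
and its parameters are made up, the SysVStartPriority line is left out,
and the fixed [Service] settings are simply copied from the example above.

```python
def sysv_wrapper_unit(script_name, description, before, after):
    """Render a wrapper unit shaped like the generated kbd.service."""
    lines = [
        "[Unit]",
        f"SourcePath=/etc/init.d/{script_name}",
        f"Description=LSB: {description}",
        "DefaultDependencies=no",
        f"Before={' '.join(before)}",
        f"After={' '.join(after)}",
        "",
        "[Service]",
        "Type=forking",
        "Restart=no",
        "TimeoutSec=0",
        "IgnoreSIGPIPE=no",
        "KillMode=process",
        "GuessMainPID=no",
        "RemainAfterExit=yes",
        # The init script itself is still what gets run.
        f"ExecStart=/etc/init.d/{script_name} start",
        f"ExecStop=/etc/init.d/{script_name} stop",
    ]
    return "\n".join(lines) + "\n"


unit = sysv_wrapper_unit("kbd", "Prepare console",
                         before=["sysinit.target"],
                         after=["remote-fs.target"])
print(unit)
```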

So what did I do in order to produce the service file I've attached in
my original report?

  - I took the generated service file for the open-iscsi init script
  - I removed the comment about automatic generation
  - I removed SourcePath (that's mainly for documentation purposes if you
    run systemctl status)
  - I adjusted the After= and Before= dependencies
  - I added an [Install] section to make it possible to enable this unit

Here's a diff for comparison (old is generated, new is my modified
version):

-----------------------------------------------------------
diff -u open-iscsi.service /lib/systemd/system/open-iscsi.service
--- open-iscsi.service  2015-01-18 21:12:16.325286854 +0100
+++ /lib/systemd/system/open-iscsi.service      2015-01-19 19:14:53.000000000 +0100
@@ -1,11 +1,8 @@
-# Automatically generated by systemd-sysv-generator
-
  [Unit]
-SourcePath=/etc/init.d/open-iscsi
-Description=LSB: Starts and stops the iSCSI initiator services and logs in to default targets
+Description=iSCSI initiator
  DefaultDependencies=no
-Before=sysinit.target shutdown.target
-After=network-online.target remote-fs.target
+Before=sysinit.target shutdown.target remote-fs-pre.target
+After=network-online.target
  Wants=network-online.target
  Conflicts=shutdown.target

@@ -20,3 +17,6 @@
  SysVStartPriority=20
  ExecStart=/etc/init.d/open-iscsi start
  ExecStop=/etc/init.d/open-iscsi stop
+
+[Install]
+WantedBy=multi-user.target
-----------------------------------------------------------

So it's not like this is really that untested, it's basically the way
systemd handles sysv scripts but just with modified dependencies, to
make sure the unit is started before remote-fs-pre.target and not after
remote-fs.target.
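
The dependency change in the diff can also be expressed as a small
transformation. The Python sketch below uses configparser on an abbreviated
copy of the generated unit (only lines visible in the diff are included).
The function name is made up, and configparser is merely a convenient
approximation here: it cannot handle keys that appear more than once,
which real unit files are allowed to have.

```python
import configparser
import io

# Abbreviated copy of the generated unit, taken from the diff above.
GENERATED = """\
[Unit]
DefaultDependencies=no
Before=sysinit.target shutdown.target
After=network-online.target remote-fs.target
Wants=network-online.target
Conflicts=shutdown.target

[Service]
ExecStart=/etc/init.d/open-iscsi start
ExecStop=/etc/init.d/open-iscsi stop
"""


def order_before_remote_fs(unit_text):
    """Drop After=remote-fs.target, add Before=remote-fs-pre.target."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of the keys
    parser.read_string(unit_text)
    unit = parser["Unit"]
    after = [u for u in unit["After"].split() if u != "remote-fs.target"]
    before = unit["Before"].split() + ["remote-fs-pre.target"]
    unit["After"] = " ".join(after)
    unit["Before"] = " ".join(before)
    parser["Install"] = {"WantedBy": "multi-user.target"}
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


print(order_before_remote_fs(GENERATED))
```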

I'll file a separate bug report for this. I don't think it's very
critical, especially since it doesn't do anything wrong if everything is in
/etc/fstab (or you manually mounted with -o _netdev).
making it as little invasive as possible. And while open-iscsi is not
completely unusable with systemd, there are enough problems with the way
the current package interacts with systemd due to subtle differences in
the handling of dependencies and failures that I think this should
really be fixed in Jessie.

As I said in the original report:

Regards,
Christian

#776214#20

Date:: 2015-01-21 08:50:36 UTC

From:

To:

Hi again,

Btw, in case it wasn't clear from my first reply here:

systemd actively complains at boot that it's still waiting for some
devices to appear during the 90s, and the devices shown are the devices
specified in /etc/fstab that are on iSCSI.

Regards,
Christian

#776214#25

Date:: 2015-01-21 09:09:33 UTC

From:

To:

Thanks Christian. I'm building a setup to verify the same.

s3nt fr0m a $martph0ne, excuse typ0s

#776214#30

Date:: 2015-01-23 08:35:24 UTC

From:

To:

video just to be sure we are both referring to the same problem.

http://youtu.be/cwcnk00Hwk0


Next, I'll verify your fix. Hopefully by this weekend I'll get it ready.
And then we can ask for an exception from the Release Team.

I am also CCing the systemd maintainers to be sure we are on the right path.
Dear systemd maintainers: Will appreciate your review on this bug
report. As it stands now, it affects Jessie (Not RC).

#776214#35

Date:: 2015-01-23 11:55:54 UTC

From:

To:

Hi Ritesh,

On 2015-01-23 09:35, Ritesh Raj Sarraf wrote:

Yes, that's the same problem, that it waits 90s for the devices to
appear, which they can't, because iscsi login hasn't happened yet.

Thanks!

Christian

#776214#40

Date:: 2015-01-25 09:32:21 UTC

From:

To:

Christian,

The patch does not seem to resolve the problem. Can you please verify
the same ?
http://youtu.be/q4pOQn3C4q0

Ritesh

#776214#45

Date:: 2015-01-25 13:24:12 UTC

From:

To:

Hi,

Sorry for top posting but I'm writing this from a phone.

I can see what you mean, but that doesn't happen to me. The first part of the delay seems fine, as your system appears to take a while to log in to iSCSI (both bare metal against a hardware RAID and VMs against another VM w/ LIO I have here are much faster btw., at most ~2s here), but after the 'reached remote fs (pre)' it should find the devices and not time out waiting for them

Is this on LVM (because of /dev/mapper in the output)? If so, did you configure the VGs in /etc/default/open-iscsi? What does journalctl -xn say after booting?

BTW I can give you root access to a couple of VMs (together with access to libvirt to watch them boot if you have virt-manager installed) that demonstrate the problem and my solution. Just send me an email (privately) with your SSH pubkey signed with your GPG key in Debian's keyring.

Thanks a lot for taking the time to investigate this so thoroughly!

Christian

On 25 January 2015 10:32:21 CET, Ritesh Raj Sarraf <rrs@researchut.com> wrote:

#776214#50

Date:: 2015-01-25 14:43:45 UTC

From:

To:

My setup too is that of 2 VMs, and the iSCSI target used is LIO.

No. This is not using LVM. That was to be done in the next phase. My
usual tests include iSCSI + Multipath + LVM. But instead, in this case,
I'm using Device Mapper Multipath only, which is a more commonly used
target of the Device Mapper framework.

Unfortunately, I don't have enough time on weekdays. We can think of
doing that next weekend, if time permits then. Meanwhile, if you can
root cause it, I'd be willing to squeeze out some time to test.

Keep in mind that this is just 2 LUNs mapped. And with just 2 paths
each.
The problem will be more severe for users with a higher number of LUNs mapped.

I've still kept the systemd folks in the loop, hopefully they may be
able to shed some light.

#776214#55

Date:: 2015-01-25 15:37:30 UTC

From:

To:

Control: clone -1 -2
Control: retitle -2 multipath not automounting iscsi devices listed in fstab

I cut out the multipath stack just to see if there is some fix we can
push. So yes, your patch works perfectly in a non-multipath setup. I'll
ask release team for an exception.

For multipath, I need to figure out some time to root cause it. But that
is beyond the scope of this bug report. Hence, the clone of the bug.

#776214#64

Date:: 2015-01-25 16:13:01 UTC

From:

To:

Dear multipath-tools Maintainers,

this bug, which is a clone of #775778, is essentially the same
problem as was discussed in #775778: the ordering of the
multipath-tools init script after $remote_fs (see Should-Start
line in LSB headers) causes an ordering problem, because systemd will
wait for all _netdev filesystems in /etc/fstab before the equivalent
of $remote_fs is reached.

This means that systemd will wait for the /dev/mapper/... devices
related to multipath to appear during boot, but at a point where
remote-fs.target (systemd's mapping of $remote_fs) is not yet reached.
This will lead to a 90s timeout (systemd's default timeout when
waiting for devices), only then will systemd continue booting.

Please see the original bug report for details w.r.t. open-iscsi; the
same reasoning applies to multipath-tools.

The same fix that was implemented for open-iscsi in principle also
applies for multipath-tools, i.e. make sure that for systemd systems
the unit is ordered before remote-fs-pre.target. I don't use
multipath-tools myself, but I'll be able to prepare a patch that fixes
this on a minimal level tomorrow, you'll just have to test it
yourself.

Christian

#776214#75

Date:: 2015-01-26 07:47:26 UTC

From:

To:

Thanks Christian.

I'll wait for your patch.

#776214#80

Date:: 2015-01-26 14:35:47 UTC

From:

To:

On 26.01.2015 at 08:47, Ritesh Raj Sarraf wrote:
backend), and I came upon the following issues AFTER I fixed this in the
same way as the open-iscsi package. These issues don't seem to be
related to systemd, but a general problem of the multipath package
(although I didn't test it with sysvinit, so I don't know for sure):

1. open-iscsi init script (which is still called even by the new
   systemd service file) does udevadm settle to make sure all device
   nodes from logging in to iSCSI have been created, because immediately
   after that, it wants to activate LVMs configured on iSCSI.

     * On its own, that's not a problem, so if you have bare iSCSI with
       or without LVM on top, that works fine.

     * But, if you have multipath started and configured, there's
       /lib/udev/rules.d/60-multipath-rules with the following entry:
           # Coalesce multipath devices before multipathd is running
           # (initramfs, early boot)
           ACTION=="add|change", SUBSYSTEM=="block",
                RUN+="/sbin/multipath -v0 /dev/$name"

       The problem here is that multipath -v0 /dev/$name doesn't
       complete because multipathd is not started. The problem is that
       this rule is not only triggered for the devices first available
       at boot, but also for the devices that appear due to iSCSI,
       which in this case are even configured. Unfortunately, since
       multipathd is not running, this is a new deadlock here.

       udev now has a default timeout of 30s, so boot hangs for that
       time and after that I get a bunch of log messages about
       timeouts.[1]

       After that, the system boots fine, udevadm settle completes,
       open-iscsi init script continues, and then multipathd is
       started, which properly activates the devices, which can then
       be mounted.

       I don't see anything systemd specific in here, and while I
       haven't tried it, I would suspect that the same thing occurs
       also with sysvinit.

2. Also, really curious, at shutdown I have the following situation:
   multipath-tools does not seem to dismantle (or however that is
   called properly) multipath volumes. So now, I have the following
   situation:

     - due to proper ordering with my fix for the 90s systemd issue,
       remote filesystems get unmounted by systemd first, so nothing
       is mounted anymore that's on multipath

     - /etc/init.d/multipath-tools stop is called
           - multipathd exits

     - but apparently, /dev/mapper/mp{1,2} (that's how I called my
       test devices) still exist

     - /etc/init.d/open-iscsi stop is called, that logs out of the
       iSCSI session

     - later at shutdown, something (I don't know exactly what, since
       shutdown is parallel) causes the kernel to try to access all
       block devices in the system, making it notice that it can't
       really access the multipath devices anymore (which still exist!),
       so it complains about it. See [2] for log messages related to
       this.



So basically you have two issues:

 - 30s delay on boot because udevadm settle (in open-iscsi) waits for
   multipath -v0 but that won't complete until multipathd is started,
   which won't happen until the open-iscsi script is done (which waits
   for udevadm settle) -> timeout

       - note that if I comment out the udev rule in question, the
         system boots immediately (total boot time only a couple of
         seconds, including iSCSI + multipath setup), but obviously
         that can't be a complete solution, because you DO want to
         pick up multipath devices that were started in early boot

       - this appears to be related to or the same as
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=580972

 - on shutdown, multipath device mapper devices are not removed and then
   something tries to access them in late shutdown phase, when iSCSI is
   already gone, which produces weird log messages, which in the default
   configuration of Jessie are shown on the screen for a short time
   before rebooting (might irritate some people)

       - since file systems umount cleanly and open-iscsi does a 'sync'
         before logging out of all sessions, I think this is *probably*
         only cosmetic
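
The circular wait behind the 30s boot delay can be written down as a
small "who waits for whom" graph. This is only a toy model of the
description above, not how udev or the init system actually track
dependencies, and the node labels are my own:

```python
# Each step waits for exactly one other step, as described in the
# first issue above.
WAITS_FOR = {
    "open-iscsi init script": ["udevadm settle"],
    "udevadm settle": ["multipath -v0 (udev rule)"],
    "multipath -v0 (udev rule)": ["multipathd running"],
    "multipathd running": ["open-iscsi init script"],
}


def find_cycle(graph, start):
    """Follow the wait-for edges from start; return the cycle or None.

    Only the first edge of each node is followed, which is enough
    here because every step waits for a single other step.
    """
    path = []
    node = start
    while node is not None:
        if node in path:
            # Seen before: everything from its first visit on is the loop.
            return path[path.index(node):] + [node]
        path.append(node)
        successors = graph.get(node, [])
        node = successors[0] if successors else None
    return None


cycle = find_cycle(WAITS_FOR, "open-iscsi init script")
print(" -> ".join(cycle))
```

Nothing in the loop can finish first, so it is only broken when udev
kills the worker after its timeout (the 30s seen in [1]).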






Therefore, my question would be: do you see the same two issues on
sysvinit? If so, I would attach my patch that fixes the boot/shutdown
ordering of multipath-tools on systemd only, and this bug could then be
closed. The other issues are something I probably can't comment on much,
because I don't use multipath.



Christian

[1] First:

iscsid[890]: Connection1:0 to [target:
iqn.2003-01.org.linux-iscsi.tkmlx74.x8664:sn.9284f4d8cb0e, portal:
192.168.15.100,3260] through [iface: default] is operational now

exactly 30s later:

systemd-udevd[145]: worker [157]
/devices/platform/host2/session1/target2:0:0/2:0:0:3/block/sdd timeout;
kill it
systemd-udevd[145]: seq 1357
'/devices/platform/host2/session1/target2:0:0/2:0:0:3/block/sdd' killed
systemd-udevd[145]: worker [159]
/devices/platform/host2/session1/target2:0:0/2:0:0:4/block/sde timeout;
kill it
systemd-udevd[145]: seq 1358
'/devices/platform/host2/session1/target2:0:0/2:0:0:4/block/sde' killed
systemd-udevd[145]: worker [157] terminated by signal 9 (Killed)
systemd-udevd[145]: worker [159] terminated by signal 9 (Killed)

(sdd and sde are configured in multipath via their wwid)

[2] First you have the stopping of multipath stuff:

[   40.593235] systemd[1]: About to execute: /etc/init.d/multipath-tools
stop
[ ... some stuff ...]
[   40.610542] systemd[1]: Child 3043 (multipath-tools) died
(code=exited, status=0/SUCCESS)
[ ... some stuff ...]
[   40.647841] systemd[1]: Received SIGCHLD from PID 1081 (multipathd).
[   40.647867] systemd[1]: Child 1081 (multipathd) died (code=exited,
status=0/SUCCESS)

(so basically, it tells me that the init script was successful)

And later on you've got:

[   41.628858] device-mapper: multipath: Failing path 8:48.
[   41.628865] end_request: I/O error, dev dm-5, sector 204672
[   41.629431] end_request: I/O error, dev dm-5, sector 204784
[   41.629897] end_request: I/O error, dev dm-5, sector 0
[   41.630057] end_request: I/O error, dev dm-5, sector 8
[   41.630466] end_request: I/O error, dev dm-5, sector 0
[   41.631080] device-mapper: multipath: Failing path 8:64.
[   41.631084] end_request: I/O error, dev dm-6, sector 204672
[   41.631525] end_request: I/O error, dev dm-6, sector 204784
[   41.631843] end_request: I/O error, dev dm-6, sector 0
[   41.632029] end_request: I/O error, dev dm-6, sector 8
[   41.632760] end_request: I/O error, dev dm-6, sector 0
[   41.667776] device-mapper: multipath: Failing path 8:64.
[   41.667790] Buffer I/O error on device dm-6, logical block 25584
[   41.668517] device-mapper: multipath: Failing path 8:48.
[   41.668522] Buffer I/O error on device dm-5, logical block 25584
[   41.668707] Buffer I/O error on device dm-6, logical block 25584
[   41.669120] Buffer I/O error on device dm-5, logical block 25584
[   41.670072] Buffer I/O error on device dm-6, logical block 0
[   41.670202] Buffer I/O error on device dm-6, logical block 1
[   41.670322] Buffer I/O error on device dm-6, logical block 2
[   41.670441] Buffer I/O error on device dm-6, logical block 3
[   41.671130] Buffer I/O error on device dm-5, logical block 0
[   41.671258] Buffer I/O error on device dm-5, logical block 1
[   41.703846] device-mapper: multipath: Failing path 8:64.
[   41.704643] device-mapper: multipath: Failing path 8:48.
[   41.740753] device-mapper: multipath: Failing path 8:64.
[   41.741484] device-mapper: multipath: Failing path 8:48.
[   41.757937] systemd-udevd[3010]: '/sbin/kpartx -u -p -part /dev/dm-5'
[3427] terminated by signal 15 (Terminated)

The last one is kind of weird, since it comes from a udev rule in
60-kpartx.rules that AFAICT should only be run when a device appears.
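
To compare how many errors each device produces in a log like the one
in [2], the messages can be tallied per device-mapper device. A rough
sketch that only knows the two message shapes shown in [2]; other
kernel versions may word these messages differently:

```python
import re
from collections import Counter

# Matches the two kernel message shapes seen in [2]:
#   end_request: I/O error, dev dm-5, sector 204672
#   Buffer I/O error on device dm-6, logical block 25584
IO_ERROR = re.compile(
    r"(?:end_request: I/O error, dev|Buffer I/O error on device) (dm-\d+)"
)


def count_io_errors(log_text):
    """Count I/O error messages per device-mapper device."""
    counts = Counter()
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the saved kernel log text to count_io_errors() returns a Counter
keyed by device name, which makes it easy to see whether all multipath
maps are hit or only some of them.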
