#771441 please add a restart hook for hotplugged interfaces

Package:
tftpd-hpa
Source:
tftp-hpa
Description:
HPA's tftp server
Submitter:
Mike Crowe
Date:
2017-10-29 08:51:06 UTC
Severity:
wishlist
#771441#5
Date:
2014-11-29 16:09:09 UTC
From:
To:
Dear Maintainer,

When a Wheezy or Jessie machine is fitted with an SSD the machine often
boots so quickly that tftpd-hpa is started before the network is fully
configured. The problem is reproducible with sysvinit (on Wheezy) and
systemd (on Jessie) although it may be easier to reproduce with systemd.

The same problem can be observed by attempting to start tftpd-hpa by hand
when no network connections are available.

When tftpd-hpa fails to start daemon.log contains:

 in.tftpd[881]: cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known

The error appears to be due to getaddrinfo(3) failing when attempting to
look up "0.0.0.0:69".

The daemon starts successfully when the network is unavailable if the
default /etc/default/tftpd-hpa is changed:

#771441#10
Date:
2014-11-29 18:55:10 UTC
From:
To:
Hi Mike,

What are you using to set up your network?

If it's ifupdown, then for sysvinit I'm going to guess you're hitting
a known bug with allow-hotplug, that bites more than just this service.
Most people fix that by using 'auto' instead of allow-hotplug.

For systemd it's a bit more complicated, but I believe that should also
work for Jessie with it too, at least once the udev settle fix is
restored to it (if it hasn't been already).

If you're using something else, it would probably be good to get that
information on the record, since there may still be more than one bug
here which people should give some attention to.

Thanks for that.  I'm going to need to give the consequences of doing
this a bit more thought than I have for it right this moment, but I'll
try to do that sometime soon.

  Cheers,
  Ron

#771441#20
Date:
2014-11-29 19:14:16 UTC
From:
To:
Hi Ron,

Thanks for the quick response.

It looks like it's currently NetworkManager on all the machines I've seen
this on. I thought I'd seen the problem with ifupdown too but no longer
have any evidence to support that.

Of course on a laptop it is perfectly normal to not have any working
network interfaces at boot time so it seems rather unfair of tftpd-hpa not
to start when it is not configured to be bound to a specific interface.

Thanks.

Mike.

#771441#25
Date:
2014-11-29 20:10:26 UTC
From:
To:
With ifupdown and allow-hotplug it's definitely possible, since unlike
auto, that doesn't wait for the defined network(s) to come up before
the rest of the init scripts continue (and we've seen lots of services
fail due to that).  For NM, I'm less sure of what the mechanics might
be, but I know who to talk to about that.

Are you really running this on a laptop, or is that just an example?

It's also not quite clear to me yet exactly what the desirable default
behaviour should be in such a case.  Would you really want it to bind
to any random wifi hotspot that NM on your laptop might find?

That doesn't quite seem ideal either ...

  Ron

#771441#30
Date:
2014-11-29 20:32:05 UTC
From:
To:
If NM is connecting to WiFi or perhaps with stuff like 802.1x (not that I'm
using such things) then the network might not even come up until after
login.

I am. We use TFTP for booting embedded Linux devices during development.

Indeed. But I'm not really making anything super-secret available via TFTP
and don't allow writing. Having said that, I don't think that
NetworkManager will randomly connect to networks it doesn't know about but
perhaps it could be fooled into doing so if the ESSID matches even if the
BSSID doesn't. :(

But I don't think that the default of :69 is any worse than 0.0.0.0:69
would be though - unless you have a deep distrust of anyone on IPv6. :)

Thanks.

Mike.

#771441#35
Date:
2014-11-29 21:35:46 UTC
From:
To:
Right, I'm not saying it's necessarily wrong for someone to configure it
like this explicitly if they are sure it's ok for their use case, and the
INADDR_ANY is almost surely just because this predates support for IPv6
and hadn't been looked at again since.  And also surely, at least in part,
because it also predates people using this on laptops in potentially
hostile environments where network interfaces it might bind to can come
and go with the wind ...

Which just has me wondering more generally if either of these is still an
appropriate *default*, and if not what might be more appropriate.

I'm not sure we need to go full tin-foil here, but there seems to be at
least a few things here that probably could do with a bit more thought.

Another I don't have the answer to right now off the top of my head is,
will this even listen on interfaces that come up after the daemon is
started without us explicitly using IP_FREEBIND?  Having it not fail at
startup isn't a lot of help if it still won't actually communicate on
those interfaces.  That's easy enough to test, but I'm not remembering
the answer to this being a definite "yes it will" ...

  Ron

#771441#40
Date:
2014-11-30 15:48:13 UTC
From:
To:
I think you're probably right. It would make sense to require an explicit
action during configuration to force a decision on which interfaces
should be bound to (as exim4-config does).

I'm reasonably sure that binding to INADDR_ANY will accept connections on
any interface that appears in the future too.
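
The wildcard-bind behaviour can be poked at with a minimal Python sketch (not taken from tftpd; the function name is made up for illustration). A UDP socket bound to 0.0.0.0 is not tied to one interface, so a datagram sent to any local address reaches it. Loopback and a kernel-chosen port stand in for a real interface and port 69 here. This only shows one socket serving an address it was never explicitly bound to; it does not by itself prove the claim about interfaces that appear later.

```python
import socket


def wildcard_receives_on_loopback(payload=b"ping"):
    """Bind a UDP socket to INADDR_ANY, then reach it via 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 lets the kernel pick a free port (69 would need root).
        server.bind(("0.0.0.0", 0))
        server.settimeout(2.0)
        port = server.getsockname()[1]
        # The datagram is addressed to loopback, not to 0.0.0.0.
        client.sendto(payload, ("127.0.0.1", port))
        data, _peer = server.recvfrom(1024)
        return data
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(wildcard_receives_on_loopback())
```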

Thanks.

Mike.

#771441#45
Date:
2015-05-21 08:25:38 UTC
From:
To:
Hello,

This behaviour is still present on Jessie and is rather inconvenient.
What's the recommended workaround nowadays?
I want to serve on 2 networks, so setting my IP address won't cut it.
Does changing to TFTP_ADDRESS=":69" cause any issues, or is an if-up.d
script still the best option?

Kind Regards,
Norbert

#771441#50
Date:
2015-05-21 09:27:33 UTC
From:
To:
I went ahead and patched the init script so you can omit all variables
in the configuration. In my opinion the cleanest approach is simply not
to set the TFTP_ADDRESS variable, so that the script doesn't pass the
--address option to in.tftpd.
--- tftpd-hpa.save 2015-05-21 11:23:28.841023590 +0200
+++ tftpd-hpa 2015-05-21 11:13:41.716011919 +0200
@@ -25,6 +25,7 @@
 set -e
 [ -r "$DEFAULTS" ] && . "$DEFAULTS"
+TFTP_DIRECTORY="${TFTP_DIRECTORY:-/srv/tftp}"
 . /lib/lsb/init-functions
@@ -53,7 +54,7 @@
 done
 start-stop-daemon --start --quiet --oknodo --exec $DAEMON -- \
- --listen --user $TFTP_USERNAME $TFTP_ADDRESS \
+ --listen ${TFTP_USERNAME:+--user $TFTP_USERNAME} ${TFTP_ADDRESS:+--address $TFTP_ADDRESS} \
 $TFTP_OPTIONS $TFTP_DIRECTORY
 }

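The effect of the `${VAR:+...}` and `${VAR:-...}` expansions in the patch above can be restated as a small Python sketch. The function name `build_tftpd_args` is hypothetical and the code is only an illustration of the shell semantics, not part of the package: an option is emitted only when its variable is set and non-empty, and the directory falls back to /srv/tftp.

```python
def build_tftpd_args(env):
    """Mirror the patched init script's argument construction.

    ${VAR:+--opt $VAR} expands to nothing when VAR is unset or empty,
    and ${VAR:-default} substitutes the default in the same cases.
    """
    args = ["--listen"]
    if env.get("TFTP_USERNAME"):
        args += ["--user", env["TFTP_USERNAME"]]
    if env.get("TFTP_ADDRESS"):
        args += ["--address", env["TFTP_ADDRESS"]]
    # Unquoted $TFTP_OPTIONS is word-split by the shell.
    args += env.get("TFTP_OPTIONS", "").split()
    args.append(env.get("TFTP_DIRECTORY") or "/srv/tftp")
    return args


if __name__ == "__main__":
    print(build_tftpd_args({}))
    print(build_tftpd_args({"TFTP_USERNAME": "tftp", "TFTP_ADDRESS": ":69"}))
```

With an empty configuration the daemon would be started with only `--listen` and the directory, which is the case the patch is meant to allow.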
#771441#55
Date:
2015-05-21 14:59:43 UTC
From:
To:
Hi Norbert,

Can you elaborate a little more on exactly what configuration you have
with Jessie that you see this happening on?
(ie. what init system, what brings up your network(s) etc.)

If there's still a real bug on that side of things, I'd rather we know
about it and address it at the source than just sweep it under the rug
here, since the latter would just push the bug off onto some other use
case for people to run into again.

I have this serving on 4 networks with an address set, so that alone
isn't a problem.  What's the situation in your case that it won't?

As we discussed earlier in this bug, if that's what you *want* it
is fine, but if there is some other bug that is still causing this
problem, that won't help people who do want or need to restrict
this to just a subset of the available network addresses - so we
probably need to identify the real bug that you've hit so that we
can fix it for them too.

(which is an independent question from your patch to allow running
this with just the tftpd default options for people who choose that
too, which seems like it's probably a reasonable idea as well).

  Cheers,
  Ron

#771441#60
Date:
2015-05-21 21:23:28 UTC
From:
To:
2015-05-21 16:59 GMT+02:00 Ron <ron@debian.org>:

Bog standard Jessie x86_64 GNOME 3 desktop.
systemd and NetworkManager, I guess. The point is that installing
tftpd-hpa doesn't work out of the box.

Sure, and I couldn't tell what the current state is from looking at this
thread. One is a local 192.168.x.x network at my desk and the other a
bigger 10.x.x.x network.

I'd bet some real money that the primary use for tftp is running it
on some servers, not on a laptop where you identify your
home/internal network by the IP address you got assigned.
Is that the use case you are talking about?

Just wondering about the reasoning here. I would never have thought
someone would "secure" his important data on a "trivial" FTP server
by expecting to get a different IP on wireless networks than at home
(never mind someone malevolently giving your laptop the special IP
via DHCP).
To each his own, and it's nice that it's working for you, but I really
can't follow the arguments.
But then I don't know everything about the involved network stacks.


I will just state the issue and what I expect more clearly.

Problem: I want to host a tftp server for fetching firmware via a
bootloader. I have two network interfaces on which I want the files
accessible. Securing the tftp server doesn't matter to me (if it did,
I'd prefer locking to interfaces).
I want a painless and easy setup, ideally one that gives the company
simple steps to reproduce (apt-get install tftpd-hpa; echo done).

State: We are using Wheezy and I added an if-up hook (which I consider
a mean hack).
At some point we will change to Jessie, so I tested an untouched
installation and wrote down the steps necessary.

Issue: the tftpd service fails when booting (as in, the service is
stopped), supposedly because the network interface(s) aren't up.

Workarounds:
1) Manually restarting the service later fixes the issue.
2) Changing to TFTP_ADDRESS=":69" fixes the issue.
3) Applying my patch and omitting the --address option works too.

I strongly believe it doesn't matter when you unplug or replug the wire
after tftpd was successfully started, but I would have to test this.

So in short, I hope you don't take any offense, but to me it's clear that
TFTP_ADDRESS="0.0.0.0:69" should mean the same as TFTP_ADDRESS=":69",
namely use any IP. Conversely, if we ignore common socket rules
and TFTP_ADDRESS="0.0.0.0:69" means bind to one fixed IP "0.0.0.0",
then the defaults would need to be adjusted.

From this perspective I don't understand many of the arguments above.

Kind Regards,
Norbert

#771441#65
Date:
2015-05-22 00:08:36 UTC
From:
To:
Well it's been working out of the box for me, and apparently for almost
everybody else too - and the problem with systemd was supposed to have
been solved by its maintainers before the release - so the details of
exactly why you got hit by it were kind of pertinent here ...

From a sample space of two, Networkmanager appears to be emerging as
the common culprit ...

Yes, I mean 4 networks as in separate network interfaces with 4 separate
IPs, with the one tftpd instance serving all of them.  That part isn't
the problem here.

I'm running this on a "Real Server", yes - which also means no Gnome
desktop and no NM - but the only person who had or reported the same
problem as you have was running this on a laptop ...

Which surprised me a little too, but appears to be a real and valid
use case as it was described.

I'm not following what you're trying to say there either, or where
you got that line of thinking from, so I can only sympathise if you're
confused by it :)

Network booting is what I understand most people want this for, yes.
It's all I use it for.  It's not really the right tool for much else.

And that's exactly what the default configuration gives most people,
with the debconf prompt letting you set the address(es) it will bind
to explicitly if you want something different to that.

Which is exactly what the default of INADDR_ANY was originally intended
to do.  It's a little dated, showing its age from the era before tftpd
supported IPv6, but IPv6 support isn't your problem here either.

Which appears to be an issue with Networkmanager ...

The ifupdown support for both sysvinit and systemd should not have
this problem, and lots of things other than this will indeed fail
to bind to their listening addresses if those interfaces are not
present and configured on the system.

Which are all basically just hacks around the problem of having
booted your system with a non-functional network.  Which appears to
be a problem that people using Networkmanager have.

I don't know what its proposed solution to that is, but if it
doesn't have one, then it's not really suitable for use on servers.

"the wire" doesn't have anything to do with this.

Networkmanager not configuring your network before services that want
to use it start is the problem you appear to be having.

I don't see why you worry that might offend me, but if that's "clear"
to you, then you probably need to take a closer look at what 0.0.0.0
aka INADDR_ANY actually means as a listening socket address :)

In the world where IPv6 exists, those two are very much not the same.

I don't know what you mean by "common socket rules", but 0.0.0.0 has
a very specific and well defined meaning.
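
One way to see why a literal 0.0.0.0 is not the same as "any address" once IPv6 exists is to ask the resolver which address families each form yields. This is a minimal Python sketch under stated assumptions: the helper `families` is made up for illustration, and how tftpd itself maps ":69" onto address families is not shown anywhere in this thread. The sketch only shows that the IPv4 wildcard literal pins the result to AF_INET, while a passive lookup with no host at all may return wildcard addresses for every family the system supports.

```python
import socket


def families(host, port=69, flags=0):
    """Return the set of address families getaddrinfo() yields for host."""
    infos = socket.getaddrinfo(
        host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM, 0, flags
    )
    return {info[0] for info in infos}


# The literal IPv4 wildcard can only ever produce an IPv4 address.
ipv4_only = families("0.0.0.0")

# No host plus AI_PASSIVE asks for wildcard addresses of any family.
any_family = families(None, flags=socket.AI_PASSIVE)

if __name__ == "__main__":
    print("0.0.0.0    ->", ipv4_only)
    print("(no host)  ->", any_family)
```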

And I still don't know exactly which "arguments" you're referring to
here.  AFAICS we have an open question about what the best default
tftpd configuration should be, partly because it now supports IPv6,
and partly because the world is a different place to what it was
when this was first picked and perhaps we ought to be a little more
defensive by default - but there's already a debconf prompt that lets
you pick that for yourself anyway, so this is 'important' but not
urgent.

And then there is a problem, which is by no means specific to tftpd,
which appears to essentially be a Networkmanager design and/or
configuration flaw.

The only thing that is certain about the first question, is whatever
we choose it shouldn't be on the basis of hiding Networkmanager bugs.
How you hack around those locally if you insist on using it is up to
you, but we can't "fix" those in this software, short of rewriting
parts of it to listen to netlink events for dynamic interface creation.

... which probably isn't really a very high priority for Server Use.
That's much more of a laptop problem.


I'm a bit confused about what you're aiming for here, because you're
"betting" this software is for server use, but seem confused about why
it doesn't work so well in its current default configuration if you use
laptop tools to configure your network.  Yes, you can configure a hack
around that in some of the software it causes problem with - but that's
a different question to what is actually the best default for *this*
package to ship with in a hands-off install.


I'm pretty sure we don't ever want to omit the --user option and have
this run as 'nobody', but not supplying --address might have some use
or merit.  It's less clear cut if that should be the default though,
or if it's functionally different in any significant way to what is
already possible ...

  Cheers,
  Ron

#771441#70
Date:
2015-05-22 09:15:00 UTC
From:
To:
Hi Ron

2015-05-22 2:08 GMT+02:00 Ron <ron@debian.org>:
tftp-hpa is affected.
I also use fixed IP/DNS via the GNOME GUI (what exactly the GUI
affects and how NM is hooked into it is unclear to me).
At home I have two systems where I killed NM and used the interfaces
config file, exactly because this thing has caused me a lot of trouble
already.

And I am far from the only one with the problem; I found reports
on other sites.
And Ubuntu has this issue too:
https://bugs.launchpad.net/ubuntu/+source/tftp-hpa/+bug/972845

I can dig out more references from my browser cache when I'm back at
work next week.

I would remove NM in a heartbeat if this didn't mean messing up the
dependencies and losing the GUI for applying settings.

I still don't know the difference in meaning between TFTP_ADDRESS=":69"
and TFTP_ADDRESS="0.0.0.0:69".
And to me the best default would be to just use the default tftp port;
that's achieved in the most consistent manner by omitting the --address
argument (IMHO).

Whatever it does, it makes NM and tftp-hpa coexist peacefully. I don't
know how long it will take until NM has no bugs.

This sounds like the network doesn't "exist" before NM, which I believe
to be wrong.
Shouldn't it be possible to use any IP and unix sockets before services
are started?

That's not clear to me. What I gather is that listening on an unbound
socket will randomly pick a port, and binding to INADDR_ANY:port
beforehand should do the same on a fixed one. It might be different with
IPv6; please enlighten me.
http://man7.org/linux/man-pages/man7/ip.7.html
http://man7.org/linux/man-pages/man7/ip.7.html

The argument I read from this is that TFTP_ADDRESS="0.0.0.0:69"
is something desired and it shouldn't surprise anyone if it doesn't work.
(I never got the debconf prompt, btw.)

Unless you have a laptop and connect to wireless networks, I don't see NM
as very appealing anyway. But using alternatives is a hassle.

I agree, but it's also a matter of time, isn't it? Until this is fixed
correctly I still need to set up some systems for use. Better defaults
that happen to work even with the bugs unfixed would be great; a simple
configuration or workaround is OK too.

From my experience with Wheezy and open-vm-tools, fixed versions
in the main repos can take until the next release.

It's a server/workstation problem if the same software packages are used.

I don't WANT to use laptop tools; NetworkManager is the default in
pretty much anything with a GUI. I don't want to use it, but using
alternatives comes at the cost of maintaining configurations and
(dist-)upgrades not working smoothly.
Actually I don't even know which GUI packages I'd have to replace so
I could set my IP and DNS in the interfaces file.
By out-of-the-box I meant just using the standard GNOME desktop,
which should be the best maintained and most usable for most people who
don't know what NM and tftp-hpa are, i.e. something that saves me the
work of running around and editing files.

I was taking care of the case of all variables or even the defaults
file missing (which is handled in the script).
You could simply set tftp as username if the variable is missing.

Kind Regards, Norbert

#771441#75
Date:
2015-05-22 11:35:14 UTC
From:
To:
It's not only tftp-hpa that would get burned by this.  Any network
using service that needs to resolve or use an address would fail in
exactly the same way if started before your network is up.  And lots
of them do.  This isn't a new problem, you're just lucky that it's
the only thing that you are seeing it on.

Which is probably a big part of the problem here.  The getaddrinfo(3)
call is failing because your network is not yet configured, and anything
which calls that (which means just about everything with IPv6 support
that is even half sane) would fail in the same way that this does if
started at the same point in your boot sequence.  As would anything
that tries to bind to a specific address before the interface it is
assigned to is up.

You could do a lot worse than do the same on this system too then :)

"Upstart in broken boot dependency shocker.  film@11"

The fix suggested there is more correct though, since it tried to
address the broken boot dependency, not hack around it by exploiting
an implementation detail of the tftpd configuration which will only
"work" for some users.

"me too" references aren't really helpful here.  The problem isn't a
mystery and the solution to it isn't a popularity contest.  Unless
the broken boot dependency is fixed in NM, this problem will still
exist.  Changing the default config so that it doesn't affect *you*
as a side-effect won't make it go away for other people who need a
different config.


I've left this bug open here because the question of whether the old
default (which has been what it is for much longer than I've been
maintaining this package) should be changed is an open one worth some
further thought -- but what we change it to if we do doesn't really
depend on "will this be broken with NM" because until NM is fixed it
will *always* be broken with some/most valid tftpd configurations
(and it won't be the only thing that is).  The answer to that needs
to depend purely on "what is sane/safe for a default tftpd install".

We can note there are workarounds with varying degrees of ugly that
may work for some people in some circumstances, but we really can't
"fix" NM from here - that needs to happen in NM itself.

  Cheers,
  Ron

#771441#80
Date:
2015-05-22 18:10:04 UTC
From:
To:
I would separate the tftp use cases then:
*) Binding to an IP address.
This raises the question of when the tftp daemon should be started, and
how.
The necessary hook would be: Event "TFT_IP is available" -> start tftp daemon.
This is, as far as I know, not part of init systems, and the tftp package
is missing the hooks (if-up.d and/or whatever the equivalent in NM is).
Otherwise how would the init system or network manager know when the
specific IP that tftp needs is available? What if the cable is not
plugged in? How long should it wait before all interfaces are available
(and possibly configured by DHCP)?

*) Listening on any network. This shouldn't be affected by any service
like NM or by whatever interfaces are available or configured. This
should be the default, and so far the default configuration tries to
achieve that and fails (for whatever reason). This to me is a bug in
tftp or the configuration, both related to this package. I'd also
appreciate it if you explained the difference between 0.0.0.0:69 and :69
as the address option, and what 0.0.0.0 means for IPv6; a link to the
documentation would suffice.

Can you set up a VM with a fresh Jessie GNOME 3 installation? I don't
know about the setup you use, but as I said I want some easy steps that
work on plain systems most people are familiar with.

Kind Regards, Norbert

#771441#85
Date:
2016-10-05 16:45:58 UTC
From:
To:
[It seems that I forgot to subscribe to this bug so I didn't see that there
had been activity since the original exchange.]

In Message #75 2015-05-22 13:35 GMT+02:00 Ron <ron@debian.org> wrote:

I don't believe that it's sensible for tftpd to assume that all network
interfaces are up before it is started. We no longer live in a world of
fixed network configurations.

I believe that leaves daemons such as tftpd with three choices:

1. Bind to INADDR_ANY and accept connections on any network interfaces as
they appear. Rely on firewalling to avoid unwanted connections.

2. Bind to network devices rather than addresses using SO_BINDTODEVICE.
This isn't ideal since network devices may have multiple addresses.

3. Monitor network interfaces via netlink and bind to them as they appear.

I believe that all but the first of these requires non-trivial development
work.

The current tftpd-hpa package defaults to being available on all interfaces
via an IPv4 address. In Message #25, Ron rightly questioned whether this is
still a sensible default. But, as I said in Message #30, I don't believe
that changing the default to TFTP_ADDRESS=":69" makes the situation any
worse, and it does mean that tftpd does work correctly when no network
interface is available at startup. Maybe if the default was changed then it
could be turned into a debconf question?

Mike.

#771441#90
Date:
2016-10-06 02:31:07 UTC
From:
To:
Sure, but the majority of devices in that world are now phones, and
devices which won't typically host network services.  And they'd stop
being very useful if the servers they relied on started roaming around
the network at random like an end-user device does.

If you have servers doing that, you have bigger problems to fix than
this.  I don't think we can apply "50 million blowflies can't be wrong"
logic to this, we need to look at what is most appropriate for a server
if we're talking about what the default configuration should be.

You can't 'fix' that by just changing tftpd, there are lots of server
applications which assume or require this as part of configuring them
securely.  Breaking that assumption has larger consequences than just
needing to implement netlink monitoring in them all.

Or leaves applications like NM with two choices to support people
running services:

  1. Give them an option to block waiting on interfaces which are
     expected to be up to be up, before starting services that
     will be offered on them.

  2. Provide hooks to (re)start or stop the services that are bound
     to particular interfaces when those interfaces come or go.

And realistically, I think it must provide both those things to be
considered a functional application, regardless of what other
services do.
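
The second option, a hook that restarts a service when an interface comes up, could look roughly like the following Python sketch. It assumes the NetworkManager dispatcher convention of calling scripts with the interface name and the action as the first two arguments. The interface names in `WATCHED` are hypothetical, and using `service tftpd-hpa restart` as the restart command is an assumption about the local setup rather than something the package ships.

```python
#!/usr/bin/env python3
import subprocess
import sys

SERVICE = "tftpd-hpa"        # init script name; adjust for the local setup
WATCHED = {"eth0", "eth1"}   # hypothetical interfaces tftpd should serve


def command_for(interface, action):
    """Return the command to run for an (interface, action) event, or None."""
    if interface not in WATCHED:
        return None
    if action == "up":
        # The interface now has its address, so the bind can succeed.
        return ["service", SERVICE, "restart"]
    return None


def main(argv):
    if len(argv) < 3:
        return 0  # not called the way a dispatcher would call us
    cmd = command_for(argv[1], argv[2])
    if cmd is None:
        return 0
    return subprocess.call(cmd)


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the decision in `command_for` separate from the call to `subprocess` makes the policy easy to check without touching the running service.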

Don't get me wrong, I've been a huge fan of all things hotplug since
long before NM even existed, and support for that in the kernel
became widespread - but it is a Hard problem to solve well, and you
can't solve it by kicking the can down the road and saying "the rest
of you will all need to figure out the problems this creates and then
rewrite your code".

See https://bugs.debian.org/816087#15
for another example of how "just let everything race" can go badly
wrong if you turn off the traffic lights at a major intersection
without paying enough attention to the consequences of that.

What if it already was a debconf question ;?

The default is only used for people who don't answer that themselves
with something different.

What the default should be is mostly a balancing act between what is
sane for a relatively naive user who doesn't know what they should
answer there, and what would be right out of the box for most people
without 'special needs'.  Whether it should be changed now, also adds
the question of line of least surprise for existing users, so there
is some inertia and risk there which we shouldn't ignore if changing
it now is not the clearly compelling thing to do.


I'm inclined to think that running this on a laptop is a special case.
And that changing it "because otherwise NM breaks" would be hiding a
bug in NM rather than fixing it at the real cause.

But I'm not ruling out that there might be other compelling reasons
to still change the default for this at some point.  Whether that
should be to :69, or to something else, is still an open question.

  Ron

#771441#95
Date:
2016-10-09 20:57:04 UTC
From:
To:
Oh, it is. :)

I'd never noticed since it is low priority so I'd never been asked it. :(

The full text of the question is:


If the default were changed to :69 then the observable change to users who
accept that default would be that IPv6 would start working when previously
it hadn't. I suspect that Debian is full of services that started working
on IPv6 upon package upgrade (mostly in the now distant past.)

I can't help wondering whether there might be more users now installing
tftpd-hpa on their desktop or laptop in order to back up their router
configuration or boot some embedded device, than on a server to boot a room
full of diskless workstations. Maybe I'm guilty of being skewed by my own
experience.

You won't be surprised to hear that I think that :69 makes a more sensible
default. I believe that this is only slightly influenced by that default
also ensuring that tftpd-hpa starts even when the network interface isn't
yet up.

Thanks.

Mike.

#771441#100
Date:
2016-11-12 16:32:31 UTC
From:
To:
Unfortunately, I appear to have generated the patch backwards which is
somewhat confusing. I hope that this one is correct.

Mike.

#771441#105
Date:
2017-01-27 08:48:22 UTC
From:
To:
Hello,

FTR: I'm annoyed by the behaviour of tftp here, too.

I acknowledge Norbert's expectations for the ipv4 world. So I did the
following with python to test my expectations:

	>>> import socket
	>>> fd4 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
	>>> fd4.bind(("0.0.0.0", 6969))

This works fine with and without connections in Networkmanager. The
thing that tftpd stumbles over however is (tftpd/tftpd.c, line 640):

	>>> import socket
	>>> socket.getaddrinfo("0.0.0.0", None, socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP, socket.AI_CANONNAME | socket.AI_ADDRCONFIG)
	Traceback (most recent call last):
	  File "<stdin>", line 1, in <module>
	  File "/usr/lib/python3.5/socket.py", line 733, in getaddrinfo
	    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
	socket.gaierror: [Errno -2] Name or service not known

(in the case where no connection exists) vs

	>>> import socket
	>>> socket.getaddrinfo("0.0.0.0", None, socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP, socket.AI_CANONNAME | socket.AI_ADDRCONFIG)
	[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '0.0.0.0', ('0.0.0.0', 0))]

(when there is a connection).

The problem here is (quoting getaddrinfo(3)):

       If  hints.ai_flags includes the AI_ADDRCONFIG flag, then IPv4 addresses
       are returned in the list pointed to by res only if the local system has
       at  least  one IPv4 address configured, and IPv6 addresses are returned
       only if the local system has at least one IPv6 address configured.  The
       loopback  address is not considered for this case as valid as a config‐
       ured address.  This flag is useful on, for example, IPv4-only  systems,
       to ensure that getaddrinfo() does not return IPv6 socket addresses that
       would always fail in connect(2) or bind(2).

When dropping socket.AI_ADDRCONFIG the result is

	[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '0.0.0.0', ('0.0.0.0', 0))]

independent of a connection being available or not.

I think dropping AI_ADDRCONFIG (from common/tftpsubs.c:311) should work
for tftpd. However I'm not 100% sure this is the right thing in all
corner cases.

Having said that the right thing would be to not try to determine if an
argument to -a is an ipv4 or ipv6 address but use:

	getaddrinfo(address, port, socket.AF_UNSPEC, socket.SOCK_DGRAM, socket.IPPROTO_UDP, socket.AI_PASSIVE)

with port defaulting to "tftp" and then bind to all addresses returned
by that. Extra points for supporting more than one (address, port)-pair
on the command line.
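
A rough Python sketch of that idea, offered as an illustration under stated assumptions and not as what tftpd does: resolve the (address, port) pair for any family with AI_PASSIVE and bind one UDP socket per result. The function name `bind_all` is made up, and setting IPV6_V6ONLY is a choice made here so that an IPv6 wildcard socket does not also claim the IPv4 port.

```python
import socket


def bind_all(address, port="tftp"):
    """Bind one UDP socket for each address getaddrinfo() returns."""
    infos = socket.getaddrinfo(
        address, port, socket.AF_UNSPEC, socket.SOCK_DGRAM,
        socket.IPPROTO_UDP, socket.AI_PASSIVE,
    )
    bound = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError:
            continue  # address family not supported on this system
        try:
            if family == socket.AF_INET6:
                # Keep IPv6 and IPv4 sockets separate (a choice of this sketch).
                sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            sock.bind(sockaddr)
        except OSError:
            sock.close()
            continue
        bound.append(sock)
    return bound


if __name__ == "__main__":
    # Port 0 picks a free port; binding the real tftp port needs root.
    for s in bind_all(None, 0):
        print(s.family, s.getsockname())
        s.close()
```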

Independent of this changing the default TFTP_ADDRESS to ":69" to get
ipv6 connectivity would be nice. Or maybe still better to ":tftp".

Best regards
Uwe

#771441#110
Date:
2017-01-27 13:51:34 UTC
From:
To:
Hello,

After some discussion in #debian-devel and #nm we found out that there
is a related network-manager issue. That is,
NetworkManager-wait-online.service switches to active state too early. I
reported this upstream at
https://bugzilla.gnome.org/show_bug.cgi?id=777831 .

So actually there are two problems here:

 - tftpd is started before the machine is online
   (-> NetworkManager problem); and
 - tftpd doesn't handle it nicely when being told to bind to 0.0.0.0
   before any interface has an ipv4 address
   (tftpd problem).

One could argue that the second isn't an issue on a well-configured
server, but on the other hand there is no reason to not try to handle
the dynamic case a tad better.

Best regards
Uwe

#771441#115
Date:
2017-01-29 11:09:43 UTC
From:
To:
AI_CANONNAME is only relevant when the resulting official name is used,
which is not the case in tftpd for the address to bind to. Also
AI_ADDRCONFIG isn't helpful. This flag is good for sockets used to
connect(2) somewhere. But for listening sockets it makes tftpd fail to
start when -a 0.0.0.0:69 is passed and no network device is up yet.

This addresses Debian bug https://bugs.debian.org/771441
---
 common/tftpsubs.c | 4 ++--
 common/tftpsubs.h | 2 +-
 tftp/main.c       | 9 ++++++---
 tftpd/tftpd.c     | 6 ++++--
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/common/tftpsubs.c b/common/tftpsubs.c
index 8c999f66eed8..344c74b3d78c 100644
--- a/common/tftpsubs.c
+++ b/common/tftpsubs.c
@@ -300,7 +300,7 @@ int pick_port_bind(int sockfd, union sock_addr *myaddr,
 }

 int
-set_sock_addr(char *host,union sock_addr  *s, char **name)
+set_sock_addr(char *host, union sock_addr *s, char **name, int ai_flags)
 {
     struct addrinfo *addrResult;
     struct addrinfo hints;
@@ -308,7 +308,7 @@ set_sock_addr(char *host,union sock_addr  *s, char **name)

     memset(&hints, 0, sizeof(hints));
     hints.ai_family = s->sa.sa_family;
-    hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
+    hints.ai_flags = ai_flags;
     hints.ai_socktype = SOCK_DGRAM;
     hints.ai_protocol = IPPROTO_UDP;
     err = getaddrinfo(strip_address(host), NULL, &hints, &addrResult);
diff --git a/common/tftpsubs.h b/common/tftpsubs.h
index b3a3bf3c95e1..0edda03a514c 100644
--- a/common/tftpsubs.h
+++ b/common/tftpsubs.h
@@ -98,7 +98,7 @@ static inline int sa_set_port(union sock_addr *s, u_short port)
        return 0;
 }

#771441#120
Date:
2017-02-02 05:38:49 UTC
From:
To:
Up to this point, this patch doesn't actually change the existing
operation in any way.  But in what follows ...

The use of AI_PASSIVE here is a placebo.  That flag has no effect unless
address was NULL, and if that was true, neither of the hunks here would
actually be executed in the first place.

Using AI_CANONNAME here should be harmless at worst.  So the only actual
change is to drop AI_ADDRCONFIG - the flag which limits getaddrinfo to
returning only the address families that are actually supported by the
configured interfaces on the system.  And ordinarily that would seem to
be a fairly uncontroversially Good Thing to do, for both connecting and
listening sockets.

So unless upstream sees this differently, I still think we'd need to see
some stronger rationale for why that isn't a Good Thing in this particular
case than just "Dropping that flag hides a real bug in NetworkManager".
Because it could hide or introduce real problems in other cases too, and
if the bug in NM is fixed, then the only reason I'm so far aware of for
you proposing this patch (based on the discussion on #d-d) also goes away
too ...

Assuming that at some point the NM bug will be fixed, why would we still
want to make this change in this code?

  Cheers,
  Ron

#771441#125
Date:
2017-02-02 13:22:47 UTC
From:
To:
Hello Ron,

Right. This could be accomplished with a less intrusive patch that
assumes AI_CANONNAME | AI_ADDRCONFIG if name (i.e. the 3rd argument) is
non-NULL. YMMV.

Right. Today it only has an effect if the first argument to getaddrinfo
is NULL. The intention (IIUC) is that it should be used when you plan to
feed the result to bind (as opposed to connect).

The downside of using AI_ADDRCONFIG is that it makes binding to 0.0.0.0
(or ::) fail when no interface is up yet.
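
The difference can be checked with a small Python sketch. The function
name resolve_bind_address is hypothetical. What the AI_ADDRCONFIG call
returns depends on the C library version and on which interfaces have
addresses at that moment, so the sketch reports either outcome rather
than assuming one.

```python
import socket


def resolve_bind_address(address, use_addrconfig):
    """Resolve a numeric IPv4 bind address, with or without AI_ADDRCONFIG.

    Returns (list_of_sockaddrs, None) on success or (None, error) on failure.
    """
    flags = socket.AI_ADDRCONFIG if use_addrconfig else 0
    try:
        infos = socket.getaddrinfo(address, 69, socket.AF_INET,
                                   socket.SOCK_DGRAM, socket.IPPROTO_UDP,
                                   flags)
    except socket.gaierror as err:
        return None, err
    return [info[4] for info in infos], None


if __name__ == "__main__":
    for use_flag in (True, False):
        print(use_flag, resolve_bind_address("0.0.0.0", use_flag))
```

Running it once with the network down and once with it up shows whether
the flag changes the result on a given system.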

If we can agree in principle I can rework the patch to make one change
per patch:

 - drop AI_ADDRCONFIG for tftpd use
 - (maybe) introduce AI_PASSIVE for tftpd
 - (maybe) drop AI_CANONNAME for tftpd

This is not the (only) reason for me. This is mostly only how it showed
up for some people, but still there are more IMHO good reasons to fix
it:

 - inconsistent behaviour when no interface is up: -a 0.0.0.0 fails,
   -a :: fails, not passing -a doesn't fail and makes tftpd bind to all
   interfaces.
 - The "no interface is up" case also happens with ifupdown when no
   auto interface is used (only hotplug)
 - The "no interface is up" case also happens if your laptop has no
   network connection during boot
 - It's more robust to try what the admin requested. It is possible even
   if no interface is up to bind to 0.0.0.0. So I suggest to do that and
   not try to know it better than the admin.
 - The error message

	cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known

   is misleading.

See above. Which problems do you see introduced by my patch?

IMHO "we don't do it right because it might paper over other problems"
is a poor reason for not patching. ("I don't need seatbelt or a helmet
because if my head gets hurt there is a different problem.")
(though not fixing) patch was suggested by them.

Best regards
Uwe

#771441#130
Date:
2017-02-02 14:25:25 UTC
From:
To:
Indeed. As I wrote in message #95, the debconf question for TFTP_ADDRESS
even implies that the current default value will support IPv6, when it does
not.

If Ron will accept it, then I can update the patch in Message #100 to say
":tftp" rather than ":69". Is there any chance we can get this into Stretch?

Mike.

#771441#135
Date:
2017-02-03 05:10:32 UTC
From:
To:
What do you mean by "Today"?  Both SUSv4 and the Linux man page are
unequivocal about the _only_ use of that flag being to special case
a NULL address (meaning 'this machine') to return either the wildcard
address or the LOOPBACK address.
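
That behaviour can be seen from Python, where a node of None corresponds
to the NULL address. This is only an illustration with hypothetical
helper names; the port number 69 is used just so the service argument is
not empty.

```python
import socket


def null_node_address(passive):
    """Address returned for a NULL node, with or without AI_PASSIVE."""
    flags = socket.AI_PASSIVE if passive else 0
    infos = socket.getaddrinfo(None, 69, socket.AF_INET,
                               socket.SOCK_DGRAM, socket.IPPROTO_UDP, flags)
    return infos[0][4][0]


def numeric_node_address(node, passive):
    """Address returned for an explicit numeric node; the flag is ignored."""
    flags = socket.AI_PASSIVE if passive else 0
    infos = socket.getaddrinfo(node, 69, socket.AF_INET,
                               socket.SOCK_DGRAM, socket.IPPROTO_UDP, flags)
    return infos[0][4][0]
```

With a NULL node the flag selects between the wildcard and the loopback
address; with an explicit node both calls return the same address.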

That you'd use the wildcard address to bind a service to all addresses
of 'this machine', or the loopback address to connect to a service on
'this machine' is illustrative.  There's no deeper distinction or
fundamental difference related to what functions you might later pass
the address(es) you obtain to.

That seems to be where we disagree.  I don't see it as a 'downside' that
if you explicitly say "I want to bind to IPv4 addresses" (or IPv6), and
you don't actually have any, that this should fail early and loudly to
warn you about either misconfiguration, or some other more serious
failure, occurring.

If you passed a name instead of a numeric address, you'd only get the
address families the machine actually supported.  If you pass a numeric
address in a particular family, you get a sanity check that it's valid.

If you (personally) don't care about that, just don't pass an explicit
address.

The downsides of not using AI_ADDRCONFIG can't be remedied so easily.

I'm far less concerned about the format of the patch, than the details
of what it's actually hoping to achieve.

If all of the reasons for doing this are just different ways of saying
"if we do less sanity checking, then misconfiguration and broken tools
won't annoy me as often" - that's not very compelling.

Doubly not so when there already is a way you can configure this for
your own use which does already bypass them.  Simply disabling that
for everyone and every configuration isn't a good answer.

I don't see how that is 'inconsistent'?  If you ask for an explicit
address (family) you get a sanity check that (support for it) is
available.  If you say you don't care, then tftpd doesn't either.

Not defaulting to auto (and systemd not respecting it for a while) were
both bugs that broke lots of services on lots of people's machines.

And I already explained to you in #d-d that the established 'solution'
for that, for services which don't monitor netlink events, is to add a
hook which restarts them when interfaces appear or disappear, if you
really want them to bind to interfaces which might genuinely be expected
to be hotplugged.

Because if you're not actually using wildcard addresses, then this will
_still_ fail, even with your patch, if that interface isn't already up.

This doesn't need a link up state to work, it just needs a local
interface to be brought up with the needed address (family) assigned
to it.  If your only address is assigned by DHCP or something similar,
and you aren't blocking waiting on that - then as above lots of services
are going to fail to start if your boot sequence blindly tries to start
them anyway.

That isn't the fault of, or a bug in, those services.  Your system is
just misconfigured for what you actually want it to do.

See above.  Your patch is *taking away* the ability of the admin to
have the choice to specify exactly what behaviour they want ...

The behaviour that you want is already supported.  You just need to
explicitly "request" that, rather than redefine the historical
default to now mean that as well, taking away any other option that
some other admin might want to request.

Well, you can file that bug against glibc :)  We're just reporting
what getaddrinfo and gai_strerror returned ...
to any address, without any checking of what address (families) are
available, even if the user explicitly specifies them - and that silently
ignoring any error with being able to do that is a Good Thing.

If you're running a toy service on your laptop, that might be ok.
If you're really using your laptop as a 'portable' server for some
special use case, you probably don't want it binding to random wifi
hotspots wherever you may go with it.
If you're running a Real Server, then silently ignoring real problems
is pretty much the opposite of what you'd want happening in most cases.

The lessons of:
https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00
aren't entirely inapplicable here.

I don't think defining "do it right" as "I'd rather you disable checking
in the software for everyone than configure it locally to do what I want"
is very helpful here.  Especially not if the only reason you want that
is because NM has a bug, or you don't want to configure it to properly
support using this service with hotplugged interfaces.

By your analogy, what you're saying with this patch is "I want you to
remove the seatbelt completely, even though I already have the option
to choose not to wear it myself".

Cool, though it would be nice if there was a (possibly RC?) debian bug
also tracking that.

I think if you want to change the behaviour of this program, you're also
going to have to get your proposed patch past upstream before I apply it.
When I first adopted this package, the first thing I did was get all of
the patches we were carrying upstreamed, and I'm not keen to diverge
from them again over something like this.

Since they hadn't chimed in on it yet I've given you some feedback on
why _I_ don't think this is the right solution - but ultimately it is
them you'll need to convince otherwise, not me.

I am grateful for you digging into this, confirming that the core of
what is really making people most unhappy here is an NM bug, and
reporting that bug to them.  And as I said in #d-d, I will take a
patch to include an NM hook if you really do want support for NM
managed interfaces that really are expected to be hotplugged some
time long after boot.  If people really want support for late hotplug
in ifupdown we can add that too, but so far nobody has ever reported
that as being a problem for them ("use auto or expect pain" is fairly
well known these days).


I do still consider the question of what the default config should
be an open one - it's something I inherited from the previous maintainer
too - but I'll follow up on that separately.  It's bad enough that we've
already got multiple issues conflated together in this one bug, so it
would be nice to at least still keep them to separate email (sub)threads.

  Cheers,
  Ron

#771441#140
Date:
2017-02-03 06:18:47 UTC
From:
To:
That's most probably just an oversight from between when that prompt
was first written and when IPv6 support was actually added.  But that
predates my involvement here, so I can't say for sure.

That said, it also doesn't seem entirely unreasonable for anyone
configuring a service like this to know that 0.0.0.0 is an IPv4
address ...  which might be related to how it got overlooked ...

It's ok, I don't need a patch to change the default.  The real question
for this bug (as I think I've said a few times now), is *what* it should
be changed to if we change it.

You've been unambiguous about your preference being that the default
should match your preferred use case - but given that we've now got
people saying they are running this on laptops, I think there's also
a strong case to be made that the default should actually be *more*
restrictive than it currently is.

Historically, TFTP was only ever used on trusted LAN ports, to provide
boot and configuration files for bare and dumb devices.  So binding
to all interfaces and assuming they are trusted wasn't an unreasonable
default.

But given that these days, those files can increasingly contain
sensitive data, like plaintext admin passwords for dumb embedded
devices - and that there is no other access control aside from what
ports you bind this to and how that machine is firewalled - it does
seem irresponsible to open that by default, for naive users who
might carry their laptop around and use it on random untrusted
networks.

Real admins with real servers are going to know how to preseed this
to use their own preference, or are going to be using other tools
to maintain their system configuration anyway.  So maybe we should
err on the side of 'forcing' naive users to explicitly make it more
permissive if that's what they really want, rather than just opening
it to everyone before they've even had a chance to read the man page.

Given that it's increasingly clear that there isn't actually a 'bug'
in this software, just the minor question of whether the default
configuration is still appropriate for expected use(r)s in 2017, it
doesn't seem all that likely that the release team would want to
accept such a change now even if I was convinced we certainly knew
the definitively right answer and pushed it.

If you want to fix the symptom for Stretch, you'd be better off
filing an RC bug against NM for the issue affecting it.

If you really want :69 as your local config for other reasons, you
can already do that today.


Right now, I'm basically seeing 3 options for how to 'close' this
issue here now:

 - Make the default more restrictive, raise the priority of the
   debconf question so more people actually see it, and include
   some explanation of why it's restrictive, and what you might
   want to change it to for particular use cases.

 - Leave the default as is, but tweak the prompt text to be a
   bit clearer (and maybe still raise the priority).

 - Make the default completely permissive as you're suggesting
   and just let anyone who gets burned by that learn their
   mistake The Hard Way.

And if I had to rank them by the amount of (potentially justified)
vitriol that the hate mail I'll get from people who don't like the
new default because it somehow inconvenienced them will contain ...

... then the first one starts looking like a pretty attractive
option ...  and I'm not really sure what arguments to the contrary
might change that.

I'm willing to listen to any that we haven't already heard (I haven't
forgotten them, there's no need to repeat them), and I'm far from
being completely convinced that's a Great Answer.  But it might
really be the Least Worst one for today, all things considered.


  Cheers,
  Ron

#771441#145
Date:
2017-02-03 09:19:06 UTC
From:
To:
I had in mind that at some point in the future (say with ipv8 or
802.11t-2042) the flag might mean more. I'd say the intention is to use
AI_PASSIVE if you plan to listen on this address, so it seemed right to
use it.  But I'm willing to restrict the discussion to the removal of
AI_ADDRCONFIG.

If I want to bind to 0.0.0.0 and no interface (but lo, which might be
good enough for me) is up, this might be intentional. This way I might
be able to speed up system boot because I don't have to wait until all
interfaces are up before I start the daemon. Of course this doesn't help
if I configure tftp to listen on an explicit address, but that's a
different problem that's out of scope of my patch.

IMHO AI_ADDRCONFIG is at best an optimisation for programs that use a
socket to connect(2). It's not that sensible for sockets to listen.

Consider I do:

	tftpd -a tftpd.mycompany.com:tftp

and tftpd.mycompany.com resolves to both an ipv4 and an ipv6 address. If
the server has an ipv4 problem it just starts to listen on the ipv6
address with AI_ADDRCONFIG. Does this sound wrong only to me?

I tried hard to show good reasons that this is not the motivation for
this change. I don't say "drop all sanity checking", I'm only saying
"don't refuse to work as well as you can".

If I don't pass -a that's not "I don't care" but "bind to 0.0.0.0 and
::". To make it more explicit:

 - If the admin requests 0.0.0.0 this is denied because there is no ipv4
   address on any interface.
 - If the admin requests :: this is denied because there is no ipv6
   address on any interface.
 - If the admin requests :: and 0.0.0.0 he gets what he wants even if
   all interfaces have neither an ipv4 nor an ipv6 address.

Still there are valid use cases of using hotplug. I don't see why tftpd
shouldn't try to cooperate in these cases without restricting users that
have different setups.
I want to discuss: "tftpd fails to bind to 0.0.0.0 in some situations
even though it could do as requested."

Also listening on 0.0.0.0 includes listening on lo. For listening
sockets it's ridiculous to special case lo. IMHO it's even wrong that in
the case where there is no address on any interface but lo

	>>> import socket
	>>> socket.getaddrinfo("127.0.0.1", "tftp", socket.AF_INET, flags=socket.AI_ADDRCONFIG)

fails. So IMHO AI_ADDRCONFIG is just band aid that might be used for
clients(!) that fail to use all return values from getaddrinfo.

Right, this is entirely ok and everything else would be wrong. (I didn't
test, but I think also in this case the error message is improved from
"Cannot resolve 8.8.8.8" to "Cannot bind to 8.8.8.8" which IMHO makes
more sense.)

This is not unusual for the common laptop. If it's booted in a train,
then suspended and resumed in the office, tftpd isn't running.

On my machine tftp is the only service having this problem. Pointing to
others that don't behave cooperatively isn't an excuse to not cooperate.

For me it's the other way round. If I request an application to bind to
0.0.0.0 it should try to do this and not be smart with me. Even if the
application thinks I did something wrong, the application should only
complain if I request something impossible.

I fail to follow. Can you please remind me what failure it is that I
introduce?

I'm not talking about the default configuration. I just want to make
tftpd able to bind to 0.0.0.0 when requested to do so.

getaddrinfo and gai_strerror are fine. tftpd uses them wrongly and so the
error doesn't fit what tftpd should actually do.

No, you're talking about that other problem ("What should be the default
binding address of tftpd?").
I'm talking about "tftpd fails to bind to 0.0.0.0 in some situations even
though it could do as requested."

Great, so we can agree on a use case where my patch makes sense. Great.
For me this is good enough to apply the patch given there are no
disadvantages to other use cases.

Right. In this case I have to make something different. I can do so even
if tftpd behaves fine when requested to bind to 0.0.0.0. Binding to
0.0.0.0 might not be part of the solution for the portable server
scenario, but being able to do so doesn't restrict me here. Great!

What is the problem here? I have a Real Server and tftpd is supposed to
be started on 0.0.0.0. Currently it fails to do so if the networking
setup is broken. So after repairing the network configuration I have to
restart tftpd. This is cheap and ok. With my suggested patch I have to
repair the network only and after that tftpd starts serving requests as
it's configured to do. That's even better, isn't it? Even if not, it
doesn't make the situation worse here.

So let's agree that there are some situations where the patch is good,
and in all other situations it doesn't hurt. That's a good enough
justification to apply the patch if you ask me.

I admit I didn't read that completely. The abstract says that the
statement "Be liberal in what you accept, and conservative in what you
send" might have negative consequences to long term maintenance. This
might be true if you expand your application to handle all sort of
broken requests. I don't see how this fits my patch, as it is not about
better guessing what the admin requested if he articulates something
incomprehensible. It's just about doing what the admin clearly requested.

I think you're talking about the default configuration thing again. If
not I cannot follow.

That's wrong. You currently don't have the option to not wear the
seatbelt because -a 0.0.0.0 fails.

Agreed. That's why I posted the patch to the upstream mailing list.

Assuming we're still not in agreement about this patch, it would be
great to get a third opinion.

Best regards
Uwe

#771441#156
Date:
2017-02-04 05:16:46 UTC
From:
To:
 ...

That would seem to be a pretty good summation of how we're failing to
converge here ...

Brainstorming imaginary problems to fit your proposed solution, especially
when you don't clearly say exactly what *your real* use case was here,
doesn't make that solution more compelling or less ill-advised.

If your real problem (aside from the NM bug which is now being tracked
here: https://bugs.debian.org/854078) is that you don't want to bind to
a specific address, just anything that appears at any time, then there
is no bug affecting you, you can already configure tftpd that way.

If it's instead that you do want to configure it to only use a subset of
the available addresses, and some of those addresses might genuinely be
hotplugged, long after boot, independent of the NM bug - then there's a
well known, long established, solution to that too.  Which I've mentioned
several times now.  Whatever brings those interfaces up needs a hook to
restart the services that you want bound to them, for each of the
services that doesn't do that itself by monitoring netlink events.
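
As a rough illustration only, such a hook could look like the Python
sketch below, written as if it were an executable dropped into
/etc/network/if-up.d/ for ifupdown. The environment variables IFACE,
MODE and ADDRFAM are the ones ifupdown exports to its hook scripts; the
function names, the choice to skip lo, and the use of
"service tftpd-hpa restart" are assumptions of this sketch and not a
tested patch for this package.

```python
import subprocess

# Assumption: the name under which the tftpd-hpa service is managed.
SERVICE = "tftpd-hpa"


def should_restart(env):
    """Decide from ifupdown's hook environment whether a restart is wanted."""
    if env.get("MODE") != "start":
        return False
    # Skip loopback and the "--all" summary invocation.
    if env.get("IFACE") in (None, "lo", "--all"):
        return False
    return env.get("ADDRFAM") in ("inet", "inet6")


def run_hook(env, runner=subprocess.call):
    """Restart the service if the environment calls for it.

    Returns the runner's exit status, or None if nothing was done.
    """
    if not should_restart(env):
        return None
    return runner(["service", SERVICE, "restart"])
```

An installed hook would end with a call such as run_hook(os.environ)
(after importing os). A NetworkManager dispatcher script would need the
same decision logic but receives its arguments differently.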

It's the "one simple trick" that solves all the problems you've expounded
here, also solves the problem you admit that your patch doesn't address,
and doesn't have the unfortunate side effects your patch does.

If:

Then why would you insist this crazy patch, which just crudely kludges
over a limited subset of the issues with hotplugged interfaces, leaving
others still broken - is better than one well-trodden solution which
fits all cases?


I don't use NM, so if someone who does wants us to add such a hook for
it to this package, they'll need to send a tested patch to do that.
I'll happily include it (and then close this clone of the bug) if they
do.  Bonus points if you also send one for ifupdown, but so far nobody
has reported wanting to use this with it and genuinely hotplugged
interfaces.

Please, let's focus on good solutions to the real problems rather than
straining to find good problems to fit a partial kludge.  This bug log
is already a maze of twisty little misconceptions - and the aim is to
dig our way *out* of that, not to find new ratholes to get lost in.


What doesn't work if the bug in NM is fixed, and it has a hook to
notify this service of real dynamic interface changes when they occur?

  Ron

#771441#161
Date:
2017-02-04 22:28:08 UTC
From:
To:
Hello Ron,

I mixed too many things that IMHO improve the code but actually only
care about one of those. So I suggest we restart the discussion with
focusing on that one thing only. Let me try that:

Currently tftpd when requested to bind to an address X does in pseudo
code and simplified:

	if X looks like an ipv6 address:
		family = AF_INET6
	elif X looks like an ipv4 address:
		family = AF_INET
	else:
		family = AF_UNSPEC
	addrinfo = getaddrinfo(X, NULL, { .ai_family = family, .ai_flags = AI_CANONNAME | AI_ADDRCONFIG })
	bind(fd, addrinfo)

(where bind() works on both AF_INET and AF_INET6 if getaddrinfo returns
both).
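
The pseudo code above can be approximated in runnable Python for
experiments. This is a sketch under assumptions: the ipaddress module
stands in for the "looks like" checks, whose real implementation in
tftpd is not shown in this thread, and the function names are
hypothetical.

```python
import ipaddress
import socket


def guess_family(address):
    """Pick an address family from the textual form of the address."""
    try:
        parsed = ipaddress.ip_address(address)
    except ValueError:
        # Not a numeric address, so presumably a hostname.
        return socket.AF_UNSPEC
    return socket.AF_INET6 if parsed.version == 6 else socket.AF_INET


def resolve_like_tftpd(address,
                       flags=socket.AI_CANONNAME | socket.AI_ADDRCONFIG):
    """Call getaddrinfo() with the hints described in the pseudo code."""
    return socket.getaddrinfo(address, None, guess_family(address),
                              socket.SOCK_DGRAM, socket.IPPROTO_UDP, flags)
```

Calling resolve_like_tftpd("0.0.0.0") with the default flags and again
with flags=0 allows comparing the two behaviours on a machine with and
without a configured IPv4 address.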

This does the right thing most of the time. There are cases however
where the behaviour is wrong or at least undesirable:

 a) if X = 0.0.0.0 and no interface (but lo) has an ipv4 address,
    getaddrinfo returns an error and tftpd fails to start with

    	cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known

 b) if X is a hostname that resolves to an ipv4 and an ipv6 address and
    the machine currently has no interface (but lo) with an ipv6
    address, tftpd only binds to the ipv4 address.

In case a) my expectation is that tftpd binds to 0.0.0.0 anyhow. I don't
think it is necessary to show a scenario where this is sensible because
I expect a command to do what was requested unless that's impossible.
Nevertheless there are situations where this might make sense:

 a1) On a mobile machine without network access during boot. tftpd might
     later be used when an interface is up (e.g. it is plugged into a
     network later or a virtual machine is booted once it is needed.)
 a2) On a machine where you want to boot quickly and so drop unneeded
     prerequisites. So you can start tftpd in parallel to bringing the
     network up without the need to serialize these two.

In addition to the refusal to bind to 0.0.0.0, the error message is not
understandable to me.

In case b) my expectation is that tftpd fails with something like:

	cannot bind to IPv6 address $ipv6_address

a) can be fixed by just dropping AI_ADDRCONFIG from the call to
getaddrinfo. Also for b) this improves the situation, from

	cannot resolve local IPv6 bind address: $X (...); using IPv4 only

to

	cannot bind to local IPv6 socket,IPv6 disabled: ...

So in this case the error message at least matches the actual problem.
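
The distinction between a resolution failure and a bind failure can be
sketched as follows. The function name and the message texts are
hypothetical and are not tftpd's actual messages; the point is only that
each step reports its own error.

```python
import socket


def open_listener(address, port=0, flags=0):
    """Resolve, then bind, reporting which of the two steps failed.

    Returns (socket_or_None, message).
    """
    try:
        infos = socket.getaddrinfo(address, port, socket.AF_UNSPEC,
                                   socket.SOCK_DGRAM, socket.IPPROTO_UDP,
                                   flags)
    except socket.gaierror as err:
        return None, "cannot resolve bind address %s: %s" % (address,
                                                             err.strerror)
    family, socktype, proto, _canonname, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    try:
        sock.bind(sockaddr)
    except OSError as err:
        sock.close()
        return None, "cannot bind to %s: %s" % (sockaddr[0], err.strerror)
    return sock, "listening on %s port %d" % sock.getsockname()[:2]
```

With no AI_ADDRCONFIG in flags, a numeric address that is not assigned to
any interface gets past resolution and fails at bind(), so the message
names the step that actually went wrong.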

This convinces me it's the right thing to drop AI_ADDRCONFIG for tftpd
as AFAICT there is no down side.

Best regards
Uwe

#771441#166
Date:
2017-02-05 06:35:16 UTC
From:
To:
Just repeating the same things, while ignoring the options I've shown
you that do properly fix the problem(s) you're claiming to care about,
isn't actually advancing this toward a workable solution in any way.
My previous replies to you were already focussed on the part of your
patch that removed AI_ADDRCONFIG, and why it was not needed at best,
and harmful at worst.

I can read the actual code, and understand how gai works, and I'm pretty
sure Mike understood all of that too when he first reported this bug.
I'd already long ago checked that there wasn't a real bug being triggered
somewhere here, and that the code itself really was working as expected,
and you haven't indicated anything to the contrary here.

In the subset of cases where gai is used to resolve a string into a
(set of) network address(es), it is not wrong to tell it that it's
useless to return any address (family) which can't possibly work with
the current machine configuration/state.

That doesn't become less wrong if your expectation or interpretation of
how it should work is different to reality and the specification of how
it is defined to work.  If you explicitly say "bind to 0.0.0.0", you're
saying you want to service global IPv4 requests.  If you have no global
IPv4 interfaces, that should fail and warn the admin of a problem, not
silently ignore that what they explicitly requested isn't going to work.


If what you meant to request was "bind to whatever *is* there, I don't
care what", then just don't pass an explicit address.

If you're allergic to that possibly also binding to IPv6 addresses,
then pass the -4 flag too.

We don't need to disable the sanity check for users who do configure
an explicit address for you to get the behaviour you say you want
from the current code, without any change to it at all.


If what you care about is "faster boot", then the answer to that isn't
"speculatively start things that will fail (or just be useless) if they
lose a race", it's to actually not waste time and resources doing that
until their known prerequisites have been satisfied.


Where it is expected that the available interfaces and/or configured
addresses might change dynamically long after boot, then whatever you
have doing that needs to notify, or reload, or (re)start, (or stop),
anything you are or want to be (not) running, based on that new state
of the system.

If you want that, propose a patch to add a hook.  Otherwise, what you
say you want is already possible without needing the kludge you did
send.


The only bit I'm still struggling to understand, is why you are still
pushing this patch hard instead of using what is already available that
has exactly the behaviour you say you desire, or looking at a more
complete and working solution for dynamic interfaces in general.

The options here and the actions of the code don't look very complicated
to me, so you don't need to "simplify" it on my behalf.  I'm just not
seeing you show any new problem that isn't contrived and that doesn't
already have a good and/or already working solution which doesn't depend
on needing this patch.

What problem isn't satisfied by the options I've shown above and earlier?

  Ron

#771441#171
Date:
2017-02-05 20:56:23 UTC
From:
To:
Hello Ron,

Note I repeated less than before. I hope this will simplify the
discussion and stop both of us arguing about stuff that doesn't matter
much.

Yes, you told me in the situations you care about the modification
doesn't help you. I seem to care about different situations where the
patch is beneficial. So if the patch doesn't make things worse for you
(and all others out there) and it improves the situation for me, IMHO we
should apply the patch.

I don't understand yet in which situations it is harmful, as you claim
above. The best claim matching this is: It papers over other problems.
Is that why you want to keep AI_ADDRCONFIG? I don't understand though
what this buys you. Consider you have a network problem on the
machine tftpd is supposed to run on. The result is that eth0 doesn't
have an ipv4 address. You notice that because a tftp-client tries to
contact the tftpd server and doesn't get an answer. So what to do next?
I assume your next step is logging into the tftpd-machine and check if
tftpd is running. (If instead you try ping $tftpdmachine, or check the
network config of the tftpd-machine, it doesn't actually help you that
tftpd isn't running as it is already obvious then that the network
configuration is at fault and not tftpd.) You see it doesn't run, check
the log and see

	cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known

. Is it obvious now what the problem is? Maybe yes if the network is
still broken. But if not, it's harder to understand.

If instead tftpd had started successfully, you would see this after
login, and the IMHO obvious next step would be to check the network
config. If you ask me, that's not any harder to debug.

I'm not sure about Mike, but that doesn't matter here.

I did. I showed two examples where the use of AI_ADDRCONFIG breaks more
than necessary or expected.

Using AI_ADDRCONFIG suppresses options that actually work. After all I
can bind to 0.0.0.0 even if no interface but lo has an ipv4 address.
After that I can connect to 127.0.0.1:69 and talk to tftpd. Or I can
hotplug a network interface, configure that and talk to tftpd from a
remote machine.
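
The loopback part of this claim is easy to try out. The sketch below
binds a UDP socket to the wildcard address and sends it one datagram via
127.0.0.1. It uses an ephemeral port instead of 69 so that it runs
without root, and the function name is hypothetical. It assumes only
that lo is up.

```python
import socket


def wildcard_receives_via_loopback(payload=b"ping"):
    """Bind to 0.0.0.0, send to 127.0.0.1, return (data, sender_address)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.settimeout(2.0)       # do not hang if nothing arrives
        server.bind(("0.0.0.0", 0))  # wildcard address, ephemeral port
        port = server.getsockname()[1]
        client.sendto(payload, ("127.0.0.1", port))
        data, peer = server.recvfrom(1024)
        return data, peer[0]
    finally:
        client.close()
        server.close()
```

The bind itself does not depend on any non-loopback interface having an
address; it is the getaddrinfo() call with AI_ADDRCONFIG that the thread
identifies as the failing step.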

I understood that none of these is a scenario you depend on. But with
your refusal to see this patch as useful you assume that the above is
not a use case for anybody.

The semantics of binding to 0.0.0.0 are not only "serve on global ipv4
addresses that currently exist". The semantics also include: "serve on
lo" and "serve on interfaces that come up after the socket is bound".

The admin has a problem if a server that is supposed to serve files via
tftp to the world doesn't have an ipv4 address. That alone is a problem
big enough to notice. It doesn't help the admin that tftpd isn't running
or there is a cryptic error message in the log. If the network interface
is ready only later in the boot process everything works as expected as
soon as the interface is up. You say it's bad in this situation that the
admin doesn't notice there is a problem. I wonder if it's not a valid
approach to claim that there isn't a real problem. After all the world
can access the files now.

Do you consider it obvious that

	tftpd -a 0.0.0.0

is semantically different to

	tftpd -4

? I don't. Sure, I can change my setup accordingly now that I know. But
this feels much more like a kluge than making the two commands above
behave identically. And sure, we can even document this behaviour and
claim it to be a feature. But that's not intuitive, and I bet people
will stumble over this in the future and wonder.

I failed to understand this. If someone uses:

	tftpd -a 1.2.3.4

my patch doesn't make tftpd magically work. The difference is that tftpd
in the "no ipv4 address available" case with my patch says:

	failed to bind to 1.2.3.4

instead of

	cannot resolve local IPv4 bind address: 1.2.3.4

without the patch. I say this is an improvement.
Maybe I didn't understand you correctly here.
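
The distinction between the two messages can be sketched as follows.
This is only an illustration under my own assumptions, not the patch
itself: the function and enum names are hypothetical, the message
wording is approximated from the examples above, and AI_NUMERICHOST is
my choice to keep the lookup free of DNS.

```c
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes naming the stage that failed. */
enum bind_stage { BIND_OK, BIND_RESOLVE_FAILED, BIND_SOCKET_FAILED, BIND_FAILED };

/* Resolve a numeric IPv4 address without AI_ADDRCONFIG, then try to
 * bind a UDP socket to it, reporting which step went wrong. */
static enum bind_stage try_bind(const char *addr, const char *port)
{
    struct addrinfo hints, *res = NULL;
    enum bind_stage stage = BIND_OK;
    int rc, fd;

    assert(addr != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_PASSIVE | AI_NUMERICHOST;   /* no AI_ADDRCONFIG */

    rc = getaddrinfo(addr, port, &hints, &res);
    if (rc != 0) {
        /* Only a malformed address or port should end up here. */
        fprintf(stderr, "cannot resolve local IPv4 bind address: %s, %s\n",
                addr, gai_strerror(rc));
        return BIND_RESOLVE_FAILED;
    }

    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) {
        perror("socket");
        stage = BIND_SOCKET_FAILED;
    } else {
        if (bind(fd, res->ai_addr, res->ai_addrlen) < 0) {
            /* The address is valid but cannot be bound right now. */
            fprintf(stderr, "failed to bind to %s: %s\n", addr, strerror(errno));
            stage = BIND_FAILED;
        }
        close(fd);
    }

    freeaddrinfo(res);
    return stage;
}
```

With this split, an address that no interface currently carries is
reported by bind() rather than by the resolver, while 0.0.0.0 passes
both steps.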

tftpd when started with -a 0.0.0.0 with eth0 coming up later isn't
failing or useless. It does exactly what it is supposed to do when
AI_ADDRCONFIG is dropped. That is, tftpd binds to 0.0.0.0.

If you enlarge the problem I want to solve, my patch isn't suitable any
more. That objection applies to every improvement, so it's not sensible
to use it against one. It also wasn't helpful to retitle the
Debian bug to 'please add a restart hook for hotplugged interfaces' from
'tftpd-hpa fails to start properly if network is unavailable'. If
handling hotplugging is your wish, I suggest opening a separate bug for
that without hijacking this one.

Making software behave better in some situations without making it
worse in any other is a good enough reason to consider the change. That
is still true even if after patch application there are still some
situations that could be improved.

I do this because tftpd doesn't behave as I expect. I struggled because
I didn't understand what the error message wanted to tell me. And I
assume that others struggle in the same way in this situation. And then
there is that idea of open source, where you can actually improve the
stuff you work with ... So my suggested patch might eventually save
others from spending time understanding tftpd-hpa, as it no longer fails
for them with strange error messages.

It makes the behaviour of tftpd match what I expect. And I believe
others expected (or will expect) the same.

I think it doesn't make sense to continue discussing between the two of
us. If this mail isn't enough for you to see my point I suggest we try
to find a few other people (ideally hpa among them) to give their
opinion. I added the people that commented on the Debian bug so far
explicitly to Cc:. Maybe you can speak up. (If you're missing context,
reading https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771441#161
should be enough.)

Best regards
Uwe

#771441#176
Date:
2017-02-14 19:17:06 UTC
From:
To:
Okay, let me chime in here.

AI_ADDRCONFIG seems to be the Wrong Thing[TM].
AI_PASSIVE seems to be the Right Thing[TM].
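
As a rough sketch of the difference (an illustration only, not the
tftp-hpa code; the helper name and its extra_flags parameter are
hypothetical), the lookup could be written so that the flags are easy
to compare:

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve an IPv4 UDP bind address with AI_PASSIVE
 * plus any extra ai_flags. Returns the getaddrinfo() result code, which
 * is 0 on success. */
static int resolve_bind_address(const char *addr, const char *port,
                                int extra_flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    /* AI_PASSIVE yields the wildcard address when addr is NULL;
     * with a non-NULL addr it has no effect. */
    hints.ai_flags = AI_PASSIVE | extra_flags;

    rc = getaddrinfo(addr, port, &hints, &res);
    if (rc == 0) {
        assert(res != NULL);   /* success always returns at least one entry */
        freeaddrinfo(res);
    }
    return rc;
}
```

Calling resolve_bind_address("0.0.0.0", "69", 0) or
resolve_bind_address(NULL, "69", 0) does not depend on which interfaces
are configured. Passing AI_ADDRCONFIG as extra_flags is the variant
that, according to this bug report, fails with "Name or service not
known" while no IPv4 address is configured.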

Part of the problem is that the fallback code for the case of
getaddrinfo() not being there is braindead, and of course the original
code used to use gethostbyname() directly.  I already have a much better
fallback version of getaddrinfo() written which would let us make much
better use of the getaddrinfo() interface.

Now, what I want to know is why you are specifying the accept-all
address explicitly as 0.0.0.0 instead of an empty string.

	-hpa

#771441#181
Date:
2017-02-14 20:32:43 UTC
From:
To:
Hello hpa,

That's great, thanks.

And you even seem to agree with me, that's even better :-)

Do you still care about platforms without getaddrinfo? This is even in
POSIX.1-2001.

The really right thing to do would be not to use one socket for IPv4
and another for IPv6, but just to iterate over the result of getaddrinfo
and open a socket for each addrinfo. But let's not do more than one
thing at a time.
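
For the record, a minimal sketch of that approach, assuming nothing
about how tftp-hpa would actually structure it: the function name and
the fds/max parameters are hypothetical, and setting IPV6_V6ONLY is my
own choice so that the IPv6 wildcard socket does not also claim IPv4.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open and bind one UDP socket per address that
 * getaddrinfo() returns. A NULL node together with AI_PASSIVE asks for
 * the wildcard addresses. Returns the number of sockets stored in fds,
 * or -1 if the lookup itself fails. */
static int open_listeners(const char *node, const char *port,
                          int fds[], int max)
{
    struct addrinfo hints, *res = NULL, *ai;
    int n = 0;

    assert(fds != NULL && max > 0);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_PASSIVE;        /* no AI_ADDRCONFIG */

    if (getaddrinfo(node, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL && n < max; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                   /* family not supported, skip it */
#ifdef IPV6_V6ONLY
        if (ai->ai_family == AF_INET6) {
            int on = 1;
            /* Keep the IPv6 socket from also accepting IPv4 traffic. */
            setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
        }
#endif
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) < 0) {
            close(fd);
            continue;
        }
        fds[n++] = fd;
    }

    freeaddrinfo(res);
    return n;
}
```

A caller would pass the configured address (or NULL for "all
addresses") and then wait on all returned descriptors, for example with
select() or poll().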

That's because that's the default of the Debian tftpd-hpa package. If
you repeat your question about the Debian default, there (I think) the
answer is: it's a relic that predates IPv6 support. OK, probably
'' would already have worked back then. I can only guess about the
reasons; maybe it conflicted with the maintainer scripts that ask for
the default bind address during installation.

Best regards
Uwe

#771441#186
Date:
2017-10-29 08:38:56 UTC
From:
To:
I intend to give you a part of my fortune as a voluntary financial
donation. Respond in order to take part.
Wang Jianlin
Wanda Gruppe