Dear Maintainer,
When a Wheezy or Jessie machine is fitted with an SSD, it often boots so quickly that tftpd-hpa is started before the network is fully configured. The problem is reproducible with sysvinit (on Wheezy) and systemd (on Jessie), although it may be easier to reproduce with systemd. The same problem can be observed by attempting to start tftpd-hpa by hand when no network connections are available. When tftpd-hpa fails to start, daemon.log contains:
in.tftpd[881]: cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known
The error appears to be due to getaddrinfo(3) failing when attempting to look up "0.0.0.0:69". The daemon starts successfully when the network is unavailable if the default /etc/default/tftpd-hpa is changed:
Hi Mike, What are you using to set up your network? If it's ifupdown, then for sysvinit I'm going to guess you're hitting a known bug with allow-hotplug, which bites more than just this service. Most people fix that by using 'auto' instead of allow-hotplug. For systemd it's a bit more complicated, but I believe that should also work for Jessie with systemd too, at least once the udev settle fix is restored to it (if it hasn't been already). If you're using something else, it would probably be good to get that information on the record, since there may still be more than one bug here which people should give some attention to. Thanks for that. I'm going to need to give the consequences of doing this a bit more thought than I have right this moment, but I'll try to do that sometime soon. Cheers, Ron
Hi Ron, Thanks for the quick response. It looks like it's currently NetworkManager on all the machines I've seen this on. I thought I'd seen the problem with ifupdown too, but I no longer have any evidence to support that. Of course, on a laptop it is perfectly normal not to have any working network interfaces at boot time, so it seems rather unfair of tftpd-hpa not to start when it is not configured to be bound to a specific interface. Thanks. Mike.
With ifupdown and allow-hotplug it's definitely possible, since unlike auto, that doesn't wait for the defined network(s) to come up before the rest of the init scripts continue (and we've seen lots of services fail due to that). For NM, I'm less sure of what the mechanics might be, but I know who to talk to about that. Are you really running this on a laptop, or is that just an example? It's also not quite clear to me yet exactly what the desirable default behaviour should be in such a case. Would you really want it to bind to any random wifi hotspot that NM on your laptop might find? That doesn't quite seem ideal either ... Ron
If NM is connecting to WiFi or perhaps with stuff like 802.1x (not that I'm using such things) then the network might not even come up until after login. I am. We use TFTP for booting embedded Linux devices during development. Indeed. But I'm not really making anything super-secret available via TFTP and don't allow writing. Having said that, I don't think that NetworkManager will randomly connect to networks it doesn't know about but perhaps it could be fooled into doing so if the ESSID matches even if the BSSID doesn't. :( But I don't think that the default of :69 is any worse than 0.0.0.0:69 would be though - unless you have a deep distrust of anyone on IPv6. :) Thanks. Mike.
Right, I'm not saying it's necessarily wrong for someone to configure it like this explicitly if they are sure it's ok for their use case, and the INADDR_ANY default is almost surely just because this predates support for IPv6 and hadn't been looked at again since. And also surely, at least in part, because it also predates people using this on laptops in potentially hostile environments where network interfaces it might bind to can come and go with the wind ... Which just has me wondering more generally if either of these is still an appropriate *default*, and if not what might be more appropriate. I'm not sure we need to go full tin-foil here, but there seem to be at least a few things here that probably could do with a bit more thought. Another question I don't have the answer to right now off the top of my head is: will this even listen on interfaces that come up after the daemon is started without us explicitly using IP_FREEBIND? Having it not fail at startup isn't a lot of help if it still won't actually communicate on those interfaces. That's easy enough to test, but I'm not remembering the answer to this being a definite "yes it will" ... Ron
I think you're probably right. It would make sense to require an explicit action during configuration to force a decision on which interfaces it should be bound to (as exim4-config does). I'm reasonably sure that binding to INADDR_ANY will accept connections on any interface that appears in the future too. Thanks. Mike.
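Ron's question (does a wildcard listener cover interfaces that only come up later?) can be partly checked from userspace. The sketch below, which uses a hypothetical helper name wildcard_receives, binds a UDP socket to 0.0.0.0 and sends it a datagram addressed to a specific local address. It cannot bring up a real interface, so it only shows that the wildcard socket is not tied to any one address: the kernel matches it against whatever local address a datagram is destined for at delivery time, which is why addresses configured after startup are covered as well.

```python
import socket

def wildcard_receives(dest_addr: str, timeout: float = 2.0) -> bytes:
    """Bind a UDP socket to INADDR_ANY (0.0.0.0), send it a datagram
    addressed to dest_addr, and return what the wildcard socket received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("0.0.0.0", 0))  # port 0: let the kernel choose a free port
        server.settimeout(timeout)   # raise instead of hanging if nothing arrives
        port = server.getsockname()[1]
        client.sendto(b"ping", (dest_addr, port))
        data, _peer = server.recvfrom(512)
        return data
    finally:
        server.close()
        client.close()

print(wildcard_receives("127.0.0.1"))
```

On Linux, where the whole of 127.0.0.0/8 is local, wildcard_receives("127.0.0.2") should also succeed even though that address is never named in bind().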
Hello, since this behaviour is still present in Jessie and is rather inconvenient: what's the recommended workaround nowadays? I want to serve on 2 networks, so setting my IP address won't cut it. Does changing to TFTP_ADDRESS=":69" cause any issues, or is an if-up.d script still the best option? Kind Regards, Norbert
I went ahead and patched the init script so you can omit all variables in the configuration. In my opinion it's cleanest to just not set the TFTP_ADDRESS variable; the script then doesn't pass the --address option to in.tftpd.
--- tftpd-hpa.save 2015-05-21 11:23:28.841023590 +0200
+++ tftpd-hpa 2015-05-21 11:13:41.716011919 +0200
@@ -25,6 +25,7 @@
 set -e
 [ -r "$DEFAULTS" ] && . "$DEFAULTS"
+TFTP_DIRECTORY="${TFTP_DIRECTORY:-/srv/tftp}"
 . /lib/lsb/init-functions
@@ -53,7 +54,7 @@
 done
 start-stop-daemon --start --quiet --oknodo --exec $DAEMON -- \
- --listen --user $TFTP_USERNAME $TFTP_ADDRESS \
+ --listen ${TFTP_USERNAME:+--user $TFTP_USERNAME} ${TFTP_ADDRESS:+--address $TFTP_ADDRESS} \
 $TFTP_OPTIONS $TFTP_DIRECTORY
 }
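The patch relies on two shell expansions: ${VAR:+word} expands to word only when VAR is set and non-empty, and ${VAR:-default} substitutes default when VAR is unset or empty. As a rough illustration, here is a Python mirror of the argument list the patched start-stop-daemon line would build. build_tftpd_args is a hypothetical helper, not part of the package, and str.split() only approximates shell word splitting (no globbing or quoting).

```python
def build_tftpd_args(env: dict) -> list:
    """Mirror the patched init script's in.tftpd argument list."""
    args = ["--listen"]
    # ${TFTP_USERNAME:+--user $TFTP_USERNAME}: only when set and non-empty
    if env.get("TFTP_USERNAME"):
        args += ["--user", env["TFTP_USERNAME"]]
    # ${TFTP_ADDRESS:+--address $TFTP_ADDRESS}: omitted entirely otherwise
    if env.get("TFTP_ADDRESS"):
        args += ["--address", env["TFTP_ADDRESS"]]
    # unquoted $TFTP_OPTIONS is word-split by the shell
    args += env.get("TFTP_OPTIONS", "").split()
    # TFTP_DIRECTORY="${TFTP_DIRECTORY:-/srv/tftp}"
    args.append(env.get("TFTP_DIRECTORY") or "/srv/tftp")
    return args

print(build_tftpd_args({}))
print(build_tftpd_args({"TFTP_USERNAME": "tftp", "TFTP_ADDRESS": ":69",
                        "TFTP_OPTIONS": "--secure"}))
```

With an empty environment this yields just --listen and the default directory, which is the "omit all variables" case Norbert describes.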
Hi Norbert, Can you elaborate a little more on exactly what configuration you have with Jessie that you see this happening on? (i.e. what init system, what brings up your network(s), etc.) If there's still a real bug on that side of things, I'd rather we know about it and address it at the source than just sweep it under the rug here, since the latter would just push the bug off onto some other use case for people to run into again. I have this serving on 4 networks with an address set, so that alone isn't a problem. What's the situation in your case where it won't? As we discussed earlier in this bug, if that's what you *want* it is fine, but if there is some other bug that is still causing this problem, that won't help people who do want or need to restrict this to just a subset of the available network addresses - so we probably need to identify the real bug that you've hit so that we can fix it for them too. (which is an independent question from your patch to allow running this with just the tftpd default options for people who choose that too, which seems like it's probably a reasonable idea as well). Cheers, Ron
2015-05-21 16:59 GMT+02:00 Ron <ron@debian.org>:
Bog-standard Jessie x86_64 GNOME 3 desktop. Systemd and NetworkManager, I guess; the point is that installing tftpd-hpa doesn't work out of the box. Sure, and I couldn't tell what the current state is from looking at this thread. One is a local 192.168.x.x network at my desk and the other a bigger 10.x.x.x network. I'd bet some real money that the primary use for TFTP is running it on servers, not on a laptop that identifies your home/internal network by the IP address you got assigned. Is that the use case you are talking about? Just wondering about the reasoning here; I would never have thought someone would "secure" their important data on a "trivial" FTP server by expecting to get a different IP in wireless networks than at home (not to mention someone maliciously handing your laptop that special IP via DHCP). To each his own, and it's nice that it's working for you, but I really can't follow the arguments, though then again I don't know everything about the network stacks involved. I will just state the issue more clearly, along with what I expect.
Problem: I want to host a TFTP server for fetching firmware via a bootloader. I have two network interfaces where I want the files accessible. Securing TFTP doesn't matter to me (if it did, I'd prefer locking it to interfaces); I want a painless and easy setup, ideally so I can give the company simple steps to reproduce (apt-get install tftpd-hpa; echo done).
State: We are using Wheezy and I added an if-up hook (which I consider a mean hack). At some point we will move to Jessie, so I tested an untouched installation and wrote down the necessary steps.
Issue: The tftpd service fails when booting (as in, the service is stopped), supposedly because the network interface(s) aren't up.
Workarounds:
1) Manually restarting the service later fixes the issue.
2) Changing to TFTP_ADDRESS=":69" fixes the issue.
3) Applying my patch and omitting the --address option works too.
I strongly believe it doesn't matter if you unplug/replug the wire after tftpd was successfully started, but I would have to test this. So, in short, I hope you don't take any offense, but to me it's clear that TFTP_ADDRESS="0.0.0.0:69" should mean the same as TFTP_ADDRESS=":69", i.e. use any IP. Conversely, if we ignore common socket rules and TFTP_ADDRESS="0.0.0.0:69" means binding to one fixed IP, "0.0.0.0", then the defaults would need to be adjusted. From this perspective I don't understand many of the arguments above. Kind Regards, Norbert
Well it's been working out of the box for me, and apparently for almost everybody else too - and the problem with systemd was supposed to have been solved by its maintainers before the release - so the details of exactly why you got hit by it were kind of pertinent here ... From a sample space of two, Networkmanager appears to be emerging as the common culprit ... Yes, I mean 4 networks as in separate network interfaces with 4 separate IPs, with the one tftpd instance serving all of them. That part isn't the problem here. I'm running this on a "Real Server", yes - which also means no Gnome desktop and no NM - but the only person who had or reported the same problem as you have was running this on a laptop ... Which surprised me a little too, but appears to be a real and valid use case as it was described. I'm not following what you're trying to say there either, or where you got that line of thinking from, so I can only sympathise if you're confused by it :) Network booting is what I understand most people want this for, yes. It's all I use it for. It's not really the right tool for much else. And that's exactly what the default configuration gives most people, with the debconf prompt letting you set the address(es) it will bind to explicitly if you want something different to that. Which is exactly what the default of INADDR_ANY was originally intended to do. It's a little dated, showing its age from the era before tftpd supported IPv6, but IPv6 support isn't your problem here either. Which appears to be an issue with Networkmanager ... The ifupdown support for both sysvinit and systemd should not have this problem, and lots of things other than this will indeed fail to bind to their listening addresses if those interfaces are not present and configured on the system. Which are all basically just hacks around the problem of having booted your system with a non-functional network. Which appears to be a problem that people using Networkmanager have. 
I don't know what its proposed solution to that is, but if it doesn't have one, then it's not really suitable for use on servers. "the wire" doesn't have anything to do with this. Networkmanager not configuring your network before services that want to use it start is the problem you appear to be having. I don't see why you worry that might offend me, but if that's "clear" to you, then you probably need to take a closer look at what 0.0.0.0 aka INADDR_ANY actually means as a listening socket address :) In the world where IPv6 exists, those two are very much not the same. I don't know what you mean by "common socket rules", but 0.0.0.0 has a very specific and well-defined meaning. And I still don't know exactly which "arguments" you're referring to here. AFAICS we have an open question about what the best default tftpd configuration should be, partly because it now supports IPv6, and partly because the world is a different place to what it was when this was first picked and perhaps we ought to be a little more defensive by default - but there's already a debconf prompt that lets you pick that for yourself anyway, so this is 'important' but not urgent. And then there is a problem, which is by no means specific to tftpd, which appears to essentially be a Networkmanager design and/or configuration flaw. The only thing that is certain about the first question is that whatever we choose, it shouldn't be on the basis of hiding Networkmanager bugs. How you hack around those locally if you insist on using it is up to you, but we can't "fix" those in this software, short of rewriting parts of it to listen to netlink events for dynamic interface creation. ... which probably isn't really a very high priority for Server Use. That's much more of a laptop problem.
I'm a bit confused about what you're aiming for here, because you're "betting" this software is for server use, but seem confused about why it doesn't work so well in its current default configuration if you use laptop tools to configure your network. Yes, you can configure a hack around that in some of the software it causes problems with - but that's a different question to what is actually the best default for *this* package to ship with in a hands-off install. I'm pretty sure we don't ever want to omit the --user option and have this run as 'nobody', but not supplying --address might have some use or merit. It's less clear-cut whether that should be the default though, or whether it's functionally different in any significant way to what is already possible ... Cheers, Ron
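To make Ron's point concrete: "0.0.0.0" is just the text form of INADDR_ANY, the all-zero IPv4 address, so a socket bound to it is an AF_INET socket and can only ever exchange IPv4 datagrams. The IPv6 wildcard "::" is a different, 16-byte address in a different family. (On Linux with the default net.ipv6.bindv6only=0, an AF_INET6 socket on "::" can additionally receive IPv4 traffic as v4-mapped addresses; the reverse is never true.) A small sketch, with hypothetical helper names:

```python
import socket

def wildcard_address_bytes(family: int) -> bytes:
    """Binary form of the wildcard ("any") address of an address family."""
    text = "0.0.0.0" if family == socket.AF_INET else "::"
    return socket.inet_pton(family, text)

def bind_ipv4_wildcard(port: int = 0) -> socket.socket:
    """UDP socket bound to 0.0.0.0 (INADDR_ANY); being AF_INET, it only
    ever sees IPv4 datagrams, whatever interfaces exist."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", port))
    return s

# "0.0.0.0" is the all-zero IPv4 address, i.e. INADDR_ANY.
assert int.from_bytes(wildcard_address_bytes(socket.AF_INET), "big") == socket.INADDR_ANY
# The IPv6 wildcard is a 16-byte address in another family.
print(len(wildcard_address_bytes(socket.AF_INET6)))
```

So TFTP_ADDRESS="0.0.0.0:69" names an IPv4-only wildcard, while ":69" leaves the address unspecified, which is why the two are not equivalent once IPv6 is in play.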
Hi Ron 2015-05-22 2:08 GMT+02:00 Ron <ron@debian.org>: tftp-hpa is affected. I also use a fixed IP/DNS via the GNOME GUI (what exactly the GUI affects and how NM is hooked into it is unclear to me). At home I have two systems where I killed NM and used the interfaces config file, exactly because this thing has already caused me a lot of trouble. And I am far from the only one with the problem; I found reports of it on other sites. Ubuntu has this issue too: https://bugs.launchpad.net/ubuntu/+source/tftp-hpa/+bug/972845 I can dig out more references from my browser cache when I'm back at work next week. I'd remove NM in a heartbeat if that didn't mean messing up the dependencies and losing the GUI for applying settings. I still don't know the difference in meaning between TFTP_ADDRESS=":69" and TFTP_ADDRESS="0.0.0.0:69". And to me the best default would be to just use the default TFTP port, which is achieved most consistently by omitting the --address argument (IMHO). Whatever it does, it makes NM and tftp-hpa coexist peacefully; I don't know how long it will take until NM has no bugs. This sounds like the network doesn't "exist" before NM, which I believe to be wrong. Shouldn't it be possible to use any IP and Unix sockets before services are started? That's not clear to me; from what I understand, calling listen on an unbound socket will pick a random port, and binding to INADDR_ANY:port beforehand should do the same on a fixed one. It might be different with IPv6, please enlighten me: http://man7.org/linux/man-pages/man7/ip.7.html The argument I read from this is that TFTP_ADDRESS="0.0.0.0:69" is something desired and it shouldn't surprise anyone if it doesn't work. (I never got the debconf prompt, by the way.) Unless you have a laptop and connect to wireless networks, I don't find NM very appealing anyway. But using alternatives is a hassle. I agree, but it's also a matter of time, isn't it? Until this is fixed correctly I still need to set up some systems for use.
Better defaults that happen to work even with the bugs unfixed would be great; a simple configuration or workaround is OK too. From my experience with Wheezy and open-vm-tools, fixed versions in the main repos can take until the next release. It's a server/workstation problem if the same software packages are used. I don't WANT to use laptop tools; NetworkManager is the default in pretty much anything with a GUI. I don't want to use it, but using alternatives comes at the cost of maintaining configurations and (dist-)upgrades not working smoothly. Actually I don't even know which GUI packages I'd have to replace so I could set my IP and DNS in the interfaces file. By out-of-the-box I meant just using the standard GNOME desktop, which should be the best maintained and most usable for most people who don't know what NM and tftp-hpa are, i.e. something that saves me work running around and editing files. I was taking care of the case where all variables, or even the defaults file, are missing (which is handled in the script). You could simply set tftp as the username if the variable is missing. Kind Regards, Norbert
It's not only tftp-hpa that would get burned by this. Any network-using service that needs to resolve or use an address would fail in exactly the same way if started before your network is up. And lots of them do. This isn't a new problem, you're just lucky that it's the only thing that you are seeing it on. Which is probably a big part of the problem here. The getaddrinfo(3) call is failing because your network is not yet configured, and anything which calls that (which means just about everything with IPv6 support that is even half sane) would fail in the same way that this does if started at the same point in your boot sequence. As would anything that tries to bind to a specific address before the interface it is assigned to is up. You could do a lot worse than do the same on this system too then :) "Upstart in broken boot dependency shocker. film@11" The fix suggested there is more correct though, since it tries to address the broken boot dependency, not hack around it by exploiting an implementation detail of the tftpd configuration which will only "work" for some users. "me too" references aren't really helpful here. The problem isn't a mystery and the solution to it isn't a popularity contest. Unless the broken boot dependency is fixed in NM, this problem will still exist. Changing the default config so that it doesn't affect *you* as a side-effect won't make it go away for other people who need a different config. I've left this bug open here because the question of whether the old default (which has been what it is for much longer than I've been maintaining this package) should be changed is an open one worth some further thought -- but what we change it to if we do doesn't really depend on "will this be broken with NM" because until NM is fixed it will *always* be broken with some/most valid tftpd configurations (and it won't be the only thing that is). The answer to that needs to depend purely on "what is sane/safe for a default tftpd install".
We can note there are workarounds with varying degrees of ugly that may work for some people in some circumstances, but we really can't "fix" NM from here - that needs to happen in NM itself. Cheers, Ron
I would separate the TFTP use cases then:
*) Binding to an IP address. This raises the question of when and how the tftp daemon should be started. The necessary hook would be: Event "TFT_IP is available" -> start tftp daemon. As far as I know this is not part of init systems, and the tftp package is missing the hooks (if-up.d and/or whatever the equivalent is in NM). Otherwise, how would the init system or network manager know when the specific IP that tftp needs is available? What if the cable is not plugged in, and how long should it wait for all interfaces to become available (and possibly configured by DHCP)?
*) Listening on any network. This shouldn't be affected by any service like NM, or by whichever interfaces are available or configured. This should be the default, and so far the default configuration tries to achieve that and fails (for whatever reason). To me this is a bug in tftp or its configuration, both related to this package.
I'd also appreciate it if you could explain the difference between 0.0.0.0:69 and :69 as the address option, and what 0.0.0.0 means for IPv6; a link to the documentation would suffice. Can you set up a VM with a fresh Jessie GNOME 3 installation? I don't know about the setup you use, but as I said, I want some easy steps that work on plain systems most people are familiar with. Kind Regards, Norbert 2015-05-22 13:35 GMT+02:00 Ron <ron@debian.org>:
[It seems that I forgot to subscribe to this bug so I didn't see that there had been activity since the original exchange.]
In Message #75 2015-05-22 13:35 GMT+02:00 Ron <ron@debian.org> wrote:
I don't believe that it's sensible for tftpd to assume that all network interfaces are up before it is started. We no longer live in a world of fixed network configurations. I believe that leaves daemons such as tftpd with three choices:
1. Bind to INADDR_ANY and accept connections on any network interfaces as they appear. Rely on firewalling to avoid unwanted connections.
2. Bind to network devices rather than addresses using SO_BINDTODEVICE. This isn't ideal since network devices may have multiple addresses.
3. Monitor network interfaces via netlink and bind to them as they appear.
I believe that all but the first of these require non-trivial development work. The current tftpd-hpa package defaults to being available on all interfaces via an IPv4 address. In Message #25, Ron rightly questioned whether this is still a sensible default. But, as I said in Message #30, I don't believe that changing the default to TFTP_ADDRESS=":69" makes the situation any worse, and it does mean that tftpd works correctly when no network interface is available at startup. Maybe if the default was changed then it could be turned into a debconf question? Mike.
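One way to see the difference between "0.0.0.0:69" and ":69" that keeps coming up: TFTP_ADDRESS has the form [address][:port], and ":69" simply leaves the address part empty, so there is no literal address to resolve at all. That is consistent with Mike's observation that ":69" starts fine with no network, but note that the parser below, split_tftp_address, is a hypothetical illustration and not tftpd's own parsing code, which may differ in detail.

```python
def split_tftp_address(spec: str, default_port: str = "69"):
    """Split an "[address][:port]" string into (host, port).

    host is None when the address part is empty ("no specific address").
    IPv6 literals are expected in brackets when a port follows, e.g. "[::]:69".
    """
    if spec.startswith("["):
        host, _, rest = spec[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    elif spec.count(":") == 1:
        host, _, port = spec.partition(":")
    else:
        # bare IPv4 address/hostname, or an unbracketed IPv6 literal
        host, port = spec, ""
    return (host or None, port or default_port)

for spec in ("0.0.0.0:69", ":69", "[::]:69", ":tftp"):
    print(spec, "->", split_tftp_address(spec))
```

Under this reading, "0.0.0.0:69" hands a literal IPv4 address to the resolver, while ":69" leaves the choice of wildcard address(es) to the daemon.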
Sure, but the majority of devices in that world are now phones, and
devices which won't typically host network services. And they'd stop
being very useful if the servers they relied on started roaming around
the network at random like an end-user device does.
If you have servers doing that, you have bigger problems to fix than
this. I don't think we can apply "50 million blowflies can't be wrong"
logic to this, we need to look at what is most appropriate for a server
if we're talking about what the default configuration should be.
You can't 'fix' that by just changing tftpd, there are lots of server
applications which assume or require this as part of configuring them
securely. Breaking that assumption has larger consequences than just
needing to implement netlink monitoring in them all.
Or leaves applications like NM with two choices to support people
running services:
1. Give them an option to block waiting for interfaces which are
expected to be up to actually come up, before starting services that
will be offered on them.
2. Provide hooks to (re)start or stop the services that are bound
to particular interfaces when those interfaces come or go.
And realistically, I think it must provide both those things to be
considered a functional application, regardless of what other
services do.
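For comparison, option 1 can be approximated on the service side by retrying the bind until the expected address exists. This is only a sketch of the idea Ron wants NM to provide (NM's own mechanism is NetworkManager-wait-online.service); wait_until_bindable is a hypothetical helper, and it relies on bind(2) failing with EADDRNOTAVAIL for an IPv4 address that is not yet configured locally.

```python
import errno
import socket
import time

def wait_until_bindable(host: str, port: int = 0,
                        timeout: float = 30.0, interval: float = 0.5) -> socket.socket:
    """Retry binding a UDP socket to (host, port) until the address is
    configured locally, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.bind((host, port))
            return s  # caller owns the bound socket
        except OSError as e:
            s.close()
            if e.errno != errno.EADDRNOTAVAIL:
                raise  # a different problem, e.g. port in use or no permission
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host} not configured after {timeout}s") from e
        time.sleep(interval)

sock = wait_until_bindable("127.0.0.1", timeout=5.0)
print(sock.getsockname())
sock.close()
```

Polling like this in every daemon is exactly the "rest of you rewrite your code" cost Ron objects to, which is the argument for the network manager providing the ordering instead.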
Don't get me wrong, I've been a huge fan of all things hotplug since
long before NM even existed, and support for that in the kernel
became widespread - but it is a Hard problem to solve well, and you
can't solve it by kicking the can down the road and saying "the rest
of you will all need to figure out the problems this creates and then
rewrite your code".
See https://bugs.debian.org/816087#15
for another example of how "just let everything race" can go badly
wrong if you turn off the traffic lights at a major intersection
without paying enough attention to the consequences of that.
What if it already was a debconf question? ;)
The default is only used for people who don't answer that themselves
with something different.
What the default should be is mostly a balancing act between what is
sane for a relatively naive user who doesn't know what they should
answer there, and what would be right out of the box for most people
without 'special needs'. Whether it should be changed now, also adds
the question of line of least surprise for existing users, so there
is some inertia and risk there which we shouldn't ignore if changing
it now is not the clearly compelling thing to do.
I'm inclined to think that running this on a laptop is a special case.
And that changing it "because otherwise NM breaks" would be hiding a
bug in NM rather than fixing it at the real cause.
But I'm not ruling out that there might be other compelling reasons
to still change the default for this at some point. Whether that
should be to :69, or to something else, is still an open question.
Ron
Oh, it is. :) I'd never noticed since it is low priority so I'd never been asked it. :( The full text of the question is: If the default were changed to :69 then the observable change to users who accept that default would be that IPv6 would start working when previously it hadn't. I suspect that Debian is full of services that started working on IPv6 upon package upgrade (mostly in the now distant past.) I can't help wondering whether there might be more users now installing tftpd-hpa on their desktop or laptop in order to backup their router configuration or boot some embedded device, than on a server to boot a room full of diskless workstations. Maybe I'm guilty of being skewed by my own experience. You won't be surprised to hear that I think that :69 makes a more sensible default. I believe that this is only slightly influenced by that default also ensuring that tftpd-hpa starts even when the network interface isn't yet up. Thanks. Mike.
Unfortunately, I appear to have generated the patch backwards which is somewhat confusing. I hope that this one is correct. Mike.
Hello,
FTR: I'm annoyed by the behaviour of tftp here, too.
I acknowledge Norbert's expectations for the ipv4 world. So I did the
following with python to test my expectations:
>>> import socket
>>> fd4 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
>>> fd4.bind(("0.0.0.0", 6969))
This works fine with and without connections in Networkmanager. The
thing that tftpd stumbles over however is (tftpd/tftpd.c, line 640):
>>> import socket
>>> socket.getaddrinfo("0.0.0.0", None, socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP, socket.AI_CANONNAME | socket.AI_ADDRCONFIG)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/socket.py", line 733, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
(in the case where no connection exists) vs
>>> import socket
>>> socket.getaddrinfo("0.0.0.0", None, socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP, socket.AI_CANONNAME | socket.AI_ADDRCONFIG)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '0.0.0.0', ('0.0.0.0', 0))]
(when there is a connection).
The problem here is (quoting getaddrinfo(3)):
If hints.ai_flags includes the AI_ADDRCONFIG flag, then IPv4 addresses
are returned in the list pointed to by res only if the local system has
at least one IPv4 address configured, and IPv6 addresses are returned
only if the local system has at least one IPv6 address configured. The
loopback address is not considered for this case as valid as a configured
address. This flag is useful on, for example, IPv4-only systems,
to ensure that getaddrinfo() does not return IPv6 socket addresses that
would always fail in connect(2) or bind(2).
When dropping socket.AI_ADDRCONFIG the result is
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '0.0.0.0', ('0.0.0.0', 0))]
independent of a connection being available or not.
I think dropping AI_ADDRCONFIG (from common/tftpsubs.c:311) should work
for tftpd. However I'm not 100% sure this is the right thing in all
corner cases.
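Uwe's REPL experiment can be wrapped into a small function to compare the lookup with and without AI_ADDRCONFIG. resolve_bind_address is a hypothetical name; it also drops AI_CANONNAME, since the canonical name is not used for a bind address. Without AI_ADDRCONFIG the literal "0.0.0.0" should resolve regardless of connectivity, whereas with it the call may raise socket.gaierror when no non-loopback IPv4 address is configured, as getaddrinfo(3) describes.

```python
import socket

def resolve_bind_address(host: str, addrconfig: bool = False):
    """Resolve an IPv4 bind address, optionally with AI_ADDRCONFIG."""
    flags = socket.AI_ADDRCONFIG if addrconfig else 0
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_DGRAM,
                               socket.IPPROTO_UDP, flags)
    return infos[0][4]  # sockaddr of the first result

print(resolve_bind_address("0.0.0.0"))
try:
    print(resolve_bind_address("0.0.0.0", addrconfig=True))
except socket.gaierror as err:
    # Expected when no non-loopback IPv4 address is configured yet.
    print("AI_ADDRCONFIG lookup failed:", err)
```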
Having said that the right thing would be to not try to determine if an
argument to -a is an ipv4 or ipv6 address but use:
socket.getaddrinfo(address, port, socket.AF_UNSPEC, socket.SOCK_DGRAM, socket.IPPROTO_UDP, socket.AI_PASSIVE)
with port defaulting to "tftp" and then bind to all addresses returned
by that. Extra points for supporting more than one (address, port)-pair
on the command line.
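A rough sketch of that proposal: resolve once with AF_UNSPEC and AI_PASSIVE, then bind one UDP socket per result, skipping families the host cannot support. open_listen_sockets is a hypothetical name, not tftpd code. It defaults the port to numeric "69" rather than Uwe's suggested "tftp" so that it does not depend on /etc/services, and it sets IPV6_V6ONLY on IPv6 sockets so a "::" socket does not clash with a separate 0.0.0.0 socket on the same port.

```python
import socket

def open_listen_sockets(address=None, port="69"):
    """Bind one UDP socket per getaddrinfo() result for (address, port).

    address=None with AI_PASSIVE yields the wildcard address(es)."""
    infos = socket.getaddrinfo(address, port, socket.AF_UNSPEC,
                               socket.SOCK_DGRAM, socket.IPPROTO_UDP,
                               socket.AI_PASSIVE)
    sockets = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            s = socket.socket(family, socktype, proto)
        except OSError:
            continue  # address family not supported on this host
        try:
            if family == socket.AF_INET6:
                # keep IPv6 and IPv4 wildcard sockets from overlapping
                s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            s.bind(sockaddr)
        except OSError:
            s.close()
            continue  # e.g. IPv6 disabled, or address not available
        sockets.append(s)
    if not sockets:
        raise OSError(f"could not bind any address for {address!r}:{port}")
    return sockets

# Port "0" lets the kernel pick free ports; real use would be "69"/"tftp" as root.
for s in open_listen_sockets(None, "0"):
    print(s.family.name, s.getsockname())
    s.close()
```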
Independent of this, changing the default TFTP_ADDRESS to ":69" to get
IPv6 connectivity would be nice. Or, better still, to ":tftp".
Best regards
Uwe
Hello, After some discussion in #debian-devel and #nm we found out that there is a related network-manager issue: NetworkManager-wait-online.service switches to active state too early. I reported this upstream at https://bugzilla.gnome.org/show_bug.cgi?id=777831 . So actually there are two problems here:
- tftpd is started before the machine is online (-> NetworkManager problem); and
- tftpd doesn't handle it nicely when being told to bind to 0.0.0.0 before any interface has an IPv4 address (tftpd problem).
One could argue that the second isn't an issue on a well-configured server, but on the other hand there is no reason not to try to handle the dynamic case a little better. Best regards Uwe
AI_CANONNAME is only relevant when the resulting official name is used,
which is not the case in tftpd for the address to bind to. Also
AI_ADDRCONFIG isn't helpful. This flag is good for sockets used to
connect(2) somewhere. But for listening sockets it makes tftpd fail to
start when -a 0.0.0.0:69 is passed and no network device is up yet.
This addresses Debian bug https://bugs.debian.org/771441
---
common/tftpsubs.c | 4 ++--
common/tftpsubs.h | 2 +-
tftp/main.c | 9 ++++++---
tftpd/tftpd.c | 6 ++++--
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/common/tftpsubs.c b/common/tftpsubs.c
index 8c999f66eed8..344c74b3d78c 100644
--- a/common/tftpsubs.c
+++ b/common/tftpsubs.c
@@ -300,7 +300,7 @@ int pick_port_bind(int sockfd, union sock_addr *myaddr,
}
int
-set_sock_addr(char *host,union sock_addr *s, char **name)
+set_sock_addr(char *host, union sock_addr *s, char **name, int ai_flags)
{
struct addrinfo *addrResult;
struct addrinfo hints;
@@ -308,7 +308,7 @@ set_sock_addr(char *host,union sock_addr *s, char **name)
memset(&hints, 0, sizeof(hints));
hints.ai_family = s->sa.sa_family;
- hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
+ hints.ai_flags = ai_flags;
hints.ai_socktype = SOCK_DGRAM;
hints.ai_protocol = IPPROTO_UDP;
err = getaddrinfo(strip_address(host), NULL, &hints, &addrResult);
diff --git a/common/tftpsubs.h b/common/tftpsubs.h
index b3a3bf3c95e1..0edda03a514c 100644
--- a/common/tftpsubs.h
+++ b/common/tftpsubs.h
@@ -98,7 +98,7 @@ static inline int sa_set_port(union sock_addr *s, u_short port)
return 0;
}
Up to this point, this patch doesn't actually change the existing operation in any way. But in what follows ... The use of AI_PASSIVE here is a placebo. That flag has no effect unless address was NULL, and if that was true, neither of the hunks here would actually be executed in the first place. Using AI_CANONNAME here should be harmless at worst. So the only actual change is to drop AI_ADDRCONFIG - the flag which limits getaddrinfo to returning only the address families that are actually supported by the configured interfaces on the system. And ordinarily that would seem to be a fairly uncontroversially Good Thing to do, for both connecting and listening sockets. So unless upstream sees this differently, I still think we'd need to see some stronger rationale for why that isn't a Good Thing in this particular case than just "Dropping that flag hides a real bug in NetworkManager". Because it could hide or introduce real problems in other cases too, and if the bug in NM is fixed, then the only reason I'm so far aware of for you proposing this patch (based on the discussion on #d-d) also goes away too ... Assuming that at some point the NM bug will be fixed, why would we still want to make this change in this code? Cheers, Ron
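Ron's point about AI_PASSIVE can be checked directly. Per getaddrinfo(3), AI_PASSIVE only matters when the node argument is NULL: it then selects the wildcard address instead of the loopback address. With an explicit literal such as "0.0.0.0" the flag changes nothing. first_address is a hypothetical helper for the comparison.

```python
import socket

def first_address(host, passive: bool) -> str:
    """Address part of the first IPv4 getaddrinfo() result."""
    flags = socket.AI_PASSIVE if passive else 0
    infos = socket.getaddrinfo(host, "69", socket.AF_INET, socket.SOCK_DGRAM,
                               socket.IPPROTO_UDP, flags)
    return infos[0][4][0]

for host in ("0.0.0.0", None):
    print(repr(host), "passive:", first_address(host, True),
          "non-passive:", first_address(host, False))
```

For the literal "0.0.0.0" both columns should show 0.0.0.0; only for None do they differ (0.0.0.0 versus 127.0.0.1), which is why adding AI_PASSIVE to a lookup that always passes an address has no effect.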
Hello Ron,
Right. This could be accomplished with a less intrusive patch that
assumes AI_CANONNAME | AI_ADDRCONFIG if name (i.e. the 3rd argument) is
non-NULL. YMMV.
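A minimal sketch of that less intrusive variant, assuming the flags are derived solely from whether the caller passes a `name` pointer (the 3rd argument of `set_sock_addr()`). The helper `pick_ai_flags()` is hypothetical and not part of tftp-hpa, and returning AI_PASSIVE for the bind case is an assumption (it only has an effect for a NULL host):

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <netdb.h>

/*
 * Hypothetical helper (not in tftp-hpa): choose getaddrinfo() hint flags
 * from whether the caller wants the canonical name back.
 *  - name != NULL: the caller (e.g. the tftp client) wants a resolved,
 *    connectable address, so keep the historical flags.
 *  - name == NULL: the caller (e.g. tftpd) only wants an address to bind(),
 *    so AI_ADDRCONFIG is dropped. AI_PASSIVE is an assumption here; it only
 *    changes anything when the host argument is NULL.
 */
static int pick_ai_flags(char **name)
{
    if (name != NULL)
        return AI_CANONNAME | AI_ADDRCONFIG;
    return AI_PASSIVE;
}
```

With the 4-argument `set_sock_addr()` from the diff above, the result would simply be passed through as `ai_flags`.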
Right. Today it only has an effect if the first argument to getaddrinfo
is NULL. The intention (IIUC) is that it should be used when you plan to
feed the result to bind (as opposed to connect).
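Whether AI_PASSIVE matters is easy to check empirically. The sketch below (assuming a POSIX system; `lookup_ipv4()` is a hypothetical helper) resolves a UDP/IPv4 address with and without the flag. Per POSIX, AI_PASSIVE only changes the result for a NULL node: the wildcard 0.0.0.0 with the flag, the loopback 127.0.0.1 without it. A numeric node comes back unchanged either way.

```c
#define _POSIX_C_SOURCE 200112L
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/*
 * Hypothetical helper: resolve (node, "69") as a UDP/IPv4 address with the
 * given extra flags and write the dotted-quad result into buf.
 * Returns the getaddrinfo() error code (0 on success).
 */
static int lookup_ipv4(const char *node, int flags, char *buf, size_t len)
{
    struct addrinfo hints, *res;
    int err;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_protocol = IPPROTO_UDP;
    /* "69" is numeric, so skip the /etc/services lookup */
    hints.ai_flags = flags | AI_NUMERICSERV;

    err = getaddrinfo(node, "69", &hints, &res);
    if (err != 0)
        return err;

    const struct sockaddr_in *addr4 = (const struct sockaddr_in *)res->ai_addr;
    inet_ntop(AF_INET, &addr4->sin_addr, buf, (socklen_t)len);
    freeaddrinfo(res);
    return 0;
}
```

Note that none of these calls set AI_ADDRCONFIG, so the results do not depend on which interfaces are currently up.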
The downside of using AI_ADDRCONFIG is that it makes binding to 0.0.0.0
(or ::) fail when no interface is up yet.
If we can agree in principle I can rework the patch to make one change
per patch:
- drop AI_ADDRCONFIG for tftpd use
- (maybe) introduce AI_PASSIVE for tftpd
- (maybe) drop AI_CANONNAME for tftpd
This is not the (only) reason for me. This is mostly only how it showed
up for some people, but still there are more IMHO good reasons to fix
it:
- inconsistent behaviour when no interface is up: -a 0.0.0.0 fails,
-a :: fails, not passing -a doesn't fail and makes tftpd bind to all
interfaces.
- The "no interface is up" also happens with ifupdown when no auto
interface is used (only allow-hotplug)
- The "no interface is up" also happens if your laptop has no network
connection during boot
- It's more robust to try what the admin requested. It is possible even
if no interface is up to bind to 0.0.0.0. So I suggest to do that and
not try to know better than the admin.
- The error message
cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known
is misleading.
See above. Which problems do you see introduced by my patch?
IMHO "we don't do it right because it might paper over other problems"
is a poor reason for not patching. ("I don't need a seatbelt or a helmet
because if my head gets hurt there is a different problem.")
(though not fixing) patch was suggested by them.
Best regards
Uwe
Indeed. As I wrote in message #95, the debconf question for TFTP_ADDRESS even implies that the current default value will support IPv6, when it does not. If Ron will accept it, then I can update the patch in Message #100 to say ":tftp" rather than ":69". Is there any chance we can get this into Stretch? Mike.
What do you mean by "Today"? Both SUSv4 and the Linux man page are unequivocal about the _only_ use of that flag being to special-case a NULL address (meaning 'this machine') to return either the wildcard address or the LOOPBACK address. That you'd use the wildcard address to bind a service to all addresses of 'this machine', or the loopback address to connect to a service on 'this machine' is illustrative. There's no deeper distinction or fundamental difference related to what functions you might later pass the address(es) you obtain to. That seems to be where we disagree. I don't see it as a 'downside' that if you explicitly say "I want to bind to IPv4 addresses" (or IPv6), and you don't actually have any, that this should fail early and loudly to warn you about either misconfiguration, or some other more serious failure, occurring. If you passed a name instead of a numeric address, you'd only get the address families the machine actually supported. If you pass a numeric address in a particular family, you get a sanity check that it's valid. If you (personally) don't care about that, just don't pass an explicit address. The downsides of not using AI_ADDRCONFIG can't be remedied so easily. I'm far less concerned about the format of the patch than the details of what it's actually hoping to achieve. If all of the reasons for doing this are just different ways of saying "if we do less sanity checking, then misconfiguration and broken tools won't annoy me as often" - that's not very compelling. Doubly not so when there already is a way you can configure this for your own use which does already bypass them. Simply disabling that for everyone and every configuration isn't a good answer. I don't see how that is 'inconsistent'? If you ask for an explicit address (family) you get a sanity check that (support for it) is available. If you say you don't care, then tftpd doesn't either.
Not defaulting to auto (and systemd not respecting it for a while) were both bugs that broke lots of services on lots of people's machines. And I already explained to you in #d-d that the established 'solution' for that, for services which don't monitor netlink events, is to add a hook which restarts them when interfaces appear or disappear, if you really want them to bind to interfaces which might genuinely be expected to be hotplugged. Because if you're not actually using wildcard addresses, then this will _still_ fail, even with your patch, if that interface isn't already up. This doesn't need a link up state to work; it just needs a local interface to be brought up with the needed address (family) assigned to it. If your only address is assigned by DHCP or something similar, and you aren't blocking waiting on that - then as above lots of services are going to fail to start if your boot sequence blindly tries to start them anyway. That isn't the fault of, or a bug in, those services. Your system is just misconfigured for what you actually want it to do. See above. Your patch is *taking away* the ability of the admin to have the choice to specify exactly what behaviour they want ... The behaviour that you want is already supported. You just need to explicitly "request" that, rather than redefine the historical default to now mean that as well, taking away any other option that some other admin might want to request. Well, you can file that bug against glibc :) We're just reporting what getaddrinfo and gai_strerror returned ... to any address, without any checking of what address (families) are available, even if the user explicitly specifies them - and that silently ignoring any error in being able to do that is a Good Thing. If you're running a toy service on your laptop, that might be ok. If you're really using your laptop as a 'portable' server for some special use case, you probably don't want it binding to random wifi hotspots wherever you may go with it.
If you're running a Real Server, then silently ignoring real problems is pretty much the opposite of what you'd want happening in most cases. The lessons of https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00 aren't entirely inapplicable here. I don't think defining "do it right" as "I'd rather you disable checking in the software for everyone than configure it locally to do what I want" is very helpful here. Especially not if the only reason you want that is because NM has a bug, or you don't want to configure it to properly support using this service with hotplugged interfaces. By your analogy, what you're saying with this patch is "I want you to remove the seatbelt completely, even though I already have the option to choose not to wear it myself". Cool, though it would be nice if there was a (possibly RC?) Debian bug also tracking that. I think if you want to change the behaviour of this program, you're also going to have to get your proposed patch past upstream before I apply it. When I first adopted this package, the first thing I did was get all of the patches we were carrying upstreamed, and I'm not keen to diverge from them again over something like this. Since they hadn't chimed in on it yet I've given you some feedback on why _I_ don't think this is the right solution - but ultimately it is them you'll need to convince otherwise, not me. I am grateful for you digging into this, confirming that the core of what is really making people most unhappy here is an NM bug, and reporting that bug to them. And as I said in #d-d, I will take a patch to include an NM hook if you really do want support for NM-managed interfaces that really are expected to be hotplugged some time long after boot. If people really want support for late hotplug in ifupdown we can add that too, but so far nobody has ever reported that as being a problem for them ("use auto or expect pain" is fairly well known these days).
I do still consider the question of what the default config should be an open one - it's something I inherited from the previous maintainer too - but I'll follow up on that separately. It's bad enough that we've already got multiple issues conflated together in this one bug, so it would be nice to at least still keep them to separate email (sub)threads. Cheers, Ron
That's most probably just an oversight from between when that prompt was first written and when IPv6 support was actually added. But that predates my involvement here, so I can't say for sure. That said, it also doesn't seem entirely unreasonable for anyone configuring a service like this to know that 0.0.0.0 is an IPv4 address ... which might be related to how it got overlooked ... It's ok, I don't need a patch to change the default. The real question for this bug (as I think I've said a few times now), is *what* it should be changed to if we change it. You've been unambiguous about your preference being that the default should match your preferred use case - but given that we've now got people saying they are running this on laptops, I think there's also a strong case to be made that the default should actually be *more* restrictive than it currently is. Historically, TFTP was only ever used on trusted LAN ports, to provide boot and configuration files for bare and dumb devices. So binding to all interfaces and assuming they are trusted wasn't an unreasonable default. But given that these days, those files can increasingly contain sensitive data, like plaintext admin passwords for dumb embedded devices - and that there is no other access control aside from what ports you bind this to and how that machine is firewalled - it does seem irresponsible to open that by default, for naive users who might carry their laptop around and use it on random untrusted networks. Real admins with real servers are going to know how to preseed this to use their own preference, or are going to be using other tools to maintain their system configuration anyway. So maybe we should err on the side of 'forcing' naive users to explicitly make it more permissive if that's what they really want, rather than just opening it to everyone before they've even had a chance to read the man page. 
Given that it's increasingly clear that there isn't actually a 'bug' in this software, just the minor question of whether the default configuration is still appropriate for expected use(r)s in 2017, it doesn't seem all that likely that the release team would want to accept such a change now even if I was convinced we certainly knew the definitively right answer and pushed it. If you want to fix the symptom for Stretch, you'd be better off filing an RC bug against NM for the issue affecting it. If you really want :69 as your local config for other reasons, you can already do that today. Right now, I'm basically seeing 3 options for how to 'close' this issue here now: - Make the default more restrictive, raise the priority of the debconf question so more people actually see it, and include some explanation of why it's restrictive, and what you might want to change it to for particular use cases. - Leave the default as is, but tweak the prompt text to be a bit clearer (and maybe still raise the priority). - Make the default completely permissive as you're suggesting and just let anyone who gets burned by that learn their mistake The Hard Way. And if I had to rank them by the amount of (potentially justified) vitriol that the hate mail I'll get from people who don't like the new default because it somehow inconvenienced them will contain ... ... then the first one starts looking like a pretty attractive option ... and I'm not really sure what arguments to the contrary might change that. I'm willing to listen to any that we haven't already heard (I haven't forgotten them, there's no need to repeat them), and I'm far from being completely convinced that's a Great Answer. But it might really be the Least Worst one for today, all things considered. Cheers, Ron
I had in mind that at some point in the future (say with ipv8 or
802.11t-2042) the flag might mean more. I'd say the intention is to use
AI_PASSIVE if you plan to listen on this address, so it seemed right to
use it. But I'm willing to restrict the discussion to the removing of
AI_ADDRCONFIG.
If I want to bind to 0.0.0.0 and no interface (but lo which might be
good enough for me) is up, this might be intentional. This way I might
be able to speed up system boot because I don't have to wait until all
interfaces are up before I start the daemon. Of course this doesn't help
if I configure tftp to listen on an explicit address, but that's a
different problem that's out of scope of my patch.
IMHO AI_ADDRCONFIG is at best an optimisation for programs that use a
socket to connect(2). It's not that sensible for sockets to listen.
Consider I do:
tftpd -a tftpd.mycompany.com:tftp
and tftpd.mycompany.com resolves to both an ipv4 and an ipv6 address. If
the server has an ipv4 problem it just starts to listen on the ipv6
address with AI_ADDRCONFIG. Does this sound wrong only for me?
I tried hard to show good reasons that this is not the motivation for
this change. I don't say "drop all sanity checking", I'm only saying
"don't refuse to work as well as you can".
If I don't pass -a that's not "I don't care" but "bind to 0.0.0.0 and
::". To make it more explicit:
- If the admin requests 0.0.0.0 this is denied because there is no ipv4
address on any interface.
- If the admin requests :: this is denied because there is no ipv6
address on any interface.
- If the admin requests :: and 0.0.0.0 he gets what he wants even if
all interfaces have neither an ipv4 nor an ipv6 address.
Still there are valid use cases of using hotplug. I don't see why tftpd
shouldn't try to cooperate in these cases without restricting users that
have different setups.
I want to discuss: "tftpd fails to bind to 0.0.0.0 in some situations
even though it could do as requested."
Also listening on 0.0.0.0 includes listening on lo. For listening
sockets it's ridiculous to special-case lo. IMHO it's even wrong that in
the case where there is no address on any interface but lo
>>> import socket
>>> socket.getaddrinfo("127.0.0.1", "tftp", socket.AF_INET, flags=socket.AI_ADDRCONFIG)
fails. So IMHO AI_ADDRCONFIG is just a band-aid that might be used for
clients(!) that fail to use all return values from getaddrinfo.
Right, this is entirely ok and everything else would be wrong. (I didn't
test, but I think also in this case the error message is improved from
"Cannot resolve 8.8.8.8" to "Cannot bind to 8.8.8.8" which IMHO makes
more sense.)
This is not usual for the common laptop. If it's booted on a train, then
suspended and resumed in the office, tftpd isn't running.
On my machine tftp is the only service having this problem. Pointing to
others that don't behave cooperatively isn't an excuse to not cooperate.
For me it's the other way round. If I request an application to bind to
0.0.0.0 it should try to do this and not be smart with me. Even if the
application thinks I did something wrong, the application should only
complain if I request something impossible.
I fail to follow. Can you please remind me what failure I
introduce?
I'm not talking about the default configuration. I just want to make
tftpd able to bind to 0.0.0.0 when requested to do so.
getaddrinfo and gai_strerror are fine. tftpd uses them wrongly and so the
error doesn't fit what tftpd should actually do.
No, you're talking about that other problem ("What should be the default
binding address of tftpd?").
I'm talking about "tftpd fails to bind to 0.0.0.0 in some situations even
though it could do as requested."
Great, so we can agree on a use case where my patch makes sense. Great.
For me this is good enough to apply the patch given there are no
disadvantages to other use cases.
Right. In this case I have to make something different. I can do so even
if tftpd behaves fine when requested to bind to 0.0.0.0. Binding to
0.0.0.0 might not be part of the solution for the portable server
scenario, but being able to do so doesn't restrict me here. Great!
What is the problem here? I have a Real Server and tftpd is supposed to
be started on 0.0.0.0. Currently it fails to do so if the networking
setup is broken. So after repairing the network configuration I have to
restart tftpd. This is cheap and ok. With my suggested patch I have to
repair the network only and after that tftpd starts serving requests as
it's configured to do. That's even better, isn't it? Even if not, it
doesn't make the situation worse here.
So let's agree that there are some situations where the patch is good,
and in all other situations it doesn't hurt. That's a good enough
justification to apply the patch if you ask me.
I admit I didn't read that completely. The abstract says that the
statement "Be liberal in what you accept, and conservative in what you
send" might have negative consequences to long term maintenance. This
might be true if you expand your application to handle all sort of
broken requests. I don't see this fitting my patch, as it is not about
better guessing what the admin requested if he articulates something
incomprehensible. It's just about doing what the admin clearly requested.
I think you're talking about the default configuration thing again. If
not I cannot follow.
That's wrong. You currently don't have the option to not wear the
seatbelt because -a 0.0.0.0 fails.
Agreed. That's why I posted the patch to the upstream mailing list.
Assuming we're still not in agreement about this patch, it would be
great to get a third opinion.
Best regards
Uwe
... That would seem to be a pretty good summation of how we're failing to converge here ... Brainstorming imaginary problems to fit your proposed solution, especially when you don't clearly say exactly what *your real* use case was here, doesn't make that solution more compelling or less ill-advised. If your real problem (aside from the NM bug which is now being tracked here: https://bugs.debian.org/854078) is that you don't want to bind to a specific address, just anything that appears at any time, then there is no bug affecting you; you can already configure tftpd that way. If it's instead that you do want to configure it to only use a subset of the available addresses, and some of those addresses might genuinely be hotplugged, long after boot, independent of the NM bug - then there's a well known, long established, solution to that too. Which I've mentioned several times now. Whatever brings those interfaces up needs a hook to restart the services that you want bound to them, for each of the services that doesn't do that itself by monitoring netlink events. It's the "one simple trick" that solves all the problems you've expounded here, also solves the problem you admit that your patch doesn't address, and doesn't have the unfortunate side effects your patch does. If: Then why would you insist that this crazy patch, which just crudely kludges over a limited subset of the issues with hotplugged interfaces, leaving others still broken - is better than one well-trodden solution which fits all cases? I don't use NM, so if someone who does wants us to add such a hook for it to this package, they'll need to send a tested patch to do that. I'll happily include it (and then close this clone of the bug) if they do. Bonus points if you also send one for ifupdown, but so far nobody has reported wanting to use this with it and genuinely hotplugged interfaces.
Please, let's focus on good solutions to the real problems rather than straining to find good problems to fit a partial kludge. This bug log is already a maze of twisty little misconceptions - and the aim is to dig our way *out* of that, not to find new ratholes to get lost in. What doesn't work if the bug in NM is fixed, and it has a hook to notify this service of real dynamic interface changes when they occur? Ron
Hello Ron,
I mixed too many things that IMHO improve the code, but I actually only
care about one of them. So I suggest we restart the discussion with
focusing on that one thing only. Let me try that:
Currently, when requested to bind to an address X, tftpd does (in
simplified pseudocode):
    if X looks like an ipv6 address:
        family = AF_INET6
    elif X looks like an ipv4 address:
        family = AF_INET
    else:
        family = AF_UNSPEC
    addrinfo = getaddrinfo(X, NULL, { .ai_family = family, .ai_flags = AI_CANONNAME | AI_ADDRCONFIG })
    bind(fd, addrinfo)
(where bind() works on both AF_INET and AF_INET6 if getaddrinfo returns
both).
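The family-selection step at the top of that pseudocode could look roughly like this in C. This is a hypothetical helper built on `inet_pton()`, not the actual tftp-hpa parsing code, and it assumes any port suffix and `[...]` brackets have already been stripped:

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper mirroring the pseudocode: pick hints.ai_family for a
 * bind address string. Numeric IPv6 and IPv4 literals map to their family;
 * anything else (a hostname) is left to getaddrinfo() via AF_UNSPEC.
 */
static int guess_family(const char *host)
{
    struct in6_addr a6;
    struct in_addr a4;

    if (inet_pton(AF_INET6, host, &a6) == 1)
        return AF_INET6;   /* e.g. "::" */
    if (inet_pton(AF_INET, host, &a4) == 1)
        return AF_INET;    /* e.g. "0.0.0.0" */
    return AF_UNSPEC;      /* e.g. a hostname */
}
```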
This does the right thing most of the time. There are cases however
where the behaviour is wrong or at least undesirable:
a) if X = 0.0.0.0 and no interface (but lo) has an ipv4 address,
getaddrinfo returns an error and tftpd fails to start with
cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known
b) if X is a hostname that resolves to an ipv4 and an ipv6 address and
the machine currently has no interface (but lo) with an ipv6
address, tftpd only binds to the ipv4 address.
In case a) my expectation is that tftpd binds to 0.0.0.0 anyhow. I don't
think it is necessary to show a scenario where this is sensible because
I expect a command to do what was requested unless that's impossible.
Nevertheless there are situations where this might make sense:
a1) On a mobile machine without network access during boot. tftpd might
later be used when an interface is up (e.g. it is plugged into a
network later or a virtual machine is booted once it is needed.)
a2) On a machine where you want to boot quickly and so drop unneeded
prerequisites. So you can start tftpd in parallel to bringing the
network up without the need to serialize these two.
In addition to the refusal to start when binding to 0.0.0.0, the error
message is not understandable to me.
In case b) my expectation is that tftpd fails with something like:
cannot bind to IPv6 address $ipv6_address
a) can be fixed by just dropping AI_ADDRCONFIG from the call to
getaddrinfo. Also for b) this improves the situation, from
cannot resolve local IPv6 bind address: $X (...); using IPv4 only
to
cannot bind to local IPv6 socket,IPv6 disabled: ...
So in this case the error message at least matches the actual problem.
This convinces me it's the right thing to drop AI_ADDRCONFIG for tftpd
as AFAICT there is no downside.
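A minimal sketch of the bind path without AI_ADDRCONFIG, assuming a POSIX system; `open_bound_udp()` is hypothetical, not tftp-hpa code. It also reports resolution failures and bind failures with different messages, which is the error-message improvement described above:

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/*
 * Hypothetical sketch: open a UDP socket bound to host:port for a server.
 * AI_ADDRCONFIG is deliberately NOT set, so a numeric wildcard such as
 * "0.0.0.0" resolves even when only lo is up. Resolution and bind errors
 * get separate messages, so the log names the step that actually failed.
 * Returns the socket fd, or -1 on error.
 */
static int open_bound_udp(const char *host, const char *port, int family)
{
    struct addrinfo hints, *res;
    const char *shown = host ? host : "*";
    int err, fd;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_protocol = IPPROTO_UDP;
    hints.ai_flags = AI_PASSIVE;   /* note: no AI_ADDRCONFIG */

    err = getaddrinfo(host, port, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "cannot resolve bind address %s: %s\n",
                shown, gai_strerror(err));
        return -1;
    }

    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) {
        fprintf(stderr, "cannot create socket: %s\n", strerror(errno));
    } else if (bind(fd, res->ai_addr, res->ai_addrlen) < 0) {
        fprintf(stderr, "cannot bind to %s: %s\n", shown, strerror(errno));
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

For example, `open_bound_udp("0.0.0.0", "69", AF_INET)` should get past name resolution even when only lo is up (binding to port 69 itself still needs root or CAP_NET_BIND_SERVICE), while an address the host does not own, such as `192.0.2.1`, fails with a "cannot bind" message instead of a resolution error.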
Best regards
Uwe
Just repeating the same things, while ignoring the options I've shown you that do properly fix the problem(s) you're claiming to care about, isn't actually advancing this toward a workable solution in any way. My previous replies to you were already focussed on the part of your patch that removed AI_ADDRCONFIG, and why it was not needed at best, and harmful at worst. I can read the actual code, and understand how gai works, and I'm pretty sure Mike understood all of that too when he first reported this bug. I'd already long ago checked that there wasn't a real bug being triggered somewhere here, and that the code itself really was working as expected, and you haven't indicated anything to the contrary here. In the subset of cases where gai is used to resolve a string into a (set of) network address(es), it is not wrong to tell it that it's useless to return any address (family) which can't possibly work with the current machine configuration/state. That doesn't become less wrong if your expectation or interpretation of how it should work is different to reality and the specification of how it is defined to work. If you explicitly say "bind to 0.0.0.0", you're saying you want to service global IPv4 requests. If you have no global IPv4 interfaces, that should fail and warn the admin of a problem, not silently ignore that what they explicitly requested isn't going to work. If what you meant to request was "bind to whatever *is* there, I don't care what", then just don't pass an explicit address. If you're allergic to that possibly also binding to IPv6 addresses, then pass the -4 flag too. We don't need to disable the sanity check for users who do configure an explicit address for you to get the behaviour you say you want from the current code, without any change to it at all. 
If what you care about is "faster boot", then the answer to that isn't "speculatively start things that will fail (or just be useless) if they lose a race", it's to actually not waste time and resources doing that until their known prerequisites have been satisfied. Where it is expected that the available interfaces and/or configured addresses might change dynamically long after boot, then whatever you have doing that needs to notify, or reload, or (re)start, (or stop), anything you are or want to be (not) running, based on that new state of the system. If you want that, propose a patch to add a hook. Otherwise, what you say you want is already possible without needing the kludge you did send. The only bit I'm still struggling to understand, is why you are still pushing this patch hard instead of using what is already available that has exactly the behaviour you say you desire, or looking at a more complete and working solution for dynamic interfaces in general. The options here and the actions of the code don't look very complicated to me, so you don't need to "simplify" it on my behalf. I'm just not seeing you show any new problem that isn't contrived and that doesn't already have a good and/or already working solution which doesn't depend on needing this patch. What problem isn't satisfied by the options I've shown above and earlier? Ron
Hello Ron, Note I repeated less than before. I hope this will simplify the discussion and stop both of us arguing about stuff that doesn't matter much. Yes, you told me in the situations you care about the modification doesn't help you. I seem to care about different situations where the patch is beneficial. So if the patch doesn't make things worse for you (and all others out there) and it improves the situation for me, IMHO we should apply the patch. I don't understand yet in which situations it is harmful, as you claim above. The best claim matching this is: It papers over other problems. Is that why you want to keep AI_ADDRCONFIG? I don't understand though what this buys for you. Consider you have a network problem on the machine tftpd is supposed to run on. The result is that eth0 doesn't have an ipv4 address. You notice that because a tftp-client tries to contact the tftpd server and doesn't get an answer. So what to do next? I assume your next step is logging into the tftpd-machine and checking if tftpd is running. (If instead you try ping $tftpdmachine, or check the network config of the tftpd-machine, it doesn't actually help you that tftpd isn't running as it is already obvious then that the network configuration is at fault and not tftpd.) You see it doesn't run, check the log and see "cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known". Is it obvious now what the problem is? Maybe yes if the network is still broken. But if not, it's harder to understand. If instead tftpd had started successfully, you would see this after login and the IMHO obvious next step is to check the network config. If you ask me, that's not any harder to debug. I'm not sure about Mike, but that doesn't matter here. I did. I showed two examples where the use of AI_ADDRCONFIG breaks more than necessary or expected. Using AI_ADDRCONFIG suppresses options that actually work. After all I can bind to 0.0.0.0 even if no interface but lo has an ipv4 address.
After that I can connect to 127.0.0.1:69 and talk to tftpd. Or I can hotplug a network interface, configure that and talk to tftpd from a remote machine. I understand that none of these is a scenario you depend on. But with your refusal to see this patch as useful you assume that the above is not a use case for anybody. The semantics of binding to 0.0.0.0 are not only "serve on global ipv4 addresses that currently exist". The semantics also include: "serve on lo" and "serve on interfaces that come up after the socket is bound". The admin has a problem if a server that is supposed to serve files via tftp to the world doesn't have an ipv4 address. That alone is a problem big enough to notice. It doesn't help the admin that tftpd isn't running or there is a cryptic error message in the log. If the network interface is ready only later in the boot process everything works as expected as soon as the interface is up. You say it's bad in this situation that the admin doesn't notice there is a problem. I wonder if it's not a valid approach to claim that there isn't a real problem. After all the world can access the files now. Do you consider it obvious that tftpd -a 0.0.0.0 is semantically different to tftpd -4? I don't. Sure, I can change my setup accordingly now that I know. But this feels much more like a kludge than making the two commands above behave identically. And sure, we can even document this behaviour and claim it to be a feature. But that's not intuitive and I bet people will stumble over this in the future and wonder. I failed to understand this. If someone uses: tftpd -a 1.2.3.4 my patch doesn't make tftpd magically work. The difference is that tftpd in the "no ipv4 address available" case with my patch says: failed to bind to 1.2.3.4 instead of cannot resolve local IPv4 bind address: 1.2.3.4 without the patch. I say this is an improvement. Maybe I didn't understand you correctly here.
tftpd when started with -a 0.0.0.0 with eth0 coming up later isn't failing or useless. It does exactly what it is supposed to do when AI_ADDRCONFIG is dropped. That is, tftpd binds to 0.0.0.0. If you enlarge the problem I want to solve, my patch isn't suitable any more. That is applicable to all improvements and so it's not sensible to use it against an improvement. So it also wasn't helpful to retitle the Debian bug to 'please add a restart hook for hotplugged interfaces' from 'tftpd-hpa fails to start properly if network is unavailable'. If handling hotplugging is your wish, I suggest opening a separate bug for that without hijacking this one. Making software behave better in some situations without making it worse in any other is a good enough reason to consider the change. That is still true even if after patch application there are still some situations that could be improved. I do this because tftpd doesn't behave as I expect. I struggled because I didn't understand what the error message wanted to tell me. And I assume that others struggle in the same way in this situation. And then there is that idea of open source, where you can actually improve the stuff you work with ... So my suggested patch might eventually save others from spending time understanding tftpd-hpa as it doesn't fail for them with strange error messages. It makes the behaviour of tftpd match what I expect. And I believe others expected (or will expect) the same. I think it doesn't make sense to continue discussing between the two of us. If this mail isn't enough for you to see my point I suggest we try to find a few other people (ideally hpa among them) to give their opinion. I added the people that commented on the Debian bug so far explicitly to Cc:. Maybe you can speak up. (If you're missing context, reading https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771441#161 should be enough.) Best regards Uwe
Okay, let me chime in here. AI_ADDRCONFIG seems to be the Wrong Thing[TM]. AI_PASSIVE seems to be the Right Thing[TM]. Part of the problem is that the fallback code for the case of getaddrinfo() not being there is braindead, and of course the original code used to use gethostbyname() directly. I already have a much better fallback version of getaddrinfo() written which would let us make much better use of the getaddrinfo() interface. Now, what I want to know is why you are specifying the accept-all address explicitly as 0.0.0.0 instead of an empty string. -hpa
Hello hpa, That's great, thanks. And you even seem to agree with me, that's even better :-) Do you still care about platforms without getaddrinfo? This is even in POSIX.1-2001. The really right thing to do would be not to use a single socket for ipv4 and another for ipv6, but just iterate over the result of getaddrinfo and open a socket for each addrinfo. But let's not do more than one thing at a time. That's because that's the default of the Debian tftpd-hpa package. If you repeat your question about the Debian default, there (I think) the answer is: it's a relic that predates ipv6 support. OK, probably already back then '' would have worked. I can only guess at the reasons; maybe it conflicted with the maintainer scripts that ask for the default bind address during installation. Best regards Uwe
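The one-socket-per-addrinfo approach can be sketched as follows. This is an illustration under stated assumptions, not tftp-hpa code: `bind_all()` is hypothetical, a failure for an individual family (e.g. IPv6 disabled) is skipped rather than treated as fatal, and IPv6 sockets are marked IPV6_V6ONLY so they can coexist with an IPv4 wildcard socket on the same port.

```c
#define _POSIX_C_SOURCE 200112L
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/*
 * Hypothetical sketch: bind one UDP socket per address that getaddrinfo()
 * returns for host:port. host == NULL with AI_PASSIVE yields the wildcard
 * address of each family. Addresses whose socket() or bind() fails are
 * skipped. Returns the number of descriptors stored in fds (at most max).
 */
static int bind_all(const char *host, const char *port, int *fds, int max)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_PASSIVE;   /* no AI_ADDRCONFIG */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL && n < max; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;              /* e.g. this family is unsupported */
        if (ai->ai_family == AF_INET6) {
            /* keep the v6 socket v6-only so a v4 socket can share the port */
            int on = 1;
            setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
        }
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) < 0) {
            close(fd);
            continue;
        }
        fds[n++] = fd;
    }
    freeaddrinfo(res);
    return n;
}
```

Calling `bind_all(NULL, "69", fds, 4)` corresponds to the empty-address (accept-all) case hpa asks about; a real server would then poll() every returned descriptor.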