#772523 preseeding get_domain using DHCP is broken

#772523#5
Date:
2014-12-08 05:01:19 UTC
From:
To:
Hi,

I'm trying to preseed d-i to run a fully automated installation, using
jessie beta2. I'm trying to use a DHCP config, with the hostname being
set from reverse DNS and the domain set by the DHCP server using the
domain-name DHCP option (isc-dhcp's "option domain-name").

What currently happens is that I'm getting a priority=high input prompt
for the domain name, prefilled with the DHCP domain name. FWIW, I've
verified that the udhcpc lease file contains the domain name and that a
manual invocation of udhcpc results in /tmp/domain_name being prefilled
with my domain.

The log output is:
netcfg[342]: DEBUG: State is now 1
netcfg[342]: DEBUG: State is now 2
netcfg[342]: DEBUG: State is now 5
debconf: --> GET netcfg/hostname
netcfg[342]: DEBUG: Using DNS to try and obtain default hostname
netcfg[342]: DEBUG: Getting default hostname from rDNS lookup of
autoconfigured address 10.64.16.201
netcfg[342]: DEBUG: Hostname found: d-i-test.eqiad.wmnet
netcfg[342]: DEBUG: d-i-test.eqiad.wmnet is a valid FQDN
netcfg[342]: DEBUG: We have a real FQDN
debconf: --> SET netcfg/get_hostname d-i-eqiad
netcfg[342]: DEBUG: Preseeding domain from global: eqiad.wmnet
debconf: --> SET netcfg/get_domain eqiad.wmnet
debconf: --> INPUT high netcfg/get_hostname
debconf: --> GET netcfg/get_hostname
netcfg[342]: DEBUG: State is now 5
debconf: --> INPUT high netcfg/get_domain

What I think happens is:

0) netcfg_activate_dhcp gets called; state is AUTOCONFIG.
   netcfg_autoconfig() gets called, and in turn netcfg_dhcp(). udhcpc is
   started, gets a lease and writes the domain to DOMAINFILE. netcfg_dhcp()
   reads the domain name from /tmp/domain_name, sets domain to
   "eqiad.wmnet" and have_domain to 1.

1) Several state transitions follow; the DHCP state is now HOSTNAME (5).
   preseed_hostname_from_fqdn() gets called, with globals
   have_domain=1, domain="eqiad.wmnet". This sets netcfg/get_domain to
   "eqiad.wmnet" and netcfg/get_hostname to the non-FQDN part of the
   hostname ("d-i-test").

2) netcfg_get_hostname() gets called with accept_domain=1. It sets the
   global have_domain=0 at the top but never sets it back to 1, as it
   only does so if netcfg/get_hostname is an FQDN, which it no longer
   is.

3) DHCP State is now DOMAIN (6). netcfg_get_domain() gets called with
   have_domain=0 and prompts.
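
As a rough illustration, the four steps above can be modelled in a few
lines of Python. This is only a sketch of the flow as described here, not
netcfg's actual C code: the class, its attributes debconf and prompts, and
the run() helper are hypothetical, and the method bodies are simplified
stand-ins. Only the names have_domain, domain, netcfg_dhcp,
preseed_hostname_from_fqdn, netcfg_get_hostname and netcfg_get_domain come
from the analysis. The model covers only the path where DHCP supplies a
domain.

```python
class NetcfgModel:
    """Toy model of the have_domain handling described in steps 0-3."""

    def __init__(self):
        self.have_domain = False
        self.domain = ""
        self.debconf = {}   # stands in for the debconf database
        self.prompts = []   # questions that would be shown to the user

    def netcfg_dhcp(self, dhcp_domain):
        # Step 0: the lease carries a domain name, so remember it.
        self.domain = dhcp_domain
        self.have_domain = True

    def preseed_hostname_from_fqdn(self, fqdn):
        # Step 1: store the short hostname and preseed the domain
        # from the global set in step 0.
        self.debconf["netcfg/get_hostname"] = fqdn.split(".", 1)[0]
        if self.have_domain:
            self.debconf["netcfg/get_domain"] = self.domain

    def netcfg_get_hostname(self):
        # Step 2: have_domain is cleared unconditionally ...
        self.have_domain = False
        hostname = self.debconf["netcfg/get_hostname"]
        # ... and only restored if the stored hostname is itself an FQDN.
        if "." in hostname:
            self.domain = hostname.split(".", 1)[1]
            self.have_domain = True

    def netcfg_get_domain(self):
        # Step 3: without have_domain, the question is queued.
        if not self.have_domain:
            self.prompts.append("netcfg/get_domain")


def run(dhcp_domain, rdns_fqdn):
    m = NetcfgModel()
    m.netcfg_dhcp(dhcp_domain)
    m.preseed_hostname_from_fqdn(rdns_fqdn)
    m.netcfg_get_hostname()
    m.netcfg_get_domain()
    return m


m = run("eqiad.wmnet", "d-i-test.eqiad.wmnet")
print(m.debconf["netcfg/get_domain"], m.prompts)
```

In this model netcfg/get_domain already holds "eqiad.wmnet", yet the
question is still queued, because step 2 discarded the flag that step 1
relied on.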

Looking at the code, it looks like *removing* the domain name from my
DHCP config entirely would work, as d-i would then fall back to splitting
the FQDN into a hostname part and a domain part.
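
For illustration, splitting an FQDN at its first dot looks roughly like
the following. split_fqdn is a hypothetical helper written for this
example, not a function from netcfg.

```python
def split_fqdn(fqdn):
    """Split an FQDN at the first dot into (hostname, domain)."""
    hostname, _, domain = fqdn.partition(".")
    return hostname, domain


print(split_fqdn("d-i-test.eqiad.wmnet"))
```

A name without any dot yields an empty domain part, which is the case
where nothing can be derived from the hostname alone.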

The whole behavior seems very buggy to me; specifically, (2) seems
flawed, as it unconditionally sets have_domain to 0 and only sets it
back under certain specific conditions (= FQDN), entirely ignoring all
the work that preseed_hostname_from_fqdn() does.

Thanks,
Faidon

#772523#12
Date:
2023-08-23 09:52:26 UTC
From:
To:
Hello.

I just tried with the bookworm net installer, and things have changed
since bullseye.

On bullseye, netcfg uses the domain returned by DHCP, which is not the
case on bookworm.

I think that the rDNS check before using the DHCP domain should be
removed.

It looks like on bullseye the domain was preseeded from DHCP because
netcfg/get_hostname was considered a valid FQDN, which is not the case on
bookworm.

I got logs from both installations (attached to this email) and here are
the relevant logs:

#+begin_src diff
--- netcfg-bullseye.log	2023-08-23 11:18:55.000000000 +0200
+++ netcfg-bookworm.log	2023-08-23 11:18:55.000000000 +0200
@@ -1,4 +1,4 @@
-frontend: --> SET netcfg/get_hostname bullseye
+frontend: --> SET netcfg/get_hostname bookworm
 frontend: --> METAGET netcfg/get_hostname type
 frontend: --> FSET netcfg/get_hostname seen true
 frontend: --> SET netcfg/choose_interface auto
@@ -11,7 +11,7 @@
 debconf: --> METAGET debian-installer/netcfg/title Description
 main-menu: INFO: Menu item 'netcfg' selected
 debconf: --> SETTITLE debian-installer/netcfg/title
-netcfg: INFO: Starting netcfg v.1.176
+netcfg: INFO: Starting netcfg v.1.187
 debconf: --> GET netcfg/enable
 debconf: --> GET netcfg/disable_autoconfig
 debconf: --> SET netcfg/use_autoconfig true
@@ -144,11 +150,11 @@
 netcfg: DEBUG: State is now 2
 netcfg: DEBUG: State is now 5
 debconf: --> GET netcfg/hostname
-netcfg: INFO: DHCP hostname: "bullseye"
-netcfg: DEBUG: bullseye is a valid FQDN
-debconf: --> SET netcfg/get_hostname bullseye
-netcfg: DEBUG: Preseeding domain from global: eole.lan
-debconf: --> SET netcfg/get_domain eole.lan
+netcfg: DEBUG: Using DNS to try and obtain default hostname
+netcfg: DEBUG: Getting default hostname from rDNS lookup of autoconfigured address 192.168.230.136
+netcfg: DEBUG: getnameinfo() returned -2 (Name or service not known)
+netcfg: DEBUG: Getting default hostname from rDNS lookup of autoconfigured address fe80::c0ff:fea8:e6ea
+netcfg: DEBUG: getnameinfo() returned -2 (Name or service not known)
 debconf: --> INPUT high netcfg/get_hostname
 debconf: --> GET netcfg/get_hostname
 netcfg: DEBUG: State is now 6
#+end_src
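
To compare such logs more mechanically, the debconf SET commands can be
pulled out of each one. The snippet below is a minimal sketch: the
debconf_sets helper and the regular expression are assumptions made for
this example, and the sample lines are copied from the last hunk of the
diff above rather than from the full logs.

```python
import re

# Matches lines such as "debconf: --> SET netcfg/get_domain eole.lan".
SET_RE = re.compile(r"--> SET (\S+) (.*)$")


def debconf_sets(log_lines):
    """Return {question: last value} for every SET command in a log."""
    values = {}
    for line in log_lines:
        match = SET_RE.search(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


bullseye = [
    'netcfg: INFO: DHCP hostname: "bullseye"',
    "debconf: --> SET netcfg/get_hostname bullseye",
    "netcfg: DEBUG: Preseeding domain from global: eole.lan",
    "debconf: --> SET netcfg/get_domain eole.lan",
]
bookworm = [
    "netcfg: DEBUG: Using DNS to try and obtain default hostname",
    "netcfg: DEBUG: getnameinfo() returned -2 (Name or service not known)",
    "debconf: --> INPUT high netcfg/get_hostname",
]

only_bullseye = sorted(set(debconf_sets(bullseye)) - set(debconf_sets(bookworm)))
print(only_bullseye)
```

For this hunk, both netcfg/get_hostname and netcfg/get_domain are set only
in the bullseye excerpt.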

#772523#17
Date:
2024-01-27 18:19:05 UTC
From:
To:
Dear developers,

I confirm that something broke between Bullseye and Bookworm (IIRC it
worked even with Bookworm RC2) regarding netcfg's behavior when the
host's IP address can't be found in the DNS.

I also think it's a different bug from the one originally reported
(which was for Jessie). Maybe a new one should be created for this big
regression in Bookworm.

The behavior also changes according to when the preseeding happens:
before netcfg (when installation is performed from an installation medium
and the preseed file is specified with the kernel parameter file=...) or
after netcfg (when installation is performed from a PXE server and the
preseed file is specified with the kernel parameter url=...).

Note that when the IP address (assigned by DHCP) is found in the DNS,
everything works as before, whether netcfg is performed before or after
the preseeding.

I performed a number of tests with a VM and the IP address missing from
the DNS, and the preseed file containing "d-i netcfg/get_hostname string
unassigned-hostname" and "d-i netcfg/get_domain string
unassigned-domain" (as suggested in [1]).

[1] https://www.debian.org/releases/bookworm/example-preseed.txt

With Bullseye, if the preseeding happens after netcfg (url=...), without
hostname or domain kernel parameters, d-i chooses "debian" as hostname
and sets no domain name. With a hostname set in kernel parameters, d-i
keeps it and sets the domain name from DHCP. With both hostname and
domain set in kernel parameters, d-i keeps the hostname, but still sets
the domain name from DHCP, ignoring the one provided in kernel parameters.

If the preseeding happens before netcfg (file=...), without hostname or
domain kernel parameters, d-i uses the ones from the preseed file
(unassigned-hostname and unassigned-domain). When hostname or hostname
and domain are specified in kernel parameters, it behaves as above
(hostname from kernel parameters and domain from DHCP).

This behavior is consistent with the aforementioned example preseed
file, and also with documentation in [2] (and particularly B.2.3, "Auto
mode").

[2] https://www.debian.org/releases/stable/i386/apbs02.en.html

With Bookworm, everything

#772523#22
Date:
2024-01-27 18:54:04 UTC
From:
To:
(sorry, accidental Ctrl+Enter...)

With Bookworm, it totally broke.

If the preseeding happens after netcfg (url=...), when setting the
hostname from the kernel parameters, d-i keeps it, but does not get the
domain from DHCP as before; only setting both a hostname and a domain
name makes things work (but d-i now keeps the domain from the kernel
parameters, which may be more appropriate).

If the preseeding happens before netcfg (file=...), whatever hostname
and/or domain is set in the kernel parameters, and whatever domain the
DHCP server sends, d-i uses the values from the preseed file
(unassigned-hostname and unassigned-domain).

So, currently, the only way to make things work with an installation
medium is to unset both netcfg/get_hostname and netcfg/get_domain in
the preseed file, and set hostname and domain in the kernel parameters.

(for convenience, I attached a text file in markdown format with the
results of these tests).

This is a big change from the behavior of previous netcfg versions (I
have used preseeding since Wheezy, with both PXE and installation media), and
this new behavior makes things more complicated: before, we just
configured bogus values in the preseed file, and if the machine was not
registered in the DNS but the DHCP provided a valid domain name,
specifying a hostname in the kernel parameters was sufficient. Now, we
have to specify both the hostname and the domain name in the kernel
command line (think of non-QWERTY keyboards...), and this makes me
consider this a severe regression.

It also partly contradicts the documentation (such as the pages
mentioned above).

I sincerely hope this will be fixed in a forthcoming point release.

Feel free to ask me if you need to test stuff; I have a suitable
environment to test preseeding with both PXE and installation media.

Best regards,