#958934 bind9: named fails to start after upgrade to 9.16.2

Package:
bind9
Source:
bind9
Description:
Internet Domain Name Server
Submitter:
"R. Scott Bailey"
Date:
2022-07-06 05:48:03 UTC
Severity:
normal
#958934#5
Date:
2020-04-26 21:42:47 UTC
From:
To:
Dear Maintainer,

Life was good on my DNS server until my recent update to 9.16.2-3.
After upgrading, the exact configuration that was happy now fails to
start. Example:

# named -g -u bind
26-Apr-2020 17:25:50.861 starting BIND 9.16.2-Debian (Stable Release) <id:b310dc7>
26-Apr-2020 17:25:50.861 running on Linux x86_64 5.5.0-2-amd64 #1 SMP Debian 5.5.17-1 (2020-04-15)
26-Apr-2020 17:25:50.861 built with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=/usr' '--with-libidn2' '--with-libjson-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-Co3jFO/bind9-9.16.2=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
26-Apr-2020 17:25:50.861 running as: named -g -u bind
26-Apr-2020 17:25:50.861 compiled by GCC 9.3.0
26-Apr-2020 17:25:50.861 compiled with OpenSSL version: OpenSSL 1.1.1g  21 Apr 2020
26-Apr-2020 17:25:50.861 linked to OpenSSL version: OpenSSL 1.1.1g  21 Apr 2020
26-Apr-2020 17:25:50.861 compiled with libxml2 version: 2.9.10
26-Apr-2020 17:25:50.861 linked to libxml2 version: 20910
26-Apr-2020 17:25:50.861 compiled with json-c version: 0.13.1
26-Apr-2020 17:25:50.861 linked to json-c version: 0.13.1
26-Apr-2020 17:25:50.861 compiled with zlib version: 1.2.11
26-Apr-2020 17:25:50.861 linked to zlib version: 1.2.11
26-Apr-2020 17:25:50.861 ----------------------------------------------------
26-Apr-2020 17:25:50.861 BIND 9 is maintained by Internet Systems Consortium,
26-Apr-2020 17:25:50.861 Inc. (ISC), a non-profit 501(c)(3) public-benefit
26-Apr-2020 17:25:50.861 corporation.  Support and training for BIND 9 are
26-Apr-2020 17:25:50.861 available at https://www.isc.org/support
26-Apr-2020 17:25:50.861 ----------------------------------------------------
26-Apr-2020 17:25:50.861 found 8 CPUs, using 8 worker threads
26-Apr-2020 17:25:50.861 using 8 UDP listeners per interface
26-Apr-2020 17:25:50.869 using up to 21000 sockets
26-Apr-2020 17:25:50.877 loading configuration from '/etc/bind/named.conf'
26-Apr-2020 17:25:50.881 reading built-in trust anchors from file '/etc/bind/bind.keys'
26-Apr-2020 17:25:50.901 looking for GeoIP2 databases in '/usr/share/GeoIP'
26-Apr-2020 17:25:50.901 using default UDP/IPv4 port range: [1024, 65535]
26-Apr-2020 17:25:50.905 using default UDP/IPv6 port range: [1024, 65535]
26-Apr-2020 17:25:50.905 listening on IPv4 interface lo, 127.0.0.1#5353
26-Apr-2020 17:25:50.913 listening on IPv4 interface br0, 16.1.1.3#5353
26-Apr-2020 17:25:50.917 listening on IPv6 interface lo, ::1#5353
26-Apr-2020 17:25:50.921 unable to set effective uid to 0: Operation not permitted
26-Apr-2020 17:25:50.921 Could not open '//run/named/named.pid'.
26-Apr-2020 17:25:50.921 Please check file and directory permissions or reconfigure the filename.
26-Apr-2020 17:25:50.921 could not open file '//run/named/named.pid': Permission denied
26-Apr-2020 17:25:50.921 generating session key for dynamic DNS
26-Apr-2020 17:25:50.929 unable to set effective uid to 0: Operation not permitted
26-Apr-2020 17:25:50.929 Could not open '//run/named/session.key'.
26-Apr-2020 17:25:50.929 Please check file and directory permissions or reconfigure the filename.
26-Apr-2020 17:25:50.929 could not open file '//run/named/session.key': Permission denied
26-Apr-2020 17:25:50.929 could not create //run/named/session.key
26-Apr-2020 17:25:50.929 failed to generate session key for dynamic DNS: permission denied
26-Apr-2020 17:25:50.929 sizing zone task pool based on 29 zones
26-Apr-2020 17:25:50.933 could not configure root hints from '/usr/share/dns/root.hints': permission denied
26-Apr-2020 17:25:50.957 loading configuration: permission denied
26-Apr-2020 17:25:50.957 exiting (due to fatal error)

Usability of entire system (and in fact, all systems on home network) is impaired as primary
DNS server is unresponsive and failover to secondary server (after query to primary times out)
induces noticeable delay in DNS lookups (and consequently nearly all network operations).

I am reverting to previous bind9 package as soon as I file this report. :-(

Thanks,
-Scott

#958934#10
Date:
2020-04-27 11:54:08 UTC
From:
To:
Am 26.04.20 um 23:42 schrieb R. Scott Bailey:

Dear Scott,

/run/named should be created by tmpfiles, see
/usr/lib/tmpfiles.d/named.conf .

What does "ls -la /run/named" show?

That one is supposed to be world-readable...

Do you have any denials from AppArmor in your kernel messages?

I have upgraded several machines from 9.11 to 9.16 and they generally
worked fine.

Bernhard

#958934#15
Date:
2020-04-27 11:54:08 UTC
From:
To:
Am 26.04.20 um 23:42 schrieb R. Scott Bailey:

Dear Scott,

/run/named should be created by tmpfiles, see
/usr/lib/tmpfiles.d/named.conf .

What does "ls -la /run/named" show?

That one is supposed to be world-readable...

Do you have any denials from AppArmor in your kernel messages?

I have upgraded several machines from 9.11 to 9.16 and they generally
worked fine.

Bernhard

#958934#20
Date:
2020-04-27 12:13:21 UTC
From:
To:
I concur with Bernhard.

And since there's not enough information, we can go only by the log
messages.

It seems to me that either BIND uses custom selinux/apparmor profile
or something else is in place that disallows named from accessing those
files.

The BIND 9 provides enough information for you to fix the problem before
you start
filling serious bugs. This is not a serious bug.

Ondrej

#958934#27
Date:
2020-04-27 12:13:21 UTC
From:
To:
I concur with Bernhard.

And since there's not enough information, we can go only by the log
messages.

It seems to me that either BIND uses custom selinux/apparmor profile
or something else is in place that disallows named from accessing those
files.

The BIND 9 provides enough information for you to fix the problem before
you start
filling serious bugs. This is not a serious bug.

Ondrej

#958934#32
Date:
2020-04-27 13:34:21 UTC
From:
To:
I can't find anything wrong with /run/named. On the system now:

buzz:~# ls -la /run/named
total 8
drwxrwxr-x  2 root bind   80 Apr 26 17:46 .
drwxr-xr-x 58 root root 1940 Apr 26 16:23 ..
-rw-r--r--  1 bind bind    7 Apr 26 17:46 named.pid
-rw-------  1 bind bind  102 Apr 26 17:46 session.key
buzz:~# dpkg -l bind9
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version              Architecture Description
+++-==============-====================-============-==========================>
ii  bind9          1:9.11.5.P4+dfsg-5.1 amd64        Internet Domain Name Server
buzz:~#

To recap the situation:

1. System running "n-1" testing version of bind9, everything operates normally
2. System updates to bind9 1:9.16.2-3, bind9 refuses to start
3. I look at directories, which exist, and compare settings/permissions with my secondary server (rpi3 running Debian 10.3), and can't find anything that looks different. Even removing "-u bind" from startup doesn't change anything! (although I confirmed at that time that "touch /run/named/xxxx" as root worked.)
4. Do (paraphrased for brevity) "install bind9/stable" to downgrade to known-good package version
5. bind9 starts and runs normally

To clarify, step 2 was an "apply all outstanding updates" activity, so it is possible that another update performed in that bundle is responsible for the breakage. However, step 4 modified only bind9 and its supporting libraries, and was the ONLY thing I did to move from broken to stable operation. Therefore I conclude that although it may not be at fault with respect to root cause, the latest testing version clearly is less resistant to something in my configuration than the stable version.

I have modified my zone files a lot (well, this IS a name server... ;) ) but do not believe I have meddled with any of the other security-related settings. I just went and looked specifically for signs of life from AppArmor, and all I could locate was this in /var/log/daemon.log:

buzz:/var/log# grep apparmor daemon.log.1
Apr 25 19:53:19 buzz apparmor.systemd[1211]: Restarting AppArmor
Apr 25 19:53:19 buzz apparmor.systemd[1211]: Reloading AppArmor profiles
Apr 25 19:53:19 buzz apparmor.systemd[1248]: Warning: found usr.sbin.chronyd in /etc/apparmor.d/force-complain, forcing complain mode
Apr 25 19:53:19 buzz apparmor.systemd[1248]: Warning from /etc/apparmor.d (/etc/apparmor.d/usr.sbin.chronyd line 60): Warning failed to create cache: usr.sbin.chronyd
buzz:/var/log#

And: (from a system boot while troubleshooting this issue - at the same time as above)

buzz:/var/log# grep -Fi apparmor messages*
messages.1:Apr 25 19:53:19 buzz kernel: [    0.184487] AppArmor: AppArmor initialized
messages.1:Apr 25 19:53:19 buzz kernel: [    0.510404] AppArmor: AppArmor Filesystem Enabled
messages.1:Apr 25 19:53:19 buzz kernel: [    1.407974] AppArmor: AppArmor sha1 policy hashing enabled
buzz:/var/log#

Thanks to noisy kvm messages, I cannot positively confirm there is nothing of interest in dmesg... but I would have expected messages to have been recorded somewhere in /var/log if they were being generated, and there's nothing beyond what I pasted above.

I am totally happy to try upgrading and playtesting again, but I would argue that "the error messages provide enough information for you to fix the problem" is inaccurate when the message is "permission denied", the user of interest clearly has the required permissions on the target directory (which exists), and EVEN THE ROOT ACCOUNT gets "permission denied" -- and yet no other error messages are generated anywhere on the system.

Does anybody have any additional suggestions on what to look at the next time I start a troubleshooting session?

Thanks,

Scott

#958934#37
Date:
2020-04-27 13:42:10 UTC
From:
To:
Hi Scott,

I would suggest running `named -g -u bind` under strace:

strace -o /tmp/bind9.trace -f /usr/sbin/named -g -u bind

And then look what’s happening when `named` tries to open the files.

Also there’s upstream bug related to this:

26-Apr-2020 17:25:50.921 unable to set effective uid to 0: Operation not permitted

see:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1042
and
https://gitlab.isc.org/isc-projects/bind9/-/issues/1090

But we (as upstream) haven’t got to fix this yet (although I can reproduce it with
9.17.1, so it’s only a matter of having enough time to properly look at the issue).

Ondrej
--
Ondřej Surý
ondrej@sury.org

#958934#42
Date:
2020-04-27 14:09:40 UTC
From:
To:
Apparmor denials show up in kernel logs, so dmesg/journalctl -k are good
places to look.

  sudo journalctl -k -b0 | grep -F 'apparmor="DENIED"'

HTH,
Simon

#958934#47
Date:
2020-04-27 15:33:09 UTC
From:
To:
Thanks, Simon!

buzz:~# journalctl -k -b0 | grep -F apparmor
buzz:~#

So whatever's going on, it doesn't look like AppArmor has anything to do with it.

Cheers,

Scott
-----Original Message----- Apparmor denials show up in kernel logs, so dmesg/journalctl -k are good places to look. sudo journalctl -k -b0 | grep -F 'apparmor="DENIED"' HTH, Simon
#958934#52
Date:
2020-04-27 15:43:21 UTC
From:
To:
To completely rule out Apparmor, please share the following:

aa-enabled
sudo aa-status | grep -E '^[0-9]+|named'

Thanks,
Simon

#958934#57
Date:
2020-04-27 16:09:11 UTC
From:
To:
Here you go -- I have no idea what I am looking at...

buzz:~# aa-enabled
Yes
buzz:~# aa-status | grep -E '^[0-9]+|named'
43 profiles are loaded.
40 profiles are in enforce mode.
   /usr/sbin/named
   named
3 profiles are in complain mode.
79 processes have profiles defined.
77 processes are in enforce mode.
   /usr/sbin/named (209351)
2 processes are in complain mode.
0 processes are unconfined but have a profile defined.
buzz:~#

You didn't ask, but I can sense the inevitable follow-on questions:

buzz:~# dpkg -S /etc/apparmor.d/usr.sbin.named
bind9: /etc/apparmor.d/usr.sbin.named
buzz:~# dpkg -V bind9
??5?????? c /etc/bind/named.conf.local
??5?????? c /etc/bind/named.conf.options
buzz:~#

Of course, this is the stable package, but I'll add this to my list of things to look at when I retry 9.16.2 (perhaps tonight)...

Thanks,

Scott
-----Original Message----- To completely rule out Apparmor, please share the following: aa-enabled sudo aa-status | grep -E '^[0-9]+|named' Thanks, Simon
#958934#62
Date:
2020-04-27 16:23:04 UTC
From:
To:
Also check if you don't have systemd overrides in place, that could be
preventing named from writing to certain directories:

systemctl cat bind9.service

#958934#67
Date:
2020-05-19 12:43:22 UTC
From:
To:
Could you share your network configuration?
Are you using ifupdown?

I wonder if this is a duplicate of
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960594

where I suspect the if-up.d hook breaking the dependency resolution
causing bind9 to be started way too early.

#958934#72
Date:
2020-05-21 14:27:30 UTC
From:
To:
I think I'm using ifupdown (it's installed, anyway -- version 0.8.35+b1) but I also believe this is not the root cause of the problem.

It took me awhile to think it through, but if you revisit the problem report, I'm reproducing the problem by running named interactively (on steady-state system) -- so the problem can't be boot-related, as everything else (particularly all of the filesystems, and also all of the network interfaces) is up and running well before the point I attempted to start the daemon.

You'd think with all this COVID-19 stay-home stuff going on I would have had time to retest with tracing, but I'm still trying to get a window where I have free time and the rest of my household is willing to take the DNS hit.

Thanks,
Scott

Could you share your network configuration?
Are you using ifupdown?

I wonder if this is a duplicate of
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960594

where I suspect the if-up.d hook breaking the dependency resolution
causing bind9 to be started way too early.

#958934#77
Date:
2020-08-23 03:20:17 UTC
From:
To:
I have another data point to contribute, courtesy of DTE - our local power company - and a power outage that lasted longer than my UPS battery...

First, the problem still exists with bind9 9.16.4-1+b1.

Second (and most interestingly), this version WILL start successfully if you are upgrading from an earlier version (such as /stable) and do not reboot.

The sequence of events:

1. Feeling bored, I try upgrading bind9 and friends to current /testing version
2. I check carefully and see the upgrade was unexceptional and the new named daemon started without problems
3. Subsequently, we take a power hit that results in a normal system shutdown (when UPS reached critical level) and normal system boot (after power was restored)
4. named failed to start at boot time, as described previously in this problem
5. named failed to start manually too
6. I downgraded bind9 to /stable, and (of course) named started normally
7. I upgraded bind9 to /testing
8. something got confused at this point, and the old named was still running (I could tell by the start time)
9. I manually killed the named process
10. "systemctl start named" successfully started the new named

I verified in /var/log/daemon.log that the running daemon is identifying itself as:

Aug 22 23:14:35 buzz named[151367]: starting BIND 9.16.4-Debian (Stable Release) <id:0849b42>

I still have no idea exactly what initialization is missing, but it's clear that the /stable named is doing it but the /testing named is not. Why nobody else seems to be seeing this problem is yet another interesting question. There have been no configuration file changes since my initial problem report.

Puzzled,

Scott

#958934#82
Date:
2021-02-20 07:39:49 UTC
From:
To:
Quack,

I had a similar problem just now when migrating from
1:9.11.5.P4+dfsg-5.1+deb10u2 to 1:9.16.11-2~bpo10+1.

Aside from the similar permission errors I also had this output:
# aa-status | grep -E '^[0-9]+|named'
34 profiles are loaded.
18 profiles are in enforce mode.
    /usr/sbin/named
    named

This means there's two conflicting profiles loaded.

Btw `journalctl -k -b0 | grep -F apparmor` does not catch apparmor
messages in my case, but they are available in /var/log/audit/audit.log
(I have auditd installed but no specific rules).

So I looked at the rules:
# rgrep /usr/sbin/named /etc/apparmor*
/etc/apparmor.d/apparmor-profile:/usr/sbin/named {
/etc/apparmor.d/apparmor-profile:  /usr/sbin/named mr,
/etc/apparmor.d/usr.sbin.named:profile named /usr/sbin/named
flags=(attach_disconnected) {
/etc/apparmor.d/usr.sbin.named:  /usr/sbin/named mr,

I have no idea where apparmor-profile comes from but it's not of my
doing:
# dlocate /etc/apparmor.d/apparmor-profile
bind9: /etc/apparmor.d/apparmor-profile

I did not find in which version it came from though, maybe some previous
bpo; it's dated 2007 and a very basic profile (I can paste it if one
wants to look at it).

A reload of apparmor was not sufficient but this worked:
apparmor_parser -R /etc/apparmor.d/apparmor-profile
rm /etc/apparmor.d/apparmor-profile
apparmor_parser -r /etc/apparmor.d/usr.sbin.named
systemctl restart named

Hope this helps.
\_o<

#958934#87
Date:
2021-08-20 00:30:25 UTC
From:
To:
I had my head buried in the sand on this until this week when bullseye got released and my bind9 package I had pinned to "stable" upgraded and predictably broke again.

This time, I was able to review Marc's message from 20 February, which was quite interesting.

I also found two conflicting profiles ("usr/sbin/named" and "named"), of which it appears that "named" is produced by /etc/apparmor.d/usr.sbin.named and "/usr/sbin/named" is coming from .... I have no idea where it is coming from. ;-)

I found the following short-term workaround let me get it started:

# aa-remove-unknown -n
Warning: found usr.sbin.chronyd in /etc/apparmor.d/force-complain, forcing complain mode
Would remove 'docker-default'
Would remove '/usr/sbin/named'
# aa-remove-unknown
Warning: found usr.sbin.chronyd in /etc/apparmor.d/force-complain, forcing complain mode
Removing 'docker-default'
Removing '/usr/sbin/named'
# systemctl start bind9
# systemctl status -l bind9
● named.service - BIND Domain Name Server
     Loaded: loaded (/lib/systemd/system/named.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2021-08-19 20:03:31 EDT; 2s ago
       Docs: man:named(8)
   Main PID: 86681 (named)
      Tasks: 26 (limit: 38349)
     Memory: 158.9M
        CPU: 93ms
     CGroup: /system.slice/named.service
             └─86681 /usr/sbin/named -f -u bind

Aug 19 20:03:31 buzz named[86681]: /etc/bind/db.ldap:1: no TTL specified; using SOA MINTTL instead
Aug 19 20:03:31 buzz named[86681]: zone ldap.troy.cartasoft.com/IN: loaded serial 2021061500
Aug 19 20:03:31 buzz named[86681]: zone troy.cartasoft.com/IN: loaded serial 2021070200
Aug 19 20:03:31 buzz named[86681]: all zones loaded
Aug 19 20:03:31 buzz named[86681]: running
Aug 19 20:03:31 buzz named[86681]: zone 1.1.16.in-addr.arpa/IN: sending notifies (serial 2021070200)
Aug 19 20:03:31 buzz named[86681]: zone ldap.troy.cartasoft.com/IN: sending notifies (serial 2021061500)
Aug 19 20:03:31 buzz named[86681]: zone troy.cartasoft.com/IN: sending notifies (serial 2021070200)
Aug 19 20:03:31 buzz named[86681]: managed-keys-zone: Key 20326 for zone . is now trusted (acceptance timer complete)
Aug 19 20:03:32 buzz named[86681]: resolver priming query complete
#

I don't have any clue whatsoever where the "extra" profile is coming from... But I presume at this point this is looking like an apparmor problem rather than a bind9 problem.

FYI,
-Scott

#958934#92
Date:
2021-08-30 14:12:48 UTC
From:
To:
Today I had an unexpected reboot and was shocked (but pleased) to see the current stable (bullseye) named started without problems!

I checked "aa-remove-unknown -n" output (as in my previous update) but this time there was no mysterious "/usr/sbin/named" profile to be removed, and with it not in the picture the expected profile was in place and everything is great.

I still have no idea where that weird profile came from or why it disappeared. I was under the impression that "aa-remove-unknown" is only unloading profiles from memory, and not making persistent changes, but I can't think of anything else that's happened recently.

In any case, I no longer observe this problem and I would be happy to see this bug report closed.

Thanks for putting up with me for 16 months,

Scott

#958934#97
Date:
2022-07-06 05:44:44 UTC
From:
To:
Seems like this has been resolved, so I am closing the issue.