#987013 Release goal proposal: Remove Berkeley DB

#987013#5
Date:
2021-04-15 16:12:17 UTC
From:
To:
Hi

I would like to propose a release goal:

Remove Berkeley DB (finally)

Berkeley DB was relicensed to AGPLv3 almost eight years ago.  Since
then, Debian has stayed with the last version before the license change.
The license change means we can't take upstream patches, so security
support is only provided by other distributions in the same state.

After this time we really should try to get rid of this package, which
has only been maintained through NMUs for three years.

Affected source packages to remove:
- db-defaults
- db5.3

Bastian

#987013#10
Date:
2021-04-16 07:42:42 UTC
From:
To:
Hi,

I second this proposal.

Best regards,
Martin

#987013#15
Date:
2021-04-16 08:30:08 UTC
From:
To:
Bastian Blank wrote:

Sorry, but I don't understand: why is that a problem?
I believe the AGPL (you mean the GNU Affero General Public License,
right?) is a free license. Is it not?

Gerardo

#987013#20
Date:
2021-04-16 15:04:03 UTC
From:
To:
I am not persuaded. I maintain libberkeleydb-perl and it works fine; it
is mature software.

But even if we agree that all the libdb5.3 reverse dependencies must
migrate to a different database, we will probably need to keep
db5.3-util (and its dependency libdb5.3) around to allow dumping and
restoring the databases.
Not all software uses libdb as a cache which can just be regenerated
and/or supports multiple databases and has internal dump/restore tools.
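As an illustration of what dumping and restoring involves, here is a minimal Python sketch against the standard `dbm` mapping interface. The text format (one hex-encoded key/value pair per line) is a hypothetical portable format chosen for this example, not the output format of db5.3-util's `db_dump`; a real Berkeley DB file would need a libdb-backed reader on the dump side.

```python
import binascii
from typing import Any, TextIO


# Hypothetical portable dump format: one "hexkey hexvalue" pair per line.
# This is NOT the format written by db_dump from db5.3-util.
def dump_db(db: Any, out: TextIO) -> int:
    """Write every record of a dbm-style object to out; return the record count."""
    count = 0
    for key in db.keys():
        value = db[key]
        out.write(f"{binascii.hexlify(key).decode('ascii')} "
                  f"{binascii.hexlify(value).decode('ascii')}\n")
        count += 1
    return count


def restore_db(inp: TextIO, db: Any) -> int:
    """Load records written by dump_db into a dbm-style object; return the count."""
    count = 0
    for line in inp:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        hexkey, hexvalue = line.split(" ")
        db[binascii.unhexlify(hexkey)] = binascii.unhexlify(hexvalue)
        count += 1
    return count
```

Dumping to plain text decouples the two halves of a migration: the dump can be produced on a system that still has libdb5.3, and the restore can run against whatever backend replaces it.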

And then all the packages currently depending on libdb5.3 will need to
implement, or at least document, a transition strategy.
Let me just mention postfix (easy), inn2 (possible but very
resource-intensive) and slapd (I am not sure, but it is critical and scary).

#987013#25
Date:
2021-04-16 18:09:13 UTC
From:
To:
Hi Gerardo

Yes, the AGPLv3 is a free license.

However, freeness is not the problem here.  The problem is the AGPL,
its extended source provisions, the incompatibility with the license of
existing software and also a bit "Oracle".

The AGPL was created for network services.  It requires providing the
source to anyone interacting with the software over a network.  So it is
tailored for the services themselves, not arbitrary libraries deep within
the dependency chain.  There were a lot of discussions about these
problems at the time.[1]

So even if we switched to a current version of Berkeley DB, we would
need to do the same work to make sure every piece of software that uses
it complies with the AGPL.  AFAIK every distribution either stayed with
BDB 5.3, often removing its use as much as possible, or just killed it
altogether.

Regards,
Bastian

[1]: https://lwn.net/Articles/557820/

#987013#30
Date:
2021-04-16 18:29:52 UTC
From:
To:
Hi Marco

Mature and unmaintained are not opposites.  It works, but this does not
mean it is advisable to use it, especially on untrusted data.
(At the time I installed my current notebook, Linux 4.0 was the current
version.  It would still work, but would you really still use it on the
internet?)

My first goal would be to drop it from base packages, so not every
system out there needs to have it installed.

postfix is easy.  Would inn2 be license-compliant with an AGPL-licensed
BDB, i.e. able to provide the source to its users, or what is the plan
anyway?  slapd has defaulted to LMDB for several years, and you need to
explicitly specify the bdb or hdb backend.

Regards,
Bastian

#987013#35
Date:
2021-04-16 20:36:57 UTC
From:
To:
The plan is to continue using 5.3, not upgrading.
Sure, but the point was how to convert existing systems.

#987013#40
Date:
2021-04-19 07:38:47 UTC
From:
To:
Thanks for your explanation.

Gerardo

On Fri, 16 Apr 2021 at 20:09, Bastian Blank
<waldi@debian.org> wrote:

#987013#45
Date:
2021-05-05 18:51:27 UTC
From:
To:
Hi, please note that there are a number of indirect users of libdb via
interpreter packages: at least Perl, and presumably Python too.

Given that perl gets installed on almost all systems, this seems to be
on the path to the first goal.

For the perl core, the libdb5.3 bindings are exposed with the DB_File
module. I think this is the only place but have not cross-checked that.
(The libberkeleydb-perl package is an entirely separate matter AIUI.)

I see 110 source packages in Debian matching DB_File. The list will
need to be inspected manually to weed out false positives. The remaining
packages need to be changed before perl can drop its libdb5.3 dependency.
I suppose this will also need a long list of Breaks declarations on the
perl side.
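Part of that manual inspection could be pre-filtered. The following Python sketch walks an unpacked source tree and reports only Perl files that appear to load DB_File (a `use`/`require DB_File` statement or a `tie` naming `'DB_File'`), ignoring POD sections and prose mentions. The patterns are heuristic assumptions rather than a Perl parser, scripts without a `.pl`/`.pm`/`.t`/`.cgi` suffix are skipped, and the helper name `find_db_file_users` is hypothetical, so the result still needs review.

```python
import os
import re
from typing import Dict, List

# Heuristic patterns (assumptions, not a Perl parser): a real load of DB_File
# usually looks like "use DB_File;", "require DB_File;" or "tie %h, 'DB_File', ...".
LOAD_PATTERNS = [
    re.compile(r"^[ \t]*(?:use|require)\s+DB_File\b", re.MULTILINE),
    re.compile(r"\btie\s*\(?\s*(?:my\s+)?[%$@]\w+\s*,\s*['\"]DB_File['\"]"),
]
# POD runs from a line starting with "=word" to a line starting with "=cut".
POD_BLOCK = re.compile(r"^=[a-zA-Z].*?(?:^=cut\b.*?$|\Z)", re.MULTILINE | re.DOTALL)
PERL_SUFFIXES = (".pl", ".pm", ".t", ".cgi")


def _blank_pod(text: str) -> str:
    # Replace POD with the same number of newlines so line numbers stay valid.
    return POD_BLOCK.sub(lambda m: "\n" * m.group(0).count("\n"), text)


def find_db_file_users(root: str) -> Dict[str, List[int]]:
    """Map each Perl file under root to the line numbers that seem to load DB_File."""
    hits: Dict[str, List[int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(PERL_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = _blank_pod(fh.read())
            lines = sorted({text.count("\n", 0, m.start()) + 1
                            for pattern in LOAD_PATTERNS
                            for m in pattern.finditer(text)})
            if lines:
                hits[path] = lines
    return hits
```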

Then there's user code too. I also think we'll need at least a dumper
utility so that users can migrate their data manually when they discover
their program no longer works after upgrading.

#987013#50
Date:
2021-05-06 05:33:03 UTC
From:
To:
For Python, the dbm/ndbm.py module, based on the _dbm extension, is also
affected.  You can build the _dbm extension using libgdbm-compat-dev; however,
that changes the on-disk format and the license used (the new one should
likely be moved into the python3-gdbm package).
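Because the backend determines the on-disk format, code that opens an existing file can check which format it is in before deciding how to handle it. A minimal Python sketch using the standard `dbm.whichdb()` function, which returns the name of the module that can open the file, `None` if the file is missing or unreadable, and an empty string if the format cannot be guessed. The helper name `describe_db` is hypothetical, and whether a given Berkeley DB file is recognized depends on how the local `_dbm` extension was built.

```python
import dbm


def describe_db(path: str) -> str:
    """Return a human-readable description of the dbm format at path (hypothetical helper)."""
    kind = dbm.whichdb(path)
    if kind is None:
        return "missing or unreadable"
    if kind == "":
        # whichdb could not identify the format, e.g. a file written by a
        # backend that this Python build does not know about.
        return "unrecognized format"
    return f"readable with {kind}"
```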

Matthias

#987013#55
Date:
2021-05-09 18:04:23 UTC
From:
To:
As far as I can see, the realistic best case would be to drop
Berkeley DB *after* bookworm.

For usages that are not just build-time tests or temporary caches,
we need at least one release for migrating the data of our users.

apt-listchanges uses Berkeley DB through Python (#988090).
This is a single global database, and the user-friendly migration would
happen either in the maintainer scripts during the upgrade to bookworm
or at runtime, when the version in bookworm discovers a legacy Berkeley
DB database.
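The runtime option could look roughly like the following Python sketch. It assumes the running Python can still open the legacy file through a caller-supplied `open_legacy` function (exactly the capability that disappears if `dbm` loses Berkeley DB support), copies every record into a new SQLite file under a temporary name, and renames it into place only after the copy has been committed, leaving the legacy file untouched. The function name, table layout and file names are hypothetical and not taken from apt-listchanges.

```python
import dbm
import os
import sqlite3
from typing import Any, Callable, Iterator, Tuple


def _records(db: Any) -> Iterator[Tuple[bytes, bytes]]:
    # Works for any dbm-style object that provides keys() and item access.
    for key in db.keys():
        yield key, db[key]


def migrate_if_needed(legacy_path: str, new_path: str,
                      open_legacy: Callable[[str], Any]) -> bool:
    """Copy a legacy dbm file into an SQLite file once (hypothetical helper).

    Returns True if a migration was performed, False if there was nothing to do.
    """
    if os.path.exists(new_path):
        return False  # already migrated, or created fresh by the new version
    if dbm.whichdb(legacy_path) is None:
        return False  # no legacy database present
    tmp_path = new_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)  # leftover from an interrupted earlier attempt
    legacy = open_legacy(legacy_path)
    try:
        con = sqlite3.connect(tmp_path)
        try:
            with con:  # commits on success, rolls back on error
                con.execute("CREATE TABLE kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)")
                con.executemany("INSERT INTO kv (key, value) VALUES (?, ?)",
                                _records(legacy))
        finally:
            con.close()
    finally:
        legacy.close()
    # os.replace uses rename(2) on POSIX, so readers never see a half-written file.
    os.replace(tmp_path, new_path)
    return True
```

Keeping the legacy file means an interrupted migration can simply be retried on the next run; deleting it could be deferred to a later release.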

If Python in bookworm would not be able to read legacy Berkeley DB
databases, we would be screwing our users by not being able to offer
them automatic migrations in packages like apt-listchanges.

I maintain bogofilter (a spam filter). It would be feasible to implement
a transparent migration from Berkeley DB to a different format in
bookworm, but this requires a bogofilter tool compiled against libdb5.3
in bookworm.

Which would not be possible without libdb5.3 in bookworm.

cu
Adrian

#987013#60
Date:
2021-08-23 14:12:46 UTC
From:
To:
Matthias Klose wrote:

Hi, I'm a nosy bystander.

Last year I was annoyed by scrapy using bdb to cache entire HTTP responses (including large HTML bodies).
As an experiment, I wrote some proof-of-concept code for other backends.
IIRC, if the database doesn't exist yet, they can serve as drop-in replacements for "import dbm".
Here they are attached, do with them what you will.
I don't intend to touch them again myself.

(They are expat licensed, but I can relicense if needed.)

(FWIW, I eventually ended up patching scrapy to use sqlite3 directly, and then gave up on scrapy entirely.)
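The general idea behind such a replacement can be shown with a short Python sketch: a `MutableMapping` from bytes to bytes stored in a single SQLite table, covering the subset of the `dbm` interface (item access, `keys()`, `close()`) that simple caches tend to use. This is not the attached code, the class name `SqliteDbm` is hypothetical, and it is not a full `dbm` implementation (no `flag`/`mode` handling). Python 3.13 and later also ship a standard `dbm.sqlite3` backend along the same lines.

```python
import sqlite3
from collections.abc import MutableMapping
from typing import Iterator, Union

Key = Union[str, bytes]


def _b(x: Key) -> bytes:
    # Like dbm, accept str keys/values and store them UTF-8 encoded.
    return x.encode("utf-8") if isinstance(x, str) else x


class SqliteDbm(MutableMapping):
    """Minimal dbm-like key/value store in one SQLite table (hypothetical sketch)."""

    def __init__(self, path: str) -> None:
        self._con = sqlite3.connect(path)
        with self._con:
            self._con.execute(
                "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)")

    def __getitem__(self, key: Key) -> bytes:
        row = self._con.execute("SELECT value FROM kv WHERE key = ?", (_b(key),)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key: Key, value: Key) -> None:
        with self._con:  # commit each write, like an unbuffered dbm
            self._con.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                              (_b(key), _b(value)))

    def __delitem__(self, key: Key) -> None:
        with self._con:
            cur = self._con.execute("DELETE FROM kv WHERE key = ?", (_b(key),))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self) -> Iterator[bytes]:
        # Materialize keys first so callers may modify the store while iterating.
        return iter([row[0] for row in self._con.execute("SELECT key FROM kv")])

    def __len__(self) -> int:
        return self._con.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self) -> None:
        self._con.close()

    def __enter__(self) -> "SqliteDbm":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()
```

Inheriting from `MutableMapping` provides `keys()`, `get()`, `in` and friends for free, which is most of what dict-style callers of `dbm` rely on.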

#987013#65
Date:
2021-11-16 13:57:37 UTC
From:
To:
I think the backwards incompatible change to the dbm extension[1] has to
be reverted until after a release where someone has migrated the data of
all packages using Berkeley DB.

Most broken packages are likely not even among the ones that need
rebuilding; they would just be broken, and Python should not
make it harder to fix them.

If Berkeley DB is to be removed in trixie, someone will have to
analyze and implement solutions for data migrations as part of #987013
in bookworm. Ecosystem maintainers unilaterally dropping support would
only make it a lot harder to implement solutions.

How should a package like apt-listchanges migrate its database if the
existing reader code is no longer functional due to this change in Python?
That's not impossible, but might make things a lot harder.

AFAIK even Python 2.7 will still be shipped in bookworm, so let's not
make life harder for other people by hurrying too much with Berkeley DB.

cu
Adrian

[1] https://tracker.debian.org/news/1240462/accepted-python310-3100b1-2-source-into-experimental/

#987013#73
Date:
2022-05-14 10:06:27 UTC
From:
To:
Hi,

cyrus-sasl2 package maintainer here. I am interested in the state of this since Berkeley DB will be removed from sasl
upstream with the next release [0]. There are several other implementations to choose from: gdbm|lmdb|ndbm. ndbm does
not exist in Debian, and gdbm's GPL license would probably be problematic when combined with CMU's BSD-4-clause code.

Can someone shed some light on what a transition to lmdb would mean for cyrus-sasl2?
Can the existing databases just be used as-is, or do they need a migration?

Cheers,
Bastian

[0]: https://github.com/cyrusimap/cyrus-sasl/issues/718

#987013#78
Date:
2023-02-03 20:02:39 UTC
From:
To:
Apache httpd allows using DBM files for various purposes. The default
format is Berkeley DB. This is highly configuration-dependent, so automatic
migration by maintainer scripts seems infeasible. This means that
users need time and a tool to migrate their configurations. I have
opened [1] for this.

Subversion's first repository format uses Berkeley DB. But according to
[2], new repositories have not used that format by default since 1.2 (2005),
and the format was deprecated in 1.8 (2013). So maybe a note
in the release notes would be sufficient here?

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030376
[2] https://en.wikipedia.org/wiki/Apache_Subversion

#987013#83
Date:
2023-02-04 07:50:29 UTC
From:
To:
Hi,

As a Release Team member, I'm leaving a small note here.

It has been a while since we did release goals in the formal way. I
recommend discussing this with a bigger audience instead and getting
traction amongst the Debian community.

Sure. But I agree with several readers of this bug that there should be
a plan. We shouldn't kill it until the users are able to sanely move
away from it. I doubt that will happen automatically, so somebody needs
to organize it.

I don't see the preparation happening in time for bookworm, so if the
preparations are done for trixie, Berkeley DB can be removed in forky.

Paul

#987013#88
Date:
2023-02-07 16:42:32 UTC
From:
To:
At least inn2 uses it, and a "transition" (i.e. rebuilding the overview
database with a different indexing method) for a non-trivial server may
require hours of downtime.

#987013#93
Date:
2024-03-22 15:19:13 UTC
From:
To:
Is there a reason against switching to this fork under the old license?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1010965

There are still applications using bdb by default, which means there
should be tools to read and edit their databases for the next few years
to support a migration.

#987013#98
Date:
2025-10-02 07:02:22 UTC
From:
To:
Hi,

This discussion was started before; I think the comments are still
valid: see bug 987013 [1]. We also have a transition tracker [2].

Most importantly, we'd need an upgrade path, probably in each piece of
software that depends on it, for users who have their data in BDB now.
Somebody needs to do the work.

Paul

[1] https://bugs.debian.org/987013
[2] https://release.debian.org/transitions/html/db5.3-rm.html