#976907 golang-github-boltdb-bolt: FTBFS on ppc64el (arch:all-only src pkg): dh_auto_test: error: cd obj-powerpc64le-linux-gnu && go test -vet=off -v -p 160 -short github.com/boltdb/bolt github.com/boltdb/bolt/cmd/bolt returned exit code 1

#976907#5
Date:
2020-12-09 09:03:31 UTC
From:
To:
Hi,

During a rebuild of all packages in sid, your package failed to build
on ppc64el. At the same time, it did not fail on amd64.

I'm marking this bug as severity:serious since your package has only
Architecture:all binary packages, and should thus, in theory, build
everywhere. Failure to build on ppc64el might indicate a serious issue
in this package or in another package.

But feel free to downgrade or close if you believe that this is only a
build-time issue. (I would personally prefer a severity:minor bug just to
track that the package can only be built on specific architectures.)

Relevant part (hopefully):
http://qa-logs.debian.net/2020/12/09/golang-github-boltdb-bolt_1.3.1-7_unstable.log

A list of current common problems and possible solutions is available at
http://wiki.debian.org/qa.debian.org/FTBFS . You're welcome to contribute!

If you reassign this bug to another package, please mark it as 'affects'-ing
this package. See https://www.debian.org/Bugs/server-control#affects

If you fail to reproduce this, please provide a build log and diff it against mine
so that we can identify if something relevant changed in the meantime.

About the archive rebuild: The rebuild was done on a Power8 cluster that is part of the
Grid'5000 testbed. Hardware specs: https://www.grid5000.fr/w/Grenoble:Hardware#drac

#976907#10
Date:
2020-12-12 16:59:37 UTC
From:
To:
Hello all,

1 down, 1 to go... info below.
[...]

^--- I've not looked into this one yet.

[...]
[...]

^-- this one is solved by adding `tx.Rollback()` at the end of the
TestTx_Commit_ErrTxNotWritable function (in tx_test.go:65).
In other words, this is a test-suite bug (not a bug in the actual
product code).

The reasoning goes like this: tx.Commit() is expected to return the
error bolt.ErrTxNotWritable, which it does -- but it returns without
closing the read-only transaction, so that transaction is still
holding a read lock on db.mmaplock.
After the test function finishes, the deferred db.MustClose() runs
and calls into code that tries to take the write lock on db.mmaplock,
which times out.
On a read-only tx, the added tx.Rollback() basically only removes the
transaction and releases its read lock on db.mmaplock.

I have no idea why this does not also trigger on other architectures.

Regards,
Andreas Henriksson

#976907#15
Date:
2020-12-12 17:34:56 UTC
From:
To:
Hello again,
[...]

Now also quickly looked into this one. It seems the test-suite
makes assumptions related to calculations that involve
os.Getpagesize() (which gives 4096 on amd64 and 65536 on ppc64el,
which is 16 times larger).
Changing the 500 number to 8000 (16 times larger) in
TestBucket_Stats(...) (in bucket_test.go:1143) gives the expected
BranchPageN == 1... (however, with this modification the test then
fails with "unexpected LeafPageN: 6" instead).

Anyway, this makes me lose interest in pursuing this further.
In my opinion it's pretty clear that these are test-suite only
issues and not issues in the actual product.
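The manual adjustment described above (500 keys to 8000) amounts to scaling a key count by the ratio of the running page size to 4096. A minimal sketch, where scaledKeyCount and keyCountForThisSystem are hypothetical helpers and not part of bolt's test suite:

```go
package main

import "os"

// basePageSize is the page size (as on amd64) that the original
// key count in the test was tuned for.
const basePageSize = 4096

// scaledKeyCount scales a key count tuned for basePageSize pages
// linearly to a system with the given page size.
func scaledKeyCount(baseCount, pageSize int) int {
	return baseCount * pageSize / basePageSize
}

// keyCountForThisSystem applies the scaling to the running system's page size.
func keyCountForThisSystem(baseCount int) int {
	return scaledKeyCount(baseCount, os.Getpagesize())
}
```

With 65536-byte pages, scaledKeyCount(500, 65536) gives 8000, matching the change in the mail. As reported there, this linear scaling fixed BranchPageN but LeafPageN then came out as 6, so the other expected page counts in TestBucket_Stats would presumably also have to be derived from the page size rather than just the key count.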

Unless someone else wants to pursue fixing up the test-suite for ppc64le,
my offer to "fix" this is to simply disable it on !amd64 architectures
(unless we agree on simply downgrading this issue to non-RC).

Regards,
Andreas Henriksson

#976907#20
Date:
2020-12-12 17:48:52 UTC
From:
To:
Hello again,

So after wasting my time here I finally realized that apparently
boltdb is archived upstream. It will not receive any fixes.

Apparently golang-github-coreos-bbolt is a maintained feature-extended
fork. We should likely encourage moving to that and get boltdb removed
from Debian.

The timeout waiting for db.mmaplock that occurred in boltdb is
apparently already fixed in bbolt, see:
https://github.com/etcd-io/bbolt/commit/e06ec0a754bc30c2e17ad871962e71635bf94d45

The page-size issue still seems to plague them both, though.

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=976926
for the bug report against bbolt.

Regards,
Andreas Henriksson

#976907#25
Date:
2020-12-13 15:33:13 UTC
From:
To:
Hi Andreas,

Thanks a lot for investigating! The problem with removing this 'right now' is that this package has a few (important) reverse dependencies and reverse build-dependencies.

$ reverse-depends golang-github-boltdb-bolt-dev  
Reverse-Depends
* golang-github-blevesearch-bleve-dev
* golang-github-hashicorp-nomad-dev
* golang-github-hashicorp-raft-boltdb-dev
* golang-github-influxdb-influxdb-dev


$ reverse-depends golang-github-boltdb-bolt-dev -b
Reverse-Testsuite-Triggers
* snapd

Reverse-Build-Depends
* docker-libkv
* etcd
* go-dep
* golang-github-blevesearch-bleve
* golang-github-hashicorp-raft-boltdb
* influxdb
* nomad
* snapd
* vuls

Would simply replacing the dependency with bbolt in all of them work?
This may also need upstream patching in the future.
Please let me know.

Kind Regards,
Nilesh

#976907#32
Date:
2021-01-12 04:39:32 UTC
From:
To:
example, see:

https://github.com/hashicorp/raft-boltdb/pull/19#issuecomment-703732437

In short: hashicorp-raft-boltdb wants to make sure there's no issue before
making the change. This change would impact reverse build deps of
hashicorp-raft-boltdb, like nomad or consul.

I think it's better to downgrade the severity here (as was done in
coreos-bbolt, see https://bugs.debian.org/976926).