#908862 neutron: FTBFS randomly (failing tests)

Package:
src:neutron
Source:
neutron
Submitter:
Santiago Vila
Date:
2024-05-10 15:45:05 UTC
Severity:
important
Tags:
#908862#5
Date:
2018-09-15 08:40:27 UTC
From:
To:
Dear maintainer:

I tried to build this package but it failed:
--------------------------------------------------------------------------------
[...]
 debian/rules build-indep
pyversions: missing X(S)-Python-Version in control file, fall back to debian/pyversions
pyversions: missing debian/pyversions file, fall back to supported versions
py3versions: no X-Python3-Version in control file, using supported versions
dh build-indep --buildsystem=python_distutils --with python3,systemd
   dh_update_autotools_config -i -O--buildsystem=python_distutils
   dh_autoreconf -i -O--buildsystem=python_distutils
   dh_auto_configure -i -O--buildsystem=python_distutils
dh_auto_configure: Please use the third-party "pybuild" build system instead of python-distutils
dh_auto_configure: This feature will be removed in compat 12.
   debian/rules override_dh_auto_build
make[1]: Entering directory '/<<PKGBUILDDIR>>'
pyversions: missing X(S)-Python-Version in control file, fall back to debian/pyversions
pyversions: missing debian/pyversions file, fall back to supported versions

[... snipped ...]

  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
    return f(*dup_args, **dup_kwargs)
  File "/<<PKGBUILDDIR>>/neutron/plugins/ml2/drivers/type_tunnel.py", line 154, in sync_allocations
    allocs = ctx.session.query(self.model).all()
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2783, in all
    return list(self)
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2935, in __iter__
    return self._execute_and_instances(context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2958, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
    util.raise_from_cause(newraise, exc_info)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 508, in do_execute
    cursor.execute(statement, parameters)
oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: ml2_geneve_allocations [SQL: 'SELECT ml2_geneve_allocations.geneve_vni AS ml2_geneve_allocations_geneve_vni, ml2_geneve_allocations.allocated AS ml2_geneve_allocations_allocated \nFROM ml2_geneve_allocations'] (Background on this error at: http://sqlalche.me/e/e3q8)
----------------------------------------------------------------------
Ran 14783 tests in 3712.856s

FAILED (failures=292, skipped=1221)
make[1]: *** [debian/rules:46: override_dh_install] Error 1
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
make: *** [debian/rules:6: binary-indep] Error 2
dpkg-buildpackage: error: fakeroot debian/rules binary-indep subprocess returned exit status 2
--------------------------------------------------------------------------------

The build was made in my autobuilder with "dpkg-buildpackage -A"
on buster but it also fails here in buster and sid:

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html

where you can get a full build log if you need it.

If this is really a bug in one of the build-depends, please use reassign and
affects, so that this is still visible in the BTS web page for this package.

Please re-re-reconsider uploading packages in source-only form
(dpkg-buildpackage -S). I can't compare my build log with the official
one because there is simply no official build log:

https://buildd.debian.org/status/package.php?p=neutron

Thanks.

#908862#14
Date:
2018-09-22 11:05:30 UTC
From:
To:
I'm sorry, this is probably my fault, as I forgot to try without eatmydata.
(Version 9.1.1-3 used to build ok with eatmydata, but this was a long
time ago).

Trying now with a plain buster chroot. If it builds ok I'll retitle appropriately.

(Maybe eatmydata is the reason it fails in reproducible builds too).

Thanks.

#908862#19
Date:
2018-09-22 12:23:05 UTC
From:
To:
As I wrote to you privately, it is unlikely that the issue is using
eatmydata, because I use it too.

Cheers,

Thomas Goirand (zigo)

#908862#24
Date:
2019-01-08 08:59:50 UTC
From:
To:
Closing this bug as it builds fine for me.

Cheers,

Thomas Goirand (zigo)

#908862#29
Date:
2019-01-19 00:43:06 UTC
From:
To:
reopen 908862
found 908862 2:13.0.2-1
found 908862 2:13.0.2-3
found 908862 2:13.0.2-5
thanks

Hello Thomas.

This is still happening, both in reproducible-builds and also in my
own autobuilders. The reproducible-builds logs are available here:

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html

and I've just put my build logs here:

https://people.debian.org/~sanvila/build-logs/neutron/

(I'm still using eatmydata, since you said you use it yourself and
it's supposed to work).

Please take a look at this. Just saying "it works for me" does not help.
While we are at it, I have yet to see how this is not serious, being a
FTBFS bug in a supported architecture, but I will be happy to skip the
discussion about the severity as long as you are really willing to
solve the problem.

If you still need a machine to reproduce this, please contact me privately.

Thanks.

#908862#42
Date:
2019-01-21 09:34:52 UTC
From:
To:
Santiago Vila <sanvila@unex.es>:
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html

What's weird is that it's not the same number of failures.

Correct.

Here, you're recognizing that there's a pattern, and that we've already
seen a few times that building works for me but fails in your
environment. So, quite the opposite, the "it works for me" is important,
and needs to be addressed. We need to figure out what is different in
your environment and mine.

To investigate this, I've just tried building in a porter box, which
normally should be very close (if not identical) to what we have in the
buildd network. What I did was really only download the build
dependencies in the schroot, and run dpkg-buildpackage in the schroot:
so nothing special, really. The build was on barriere.debian.org.

The result is that it works perfectly. I do expect to see the same
result if uploading source only.

One thing I've observed is that most of the time (or every time?), the
test errors we've seen are in SQLite, with a "table doesn't exist"
error. I have no idea (yet) why it's like that.
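
For reference, one generic way to get this SQLite error is to create a table
on one connection and query it from another connection that does not share
the same database. The sketch below only illustrates how the error message
arises. It is not claimed to be what happens in neutron's test suite, and the
table name `allocations` is a hypothetical stand-in.

```python
import sqlite3


def query_fresh_connection():
    """Create a table on one in-memory connection, query it from another."""
    setup = sqlite3.connect(":memory:")
    setup.execute(
        "CREATE TABLE allocations (vni INTEGER PRIMARY KEY, allocated INTEGER)"
    )
    setup.commit()

    # Each plain ":memory:" connection gets its own private, empty database,
    # so the table created above is not visible here.
    other = sqlite3.connect(":memory:")
    try:
        other.execute("SELECT vni, allocated FROM allocations").fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        other.close()
        setup.close()
    return None


print(query_fresh_connection())  # no such table: allocations
```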

An FTBFS which we would observe everywhere, including in porter boxes
(which are normally the same as the buildds), in all configurations
including the most basic one (like mine), would indeed qualify as an RC
bug. Though an FTBFS that occurs only in some specific environments,
which we aren't even able to explain, doesn't look like a good candidate
for an RC bug.

Unfortunately, last time I tried, I could indeed see the FTBFS, but I
wasn't able to understand why it was happening in your build env and not
the standard one (ie: porter box or my own sbuild setup). As you're
actually paying for the VM to be hosted, I don't dare to ask you to
leave it up, and I'm unsure what to do... :/

I'm open to ideas and I am willing to spend the time to understand, if
you have some clues,
Cheers,

Thomas Goirand (zigo)

#908862#51
Date:
2019-02-12 00:12:29 UTC
From:
To:
This is only something to worry about when I give access to a machine
and the maintainer takes not days but weeks to ssh into it and try to
debug the problem.

I trust that this will not happen here. I've prepared a machine for you.
Details in private email.

If you don't know what's wrong and a failure in the tests does not
indicate that the package is misbuilt, then disabling the test is the
only reasonable thing to do. Making the build fail just for fun does
not make any sense. However, please take the time to investigate the
problem properly, as that's the purpose of offering a test machine.

[ BTW: Once again I have to remind you that just sending a message to
  the nnnnnn@bugs.debian.org address does *not* reach the submitter.
  I found this message from you after I saw that a new neutron package
  propagated to testing and still built unreliably ].

Thanks.

#908862#56
Date:
2019-10-31 10:30:09 UTC
From:
To:
Hi,

It looks like 2:15.0.0-2 built well in Bullseye, so I guess this bug can
be closed now...

Thomas

#908862#63
Date:
2019-10-31 18:51:44 UTC
From:
To:
reopen 908862
thanks

Sorry for the reopening but the problem is still there and it's not
just my problem, as it fails in both reproducible-builds:

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html

and my own autobuilders, I've added the last build log here:

https://people.debian.org/~sanvila/build-logs/neutron/

I offered a virtual machine in the past, please contact me privately
if you still need it to reproduce the failure.

Thanks.

#908862#68
Date:
2019-10-31 22:10:06 UTC
From:
To:
Santiago,

This is a sad situation...

The point of running unit tests at package build time, is to avoid
regressions, and to make sure the package is healthy and working. This
is the case for this package, where the unit tests are demonstrating the
package state, and the fact that it integrates well with the current
version of its build-dependencies.

The package built well in my laptop with a normal sbuild setup, in the
buildd (also with sbuild), builds well in reproducible build in Buster
in 2 different environments (see the bullseye builds). I have just tried
another time on my laptop, with an up-to-date Sid env, and it worked
once more. All of this shows exactly what I intended to do when running
unit tests at build time. That's more than enough for me.

Now, if it fails on "some env", but you can't provide an explanation of
what's going on, and in what way your build environment is different from
the normal sbuild which both I and the buildd network are running, then
I don't know what I can do, and it feels like all of this is just a loss
of time, especially when knowing that Neutron is otherwise fully working
as expected (I use it in production, both the version currently in
Buster, and version 15.0.0 currently in Sid/Bullseye).

If you understand what's going on, good, I'll accept a patch. Otherwise,
if there's nothing I can do, why leave this bug open? Why should I
spend more time on this, rather than adding more features and doing more
useful Q/A, or let's say, work more on the Python 2 removal in Bullseye?
I spent already a lot of time on this...

Cheers,

Thomas Goirand (zigo)

#908862#75
Date:
2019-10-31 22:46:22 UTC
From:
To:
Not "some env". It fails in BOTH reproducible builds and my
autobuilders. Those are INDEPENDENT. There is OBJECTIVELY a problem
behind this.

It is not a job of a bug reporter to provide an explanation.
I'm providing a virtual machine for you to test, and you still
refuse the offer. I could understand your unwillingness to work
on something which you don't see as RC, but that's not a good
reason to *close* a report.

You could still accept the offer to access one of my autobuilders.
You can also forward the bug upstream and tag it as "help".

Because it's a bug. This is a matter of principles. We don't close
bugs just because we are not motivated enough to fix them, we close them
when they are fixed.

I'm not actually asking you to do anything, since you already
downgraded the bug and I'm not disputing the severity.

[Are you closing the bug because you] believe it is not a bug, or just
because you are unwilling to work on it?

In the first case, we should reassign to the tech-ctte, since I
clearly disagree. In the second case it is absolutely not proper to
close a bug if we agree it's a bug.

Thanks.

#908862#80
Date:
2019-11-01 13:26:40 UTC
From:
To:
Well, it built well in Bullseye in reproducible build. I do agree
there's objectively a problem, but I don't see how to fix it.

I'm not convinced about this.
Not really. I tried, but didn't understand what's going on.

This is only task management, and not something you should take
personally, really.

Technically speaking, there's no issue in Debian itself, just in some
environment which aren't setup in a "normal way", apparently. So yes,
this is a bug, but it doesn't affect Debian directly. IMO, going through
the technical committee for this is a waste of their time: we both agree
there's an issue that's worth fixing. You just don't agree with my way of
triaging the bugs of the team, and probably feel disappointed that I
don't wish to spend more time on this. I do understand the frustration,
but at the end, this is still up to me to decide how I wish to organize
my package maintenance.
otherwise: thanks for the bug report, I do recognize there's a bug)

I'm closing it because, at this point in time, I'm puzzled as to
how I could help and fix it, and it looks like nobody is going to work
on this in the near future (the bug has been open for months already,
with no evolution...). Added to the fact Debian itself isn't affected
(ie: it builds on buildd and in any "normal" environment), then I prefer
to triage this bug as closed, so I can focus on other things which I
evaluate as more important. I know others in Debian keep open bugs
forever, even if there's only a very small hope for fixing. That's not
what I do, I like to keep things open just as a TODO for future work,
and hope to address eventually absolutely all open bugs. I feel like
keeping open bugs for years is lying to our users, and doesn't help.

Anyway, to move forward, I did what you suggested, and I have opened a bug
upstream:
https://bugs.launchpad.net/neutron/+bug/1850928

and I also asked Mike Bayer on IRC. He is the upstream for SQLAlchemy
and Alembic, and also commits on oslo.db (all 3 packages are probably
involved in the issue here). Let's hope we get an answer and fix this.

Cheers,

Thomas Goirand (zigo)

#908862#85
Date:
2019-11-01 15:12:49 UTC
From:
To:
We don't really know. You describe your laptop as "normal" and both
my autobuilders and reproducible-builds (where there is a build
failure in unstable, btw) as "not normal", but without a real
understanding of the root cause, this is pure speculation.

It could be, for example, that the test suite assumes a certain
feature to be present on the build machine which is not mandated by
policy (for example a CPU speed greater than "X").
packages which FTBFS randomly:

https://bugs.debian.org/cgi-bin/pkgreport.cgi?dist=unstable;include=subject%3AFTBFS+randomly;submitter=sanvila%40debian.org

because I believe that packages should either build all the time or
show a clear error message explaining why the build fails.

By closing this bug, you are implicitly telling me that I should track
all those bugs outside the Debian BTS, in which case I should better
stop doing QA on Debian altogether.

Not really. I'm disappointed that you want to remove this bug from the
list of bugs we would like to see fixed.

I'm not really sure you understand the frustration. Please take a look
at my list of "FTBFS randomly" bugs above.

Then please allow it to be in the BTS with whatever severity you feel
comfortable with (including wishlist).

Nothing will prevent you from focusing on those more important things
if we keep the bug open with a lower severity. That's what severities
are for.

I agree. However, if we close a bug without actually fixing it, that
would be also a way of lying.

Thanks a lot! But I still expect the BTS to be a reflection of the known
problems, regardless of our willingness or motivation to fix them.

Severities exist to prioritize the bugs. If you think this bug is low
priority, set it to "wishlist".

"Closed" is not a severity.

Thanks.

#908862#92
Date:
2019-11-01 15:54:02 UTC
From:
To:
Ok, got you. I've reopened the bug.

Thomas Goirand (zigo)

#908862#99
Date:
2019-11-01 16:08:49 UTC
From:
To:
Hi. Some technical notes:

This night I tried to build "neutron" in bullseye a lot of times in a
lot of different virtual machines. These are the failure rates
according to the machine type:

Hetzner CX11 with 1 CPU, 2GB of RAM and 2GB of swap
  Failure rate: 100%

Self-hosted KVM machine with 1 CPU, 7GB of RAM and 4GB swap
  Failure rate: 100%

Scaleway DEV1-S machine with 2 CPUs, 2GB of RAM and 2GB of swap
  Failure rate: 100%

Hetzner CX21 with 2 CPUs, 4GB of RAM and 4GB of swap
  Failure rate: 6%


My wild guess:

I think it's not lack of RAM, because it builds ok on CX21 most of the
time and it fails on my self-hosted KVM machines which have more RAM.

I guess it should be some sort of race condition which is just more
likely to happen on single-CPU systems (but not impossible to happen
on multi-core, just less likely).

Before you ask: Building on single-CPU is still useful and convenient:

https://people.debian.org/~sanvila/single-cpu/

Thanks.

#908862#104
Date:
2019-11-02 17:54:05 UTC
From:
To:
retitle 908862 neutron: FTBFS randomly (failing tests)
thanks

Dear Thomas: Please don't put my name in the bug title like you did.
That's a gross mischaracterization of what's really happening.
I'm recovering the original title, but adding the word "randomly",
meaning that it fails in some systems and it works in some others.

I believe this is a race condition of some sort. You can probably
reproduce the failure in your laptop by doing this:

taskset -c 0 dpkg-buildpackage -uc -us -A
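
To turn a single pinned build like the one above into a failure rate, the
run can be repeated and the non-zero exit codes counted. This is a minimal
sketch, assuming Linux (it uses `os.sched_setaffinity` as the equivalent of
`taskset -c`). The helper name `failure_rate` and its parameters are
hypothetical, not part of any Debian tool.

```python
import os
import subprocess


def failure_rate(cmd, runs, cpu=None):
    """Run cmd `runs` times and return the fraction of runs that failed.

    If `cpu` is given (and the platform supports it), each child process is
    restricted to that single CPU, like `taskset -c <cpu>`.
    """
    def pin():
        # Runs in the child just before exec; pid 0 means "this process".
        os.sched_setaffinity(0, {cpu})

    can_pin = cpu is not None and hasattr(os, "sched_setaffinity")
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            preexec_fn=pin if can_pin else None,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


# Example (run inside the unpacked source tree, takes a long time):
# failure_rate(["dpkg-buildpackage", "-uc", "-us", "-A"], runs=10, cpu=0)
```

A full neutron build takes about an hour per run according to the log in the
first message, so a small number of runs is the practical choice.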

I hope this finally means that the moreinfo and unreproducible tags no
longer apply.

Disabling the tests in debian/rules if a single CPU is detected would
probably help, but, as I've shown, the problem is not limited to
single-CPU systems. I would bet that this also fails in your system
if you try enough times, but we might better stop here and let upstream
do their job at finding a proper fix.

For the Debian package, I would just disable the failing tests, as
they do not seem to be trusted.

Thanks.

#908862#111
Date:
2019-11-02 19:04:24 UTC
From:
To:
I initially wrote "Santiago Vila + reproducible build" though it got cut
at the plus ... :/ Sorry for this.

I need to repair openvswitch first: since it's missing, neutron's
build-deps can't be installed (I managed to break it, somehow...).

Disabling tests is really not what I want to do. These tests are
catching real world issues, and I do need to detect the problems.

Thomas Goirand (zigo)

#908862#116
Date:
2024-05-10 14:22:06 UTC
From:
To:
After 5 years, I still have no clue why Neutron couldn't build in
your env. What I know for sure: I wouldn't build Neutron with only 8GB
of RAM in a VM.

At this point, I don't see why this bug should stay open. Nobody is
investing time on it, neither you, me, nor upstream. And nobody is able to
re-trigger it but you, with not enough RAM, it seems.

Closing this bug then...

Feel free to reopen if you still think it should, but then please
explain how it should be reproduced, and try with enough RAM. Note that
stestr will spawn as many processes as there are CPUs, so the more vCPUs
you have, the more RAM you should provide too.
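
The CPU/RAM relation described above can be sketched as a small calculation:
cap the number of test workers by both the CPU count and the available
memory. This is only an illustration. The function name `pick_concurrency`
and the per-worker memory figure are assumptions, since nobody in this
thread measured how much RAM one neutron test worker needs.

```python
import os


def pick_concurrency(total_ram_gb, per_worker_gb, cpus=None):
    """Return a worker count limited by both CPUs and available RAM."""
    cpus = cpus or os.cpu_count() or 1
    # How many workers fit in memory, assuming per_worker_gb each (a guess).
    by_ram = int(total_ram_gb // per_worker_gb)
    return max(1, min(cpus, by_ram))


# With 8 CPUs but only 8 GB of RAM and a hypothetical 2 GB per worker,
# memory is the limiting factor:
print(pick_concurrency(8, 2, cpus=8))  # 4
```

The resulting number could then be passed to `stestr run --concurrency N`
instead of letting stestr default to one worker per CPU.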

Cheers,

Thomas Goirand (zigo)

#908862#121
Date:
2024-05-10 15:34:49 UTC
From:
To:
On 10/5/24 at 16:22, Thomas Goirand wrote:

I don't keep failed logs for neutron anymore, and I believe the reason is
that I changed my build setup recently.

Before, I was using directory-based chroots, with the "default" profile
and partition bindings defined in /etc/schroot/default/fstab.

Now I'm using file-based chroots, with the "sbuild" profile and partition
bindings defined in /etc/schroot/sbuild/fstab.

Now this bug is closed and I agree that it's better that it's closed.

However, please do not spread misinformation about the cause of the bug.

No, it was not lack of RAM. I knew it was not when I reported it, and this is
still true, as I have successful build logs with machines with 2 CPUs and 8 GB of RAM.

Moreover, neutron still fails in bullseye and bookworm here:

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html

Maybe you could ask reproducible-builds people to tell you how to reproduce it.

At least I gave you a machine to reproduce it, and you verified that the failure
happened indeed. If any of us had thought about changing the build setup
(in the same machine) maybe we could have discovered the reason at the time.

On the positive side, I think we were very close. If a similar bug happens and
you can reproduce it in a machine which I provide, I think the next logical step
should be to extend the VM offer to upstream.

Also on the positive side: If you can't reproduce any currently open bug reported
by Lucas Nussbaum, please contact me privately, I can provide a machine with the
exact specs he used for the build.

Thanks.

#908862#126
Date:
2024-05-10 15:39:17 UTC
From:
To:
fixed 908862 2:24.0.0-2
tags 908862 + bullseye bookworm
thanks

As I did with ceilometer, I'm fixing the metadata with this message.

Thanks.