- Package: src:neutron
- Source: neutron
- Submitter: Santiago Vila
- Date: 2024-05-10 15:45:05 UTC
- Severity: important
- Tags:
Dear maintainer:
I tried to build this package but it failed:
--------------------------------------------------------------------------------
[...]
debian/rules build-indep
pyversions: missing X(S)-Python-Version in control file, fall back to debian/pyversions
pyversions: missing debian/pyversions file, fall back to supported versions
py3versions: no X-Python3-Version in control file, using supported versions
dh build-indep --buildsystem=python_distutils --with python3,systemd
dh_update_autotools_config -i -O--buildsystem=python_distutils
dh_autoreconf -i -O--buildsystem=python_distutils
dh_auto_configure -i -O--buildsystem=python_distutils
dh_auto_configure: Please use the third-party "pybuild" build system instead of python-distutils
dh_auto_configure: This feature will be removed in compat 12.
debian/rules override_dh_auto_build
make[1]: Entering directory '/<<PKGBUILDDIR>>'
pyversions: missing X(S)-Python-Version in control file, fall back to debian/pyversions
pyversions: missing debian/pyversions file, fall back to supported versions
[... snipped ...]
File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
return f(*dup_args, **dup_kwargs)
File "/<<PKGBUILDDIR>>/neutron/plugins/ml2/drivers/type_tunnel.py", line 154, in sync_allocations
allocs = ctx.session.query(self.model).all()
File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2783, in all
return list(self)
File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2935, in __iter__
return self._execute_and_instances(context)
File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2958, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 948, in execute
return meth(self, multiparams, params)
File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
context)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
util.raise_from_cause(newraise, exc_info)
File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
raise value.with_traceback(tb)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
context)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 508, in do_execute
cursor.execute(statement, parameters)
oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: ml2_geneve_allocations [SQL: 'SELECT ml2_geneve_allocations.geneve_vni AS ml2_geneve_allocations_geneve_vni, ml2_geneve_allocations.allocated AS ml2_geneve_allocations_allocated \nFROM ml2_geneve_allocations'] (Background on this error at: http://sqlalche.me/e/e3q8)
----------------------------------------------------------------------
Ran 14783 tests in 3712.856s
FAILED (failures=292, skipped=1221)
make[1]: *** [debian/rules:46: override_dh_install] Error 1
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
make: *** [debian/rules:6: binary-indep] Error 2
dpkg-buildpackage: error: fakeroot debian/rules binary-indep subprocess returned exit status 2
--------------------------------------------------------------------------------
The build was made in my autobuilder with "dpkg-buildpackage -A"
on buster, but it also fails in buster and sid here:
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html
where you can get a full build log if you need it.
If this is really a bug in one of the build-depends, please use reassign and affects,
so that this is still visible in the BTS web page for this package.
Please re-re-reconsider uploading packages in source-only form (dpkg-buildpackage -S).
I can't compare my build log with the official one because there is
simply no official build log:
https://buildd.debian.org/status/package.php?p=neutron
Thanks.
I'm sorry, this is probably my fault, as I forgot to try without eatmydata. (Version 9.1.1-3 used to build ok with eatmydata, but this was a long time ago). Trying now with a plain buster chroot. If it builds ok I'll retitle appropriately. (Maybe eatmydata is the reason it fails in reproducible builds too). Thanks.
As I wrote to you privately, it is unlikely that the issue is using eatmydata, because I use it too. Cheers, Thomas Goirand (zigo)
Closing this bug as it builds fine for me. Cheers, Thomas Goirand (zigo)
reopen 908862
found 908862 2:13.0.2-1
found 908862 2:13.0.2-3
found 908862 2:13.0.2-5
thanks
Hello Thomas. This is still happening, both in reproducible-builds and in my own autobuilders. The reproducible-builds logs are available here: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html and I've just put my build logs here: https://people.debian.org/~sanvila/build-logs/neutron/ (I'm still using eatmydata, since you said you use it yourself and it's supposed to work). Please take a look at this. Just saying "it works for me" does not help. While we are at it, I have yet to see how this is not serious, being an FTBFS bug on a supported architecture, but I will be happy to skip the discussion about the severity as long as you are really willing to solve the problem. If you still need a machine to reproduce this, please contact me privately. Thanks.
Santiago Vila <sanvila@unex.es>: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html What's weird is that it's not the same number of failures. Correct. Here, you're recognizing that there's a pattern, and that we've already seen a few times that the build works for me but fails in your environment. So, quite the opposite: the "it works for me" is important, and needs to be addressed. We need to figure out what is different between your environment and mine. To investigate this, I've just tried building on a porter box, which normally should be very close (if not identical) to what we have in the buildd network. All I did was download the build dependencies in the schroot and run dpkg-buildpackage in the schroot, so nothing special, really. The build was on barriere.debian.org. The result is that it works perfectly. I do expect to see the same result if I upload source-only. One thing I've observed is that most of the time (or every time?), the test errors we've seen are in SQLite, with a "table doesn't exist" error. I have no idea (yet) why that is. An FTBFS which we would observe everywhere, including on porter boxes (which are normally the same as the buildds), in all configurations including the most basic one (like mine), would indeed qualify as an RC bug. However, an FTBFS that happens only in some specific environments, which we aren't even able to explain, doesn't look like a good candidate for an RC bug. Unfortunately, last time I tried, I could indeed see the FTBFS, but I wasn't able to understand why it was happening in your build environment and not in the standard one (i.e. a porter box or my own sbuild setup). As you're actually paying for the VM to be hosted, I don't dare to ask you to leave it up, and I'm unsure what to do... :/ I'm open to ideas and I am willing to spend the time to understand, if you have some clues. Cheers, Thomas Goirand (zigo)
This is only something to worry about when I give access to a machine and the maintainer takes not days but weeks to ssh into it and try to debug the problem. I trust that this will not happen here. I've prepared a machine for you. Details in private email. If you don't know what's wrong and a failure in the tests does not indicate that the package is misbuilt, then disabling the test is the only reasonable thing to do. Making the build fail just for fun does not make any sense. However, please take the time to investigate the problem properly, as that's the purpose of offering a test machine. [ BTW: Once again I have to remind you that just sending a message to the nnnnnn@bugs.debian.org address does *not* reach the submitter. I found this message from you after I saw that a new neutron package had propagated to testing and still built unreliably ]. Thanks.
Hi, It looks like 2:15.0.0-2 built well in Bullseye, so I guess this bug can be closed now... Thomas
reopen 908862
thanks
Sorry for reopening, but the problem is still there and it's not just my problem, as it fails in both reproducible-builds (https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html) and my own autobuilders. I've added the latest build log here: https://people.debian.org/~sanvila/build-logs/neutron/ I offered a virtual machine in the past; please contact me privately if you still need it to reproduce the failure. Thanks.
Santiago, This is a sad situation... The point of running unit tests at package build time is to avoid regressions and to make sure the package is healthy and working. This is the case for this package, where the unit tests demonstrate the package state and the fact that it integrates well with the current versions of its build-dependencies. The package built well on my laptop with a normal sbuild setup and on the buildd (also with sbuild), and it builds well in reproducible builds in Buster in 2 different environments (see the bullseye builds). I have just tried again on my laptop, with an up-to-date Sid environment, and it worked once more. All of this shows exactly what I intended to do when running unit tests at build time. That's more than enough for me. Now, it fails in "some env", but if you can't provide an explanation of what's going on, and of how your build environment differs from the normal sbuild which both I and the buildd network are running, then I don't know what I can do, and it feels like all of this is just a waste of time, especially knowing that Neutron is otherwise fully working as expected (I use it in production, both the version currently in Buster and version 15.0.0 currently in Sid/Bullseye). If you understand what's going on, good, I'll accept a patch. Otherwise, if there's nothing I can do, why leave this bug open? Why should I spend more time on this, rather than adding more features and doing more useful QA, or, let's say, working more on the Python 2 removal in Bullseye? I have already spent a lot of time on this... Cheers, Thomas Goirand (zigo)
Not "some env". It fails in BOTH reproducible builds and my autobuilders. Those are INDEPENDENT. There is OBJECTIVELY a problem behind this. It is not the job of a bug reporter to provide an explanation. I'm providing a virtual machine for you to test, and you still refuse the offer. I could understand your unwillingness to work on something which you don't see as RC, but that's not a good reason to *close* a report. You could still accept the offer to access one of my autobuilders. You can also forward the bug upstream and tag it as "help". Because it's a bug. This is a matter of principle. We don't close bugs just because we are not motivated enough to fix them; we close them when they are fixed. I'm not actually asking you to do anything, since you already downgraded the bug and I'm not disputing the severity. [...] believe it is not a bug, or just because you are unwilling to work on it? In the first case, we should reassign to the tech-ctte, since I clearly disagree. In the second case, it is absolutely not proper to close a bug if we agree it's a bug. Thanks.
Well, it built well in Bullseye in reproducible builds. I do agree there's objectively a problem, but I don't see how to fix it. I'm not convinced about this. Not really. I tried, but didn't understand what's going on. This is only task management, and not something you should take personally, really. Technically speaking, there's no issue in Debian itself, just in some environments which aren't set up in a "normal way", apparently. So yes, this is a bug, but it doesn't affect Debian directly. IMO, going through the technical committee for this is a waste of their time: we both agree there's an issue that's worth fixing. You just don't agree with the way I triage the bugs of the team, and probably feel disappointed that I don't wish to spend more time on this. I do understand the frustration, but in the end, it is still up to me to decide how I wish to organize my package maintenance. [...] otherwise: thanks for the bug report, I do recognize there's a bug. I'm closing it because, at this point in time, I'm puzzled about how I could help and fix it, and it looks like nobody is going to work on this in the near future (the bug has been open for months already, with no progress...). Added to the fact that Debian itself isn't affected (i.e. it builds on the buildds and in any "normal" environment), I prefer to triage this bug as closed, so I can focus on other things which I consider more important. I know others in Debian keep bugs open forever, even if there's only a very small hope of fixing them. That's not what I do: I like to keep things open just as a TODO for future work, and hope to eventually address absolutely all open bugs. I feel like keeping bugs open for years is lying to our users, and doesn't help. Anyway, to move forward, I did what you suggested, and I have opened a bug upstream: https://bugs.launchpad.net/neutron/+bug/1850928 and I also asked Mike Bayer on IRC. 
He is the upstream for SQLAlchemy and Alembic, and also commits to oslo.db (three packages that are probably involved in the issue here). Let's hope we get an answer and fix this. Cheers, Thomas Goirand (zigo)
We don't really know. You describe your laptop as "normal" and both my autobuilders and reproducible-builds (where there is a build failure in unstable, btw) as "not normal", but without a real understanding of the root cause, this is pure speculation. It could be, for example, that the test suite assumes a certain feature to be present on the build machine which is not mandated by policy (for example, a CPU speed greater than "X"). [...] packages which FTBFS randomly: https://bugs.debian.org/cgi-bin/pkgreport.cgi?dist=unstable;include=subject%3AFTBFS+randomly;submitter=sanvila%40debian.org because I believe that packages should either build all the time or show a clear error message explaining why the build fails. By closing this bug, you are implicitly telling me that I should track all those bugs outside the Debian BTS, in which case I had better stop doing QA on Debian altogether. Not really. I'm disappointed that you want to remove this bug from the list of bugs we would like to see fixed. I'm not really sure you understand the frustration. Please take a look at my list of "FTBFS randomly" bugs above. Then please allow it to stay in the BTS with whatever severity you feel comfortable with (including wishlist). Nothing will prevent you from focusing on those more important things if we keep the bug open with a lower severity. That's what severities are for. I agree. However, if we close a bug without actually fixing it, that would also be a way of lying. Thanks a lot! But I still expect the BTS to be a reflection of the known problems, regardless of our willingness or motivation to fix them. Severities exist to prioritize bugs. If you think this bug is low priority, set it to "wishlist". "Closed" is not a severity. Thanks.
Ok, got you. I've reopened the bug. Thomas Goirand (zigo)
Hi. Some technical notes: Last night I tried to build "neutron" in bullseye many times in a lot of different virtual machines. These are the failure rates according to the machine type:

- Hetzner CX11 with 1 CPU, 2GB of RAM and 2GB of swap: failure rate 100%
- Self-hosted KVM machine with 1 CPU, 7GB of RAM and 4GB of swap: failure rate 100%
- Scaleway DEV1-S machine with 2 CPUs, 2GB of RAM and 2GB of swap: failure rate 100%
- Hetzner CX21 with 2 CPUs, 4GB of RAM and 4GB of swap: failure rate 6%

My wild guess: I think it's not lack of RAM, because it builds OK on CX21 most of the time and it fails on my self-hosted KVM machines, which have more RAM. I guess it is some sort of race condition which is just more likely to happen on single-CPU systems (but not impossible on multi-core systems, just less likely). Before you ask: building on a single CPU is still useful and convenient: https://people.debian.org/~sanvila/single-cpu/ Thanks.
retitle 908862 neutron: FTBFS randomly (failing tests)
thanks
Dear Thomas: Please don't put my name in the bug title like you did. That's a gross mischaracterization of what's really happening. I'm restoring the original title, but adding the word "randomly", meaning that it fails on some systems and works on others. I believe this is a race condition of some sort. You can probably reproduce the failure on your laptop by doing this:
taskset -c 0 dpkg-buildpackage -uc -us -A
I hope this means the moreinfo and unreproducible tags no longer apply. Disabling the tests in debian/rules if a single CPU is detected would probably help, but, as I've shown, the problem is not limited to single-CPU systems. I would bet that this also fails on your system if you try enough times, but it might be better to stop here and let upstream do their job of finding a proper fix. For the Debian package, I would just disable the failing tests, as they do not seem trustworthy. Thanks.
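To put a number on "randomly", as in the per-machine failure rates above, the suggested taskset command can simply be repeated in a loop. What follows is a minimal sketch, assuming a POSIX shell run from inside the unpacked source tree; `failure_rate` is a hypothetical helper name and is not part of the neutron packaging. Because CPU affinity is inherited by child processes, pinning dpkg-buildpackage to CPU 0 also pins the test runner it spawns.

```shell
#!/bin/sh
# Hypothetical helper (not part of the neutron packaging):
# run a command N times and report how often it exited non-zero.
# Usage: failure_rate N command [args...]
failure_rate() {
    runs=$1
    shift
    # Guard against a zero or missing run count (avoids division by zero).
    [ "$runs" -gt 0 ] 2>/dev/null || return 1
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    # Integer percentage, rounded down.
    echo "failures: $failures/$runs ($((failures * 100 / runs))%)"
}

# Example (not run here): repeat the single-CPU build ten times.
# failure_rate 10 taskset -c 0 dpkg-buildpackage -uc -us -A
```

Pinning to one CPU is only a way to make the suspected race more likely; per the rates reported above, a 0% result on a multi-core machine would not prove the absence of the bug.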
I initially wrote "Santiago Vila + reproducible build", though it got cut at the plus... :/ Sorry for this. I need to repair openvswitch first: since it's missing, neutron's build-dependencies can't be installed (I managed to break it, somehow...). Disabling tests is really not what I want to do. These tests catch real-world issues, and I do need to detect the problems. Thomas Goirand (zigo)
After 5 years, I still have no clue why Neutron couldn't build in your environment. What I know for sure: I wouldn't build Neutron with only 8GB of RAM in a VM. At this point, I don't see why this bug should stay open. Nobody is investing time in it: not you, not me, and not upstream. And it seems nobody but you is able to re-trigger it, with not enough RAM. Closing this bug then... Feel free to reopen it if you still think it should be open, but then please explain how it can be reproduced, and try with enough RAM. Note that stestr will spawn as many processes as there are CPUs, so the more vCPUs you have, the more RAM you should provide too. Cheers, Thomas Goirand (zigo)
On 10/5/24 at 16:22, Thomas Goirand wrote: I don't keep failed logs for neutron anymore, and I believe the reason is that I changed my build setup recently. Before, I was using directory-based chroots, with the "default" profile and partition bindings defined in /etc/schroot/default/fstab. Now I'm using file-based chroots, with the "sbuild" profile and partition bindings defined in /etc/schroot/sbuild/fstab. Now this bug is closed and I agree that it's better that it's closed. However, please do not spread misinformation about the cause of the bug. No, it was not lack of RAM. I knew it was not when I reported it, and this is still true, as I have successful build logs from machines with 2 CPUs and 8 GB of RAM. Moreover, neutron still fails in bullseye and bookworm here: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/neutron.html Maybe you could ask the reproducible-builds people to tell you how to reproduce it. At least I gave you a machine to reproduce it, and you verified that the failure indeed happened. If either of us had thought about changing the build setup (on the same machine), maybe we could have discovered the reason at the time. On the positive side, I think we were very close. If a similar bug happens and you can reproduce it on a machine which I provide, I think the next logical step should be to extend the VM offer to upstream. Also on the positive side: if you can't reproduce any currently open bug reported by Lucas Nussbaum, please contact me privately; I can provide a machine with the exact specs he used for the build. Thanks.
fixed 908862 2:24.0.0-2
tags 908862 + bullseye bookworm
thanks
As I did with ceilometer, I'm fixing the metadata with this message. Thanks.