#1016750 sbcl breaks cl-ironclad autopkgtest on armhf: Heap exhausted, game over.

#1016750#5
Date:
2022-08-06 14:19:55 UTC
From:
To:
Dear maintainer(s),

With a recent upload of sbcl the autopkgtest of cl-ironclad fails in
testing when that autopkgtest is run with the binary packages of sbcl
from unstable. It passes when run with only packages from testing. In
tabular form:

                        pass            fail
sbcl                   from testing    2:2.2.6-2
cl-ironclad            from testing    0.57-1
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of sbcl to testing
[1]. Due to the nature of this issue, I filed this bug report against
both packages. Can you please investigate the situation and reassign the
bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=sbcl

https://ci.debian.net/data/autopkgtest/testing/armhf/c/cl-ironclad/24381695/log.gz

; wrote
/tmp/autopkgtest-lxc.d2c27h4h/downtmp/autopkgtest_tmp/usr/share/common-lisp/source/ironclad/src/ciphers/tea-tmp93OFNPHA.fasl
; compilation finished in 0:00:00.160
; compiling file
"/usr/share/common-lisp/source/ironclad/src/ciphers/threefish.lisp"
(written 18 FEB 2022 01:39:10 PM):
Heap exhausted during garbage collection: 16 bytes available, 48 requested.
         Immobile Object Counts
  Gen   GF type  fdefn symbol   code  Boxed   Cons    Raw   Code  SmMix
  Mixed  LgRaw LgCode  LgMix Waste%       Alloc        Trig   Dirty GCs
Mem-age
   1     0    0      0      0      0   7121    939   9668      5    273
    827      0      0     17   19.9    61867480    32499253   18850   1
  1.1503
   2     0    0      0      0      0  19865   2766  12586     84    836
   1681     10     34    127   11.7   137430448     5368709   23089   0
  0.9496
   3     0    0      0      0      0  45116   7190   7447    604   3552
   1673    239    467    331    1.0   270091288     2000000    7806   0
  0.4559
   4     0    0      0      0      0      0      0      0      0      0
      0      0      0      0    0.0           0     2000000       0   0
  0.0000
   5     0    0      0      0      0      0      0      0      0      0
      0      0      0      0    0.0           0     2000000       0   0
  0.0000
   6     0    0      0      0      0   1370    471   1242   3507    314
     40    255    134    281    1.9    30584464     2000000     251   0
  0.0000
Tot     0    0      0      0      0  73472  11366  30943   4200   4975
  4221    504    635    756    6.9   499973680 [93.1% of 536870912 max]
GC control variables:
    *GC-INHIBIT* = true
    *GC-PENDING* = true
fatal error encountered in SBCL pid 1357:
Heap exhausted, game over.

    0:     0xf746478c pc=0x98 {0x4f0380b8+b0fc7fe0} {code_serialno=3901}
    1:     0xf7464730 pc=0x4f005760 {0x4f005000+0760} (FLET
"WITHOUT-GCING-BODY-5" :IN SB-KERNEL::SUB-GC)
    2:     0xf74646d4 pc=0x4f0053f8 {0x4f005000+03f8} (FLET
"WITHOUT-INTERRUPTS-BODY-1" :IN SB-KERNEL::SUB-GC)
    3:     0xf7464678 pc=0x4f005108 {0x4f005000+0108} SB-KERNEL::SUB-GC
    4:     0xf746466c pc=(nil) LRA=0x3e2e7    5: [I]0xf7464654
pc=0x3e4d8 {0x4fb3f240+b04ff298} {code_serialno=f5f}
    6:     0xf7464634 pc=0x5028c858 {0x5028c4b0+03a8} SB-C::MAKE-WIRED-TN
    7:     0xf7464604 pc=0x4fd9a348 {0x4fd9a030+0318}
SB-C::IR2-CONVERT-FULL-CALL-ARGS
    8:     0xf74645c4 pc=0x4f8506f8 {0x4f8505c8+0130}
SB-C::IR2-CONVERT-FIXED-FULL-CALL
    9:     0xf74645b4 pc=0x4f3d8b18 {0x4f3d8990+0188}
SB-C::IR2-CONVERT-FULL-CALL
   10:     0xf7464598 pc=0x4f6fe720 {0x4f6fda90+0c90}
SB-C::IR2-CONVERT-BLOCK
   11:     0xf7464584 pc=0x4f2c8f58 {0x4f2c8dc8+0190} SB-C::IR2-CONVERT
   12:     0xf7464550 pc=0x504637a8 {0x50463198+0610}
SB-C::%COMPILE-COMPONENT
   13:     0xf7464538 pc=0x5041c820 {0x5041c258+05c8}
SB-C::COMPILE-COMPONENT
   14:     0xf7464514 pc=0x5037ca10 {0x5037c7b8+0258} SB-C::COMPILE-TOPLEVEL
   15:     0xf7464500 pc=0x503f0eb8 {0x503f0c28+0290}
SB-C::CONVERT-AND-MAYBE-COMPILE
   16:     0xf74644b4 pc=0x50393270 {0x50392a10+0860}
SB-C::PROCESS-TOPLEVEL-FORM
   17:     0xf746449c pc=0x50457490 {0x504573a0+00f0}
SB-C::PROCESS-TOPLEVEL-PROGN
   18:     0xf7464450 pc=0x50393188 {0x50392a10+0778}
SB-C::PROCESS-TOPLEVEL-FORM
   19:     0xf7464404 pc=0x50393408 {0x50392a10+09f8}
SB-C::PROCESS-TOPLEVEL-FORM
   20:     0xf74643b8 pc=0x50393408 {0x50392a10+09f8}
SB-C::PROCESS-TOPLEVEL-FORM
   21:     0xf7464334 pc=0x502f60e0 {0x502f3b20+25c0} (LAMBDA
(SB-KERNEL::FORM &KEY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN
SB-C::SUB-COMPILE-FILE)
   22:     0xf74642c0 pc=0x4f0c5038 {0x4f0c49a8+0690}
SB-C::%DO-FORMS-FROM-INFO
   23:     0xf7464244 pc=0x502f3ea0 {0x502f3b20+0380} (FLET "LAMBDA0"
:IN "SYS:SRC;COMPILER;MAIN.LISP")
   24:     0xf74641e8 pc=0x50283490 {0x50282968+0b28} (FLET
SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT)
   25:     0xf7464124 pc=0x502f5258 {0x502f3b20+1738} SB-C::SUB-COMPILE-FILE
   26:     0xf7464078 pc=0x4fb52548 {0x4fb51ac8+0a80} COMPILE-FILE
   27:     0xf746403c pc=0x510f2840 {0x510f2600+0240}
UIOP/UTILITY::CALL-WITH-MUFFLED-CONDITIONS
   28:     0xf7464020 pc=0x510f2548 {0x510f2398+01b0}
UIOP/PATHNAME::CALL-WITH-ENOUGH-PATHNAME
   29:     0xf7463fb8 pc=0x510f15b8 {0x510f10a8+0510}
UIOP/LISP-BUILD::COMPILE-FILE*
   30:     0xf7463f98 pc=0x510f0ac8 {0x510f0700+03c8}
ASDF/LISP-ACTION::PERFORM-LISP-COMPILATION
   31:     0xf7463f7c pc=0x510f0540 {0x510f02a8+0298} (SB-PCL::EMF
ASDF/ACTION::PERFORM)
   32:     0xf7463f38 pc=0x510efe28 {0x510ef968+04c0} (LAMBDA () :IN
ASDF/ACTION::CALL-WHILE-VISITING-ACTION)
   33:     0xf7463e80 pc=0x510ef478 {0x510ee078+1400}
(SB-PCL::FAST-METHOD ASDF/ACTION::PERFORM-WITH-RESTARTS :AROUND (T T))
   34:     0xf7463e68 pc=0x510edfd0 {0x510edd48+0288}
(SB-PCL::FAST-METHOD ASDF/PLAN::PERFORM-PLAN (T))
   35:     0xf7463df0 pc=0x502831b0 {0x50282968+0848} (FLET
SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT)
   36:     0xf7463dc0 pc=0x510ecde0 {0x510ec7e8+05f8}
(SB-PCL::FAST-METHOD ASDF/PLAN::PERFORM-PLAN :AROUND (T))
   37:     0xf7463da0 pc=0x510ec770 {0x510ec468+0308}
(SB-PCL::FAST-METHOD ASDF/OPERATE::OPERATE (ASDF/OPERATION::OPERATION
ASDF/COMPONENT::COMPONENT))
   38:     0xf7463d74 pc=0x510ec3d8 {0x510eb578+0e60} (SB-PCL::EMF
ASDF/OPERATE::OPERATE)
   39:     0xf7463d20 pc=0x510e94a0 {0x510e83c8+10d8} (LAMBDA () :IN
ASDF/OPERATE::OPERATE)
   40:     0xf7463ccc pc=0x510e8880 {0x510e83c8+04b8}
(SB-PCL::FAST-METHOD ASDF/OPERATE::OPERATE :AROUND (T T))
   41:     0xf7463ca0 pc=0x510ec3d8 {0x510eb578+0e60} (SB-PCL::EMF
ASDF/OPERATE::OPERATE)
   42:     0xf7463c4c pc=0x510e94a0 {0x510e83c8+10d8} (LAMBDA () :IN
ASDF/OPERATE::OPERATE)
   43:     0xf7463bf8 pc=0x510e8880 {0x510e83c8+04b8}
(SB-PCL::FAST-METHOD ASDF/OPERATE::OPERATE :AROUND (T T))
   44:     0xf7463b04 pc=0x510eaff8 {0x510e9c90+1368}
ASDF/SESSION::CALL-WITH-ASDF-SESSION
   45:     0xf7463ab0 pc=0x510e9708 {0x510e83c8+1340} (LAMBDA () :IN
ASDF/OPERATE::OPERATE)
   46:     0xf74639c0 pc=0x510eaff8 {0x510e9c90+1368}
ASDF/SESSION::CALL-WITH-ASDF-SESSION
   47:     0xf746396c pc=0x510e8880 {0x510e83c8+04b8}
(SB-PCL::FAST-METHOD ASDF/OPERATE::OPERATE :AROUND (T T))
   48:     0xf7463958 pc=0x510e8378 {0x510e81e8+0190}
ASDF/OPERATE::LOAD-SYSTEM
   49:     0xf7463950 pc=0x510e8100 {0x510e8000+0100} (LAMBDA () :IN
"/tmp/autopkgtest-lxc.d2c27h4h/downtmp/build.Sop/src/debian/tests/runtests.lisp")
   50:     0xf74638ec pc=0x4f521ff0 {0x4f520ad8+1518}
SB-INT::SIMPLE-EVAL-IN-LEXENV
   51:     0xf74638d8 pc=0x4f0c6148 {0x4f0c6000+0148} SB-EXT::EVAL-TLF
   52:     0xf74637b8 pc=0x4f062438 {0x4f0601a8+2290} (LABELS
SB-FASL::EVAL-FORM :IN SB-INT::LOAD-AS-SOURCE)
   53:     0xf746370c pc=0x4f061870 {0x4f0601a8+16c8} (LAMBDA
(SB-KERNEL::FORM &KEY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN
SB-INT::LOAD-AS-SOURCE)
   54:     0xf7463698 pc=0x4f0c5038 {0x4f0c49a8+0690}
SB-C::%DO-FORMS-FROM-INFO
   55:     0xf74635b0 pc=0x4f060ca0 {0x4f0601a8+0af8} SB-INT::LOAD-AS-SOURCE
   56:     0xf746350c pc=0x4f0c1838 {0x4f0c05a8+1290} (LABELS
SB-FASL::LOAD-STREAM-1 :IN LOAD)
   57:     0xf74634a8 pc=0x4f064840 {0x4f064218+0628}
SB-FASL::CALL-WITH-LOAD-BINDINGS
   58:     0xf7463400 pc=0x4f0c1278 {0x4f0c05a8+0cd0} LOAD
   59:     0xf746337c pc=0x4f05e0c0 {0x4f05cd30+1390} (FLET
SB-IMPL::LOAD-SCRIPT :IN SB-IMPL::PROCESS-SCRIPT)
   60:     0xf7463318 pc=0x4f05d668 {0x4f05cd30+0938} (FLET
SB-UNIX::BODY :IN SB-IMPL::PROCESS-SCRIPT)
   61:     0xf74632a4 pc=0x4f05d1c0 {0x4f05cd30+0490} (FLET
"WITHOUT-INTERRUPTS-BODY-11" :IN SB-IMPL::PROCESS-SCRIPT)
   62:     0xf7463240 pc=0x4f05cec8 {0x4f05cd30+0198}
SB-IMPL::PROCESS-SCRIPT
   63:     0xf7463130 pc=0x4f027f28 {0x4f0270a0+0e88} SB-IMPL::TOPLEVEL-INIT
   64:     0xf74630d0 pc=0x4f026e20 {0x4f026468+09b8} (FLET
SB-UNIX::BODY :IN SB-IMPL::START-LISP)
   65:     0xf7463060 pc=0x4f026a18 {0x4f026468+05b0} (FLET
"WITHOUT-INTERRUPTS-BODY-3" :IN SB-IMPL::START-LISP)
   66:     0xf7463000 pc=0x4f026630 {0x4f026468+01c8} SB-IMPL::%START-LISP
Note: [I] = interrupted
autopkgtest [18:28:32]: test command1

#1016750#14
Date:
2022-08-09 03:08:53 UTC
From:
To:
Hello Paul,

It looks like Lisp just ran out of memory.  Indeed, I can't see this
failure on debci.debian.org at present, which makes sense if it's a
random OOM problem on weaker architectures like armhf -- so, not the
fault of the new sbcl upload.  Is it possible to label the tests as
flaky on only a single architecture?

#1016750#19
Date:
2022-08-09 05:46:41 UTC
From:
To:
Hi Sean,

Yes, but it does so systematically.

Huh? Did you check
https://ci.debian.net/packages/c/cl-ironclad/testing/armhf/ or
https://ci.debian.net/packages/c/cl-ironclad/testing/armel/ You'll see
there that cl-ironclad was retried with sbcl about 11 + 10 times and
every time it failed (and never succeeded).

This isn't random. And, our armhf box has 255 GB of RAM (and armel VM
has 26 GB), so running out of memory isn't likely. What can happen is
that threads use too much resources for the address space, but that's
something under control of the packages in question if I'm correct (but
please correct me if I misunderstand). I'm not saying it's (easily)
fixable, just that it systematically runs out of reachable memory during
the particular test.

Yes [1], but I'm not seeing flaky behavior.

Paul

[1] add the skippable restriction to the test you want to be able to
ignore and exit with exit code 77 when the failure happens on the
architecture you want to ignore. If you want to skip the test on armhf
and armel altogether, I rather suggest you use an "Architecture: !armel
!armhf" line in the d/t/control section of the test.

#1016750#24
Date:
2022-08-09 16:17:13 UTC
From:
To:
Hello,

Right, it's not random.  I was looking at the page for unstable.

I first looked at <https://ci.debian.net/packages/c/cl-ironclad/> and
the failure doesn't show up there -- do you know what that would be?
Then I clicked on unstable but not testing, it would seem.

I'll write to upstream about this.

#1016750#31
Date:
2022-08-09 16:34:11 UTC
From:
To:
Hello,

When I updated SBCL in Debian from 2.2.3 to 2.2.6, our ci infrastructure
detected that the test suite for the cl-ironclad package consistently
runs out of memory on armhf and armel.  This didn't happen before.

https://ci.debian.net/packages/c/cl-ironclad/testing/armhf/
https://ci.debian.net/packages/c/cl-ironclad/testing/armel/

Based on your understanding of the changes between SBCL 2.2.3 and 2.2.6,
does it seem likely that this is an SBCL bug, or rather that changes in
SBCL have uncovered a bug or limitation in the ironclad test suite,
e.g. that it tries to use more memory than a 32-bit address space can
accommodate?

Also asked about here: <https://github.com/sharplispers/ironclad/issues/53>.

Would be grateful for any input.  Thanks.

#1016750#36
Date:
2022-08-09 18:53:41 UTC
From:
To:
https://ci.debian.net/data/autopkgtest/testing/armhf/c/cl-ironclad/24441370/log.gz
shows a GC invariant failure, not a heap exhaustion.
How do I identify the exact git revision of SBCL you're using? I can only
guess at what line# failed an assertion in my current tree.

#1016750#41
Date:
2022-08-10 03:35:05 UTC
From:
To:
Hello,

Hmm okay, sounds like it's a bug in sbcl rather than cl-ironclad, then.

We currently use the release tarballs from URIs matching:

<https://sf.net/sbcl/sbcl-*-source.tar.bz2>

plus these patches:

<https://salsa.debian.org/common-lisp-team/sbcl/-/tree/master/debian/patches>

a couple of which are backported from git.

(If I continue to be the only person working on sbcl packaging in
Debian, I'll probably switch to packaging git revisions not tarballs,
and patches in git not in a directory like this, which'll mean that you
can work with our git repo directly.)

Thanks.

#1016750#46
Date:
2022-08-10 05:19:36 UTC
From:
To:
Hi Sean,

Yes, it's the difficulty of making good UI. In the past we reported the
last overall result, but for testing that includes all tests with
${random} package from unstable so that could be changing all the time,
so we changed that to only report "pure" runs. It's on our wishlist to
change it again and report both flavors on the same page, but we need a
reasonable design to do that without causing too much confusion.

Paul

#1016750#51
Date:
2022-08-21 22:47:40 UTC
From:
To:
Hello,

I've just updated SBCL in Debian to 2.2.7 and dropped two patches
included in that release, which simplifies things a bit.  ci.debian.net
will re-run the cl-ironclad test suite soon.  Let me know if it would be
useful for me to file a Launchpad bug gathering the various links
together.

#1016750#56
Date:
2022-08-27 19:15:18 UTC
From:
To:
I suspect that the heap exhaustion is corrected for this release.
Also I think this particular GC invariant failure is not of the utmost
importance. It seems directly related to the heap exhaustion which leaves
things in an inconsistent state such that 'gc_gen_report_to_file' does not
work. It's unfortunate that such is the case, but it is what it is.  I
would have far been more worried about a GC failure on its own.  Let's see
how things go after you pull in newer source.

#1016750#61
Date:
2022-08-28 19:26:49 UTC
From:
To:
Hello,

You have in mind an upcoming 2.2.8 release, right?  2.2.7 is still
showing the test failure.

Thank you for this analysis.  It would seem that disabling the test is
an acceptable workaround.

#1016750#66
Date:
2022-09-19 02:03:12 UTC
From:
To:
Hello,

Just to report back, it still fails with 2.2.8.  I'll proceed to disable
the test.

https://ci.debian.net/data/autopkgtest/testing/armhf/c/cl-ironclad/26219482/log.gz

#1016750#71
Date:
2022-10-12 00:05:00 UTC
From:
To:
control: reassign -1 cl-ironclad
control: retitle -1 cl-ironclad: fails to compile against recent SBCL on 32-bit ARM architectures

I intend to resolve this bug by disabling the failing compilation.
SBCL upstream do not think there is evidence of a serious SBCL bug.

#1016750#82
Date:
2022-10-12 00:39:20 UTC
From:
To:
control: severity -1 important

Hello,

This reduces the severity of this bug, rather than resolving it.
SBCL is not the only CL implementation in Debian.