#901636 mandoc: "mandoc -mdoc -T man" causes memory dump

Package:
mandoc
Source:
mdocml
Description:
BSD manpage compiler toolset
Submitter:
Bjarni Ingi Gislason
Date:
2021-07-04 16:36:03 UTC
Severity:
important
Tags:
#901636#5
Date:
2018-06-15 21:12:17 UTC
From:
To:
  Output from "mandoc -mdoc -T man groff_mdoc.7":

mandoc: mdoc_man.c:678: print_node: Assertion `n->tok >= MDOC_Dd && n->tok < MDOC_MAX' failed.
Aborted (core dumped)

#901636#10
Date:
2018-07-27 09:22:03 UTC
From:
To:
Bjarni Ingi Gislason <bjarniig@rhi.hi.is> writes:

Where is "groff_mdoc.7" from?  I have no such file on my system, so
there's no way for me to try and reproduce this error.

Bdale

#901636#15
Date:
2018-07-27 09:22:03 UTC
From:
To:
Bjarni Ingi Gislason <bjarniig@rhi.hi.is> writes:

Where is "groff_mdoc.7" from?  I have no such file on my system, so
there's no way for me to try and reproduce this error.

Bdale

#901636#20
Date:
2019-03-01 21:35:58 UTC
From:
To:
  It is in the "groff" package.
#901636#25
Date:
2019-03-01 21:35:58 UTC
From:
To:
  It is in the "groff" package.
#901636#30
Date:
2019-03-01 22:20:34 UTC
From:
To:
Bjarni Ingi Gislason <bjarniig@rhi.hi.is> writes:

Thanks.  I can confirm that I see this error on this source file, too,
with mdocml version 1.14.4-1.

I do not plan to pursue this further, though, since I don't actually use
mdocml any more.  Michael Stapelberg has been taking care of the package
since Feb 2017... I just realized we never actually updated the
Maintainer assertion in the package metadata to reflect that, though.
I've just made that update in the packaging repository for the next upload.

Regards,

Bdale

#901636#35
Date:
2019-03-01 22:20:34 UTC
From:
To:
Bjarni Ingi Gislason <bjarniig@rhi.hi.is> writes:

Thanks.  I can confirm that I see this error on this source file, too,
with mdocml version 1.14.4-1.

I do not plan to pursue this further, though, since I don't actually use
mdocml any more.  Michael Stapelberg has been taking care of the package
since Feb 2017... I just realized we never actually updated the
Maintainer assertion in the package metadata to reflect that, though.
I've just made that update in the packaging repository for the next upload.

Regards,

Bdale

#901636#40
Date:
2021-06-07 16:32:46 UTC
From:
To:
found 901636 1.14.5-1
thanks

Hi!

I think I hit this as well, but I've a minimal example to show for it:
-- >8 --
.Dd June  7, 2021
.Dt SLEEP 1
.Os
.EQ
Z = X + Y
.EN
-- >8 --

Running "mandoc -Tman" on this file produces some output until it
explodes on the .EQ directive (I think? an equivalent thing happens on
the unminimised version (attached)):
-- >8 --
.\" Automatically generated from an mdoc input file.  Do not edit.
.TH "SLEEP" "1" "June 7, 2021" "Debian" "General Commands Manual"
.nh
mandoc: mdoc_man.c:686: print_node: Assertion `n->tok >= MDOC_Dd && n->tok < MDOC_MAX' failed.
Aborted
-- >8 --

Happens on buster (1.14.4-1) and sid (1.14.5-1).

GDB has this to say about it:
-- >8 --
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0xf7e0c515 in __GI_abort () at abort.c:79
#2  0xf7e0c3fe in __assert_fail_base (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, line=300,
    function=<optimized out>) at assert.c:92
#3  0xf7e1b1ff in __GI___assert_fail (assertion=assertion@entry=0x565a4c88 "tok >= MDOC_Dd && tok <= MDOC_MAX",
    file=file@entry=0x565a4bba "mdoc_man.c", line=line@entry=300,
    function=function@entry=0x565a4b88 <__PRETTY_FUNCTION__.2> "mdoc_man_act") at assert.c:101
#4  0x5657f29f in mdoc_man_act (tok=<optimized out>) at mdoc_man.c:300
#5  0x5657f4e8 in mdoc_man_act (tok=<optimized out>) at mdoc_man.c:689
#6  print_node (meta=meta@entry=0x565c6600, n=n@entry=0x565cc610) at mdoc_man.c:688
#7  0x56580a4a in man_mdoc (arg=<optimized out>, mdoc=mdoc@entry=0x565c6600) at mdoc_man.c:634
#8  0x5657d8cc in parse (curp=curp@entry=0xffffd450, fd=fd@entry=4, file=file@entry=0xffffd786 "../sleep.1-min")
    at main.c:859
#9  0x565613bf in main (argc=<optimized out>, argv=<optimized out>) at main.c:540

(gdb) up
#6  print_node (meta=meta@entry=0x565c6600, n=n@entry=0x565cc610) at mdoc_man.c:688
(gdb) p n
$2 = (struct roff_node *) 0x565cc610
(gdb) p *n
$3 = {parent = 0x565c6670, child = 0x0, last = 0x0, next = 0x0, prev = 0x565cc590, head = 0x0, body = 0x0, tail = 0x0,
  args = 0x0, norm = 0x0, string = 0x0, span = 0x0, eqn = 0x565cc670, line = 4, pos = 1, flags = 9, prev_font = 0,
  aux = 0, tok = TOKEN_NONE, type = ROFFT_EQN, sec = SEC_NONE, end = ENDBODY_NOT}
-- >8 --

n->tok is what's being passed to mdoc_man_act() which produces the
assertion, and TOKEN_NONE is just below MDOC_Md, according to roff.h:
-- >8 --
    316     ROFF_cblock,
    317     ROFF_RENAMED,
    318     ROFF_USERDEF,
    319     TOKEN_NONE,
    320     MDOC_Dd,
    321     MDOC_Dt,
    322     MDOC_Os,
-- >8 --

The entire document(?) seems to be
-- >8 --
(gdb) p mdoc->first
$14 = (struct roff_node *) 0x565c6670
(gdb) p *mdoc->first
$15 = {parent = 0x0, child = 0x565cc260, last = 0x565cc610, next = 0x0, prev = 0x0, head = 0x0, body = 0x0, tail = 0x0,
  args = 0x0, norm = 0x0, string = 0x0, span = 0x0, eqn = 0x0, line = 0, pos = 0, flags = 3, prev_font = 0, aux = 0,
  tok = TOKEN_NONE, type = ROFFT_ROOT, sec = SEC_NONE, end = ENDBODY_NOT}
(gdb) p *mdoc->first->child
$16 = {parent = 0x565c6670, child = 0x565cc2d0, last = 0x565cc3b0, next = 0x565cc430, prev = 0x0, head = 0x0,
  body = 0x0, tail = 0x0, args = 0x0, norm = 0x0, string = 0x0, span = 0x0, eqn = 0x0, line = 1, pos = 1, flags = 1035,
  prev_font = 0, aux = 0, tok = MDOC_Dd, type = ROFFT_ELEM, sec = SEC_NONE, end = ENDBODY_NOT}
(gdb) p *mdoc->first->child->next
$17 = {parent = 0x565c6670, child = 0x565cc4a0, last = 0x565cc510, next = 0x565cc590, prev = 0x565cc260, head = 0x0,
  body = 0x0, tail = 0x0, args = 0x0, norm = 0x0, string = 0x0, span = 0x0, eqn = 0x0, line = 2, pos = 1, flags = 1035,
  prev_font = 0, aux = 0, tok = MDOC_Dt, type = ROFFT_ELEM, sec = SEC_NONE, end = ENDBODY_NOT}
(gdb) p *mdoc->first->child->next->next
$18 = {parent = 0x565c6670, child = 0x0, last = 0x0, next = 0x565cc610, prev = 0x565cc430, head = 0x0, body = 0x0,
  tail = 0x0, args = 0x0, norm = 0x0, string = 0x0, span = 0x0, eqn = 0x0, line = 3, pos = 1, flags = 1035,
  prev_font = 0, aux = 0, tok = MDOC_Os, type = ROFFT_ELEM, sec = SEC_NONE, end = ENDBODY_NOT}
(gdb) p *mdoc->first->child->next->next->next
$19 = {parent = 0x565c6670, child = 0x0, last = 0x0, next = 0x0, prev = 0x565cc590, head = 0x0, body = 0x0, tail = 0x0,
  args = 0x0, norm = 0x0, string = 0x0, span = 0x0, eqn = 0x565cc670, line = 4, pos = 1, flags = 9, prev_font = 0,
  aux = 0, tok = TOKEN_NONE, type = ROFFT_EQN, sec = SEC_NONE, end = ENDBODY_NOT}
(gdb) p *mdoc->first->child->next->next->next->next
Cannot access memory at address 0x0
-- >8 --

Which is likely to be produced by roff.c#roff_EQ(), which seems to be
one of the very few spots where a TOKEN_NONE is actively produced,
and the only one where ROFFT_EQN is:
-- >8 --
   3297 static int
   3298 roff_EQ(ROFF_ARGS)
   3299 {
   3300     struct roff_node    *n;
   3301
   3302     if (r->man->meta.macroset == MACROSET_MAN)
   3303         man_breakscope(r->man, ROFF_EQ);
   3304     n = roff_node_alloc(r->man, ln, ppos, ROFFT_EQN, TOKEN_NONE);
   3305     if (ln > r->man->last->line)
   3306         n->flags |= NODE_LINE;
   3307     n->eqn = eqn_box_new();
   3308     roff_node_append(r->man, n);
   3309     r->man->next = ROFF_NEXT_SIBLING;
   3310
   3311     assert(r->eqn == NULL);
   3312     if (r->last_eqn == NULL)
   3313         r->last_eqn = eqn_alloc();
   3314     else
   3315         eqn_reset(r->last_eqn);
   3316     r->eqn = r->last_eqn;
   3317     r->eqn->node = n;
   3318
   3319     if (buf->buf[pos] != '\0')
   3320         mandoc_msg(MANDOCERR_ARG_SKIP, ln, pos,
   3321             ".EQ %s", buf->buf + pos);
   3322
   3323     return ROFF_IGN;
   3324 }
-- >8 --

I'm not sure what other token is better suited here (or, indeed, if at
all), but I think someone better-versed than me should stand good chance?

Best,
наб

#901636#47
Date:
2021-07-03 12:12:25 UTC
From:
To:
Hi, sorry, me again, this also affects .TS requests in mdoc mode,
and minimises to:
-- >8 --
.TS
l .
a
.TE
-- >8 --
which dies as such:
-- >8 --
$ { echo '.TS'; echo 'l .'; echo 'a'; echo '.TE'; } | mandoc -Tman -mdoc
.\" Automatically generated from an mdoc input file.  Do not edit.
.TH "UNTITLED" "" "July 3, 2021" "" "LOCAL"
.nh
mandoc: mdoc_man.c:300: mdoc_man_act: Assertion `tok >= MDOC_Dd && tok <= MDOC_MAX' failed.
Aborted
-- >8 --
(original and minimal pages attached)

Attaching GDB yields, rather similarly,
  tok = TOKEN_NONE, type = ROFFT_TBL,

Would it make sense to pass tbl(7) and eqn(7) through,
like it's done in man input mode?

Best,
наб

#901636#52
Date:
2021-07-03 15:41:04 UTC
From:
To:
tags 901636 + patch
thanks

This turned out easier than expected!

I've attached a patch, which regenerates tbl macros in a normalised
format, turning:
-- >8 --
.TS
decimalpoint(q) allbox ;
||lb| lb lb lb
l  l  lb30  lw20  lw(abcd)30 .
Program	Digest	Standard	Length
_
\fBsha1sum\fP	SHA-1	FIPS PUB 180	160 bits	avoid use in security applications
T{
\fBsha224sum\fP, \fBsha256sum\fP,	\^
\fBsha384sum\fP, \fBsha512sum\fP
T}	SHA-2	FIPS PUB 180-2	respective	\^
\fBb2sum\fP	BLAKE2b	RFC 7693	8-512 bits	\=
\fBmd5sum\fP	MD5	RFC 1321	128 bits	\fINOT\fP suitable for any application
.TE


.TS
lb lb lb lb
l  l  l  l  l .
Program	Digest	Standard	Length
_
\fBsha1sum\fP	SHA-1	FIPS PUB 180	160 bits	avoid use in security applications
\fBsha224sum\fP, \fBsha256sum\fP,
\fBsha384sum\fP, \fBsha512sum\fP	SHA-2	FIPS PUB 180-2	respective
\fBb2sum\fP	BLAKE2b	RFC 7693	8-512 bits
\fBmd5sum\fP	MD5	RFC 1321	128 bits	\fINOT\fP suitable for any application
.TE
-- >8 --
into
-- >8 --
.TH "UNTITLED" "" "July 3, 2021" "" "LOCAL"
.nh
.if n .ad l
.TS
allbox box decimalpoint(q) ;
||lb|lblblb
lllb30lw(20)lw(abcd)30.
Program	Digest	Standard	Length
_
\fBsha1sum\fP	SHA-1	FIPS PUB 180	160 bits	avoid use in security applications
T{
\fBsha224sum\fP, \fBsha256sum\fP,	\^
\fBsha384sum\fP, \fBsha512sum\fP
T}	SHA-2	FIPS PUB 180-2	respective	\^
\fBb2sum\fP	BLAKE2b	RFC 7693	8-512 bits	\=
\fBmd5sum\fP	MD5	RFC 1321	128 bits	\fINOT\fP suitable for any application
.TE

.sp
.sp
.TS
lblblblb
lllll.
Program	Digest	Standard	Length
_
\fBsha1sum\fP	SHA-1	FIPS PUB 180	160 bits	avoid use in security applications
\fBsha224sum\fP, \fBsha256sum\fP,
\fBsha384sum\fP, \fBsha512sum\fP	SHA-2	FIPS PUB 180-2	respective
\fBb2sum\fP	BLAKE2b	RFC 7693	8-512 bits
\fBmd5sum\fP	MD5	RFC 1321	128 bits	\fINOT\fP suitable for any application
.TE
-- >8 --

When invoked as mandoc -Tman -mdoc.

The top table, while extraordinarily pathological,
survives decently well; the bottom, normal, one remains unscathed.

This patch also fixes stitching T{ T} blocks.

This doesn't fix eqn, but I can't say I've ever successfully used eqn,
or seen it used, so.

The changelog mentions mandoc lists but the mandoc website doesn't,
and the archives don't clear up how they're divided, so I can't really
send it there. Michael ‒ I assume you know what the best course of
action is?

Best,
наб

#901636#59
Date:
2021-07-04 16:16:06 UTC
From:
To:
I just fixed the assertion failure upstream at mandoc.bsd.lv,
see https://inbox.vuxu.org/mandoc-source/c2aa13aac88a7dec@mandoc.bsd.lv/
and http://cvsweb.bsd.lv/mandoc/ .

Supporting tbl(7) and eqn(7) in -T man has been on the
http://cvsweb.bsd.lv/mandoc/TODO list for some time, but it causes
a non-trivial amount of work and is not particularly high priority
for the following reason: The main use case for -T man is that you
can maintain the documentation of your portable software project
in the mdoc(7) format and yet provide autogenerated man(7) versions
of your manual pages for the very few remaining operating systems
that still do not support mdoc(7).  If you care about that, embedding
tbl(7) or eqn(7) code in your manual pages is just a bad idea in
the first place.

Thanks to Nab for proposing patches, but these can't be committed
as-is because they consitute a layering violation.  The new code
is essentially a tbl-to-tbl output mode and would belong into a new
module tbl_tbl.c; it is misplaced in mdoc_man.c because it is neither
related to mdoc(7) input nor to man(7) output.  It appears setting
that up properly wouldn't be excessively difficult, but i don't
have the time to do so right now.  Besides, this would be the first
src_dst.c output module where src == dst, so some generic design
questions might need to be considered before commit.

Thanks to all of you for your input!
  Ingo