#1104789 libhtml-gumbo-perl: erratic behavior on the unsupported template HTML element - GUMBO_NODE_TEMPLATE node type

Package: libhtml-gumbo-perl
Source: libhtml-gumbo-perl
Description: HTML5 parser based on gumbo C library
Submitter: Vincent Lefevre
Date: 2026-06-07 05:31:03 UTC
Severity: normal
Tags:
#1104789#5
Date: 2025-05-06 13:48:52 UTC
I get erratic behavior with the template HTML element, e.g. on
the HTML input "<template>". For instance:

$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>\217¥�¾U</head><body></body></html>
$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>)�>\220U</head><body></body></html>
$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>q'N$uU</head><body></body></html>

The output is random and may include control characters
(above, I have replaced them with \217 and \220, as Emacs displays
them, to keep control characters out of the mail message).

With valgrind:

$ valgrind perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
==64955== Memcheck, a memory error detector
==64955== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==64955== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==64955== Command: perl -C -MHTML::Gumbo -e print\ HTML::Gumbo-\>new-\>parse('\<template\>',\ format\ =\>\ 'string');
==64955==
==64955== Conditional jump or move depends on uninitialised value(s)
==64955==    at 0x484DC89: strlen (vg_replace_strmem.c:505)
==64955==    by 0x2AD7DF: ??? (in /usr/bin/perl)
==64955==    by 0x486D6CE: tree_to_string (Gumbo.xs:189)
==64955==    by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955==    by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955==    by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955==    by 0x486E41B: parse_to_string_cb (Gumbo.xs:505)
==64955==    by 0x486ED4B: common_parse.isra.0 (Gumbo.xs:545)
==64955==    by 0x486F09C: XS_HTML__Gumbo_parse_to_string (Gumbo.xs:559)
==64955==    by 0x20B3E7: ??? (in /usr/bin/perl)
==64955==    by 0x290C95: Perl_runops_standard (in /usr/bin/perl)
==64955==    by 0x179E51: perl_run (in /usr/bin/perl)
==64955==
<html><head></head><body></body></html>
==64955==
==64955== HEAP SUMMARY:
==64955==     in use at exit: 592,160 bytes in 2,369 blocks
==64955==   total heap usage: 7,166 allocs, 4,797 frees, 1,159,576 bytes allocated
==64955==
==64955== LEAK SUMMARY:
==64955==    definitely lost: 18,102 bytes in 19 blocks
==64955==    indirectly lost: 50,698 bytes in 23 blocks
==64955==      possibly lost: 514,100 bytes in 2,318 blocks
==64955==    still reachable: 9,260 bytes in 9 blocks
==64955==                       of which reachable via heuristic:
==64955==                         newarray           : 1,056 bytes in 33 blocks
==64955==         suppressed: 0 bytes in 0 blocks
==64955== Rerun with --leak-check=full to see details of leaked memory
==64955==
==64955== Use --track-origins=yes to see where uninitialised values come from
==64955== For lists of detected and suppressed errors, rerun with: -s
==64955== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

So uninitialized data end up in the output.

If I use "format => 'callback'" (with a callback) instead of
"format => 'string'", I get the following error:

Unknown node type at /usr/lib/x86_64-linux-gnu/perl5/5.40/HTML/Gumbo.pm line 298, <> line 1.

(which is better from the security point of view, but prevents one
from parsing some modern HTML documents).

It apparently comes from Gumbo.xs, where there are two occurrences of

  croak("Unknown node type");

I suspect it is the first one, since the second one corresponds to
text node types.

The cause is probably the most recently added node type,
GUMBO_NODE_TEMPLATE, in the Gumbo library (libgumbo):

typedef enum {
  /** Document node.  v will be a GumboDocument. */
  GUMBO_NODE_DOCUMENT,
  /** Element node.  v will be a GumboElement. */
  GUMBO_NODE_ELEMENT,
  /** Text node.  v will be a GumboText. */
  GUMBO_NODE_TEXT,
  /** CDATA node. v will be a GumboText. */
  GUMBO_NODE_CDATA,
  /** Comment node.  v will be a GumboText, excluding comment delimiters. */
  GUMBO_NODE_COMMENT,
  /** Text node, where all contents is whitespace.  v will be a GumboText. */
  GUMBO_NODE_WHITESPACE,
  /** Template node.  This is separate from GUMBO_NODE_ELEMENT because many
   * client libraries will want to ignore the contents of template nodes, as
   * the spec suggests.  Recursing on GUMBO_NODE_ELEMENT will do the right thing
   * here, while clients that want to include template contents should also
   * check for GUMBO_NODE_TEMPLATE.  v will be a GumboElement.  */
  GUMBO_NODE_TEMPLATE
} GumboNodeType;
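A minimal C sketch of the idea, under stated assumptions: the enum is copied from gumbo.h above, but `Node`, `Buf`, `append()` and `serialize()` are hypothetical, simplified stand-ins, not the code in Gumbo.xs or in the Debian patch. The serializer lists GUMBO_NODE_TEMPLATE next to GUMBO_NODE_ELEMENT (both carry a GumboElement), so a template is emitted like any other element, while a genuinely unknown type takes an explicit error path instead of producing output from uninitialized data.

```c
#include <stddef.h>
#include <string.h>

/* Local copy of GumboNodeType as declared in libgumbo's gumbo.h,
 * so this sketch builds without libgumbo installed. */
typedef enum {
  GUMBO_NODE_DOCUMENT,
  GUMBO_NODE_ELEMENT,
  GUMBO_NODE_TEXT,
  GUMBO_NODE_CDATA,
  GUMBO_NODE_COMMENT,
  GUMBO_NODE_WHITESPACE,
  GUMBO_NODE_TEMPLATE
} GumboNodeType;

/* Hypothetical simplified node; the real GumboNode keeps its children
 * in a GumboVector inside the GumboElement/GumboDocument union member. */
typedef struct Node {
  GumboNodeType type;
  const char *tag;  /* tag name; must be non-NULL for element-like nodes */
  const char *text; /* content for text-like nodes */
  const struct Node *const *children;
  size_t n_children;
} Node;

/* Fixed-size output buffer: appends are truncated, never overflow. */
typedef struct {
  char data[256];
  size_t len;
} Buf;

static void append(Buf *b, const char *s) {
  size_t n = strlen(s);
  size_t room = sizeof b->data - 1 - b->len;
  if (n > room) n = room;
  memcpy(b->data + b->len, s, n);
  b->len += n;
  b->data[b->len] = '\0';
}

/* Returns 0 on success, -1 for a node type the switch does not know
 * (the analogue of croak("Unknown node type") in Gumbo.xs). */
static int serialize(const Node *n, Buf *b) {
  size_t i;
  switch (n->type) {
  case GUMBO_NODE_DOCUMENT:
    break; /* no markup of its own, only children */
  case GUMBO_NODE_ELEMENT:
  case GUMBO_NODE_TEMPLATE: /* v is a GumboElement for both types */
    append(b, "<");
    append(b, n->tag);
    append(b, ">");
    break;
  case GUMBO_NODE_TEXT:
  case GUMBO_NODE_WHITESPACE:
    append(b, n->text);
    return 0;
  case GUMBO_NODE_CDATA:
    append(b, "<![CDATA[");
    append(b, n->text);
    append(b, "]]>");
    return 0;
  case GUMBO_NODE_COMMENT:
    append(b, "<!--");
    append(b, n->text);
    append(b, "-->");
    return 0;
  default:
    return -1; /* unknown type: fail loudly, emit nothing */
  }
  /* Only document, element and template nodes reach this point. */
  for (i = 0; i < n->n_children; i++)
    if (serialize(n->children[i], b) != 0)
      return -1;
  if (n->type != GUMBO_NODE_DOCUMENT) {
    append(b, "</");
    append(b, n->tag);
    append(b, ">");
  }
  return 0;
}
```

Treating the template as a plain element is only one choice; a client that wants to ignore template contents, as the gumbo.h comment suggests, could instead return early for GUMBO_NODE_TEMPLATE.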

This node type was added in 2015:

https://github.com/google/gumbo-parser/commit/4383a40605ee7872a8e2de58553383a13d919153

but most of the HTML::Gumbo code predates this change.

#1104789#10
Date: 2025-05-17 09:57:23 UTC
Control: tag -1 patch

The attached change does not make HTML::Gumbo support <template>
properly, but it seems to plug this specific hole, and hence to address
the known security aspects.

I've checked that this doesn't break the (not very extensive) test
suite, and that the only reverse dependency in trixie, request-tracker5,
still builds with this.

Tentatively tagging 'patch', but eyeballs would be good.

I think full support for <template> should be a separate wishlist bug.

#1104789#19
Date: 2025-05-17 12:47:19 UTC
Hello,

Bug #1104789 in libhtml-gumbo-perl reported by you has been fixed in the
Git repository and is awaiting an upload. You can see the commit
message below and you can check the diff of the fix at:

https://salsa.debian.org/perl-team/modules/packages/libhtml-gumbo-perl/-/commit/f9de66e265bce8c607d7f4a80819725b0b44d661
------------------------------------------------------------------------
Add patch to fix wrong code path with GUMBO_NODE_TEMPLATE.

Thanks: Vincent Lefevre for the bug report and Niko Tyni for the patch.
Closes: #1104789
------------------------------------------------------------------------

(this message was generated automatically)
-- 
Greetings

https://bugs.debian.org/1104789

#1104789#26
Date: 2025-05-17 13:04:19 UTC
We believe that the bug you reported is fixed in the latest version of
libhtml-gumbo-perl, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1104789@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
gregor herrmann <gregoa@debian.org> (supplier of updated libhtml-gumbo-perl package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Sat, 17 May 2025 14:44:37 +0200
Source: libhtml-gumbo-perl
Architecture: source
Version: 0.18-5
Distribution: unstable
Urgency: medium
Maintainer: Debian Perl Group <pkg-perl-maintainers@lists.alioth.debian.org>
Changed-By: gregor herrmann <gregoa@debian.org>
Closes: 1104789
Changes:
 libhtml-gumbo-perl (0.18-5) unstable; urgency=medium
 .
   * Add patch to fix wrong code path with GUMBO_NODE_TEMPLATE.
     Thanks to Vincent Lefevre for the bug report and Niko Tyni for the patch.
     (Closes: #1104789)
   * Declare compliance with Debian Policy 4.7.2.
Checksums-Sha1:
 68102b221a867b1aa089b8d31226e44cfb8b45c3 2461 libhtml-gumbo-perl_0.18-5.dsc
 a5259e6ee406119f1460561796a86b92f03ac917 4036 libhtml-gumbo-perl_0.18-5.debian.tar.xz
Checksums-Sha256:
 ff02cc0bc8b1b6f44d45cd1c815bfc0177c93c733d724e7035ebb38ab5f85b4d 2461 libhtml-gumbo-perl_0.18-5.dsc
 60e0ad8713c19f94f08ee1a0bbcf46664a65bdd6ab3ce726866d29e1044dd930 4036 libhtml-gumbo-perl_0.18-5.debian.tar.xz
Files:
 8c87d11826a0b0755177fe607cf6677b 2461 perl optional libhtml-gumbo-perl_0.18-5.dsc
 57ecfb5e55a04711ecf97899df3ffdb6 4036 perl optional libhtml-gumbo-perl_0.18-5.debian.tar.xz
-----BEGIN PGP SIGNATURE-----

iQKTBAEBCgB9FiEE0eExbpOnYKgQTYX6uzpoAYZJqgYFAmgohUpfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEQx
RTEzMTZFOTNBNzYwQTgxMDREODVGQUJCM0E2ODAxODY0OUFBMDYACgkQuzpoAYZJ
qgZGdw//XTTnXtYsEPQgxb2xEfyA8Kj86XPibSzc55iin3mStND69a21MQULT8gk
WPMWh7hbUKybVKyHOpixwlHP6pKmW3BOWdayi+AQnoYbA58Ln1PTuxspT4kGgLfG
tb64SYy8FyM9yaNKBLpuLsf0mYHdSATYg8STG8pV5Tlk7QZ36rznA8wBUF3jP+PP
rNSTs0Z9bajNuDAIATyMLPbn4gMqoKfQ179xZMyDvNtNonAE0STdpTLAoJ2jJCHo
Q8YEN8IYb806Vb/Th3t57QH2GqA6Vw7zSSk5wkqJXTdiy6jY/Al8Lnpvw8JqFvR+
zYWSJiMxXd2j8q4asYoycO4npqXr0u9x+IG05GB3+giGJtYppSeLX7o5Yw1XHdzD
U+AqBin5ESKHpnZTmN0VTXjmmXrvh94XMRinkEPq4MCmSGatLNvVE+X3htbre0ST
vw3QXxUYD6WtvJkU9hjg1+FNxvwiKi6LYlgggAcu7S49Qb3GBvQw0DacTRFQ2jPq
9Pt8Phv9Va8Ui93fTdwigzAJZ1YwILywn1YWfAj1TZwl8JiP3g9T/nERW2sgiJ9j
zhQzVPTqMbYuKNRK60OHK89vPl1WlzMZP77cdbi3GukI+qzpH8eCgSVU03qGabyW
jbagpedYnO/LzaBgOaMmF4/ilQ6kB/TgIynSB181q3PQrVdBb6w=
=WovM
-----END PGP SIGNATURE-----

#1104789#31
Date: 2025-05-17 17:37:16 UTC
I'll look into it, but in any case, it should for now be treated
like any other HTML element (i.e. generate a "start" event); otherwise
this would be an API break that could affect existing scripts. From the
HTML::Gumbo(3pm) man page:

           HTML::Gumbo->new->parse( $html, format => 'callback', callback => sub {
               my ($event) = shift;
               if ( $event eq 'document start' ) {
                   my ($doctype) = @_;
               }
               elsif ( $event eq 'document end' ) {
               }
               elsif ( $event eq 'start' ) {
                   my ($tag, $attrs) = @_;
               }
               elsif ( $event eq 'end' ) {
                   my ($tag) = @_;
               }
               elsif ( $event =~ /^(text|space|cdata|comment)$/ ) {
                   my ($text) = @_;
               }
               else {
                   die "Unknown event";
               }
           } );

with no mention of a specific event for the template element.

That was how I initially found the bug.
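The event names in the man page excerpt suggest a fixed mapping from node types to callback events. A hedged C sketch of that mapping, assuming a hypothetical helper `event_for()` (not part of HTML::Gumbo or Gumbo.xs) and assuming GUMBO_NODE_WHITESPACE maps to "space": GUMBO_NODE_TEMPLATE yields the same "start"/"end" events as an ordinary element, which is the backward-compatible behavior argued for above, and any unlisted type returns NULL so the caller can report an error rather than guess.

```c
#include <stddef.h>

/* Local copy of GumboNodeType from libgumbo's gumbo.h. */
typedef enum {
  GUMBO_NODE_DOCUMENT,
  GUMBO_NODE_ELEMENT,
  GUMBO_NODE_TEXT,
  GUMBO_NODE_CDATA,
  GUMBO_NODE_COMMENT,
  GUMBO_NODE_WHITESPACE,
  GUMBO_NODE_TEMPLATE
} GumboNodeType;

/* Hypothetical helper: callback event name for a node type, using the
 * event names listed in the HTML::Gumbo(3pm) man page. `closing` selects
 * the end event for node types that have one; text-like types ignore it.
 * Returns NULL for a type it does not know. */
static const char *event_for(GumboNodeType type, int closing) {
  switch (type) {
  case GUMBO_NODE_DOCUMENT:
    return closing ? "document end" : "document start";
  case GUMBO_NODE_ELEMENT:
  case GUMBO_NODE_TEMPLATE: /* report <template> as an ordinary element */
    return closing ? "end" : "start";
  case GUMBO_NODE_TEXT:
    return "text";
  case GUMBO_NODE_WHITESPACE:
    return "space"; /* assumed mapping */
  case GUMBO_NODE_CDATA:
    return "cdata";
  case GUMBO_NODE_COMMENT:
    return "comment";
  default:
    return NULL;
  }
}
```

With this mapping, an existing callback that only knows "start" and "end" keeps working on documents containing <template>, and its "Unknown event" branch is never reached for that element.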