#287371 xsltproc: DTD should be cached when included several times, or used memory should be limited

Package:
xsltproc
Source:
libxslt
Description:
XSLT 1.0 command line processor
Submitter:
Vincent Lefevre
Date:
2023-02-18 11:36:06 UTC
Severity:
grave
Tags:
#287371#5
Date:
2004-12-27 12:11:49 UTC
From:
To:
Here xsltproc takes up to 138 MB, which slows the whole system down
due to swapping. This problem occurs when generating my blog page,
where document() is called once for each blog item (this will change
in the future, but the current behavior shouldn't occur anyway). The sources
are in a DocBook-based DTD that can be downloaded from

http://www.vinc17.org/DTD/website.dtd

I'm not including the XML sources since the setup is quite complicated
(lots of inclusions and dependencies). But if the bug is not already
known, I could try to build a simpler example.
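Figures like the 138 MB above are easier to compare across versions when peak memory is measured rather than read off a monitor. The following is a minimal Python sketch, assuming a POSIX system; the helper name run_and_peak_rss_mib is hypothetical. It relies on resource.getrusage(RUSAGE_CHILDREN), whose ru_maxrss is the peak of the largest child waited for so far, so the number is only meaningful for the first (or largest) child the calling process runs.

```python
import resource
import subprocess
import sys
from typing import List, Tuple


def run_and_peak_rss_mib(cmd: List[str]) -> Tuple[int, float]:
    """Run cmd to completion; return (exit status, peak RSS of children in MiB)."""
    proc = subprocess.run(cmd)  # waits for the child, so its usage is recorded
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    peak_mib = usage.ru_maxrss * bytes_per_unit / (1024 * 1024)
    return proc.returncode, peak_mib


if __name__ == "__main__":
    # Demo: a Python child that writes to about 64 MiB of memory.
    status, peak = run_and_peak_rss_mib(
        [sys.executable, "-c", "b = b'x' * (64 * 1024 * 1024)"]
    )
    print(f"exit status {status}, peak RSS {peak:.1f} MiB")
```

For the blog build this would be something like run_and_peak_rss_mib(["xsltproc", "-o", "blog.html", "blog.xsl", "blog.xml"]), where the file names are placeholders. From the command line, GNU time's `-v` option reports the same "Maximum resident set size".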

#287371#10
Date:
2004-12-30 05:05:06 UTC
From:
To:
Can you try with xsltproc from the experimental distribution? I know
several memory leaks have been fixed there and in libxml2.

Thanks

Mike

#287371#15
Date:
2004-12-31 01:40:54 UTC
From:
To:
Unfortunately, there's no package for PowerPC yet.
#287371#20
Date:
2004-12-31 05:15:42 UTC
From:
To:
Can't you try to build it?

Mike

#287371#25
Date:
2005-01-10 10:40:34 UTC
From:
To:
I was able to try on an x86 machine where I've installed the experimental
libxml2 package (version 2.6.16-1). The problem is still there.

#287371#30
Date:
2005-02-09 16:12:21 UTC
From:
To:
How big is the document you load with document()? How many times does it
get loaded? Could you provide me with the files?

Mike

#287371#35
Date:
2005-02-09 16:38:54 UTC
From:
To:
The documents are small, but the DTD is very big (this is a DTD based
on DocBook + MathML). Currently, about 50 documents are included.

I wanted to post a follow-up, but hadn't had the time yet. FYI, I had
a discussion with Daniel on the LibXSLT mailing list 10 days ago. In
short, for some reason, the DTD structures are not reused each time
a new document is parsed. IMHO, this could be solved by some form of
cache (keyed on the DTD plus the internal subset, if any).

Technically, this bug could be regarded as a wishlist item. But using so
much memory should be regarded as a bug IMHO, unless the other XSLT
processors have the same problem.

The title of the bug should be changed to something like "DTD
structures should be shared/cached in case of multiple inclusions"
(when possible, of course).
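The cache suggested here could look roughly like the following Python sketch. It is not libxml2's or libxslt's API: the class DtdCache and the parse_dtd callback are hypothetical stand-ins for whatever builds the DTD structures. The key is the pair (DTD system identifier, internal subset), since two documents can only share the structures when both match. The sketch ignores who owns and frees the shared structures, which a real implementation would have to settle.

```python
import hashlib
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class DtdCache(Generic[T]):
    """Hypothetical cache of parsed DTDs shared between documents."""

    def __init__(self, parse_dtd: Callable[[str, Optional[str]], T]) -> None:
        # parse_dtd(system_id, internal_subset) stands in for the real DTD loader.
        self._parse_dtd = parse_dtd
        self._entries: Dict[Tuple[str, str], T] = {}

    def get(self, system_id: str, internal_subset: Optional[str] = None) -> T:
        # Key on the external DTD and a digest of the internal subset: a document
        # with a different internal subset must not reuse another document's DTD.
        digest = hashlib.sha256((internal_subset or "").encode("utf-8")).hexdigest()
        key = (system_id, digest)
        if key not in self._entries:
            self._entries[key] = self._parse_dtd(system_id, internal_subset)
        return self._entries[key]
```

The system identifier is assumed to be resolved to an absolute URI before lookup; otherwise relative references from different directories could collide or miss. With the roughly 50 documents mentioned above all using the same DTD and no internal subset, parse_dtd would run once instead of about 50 times.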

#287371#40
Date:
2005-02-09 16:52:31 UTC
From:
To:
retitle 287371 DTD should be cached when included several times
severity 287371 wishlist
tag 287371 upstream
thanks

Thanks for the feedback.
Note that such "optimization" bugs are not really *that* important, so I
downgraded this bug to wishlist, even though a huge amount of memory is
used. Also note that 138 MB is not *that* much considering the number of
documents and the DTD size.

Mike

#287371#51
Date:
2005-02-09 23:44:20 UTC
From:
To:
The DTD (and internal subset) is what should be cached, to be reused when
the DTD with internal subset is the same, thus not taking additional
memory when a second document is processed.

Well, it is important on machines that don't have enough memory.

By caching the DTD structures, one could gain something like a
factor of 1000 in asymptotic memory usage with small documents
(3 KB per document vs 3 MB for the DTD itself). This is quite significant.
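The factor of 1000 follows from a simple model, assuming that each of N documents loaded through document() costs d in memory, that each copy of the DTD structures costs D, and that nothing else grows with N (stylesheet, result tree and allocator overhead are ignored), with d ≈ 3 KB and D ≈ 3 MB taken from the figures above:

```latex
\begin{aligned}
M_{\text{no cache}}(N) &= N\,(d + D), \qquad M_{\text{cache}}(N) = N\,d + D,\\
\lim_{N \to \infty} \frac{M_{\text{no cache}}(N)}{M_{\text{cache}}(N)}
  &= \frac{d + D}{d} = 1 + \frac{D}{d} = 1 + \frac{3\ \text{MB}}{3\ \text{KB}} = 1001 \approx 1000.
\end{aligned}
```

The limit is approached slowly: for the roughly 50 documents mentioned earlier, M_no cache(50) = 50 × 3.003 MB ≈ 150 MB against M_cache(50) = 0.15 MB + 3 MB = 3.15 MB, a ratio of about 48.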

#287371#56
Date:
2005-02-10 00:29:38 UTC
From:
To:
Machines that don't have enough memory can't run OpenOffice.org. Will
you file an important bug there as well?

Mike

#287371#61
Date:
2005-02-10 00:34:22 UTC
From:
To:
No, because OpenOffice.org doesn't waste memory (it's quite memory
hungry, but this is expected, as it's complex software). With
xsltproc, if one considers the sum of the sizes of all source data,
the required memory for the processing may be something like 1000
times larger, with no theoretical justification.

#287371#66
Date:
2022-02-10 12:08:33 UTC
From:
To:
Control: severity -1 grave
Control: retitle -1 xsltproc: DTD should be cached when included several times, or used memory should be limited
Control: tags -1 security

This is no different from CVE-2013-0338 and CVE-2013-0339[*]. The
point is that from a small document, one can exhaust the memory
of the machine. CVE-2013-0338 and CVE-2013-0339 are about entity
expansion, but merely loading data into memory has the same
consequences.

[*] https://www.openwall.com/lists/oss-security/2013/02/22/3

#287371#75
Date:
2022-02-19 17:01:52 UTC
From:
To:
If you believe so, and you have confirmed that it hasn't been fixed in
the past 15 years, could you please do either (or both) of the following?
 * report it to MITRE's CVE form
 * report it in https://gitlab.gnome.org/GNOME/libxml2/-/issues

#287371#80
Date:
2022-02-19 17:28:00 UTC
From:
To:
I'll test again (I've been using a fake DTD for the past 15 years).
#287371#85
Date:
2022-04-25 02:06:54 UTC
From:
To:
Processes were killed by the OOM killer, including daemons!

[...]
Apr 25 02:44:53 zira systemd[6589]: dconf.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/ldac_hq
Apr 25 02:44:53 zira systemd[1]: user@1000.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/ldac_sq
Apr 25 02:44:53 zira systemd[6589]: pipewire.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/ldac_mq
Apr 25 02:44:53 zira systemd[6589]: pipewire-media-session.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSink/aptx_hd
Apr 25 02:44:53 zira systemd[6589]: pulseaudio.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/aptx_hd
Apr 25 02:44:53 zira systemd[6589]: pipewire.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSink/aptx
Apr 25 02:44:53 zira systemd[6589]: pipewire.service: Failed with result 'oom-kill'.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/aptx
Apr 25 02:44:55 zira systemd[1]: user@1000.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSink/sbc
Apr 25 02:44:55 zira systemd[1]: session-204.scope: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:53 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/sbc
Apr 25 02:44:55 zira systemd[6589]: Requested transaction contradicts existing jobs: Resource deadlock avoided
Apr 25 02:44:53 zira acpid[792]: input device has been disconnected, fd 23
Apr 25 02:44:55 zira systemd[6589]: gpg-agent.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:55 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSink/sbc_xq_453
Apr 25 02:44:55 zira systemd[6589]: dbus.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:55 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/sbc_xq_453
Apr 25 02:44:55 zira systemd[6589]: at-spi-dbus-bus.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:55 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSink/sbc_xq_512
Apr 25 02:44:55 zira systemd[6589]: gvfs-daemon.service: A process of this unit has been killed by the OOM killer.
Apr 25 02:44:55 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/sbc_xq_512
Apr 25 02:44:55 zira systemd[6589]: pipewire-media-session.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:55 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSink/sbc_xq_552
Apr 25 02:44:55 zira systemd[6589]: pipewire-media-session.service: Failed with result 'oom-kill'.
Apr 25 02:44:55 zira bluetoothd[241267]: Endpoint unregistered: sender=:1.100 path=/MediaEndpoint/A2DPSource/sbc_xq_552
Apr 25 02:44:55 zira systemd[6589]: Stopped PipeWire Media Session Manager.
Apr 25 02:44:56 zira rtkit-daemon[826]: Successfully made thread 289323 of process 289323 owned by '1000' high priority at nice level -11.
Apr 25 02:44:55 zira systemd[6589]: pipewire-media-session.service: Consumed 3.038s CPU time.
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 1 threads of 1 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: dbus.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 1 threads of 1 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: dbus.service: Failed with result 'oom-kill'.
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 1 threads of 1 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: gvfs-daemon.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 1 threads of 1 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: gvfs-daemon.service: Failed with result 'oom-kill'.
Apr 25 02:44:56 zira rtkit-daemon[826]: Successfully made thread 289329 of process 289324 owned by '1000' RT at priority 20.
Apr 25 02:44:55 zira systemd[6589]: at-spi-dbus-bus.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 2 threads of 2 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: at-spi-dbus-bus.service: Failed with result 'oom-kill'.
Apr 25 02:44:56 zira rtkit-daemon[826]: Successfully made thread 289327 of process 289327 owned by '1000' high priority at nice level -11.
Apr 25 02:44:55 zira systemd[6589]: pulseaudio.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 3 threads of 3 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: pulseaudio.service: Failed with result 'oom-kill'.
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 3 threads of 3 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: pulseaudio.service: Consumed 9min 48.063s CPU time.
Apr 25 02:44:56 zira rtkit-daemon[826]: Successfully made thread 289331 of process 289323 owned by '1000' RT at priority 20.
Apr 25 02:44:55 zira systemd[6589]: gpg-agent.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:56 zira rtkit-daemon[826]: Supervising 4 threads of 3 processes of 1 users.
Apr 25 02:44:55 zira systemd[6589]: gpg-agent.service: Failed with result 'oom-kill'.
Apr 25 02:44:55 zira systemd[6589]: dconf.service: Main process exited, code=killed, status=9/KILL
Apr 25 02:44:55 zira systemd[6589]: dconf.service: Failed with result 'oom-kill'.
[...]

(some daemons restarted automatically, but bluetooth was still in
a bad state).
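Until the memory usage itself is addressed, one way to keep a single runaway xsltproc from pushing the whole session into the OOM killer is to cap its address space so that its own allocations fail first. A minimal Python sketch, assuming Linux (macOS does not enforce RLIMIT_AS); the helper name run_with_memory_limit is hypothetical. The shell's `ulimit -v`, which takes KiB, sets the same limit.

```python
import resource
import subprocess
import sys
from typing import List


def run_with_memory_limit(cmd: List[str], limit_bytes: int) -> subprocess.CompletedProcess:
    """Run cmd with its virtual address space capped at limit_bytes."""

    def cap_address_space() -> None:
        # Runs in the child after fork() and before exec(), so only the child is capped.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=cap_address_space, capture_output=True, text=True)


if __name__ == "__main__":
    # Demo: a Python child that tries to allocate 4 GiB under a 1 GiB cap
    # gets a MemoryError and exits with a non-zero status.
    result = run_with_memory_limit(
        [sys.executable, "-c", "b = b'x' * (4 * 1024 ** 3)"], 1024 ** 3
    )
    print("exit status:", result.returncode)
```

For the blog build this would be, e.g., run_with_memory_limit(["xsltproc", "-o", "blog.html", "blog.xsl", "blog.xml"], 512 * 1024 ** 2) with placeholder file names. RLIMIT_AS counts virtual rather than resident memory, so the cap needs some headroom above the expected peak RSS. How xsltproc reports the failed allocation is up to libxml2, but the failure stays inside that one process.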

#287371#90
Date:
2023-02-17 22:47:55 UTC
From:
To:
Control: tag -1 unreproducible
sample document ... which still hasn't happened.
It was reported against libxslt version 1.1.8-5 with some mention of version
2.6.16-1 of libxml2 ... both are ANCIENT, but no newer 'found' version was
reported.

Just for fun, I downloaded your DTD (attached) and noticed a MathML entity
which refers to the DTD "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd".
Trying to load that document gives a 404, but I found the following:
https://www.w3.org/TR/MathML2/appendixa.html#parsing.usingdtdt

My XML knowledge is a bit rusty, but does that mean your *custom-made* DTD
is invalid? Or no longer valid?

What I do recall from when I was deep into XML and related technologies is
that it indeed uses a lot of memory.
As mentioned in 2005, 138 MB is not *that* much.
And in 2023 that is even more true.

But you have a machine with *unspecified* but apparently very limited resources,
and on it you try to load a *custom-made* DTD which is probably quite complex
(MathML can't be simple AFAIK).
"What could possibly go wrong (tm)"

That request was made a YEAR ago, but I don't see a 'forwarded' entry or a
link to a CVE item.

And the final message in this bug is that you ran out of memory, so the OOM
killer did exactly what it needs to do: kill other programs.
And yet you conclude that this is all xsltproc's fault?

There was no action on this bug for ~17 years, the requested information was
not provided, the issue was not made reproducible, it was not reported
upstream, and I'm probably 'forgetting' a few items.

How in the world could this possibly be considered Release Critical?

I'll leave changing the severity to the maintainer, but 'wishlist' seems
appropriate to me.

#287371#97
Date:
2023-02-18 11:33:38 UTC
From:
To:
Control: tag -1 -security
Control: severity -1 wishlist

It may be annoying, but the system is working as it should, so I'm
removing the 'security' tag.
If this bug turns out to be an actual security issue and there is thus a CVE,
the tag can be added back.
bug and it was previously already set to that, so there is no need to postpone
setting an appropriate severity level.

FTR: I'm not contesting that you may have found a valid bug.
In upstream's 1.1.36 there seems to be at least one fix for a potential memory
leak (IIUC).
But the lack of cooperation from the OP, combined with the severity and the age
of the bug, did (apparently) bug me.