#994998 python3-lxml: etree.tostring truncates output when encoding is set to "utf8"

Package:
python3-lxml
Source:
lxml
Description:
pythonic binding for the libxml2 and libxslt libraries
Submitter:
Micha Moskovic
Date:
2021-09-24 13:36:03 UTC
Severity:
important
#994998#5
Date:
2021-09-24 13:32:58 UTC
Dear Maintainer,

I ran into a bug that causes lxml to truncate output when using
"tostring" with encoding set to "utf8", while it works correctly when
encoding is set to "utf-8". See attached "bug.py" file with an example
to reproduce. The output under "Bad" has truncated text in the last
subfield.
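
For context, Python's own codec registry treats the two spellings as
aliases of one codec, so both calls would be expected to give the same
serialized text. The following is a minimal standard-library sketch of
that check. It is not the attached "bug.py", and same_codec is a
hypothetical helper name used only for illustration.

```python
import codecs


def same_codec(first: str, second: str) -> bool:
    """Return True when Python resolves both names to the same codec."""
    return codecs.lookup(first).name == codecs.lookup(second).name


# Compare the two spellings mentioned in this report.
print(same_codec("utf8", "utf-8"))
# Show the canonical name Python uses for the "utf8" alias.
print(codecs.lookup("utf8").name)
```

Since Python resolves both names identically, the difference in output
presumably arises below the Python codec layer, although this sketch
does not show where.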

I previously reported this bug upstream at
https://bugs.launchpad.net/lxml/+bug/1944751, but further testing makes
me think it is Debian-specific: when I run the attached "bug.py"
example in a fresh virtualenv after "pip install lxml", which uses the
upstream binary wheel, the bug does not occur.
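
To tell the two environments apart, it may help to record which lxml
build and which libxml2 version each one loads. The sketch below
assumes nothing beyond lxml's documented version attributes.
lxml_build_info is a hypothetical helper name, and it returns None
when lxml is not installed.

```python
import importlib.util


def lxml_build_info():
    """Collect lxml/libxml2 version details, or return None without lxml."""
    if importlib.util.find_spec("lxml") is None:
        return None
    from lxml import etree

    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
        # Debian system packages normally live under dist-packages,
        # pip-installed wheels in a virtualenv under site-packages.
        "module path": etree.__file__,
    }


print(lxml_build_info())
```

Running this once with the system python3-lxml and once inside the
virtualenv should show whether the two builds use different libxml2
versions.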

Best,
Micha