#607066 tidy should not generate entity references (except standard XML ones) for XHTML

Package:
tidy
Source:
tidy-html5
Description:
HTML/XML syntax checker and reformatter
Submitter:
Vincent Lefevre
Date:
2025-05-09 15:48:03 UTC
Severity:
minor
Tags:
#607066#5
Date:
2010-12-14 13:41:29 UTC
From:
To:
For XHTML output, it is an error to generate entity references (except
the standard XML ones) as they are not guaranteed to work. Indeed
http://www.w3.org/TR/html5/the-xhtml-syntax.html#writing-xhtml-documents
says:

  Note: According to the XML specification, XML processors are not
  guaranteed to process the external DTD subset referenced in the
  DOCTYPE. This means, for example, that using entity references for
  characters in XHTML documents is unsafe if they are defined in an
  external file (except for &lt;, &gt;, &amp;, &quot; and &apos;).

Numeric character references should be generated instead, i.e. when
XHTML output is generated, numeric-entities should be ignored (or at
least, it should default to "yes").
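
To make the requested behaviour concrete, here is a minimal Python sketch
(not tidy's actual code) of the rewriting that numeric-entities is expected
to perform: named references become numeric character references, while the
five XML predefined entities are left alone. The helper name to_numeric_refs
and the use of Python's HTML 4 entity table are assumptions made for
illustration only.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity name -> code point

# The five entities every XML processor knows without reading a DTD.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches a named entity reference such as &eacute; (not &#233; or &#xE9;).
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(markup):
    """Hypothetical helper: replace named references with numeric ones."""

    def replace(match):
        name = match.group(1)
        # Keep XML's predefined entities and any name we do not recognise.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED_REF.sub(replace, markup)


print(to_numeric_refs("caf&eacute; &amp; th&eacute;"))
```

Unknown names are passed through unchanged rather than guessed at, so the
sketch never invents a code point.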

An example of the problem:

  echo "é" | tidy -asxhtml

outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />
<title></title>
</head>
<body>
&eacute;
</body>
</html>

Note: web browsers should have no problems with entity references,
but remember that one of the purposes of XHTML (over HTML) is to
manipulate the files with XML tools.
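
As an illustration of the XML-tools concern, the following Python sketch
feeds both forms to the standard library's ElementTree parser, which does not
fetch external DTDs. A DTD-less fragment stands in here for a processor that
skips the external subset; the helper name parses_as_xml is made up for this
example.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Hypothetical helper: True if ElementTree accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


XHTML_NS = "http://www.w3.org/1999/xhtml"
named = '<p xmlns="%s">&eacute;</p>' % XHTML_NS    # named entity reference
numeric = '<p xmlns="%s">&#233;</p>' % XHTML_NS    # numeric character reference

print(parses_as_xml(named))    # False: the entity is undefined without the DTD
print(parses_as_xml(numeric))  # True: numeric references need no DTD
```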

#607066#10
Date:
2025-04-27 10:42:01 UTC
From:
To:
I am closing all the bugs that were filed before the 5.8.0 upstream version
and that are **upstream** issues.

If the bug that you've reported can still be found in 5.8.0, please retest it,
make sure that the upstream bug is actually filed upstream, reopen
the bug in the Debian BTS, and correctly set the "forwarded" attribute on the
issue.

Ondrej
--
Ondřej Surý (He/Him)
ondrej@sury.org

#607066#15
Date:
2025-05-09 15:44:41 UTC
From:
To:
Control: reopen -1
Control: notfixed -1 2:5.8.0-1
Control: found -1 2:5.8.0-2
Control: tags -1 upstream
Control: forwarded -1 https://github.com/htacg/tidy-html5/issues/1148

Done right now.

Note that the example should now be:

  echo "é" | tidy -ascii -asxhtml

i.e. with the -ascii option.