#534095 manpages/docbook.xsl: please add encoding declarations

#534095#5
Date:
2008-05-27 17:55:56 UTC
From:
To:
$ cat tmp.xml
<?xml version='1.0'?>
<!DOCTYPE refentry PUBLIC '-//OASIS//DTD DocBook XML V4.5//EN' 'http://www.docbook.org/xml/4.5/docbookx.dtd'>
<refentry>
<refentryinfo>
         <title>tmp</title>
         <productname>tmp</productname>
         <date>2008-05-27</date>
</refentryinfo>
<refmeta>
         <refentrytitle>tmp</refentrytitle>
         <manvolnum>1</manvolnum>
         <refmiscinfo class='version'>0</refmiscinfo>
</refmeta>
<refnamediv>
         <refname>tmp</refname>
</refnamediv>
<refsection>
         <title>≤</title>
         <para></para>
</refsection>
</refentry>

$ xsltproc /usr/share/xml/docbook/stylesheet/nwalsh/manpages/docbook.xsl tmp.xml
Note: Writing tmp.1

$ man -l tmp.1
TMP(1)                                tmp                               TMP(1)

NAME
        tmp

â¤
tmp 0                             2008-05-27                            TMP(1)

#534095#10
Date:
2008-05-27 21:15:42 UTC
From:
To:
On Tuesday, 27.05.2008, at 19:55 +0200, Jakub Wilk wrote:

[..]

[..]

It outputs the correct character for me. Note that, by default, not all
special characters are replaced by their groff escape codes. See
file:///usr/share/doc/docbook-xsl/doc/manpages/man.charmap.use.subset.html for the parameter you should use so that you do not rely on a particular character set.
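
To illustrate what "replaced by their groff escape codes" means, here is a
minimal Python sketch. The two glyph names used are standard groff escapes,
but the table and the function name to_groff are assumptions made for this
example; this is not the character map that docbook-xsl actually ships.

```python
# Illustrative two-entry table only. The glyph names \(co and \(<= are
# standard groff escapes, but this is NOT the docbook-xsl character map.
GROFF_ESCAPES = {
    "\u00a9": r"\(co",  # COPYRIGHT SIGN
    "\u2264": r"\(<=",  # LESS-THAN OR EQUAL TO
}


def to_groff(text: str) -> str:
    """Replace every character listed in GROFF_ESCAPES with its escape."""
    return "".join(GROFF_ESCAPES.get(ch, ch) for ch in text)


# The result is pure ASCII, so it no longer depends on the page encoding.
print(to_groff("[\u00a9] [\u2264]"))
```

Characters that are not in the table pass through unchanged, which is why a
page can still end up depending on its encoding when only a subset is mapped.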

But I cannot find a bug. Can you send me the manpage it creates for you
(preferably gzipped)?

Regards, Daniel

#534095#18
Date:
2008-05-27 22:15:09 UTC
From:
To:
* Daniel Leidert <daniel.leidert@wgdd.de>, 2008-05-27, 23:15:
That explains a lot; I was not aware of that parameter.
The file is actually UTF-8-encoded, but man interprets it as if it were
ISO-8859-1.
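
The garbled heading in the man output above is consistent with this kind of
misreading. A small Python sketch (the function name misread_as_latin1 is
made up for this example) shows what the UTF-8 bytes of the test character
turn into when they are decoded as ISO-8859-1:

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


raw = "\u2264".encode("utf-8")          # LESS-THAN OR EQUAL TO, as in tmp.xml
misread = misread_as_latin1("\u2264")

print(raw.hex())       # the three UTF-8 bytes of the character
print(ascii(misread))  # three separate Latin-1 characters
```

The middle byte maps to a C1 control character, which is usually invisible,
so only the first and last characters tend to show up on screen.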

#534095#23
Date:
2008-06-15 18:28:55 UTC
From:
To:
reassign 483189 libc6
retitle 483189 iconv fails on UTF-8 input with -f UTF-8
thanks

Sorry, I forgot to answer you.

On Wednesday, 28.05.2008, at 00:15 +0200, Jakub Wilk wrote:

[..]
about the UTF-8 character itself and because of this, man output does
not show the character.

This could be a duplicate of #342132.

PS: To avoid locale issues, I recommend using the suggested parameter
(man.charmap.use.subset).

Regards, Daniel

#534095#35
Date:
2009-06-21 16:01:26 UTC
From:
To:
reassign 483189 man-db 2.5.5-2
retitle 483189 man-db: does not really support UTF-8 manpages
severity 483189 important
clone 483189 -1
reassign -1 docbook-xsl 1.73.2.dfsg.1-5
retitle -1 manpages/docbook.xsl: please add encoding declarations
severity -1 wishlist
thanks

* Daniel Leidert <daniel.leidert@wgdd.de>, 2008-06-15, 20:28:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519209#51
I'm quite sure that the bug has nothing to do with iconv.

What is happening here is as follows:
manconv tries to convert the manpage from UTF-8 to ISO-8859-1, which
fails because ≤ has no ISO-8859-1 equivalent; it therefore falls back to
assuming that the input itself is ISO-8859-1.
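
The behaviour described in this paragraph can be modelled roughly as follows.
This is only a Python sketch of the description given here, not manconv's
actual implementation; the function name guess_and_convert and its return
shape are assumptions made for the example.

```python
def guess_and_convert(page: bytes) -> tuple[str, bytes]:
    """Model of the described fallback.

    Returns (assumed source encoding, page bytes in ISO-8859-1).
    """
    try:
        # First attempt: treat the page as UTF-8 and recode it.
        return "UTF-8", page.decode("utf-8").encode("iso-8859-1")
    except UnicodeError:
        # Reached when the input is not valid UTF-8, and also when it is
        # valid UTF-8 but holds a character ISO-8859-1 cannot represent.
        # In both cases the bytes are passed on as if already ISO-8859-1.
        return "ISO-8859-1", page


print(guess_and_convert("\u00a9".encode("utf-8")))  # convertible character
print(guess_and_convert("\u2264".encode("utf-8")))  # not representable
```

In this model a page containing only characters that ISO-8859-1 can represent
converts cleanly, while a single character outside that range makes the whole
page be treated as ISO-8859-1.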

To solve the issue:
- man should support UTF-8-encoded manual pages with characters not
covered by legacy encodings.
- the DocBook stylesheet should add encoding declarations to the
generated manpages.

#534095#54
Date:
2009-06-21 21:59:45 UTC
From:
To:
reassign 483189 groff-base
forcemerge 322760 48318
thanks

man-db works just fine assuming that the underlying groff installation
has sufficient support. I plan to upgrade Debian's groff packages to
1.20 fairly soon (I have test packaging of it available via bzr at
http://bzr.debian.org/users/cjwatson/groff/experimental/), which has
proper UTF-8 support and will fix this bug. I've already subjected
man-db to quite extensive testing against groff 1.20 to confirm that
this class of bugs vanishes with it.

As I noted in the comment you linked to earlier
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519209#51), that's
only necessary if you're using something other than UTF-8 or the legacy
encoding for the relevant language.

There's no actual *problem* with adding a UTF-8 encoding declaration,
but it will not help with this bug, and once this bug is fixed it will
not be necessary for an encoding declaration to be present in order to
take advantage of the fix. manconv already has pretty reliable heuristic
detection of UTF-8; in fact, this is a large part of its purpose in
life.

Regards,

#534095#59
Date:
2009-06-22 09:34:47 UTC
From:
To:
* Colin Watson <cjwatson@debian.org>, 2009-06-21, 22:59:
docbook-xsl can produce manpages in other encodings, too, so my wish is
still relevant.
In fact, adding a UTF-8 encoding declaration could (minimally) help:


$ echo "[©] [≤]" > tmp1
$ man -l tmp1
[©] [â¤]

$ (echo "'\\\" -*- coding: UTF-8 -*-"; cat tmp1) > tmp2
$ man -l tmp2
[©] []


In the latter case, at least *some* characters are displayed properly.
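
For reference, the declaration line used in the tmp2 experiment can be
prepended programmatically. The following Python sketch is only an
illustration; the helper name add_declaration is hypothetical, and it
deliberately leaves alone any page that already starts with a '\" line
(for example a preprocessor line) instead of merging the two.

```python
def add_declaration(page: str, encoding: str = "UTF-8") -> str:
    """Prepend a '\\" -*- coding: ... -*- line to a manual page source."""
    marker = "'\\\""  # the two-character roff comment opener: '\"
    if page.startswith(marker):
        # Simplification: a page that already has a first-line '\" comment
        # is returned unchanged rather than edited.
        return page
    return marker + " -*- coding: " + encoding + " -*-\n" + page


print(add_declaration("tmp\n"), end="")
```

The first line this produces matches what the echo command above writes to
tmp2.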

#534095#64
Date:
2009-06-22 10:19:43 UTC
From:
To:
This makes no difference for me (man-db 2.5.5-2, groff-base
1.18.1.1-22). Debugging output shows identical processing apart from the
insertion of tbl. I'm adamant that this *should* not cause a difference,
so I'm happy to figure out why this is different for you if you can show
me 'man -d' output.