Fabre

#279221 should transcode characters from utf-8 if the terminal is not utf-8 capable

Package:: w3m

Source:: w3m

Description:: WWW browsable pager with excellent tables/frames support

Submitter:: Joey Hess

Date:: 2014-10-12 15:00:05 UTC

Severity:: minor

#279221#5

Date:: 2004-11-01 05:50:05 UTC

Here's the problem:

    joey@dragon:~>locale | grep CTYPE
    LC_CTYPE="POSIX"
    joey@dragon:~>echo '—' > foo.html
    joey@dragon:~>w3m -dump foo.html
    ?

That comes out as a '?' because w3m apparently converts it internally to the
UTF-8 character for an em dash (which is not '-', but the longer dash), then
discovers that this character is not in the terminal's character set and
renders it as a question mark. When reading a document with lots of &mdash;,
&ldquo;, &hellip; and other fancy entities, this gets very annoying.
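
The locale is how a program like w3m can tell what the terminal is expected to display: after `setlocale(LC_CTYPE, "")`, `nl_langinfo(CODESET)` names the character set. Under the `LC_CTYPE="POSIX"` setting in the transcript above, glibc reports `ANSI_X3.4-1968`, i.e. ASCII, which has no em dash. A minimal sketch of that check, assuming a POSIX system; the helper names `codeset_is_utf8` and `terminal_is_utf8` are hypothetical, not w3m functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <strings.h>

/* Hypothetical helper: returns true if a codeset name, as reported by
 * nl_langinfo(CODESET), denotes UTF-8. glibc reports "UTF-8"; some
 * systems spell it "utf8". */
static bool codeset_is_utf8(const char *codeset)
{
    return strcasecmp(codeset, "UTF-8") == 0 || strcasecmp(codeset, "utf8") == 0;
}

/* Hypothetical helper: adopts the user's LC_CTYPE (from LC_ALL, LC_CTYPE
 * or LANG) and reports whether it selects UTF-8. Under LC_CTYPE="POSIX",
 * glibc's codeset is "ANSI_X3.4-1968" (ASCII), so this returns false. */
static bool terminal_is_utf8(void)
{
    setlocale(LC_CTYPE, "");
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program that finds `terminal_is_utf8()` false would then know it has to transcode or approximate non-ASCII output rather than emit UTF-8 bytes.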

Instead, w3m should be aware of the terminal's character set and fall back to
available characters that are close to the right ones, like "-". Other
browsers, such as lynx, already do that.
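
The fallback described here amounts to a lookup table from code points to the closest ASCII text, with '?' kept only for characters that have no sensible stand-in. The sketch below assumes the text has already been decoded to Unicode code points; the table entries and the names `ascii_fallback` and `render_ascii` are hypothetical choices for illustration, not taken from lynx or w3m.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ASCII stand-ins for a few typographic characters. These
 * entries are illustrative choices, not lynx's or w3m's actual tables. */
struct fallback {
    uint32_t codepoint;
    const char *ascii;
};

static const struct fallback fallbacks[] = {
    { 0x2013, "-" },   /* EN DASH (&ndash;) */
    { 0x2014, "--" },  /* EM DASH (&mdash;) */
    { 0x2018, "'" },   /* LEFT SINGLE QUOTATION MARK (&lsquo;) */
    { 0x2019, "'" },   /* RIGHT SINGLE QUOTATION MARK (&rsquo;) */
    { 0x201C, "\"" },  /* LEFT DOUBLE QUOTATION MARK (&ldquo;) */
    { 0x201D, "\"" },  /* RIGHT DOUBLE QUOTATION MARK (&rdquo;) */
    { 0x2026, "..." }, /* HORIZONTAL ELLIPSIS (&hellip;) */
};

/* Looks up a non-ASCII code point; "?" is used only when no
 * approximation is known. */
static const char *ascii_fallback(uint32_t cp)
{
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++)
        if (fallbacks[i].codepoint == cp)
            return fallbacks[i].ascii;
    return "?";
}

/* Writes the code points cps[0..n) to out as ASCII, passing ASCII through
 * and substituting table entries for everything else. Output is truncated
 * to fit outsize and always NUL-terminated; returns the length written. */
static size_t render_ascii(const uint32_t *cps, size_t n,
                           char *out, size_t outsize)
{
    size_t used = 0;
    if (outsize == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        char one[2] = { (char)cps[i], '\0' };
        const char *s = cps[i] < 0x80 ? one : ascii_fallback(cps[i]);
        size_t len = strlen(s);
        if (used + len >= outsize)
            break;
        memcpy(out + used, s, len);
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

For example, the code points for `a`, an em dash, `b` and an ellipsis render as `a--b...`, while a character with no table entry still becomes `?`.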

#279221#10

Date:: 2005-06-07 12:56:16 UTC

Hi,

For this, iconv can be very helpful:

    $ hexdump foo
    0000000  e2 80 94 0a
    $ iconv -f utf-8 -t latin1//translit < foo
    --
    $

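The //translit suffix tells iconv to transliterate characters that cannot be
represented in the target character set, approximating them with one or more
similar-looking characters instead of failing.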

So w3m should do something like:

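#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#define TRANSLIT "//translit"
setlocale(LC_CTYPE, "");  /* without this, nl_langinfo reports the C locale */
const char *codeset = nl_langinfo(CODESET);  /* terminal charset, e.g. "ANSI_X3.4-1968" */
size_t len = strlen(codeset);
char *charset = malloc(len + sizeof TRANSLIT);  /* sizeof counts the trailing NUL */
memcpy(charset, codeset, len);
memcpy(charset + len, TRANSLIT, sizeof TRANSLIT);
iconv_t conv = iconv_open(charset, page_charset);  /* (tocode, fromcode) */
free(charset);  /* iconv_open does not keep the name */
if (conv != (iconv_t)-1)
    iconv(conv, ...);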
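
The snippet above elides the iconv() call itself. A fuller sketch of the same approach follows, assuming a POSIX iconv that understands the //TRANSLIT suffix (GNU libiconv and glibc do; the suffix is spelled in upper case here, as in the GNU documentation). The function name `transcode_for_terminal` is hypothetical, and the loop grows the output buffer whenever iconv() reports E2BIG.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

#define TRANSLIT_SUFFIX "//TRANSLIT"

/* Hypothetical helper, not part of w3m: converts len bytes of text from
 * from_charset into the locale's codeset with //TRANSLIT appended, so
 * characters the target cannot represent are approximated rather than
 * rejected. Returns a malloc'd NUL-terminated string (truncated at invalid
 * input), or NULL if the conversion cannot be set up. Call
 * setlocale(LC_CTYPE, "") beforehand so the user's locale is honored. */
static char *transcode_for_terminal(const char *in, size_t len,
                                    const char *from_charset)
{
    const char *codeset = nl_langinfo(CODESET);
    size_t clen = strlen(codeset);
    char *to = malloc(clen + sizeof TRANSLIT_SUFFIX); /* sizeof counts the NUL */
    if (to == NULL)
        return NULL;
    memcpy(to, codeset, clen);
    memcpy(to + clen, TRANSLIT_SUFFIX, sizeof TRANSLIT_SUFFIX);

    iconv_t cd = iconv_open(to, from_charset); /* (tocode, fromcode) */
    free(to);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = len * 2 + 16; /* transliteration may expand the text */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    char *inp = (char *)in; /* iconv's input argument is char **, not const */
    size_t inleft = len;
    char *outp = out;
    size_t outleft = cap - 1; /* keep one byte for the terminating NUL */

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            continue;
        if (errno != E2BIG)
            break; /* EILSEQ or EINVAL: stop at the offending input */
        size_t used = (size_t)(outp - out);
        char *bigger = realloc(out, cap * 2);
        if (bigger == NULL)
            break;
        out = bigger;
        cap *= 2;
        outp = out + used;
        outleft = cap - 1 - used;
    }
    iconv(cd, NULL, NULL, &outp, &outleft); /* emit any final shift sequence */
    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

What an unrepresentable character turns into is up to the iconv implementation; with glibc, the transliteration tables come from the LC_CTYPE locale, so the same call may yield different approximations, or '?', under different locales.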

Regards,
Samuel

#279221#21

Date:: 2007-06-06 12:16:57 UTC

Hi,

Any news?

#279221#30

Date:: 2014-10-12 12:31:24 UTC

Dear Maintainer,

I wonder if this bug report can be closed for w3m in Debian 7.

I got the correct output:

    $ echo '—' > foo.html
    $ w3m -dump < foo.html
    —

Regards
Markus

#279221#35

Date:: 2014-10-12 14:46:45 UTC

Still not improved.

    $ w3m -dump foo.html
    ?
    $ w3m -dump -T text/html < foo.html
    ?

Thanks,
--
Tatsuya Kinoshita
