Here's the problem:

joey@dragon:~>locale | grep CTYPE
LC_CTYPE="POSIX"
joey@dragon:~>echo '&mdash;' > foo.html
joey@dragon:~>w3m -dump foo.html
?

That comes out as a '?' because w3m apparently converts the entity internally to the UTF-8 character for mdash (which is not '-', but the other dash), then discovers it is not in the character set for this terminal and decides to render it as a question mark. When reading a document with lots of &mdash;, &ldquo;, &hellip; and other fancy entities, this gets very annoying.

Instead, w3m should be aware of the character set and just use available characters that are close to the right ones, like "-". Other browsers, such as lynx, do that.
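To make the request concrete, here is a minimal sketch of the kind of fallback lookup being asked for. It is purely illustrative: the table, the struct and the function name `ascii_approximation` are hypothetical and are not taken from w3m or lynx, and the chosen replacements are just one plausible set.

```c
#include <stddef.h>

/* Hypothetical fallback table; not taken from w3m or lynx. */
struct ascii_fallback {
    unsigned long codepoint;   /* Unicode code point */
    const char *replacement;   /* ASCII approximation */
};

static const struct ascii_fallback fallbacks[] = {
    { 0x2013, "-"   },  /* &ndash;  */
    { 0x2014, "-"   },  /* &mdash;  */
    { 0x201C, "\""  },  /* &ldquo;  */
    { 0x201D, "\""  },  /* &rdquo;  */
    { 0x2026, "..." },  /* &hellip; */
};

/* Return an ASCII approximation for cp, or "?" when none is known. */
const char *ascii_approximation(unsigned long cp)
{
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++)
        if (fallbacks[i].codepoint == cp)
            return fallbacks[i].replacement;
    return "?";
}
```

A renderer would only consult such a table after finding that the character cannot be represented in the terminal's character set, so UTF-8 terminals would still get the real dash.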
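Hi,

For this, iconv can be very helpful:

$ hexdump foo
0000000 e2 80 94 0a
$ iconv -f utf-8 -t latin1//translit < foo
--
$

The //translit suffix tells iconv to transliterate characters that have no equivalent in the target charset. So w3m should do something like:

#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>
#define TRANSLIT "//translit"
/* Build "<terminal codeset>//translit" as the conversion target. */
char *codeset = nl_langinfo(CODESET);
size_t len = strlen(codeset);
char *charset = malloc(len + strlen(TRANSLIT) + 1);
memcpy(charset, codeset, len);
memcpy(charset + len, TRANSLIT, strlen(TRANSLIT) + 1);
/* page_charset is the charset of the document being rendered. */
iconv_t conv = iconv_open(charset, page_charset);
iconv(conv, ...);

Regards,
Samuel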
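For reference, the fragment above can be fleshed out into a small self-contained helper. This is only a sketch under stated assumptions, not w3m code: the function name `translit_convert` and its buffer handling are hypothetical, and it uses the `//TRANSLIT` spelling from the glibc iconv_open(3) manual page. The replacement text that comes back depends on the iconv implementation and the current locale, so it may be "-", "--" or "?".

```c
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of w3m): convert `in` from `from_charset`
 * to `to_codeset`, asking iconv to transliterate characters the target
 * cannot represent. Writes a NUL-terminated result into `out`.
 * Returns 0 on success, -1 on failure. */
int translit_convert(const char *to_codeset, const char *from_charset,
                     const char *in, char *out, size_t outsize)
{
    char target[128];
    int n = snprintf(target, sizeof target, "%s//TRANSLIT", to_codeset);
    if (n < 0 || (size_t)n >= sizeof target || outsize == 0)
        return -1;

    iconv_t cd = iconv_open(target, from_charset);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;        /* iconv() takes char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;  /* keep room for the terminating NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return 0;
}
```

A caller would first run setlocale(LC_CTYPE, "") so that nl_langinfo(CODESET) reflects the user's environment, then pass that codeset as `to_codeset` and the page's charset as `from_charset`.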
Hi, Any news?
Dear Maintainer,

I wonder if this bug report can be closed for w3m in Debian 7. I got the correct output:

$ echo '—' > foo.html
$ w3m -dump < foo.html
—

Regards
Markus
Still not improved.
$ w3m -dump foo.html
?
$ w3m -dump -T text/html < foo.html
?
Thanks,
--
Tatsuya Kinoshita