#751004 9base: unicode(1plan9) outputs wrong characters

Package:: 9base

Source:: 9base

Description:: Plan 9 userland tools

Submitter:: Nils Dagsson Moskopp

Date:: 2015-03-06 02:24:05 UTC

Severity:: important

#751004#5

Date:: 2014-06-09 11:59:21 UTC

Dear Maintainer,

I tried printing cat emoticons (U+1F638 to U+1F640). I got something else:

; /usr/lib/plan9/bin/unicode 1F638-1F640
1f638 	1f639 	1f63a 	1f63b 	1f63c 	1f63d 	1f63e 	1f63f 
1f640 

The unicode(1plan9) tool seems to assume that a code point is at most two bytes (16 bits) wide. Compare the same range without the leading 1:

; /usr/lib/plan9/bin/unicode F638-F640
f638 	f639 	f63a 	f63b 	f63c 	f63d 	f63e 	f63f 
f640 

Further evidence:

; /usr/lib/plan9/bin/unicode 41-50
0041 A	0042 B	0043 C	0044 D	0045 E	0046 F	0047 G	0048 H
0049 I	004a J	004b K	004c L	004d M	004e N	004f O	0050 P

; /usr/lib/plan9/bin/unicode 10041-10050
10041 A	10042 B	10043 C	10044 D	10045 E	10046 F	10047 G	10048 H
10049 I	1004a J	1004b K	1004c L	1004d M	1004e N	1004f O	10050 P

; /usr/lib/plan9/bin/unicode 20041-20050
20041 A	20042 B	20043 C	20044 D	20045 E	20046 F	20047 G	20048 H
20049 I	2004a J	2004b K	2004c L	2004d M	2004e N	2004f O	20050 P
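
The outputs above are consistent with the code point being reduced to its low 16 bits before the character is printed, while the label keeps the full number. A minimal Python sketch of that hypothesis follows. It is not taken from the 9base source, and the helper name `truncate_to_16_bits` is made up for illustration.

```python
def truncate_to_16_bits(code_point: int) -> int:
    """Keep only the low 16 bits, as a 16-bit character type would (assumed behaviour)."""
    return code_point & 0xFFFF


# Code points requested in the report versus the character that would be shown.
for requested in (0x1F638, 0x10041, 0x20041):
    shown = truncate_to_16_bits(requested)
    # repr() keeps unprintable or private-use characters visible as escapes.
    print(f"{requested:04x} -> {shown:04x} {chr(shown)!r}")
```

Under this assumption 10041 and 20041 both collapse to 0041 (the letter A), and 1F638 collapses to F638, which lies in the Private Use Area.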

This is 𝐮𝐧𝐚𝐜𝐜𝐞𝐩𝐭𝐚𝐛𝐥𝐞.

#751004#10

Date:: 2015-03-06 02:21:25 UTC

I have written a replacement for unicode(1) in Bourne Shell.

It seems to do the right thing for astral plane characters:
--- snib ---
; unicode 1F638-1F640
1F638 😸	1F639 😹	1F63A 😺	1F63B 😻	1F63C 😼	1F63D 😽	1F63E 😾	1F63F 😿
1F640 🙀
--- snab ---
--- snob ---
; unicode 10041-10050
10041 𐁁	10042 𐁂	10043 𐁃	10044 𐁄	10045 𐁅	10046 𐁆	10047 𐁇	10048 𐁈
10049 𐁉	1004A 𐁊	1004B 𐁋	1004C 𐁌	1004D 𐁍	1004E 𐁎	1004F 𐁏	10050 𐁐
--- sneb ---
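
For comparison, here is a rough Python sketch of the behaviour shown in these transcripts: parse a hexadecimal range, then print each code point with its character, eight per line, separated by tabs. This is not the Bourne Shell replacement mentioned above; the function names `parse_range` and `format_range` are hypothetical.

```python
def parse_range(spec: str) -> tuple[int, int]:
    """Parse 'HEX' or 'HEX-HEX' into an inclusive (first, last) pair."""
    first, sep, last = spec.partition("-")
    low = int(first, 16)
    high = int(last, 16) if sep else low
    return low, high


def format_range(first: int, last: int, per_line: int = 8) -> str:
    """Format 'LABEL char' cells, tab-separated, per_line cells per row."""
    # chr() accepts the full range up to 0x10FFFF, so nothing is truncated.
    cells = [f"{cp:04X} {chr(cp)}" for cp in range(first, last + 1)]
    rows = [cells[i:i + per_line] for i in range(0, len(cells), per_line)]
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    # Assumes the terminal can encode and display astral plane characters.
    print(format_range(*parse_range("1F638-1F640")))
```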
