#751004 9base: unicode(1plan9) outputs wrong characters

Package:
9base
Source:
9base
Description:
Plan 9 userland tools
Submitter:
Nils Dagsson Moskopp
Date:
2015-03-06 02:24:05 UTC
Severity:
important
#751004#5
Date:
2014-06-09 11:59:21 UTC
From:
To:
Dear Maintainer,

I tried printing cat emoticons (U+1F638 to U+1F640). I got something else:

; /usr/lib/plan9/bin/unicode 1F638-1F640
1f638 	1f639 	1f63a 	1f63b 	1f63c 	1f63d 	1f63e 	1f63f 
1f640 

The unicode(1plan9) tool seems to assume that code points are at most two bytes (16 bits) wide.

; /usr/lib/plan9/bin/unicode F638-F640
f638 	f639 	f63a 	f63b 	f63c 	f63d 	f63e 	f63f 
f640 

Further evidence:

; /usr/lib/plan9/bin/unicode 41-50
0041 A	0042 B	0043 C	0044 D	0045 E	0046 F	0047 G	0048 H
0049 I	004a J	004b K	004c L	004d M	004e N	004f O	0050 P

; /usr/lib/plan9/bin/unicode 10041-10050
10041 A	10042 B	10043 C	10044 D	10045 E	10046 F	10047 G	10048 H
10049 I	1004a J	1004b K	1004c L	1004d M	1004e N	1004f O	10050 P

; /usr/lib/plan9/bin/unicode 20041-20050
20041 A	20042 B	20043 C	20044 D	20045 E	20046 F	20047 G	20048 H
20049 I	2004a J	2004b K	2004c L	2004d M	2004e N	2004f O	20050 P

This is 𝐮𝐧𝐚𝐜𝐜𝐞𝐩𝐭𝐚𝐛𝐥𝐞.
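The outputs above are consistent with each code point being reduced to its low 16 bits before printing: 10041 and 20041 both come out as "A" (0041), and 1F638 behaves like F638. The following is a minimal sketch of that arithmetic only. It assumes the tool stores code points in a 16-bit type, which this report does not confirm, and the function name `truncate_to_16_bits` is hypothetical.

```python
def truncate_to_16_bits(code_point: int) -> int:
    """Keep only the low 16 bits, as a 16-bit storage type would (assumption)."""
    return code_point & 0xFFFF


# Code points taken from the transcripts in this report.
for cp in (0x10041, 0x20041, 0x1F638):
    low = truncate_to_16_bits(cp)
    # Show the requested code point next to what would survive truncation.
    print(f"{cp:05x} -> {low:04x}")
```

If this assumption holds, any code point outside the Basic Multilingual Plane would be printed as the BMP character that shares its low four hex digits.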

#751004#10
Date:
2015-03-06 02:21:25 UTC
From:
To:
I have written a replacement for unicode(1) in Bourne Shell.

It seems to do the right thing for astral plane characters:
--- snib ---
; unicode 1F638-1F640
1F638 😸	1F639 😹	1F63A 😺	1F63B 😻	1F63C 😼	1F63D 😽	1F63E 😾	1F63F 😿
1F640 🙀
--- snab ---
--- snob ---
; unicode 10041-10050
10041 𐁁	10042 𐁂	10043 𐁃	10044 𐁄	10045 𐁅	10046 𐁆	10047 𐁇	10048 𐁈
10049 𐁉	1004A 𐁊	1004B 𐁋	1004C 𐁌	1004D 𐁍	1004E 𐁎	1004F 𐁏	10050 𐁐
--- sneb ---
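
For comparison, here is a rough sketch of the range-printing behaviour shown in the transcripts above, written in Python rather than Bourne Shell. It is not the replacement script mentioned in this message, whose source is not included here. The function name `format_range` and the choice of eight tab-separated entries per line are assumptions based on the visible output.

```python
def format_range(spec: str, per_line: int = 8) -> str:
    """Format a hex range such as '1F638-1F640' as 'CODE char' cells.

    Hypothetical helper: cells are tab-separated, per_line cells per row.
    """
    first, _, last = spec.partition("-")
    start = int(first, 16)
    end = int(last or first, 16)  # a single code point is a range of one
    # chr() accepts the full Unicode range, so astral code points are kept intact.
    cells = [f"{cp:04X} {chr(cp)}" for cp in range(start, end + 1)]
    rows = ["\t".join(cells[i:i + per_line]) for i in range(0, len(cells), per_line)]
    return "\n".join(rows)


print(format_range("1F638-1F640"))
```

The sketch does no validation, so surrogate code points (D800 to DFFF) or malformed input would need extra handling.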