Hi,

If I create a directory named héhé (e with combining acute accent), and then try to complete hé into it (here é is the precomposed form), libreadline does not take into account that é (e with combining acute accent) is equivalent to é (the precomposed "e with acute accent" character). libreadline should apply a Unicode normalization form when completing.

Samuel
Samuel Thibault wrote on Thu 22 Jun 2006 03:20:44 +0200:

I meant "zsh" of course...

Samuel
I don't think they should be considered equivalent as long as the file system does not regard them as equivalent:

vin% touch "`echo e\\\u300`"
vin% touch "`echo \\\ue8`"
vin% ls
è è

But I wonder if this is a bug in glibc. Perhaps some kind of local phishing could be based on that.

Note also that zle is completely confused by the combining character. But so is bash.
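To make the file system's point of view concrete, here is a minimal Python sketch of the two spellings created in the transcript above. It assumes the names are encoded as UTF-8; the variable names are only illustrative.

```python
# The two spellings of "è" created by the touch commands above.
decomposed = "e\u0300"   # 'e' followed by U+0300 COMBINING GRAVE ACCENT
precomposed = "\u00e8"   # U+00E8 LATIN SMALL LETTER E WITH GRAVE

# A byte-oriented file system only sees the encoded form
# (UTF-8 is assumed here).
decomposed_bytes = decomposed.encode("utf-8")
precomposed_bytes = precomposed.encode("utf-8")

print(decomposed_bytes)                        # b'e\xcc\x80'
print(precomposed_bytes)                       # b'\xc3\xa8'
print(decomposed_bytes == precomposed_bytes)   # False
```

Both names render the same way in a terminal, but they are different byte sequences, which is why two separate files appear.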
Would the best thing here be to normalize to Unicode NFD in the completion code?
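As a rough illustration of what normalizing to NFD would mean, the following Python sketch compares two names after decomposing both. It is not zsh's completion code; the helper names `nfd` and `same_name` are hypothetical.

```python
import unicodedata

def nfd(name: str) -> str:
    """Return the canonical decomposition (NFD) of a name."""
    return unicodedata.normalize("NFD", name)

def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFD."""
    return nfd(a) == nfd(b)

typed = "h\u00e9h\u00e9"        # precomposed é, as typed on a keyboard
on_disk = "he\u0301he\u0301"    # e + U+0301 COMBINING ACUTE ACCENT

print(typed == on_disk)            # False: different code point sequences
print(same_name(typed, on_disk))   # True: identical once both are in NFD
print(same_name("hehe", on_disk))  # False: the unaccented name stays distinct
```

Normalizing both sides to NFC would give the same comparison results; NFD is used only because it is the form suggested here.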
As the file system considers them to be different, the shell _must_ treat them as different, as well.

Completion does what zsh currently does with those 'modify the char before me' pseudo-chars in Unicode: play it safe and display their <number>.

richih@roadwarrior ~/killme/unicode % touch héhé
richih@roadwarrior ~/killme/unicode % ls
héhé
richih@roadwarrior ~/killme/unicode % touch "`echo e\\\u300`"
richih@roadwarrior ~/killme/unicode % ls
e héhé
richih@roadwarrior ~/killme/unicode % touch "`echo \\\ue8`"
richih@roadwarrior ~/killme/unicode % ls <tab>
e è héhé
richih@roadwarrior ~/killme/unicode % ls <tab>
richih@roadwarrior ~/killme/unicode % ls e<0300><tab>
richih@roadwarrior ~/killme/unicode % ls è<tab>
richih@roadwarrior ~/killme/unicode % ls héhé<tab>
Debian Bug Tracking System wrote on Thu 01 Jan 2009 17:12:03 +0000:

Sure. But a user would type the precomposed form é, not e. Then, depending on the user's configuration, completion may not provide any match, and the user gets upset. See

$ ls
héhé
$ rm hé<tab>

which fails to match. A typical French user would never think of trying without the accent, as 'é' really is a different letter from 'e' (the same holds in a lot of languages).

BTW, please remember to always Cc the submitter of the bug, who is not notified when only 374913@bugs.debian.org is mailed.

Samuel
And when completing the filename, zsh should "fix" the accented characters at the same time. So, I completely disagree with the "must", as everything would work fine. The fact that the file system (which regards a filename just as a sequence of bytes) is poorly designed is not a valid reason for zsh to follow this poor design. In particular, what the user types is characters, not bytes. So, the shell has to do some interpretation to make things work as the user expects (in particular because completion is a user-oriented feature).
reopen 374913
notfixed 4.3.6-7
severity 374913 wishlist
thanks

After all, zsh handles case-insensitive completion on case-sensitive file systems. There's no reason why it shouldn't do something similar for Unicode normalization. IMHO, this is even more important because the user doesn't always have a way to choose how accented characters are written (and/or doesn't necessarily know which form to choose).
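The analogy with case-insensitive completion can be sketched in Python: both amount to comparing a derived key instead of the raw string, while still returning the name exactly as stored on disk. This is only an illustration under that assumption, not how zsh implements matching; `match_key` and `complete` are hypothetical names.

```python
import unicodedata

def match_key(text: str, ignore_case: bool = False) -> str:
    """Key used for comparison only: NFD, optionally case-folded."""
    key = unicodedata.normalize("NFD", text)
    return key.casefold() if ignore_case else key

def complete(prefix: str, names: list[str], ignore_case: bool = False) -> list[str]:
    """Return the on-disk names whose key starts with the prefix's key."""
    wanted = match_key(prefix, ignore_case)
    return [n for n in names if match_key(n, ignore_case).startswith(wanted)]

# Hypothetical directory listing; the first name uses combining accents.
on_disk = ["he\u0301he\u0301", "hello", "Other"]

typed = "h\u00e9"  # precomposed é, as the user would type it
matches = complete(typed, on_disk)

print(matches == ["he\u0301he\u0301"])               # True
print(complete("oth", on_disk))                      # []
print(complete("oth", on_disk, ignore_case=True))    # ['Other']
```

Because the result is the stored name rather than the typed one, inserting it on the command line "fixes" the accented characters to whatever form the file system holds. Note that with NFD a bare "he" prefix also matches the accented name, since the decomposed name starts with those two code points.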