#374913 zsh: Completion should handle combining accents equivalents

Package:
zsh
Source:
zsh
Description:
shell with lots of features
Submitter:
Samuel Thibault
Date:
2012-12-20 23:36:09 UTC
Severity:
wishlist
#374913#5
Date:
2006-06-22 01:20:44 UTC
From:
To:
Hi,

If I create a directory named héhé (e with combining acute accent),
and then try to complete hé into it (here é is the precombined form),
libreadline does not take into account that é (e with combining acute accent) is
equivalent to é (the precombined "e with acute accent" character).

libreadline should use some Unicode normal form on completion.

Samuel

#374913#10
Date:
2006-06-22 07:51:59 UTC
From:
To:
Samuel Thibault wrote on Thu 22 Jun 2006 03:20:44 +0200:

I meant "zsh" of course...

Samuel

#374913#15
Date:
2006-11-05 01:26:10 UTC
From:
To:
I don't think they should be considered equivalent as long as the
file system does not regard them as equivalent:

vin% touch "`echo e\\\u300`"
vin% touch "`echo \\\ue8`"
vin% ls
è  è

But I wonder if this is a bug in glibc. Perhaps some kind of local
phishing could be based on that.

Note also that zle is completely confused by the combining character.
But so is bash.
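
To illustrate why the file system ends up with two entries, here is a
minimal Python sketch (not part of the original report; the variable
names are made up for illustration). It assumes a UTF-8 locale and only
shows that the two spellings are different byte sequences that happen to
normalize to the same character:

```python
import unicodedata

# Two spellings of "e with grave accent", as in the transcript above.
decomposed = "e\u0300"   # 'e' followed by U+0300 COMBINING GRAVE ACCENT
precomposed = "\u00e8"   # U+00E8 LATIN SMALL LETTER E WITH GRAVE

# A file system that treats names as byte strings compares these encodings.
decomposed_bytes = decomposed.encode("utf-8")
precomposed_bytes = precomposed.encode("utf-8")

print(decomposed_bytes)    # b'e\xcc\x80'
print(precomposed_bytes)   # b'\xc3\xa8'

# Different as raw strings, identical once both are put in NFC.
same_raw = decomposed == precomposed
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
print(same_raw, same_nfc)  # False True
```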

#374913#18
Date:
2006-11-05 05:47:10 UTC
From:
To:
Would the best thing here be to normalize to Unicode NFD in the
completion code?
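
A rough sketch of what NFD-based matching could look like, written in
Python rather than in zsh's C code. The helper names nfd and
matches_prefix are hypothetical, and this says nothing about how zsh's
completion code is actually structured:

```python
import unicodedata

def nfd(text):
    """Return text in Unicode Normalization Form D (fully decomposed)."""
    return unicodedata.normalize("NFD", text)

def matches_prefix(typed, candidate):
    """Compare the typed prefix and the candidate after NFD normalization."""
    return nfd(candidate).startswith(nfd(typed))

# Directory entry stored with combining acute accents; user types precombined é.
on_disk = "he\u0301he\u0301"
typed = "h\u00e9"

plain_match = on_disk.startswith(typed)        # byte-for-byte style comparison
normalized_match = matches_prefix(typed, on_disk)
unrelated_match = matches_prefix(typed, "hello")
```

One side effect of comparing in NFD is that an unaccented prefix such as
"he" also matches, because the decomposed form of é begins with a plain
e. Whether that is desirable is a separate design question.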

#374913#23
Date:
2009-01-01 17:10:46 UTC
From:
To:
As the file system considers them to be different, the shell
_must_ treat them as different, as well.

Completion does what zsh currently does with Unicode combining
characters (the 'modify the char before me' pseudo-chars):
it plays it safe and displays their code point as <number>.


richih@roadwarrior ~/killme/unicode % touch héhé
richih@roadwarrior ~/killme/unicode % ls
héhé
richih@roadwarrior ~/killme/unicode % touch "`echo e\\\u300`"
richih@roadwarrior ~/killme/unicode % ls
e  héhé
richih@roadwarrior ~/killme/unicode % touch "`echo \\\ue8`"
richih@roadwarrior ~/killme/unicode % ls <tab>
e  è  héhé
richih@roadwarrior ~/killme/unicode % ls <tab>
richih@roadwarrior ~/killme/unicode % ls e<0300><tab>
richih@roadwarrior ~/killme/unicode % ls è<tab>
richih@roadwarrior ~/killme/unicode % ls héhé<tab>

#374913#28
Date:
2009-01-01 17:30:56 UTC
From:
To:
Debian Bug Tracking System wrote on Thu 01 Jan 2009 17:12:03 +0000:

Sure. But a user would type the precombined form é, not e. And then
depending on the user's configuration, completion may not provide any
match and then the user gets upset. See

$ ls
héhé
$ rm hé<tab>

that fails to match it, and a typical French user would never think of
trying without the accent, as 'é' is really a different letter from 'e'
(the same holds in a lot of languages).

BTW, please remember to always Cc the submitter of the bug, who is not
notified when only 374913@bugs.debian.org is mailed.

Samuel

#374913#33
Date:
2009-01-01 18:06:39 UTC
From:
To:
And when completing the filename, zsh should "fix" the accented
characters at the same time. So, I completely disagree with the
"must", as everything would work fine.

The fact that the file system (which regards a filename just as a
sequence of bytes) is poorly designed is not a valid reason for zsh
to follow this poor design. In particular, what the user types is
characters, not bytes. So, the shell has to do some interpretation
to make things work as the user expects (in particular because
completion is a user-oriented feature).
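
As a sketch of the "fix the accented characters while completing" idea,
the following Python snippet compares names in normalized form but
returns the spelling that is actually on disk, so the command line ends
up with bytes the file system will accept. The function name complete
and the sample data are assumptions for illustration, not zsh code:

```python
import unicodedata

def complete(typed, entries):
    """Return on-disk names whose NFC form starts with the NFC form of typed.

    The returned strings are the original directory entries, unchanged,
    so substituting one for the typed word "fixes" its accents.
    """
    key = unicodedata.normalize("NFC", typed)
    return [name for name in entries
            if unicodedata.normalize("NFC", name).startswith(key)]

# Directory entry written with combining acute accents.
entries = ["he\u0301he\u0301"]

# The user types the precombined é and presses <tab>.
result = complete("h\u00e9", entries)

# With NFC, a bare "he" does not match, in line with 'é' being its own letter.
no_accent = complete("he", entries)
```

Here result holds the decomposed on-disk name, which differs
byte-for-byte from what was typed, while no_accent is empty.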

#374913#38
Date:
2009-01-01 18:22:10 UTC
From:
To:
reopen 374913
notfixed 4.3.6-7
severity 374913 wishlist
thanks

After all, zsh handles case-insensitive completion on case-sensitive
filesystems. There's no reason why it shouldn't do something similar
for Unicode normalization. IMHO, this is even more important as the
user doesn't always have a way to choose how accented characters are
written (and/or doesn't necessarily know which form he should choose).