- Package:
- manpages-dev
- Source:
- manpages
- Submitter:
- Filipus Klutiero
- Date:
- 2011-10-06 16:24:05 UTC
- Severity:
- wishlist
- Tags:
strcoll(3) explains that: Unfortunately, it seems the way collation happens depending on the LC_COLLATE locale is unspecified. I couldn't find any description, even upstream.
reassign 644270 manpages-dev 3.32-0.2
retitle 644270 strcoll(3): please make locale(5) easier to find
severity 644270 wishlist

Hi, Filipus Klutiero wrote: The strcoll(3) manpage is from manpages-dev, not eglibc. Anyway, the behavior is as described in POSIX[1]. Clarifying text welcome. Hope that helps, Jonathan [1] http://www.unix.org/2008edition/
Hi again, Filipus Klutiero wrote: [...] Cc-ing the manpages-dev maintainers (sorry I forgot to do so before). Where to go from here? I believe giving a summary of the collation order chosen for each locale would be a lot of work for very little gain (unless there are some important details common to most locales), so I would suggest pointing to the locale generation tools and locale sources to help the reader find these things out for herself instead. Anyway, your best bet is to work with upstream, as described at [1], and let us know a relevant Message-ID so we can track your work. Making the platform's behavior more intuitive is certainly a valuable goal. Thanks! Hope that helps, Jonathan [1] http://www.kernel.org/doc/man-pages/contributing.html [2] http://www.unix.org/2008edition/
That sounds a reasonable approach to me. Cheers, Michael
Thanks Jonathan, I had looked at package contents to figure out the sequence of fr_CA, and had found /usr/share/i18n/locales/. I then looked at /usr/share/i18n/locales/fr_CA, then at en_CA, then at iso14651_t1, and finally at iso14651_t1_common. This is where I decided to stop guessing and look for actual documentation. I agree that not knowing the file's syntax was the final thing that discouraged me, but even seeing what locale(5) contains now is of little help (for me, it doesn't change anything). I did mean this bug as being about the lack of *specification* of collation. Linking to a manual giving hints on how to interpret the code is better than nothing, but only a fraction of users will dare to go that way. This is not about strcoll's manpage. I probably shouldn't have mentioned strcoll() specifically; this is about collation in general. I believe this should be documented in glibc-doc-reference, in section 7 "Locales and Internationalization" and easily reachable from 5.6 Collation Functions. I think an even more general issue is that the influence of choosing a specific locale doesn't seem to be explained. The documentation explains what different locales can change, but not what each locale does. Debian's best-known interface to locale choice is dpkg-reconfigure locales. I'm not sure my dad would find it obvious that he wants to pick "fr_CA.UTF-8 UTF-8" there. I don't think specifying the collation order of each locale would yield so little gain. What made me hit this issue is that I was trying to determine what locale a multilingual program should use (the best compromise assuming that a single locale will be used). Collation is important, and I think many people wonder how it works. I do agree, however, that this will require significant work.
Anyway, if we stick to the issue of collation, the Unicode collation algorithm is documented (see http://www.unicode.org/reports/tr10/). The specification is non-free, but specifying the parameters of each locale and linking to it would be enough for me. As for non-Unicode locales, I don't know. POSIX 7.3.2 does contain a good amount of useful information. It clearly describes collating sequence definitions. It also gives the collating sequence definition of C. That one is quite accessible. Thanks for that too, Jonathan.
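For readers who want to retrace the trail through /usr/share/i18n/locales/ described above, here is a minimal Python sketch that follows the `copy` directive inside the LC_COLLATE section from one locale source to the next. The sample texts are hypothetical, heavily trimmed stand-ins for the real files, not their actual contents, and the helper names `collate_copy_target` and `collation_chain` are made up for illustration.

```python
import re

# Hypothetical, heavily trimmed stand-ins for files under
# /usr/share/i18n/locales/. The real files contain far more.
SAMPLE_SOURCES = {
    "fr_CA": 'LC_CTYPE\ncopy "i18n"\nEND LC_CTYPE\n'
             'LC_COLLATE\ncopy "en_CA"\nEND LC_COLLATE\n',
    "en_CA": 'LC_COLLATE\ncopy "iso14651_t1"\nEND LC_COLLATE\n',
    "iso14651_t1": 'LC_COLLATE\ncopy "iso14651_t1_common"\nEND LC_COLLATE\n',
    "iso14651_t1_common": "LC_COLLATE\norder_start forward\nUNDEFINED\n"
                          "order_end\nEND LC_COLLATE\n",
}


def collate_copy_target(source_text):
    """Return the name given to `copy` inside LC_COLLATE, or None."""
    in_collate = False
    for line in source_text.splitlines():
        stripped = line.strip()
        if stripped == "LC_COLLATE":
            in_collate = True
        elif stripped == "END LC_COLLATE":
            in_collate = False
        elif in_collate:
            match = re.match(r'copy\s+"([^"]+)"', stripped)
            if match:
                return match.group(1)
    return None


def collation_chain(name, read_source):
    """Follow LC_COLLATE copy directives starting from `name`."""
    chain = [name]
    while True:
        target = collate_copy_target(read_source(chain[-1]))
        # Stop at a file that defines its own rules, or on a cycle.
        if target is None or target in chain:
            return chain
        chain.append(target)


print(" -> ".join(collation_chain("fr_CA", SAMPLE_SOURCES.__getitem__)))
```

To try this on an installed system, `read_source` could be replaced by a function that opens the file of that name under /usr/share/i18n/locales/. Real locale sources may use constructs this sketch does not handle, so treat the result as a starting point for reading the files rather than as a description of the collation order.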
Filipus Klutiero wrote: Thanks --- this information would have been useful in the original report. Was there some particular question about the fr_CA collating sequence that you were looking for an answer to, or were you just curious in general? I guess there is also an implied bug report here regarding the locale(5) page. But there's not much use in splitting the bug into the various relevant tasks unless someone is actually doing the work of writing text. Patches implementing even partial progress towards your goal (a "SEE ALSO" here, a clarifying sentence there) would be welcome. Thanks for your interest. To be clear, I will not personally be working on this, since when I have time to write manpages, there are many others I would rather spend time on.
On 2011/10/6 Filipus Klutiero wrote: [...] [...] Hello, Glibc locales implement ISO 14651, not Unicode collation. Early drafts are available, for instance at http://www.dkuug.dk/jtc1/sc22/open/n2933.pdf Denis
On 2011-10-06 02:27, Jonathan Nieder wrote: To be clear, I explained my story to show that I made some effort to learn about glibc's collation sequences, implying that I needed it because it had some importance. I didn't have a very particular question in mind. An application I developed used to collate without using strcoll(). I changed it to collate with strcoll(), and erroneously chose C.UTF-8. I was looking for another global locale which would be closer to the previous behavior. In particular, the previous sort allowed overriding how a string would sort by prefixing it with special characters such as "-" or "[". I was wondering if/how a different locale would allow doing such hacks. There are probably many more things actually relying on the collation. Just to clarify, I consider this a bug which should be [mostly] addressed in the glibc reference manual (I did mean the upstream tag). Having Debian maintainers verify the report, provide insight on it and forward it upstream would entirely meet my expectations (this has largely already been done).
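The experiment described above, sorting the same strings under different candidate locales to see where prefixes such as "-" or "[" end up, can be sketched in Python, whose `locale.strcoll` wraps the C library's collation. This is only an illustrative sketch: the helper name `sort_with_locale` and the sample strings are made up, and fr_CA.UTF-8 is assumed to be one locale of interest that may or may not be generated on a given system.

```python
import functools
import locale


def sort_with_locale(strings, locale_name):
    """Sort with strcoll() under LC_COLLATE=locale_name.

    Returns None when the locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(strings, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore the caller's collation setting.
        locale.setlocale(locale.LC_COLLATE, previous)


# Hypothetical sample data using the prefix characters mentioned above.
names = ["apple", "[draft", "-first", "Banana"]
for candidate in ("C", "fr_CA.UTF-8"):
    print(candidate, sort_with_locale(names, candidate))
```

Under the "C" locale the comparison follows character code order, so on an ASCII-based system "-first" sorts before the letters and "[draft" lands between the uppercase and lowercase words. What other locales do with such prefixes depends on their collation definitions, which is exactly the information this report asks to have documented.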
On 2011-10-06 03:51, D. Barbier wrote: Oh, I guess that's a good point... Thanks a lot for this, I suppose this shows there's something to document. The standard also contains very useful information. Apparently the current standard (ISO/IEC 14651:2007) is also available (still non-free), both in English and French: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html The English version is http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip