#641166 coreutils: 'man sort': '--random-sort' misleading

Package:
coreutils
Source:
coreutils
Description:
GNU core utilities
Submitter:
gwern
Date:
2015-10-19 16:30:03 UTC
Severity:
minor
#641166#5
Date:
2011-09-10 23:44:21 UTC
From:
To:
The existing documentation for the option:

       -R, --random-sort
              sort by random hash of keys

This is not wrong, strictly speaking, but it is misleading: sorting by random hash *sounds* like a perfect shuffle,
which is what 99% of users want, and sorting by hash is equivalent if and only if there are no duplicate entries.
If there *are* duplicate entries, then the 'random' sort will put all duplicates in consecutive runs.
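
To make the difference concrete, here is a minimal Python sketch rather than the actual coreutils code. The function names, the salted SHA-256 key, and the sample playlist are all assumptions made for illustration; the sketch only models "sort by a random hash of each line" next to a uniform shuffle.

```python
import hashlib
import os
import random


def random_hash_sort(lines, salt=None):
    """Sort lines by a salted hash of each line (a stand-in for 'sort -R')."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt on every call
    return sorted(lines, key=lambda s: hashlib.sha256(salt + s.encode()).digest())


def duplicates_are_adjacent(lines):
    """Return True if every distinct value occupies one consecutive run."""
    seen = set()
    previous = None
    for line in lines:
        if line != previous and line in seen:
            return False  # value reappeared after a different value
        seen.add(line)
        previous = line
    return True


# Hypothetical playlist: one entry repeated three times to "bias" selection.
playlist = ["a.mp3", "b.mp3", "c.mp3", "d.mp3", "e.mp3"] + ["fav.mp3"] * 3

hashed = random_hash_sort(playlist)
shuffled = random.sample(playlist, len(playlist))  # uniform permutation

print(hashed)
print(duplicates_are_adjacent(hashed))    # True: equal lines share one hash
print(duplicates_are_adjacent(shuffled))  # usually False
```

Equal lines get equal hash values, so the three copies always land next to each other in the hashed order, whereas the shuffle scatters them in most runs.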

I suggest amending the line to read more like

              sort by random hash of keys; equivalent to perfect shuffle on unique keys

or maybe just say

              sort by random hash of keys; not the same as a perfect shuffle

Or at least warn in some fashion that 'random' is not quite what 'random' usually means on lists.

(I do random shuffle with mplayer using 'sort -R', and once, to 'bias' the selection to a particular set of songs, I put that directory in
3 or 4 times; I thought I was going crazy when the first such song came up 3 times, which I calculated at billions to one against.
I checked everything until I began to wonder what exactly 'random hash of keys' meant, and then saw how it treated duplicate entries.)

#641166#10
Date:
2011-09-12 14:01:07 UTC
From:
To:
gwern wrote:

Thanks for the suggestion, but we try to keep the man page pretty terse,
since it's automatically derived from sort --help output.

Did you see the "real" documentation?

  `-R'
  `--random-sort'
  `--sort=random'
       Sort by hashing the input keys and then sorting the hash values.
       Choose the hash function at random, ensuring that it is free of
       collisions so that differing keys have differing hash values.
       This is like a random permutation of the inputs (*note shuf
       invocation::), except that keys with the same value sort together.

       If multiple random sort fields are specified, the same random hash
       function is used for all fields.  To use different random hash
       functions for different fields, you can invoke `sort' more than
       once.

       The choice of hash function is affected by the `--random-source'
       option.
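
The behaviour described in that excerpt can be modelled roughly as follows. This is a sketch under stated assumptions, not the coreutils implementation: the keyed BLAKE2b hash, the function name, and the literal byte strings standing in for a random source are all hypothetical choices made for illustration.

```python
import hashlib


def sort_by_keyed_hash(lines, random_bytes):
    """Order lines by a hash keyed with bytes taken from a random source."""
    def keyed_hash(line):
        # BLAKE2b accepts a key of at most 64 bytes; the key plays the role
        # of "choosing the hash function at random".
        return hashlib.blake2b(line.encode(), key=random_bytes[:64]).digest()
    return sorted(lines, key=keyed_hash)


lines = ["pear", "apple", "pear", "fig", "apple", "plum"]

first = sort_by_keyed_hash(lines, b"bytes-from-source-A")
again = sort_by_keyed_hash(lines, b"bytes-from-source-A")
other = sort_by_keyed_hash(lines, b"bytes-from-source-B")

print(first == again)  # True: the same random bytes select the same hash
print(first)
print(other)  # typically a different order; equal lines still sort together
```

In this model the random bytes fix the hash function, so the order is reproducible for a given source, and a second pass with different bytes gives an independent ordering.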

There should be a note like this at the end of the man page:

 SEE ALSO
       The  full documentation for sort is maintained as a Texinfo manual.  If
       the info and sort programs are properly installed  at  your  site,  the
       command

              info sort

       should give you access to the complete manual.

#641166#15
Date:
2011-09-12 15:15:35 UTC
From:
To:
No; when I was younger, I sometimes looked at the info page for
commands, but invariably they seemed to be useless or copies of the
man page, and I wrote them off completely as a strange GNU waste of
time akin to Guile or other GNU quirks. If length is the problem, how
about adding '; not perfect shuffle'? 3 words in the places most
people will look for documentation.

#641166#20
Date:
2011-09-12 15:55:38 UTC
From:
To:
I can almost guarantee that would lead to a bug report asking for that
to be explained.

Mike Stone

#641166#25
Date:
2011-09-12 16:06:40 UTC
From:
To:
Gwern Branwen wrote:

Please try to reset your misconception, at least for the coreutils.
Though, in general the info documentation for GNU programs is far
superior to the man pages.

Sorry, but that would be inaccurate, because sometimes (no duplicates),
it does give you a perfect shuffle.

#641166#30
Date:
2015-10-07 08:43:16 UTC
From:
To:
Hi Michael,

A conversation on #debconf-team this morning which mentioned the use of
sort -R revealed that people that have been bitten by the quirks of
sort -R have a folkloric understanding that there's something not quite
right about it compared with what they want, but that even people that
understand that are not necessarily aware of shuf.

Since shuf does the thing that people are most often wanting, how about
just adding a note to the -R option to say something like:

  you probably want shuf(1) instead

As a data point, I've been aware that GNU commands are documented in
info for over two decades, but having uselessly invoked info only to be
looking at a strangely formatted version of the man page again (because
the real info has been kicked into non-free due to GFDL problems) I
don't think it would occur to me to consult the info if looking at the
sort man page.  Even if I did, I might well miss the importance of the note
reference to shuf.

Given that there is not even a mention of info in the man page, the fact
that the documentation in info is better than in man seems to be a poor
reason not to improve the documentation that most people consult.

Cheers, Phil.

#641166#35
Date:
2015-10-19 16:20:50 UTC
From:
To: