#1029703 bsdextrautils: column -l (--table-columns-limit) mangles data with two spaces

Package:
bsdextrautils
Source:
bsdextrautils
Description:
extra utilities from 4.4BSD-Lite
Submitter:
Date:
2023-08-09 13:39:10 UTC
Severity:
normal
Tags:
#1029703#5
Date:
2023-01-26 12:43:40 UTC
From:
To:
Dear Maintainer,

The -l (--table-columns-limit) option to the "column" utility does not
work correctly when the input contains two or more consecutive spaces.
The option is supposed to set a maximum number of columns, with the
last column holding all of the remaining data on the line.

An example of how it is supposed to work can be seen when the input is
delimited by a single space. For example:

    $ printf '1 2 3 4 5 6\nOne Two Three Four Five Six\n' \
	    | column -t -l5

    1    2    3      4     5 6
    One  Two  Three  Four  Five Six

Note how column 5 is the maximum column, so the data for column six is
simply appended. That is the correct behavior.

However, the problem can be easily triggered by simply piping the
output from column back into itself. This should be a no-op, but
instead mangles the data:

    $ printf '1 2 3 4 5 6\nOne Two Three Four Five Six\n' \
	    | column -t -l5 \
	    | column -t -l5

    1    2    3      4       3
    One  Two  Three  Four  ur

As you can see, the fifth column has been overwritten by data from
previous columns. (Perhaps a pointer problem?)

Any input containing runs of two or more spaces can trigger the bug.
For example, the output from 'ls -l':

    $ ls -lh | column -t -l7
    total       500K
    drwxr-xr-x  2     ben  ben    4.0K  Jan  an
    -rwxr-xr-x  1     ben  ben    2.7K  Jul  ul
    drwxr-xr-x  5     ben  ben    4.0K  Dec  ec
    -rw-r--r--  1     ben  ben    116K  Nov  ov
    -rw-r--r--  1     ben  ben    31K   Nov  Nov
    drwxr-xr-x  2     ben  ben    4.0K  Mar  ar
    -rw-r--r--  1     ben  ben    225   Oct  Oct
    drwxr-xr-x  2     ben  ben    12K   Jan  Jan
    drwxr-xr-x  12    ben  ben    260K  Jan  n



	    *    *    *    *    *

This may be irrelevant, but I noticed in the source that there is some
code which seems suspicious at lines 459 and 470:

   457		if (ctl->maxncols && n + 1 == ctl->maxncols) {
   458			if (nchars + skip < len)
-> 459				wcdata = wcs0 + (nchars + skip);
   460			else
   461				wcdata = NULL;
   462		} else {
   463			wcdata = local_wcstok(ctl, wcs, &sv);
   464
   465			/* For the default separator ('greedy' mode) it uses
   466			 * strtok() and it skips leading white chars. In this
   467			 * case we need to remember size of the ignored white
   468			 * chars due to wcdata calculation in maxncols case */
   469			if (wcdata && ctl->greedy
-> 470			    && n == 0 && nchars == 0 && wcdata > wcs)
   471				skip = wcdata - wcs;
   472		}

At line 459, pointer arithmetic is used to find the start of the last
column within the string. However, the result falls short of where that
column actually begins, perhaps because skip is always zero. In my
experiments, the test at lines 469-470 always failed, so `skip` was
never changed.
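
The offset arithmetic can be modelled in isolation. The sketch below is
not the util-linux code: it uses strtok_r() on plain char strings
instead of local_wcstok(), and the function names are hypothetical. It
also assumes that nchars grows by "token length + 1" for each column,
which the excerpt above does not show. Under that assumption every run
of separators is counted as a single character, so the computed offset
lands too early. The second function shows one possible alternative,
not necessarily the right fix for column itself.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEPARATORS " \t"

/*
 * Simplified model of the suspicious calculation. ASSUMPTION: nchars
 * is advanced by "token length + 1" per column, i.e. exactly one
 * separator character is counted after each token. The line is
 * modified in place, as strtok_r() always does.
 */
static char *naive_last_column(char *line, size_t maxncols)
{
	size_t len = strlen(line), nchars = 0;
	char *save = NULL, *src = line;

	assert(maxncols > 0);
	for (size_t n = 0; n + 1 < maxncols; n++) {
		char *tok = strtok_r(src, SEPARATORS, &save);

		if (!tok)
			return NULL;	/* fewer columns than the limit */
		nchars += strlen(tok) + 1;
		src = NULL;
	}
	return nchars < len ? line + nchars : NULL;
}

/*
 * One possible alternative: continue from where the tokenizer stopped
 * and skip whatever separators remain, however many there are.
 */
static char *skipping_last_column(char *line, size_t maxncols)
{
	char *save = NULL, *src = line;

	assert(maxncols > 0);
	for (size_t n = 0; n + 1 < maxncols; n++) {
		if (!strtok_r(src, SEPARATORS, &save))
			return NULL;	/* fewer columns than the limit */
		src = NULL;
	}
	if (!save)		/* limit of 1: nothing was tokenized */
		save = line;
	save += strspn(save, SEPARATORS);
	return *save ? save : NULL;
}
```

With a limit of 5, naive_last_column() returns "  3" for the line
"1    2    3      4     5 6" and "ur" for
"One  Two  Three  Four  Five Six", which is consistent with the mangled
output shown earlier. For the single-spaced input it returns "5 6", as
expected. skipping_last_column() returns "5 6" and "Five Six" for the
two padded lines.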

The reference to wide characters made me wonder if that was the issue,
but neither export LANG=C nor recompiling with HAVE_WIDECHAR=0 helped.