#848578 /usr/bin/ts: charset issue with ts and localization

Package:
moreutils
Source:
moreutils
Description:
additional Unix utilities
Submitter:
Yves-Alexis Perez
Date:
2024-12-09 17:39:01 UTC
Severity:
normal
#848578#5
Date:
2016-12-18 15:01:39 UTC
From:
To:
Hi,

it seems that ts has some charset issues. My locales are:

LANG=fr_FR.utf8
LC_TIME="fr_FR.utf8"
LC_MESSAGES=en_US.UTF-8
LC_ALL=


When running ts, I have:

echo toto |ts
d�c. 18 15:57:51 toto

Notice the wrong encoding (should be déc). ts apparently outputs the
single byte 0xe9 (é in Latin-1) instead of the UTF-8 sequence 0xc3 0xa9.
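
For reference, the two encodings of 'é' can be compared directly in a
shell. This is a minimal sketch using only printf, od and tr; the helper
name hex_bytes is made up for the illustration and is not part of
moreutils.

```shell
# Print the bytes read from stdin as one lower-case hex string.
# (hex_bytes is a hypothetical helper, not part of moreutils.)
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# 'é' as a single ISO-8859-1 (Latin-1) byte.
latin1_e=$(printf '\351' | hex_bytes)

# 'é' as the two-byte UTF-8 sequence.
utf8_e=$(printf '\303\251' | hex_bytes)

echo "latin1: $latin1_e  utf8: $utf8_e"
```

Piping the output of ts through the same helper (for example
'echo toto | ts | hex_bytes', assuming ts is installed) should show
which of the two forms is actually emitted.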

#848578#10
Date:
2016-12-23 04:36:51 UTC
From:
To:
Dear Yves-Alexis,

I tried to reproduce the issue but was not able to get it.  Can you
please try to help me?

 * I enabled 'fr_FR.UTF-8' and 'en_US.UTF-8' in /etc/locale.gen and ran
   'dpkg-reconfigure -p high locales'.
 * I reset my locale-related environment variables in this order:

	export LC_ALL= LANG=

   then I start a new terminal with these settings, so that 'locale' in
   the new terminal shows 'POSIX' for all LC_* variables. Then I issue

	export LC_ALL=
	export LANG=fr_FR.utf8
	export LC_TIME="fr_FR.utf8"
	export LC_MESSAGES=en_US.UTF-8

   and the 'locale' output is almost as expected:

	LANG=fr_FR.utf8
	LANGUAGE=
	LC_CTYPE="fr_FR.utf8"
	LC_NUMERIC="fr_FR.utf8"
	LC_TIME=fr_FR.utf8
	LC_COLLATE="fr_FR.utf8"
	LC_MONETARY="fr_FR.utf8"
	LC_MESSAGES=en_US.UTF-8
	LC_PAPER="fr_FR.utf8"
	LC_NAME="fr_FR.utf8"
	LC_ADDRESS="fr_FR.utf8"
	LC_TELEPHONE="fr_FR.utf8"
	LC_MEASUREMENT="fr_FR.utf8"
	LC_IDENTIFICATION="fr_FR.utf8"
	LC_ALL=

 * Running 'date', it seems to show the behaviour you described:

	vendredi 23 décembre 2016, 05:25:47 (UTC+0100)

   but with 'ts' it seems to be correct to me:

	$ echo bicycle repair man | ts
	déc. 23 05:26:07 bicycle repair man

An interesting point to me: when I am starting from my usual locale
settings (LANG=de_DE.UTF-8, LC_ALL=nb_NO.UTF-8) and then set the
specific values as written above, I get the same 'locale' output but

	$ date | ts
	déc. 23 05:32:11 vendredi 23 décembre 2016, 05:32:11 (UTC+0100)

seems to be ok.

So, since 'ts' always shows me the correct output, I am not yet able to
reproduce your issue; but am confused about 'date' behaving as you
wrote about 'ts'...

Kind regards,
Nicolas

#848578#20
Date:
2022-08-06 17:37:22 UTC
From:
To:
I can also reproduce this problem with `ts` while `date` works fine:

  $ date | ts
  ao� 06 10:33:36 sam 06 aoû 2022 10:33:36 PDT

  $ date
  sam 06 aoû 2022 10:35:39 PDT

  $ echo test | ts
  ao� 06 10:36:04 test

This is what my locale is set to:

  $ locale
  LANG=fr_CA.utf8
  LANGUAGE=
  LC_CTYPE="fr_CA.utf8"
  LC_NUMERIC="fr_CA.utf8"
  LC_TIME="fr_CA.utf8"
  LC_COLLATE="fr_CA.utf8"
  LC_MONETARY="fr_CA.utf8"
  LC_MESSAGES="fr_CA.utf8"
  LC_PAPER="fr_CA.utf8"
  LC_NAME="fr_CA.utf8"
  LC_ADDRESS="fr_CA.utf8"
  LC_TELEPHONE="fr_CA.utf8"
  LC_MEASUREMENT="fr_CA.utf8"
  LC_IDENTIFICATION="fr_CA.utf8"
  LC_ALL=

  $ cat /etc/locale.gen | grep -v '^#'
  en_CA.UTF-8 UTF-8
  en_NZ.UTF-8 UTF-8
  fr_CA.UTF-8 UTF-8
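
A note for anyone retracing this setup: /etc/locale.gen spells the
codeset as UTF-8 while 'locale' prints utf8. On glibc systems
'locale -a' typically lists the lower-case, hyphen-free spelling, so a
check for whether a locale has been generated needs to normalise the
name first. The sketch below relies on that assumption; normalize_locale
and has_locale are hypothetical helper names, not existing tools.

```shell
# Rewrite a locale name the way glibc's `locale -a` usually prints it:
# lower-case the codeset and drop its hyphens (fr_CA.UTF-8 -> fr_CA.utf8).
# Names without a codeset (C, POSIX, fr_CA) are returned unchanged.
normalize_locale() {
    name=$1
    case $name in
        *.*)
            base=${name%%.*}
            codeset=$(printf '%s' "${name#*.}" | tr 'A-Z' 'a-z' | tr -d '-')
            printf '%s.%s\n' "$base" "$codeset"
            ;;
        *)
            printf '%s\n' "$name"
            ;;
    esac
}

# Succeed if the given locale appears in the output of `locale -a`.
# Assumes the glibc naming convention described above.
has_locale() {
    locale -a 2>/dev/null | grep -qxF "$(normalize_locale "$1")"
}

normalize_locale fr_CA.UTF-8
```

With these helpers, 'has_locale fr_CA.UTF-8 && echo generated' would be
expected to succeed on a system configured as shown above.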

Let me know if there's anything else I can provide to help reproduce the
problem.

Francois

#848578#25
Date:
2022-09-07 19:42:31 UTC
From:
To:
Thanks for the details, finally I can reproduce this now.  When adding
three Unicode-related lines to 'ts', it _seems_ to be fixed for me, but
my Perl experience is rather rusty and I'd like to get some feedback
from volunteer testers before forwarding the patch upstream.

Francois, might you be able to patch your ts with the attached patch
and re-check?  As August has gone, I used

    echo test | faketime "2022-08-01" ./ts

for testing with your specified locale settings.

Kind regards,
Nicolas

#848578#30
Date:
2022-09-08 02:23:28 UTC
From:
To:
I've patched my /usr/bin/ts as you indicated and the above works well now:

  $ echo test | faketime "2022-01-04" ts
  jan 04 00:00:00 test
  $ echo test | faketime "2022-02-04" ts
  fév 04 00:00:00 test
  $ echo test | faketime "2022-07-04" ts
  jui 04 00:00:00 test
  $ echo test | faketime "2022-08-04" ts
  aoû 04 00:00:00 test

Also, faketime is really handy!

Thanks.

Francois

#848578#35
Date:
2022-10-07 18:14:29 UTC
From:
To:
Enable UTF-8 compatible processing of input and output so that e.g.
timestamps containing non-ASCII letters are printed correctly (cf. [1]).

[1]: https://bugs.debian.org/848578

Signed-off-by: Nicolas Schier <nicolas@fjasle.eu>
---
 ts | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/ts b/ts
index af23cf7..fbd5b1a 100755
--- a/ts
+++ b/ts
@@ -54,6 +54,11 @@ use strict;
 use POSIX q{strftime};
 no warnings 'utf8';

+# Ensure that text read or printed are converted from/to UTF-8.
+binmode STDIN, ':utf8';
+binmode STDOUT, ':utf8';
+binmode STDERR, ':utf8';
+
 $|=1;

 my $rel=0;

#848578#40
Date:
2022-12-07 20:13:25 UTC
From:
To:
Dear Joey,

Is there a chance that you will still apply such patches?  Following
your call for adoption, is there already a new maintainer for
moreutils?

Kind regards,
Nicolas

#848578#45
Date:
2023-01-02 16:57:32 UTC
From:
To:
Nicolas Schier wrote:

I'm still maintaining moreutils until I find someone else.

I am not considering new additions of tools to it any longer.

#848578#50
Date:
2023-01-02 16:59:52 UTC
From:
To:
Nicolas Schier wrote:

What if the input is not valid utf8? What if the user's locale is not
utf8 and the timestamp contains a character that is not utf8?

#848578#55
Date:
2023-03-02 09:39:45 UTC
From:
To:
I also sent this already directly to Joey, but later found this bug
report. My take is this:

--- /usr/bin/ts 2019-02-20 22:03:31.000000000 +0100
+++ ./ts 2023-03-01 21:06:41.177886024 +0100
@@ -53,6 +53,7 @@
 use strict;
 use POSIX q{strftime};
 no warnings 'utf8';
+use open q{:locale};

 $|=1;

This should ensure that the I/O is treated with the encoding indicated
by the locale, be it UTF-8 or something else.

Alrighty then,

Thomas
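
To see which encoding a given locale indicates, the codeset can be
queried with 'locale charmap'. The following is a small sketch, not part
of the patch; charset_for is a hypothetical helper name, and the
fallback string "unknown" is an arbitrary choice for systems without the
locale utility.

```shell
# Print the codeset (character encoding) that locale $1 indicates.
# Prints "unknown" if the locale utility is missing or fails.
# Assumption: if the named locale has not been generated, glibc
# typically falls back to the C locale and reports its codeset instead.
charset_for() {
    LC_ALL=$1 locale charmap 2>/dev/null || echo unknown
}

echo "C:          $(charset_for C)"
echo "fr_FR.utf8: $(charset_for fr_FR.utf8)"
```

On a system where fr_FR.utf8 has been generated, the second line would
be expected to report UTF-8.
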
#848578#60
Date:
2023-03-05 20:56:42 UTC
From:
To:
Thanks a lot!  I hope Joey will pick this up soon.

Kind regards,
Nicolas

#848578#65
Date:
2023-10-13 16:21:04 UTC
From:
To:
Enable perl locale support to ensure that I/O is treated with the
encoding indicated by the locale, be it UTF-8 or something else.

Link: https://perldoc.perl.org/perllocale#Unicode-and-UTF-8
Patch-by: Dr. Thomas Orgis <thomas.orgis@uni-hamburg.de>
Signed-off-by: Nicolas Schier <nicolas@fjasle.eu>
---
 ts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ts b/ts
index af23cf7..71b0fbc 100755
--- a/ts
+++ b/ts
@@ -53,6 +53,7 @@ use warnings;
 use strict;
 use POSIX q{strftime};
 no warnings 'utf8';
+use open q{:locale};

 $|=1;

#848578#70
Date:
2024-12-09 17:31:20 UTC
From:
To:
What if ts is run on a file that is not encoded the same as the current
locale? I think this would probably make it crash, or possibly output a
corrupted version of the file?
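
One way to probe this question would be to prepare input that is
deliberately not valid in the locale's encoding and confirm that fact
before feeding it to ts. The sketch below uses the common idiom of
converting a stream to its own encoding with iconv, which exits non-zero
on an illegal input sequence; is_valid_in is a hypothetical helper name.

```shell
# Succeed if stdin is valid text in encoding $1 (default: UTF-8).
# Relies on iconv exiting non-zero when it meets an illegal sequence.
is_valid_in() {
    enc=${1:-UTF-8}
    iconv -f "$enc" -t "$enc" >/dev/null 2>&1
}

# "déc" encoded as UTF-8: expected to pass the check.
printf 'd\303\251c\n' | is_valid_in UTF-8 && echo "UTF-8 sample: valid"

# "déc" encoded as Latin-1: the lone 0xe9 byte is not valid UTF-8.
printf 'd\351c\n' | is_valid_in UTF-8 || echo "Latin-1 sample: not valid UTF-8"
```

The second sample is the kind of line that could be piped into a patched
ts under a UTF-8 locale, with the result inspected in a hex dump, to see
whether it aborts, passes the byte through, or alters it. The outcome is
not asserted here.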