Hi,

it seems that ts has some charset issues. My locales are:

    LANG=fr_FR.utf8
    LC_TIME="fr_FR.utf8"
    LC_MESSAGES=en_US.UTF-8
    LC_ALL=

When running ts, I get:

    $ echo toto | ts
    d�c. 18 15:57:51 toto

Notice the wrong encoding (it should be "déc."). ts apparently outputs the single byte 0xe9, the Latin-1 encoding of é, instead of the UTF-8 encoding of é (0xc3 0xa9).
Dear Yves-Alexis,

I am not able to reproduce it. Can you please help me?

 * I enabled 'fr_FR.UTF-8' and 'en_US.UTF-8' in /etc/locale.gen and ran 'dpkg-reconfigure -p high locales'.

 * I reset my locale-related environment variables in this order:

       export LC_ALL= LANG=

   then started a new terminal with these settings, so 'locale' in the new terminal shows 'POSIX' for all LC_* variables. Then I issued

       export LC_ALL=
       export LANG=fr_FR.utf8
       export LC_TIME="fr_FR.utf8"
       export LC_MESSAGES=en_US.UTF-8

   and the 'locale' output is almost as expected:

       LANG=fr_FR.utf8
       LANGUAGE=
       LC_CTYPE="fr_FR.utf8"
       LC_NUMERIC="fr_FR.utf8"
       LC_TIME=fr_FR.utf8
       LC_COLLATE="fr_FR.utf8"
       LC_MONETARY="fr_FR.utf8"
       LC_MESSAGES=en_US.UTF-8
       LC_PAPER="fr_FR.utf8"
       LC_NAME="fr_FR.utf8"
       LC_ADDRESS="fr_FR.utf8"
       LC_TELEPHONE="fr_FR.utf8"
       LC_MEASUREMENT="fr_FR.utf8"
       LC_IDENTIFICATION="fr_FR.utf8"
       LC_ALL=

 * Running 'date', it seems to show the behaviour you described:

       vendredi 23 décembre 2016, 05:25:47 (UTC+0100)

   but the output of 'ts' seems correct to me:

       $ echo bicycle repair man | ts
       déc. 23 05:26:07 bicycle repair man

An interesting point: when I start from my usual locale settings (LANG=de_DE.UTF-8, LC_ALL=nb_NO.UTF.8) and then set the specific values as written above, I get the same 'locale' output, and

    $ date | ts
    déc. 23 05:32:11 vendredi 23 décembre 2016, 05:32:11 (UTC+0100)

seems to be OK. So, since 'ts' always shows me the correct output, I am not yet able to reproduce your issue, but I am confused about 'date' behaving as you described for 'ts'...

Kind regards,
Nicolas
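Which value applies to a category follows a fixed precedence. POSIX specifies that a non-empty LC_ALL overrides all LC_* variables, and those in turn override LANG. In glibc's `locale` output, quoted values are the ones derived this way rather than set explicitly. That is why, among the LC_* lines in the listing, only LC_TIME and LC_MESSAGES appear unquoted. The sketch below illustrates that precedence. `effective_locale` is a hypothetical helper. Unlike the real `locale` command, it does not check whether the chosen locale is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the locale that applies to one category
# (e.g. LC_TIME), using the POSIX precedence LC_ALL > LC_<category> > LANG.
# It does not check whether that locale is actually installed.
effective_locale() {
    category=$1
    # Read the variable named by $category (only call this with trusted names).
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' POSIX   # nothing set: the POSIX/C locale applies
    fi
}

# The test settings from above, in a subshell so nothing leaks out.
(
    unset LC_CTYPE
    LC_ALL= LANG=fr_FR.utf8 LC_TIME=fr_FR.utf8 LC_MESSAGES=en_US.UTF-8
    effective_locale LC_TIME      # prints fr_FR.utf8
    effective_locale LC_MESSAGES  # prints en_US.UTF-8
    effective_locale LC_CTYPE     # prints fr_FR.utf8, inherited from LANG
)
```

If LC_ALL were still set to nb_NO.UTF-8, every call would print that value. Clearing LC_ALL first is what lets the individual LC_TIME and LC_MESSAGES settings take effect.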
I can also reproduce this problem with `ts` while `date` works fine:

    $ date | ts
    ao� 06 10:33:36 sam 06 aoû 2022 10:33:36 PDT
    $ date
    sam 06 aoû 2022 10:35:39 PDT
    $ echo test | ts
    ao� 06 10:36:04 test

This is what my locale is set to:

    $ locale
    LANG=fr_CA.utf8
    LANGUAGE=
    LC_CTYPE="fr_CA.utf8"
    LC_NUMERIC="fr_CA.utf8"
    LC_TIME="fr_CA.utf8"
    LC_COLLATE="fr_CA.utf8"
    LC_MONETARY="fr_CA.utf8"
    LC_MESSAGES="fr_CA.utf8"
    LC_PAPER="fr_CA.utf8"
    LC_NAME="fr_CA.utf8"
    LC_ADDRESS="fr_CA.utf8"
    LC_TELEPHONE="fr_CA.utf8"
    LC_MEASUREMENT="fr_CA.utf8"
    LC_IDENTIFICATION="fr_CA.utf8"
    LC_ALL=

    $ cat /etc/locale.gen | grep -v '^#'
    en_CA.UTF-8 UTF-8
    en_NZ.UTF-8 UTF-8
    fr_CA.UTF-8 UTF-8

Let me know if there's anything else I can provide to help reproduce the problem.

Francois
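The transcript shows that `date` emits valid UTF-8 while `ts` does not. One way to check this mechanically is to convert the output from UTF-8 to UTF-8 with `iconv`, which fails on any invalid byte sequence. This sketch assumes GNU `iconv` is available. `is_valid_utf8` is a hypothetical helper. The sample lines stand in for real `ts` and `date` output, assuming the garbled character is û written as its Latin-1 byte.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin is valid UTF-8, fail otherwise.
# iconv exits non-zero when it meets an invalid sequence in its input.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "aoû" as ts printed it: û as the single Latin-1 byte 0xfb (octal 373).
if printf 'ao\373 06 10:36:04 test\n' | is_valid_utf8; then
    echo "latin1 sample: valid"
else
    echo "latin1 sample: invalid"   # expected branch
fi

# "aoû" as date printed it: û as the UTF-8 sequence 0xc3 0xbb (octal 303 273).
if printf 'ao\303\273 06 10:36:04 test\n' | is_valid_utf8; then
    echo "utf8 sample: valid"       # expected branch
else
    echo "utf8 sample: invalid"
fi
```

Against the real tool, the same check would read `echo test | ts | is_valid_utf8 || echo "ts output is not valid UTF-8"`.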
Thanks for the details; I can finally reproduce this now. After adding
three Unicode-related lines to 'ts' it _seems_ to me to be fixed, but
my Perl experience is quite rusty and I'd like to get some feedback
from volunteer testers before forwarding the patch upstream.
Francois, might you be able to patch your ts with the attached patch
and re-check? As August is over, I used
    echo test | faketime "2022-08-01" ./ts
for testing with your locale settings.
Kind regards,
Nicolas
I've patched my /usr/bin/ts as you indicated and the above works well now:

    $ echo test | faketime "2022-01-04" ts
    jan 04 00:00:00 test
    $ echo test | faketime "2022-02-04" ts
    fév 04 00:00:00 test
    $ echo test | faketime "2022-07-04" ts
    jui 04 00:00:00 test
    $ echo test | faketime "2022-08-04" ts
    aoû 04 00:00:00 test

Also, faketime is really handy! Thanks.

Francois
Enable UTF-8 compatible processing of input and output to correctly output e.g.
timestamps containing non-ASCII letters (cf. [1]).
[1]: https://bugs.debian.org/848578
Signed-off-by: Nicolas Schier <nicolas@fjasle.eu>
---
ts | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/ts b/ts
index af23cf7..fbd5b1a 100755
--- a/ts
+++ b/ts
@@ -54,6 +54,11 @@ use strict;
use POSIX q{strftime};
no warnings 'utf8';
+# Ensure that text read or printed is converted from/to UTF-8.
+binmode STDIN, ':utf8';
+binmode STDOUT, ':utf8';
+binmode STDERR, ':utf8';
+
$|=1;
my $rel=0;
Dear Joey,

is there a chance that you will still apply such patches? After your call for adoption: is there already a new maintainer for moreutils?

Kind regards,
Nicolas
Nicolas Schier wrote:

I'm still maintaining moreutils until I find someone else. I am no longer considering adding new tools to it.
Nicolas Schier wrote:

What if the input is not valid UTF-8? What if the user's locale is not UTF-8 and the timestamp contains a character that is not UTF-8?
I also sent this directly to Joey already, but later found this bug report. My take is this:

    --- /usr/bin/ts 2019-02-20 22:03:31.000000000 +0100
    +++ ./ts 2023-03-01 21:06:41.177886024 +0100
    @@ -53,6 +53,7 @@
     use strict;
     use POSIX q{strftime};
     no warnings 'utf8';
    +use open q{:locale};
     $|=1;

This should ensure that I/O is treated with the encoding indicated by the locale, be it UTF-8 or something else.

Alrighty then,
Thomas
Thanks a lot! I hope Joey will pick this up soon.

Kind regards,
Nicolas
Set Perl's default I/O layers from the locale to ensure that I/O is treated
with the encoding indicated by the locale, be it UTF-8 or something else.
Link: https://perldoc.perl.org/perllocale#Unicode-and-UTF-8
Patch-by: Dr. Thomas Orgis <thomas.orgis@uni-hamburg.de>
Signed-off-by: Nicolas Schier <nicolas@fjasle.eu>
---
ts | 1 +
1 file changed, 1 insertion(+)
diff --git a/ts b/ts
index af23cf7..71b0fbc 100755
--- a/ts
+++ b/ts
@@ -53,6 +53,7 @@ use warnings;
use strict;
use POSIX q{strftime};
no warnings 'utf8';
+use open q{:locale};
$|=1;
What if ts is run on a file that is not encoded the same as the current locale? I think this would probably make it crash, or possibly output a corrupted version of the file?