#1012620 bash: UTF-8 file content is corrupted when read into variable and output to another file

Package:
bash
Source:
bash
Description:
GNU Bourne Again SHell
Submitter:
Vladimir Vinogradov
Date:
2022-06-11 06:45:03 UTC
Severity:
important
#1012620#5
Date:
2022-06-10 12:31:59 UTC
From:
To:
Dear Maintainer,

The error was detected by a script that generates Kubernetes Secret
objects. As a safety check, the script compared the text saved to a file
against the original content of the bash variable: it read the contents of
the file into another variable and compared that value with the original
value.
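
A minimal sketch of such a write-and-verify check, for illustration only. The helper name write_and_verify is hypothetical and not taken from the original script, and the sketch reads the file back with cat so that it also runs under plain POSIX sh; the script in this report uses the bash form "$( < file )" instead.

```shell
# Hypothetical helper (not from the report): write a value to a file,
# read it back into a second variable, and compare the two.
# The body runs in a subshell, so its variables do not leak to the caller.
write_and_verify() (
    value=$1
    path=$2
    printf '%s' "$value" > "$path" || exit 1
    # Command substitution strips trailing newlines, so the comparison is
    # only meaningful for values that do not end in a newline.
    readback=$(cat "$path")
    [ "$readback" = "$value" ]
)
```

The function returns 0 when the value read back equals the original, and non-zero when the write fails or the values differ.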

The error is reproducible with LANG="en_US.UTF-8" in the Google Cloud
Shell container and in an AWS EC2 virtual machine created from a Debian 11
AMI (AMI ID: ami-0fc5afe6a259e0ed4). With LANG=C, the error disappears.
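
One way to compare the two locale settings side by side is sketched below. The helper name run_with_lang and the script name ./repro.sh are hypothetical. The sketch assumes that clearing LC_ALL and LC_CTYPE is wanted, since both take precedence over LANG when they are set.

```shell
# Hypothetical helper: run a command with LANG forced to a given value.
# The body runs in a subshell, so the caller's environment is unchanged.
run_with_lang() (
    lang=$1
    shift
    # LC_ALL and LC_CTYPE override LANG, so clear them first.
    unset LC_ALL LC_CTYPE
    LANG=$lang
    export LANG
    "$@"
)

# Example usage (./repro.sh is a placeholder for the reproducer script):
#   run_with_lang en_US.UTF-8 bash ./repro.sh
#   run_with_lang C           bash ./repro.sh
```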

The frequency of the error depends on the specific content of the bash
script (or its size) and on some incidental circumstances of how the script
is run.

It should be noted that the cause of the error is very difficult to
diagnose, so the number of users actually affected by it is unknown.

Attached is an example script with specific data that intermittently
produces this error. Run the script several times (about 10) to observe
it:

   rm -f output1.txt output2.txt output3.txt
   MYUTF8=' ... ';
   echo -n "$MYUTF8" > "output1.txt"
   echo    "$( < output1.txt )" > output2.txt
   MYINPUT="$( < output1.txt )"
   echo    "$MYINPUT"           > output3.txt
   if ! diff output2.txt output3.txt ; then
      echo -e "\noutput2.txt and output3.txt files should be the same, but they are different!\n"
   fi
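
Because the failure is intermittent, repeated runs can be automated. The following is only a sketch: the helper name count_noisy_runs and the file name ./repro.sh are hypothetical, and it relies on the observation that the script above prints nothing when output2.txt and output3.txt are identical.

```shell
# Hypothetical helper: run a command N times and print how many runs
# produced any output on stdout or stderr.
count_noisy_runs() (
    runs=$1
    shift
    noisy=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A run that prints diff output or the warning counts as a failure.
        if [ -n "$("$@" 2>&1)" ]; then
            noisy=$((noisy + 1))
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$noisy"
)

# Example usage (./repro.sh is a placeholder for the script above):
#   count_noisy_runs 10 bash ./repro.sh
```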

Example of data corruption:

$ diff output2.txt output3.txt
11c11
< ввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввввв
---
> ввввввввввввввввввввввввввввввввввввввввввввввввввР�вввввввввввввввввввввввввввввввввввввв
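
To see which bytes actually differ, the two files can be compared at the byte level. This is a sketch using the standard cmp and od utilities; the helper names first_diff_offset and dump_bytes are hypothetical. For reference, an intact "в" (U+0432) is encoded in UTF-8 as the two bytes d0 b2.

```shell
# Hypothetical helper: print the 1-based offset of the first differing
# byte of two files, or nothing if no differing byte is found.
first_diff_offset() {
    cmp -l "$1" "$2" 2>/dev/null | awk 'NR == 1 { print $1 }'
}

# Hypothetical helper: hex-dump COUNT bytes of FILE starting at the
# 1-based OFFSET. Arguments: FILE OFFSET COUNT.
dump_bytes() {
    od -An -tx1 -j "$(( $2 - 1 ))" -N "$3" "$1"
}

# Example usage with the files from the script above:
#   off=$(first_diff_offset output2.txt output3.txt)
#   dump_bytes output2.txt "$off" 4
#   dump_bytes output3.txt "$off" 4
```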

Regards,
Vladimir Vinogradov

#1012620#10
Date:
2022-06-11 06:42:41 UTC
From:
To:
The bug stopped reproducing after installing bash 5.1-2+deb11u1 from
stable-proposed-updates (see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003012).

But one question is still open: why did the same script behave differently
with the same input data?

I think this bug report can be closed once the bash upstream maintainer
answers this question.

Regards,
Vladimir Vinogradov