#425040 id3v2: unicode support is broken and results in id3 tag data loss

Package:
libid3-3.8.3c2a
Source:
id3lib3.8.3
Submitter:
Marius Mikucionis
Date:
2026-06-17 04:21:02 UTC
Severity:
important
Tags:
#425040#5
Date:
2007-05-18 16:44:32 UTC
From:
To:
I have a bunch of Unicode letters in my tags. I ran the following:

find . -iname "*mp3" -exec id3v2 -C {} \;

All Unicode letters were converted into question marks,
and all id3 (including id3v1) info was lost.

Thank you very much.

#425040#10
Date:
2007-07-21 18:56:27 UTC
From:
To:
# Automatically generated email from bts, devscripts version 2.9.6
# I have no reason to assume 0.1.11-1 doesn't have this problem
found 425040 0.1.11-1
tags 425040 + sarge etch lenny sid

#425040#19
Date:
2008-03-17 00:56:15 UTC
From:
To:
reassign 425040 id3lib3.8.3
thanks

The conversion code in id3v2 is trivial:

      myTag.Link(argv[nIndex], ID3TT_ALL);
      luint nTags;
      nTags = myTag.Update(ID3TT_ID3V2);

All the real work is done by functions in id3lib.

Ben.

#425040#24
Date:
2008-03-17 01:27:43 UTC
From:
To:
The conversion process does *not* remove ID3v1 tags, so you may be able
to recover by deleting the ID3v2 tags (id3v2 -d).

What encoding was used in the ID3v1 tags?  ID3v1 does not have any flag
to indicate encoding and is normally assumed to use ISO 8859-1.  Text
with this encoding seems to be converted correctly.

Ben.
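
To illustrate the point about ID3v1 having no encoding flag, here is a
minimal Python sketch (not taken from id3lib or id3v2; the sample title is
made up) of what a reader that assumes ISO 8859-1 sees when the bytes were
actually written as UTF-8:

```python
# Hypothetical sample title; any text with non-ASCII letters behaves the same.
title = "Ąžuolas"

# A tagger that writes UTF-8 into an ID3v1 field stores these bytes.
raw = title.encode("utf-8")

# A reader that assumes ISO 8859-1 maps every byte to exactly one character,
# so each two-byte UTF-8 sequence shows up as two unrelated characters.
misread = raw.decode("iso-8859-1")

# Nothing is lost at this stage: undoing the wrong assumption gives back
# the original bytes and therefore the original text.
recovered = misread.encode("iso-8859-1").decode("utf-8")

print(repr(misread))
print(recovered)
```

The misreading is reversible only as long as the bytes themselves are left
alone; once a tool replaces characters it cannot represent, the original
text is gone.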

#425040#33
Date:
2008-03-17 15:47:52 UTC
From:
To:
2008/3/17, Ben Hutchings <ben@decadent.org.uk>:

No, somehow this does not recover the information. See the test below.
AmaroK, easytag and others detect and display it correctly without any
assistance.
If they use the LANG environment variable to guess the encoding, then it
must be UTF-8, perhaps with some kind of smart fallback to ISO8859-13 when
reading (if id3 v1 *really* lacks the encoding info).
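
The kind of "smart fallback" guessed at above could look roughly like the
following Python sketch. Whether AmaroK or easytag actually work this way is
not established in this report; the function name guess_decode and the
default fallback are assumptions made for illustration only.

```python
def guess_decode(raw: bytes, fallback: str = "iso8859_13") -> str:
    """Decode tag bytes as UTF-8 if they are valid UTF-8, else use a fallback."""
    # ID3v1 fields are fixed-width and padded with NUL bytes or spaces.
    raw = raw.rstrip(b"\x00 ")
    try:
        # Strict decoding: non-ASCII legacy 8-bit text is rarely valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy single-byte encoding.
        return raw.decode(fallback)


print(guess_decode("Žalia".encode("utf-8")))
print(guess_decode("Žalia".encode("iso8859_13")))
```

A heuristic like this only works while the original bytes are intact, which
would explain why it stops helping after the tag has been rewritten.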

I did the following test:
1) recorded blank mp3
2) added/edited the tag with amarok (amarok and easytag display it
correctly, id3v2 shows that only id3 v1 tag is present, and UTF-8
characters are broken and interpreted as ISO8859-1)
3) did the conversion with "id3v2 -C" (id3v2 shows that id3 v1 and v2
are present, all tools show broken UTF-8 characters)
4) stripped with "id3v2 -d" (id3v2 shows that only id3 v1 tag is
present, all tools show broken UTF-8)

So my conclusion is that the other tools somehow know what the
correct encoding is and interpret it correctly, but id3v2 overwrites this
information, effectively killing the method used by the other tools.
Interestingly, easytag offers to save some(?) tag information on
broken-by-id3v2 files, although I did not change anything. My blind
guess is that it found that the encoding information is missing and
wants to write something generic there, although the results do not
improve (for obvious reasons).

I've put the files from the test here (perhaps you can dig into them with hex dumps):
http://www.cs.aau.dk/~marius/id3v2

#425040#40
Date:
2008-06-14 18:51:13 UTC
From:
To:
severity 425040 normal
thanks

Hi Marius!

You wrote:

If so, then that's an additional feature of amarok, not a bug in libid3.
iso8859-1, not as utf8 [1].

Which makes sense, as the original id3v1 tags were broken
correctly (and thus shows the correct broken characters).

I tested it like this: make a blank mp3, and add a correct iso8859-1
id3v1 tag like this:

| id3v2 -a $(perl -C0 -e 'print "f".chr(245)."ob".chr(225)."r\n";') test.mp3

The perl here is to make sure correct iso8859-1 code is output in my
utf8 locale.  I then checked that the tags are correct by using

| id3v2 -l test.mp3|iconv -f latin1

which shows that a correct iso8859-1 tag was present.  I also tested
that vlc (which does not use libid3) shows this tag correctly.
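
For reference, the bytes the perl one-liner emits can be checked against the
ID3v1 layout with a short Python sketch. It builds the 128-byte tag in memory
instead of calling id3v2, and the helper names build_id3v1 and read_artist
are made up for this illustration.

```python
import struct

# ID3v1 layout: "TAG", title[30], artist[30], album[30], year[4],
# comment[30], genre[1]. That is 128 bytes with no encoding field at all.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")


def build_id3v1(artist: str) -> bytes:
    """Build a tag with only the artist set, encoded as ISO 8859-1."""
    encoded = artist.encode("iso-8859-1")
    # struct pads short fields with NUL bytes; genre 255 means "not set".
    return ID3V1.pack(b"TAG", b"", encoded, b"", b"", b"", 255)


def read_artist(tag: bytes) -> str:
    """Read the artist field back, assuming ISO 8859-1 as the standard says."""
    magic, _title, artist, *_rest = ID3V1.unpack(tag)
    if magic != b"TAG":
        raise ValueError("not an ID3v1 tag")
    return artist.rstrip(b"\x00 ").decode("iso-8859-1")


# Same code points as the perl one-liner: chr(245) and chr(225).
artist = "f" + chr(245) + "ob" + chr(225) + "r"
tag = build_id3v1(artist)
print(len(tag), tag[33:39], read_artist(tag))
```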

I then converted the tag to id3v2 and removed the id3v1:

| id3v2 -C test.mp3
| id3v2 -s test.mp3

And again checked that the tag was shown correctly in vlc.

Conclusion: libid3 seems to work perfectly, although it does not handle
non-iso8859-1 id3v1 tags (which are broken anyway[1]).

However, I think there is a bug in id3v2 because it does not translate
the data it reads from the id3 tags to the current locale, nor does it
interpret the user's input correctly as encoded in the correct locale.
I do not think that this warrants an RC bug, however.
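
A minimal Python sketch of the missing translation step, under the assumption
that the tag text is ISO 8859-1 and that the caller knows the terminal
encoding (the function name for_terminal is hypothetical, and this is not
code from id3v2):

```python
def for_terminal(tag_bytes: bytes, terminal_encoding: str) -> bytes:
    """Convert ISO 8859-1 tag text into the encoding the terminal expects."""
    text = tag_bytes.decode("iso-8859-1")
    # Characters the terminal encoding cannot represent become "?".
    return text.encode(terminal_encoding, errors="replace")


raw = b"f\xf5ob\xe1r"  # "foobar" with o-tilde and a-acute in ISO 8859-1
print(for_terminal(raw, "utf-8"))
print(for_terminal(raw, "ascii"))
```

In a real tool the second argument would come from the locale, for example
from locale.getpreferredencoding(), and user input would go through the
reverse conversion before being stored.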

Kind regards,
Bas.

[1] Wikipedia states: "ID3v1 also lacked support for
	internationalization. It is stated in the standard that all the
	strings must be encoded in ISO-8859-1."

#425040#47
Date:
2008-06-26 21:31:13 UTC
From:
To:
2008/6/14 Bas Zoetekouw <bas@debian.org>:

Yes, I also arrived at this conclusion eventually :-(

Regardless of what ends up being written, perhaps the user should at
least be warned when something other than ISO8859-1 is about to be
saved into id3v1?
I.e. the message should say something like
"id3v1 format cannot handle the metadata you are trying to save, other
applications may be confused about non-ASCII encoding [Abort]
[Ignore and proceed] [Leave id3v1 untouched] [Convert to something meaningful]"
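
The check behind such a warning could be sketched in Python as follows. This
is only an illustration of the idea, not a proposed patch; the function name
id3v1_problems and the message wording are made up.

```python
def id3v1_problems(text: str, width: int = 30) -> list:
    """Return reasons why text cannot be stored unchanged in an ID3v1 field."""
    problems = []
    try:
        encoded = text.encode("iso-8859-1")
    except UnicodeEncodeError as exc:
        # exc.start and exc.end delimit the first unencodable run.
        bad = text[exc.start:exc.end]
        problems.append(f"not representable in ISO 8859-1: {bad!r}")
        return problems
    if len(encoded) > width:
        problems.append(f"longer than {width} bytes, would be truncated")
    return problems


for candidate in ("foobar", "Žalia", "x" * 31):
    print(candidate, id3v1_problems(candidate))
```

An empty list means the text can be saved as is; anything else is what the
user would be asked about before the tag is written.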

Cheers,

#425040#80
Date:
2026-06-17 04:18:34 UTC
From:
To:
We believe that the bug you reported is fixed in the latest version of
id3lib3.8.3, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 213239@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Martin A. Godisch <godisch@debian.org> (supplier of updated id3lib3.8.3 package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Wed, 17 Jun 2026 06:02:08 +0200
Source: id3lib3.8.3
Architecture: source
Version: 3.8.3-21
Distribution: unstable
Urgency: low
Maintainer: Martin A. Godisch <godisch@debian.org>
Changed-By: Martin A. Godisch <godisch@debian.org>
Closes: 213239
Changes:
 id3lib3.8.3 (3.8.3-21) unstable; urgency=low
 .
   * Add 67-fix-utf8-text-encoding.patch: handle ID3TE_UTF8 as a single-byte
     encoding in the text-field read/write dispatch, the single-byte accessors
     and the fixed-size Clear() path (using ID3TE_IS_SINGLE_BYTE_ENC as the
     header recommends). UTF-8 text frames were routed through the UTF-16
     writer, which prepended a spurious byte-order mark and byte-swapped the
     data, producing corrupt frames that could not be read back. UTF-8 tags
     are now written and read as raw UTF-8 bytes. Closes: #213239.
   * Add 68-fix-utf16be-text-encoding.patch: handle ID3TE_UTF16BE as a
     double-byte encoding (per the ID3TE_IS_DOUBLE_BYTE_ENC guidance in
     globals.h). UTF-16BE text frames were not stored at all; they are now
     stored and serialised correctly (big-endian, without a byte-order mark)
     and read back into id3lib's host-order representation. UTF-16 (with BOM)
     and the single-byte paths are unchanged.
Checksums-Sha1:
 922219d01998511f1da202b67dda042b41b9e7f4 2169 id3lib3.8.3_3.8.3-21.dsc
 1a4e9c4bd698224eb9cdf31b1ff429a540edaba2 17076 id3lib3.8.3_3.8.3-21.debian.tar.xz
 42d58b33f9e5e55363550a5602a869383a30273c 7987 id3lib3.8.3_3.8.3-21_amd64.buildinfo
Checksums-Sha256:
 110a96428dca1de9ff4daf8c781779cb9fd1f600310b7b0fad005dcfe9129627 2169 id3lib3.8.3_3.8.3-21.dsc
 808b079b045ad01440729054c59260760e69e4e8772b56aa46c21e4b708b5a04 17076 id3lib3.8.3_3.8.3-21.debian.tar.xz
 0b843cd75f34fd0a90deb29969fa441cd7b498bd05559e3ad01b02199b631df1 7987 id3lib3.8.3_3.8.3-21_amd64.buildinfo
Files:
 af1acc92f38f06f684e4633616899822 2169 libs optional id3lib3.8.3_3.8.3-21.dsc
 c53c2cb4a44a8181d973d465d5d8507d 17076 libs optional id3lib3.8.3_3.8.3-21.debian.tar.xz
 9f4afadf21428d4f9c1648823da881e6 7987 libs optional id3lib3.8.3_3.8.3-21_amd64.buildinfo
-----BEGIN PGP SIGNATURE-----

iQJHBAEBCgAxFiEEGEIyO0/Pm5CZX6F/o1C5kfBaSFcFAmoyHSYTHGdvZGlzY2hA
ZGViaWFuLm9yZwAKCRCjULmR8FpIV92OD/9V+KL5yPgWfwcYcwjo/Ycapj5LYk5C
1dv5PfkZjjZXBwfokNZ4GhD1J4C0MSTbqCokevdZ+j0BLS7oosmg2xNF/OWMlTRh
W7G2unPXxO5m2sWROmqlVLy5Nc0Ga/DtX8N80d2Tmp0m//uJAyGgEO9KlAYRCRjC
Hri/WOM9SgU5gprUUZujZujijSVcEd6vWVXd9PtoBd64obD9WuzHAkZeVeJn4gmq
sM1nSwdDi5qXJXTYw1dSGVy8jhiaCw10jCbvhgDbntOFSZ5piUh83cPhIjxYT7pk
+033qWnC1YgWLiUzqwM2q7n31y3dlPGX8UAGAfEOR2SRsgPPpuSXzdMLkdqDuBAA
Qrf7shyFnBgpfKUtvs+gf8jR0ypX2uTK1enmXQofdUY+I3zRboRpXmGFUXYSmiia
mOuivT6CrXKm90Xlg3+ULF4iBnPaAI/mm3iVW0T+8ilP1mBGBzP29j+983XFPxX+
FY/A7MxWj5Om45t7Nr8Pdw8ihtnPOwFeX3/OhzKxUZQZc5WxYTEJtAc4a0ORstrg
CFF+BYD5OmZF/9cvD99rem8RpvnvXXj5169gn3osquVTAyULeqvf0ytC6SUbs+Am
2g6VxFLpO6AMI+YohIes7N3XswFUUNeHD98CslJA2VcrBMcIkyTYsAMnJ3vi9hJc
zZRUOgxC+PfE6w==
=SY1M
-----END PGP SIGNATURE-----
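
The changelog above distinguishes UTF-8, UTF-16 with a byte-order mark and
UTF-16BE without one. The Python sketch below shows how the payload of a text
frame differs between them, using the text encoding byte values from the
ID3v2.4 specification. It is an illustration only, not id3lib code: the
names ENCODINGS, text_payload and read_payload are made up, and the frame
header and string terminators are left out.

```python
# ID3v2.4 text encoding byte -> Python codec name.
ENCODINGS = {
    0: "iso-8859-1",  # single-byte
    1: "utf-16",      # double-byte, byte-order mark required
    2: "utf-16-be",   # double-byte, big-endian, no byte-order mark
    3: "utf-8",       # written as raw UTF-8 bytes
}


def text_payload(text: str, enc_id: int) -> bytes:
    """Encoding byte followed by the text in that encoding."""
    return bytes([enc_id]) + text.encode(ENCODINGS[enc_id])


def read_payload(payload: bytes) -> str:
    """Pick the decoder from the first byte and decode the rest."""
    return payload[1:].decode(ENCODINGS[payload[0]])


for enc_id in ENCODINGS:
    data = text_payload("fõobár", enc_id)
    print(enc_id, data.hex(), read_payload(data))
```

Pushing UTF-8 text through the UTF-16 path, as described in the first
changelog entry, would put a byte-order mark and double-byte data behind an
encoding byte that promises plain UTF-8, which a reader cannot decode.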