#527157 id3v2: written data depends on system locale, which can lead to corruption

Package:
id3v2
Source:
id3v2
Description:
command line id3v2 tag editor
Submitter:
Tino Keitel
Date:
2026-06-19 21:11:01 UTC
Severity:
normal
#527157#5
Date:
2009-05-05 21:47:18 UTC
When I set a tag, the text given on the command line is written to the
file unmodified. This corrupts the data whenever the system locale is not
Latin-1 (ISO 8859-1). My system uses a UTF-8 locale, so the following
happens:

$ id3v2 --TIT2 "test" 07\ -\ Parties\ in\ München.mp3

Now let's see what a reliable tool (eyeD3) shows:

$ eyeD3 07\ -\ Parties\ in\ München.mp3
[...]
ID3 v2.3:
title: test		artist: Superpunk

Now I set the real title of the track, which contains a German umlaut:

$ id3v2 --TIT2 "Parties in München" 07\ -\ Parties\ in\ München.mp3

And this is the eyeD3 output:

$ eyeD3 --debug 07\ -\ Parties\ in\ München.mp3
[snip]
eyeD3 trace> FrameSet: Reading Frame #6
eyeD3 trace> FrameHeader [start byte]: 91 (0x5B)
eyeD3 trace> FrameHeader [id]: TIT2 (0x54495432)
eyeD3 trace> FrameHeader [data size]: 20 (0x14)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 20 (0x14) bytes of data from byte pos 101 (0x65)
eyeD3 trace> FrameSet: 20 bytes of data read
eyeD3 trace> TextFrame encoding: latin_1
eyeD3 trace> TextFrame text: Parties in München
[snip]
ID3 v2.3:
title: Parties in München		artist: Superpunk

The UTF-8 bytes from the terminal were put into the ID3 tag unchanged, which
is wrong in two respects:

1. the frame's charset is declared as latin_1, but the data is UTF-8

2. ID3 v2.3 doesn't support UTF-8 at all, only ISO 8859-1 and UTF-16
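
The frame size in the trace above backs this up. A minimal Python sketch of the arithmetic follows. It assumes the usual ID3 v2.3 text frame layout of one encoding byte followed by the text, and the variable names are only illustrative:

```python
# Reproduces the byte arithmetic behind "data size: 20 (0x14)" in the trace.
title = "Parties in München"

utf8_bytes = title.encode("utf-8")      # what a UTF-8 terminal hands to id3v2
latin1_bytes = title.encode("latin-1")  # what a Latin-1 frame should contain

# Assumed layout: 1 encoding byte (0x00 = ISO 8859-1) + the text bytes.
ENCODING_LATIN1 = b"\x00"
frame_as_written = ENCODING_LATIN1 + utf8_bytes
frame_expected = ENCODING_LATIN1 + latin1_bytes

print(len(frame_as_written))  # 20, the size eyeD3 reports
print(len(frame_expected))    # 19, the size a correct Latin-1 frame would have

# A reader that trusts the latin_1 marker decodes every byte on its own,
# so the two-byte UTF-8 sequence for "ü" becomes two separate characters.
misread = utf8_bytes.decode("latin-1")
print(len(title), len(misread))  # 18 19
```

The extra byte is the second byte of the UTF-8 encoded "ü", so the 20-byte frame can only contain UTF-8 data behind a Latin-1 marker.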

Users who want correctly encoded data in their tags have to convert the
text to the correct encoding (Latin-1) themselves before passing it to
id3v2. They are out of luck entirely if the input contains characters that
are not present in the Latin-1 charset.
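
As an illustration of that manual conversion, here is a hedged Python sketch. `latin1_arg` and `build_argv` are hypothetical helpers, not part of id3v2 or eyeD3, and the sketch assumes an unpatched id3v2 that stores the argument bytes as-is:

```python
import os
from typing import List, Optional


def latin1_arg(text: str) -> Optional[bytes]:
    """Return text as ISO 8859-1 bytes, or None if it is not representable."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


def build_argv(title: str, path: str) -> Optional[List[bytes]]:
    """Hypothetical helper: id3v2 argv with the title pre-converted to Latin-1."""
    encoded = latin1_arg(title)
    if encoded is None:
        # No workaround: the raw-byte path can only carry Latin-1 text.
        return None
    return [b"id3v2", b"--TIT2", encoded, os.fsencode(path)]


print(build_argv("Parties in München", "track.mp3"))
print(build_argv("Dvořák", "track.mp3"))  # "ř" is outside Latin-1
```

On POSIX systems a byte-string argument list like this can be handed to `subprocess.run`, which bypasses the locale encoding of the terminal. The second call shows the dead end for titles outside Latin-1.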

Regards,
Tino

#527157#10
Date:
2013-04-22 02:27:49 UTC
fixed 527157 0.1.12-2.1
thanks

I believe this is essentially the "write" version of the "read" problem
in bug #559998, and both should be fixed in the patch recently NMU'd in
0.1.12-2.1.

Charset handling overall is still pretty poor, though. Since both the
id3v2 and id3lib upstreams have been dormant for years, I believe the
Debian maintainer (Stefan Ott) was planning to port id3v2 from id3lib to
taglib, which is maintained upstream and handles charsets in a sane way.
But I'm not sure what the status of the port is.

Best,
Mark

#527157#17
Date:
2026-06-19 21:04:06 UTC
Already fixed. The command-line text frames are written through id3lib's
charset-aware path (SetID3EncText), which interprets input in the system
locale and stores it as ISO-8859-1 when representable or UTF-16
otherwise, with the correct text-encoding marker. This is the write-side
counterpart of #559998 and was fixed by the charset-conversion patch
NMU'd in 0.1.12-2.1, so no raw locale bytes are stored mislabelled as
Latin-1 anymore. I'm closing this bug report.
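
For readers unfamiliar with the encoding choice described above, a rough Python sketch of the same decision follows. This is not the id3lib code or the patch itself. The function names are made up for illustration, and the marker bytes (0x00 for ISO-8859-1, 0x01 for UTF-16 with a byte order mark) are taken from the ID3 v2.3 text frame layout:

```python
def decode_argument(raw: bytes, locale_encoding: str) -> str:
    """Interpret the raw command-line bytes in the system locale's charset."""
    return raw.decode(locale_encoding)


def text_frame_body(text: str) -> bytes:
    """Illustrative only: pick ISO-8859-1 when possible, else UTF-16."""
    try:
        # Marker 0x00: ISO-8859-1, one byte per character.
        return b"\x00" + text.encode("latin-1")
    except UnicodeEncodeError:
        # Marker 0x01: UTF-16; Python's "utf-16" codec prepends the BOM.
        return b"\x01" + text.encode("utf-16")


raw = "Parties in München".encode("utf-8")  # bytes from a UTF-8 terminal
body = text_frame_body(decode_argument(raw, "utf-8"))
print(len(body))  # 19: one marker byte plus 18 Latin-1 bytes

wide = text_frame_body("Dvořák")  # "ř" forces the UTF-16 branch
print(wide[:1])  # b'\x01'
```

With this approach the title from the original report ends up as a 19-byte Latin-1 frame instead of the 20-byte mislabelled one.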

Martin