Now that my whole system is UTF-8 aware, I have re-tagged all my mp3s and oggs. The oggs are fine, since they use UTF-8 by default, but I have a problem with the mp3s: apparently no mp3 tagger for Linux has implemented setting the encoding flag of the text fields, so they are all marked as latin1. Of course, I didn't notice this until I read the specification :( So this is a wish for upstream to implement this feature, ideally both setting and getting the encoding of a particular frame (without recoding the frame), so that scripts for automagical tagging can be written.
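To illustrate the "getting the encoding of a frame without recoding it" part of the wish, here is a minimal sketch in Python. It is not part of id3v2 or id3lib; the function names are made up for illustration. It assumes an ID3v2.3 or ID3v2.4 tag at the start of the data, no extended header, and no compressed or grouped frames, so that the first byte of each text frame body is the encoding byte.

```python
import struct

# Text encoding byte values; 2 and 3 were only added in ID3v2.4.
ENCODING_NAMES = {0: "ISO-8859-1", 1: "UTF-16 with BOM", 2: "UTF-16BE", 3: "UTF-8"}


def syncsafe(b: bytes) -> int:
    """Decode a 4-byte syncsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def text_frame_encodings(data: bytes) -> dict:
    """Map text frame IDs (those starting with 'T') to their encoding byte.

    Hypothetical helper: only reads the encoding byte, never decodes
    or rewrites the frame text. Returns {} for anything it cannot parse.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return {}
    major = data[3]
    if major not in (3, 4):
        return {}
    if data[5] & 0x40:  # extended header present: not handled in this sketch
        return {}
    end = min(10 + syncsafe(data[6:10]), len(data))
    pos = 10
    result = {}
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":  # reached the padding
            break
        raw_size = data[pos + 4:pos + 8]
        # v2.4 frame sizes are syncsafe, v2.3 sizes are plain big-endian.
        size = syncsafe(raw_size) if major == 4 else struct.unpack(">I", raw_size)[0]
        body_start = pos + 10
        if size == 0 or body_start + size > end:
            break
        if frame_id[:1] == b"T":
            result[frame_id.decode("ascii", "replace")] = data[body_start]
        pos = body_start + size
    return result
```

A script could then call `text_frame_encodings(open(path, "rb").read())` and look the values up in `ENCODING_NAMES` to find files whose frames are still flagged as ISO-8859-1.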
The attached patch adds two new options to id3v2: '-u' sets the string encoding to UTF-16BE (id3v2 text encoding type 2), and '-U' sets the string encoding to UTF-8 (id3v2 text encoding type 3).

Note that with this patch alone the two new options won't work, as id3lib is quite broken in this regard. But at least it shouldn't break the existing behaviour, and in combination with an id3lib patch I'm currently preparing for submission it will allow real Unicode tags to be written.

A basically working preview version of the id3lib patch is at http://ranmachan.dyndns.org/~ranma/id3lib.unicode.20050708.patch

With it, id3lib treats the strings as being in the locale encoding by default and converts them to ISO-8859-1, UTF-16 or UTF-8, depending on which encoding id3v2 selects. The reading function is also patched and returns its output in the locale encoding.
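For reference, the encoding types mentioned above correspond to the first byte of a text frame body. The following Python sketch shows how the same string would be laid out for each type. It is only an illustration, not the code from either patch; the function name is made up, and the choice of little-endian with a BOM for type 1 is an assumption (the BOM permits either byte order).

```python
# Encoding byte -> (Python codec, prefix written before the encoded text).
# Types 0 and 1 exist in ID3v2.3; types 2 and 3 were added in ID3v2.4.
TEXT_ENCODINGS = {
    0: ("latin-1", b""),             # ISO-8859-1
    1: ("utf-16-le", b"\xff\xfe"),   # UTF-16 with BOM (little-endian chosen here)
    2: ("utf-16-be", b""),           # UTF-16BE, no BOM
    3: ("utf-8", b""),               # UTF-8
}


def encode_text_payload(text: str, encoding_id: int) -> bytes:
    """Return a text frame body: the encoding byte followed by the encoded text.

    Hypothetical helper for illustration. Raises UnicodeEncodeError if the
    text cannot be represented (e.g. non-Latin-1 characters with type 0).
    """
    codec, prefix = TEXT_ENCODINGS[encoding_id]
    return bytes([encoding_id]) + prefix + text.encode(codec)


if __name__ == "__main__":
    for enc_id in sorted(TEXT_ENCODINGS):
        print(enc_id, encode_text_payload("é", enc_id).hex())
```

Storing UTF-8 bytes while leaving the encoding byte at 0 is exactly the mismatch described in the original report: a reader that honours the flag will decode the bytes as ISO-8859-1.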
returns it in the locale encoding. An explicit encoding selector such as the proposed -U (UTF-8, text encoding type 3) is intentionally not added: UTF-8 and UTF-16BE are ID3v2.4 encodings, while id3v2/id3lib writes ID3v2.3.0. For explicit-encoding or ID3v2.4 needs, use mid3v2(1).

I'm closing this bug report.

Martin