Dear Lintian Maintainers,

Lintian skips and does not check source packages if arguments contain
non-Latin letters in filenames. And Lintian hangs indefinitely if it prints no
other tags (from the remaining arguments). The bug only appears with .dsc
files; .deb packages seem unaffected.

Note that file paths in Linux are treated as arbitrary byte sequences and
should not be encoded to UTF-8 nor decoded.

$ lintian -L '>=classification' тест/hello_2.12.3-1.dsc
Skipping ÑеÑÑ/hello_2.12.3-1.dsc: ÃÂÃ

The bug can be worked around if one changes the current working directory
before launching Lintian.

$ ( cd тест && lintian -L '>=classification' hello_2.12.3-1.dsc )
Control: tags -1 moreinfo
Control: tags -1 patch
This happens because `$parent` is a byte string (not decoded, so its utf8 flag
is off) while the file name is a decoded utf8 string. When we concatenate them
and check for the existence of the resulting path, the check fails.
If both are bytes, it works. If both are utf8 (which is always the case except
for the one you mention above), it works as well.
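To illustrate the mismatch, here is a minimal Perl sketch. The variables are
hypothetical stand-ins with made-up values, not Lintian's actual code path, and
it assumes the file name was decoded while the .dsc was parsed. It only shows
how mixing octets with a decoded string ends up double-encoded.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Hypothetical stand-ins for Lintian's $parent and $basename.
# The directory arrives from the command line as raw UTF-8 octets ("тест").
my $parent = "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82";
# Assumption: the file name was decoded into a character string earlier.
my $basename = decode_utf8('hello_2.12.3-1.dsc');

# Mixing the two: Perl treats the octets of $parent as Latin-1 characters
# when it joins them with a decoded string.
my $mixed = "$parent/$basename";

# Encoding that string turns every high octet into two octets, which is
# the kind of double encoding seen in the "Skipping ..." message.
my $double_encoded = encode_utf8($mixed);

# Keeping both sides as octets preserves the original bytes.
my $octets = $parent . '/' . encode_utf8($basename);

printf "mixed is flagged utf8: %s\n", utf8::is_utf8($mixed) ? 'yes' : 'no';
printf "octet path length:     %d\n", length $octets;
printf "double-encoded length: %d\n", length $double_encoded;
```

The octet path keeps the 8 bytes of the directory name as they are, while the
double-encoded variant grows by one byte for each of them, so it can no longer
name the directory on disk.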
I have a one-liner patch for this (pasted at the end of this mail).
Encoding the file as utf8 does make sense, as otherwise packages are not
allowed in the Debian archive.
But I am not sure if we should be consuming bytes. There might be a reason for
that implementation that I am not aware of. Given that there is utf8 stuff
everywhere, this may have been intended.
I'm CC'ing other maintainers for more advice.
diff --git a/bin/lintian b/bin/lintian
index 488b3790b..13673ec5d 100755
--- a/bin/lintian
+++ b/bin/lintian
@@ -733,6 +733,7 @@ for my $subject (@subjects) {
     next
       if $basename =~ m{/};
+    utf8::downgrade($basename) if !utf8::is_utf8($parent);
     die encode_utf8("$parent/$basename does not exist, exiting\n")
       unless -e "$parent/$basename";
and should not be used in production code. The utf8 flag depends on the
string's history, that is, on how it was processed, and it hints to Perl
whether fast byte algorithms for indexing and for the length function are
allowed.

I suppose utf8::downgrade can be called unconditionally. Or, even better,
decode_utf8, which is already loaded from the Encode module, because it never
fails.

The deb822(5) manual page states: "All control files must be encoded in
UTF-8." And we have no choice but to decode $basename back to raw octets
before doing file operations.

A more accurate solution would be to decode characters before setting the
files property of $processable. However, I have not dived into Lintian
internals enough to understand where the value is being parsed.
s/decode_utf8/encode_utf8/

Ah! I meant encoding, to convert a character string into a byte string.

Of course, the double encoding in the error handling in that loop is wrong,
and it leads to the indefinite hang I mentioned in the first message.
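
As a rough sketch of the encode_utf8 approach (not a tested Lintian patch):
the helper octet_path below is hypothetical, and the scratch directory and
file name are made up for the demonstration. It assumes the parent directory
is available as raw octets and the base name as a decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);
use File::Temp qw(tempdir);

# Hypothetical helper (not Lintian code): build a path made only of octets.
sub octet_path {
    my ($parent_octets, $basename_chars) = @_;
    # Turn the decoded name back into UTF-8 octets before joining.
    return $parent_octets . '/' . encode_utf8($basename_chars);
}

# Scratch directory named "тест", given as raw UTF-8 octets.
my $tmp    = tempdir(CLEANUP => 1);
my $parent = "$tmp/\xD1\x82\xD0\xB5\xD1\x81\xD1\x82";
mkdir $parent or die "mkdir failed: $!";

my $basename = decode_utf8('hello_2.12.3-1.dsc');    # character string
my $path     = octet_path($parent, $basename);

# Create the file, then test for it through the same octet path.
open(my $fh, '>', $path) or die "open failed: $!";
close $fh;
my $found = -e $path ? 1 : 0;

# utf8::downgrade is not a general substitute: with its second argument
# (FAIL_OK) true it returns false for characters above U+00FF, and
# without it, it dies. encode_utf8 handles the same string.
my $cyrillic      = decode_utf8("\xD1\x82\xD0\xB5\xD1\x81\xD1\x82");
my $downgraded_ok = utf8::downgrade($cyrillic, 1) ? 1 : 0;
my $encoded       = encode_utf8($cyrillic);

print "file found via octet path: $found\n";
print "downgrade succeeded:       $downgraded_ok\n";
```

The downgrade in the proposed one-liner only works for base names that fit in
Latin-1, which is why encoding to octets looks like the safer choice here.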