#1137319 lintian: skipping non-latin arguments

Package:
lintian
Source:
lintian
Submitter:
Nicholas Guriev
Date:
2026-05-25 17:33:02 UTC
Severity:
normal
Tags:
#1137319#5
Date:
2026-05-22 12:23:40 UTC
Dear Lintian Maintainers,

Lintian skips source packages without checking them if the file path arguments
contain non-Latin letters. It also hangs indefinitely if it prints no other
tags (from the remaining arguments). The bug only appears with .dsc files;
.deb packages seem unaffected.

Note that file paths in Linux are treated as arbitrary byte sequences and
should be neither encoded to UTF-8 nor decoded.


$ lintian -L '>=classification' тест/hello_2.12.3-1.dsc
Skipping ÑеÑÑ/hello_2.12.3-1.dsc: ÃÂÃ


The bug can be worked around if one changes the current working directory
before launching Lintian.

$ ( cd тест && lintian -L '>=classification' hello_2.12.3-1.dsc )

#1137319#10
Date:
2026-05-22 19:09:07 UTC
Control: tags -1 moreinfo
Control: tags -1 patch

This happens because `$parent` is not valid utf8 while the file name is. When
we concatenate them and check whether the resulting path exists, it does not
work.

If both are bytes, it does. If both are utf8 (which is always the case except
the one you mention above), it works as well.
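
A minimal sketch of the mixed concatenation described above, not taken from
Lintian itself. It assumes the directory arrives as raw octets and that the
file name carries Perl's utf8 flag (simulated here with utf8::upgrade); the
temporary directory and the file name are only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: the directory name is raw octets (the UTF-8 bytes of "тест").
my $root   = tempdir(CLEANUP => 1);
my $parent = "$root/\xd1\x82\xd0\xb5\xd1\x81\xd1\x82";
mkdir $parent or die "mkdir failed: $!";

# Create the file using byte strings only.
my $name = 'hello_2.12.3.orig.tar.gz';
open(my $fh, '>', "$parent/$name") or die "open failed: $!";
close $fh;

# Assumption: the name parsed from the .dsc is a character string with the
# utf8 flag on; utf8::upgrade simulates that here.
my $basename = $name;
utf8::upgrade($basename);

# Mixed concatenation: $parent is upgraded as if it were Latin-1, so the
# octets handed to the OS form a double-encoded path that does not exist.
my $mixed = -e "$parent/$basename" ? 1 : 0;

# Both sides as bytes: the path reaches the OS unchanged.
utf8::downgrade($basename);
my $bytes_only = -e "$parent/$basename" ? 1 : 0;

print "mixed: $mixed, bytes only: $bytes_only\n";
```

On a Linux file system the first check is expected to fail and the second to
succeed, which matches the behaviour reported in this bug.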

I have a one-liner patch for this (pasted at the end of this mail).
Encoding the file as utf8 does make sense, as otherwise packages are not
allowed in the Debian archive.

But I am not sure if we should be consuming bytes. There might be a reason for
that implementation which I am not aware of at least -- given that there is
utf8 stuff everywhere, this may have been intended.

I'm CC'ing other maintainers for more advice.

diff --git a/bin/lintian b/bin/lintian
index 488b3790b..13673ec5d 100755
--- a/bin/lintian
+++ b/bin/lintian
@@ -733,6 +733,7 @@ for my $subject (@subjects) {
             next
               if $basename =~ m{/};

+           utf8::downgrade($basename) if !utf8::is_utf8($parent);
             die encode_utf8("$parent/$basename does not exist, exiting\n")
               unless -e "$parent/$basename";

#1137319#19
Date:
2026-05-25 16:30:04 UTC
and should not be used in production code. The utf8 flag depends on the string
history, how it was processed, and hints to Perl whether fast byte algorithms
for indexing and in the length function are allowed or not. I suppose
utf8::downgrade can be called unconditionally. Or, even better, use decode_utf8,
which is already loaded from the Encode module, because it never fails.
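
The point that the utf8 flag reflects a string's history rather than its
content can be seen in a small sketch. The variable names are hypothetical and
nothing here comes from the Lintian code base.

```perl
use strict;
use warnings;

my $plain    = "caf\xe9";   # Latin-1 range, stored one byte per character
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same characters, UTF-8 internal storage

# The two strings compare equal, yet their flags differ.
my $same_text = $plain eq $upgraded ? 1 : 0;
my $flags     = join ',', map { utf8::is_utf8($_) ? 1 : 0 } $plain, $upgraded;

# utf8::downgrade is a no-op on a string that is already stored as bytes,
# so calling it without first checking the flag is harmless here.
my $ok = utf8::downgrade($plain) ? 1 : 0;

print "same text: $same_text, flags: $flags, downgrade ok: $ok\n";
```

Note that utf8::downgrade dies on a string containing characters above 0xFF
unless its second argument (FAIL_OK) is true.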

The deb822(5) manual page states: "All control files must be encoded in
UTF-8." And we have no choice but to decode $basename back to raw octets
before doing file operations. A more accurate solution would be to decode
characters before setting the files property of $processable. However, I have
not dived into Lintian internals enough to understand where the value is being
parsed.

#1137319#24
Date:
2026-05-25 17:30:05 UTC
s/decode_utf8/encode_utf8/

Ah! I meant encoding to convert a character string into a byte string.

Of course, the double encoding in the error handling in that loop is wrong and
leads to the infinite freeze I mentioned in the first message.
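
For illustration, a minimal sketch of what double encoding does to a byte
string, assuming the path is already the UTF-8 octets of "тест". It does not
reproduce the Lintian loop or the freeze, only the garbled text.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

my $octets = "\xd1\x82\xd0\xb5\xd1\x81\xd1\x82";   # UTF-8 bytes of "тест"

# Decoding and then encoding once is a clean round trip.
my $round_trip = encode_utf8(decode_utf8($octets));

# Encoding the octets directly treats each byte as a Latin-1 character and
# encodes it again, so every byte above 0x7F turns into two bytes.
my $twice = encode_utf8($octets);

printf "original: %d bytes, double-encoded: %d bytes\n",
    length($octets), length($twice);
```

The double-encoded string begins with the bytes C3 91, which a UTF-8 terminal
shows as "Ñ", the first character of the garbled "Skipping" line in the
original report.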