#1008130 lintian: support/use multi-threads (currently single threaded and slow)

Package:
lintian
Source:
lintian
Submitter:
Samuel Henrique
Date:
2025-07-22 04:43:01 UTC
Severity:
wishlist
Tags:
#1008130#5
Date:
2022-03-22 22:10:32 UTC
I'd like to request that lintian make use of multiple threads when
performing its evaluations. I noticed that running lintian
against the curl package takes a few seconds (on a powerful machine)
and that it uses only a single thread.

I believe there could be noticeable performance gains from using all
the threads available, although I don't know how feasible that is with
lintian+perl.

Note that I didn't go as far as debugging lintian to confirm it's
single-threaded. I only noticed that one thread was at 100% while
lintian was running, and I consider that good evidence.
Worst case, the maintainer can clarify that I'm wrong (I know
there's some chance lintian is actually multi-threaded but was
waiting for something else that's single-threaded).

Thanks,

#1008130#10
Date:
2022-03-22 23:11:30 UTC
Hi Samuel,

I share your hope and have implemented two attempts to parallelize the
~300 or so checks.

My first attempt used IO::Async but failed. That module is probably
the best one currently available, but it replaces the SIGCHLD handler.
Lintian uses dozens of other modules that call external programs via
other means. Unfortunately, those do not interact well with IO::Async,
which causes the parallel execution to freeze or otherwise experience
strange bugs.

A particularly serious problem for Lintian was the interaction with
Path::Tiny. [1]

You may be able to find some details by searching the Git log for
"Heisenbug" (capital H, please).

My current implementation uses MCE [2] which works okay, but does not
yet yield the performance gains you and I are hoping for. That is why
the experimental branch has not been merged.

As far as I can tell, the degradation relates to the serializations
Perl performs between parent and child processes. It is possible to
"close" on the in-memory file indexes as part of the fork() but it's
not enough to explain the difference. (The indexes are large and also
being transitioned to disk for unrelated reasons.) Memory usage is
higher, as well.
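
To illustrate the kind of serialization overhead described above, here is a minimal Python sketch. It is not Lintian code: Lintian is written in Perl and MCE uses its own serializer, so pickle only stands in for "whatever has to cross the process boundary". The `build_index` helper, the entry layout, and the tag name are invented for illustration.

```python
import pickle

def build_index(n_files):
    """Build a hypothetical in-memory file index: path -> metadata dict."""
    return {
        f"usr/share/doc/pkg/file{i}.txt": {"size": i, "mode": 0o644, "owner": "root"}
        for i in range(n_files)
    }

def payload_size(obj):
    """Return the number of bytes needed to ship obj to another process."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

index = build_index(50_000)
# A check usually reports only a handful of findings (hypothetical tag name).
findings = [("hypothetical-tag", "usr/share/doc/pkg/file7.txt")]

index_bytes = payload_size(index)
findings_bytes = payload_size(findings)
print(f"index: {index_bytes} bytes, findings: {findings_bytes} bytes")
```

The sketch only shows that shipping a whole index costs far more than shipping results back. It does not reproduce or explain the slowdown observed with the MCE branch.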

I may have to implement better profiling before we make significant
progress. That is because at least half the time is spent generating
the file indexes, which require a different parallelization strategy
than the checks.

One long-term plan could be to have a data interchange format between
the parent and the child processes. It would also allow checks to be
written in other programming languages, such as Haskell, but I would
seek further community input before proceeding with anything like
that.
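
As a rough idea of what such an interchange format could look like, here is a minimal Python sketch that uses one JSON object per line. Nothing here is an existing Lintian interface: the message fields, the `run_check` function, and the tag name are assumptions made up for illustration.

```python
import json

def encode_message(message):
    """Serialize one message as a single JSON line."""
    return json.dumps(message, sort_keys=True) + "\n"

def decode_message(line):
    """Parse one JSON line back into a dict."""
    return json.loads(line)

def run_check(request):
    """Hypothetical check: flag files larger than the limit in the request."""
    hints = [
        {"tag": "hypothetical-large-file", "path": entry["path"]}
        for entry in request["files"]
        if entry["size"] > request["limit"]
    ]
    return {"check": request["check"], "hints": hints}

request = {
    "check": "large-files",
    "limit": 1000,
    "files": [
        {"path": "usr/bin/tool", "size": 4096},
        {"path": "usr/share/doc/tool/README", "size": 200},
    ],
}

# Parent side writes the request; child side reads it, runs the check, replies.
wire_request = encode_message(request)
wire_response = encode_message(run_check(decode_message(wire_request)))
response = decode_message(wire_response)
print(response)
```

Because each message is plain text on a single line, the child could be written in any language that can read standard input and emit JSON.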

[1] https://github.com/dagolden/Path-Tiny/issues/224
[2] https://metacpan.org/pod/MCE

Perl performs surprisingly well for an interpreted language, but I am
not sure true "threading" works well. In Lintian, we use multiple
processes, if at all. That is how I interpreted your use of the word
"threads".

You are right. For the purposes of your analysis, Lintian uses a single process.

Thank you for your valuable suggestions!

Kind regards,
Felix Lechner

#1008130#17
Date:
2025-01-23 06:52:23 UTC
Is your work available anywhere?

Thanks,

#1008130#22
Date:
2025-01-23 18:27:06 UTC
I would be very happy if this were implemented. To quote an email I sent to debian-devel:

For point 2, it seems the easiest way to make a significant difference would be if lintian
could run multi-threaded.

My current development CPU has 8 physical cores hyper-threaded, which present
to the OS as 16 logical cores.  Most of the build process is multi-threaded
and uses all the cores to their maximum potential simultaneously.  But lintian
is single-threaded, so it only uses one core and the other 15 sit idle.  There might be some
lintian tests that depend on the output of other lintian tests, but I would imagine that
most of them could be run in parallel with the results combined at the end.

I don’t know enough Perl to know how easy it would be to run lintian in a multi-threaded
manner, but if this was not a difficult change it would speed up lintian runs dramatically.
In the case of qtwebengine-opensource-src on my hardware, assuming that all cores could
be efficiently utilized and there are no other bottlenecks in RAM or disk access, it would
drop lintian’s runtime from about 30 minutes to about 2 minutes.
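
The estimate above is the ideal case: 30 minutes spread evenly over 16 logical cores is 1.875 minutes. The following Python sketch of Amdahl's law shows how the figure changes when part of the run cannot be parallelized. The serial fraction of 0.5 is only an assumption, loosely based on the earlier remark in this bug that at least half the time goes into generating the file indexes; it is not a measurement of this package.

```python
def parallel_runtime(total_minutes, workers, serial_fraction=0.0):
    """Estimate runtime under Amdahl's law.

    serial_fraction is the share of the work that cannot be parallelized.
    """
    serial = total_minutes * serial_fraction
    parallel = total_minutes * (1.0 - serial_fraction) / workers
    return serial + parallel

# Ideal case: everything scales perfectly across 16 logical cores.
ideal = parallel_runtime(30, 16)
# Assumed case: half of the run stays serial.
half_serial = parallel_runtime(30, 16, serial_fraction=0.5)
print(f"ideal: {ideal:.2f} min, half serial: {half_serial:.2f} min")
```

Under that assumption the run would still take roughly 16 minutes, so the serial part matters more than the number of cores.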

https://lists.debian.org/debian-devel/2024/05/msg00169.html

#1008130#27
Date:
2025-07-22 04:40:41 UTC
The current blocker for multithreading is Lintian::Index, which is not
shareable/serializable at the moment, so it can't be copied between
the threads. Even if it were, a huge amount of data would be copied
into each thread, and only a few checks need Lintian::Index.

Possible pathways:
- Make Lintian::Index shareable
- Implement some sort of communication between threads to allow a worker
  thread to RPC the Index in the main thread
- Refactor Lintian::Index and/or any checks that use it
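
The second pathway (workers querying an index that stays in one place) could look roughly like the Python sketch below. It is an illustration of the request/reply pattern only: Python threads and queues stand in for whatever mechanism Lintian would use, the real Lintian::Index interface is not modeled, and all function names and paths are hypothetical.

```python
import queue
import threading

def serve_index(index, request_queue):
    """Answer lookups from workers until a None sentinel arrives."""
    while True:
        item = request_queue.get()
        if item is None:
            break
        path, reply = item
        reply.put(index.get(path))

def worker(paths, request_queue, results):
    """Hypothetical check: look up each path via the owner of the index."""
    reply = queue.Queue()  # private channel for this worker's answers
    for path in paths:
        request_queue.put((path, reply))
        results.put((path, reply.get()))

index = {"usr/bin/tool": {"size": 4096}, "etc/tool.conf": {"size": 120}}
request_queue = queue.Queue()
results = queue.Queue()

# Only this thread ever touches the index; workers never get a copy.
server = threading.Thread(target=serve_index, args=(index, request_queue))
server.start()
workers = [
    threading.Thread(target=worker, args=([path], request_queue, results))
    for path in ["usr/bin/tool", "etc/tool.conf", "missing/file"]
]
for t in workers:
    t.start()
for t in workers:
    t.join()
request_queue.put(None)  # tell the index owner to stop
server.join()

answers = dict(results.get() for _ in range(len(workers)))
print(answers)
```

The trade-off is that every lookup becomes a round trip, so this approach suits the case described above where only a few checks need the index.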