Lintian issues these errors for putty 0.77-1:
E: putty source: source-is-missing [doc/html/AppendixA.html]
E: putty source: source-is-missing [doc/html/AppendixB.html]
E: putty source: source-is-missing [doc/html/AppendixE.html]
E: putty source: source-is-missing [doc/html/Chapter10.html]
E: putty source: source-is-missing [doc/html/Chapter2.html]
E: putty source: source-is-missing [doc/html/Chapter3.html]
E: putty source: source-is-missing [doc/html/Chapter4.html]
E: putty source: source-is-missing [doc/html/Chapter5.html]
E: putty source: source-is-missing [doc/html/Chapter7.html]
E: putty source: source-is-missing [doc/html/Chapter8.html]
E: putty source: source-is-missing [doc/html/Chapter9.html]
E: putty source: source-is-missing [doc/html/IndexPage.html]
This is pretty oversensitive. Firstly, it's HTML, which is still often
enough written by hand anyway. As it happens, these particular HTML
files are generated from halibut input that's also provided in the
source package, though I can't see how Lintian could possibly expect to
know that.
I tried to work out whether I should be overriding this or whether it's
a bug in Lintian, and I think it's the latter. The current relevant
code is this in lib/Lintian/Check/Files/SourceMissing.pm:
    sub visit_patched_files {
        my ($self, $item) = @_;
        return
          unless $item->is_file;
        [...]
        return
          if !defined $longest || $line_length{$longest} <= $VERY_LONG_LINE_LENGTH;
        [...]
        if ($item->basename =~ /\.(?:x?html?\d?|xht)$/i) {
            # html file
            $self->pointed_hint('source-is-missing', $item->pointer)
              unless $self->find_source($item, {'.fragment.js' => $DOLLAR});
        }
        return;
    }
So it issues a diagnostic for every HTML file with a somewhat long line
(over 512 characters) unless it has an associated .fragment.js somewhere
(I think; the find_source sub is undocumented and a bit obscure to me).
That doesn't sound right: surely that would produce far too many false
positives.
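To make the reading above concrete, here is a minimal sketch of the behaviour as I understand it, written in Python rather than Lintian's Perl. It is not Lintian's implementation: the function name would_flag and the has_fragment_js parameter are hypothetical (the latter stands in for whatever find_source actually does), and the parts elided with [...] in the excerpt are ignored.

```python
import re

# Threshold quoted in the report (Lintian's $VERY_LONG_LINE_LENGTH).
VERY_LONG_LINE_LENGTH = 512

# Same file-name pattern as in the Perl excerpt.
HTML_NAME = re.compile(r"\.(?:x?html?\d?|xht)$", re.IGNORECASE)


def would_flag(basename: str, text: str, has_fragment_js: bool = False) -> bool:
    """Approximate the check as described: long line + HTML name + no source."""
    longest = max((len(line) for line in text.splitlines()), default=0)
    if longest <= VERY_LONG_LINE_LENGTH:
        return False
    if not HTML_NAME.search(basename):
        return False
    # Hypothetical stand-in for find_source($item, {'.fragment.js' => ...}).
    return not has_fragment_js
```

Under these assumptions, an ordinary paragraph of prose that happens not to be word-wrapped is enough to trigger the diagnostic, with no JavaScript involved at all.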
Next, I went looking through git history to try to figure out where this
was introduced. I found this commit:
https://salsa.debian.org/lintian/lintian/-/commit/4f24ab7fca
The commit message makes it sound as though it was probably just
refactoring, but it wasn't. The corresponding bit of code there was
previously in a warn_prebuilt_javascript sub called from a
warn_long_lines sub, which in turn was called in two places: once for
certain kinds of .js files, and once from this sub:
    # check javascript in html file
    sub check_html_cruft {
        my ($self, $item, $lowercase) = @_;
        my $blockscript = $lowercase;
        my $indexscript;
        while (($indexscript = index($blockscript, '<script')) > $ITEM_NOT_FOUND) {
            $blockscript = substr($blockscript,$indexscript);
            # sourced script ok
            if ($blockscript =~ m{\A<script\s+[^>]*?src="[^"]+?"[^>]*?>}sm) {
                $blockscript = substr($blockscript,$+[0]);
                next;
            }
            # extract script
            if ($blockscript =~ m{<script[^>]*?>(.*?)</script>}sm) {
                $blockscript = substr($blockscript,$+[0]);
                my $lcscript = $1;
                $self->check_js_script($item, $lcscript);
                return 0
                  if $self->warn_long_lines($item, $lcscript);
                next;
            }
            # here we know that we have partial script. Do the check nevertheless
            # first check if we have the full <script> tag and do the check
            # if we get <script src=" "
            # then skip
            if ($blockscript =~ /\A<script[^>]*?>/sm) {
                $blockscript = substr($blockscript,$+[0]);
                $self->check_js_script($item, $blockscript);
            }
            return 0;
        }
        return 1;
    }
This made much more sense! I could get on board with issuing a
diagnostic for <script> tags in HTML files that look like minified
JavaScript, and that appears to be what this check was originally meant
to do. Unfortunately, it looks like that extra logic was dropped in
this "Further rationalize cruft check; separate concerns" commit, and
now we have a very much broader check on HTML files with no indication
that this change was intentional. Something like this <script> check is
still present in Lintian, but in a different context, and it's no longer
used for the source-is-missing check.
I suggest restoring something like this code to check for <script> tags
around the source-is-missing check for HTML files. I suspect that this
might also deal with reports such as #1017094 and #1017966, though I've
filed this separately as I'm not sure of that.
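As a rough illustration of the kind of narrowing suggested here, the following Python sketch applies the long-line test only to the bodies of inline <script> elements and skips scripts loaded via src. It is an assumption-laden sketch, not a patch: the function names are hypothetical, the regular expressions are much cruder than a real HTML parser, and unlike the old check_html_cruft it does not handle unterminated <script> blocks.

```python
import re

# Threshold quoted in the report (Lintian's $VERY_LONG_LINE_LENGTH).
VERY_LONG_LINE_LENGTH = 512

# Crude patterns for illustration only; real HTML needs a proper parser.
SCRIPT = re.compile(r"<script\b([^>]*)>(.*?)</script>", re.IGNORECASE | re.DOTALL)
HAS_SRC = re.compile(r"(?<![\w-])src\s*=", re.IGNORECASE)


def inline_scripts(html: str):
    """Yield the body of each <script> element that has no src attribute."""
    for match in SCRIPT.finditer(html):
        attrs, body = match.group(1), match.group(2)
        if HAS_SRC.search(attrs):
            continue  # sourced script: nothing inline to inspect
        yield body


def has_long_inline_script(html: str) -> bool:
    """True if any inline script contains a line over the threshold."""
    return any(
        len(line) > VERY_LONG_LINE_LENGTH
        for body in inline_scripts(html)
        for line in body.splitlines()
    )
```

With a gate like this in front of the source-is-missing hint, a long line of ordinary prose would no longer be enough on its own to trigger the diagnostic.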
Thanks,
HTML is very often generated and there are many different ways to
generate it. I think the right thing for lintian to do here is to know
about more of the source formats and when there is generated HTML in the
tarball but source is also present, then emit a new lower severity
generated-files tag instead of the existing source-is-missing tag.

I think the right thing for putty here is for upstream to remove the
HTML from their VCS and tarballs, then add the generation process to
their build system and continuous integration, so that they always know
when there are problems with generating the HTML. If they refuse then
you could exclude the HTML from Debian's copy of the upstream tarball.

Until either lintian changes or the putty HTML gets removed, overriding
the lintian warning in putty seems the correct thing to do.

PS: I note that manual pages are similar to HTML in this regard and I
think the same reasoning above applies to the putty manual pages and to
lintian's treatment of manual pages in source packages.

If that is done, I think lintian should add more heuristics to detect
other generated HTML. The halibut generated HTML doesn't make that easy
but there are some signals that can be added I think, like this:

    halibut-1.3/bk_html.c: html_raw(&ho, "<!-- version IDs:\n");
The HTML files have never been in PuTTY upstream's VCS. They are
generated automatically as part of PuTTY's build system for release
tarballs, as a convenience to people who want to build PuTTY without
Halibut, since it's a somewhat niche documentation tool. Since I agree
with upstream that this is a reasonable convenience, I'm not going to
ask them to stop doing it.

We're not talking about opaque object code here. This is perfectly
readable plain HTML that just happens to be generated from another
perfectly readable text format. It's not the preferred form of
modification, sure (I wouldn't edit it directly since I have the Halibut
input files available, but if nobody told me that those existed then I'd
happily edit the HTML without even noticing), but this package isn't
covered by the GPL so that's not very relevant.

I'm not going to waste a second on editing Debian's copy of the upstream
tarball for this complete non-issue. I already take care to ensure that
the package rebuilds the documentation from source, and there's no DFSG
issue with the pre-generated files being present so there's no reason to
remove them from the tarball. The only reason that the presence of
pre-generated files is even coming up is because Lintian's heuristics
are misfiring in a way that seems clearly incorrect and probably
unintentional.

Done.

Firstly, that's conditional and not present in the generated PuTTY
documentation. (I've sent a patch upstream to add a suitable
<meta name="generator"> tag, since that seems like a reasonable thing to
include in any event.)

Secondly, with respect, this is a distraction from the point of this
bug. Feel free to file a separate bug for more detailed heuristics, but
Lintian should start by making its current heuristics not entirely wrong
(the presence of a .fragment.js file obviously has nothing to do with
whether general HTML files are generated, only ones that have certain
kinds of <script> tags). That's what I'm requesting here.

Thanks,
On Sun, Sep 18, 2022 at 12:26:20AM +0100, Colin Watson wrote:
I do not think that the issue of included data:image/png is the same as
included compressed JS. I admit I would not mind if the many false
positives in R packages could be avoided.
It's just a fact that all HTML documentation that comes in R packages is
accompanied by an Rmd file with the same name. The HTML file, which is
rendered with tools that are packaged in Debian and thus could be
reproduced, might contain data:image/png, compressed JS (that could also
be obtained from Debian-packaged JS) or simply some lines that are
unusually long.
It would be great if this could somehow be expressed in lintian to
avoid false positives.
Kind regards
Andreas.
Dear Lintian maintainers,
This test is causing hundreds of false positives and should be disabled as soon as possible. This is a huge waste of time for everybody. If you need help with that, please tell me; I have worked on lintian in the past.
Cheers,
Dear Lintian maintainers, I cannot offer the same help as ballombe, but I also find it would help to disable these errors. At least, could they be "demoted" to warnings? Thanks in advance, Santiago
On Thursday, 8 February 2024, 18:31:28 UTC, Santiago Ruano Rincón wrote:
Are you sure it is not an embedded base64-encoded PNG or minified JavaScript? If not, we could try to find out why it chokes. In this particular case, it is the source package that it chokes on. If halibut included the name of the source in the HTML, we could automatically suppress the source-is-missing warnings. Alternatively, if we could determine that the file was generated by halibut, we could demote this to a pedantic warning and ask for a repack in order to be sure to recompile from source.
Thanks
There are far too many different HTML generators out there to handle. You would need to define a standard way to indicate the path to the source in the generated file. But some generator authors might consider this an unacceptable data leak, so this would only be done if some environment variable is defined.
In the short term, I suggest disabling it, since there is no policy requirement for the source code to be in a particular path, so it is not an error. At the very least, it should not be issued more than once per package.
Cheers,
On Thursday, 8 February 2024, 19:57:22 UTC, Bill Allombert wrote:
We have done this for doxygen and sphinx, so why not for more? For doxygen and sphinx we only detect some string in the HTML file and whitelist it; a "Generated by something" string will work.
Moreover, a missing-source override can be added manually by creating a symlink at debian/missing-sources/fullname pointing to the right location. We also automatically search for known sources using some heuristics in SourceMissing.pm.
So the basic framework is here; we only need to add more rules.
Bastien
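For illustration, detecting a generator marker of the kind discussed in this thread could look roughly like the Python sketch below, which reads a <meta name="generator"> tag. This is only a sketch under assumptions: the class and function names are hypothetical, the generator value in the usage line is made up, and it does not reflect the strings Lintian actually matches for doxygen or sphinx.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag seen."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def find_generator(html: str):
    """Return the declared generator string, or None if there is none."""
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generator


# Hypothetical input; "ExampleGen 1.0" is not a real generator string.
print(find_generator('<head><meta name="generator" content="ExampleGen 1.0"></head>'))
```

A checker could then map known generator strings to the source formats it should look for, which is also where the maintenance burden of supporting many generators would show up.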
This is two out of how many? For example, my packages use TtH, GAPDoc, hevea, pod2html. I do not think it is sustainable.
Cheers,
Yes, I'm absolutely certain. I already gave a full explanation of this
in my first message, which for some reason people are ignoring:

"""
So it issues a diagnostic for every HTML file with a somewhat long line
(over 512 characters) unless it has an associated .fragment.js somewhere
"""

The HTML files it's issuing a diagnostic on here are perfectly innocuous
and readable. Here's an example of one of the "offending" lines:

  In version 0.51 and before, local echo could not be separated from local line editing (where you type a line of text locally, and it is not sent to the server until you press Return, so you have the chance to edit it and correct mistakes <em>before</em> the server sees it). New in version 0.52, local echo and local line editing are separate options, and by default PuTTY will try to determine automatically whether to enable them or not, based on which protocol you have selected and also based on hints from the server. If you have a problem with PuTTY's default choice, you can force each option to be enabled or disabled as you choose. The controls are in the Terminal panel, in the section marked ‘Line discipline options’.

I mean, come on. Sure, there are a couple of character entities (which
have nothing to do with the diagnostic here anyway), but otherwise you
can't tell me with a straight face that that's some kind of obscure
compiled format; I would have written it exactly the same way by hand
except for the word-wrapping.

Or we could fix the ridiculously-oversensitive diagnostic.

On the matter of repacking (which I will not do in this case), please
see my comment in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019980#15.