#960681 licensecheck: fails to detect license at end (option --tail is broken)

#960681#5
Date:
2009-02-02 20:50:52 UTC
From:
To:
the title says it all ;)
#960681#10
Date:
2009-02-02 21:45:48 UTC
From:
To:
severity 513967 wishlist
thanks

Well, not quite. :-)

For example, it doesn't tell us where we can find a copy of the files
(neither apt-cache search nor rmadison yielded any results); a sample of
the text would have been ideal. After a little searching, I'm assuming
that the software in question is
http://ocropus.googlecode.com/files/ocropus-0.2.tar.gz, which I'm now
downloading.

Since licensecheck doesn't claim to be able to detect every possible
license, I'm downgrading this to wishlist.

Regards,

Adam

#960681#17
Date:
2009-02-02 22:24:50 UTC
From:
To:
[...]

After looking into this some more, there turned out to be two separate
issues. Firstly, a slightly embarrassing error in a regular expression
meant that the default list of files to check didn't include .h files;
I've fixed that in SVN.

The second is that although ext/lua/lua.h does contain an MIT/X11 style
license statement, it begins at line 362 of the file. By default,
licensecheck only checks the first 60 lines of each file; this is
overrideable using --lines on the command line, or by setting
LICENSECHECK_PARSELINES in the configuration files.
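
To illustrate why a notice starting at line 362 is missed, here is a minimal
Python sketch of head-limited scanning. It is not licensecheck's actual code
(licensecheck is written in Perl): the function name find_license_line, the
single regular expression and the synthetic sample text are all hypothetical.
Only the 60-line default and the location of the notice come from the
message above; treating a limit of 0 as "scan everything" follows the
"--lines 0" usage that appears later in this thread.

```python
import re

# Hypothetical, heavily simplified stand-in for an MIT/X11-style pattern;
# licensecheck's real detection is far more extensive.
MIT_PATTERN = re.compile(r"Permission is hereby granted, free of charge", re.I)


def find_license_line(text, max_lines=60):
    """Return the 1-based line number of the first match, or None.

    max_lines=0 means "scan the whole file".
    """
    lines = text.splitlines()
    if max_lines:
        lines = lines[:max_lines]
    for number, line in enumerate(lines, start=1):
        if MIT_PATTERN.search(line):
            return number
    return None


# Synthetic file: 361 filler lines, then the license grant on line 362.
sample = "\n".join(
    ["/* code */"] * 361
    + ["* Permission is hereby granted, free of charge, to any person"]
)

print(find_license_line(sample))               # None
print(find_license_line(sample, max_lines=0))  # 362
```

With the default limit the notice is never read, which matches the UNKNOWN
result; raising the limit past line 362, or removing it, finds the notice.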

One of my co-maintainers may disagree, but I'm not convinced that
licensecheck should, by default, be reading all the way through each
file in order to try and find a license statement.

Regards,

Adam

#960681#30
Date:
2018-04-12 07:54:11 UTC
From:
To:
How about having licensecheck detect the pattern "See Copyright Notice at the
end of this file" and then scan LICENSECHECK_PARSELINES lines from the end to
look for a license?
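
A rough Python sketch of the idea, only to make the proposal concrete. This
is not licensecheck code: the function name scan_with_tail_hint, the
parse_lines parameter (standing in for LICENSECHECK_PARSELINES) and the
simplified license pattern are assumptions made for illustration.

```python
import re

POINTER = re.compile(r"See Copyright Notice at the end of this file", re.I)
# Hypothetical stand-in for a real license pattern.
LICENSE = re.compile(r"Permission is hereby granted", re.I)


def scan_with_tail_hint(text, parse_lines=60):
    """Return "head", "tail" or None depending on where a license is found.

    The tail is only scanned when the head contains the pointer phrase.
    parse_lines must be positive in this sketch.
    """
    if parse_lines <= 0:
        raise ValueError("parse_lines must be positive")
    lines = text.splitlines()
    head = lines[:parse_lines]
    if any(LICENSE.search(line) for line in head):
        return "head"
    if any(POINTER.search(line) for line in head):
        tail = lines[-parse_lines:]
        if any(LICENSE.search(line) for line in tail):
            return "tail"
    return None


# Synthetic file: pointer phrase near the top, license grant on the last line.
sample = "\n".join(
    ["** See Copyright Notice at the end of this file"]
    + ["/* code */"] * 400
    + ["* Permission is hereby granted, free of charge, to any person"]
)

print(scan_with_tail_hint(sample))  # tail
```

The second pass only happens for files that announce a trailing notice, so
files without the phrase cost no more than they do today.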

#960681#35
Date:
2018-04-12 14:56:59 UTC
From:
To:
Excerpts from Vasyl Vavrychuk's message of april 12, 2018 9:54 am:

Heh - that's a clever idea.

I worry it might be too clever, though: It would complicate processing
by needing to backtrack and rescan based on output of scanning.  Not
very complicated but I would prefer to keep processing logic simple if
possible.

Also, is it really needed? codesearch.debian.org has only ~100 hits for
that pattern, and I believe licensecheck already by default checks at
the end too.

Could you perhaps point at some real live examples where you believe
this kind of mechanism would be useful?


Thanks,

 - Jonas

#960681#40
Date:
2018-04-12 21:01:54 UTC
From:
To:
licensecheck does not seem to check at the end by default, because for
https://raw.githubusercontent.com/lua/lua/master/lua.h I get

licensecheck lua.h
lua.h: UNKNOWN

Maybe the described feature is what the reporter of this bug needs. Also, if we
want a fully automated way to run licensecheck on all Debian
packages, then we need it.

#960681#45
Date:
2018-04-12 21:26:53 UTC
From:
To:
Excerpts from Vasyl Vavrychuk's message of april 12, 2018 11:01 pm:

Oh.  Looks like the --tail option is broken. :-/

I normally use --lines 0 (and testsuite is incomplete), so didn't notice
this myself.

It is (slower but) more reliable to use "--lines 0".


 - Jonas

#960681#58
Date:
2021-11-19 17:06:00 UTC
From:
To:
copyright assignment from a LICENSE file:

$ licensecheck --copyright LICENSE
LICENSE: *No copyright* Apache License 2.0

But it gets the copyright with the --lines option:

$ licensecheck --copyright LICENSE -l 0
LICENSE: Apache License 2.0
  [Copyright: patent, trademark, and / license to reproduce, prepare
Derivative Works of, / License. Subject to the terms and conditions of / 2017
Sourced Technologies S.L.]

Would it be possible to use --lines 0 for files whose names are mostly capital
letters (such as LICENSE.* or README*)? These files tend to be relatively
short compared to other source files.

HTH

#960681#65
Date:
2021-11-20 10:15:59 UTC
From:
To:
Hi Dominique,

Quoting Dominique Dumont (2021-11-19 18:06:00)

Sounds like a bad idea to me to dynamically adapt scanning "depth" based
on filename naming style.

You write in bug#1000179 that --lines=0

I would appreciate some numbers about actual slowdown.

On a related note, it is bad style to fork general public license files
- I have reported this incident upstream:
https://github.com/go-git/go-billy/issues/20


 - Jonas

#960681#70
Date:
2021-11-21 17:32:31 UTC
From:
To:
Fair enough.

Here are some measurements where the cell content is the "real" time given by time command.

This table is to be viewed with a monospace font.

licensecheck command is:
┌────
│ licensecheck --lines 0 --encoding utf8 --copyright --machine --shortname-scheme=debian,spdx --recursive .
└────

This is also the command used internally by cme.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 package  plain      cme with   licensecheck  licensecheck
          cme        lines=0                  with lines=0
───────────────────────────────────────────────────────────
 pan      0m2.694s   0m6.553s   0m4.571s      0m9.303s
 moarvm   0m3.768s   0m41.772s  0m3.900s      0m40.274s
 nqp      0m3.057s   0m3.635s   0m3.682s      0m9.955s
 rakudo   0m3.448s   0m9.784s   0m11.358s     0m17.517s
 systemd  4m30.489s  4m59.546s  4m31.644s     5m2.661s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


The result is surprising: using --lines 0 can lead to a similar run time or to one about 10 times longer...

cme can take less time because more files are skipped.
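
For reference, the slowdown factors can be computed from the two licensecheck
columns of the table. This is a small Python sketch; the helper names
to_seconds and slowdown are mine, and the timings are copied from the table
above.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '4m30.489s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# package: (licensecheck, licensecheck with lines=0), as listed in the table.
TIMINGS = {
    "pan": ("0m4.571s", "0m9.303s"),
    "moarvm": ("0m3.900s", "0m40.274s"),
    "nqp": ("0m3.682s", "0m9.955s"),
    "rakudo": ("0m11.358s", "0m17.517s"),
    "systemd": ("4m31.644s", "5m2.661s"),
}


def slowdown(package):
    """Ratio of the lines=0 run time to the default run time."""
    default, full = (to_seconds(stamp) for stamp in TIMINGS[package])
    return full / default


for package in TIMINGS:
    print(f"{package:8s} {slowdown(package):5.1f}x")
```

This gives roughly 1.1x for systemd and roughly 10.3x for moarvm, with the
other three packages between 1.5x and 2.7x.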

All the best

#960681#75
Date:
2021-11-21 21:02:12 UTC
From:
To:
Quoting Dominique Dumont (2021-11-21 18:32:31)

Thanks.

Takes longer indeed.

Only systemd is relatively large, though, with only little slowdown.

Only moarvm has extreme slowdown, which is vastly reduced by skipping a
few large test files codifying random numbers, by adding this option:

#960681#80
Date:
2021-12-18 09:51:26 UTC
From:
To:
Quoting Dominique Dumont (2021-11-19 18:06:00)
bottom when *nothing* is detected at top.

What you present here, Dominique, is not a bug but a feature: By default
(and whenever option --lines is non-zero) licensecheck sloppily stops
scanning at first chunk of license or copyright found, hoping all
information is kept close together.

This was not clearly documented, but has always been how the code works
(and documentation never promised something else either).

Next release will have improved documentation, saying this:

If you don't want sloppy scanning, then you will need to accept that it
takes longer time, using "--lines=0".

If you think licensecheck should be slightly less sloppy by stopping
only when both *some* license and *some* copyright holder is detected,
then please file a separate wishlist bugreport about that.
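
To make the difference between the stopping rules concrete, here is a
simplified Python sketch. It is not the licensecheck implementation: it works
line by line rather than on chunks, and the function name scan, the stop_when
parameter, both patterns and the sample text are hypothetical.

```python
import re

# Hypothetical stand-ins for real license and copyright detection.
LICENSE = re.compile(r"Apache License", re.I)
COPYRIGHT = re.compile(r"Copyright\s+\d{4}", re.I)


def scan(lines, stop_when="either"):
    """Return (license_found, copyright_found).

    stop_when="either": stop at the first license or copyright hit (sloppy).
    stop_when="both":   stop only once both kinds have been seen.
    stop_when="never":  read everything, comparable to --lines=0.
    """
    if stop_when not in ("either", "both", "never"):
        raise ValueError(f"unknown stop_when: {stop_when!r}")
    found_license = found_copyright = False
    for line in lines:
        found_license = found_license or bool(LICENSE.search(line))
        found_copyright = found_copyright or bool(COPYRIGHT.search(line))
        if stop_when == "either" and (found_license or found_copyright):
            break
        if stop_when == "both" and found_license and found_copyright:
            break
    return found_license, found_copyright


# Synthetic LICENSE file: license name at the top, copyright line at the end.
sample = (
    ["Apache License", "Version 2.0, January 2004"]
    + ["terms and conditions"] * 180
    + ["Copyright 2017 Sourced Technologies S.L."]
)

print(scan(sample, "either"))  # (True, False)
print(scan(sample, "both"))    # (True, True)
```

The "either" result corresponds to the "*No copyright*" output reported
earlier in this bug, while "both" keeps reading until the copyright line at
the bottom is reached.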


 - Jonas

