- Package:
- licensecheck
- Source:
- licensecheck
- Submitter:
- Jeffrey Ratcliffe
- Date:
- 2025-04-08 10:03:03 UTC
- Severity:
- important
the title says it all ;)
severity 513967 wishlist
thanks

Well, not quite. :-) For example, it doesn't tell us where we can find a copy of the files (neither apt-cache search nor rmadison yielded any results); a sample of the text would have been ideal. After a little searching, I'm assuming that the software in question is http://ocropus.googlecode.com/files/ocropus-0.2.tar.gz, which I'm now downloading. Since licensecheck doesn't claim to be able to detect every possible license, I'm downgrading this to wishlist. Regards, Adam
[...] After looking into this some more, there turned out to be two separate issues. Firstly, a slightly embarrassing error in a regular expression meant that the default list of files to check didn't include .h files; I've fixed that in SVN. The second is that although ext/lua/lua.h does contain an MIT/X11 style license statement, it begins at line 362 of the file. By default, licensecheck only checks the first 60 lines of each file; this is overridable using --lines on the command line, or by setting LICENSECHECK_PARSELINES in the configuration files. One of my co-maintainers may disagree, but I'm not convinced that licensecheck should, by default, be reading all the way through each file in order to try to find a license statement. Regards, Adam
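To illustrate the second issue, here is a minimal Python sketch of a head-only scan. It is not licensecheck's code: the pattern MIT_HINT, the function scan_head and the sample data are hypothetical, and treating a limit of 0 as "scan everything" is an assumption based on how --lines 0 is described in this thread.

```python
import re

# Hypothetical, heavily simplified pattern; the real tool recognizes far more.
MIT_HINT = re.compile(r"Permission is hereby granted, free of charge", re.I)


def scan_head(lines, limit=60):
    """Return the 1-based line number of the first match, or None.

    limit=0 is treated as "no limit" (assumption, mirroring --lines 0).
    """
    window = lines if limit == 0 else lines[:limit]
    for number, text in enumerate(window, start=1):
        if MIT_HINT.search(text):
            return number
    return None


# Stand-in for a header file whose license statement starts at line 362.
sample = ["/* code */"] * 361 + [
    "* Permission is hereby granted, free of charge, to any person"
]

print(scan_head(sample))           # limited to the first 60 lines
print(scan_head(sample, limit=0))  # whole file
```

With the default limit the statement at line 362 is never reached, which matches the UNKNOWN-style result described above; raising the limit trades speed for coverage.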
How about having licensecheck detect the pattern "See Copyright Notice at the end of this file" and then scan LICENSECHECK_PARSELINES lines from the end of the file to look for a license?
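The proposal above could be sketched roughly as follows. This is only an illustration in Python, not a patch against licensecheck: POINTER, LICENSE_HINT and scan_with_tail_fallback are hypothetical names, and parse_lines stands in for LICENSECHECK_PARSELINES.

```python
import re

# Hypothetical patterns for the sketch only.
POINTER = re.compile(r"See Copyright Notice at the end of this file", re.I)
LICENSE_HINT = re.compile(r"Permission is hereby granted", re.I)


def scan_with_tail_fallback(lines, parse_lines=60):
    """Scan the head; rescan the tail only if the head points there.

    Returns "head", "tail" or None. parse_lines must be positive,
    since lines[-0:] would select the whole list.
    """
    if parse_lines <= 0:
        raise ValueError("parse_lines must be positive")
    head = lines[:parse_lines]
    if any(LICENSE_HINT.search(text) for text in head):
        return "head"
    if any(POINTER.search(text) for text in head):
        tail = lines[-parse_lines:]
        if any(LICENSE_HINT.search(text) for text in tail):
            return "tail"
    return None


license_line = "* Permission is hereby granted, free of charge, to any person"
with_pointer = (
    ["* See Copyright Notice at the end of this file"]
    + ["code"] * 400
    + [license_line]
    + ["*/"] * 10
)
no_pointer = ["code"] * 401 + [license_line] + ["*/"] * 10

print(scan_with_tail_fallback(with_pointer))
print(scan_with_tail_fallback(no_pointer))
```

The second pass depends on the result of the first one, so a scanner built this way has to go back into a file it has already read.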
Excerpts from Vasyl Vavrychuk's message of april 12, 2018 9:54 am: Heh - that's a clever idea. I worry it might be too clever, though: It would complicate processing by needing to backtrack and rescan based on output of scanning. Not very complicated but I would prefer to keep processing logic simple if possible. Also, is it really needed? codesearch.debian.org has only ~100 hits for that pattern, and I believe licensecheck already by default checks at the end too. Could you perhaps point at some real live examples where you believe this kind of mechanism would be useful? Thanks, - Jonas
licensecheck does not seem to check at the end by default, because for https://raw.githubusercontent.com/lua/lua/master/lua.h I get:

licensecheck lua.h
lua.h: UNKNOWN

Maybe the described feature is what the reporter of this bug needs. Also, if we want a fully automated way to run licensecheck on all Debian packages, then we need it.
Excerpts from Vasyl Vavrychuk's message of april 12, 2018 11:01 pm: Oh. Looks like the --tail option is broken. :-/ I normally use --lines 0 (and testsuite is incomplete), so didn't notice this myself. It is (slower but) more reliable to use "--lines 0". - Jonas
copyright assignment from a LICENSE file:

$ licensecheck --copyright LICENSE
LICENSE: *No copyright* Apache License 2.0

But it gets the copyright with the --lines option:

$ licensecheck --copyright LICENSE -l 0
LICENSE: Apache License 2.0 [Copyright: patent, trademark, and / license to reproduce, prepare Derivative Works of, / License. Subject to the terms and conditions of / 2017 Sourced Technologies S.L.]

Would it be possible to use --lines 0 for files whose names contain a lot of capital letters (like LICENSE.*, README* and so on)? These files tend to be relatively short compared to other source files. HTH
Hi Dominique, Quoting Dominique Dumont (2021-11-19 18:06:00) Sounds like a bad idea to me to dynamically adapt scanning "depth" based on filename naming style. You write in bug#1000179 that --lines=0 I would appreciate some numbers about actual slowdown. On a related note, it is bad style to fork general public license files - I have reported this incident upstream: https://github.com/go-git/go-billy/issues/20 - Jonas
Fair enough.
Here are some measurements, where each cell is the "real" time reported by the time command.
This table is to be viewed with a monospace font.
The licensecheck command is:
┌────
│ licensecheck --lines 0 --encoding utf8 --copyright --machine --shortname-scheme=debian,spdx --recursive .
└────
This is also the command used internally by cme.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 package   plain cme   cme with lines=0   licensecheck   licensecheck with lines=0
─────────────────────────────────────────────────────────────────────────────────
 pan       0m2.694s    0m6.553s           0m4.571s       0m9.303s
 moarvm    0m3.768s    0m41.772s          0m3.900s       0m40.274s
 nqp       0m3.057s    0m3.635s           0m3.682s       0m9.955s
 rakudo    0m3.448s    0m9.784s           0m11.358s      0m17.517s
 systemd   4m30.489s   4m59.546s          4m31.644s      5m2.661s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The result is surprising: using --lines 0 can lead to a similar run time or to one about 10 times longer...
cme can take less time because more files are skipped.
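For reference, the slowdown factors can be worked out from the table with a few lines of Python. This is only arithmetic on the numbers above; it assumes that the last two timing columns are licensecheck with default settings and licensecheck with lines=0, in that order. The names seconds, timings and slowdown are made up for the sketch.

```python
import re


def seconds(stamp):
    """Convert a 'real' time such as '4m31.644s' to seconds."""
    minutes, secs = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(secs)


# (licensecheck, licensecheck with lines=0), copied from the table above.
timings = {
    "pan": ("0m4.571s", "0m9.303s"),
    "moarvm": ("0m3.900s", "0m40.274s"),
    "nqp": ("0m3.682s", "0m9.955s"),
    "rakudo": ("0m11.358s", "0m17.517s"),
    "systemd": ("4m31.644s", "5m2.661s"),
}

slowdown = {
    package: seconds(full) / seconds(default)
    for package, (default, full) in timings.items()
}

for package, factor in slowdown.items():
    print(f"{package:8s} {factor:5.2f}x")
```

The factor is a little above 10 for moarvm and a little above 1.1 for systemd, with the other packages in between.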
All the best
Quoting Dominique Dumont (2021-11-21 18:32:31) Thanks. Takes longer indeed. Only systemd is relatively large, though, with only little slowdown. Only moarvm has extreme slowdown, which is vastly reduced by skipping a few large test files codifying random numbers, by adding this option:
Quoting Dominique Dumont (2021-11-19 18:06:00) bottom when *nothing* is detected at top. What you present here, Dominique, is not a bug but a feature: By default (and whenever option --lines is non-zero) licensecheck sloppily stops scanning at first chunk of license or copyright found, hoping all information is kept close together. This was not clearly documented, but has always been how the code works (and documentation never promised something else either). Next release will have improved documentation, saying this: If you don't want sloppy scanning, then you will need to accept that it takes longer time, using "--lines=0". If you think licensecheck should be slightly less sloppy by stopping only when both *some* license and *some* copyright holder is detected, then please file a separate wishlist bugreport about that. - Jonas
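The difference between the sloppy behaviour described above and the slightly less sloppy variant can be sketched in Python. This is a simplification and not licensecheck's implementation: it works line by line rather than in chunks, and the patterns and the scan function with its need_both parameter are hypothetical.

```python
import re

# Hypothetical, simplified patterns for the sketch only.
LICENSE_HINT = re.compile(r"Apache License|Permission is hereby granted", re.I)
COPYRIGHT_HINT = re.compile(r"Copyright\s+(?:\(c\)\s*)?\d{4}", re.I)


def scan(lines, need_both=False):
    """Return the line numbers of the first license and copyright hints.

    need_both=False stops at the first hint of either kind (sloppy).
    need_both=True keeps going until both kinds have been seen.
    """
    found = {"license": None, "copyright": None}
    for number, text in enumerate(lines, start=1):
        if found["license"] is None and LICENSE_HINT.search(text):
            found["license"] = number
        if found["copyright"] is None and COPYRIGHT_HINT.search(text):
            found["copyright"] = number
        got_any = any(value is not None for value in found.values())
        got_both = all(value is not None for value in found.values())
        done = got_both if need_both else got_any
        if done:
            break
    return found


# Stand-in for a LICENSE file: license name first, copyright line near the end.
sample = (
    ["Apache License", "Version 2.0, January 2004"]
    + ["terms and conditions"] * 180
    + ["Copyright 2017 Sourced Technologies S.L."]
)

print(scan(sample))
print(scan(sample, need_both=True))
```

On this sample the sloppy scan stops at line 1 with no copyright holder, while the stricter scan reads on to the last line to find it.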