- Package: libstring-copyright-perl
- Source: libstring-copyright-perl
- Submitter: Ximin Luo
- Date: 2023-05-14 15:36:04 UTC
- Severity: normal
- Tags:
Dear Maintainer,

For https://sources.debian.net/src/sagemath/7.6-2/sage/src/sage/misc/edit_module.py/

    $ licensecheck --copyright src/sage/misc/edit_module.py
    src/sage/misc/edit_module.py: GPL [Copyright: 2007 Nils Bruin <nbruin@sfu.ca> and]

This is wrong, but I can work around it with the following sed script:

    $ cat src/sage/misc/edit_module.py | tr '\n' '\t' | sed -e 's/\(,\|\band\)\s*\t#\?\s*/\1 /g' | tr '\t' '\n' > fixed.py
    $ licensecheck --copyright fixed.py
    fixed.py: GPL [Copyright: 2007 Nils Bruin <nbruin@sfu.ca> and William Stein <wstein@math.ucsd.edu>]

It would be good if this logic were incorporated into licensecheck itself. I'd help, but my perl is really bad.

(Also perhaps the # in the regex should be a (?:#|//|/*) or something like that)

X
Looks like I can do this by editing /usr/share/perl5/String/Copyright.pm as follows:

      # stringify objects
      $copyright = "$copyright";
    + $copyright =~ s/(,|\band)\s*\n(?:#|\/\/|\/\*)?\s*/$1 /g;

Please test and apply if it's good!

X
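For readers who do not have the Perl module at hand, the substitution proposed above can be tried out in isolation. The following is a minimal Python sketch of the same regex, not the String::Copyright code itself; the function name `join_continuations` and the sample header are assumptions made for illustration, modelled on the two-author notice from the original report.

```python
import re

# Python rendering of the Perl substitution quoted above:
#   s/(,|\band)\s*\n(?:#|\/\/|\/\*)?\s*/$1 /g
# A line ending in "," or "and" is joined with the next line, dropping an
# optional leading comment marker (#, // or /*) on that next line.
JOIN_RE = re.compile(r"(,|\band)\s*\n(?:#|//|/\*)?\s*")


def join_continuations(text: str) -> str:
    """Join continuation lines of a copyright statement (hypothetical helper)."""
    return JOIN_RE.sub(r"\1 ", text)


# Hypothetical input, modelled on the notice discussed in this report.
header = (
    "# Copyright (C) 2007 Nils Bruin <nbruin@sfu.ca> and\n"
    "#                    William Stein <wstein@math.ucsd.edu>\n"
)

print(join_continuations(header), end="")
```

Note that this joins on any trailing "," or "and", regardless of what the next line contains, so it can pull unrelated text into the statement.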
Ximin Luo:

This breaks some of my test cases; attached is an updated patch. It gives good results for Sage:

    $ licensecheck -l200 --copyright src/sage/plot/arrow.py src/sage/combinat/words/paths.py src/sage/sets/finite_set_maps.py src/sage/modular/modform/all.py
    src/sage/plot/arrow.py: GPL [Copyright: 2006 Alex Clemesha <clemesha@gmail.com>, William Stein <wstein@gmail.com>, 2008 Mike Hansen <mhansen@gmail.com>, 2009 Emily Kirkman]
    src/sage/combinat/words/paths.py: GPL (v2 or later) [Copyright: 2009 Sebastien Labbe <slabqc@gmail.com>, / 2008 Arnaud bergeron <abergeron@gmail.coms>,]
    src/sage/sets/finite_set_maps.py: GPL [Copyright: 2010 Florent Hivert <Florent.Hivert@univ-rouen.fr>,]
    src/sage/modular/modform/all.py: GPL [Copyright: 2004-2006 William Stein <wstein@gmail.com>]

It's a little complicated - it uses replacement expressions. If you can think of a better way of doing it, please let me know!

X
Hi Ximin,

Quoting Ximin Luo (2017-07-05 17:45:17)

Unfortunately it is not as simple as throwing a regex at it: One of my reasons for taking over and working on licensecheck was a remark once on d-devel@ that it was far too slow to be usable for Chromium, and I wanted to (silently so as to not make too much of a fool of myself) take the challenge of optimizing it.

Unlike in its days living in devscripts, licensecheck routines to match copyright holders have been separated into a new library String::Copyright (libstring-copyright-perl in Debian), and the code has been refactored to use a single large RE2-compatible regex to match each copyright statement, in the hope of some day switching to use the RE2 engine and become faster...

My first brief look at this has revealed a few bugs: In the next release of licensecheck the leading # is stripped _before_ handing over to String::Copyright code (as was intended for years).

Have a look (if interested) at /usr/share/perl5/String/Copyright.pm and in particular the (huge when expanded) $signs_and_more_re at line 138. Replacing $blank_re with $blank_or_break_re in $owners_re (line 136) succeeds in detecting the second copyright holder, but then also bogusly includes the license statement as a copyright holder.

That is the most elegant signature I have seen. Ever! It beats my primary school teacher who used "kh" to mean both her initials and an abbreviation of the Danish equivalent of "kind regards".

- Jonas
Quoting Ximin Luo (2017-07-05 20:07:00) Thanks! I thought you wrote you were not into perl ;-) I will take a closer look and get back to you on this. - Jonas
Quoting Ximin Luo (2017-07-05 20:07:00) The patch relaxes the $dash_re regex to match multiple dashes. Can you provide me an example of where that is useful? - Jonas
Jonas Smedegaard:

Thanks for the tips! I'm not sure if you got my other follow-ups to the bug report - I did in fact find String::Copyright, but I didn't know about the history nor plans for it, so thanks for filling me in on that.

At any rate, here is an updated version of my patch, along with some test cases for Sage's copyright notices.

I did try to think of a way to achieve the same logic *inside* the massive $re regexes. However I don't think this is possible, at least with my current approach - which tries to be conservative in order to adapt to humans being annoyingly inconsistent. What it does is, it joins subsequent lines only when the indent is greater than the main line (with the "Copyright" part). This means I have to call length() in an expression-replacement, which I don't think is possible to do inside a normal regex...

As for speed:

    # with the patch
    $ time debian/rules debian/licensecheck.copyright
    licensecheck -l250 -i ^sage/build/ -r --deb-machine --merge-licenses sage > "debian/licensecheck.copyright"

    real    0m35.318s
    user    0m35.204s
    sys     0m0.056s

    # without the patch
    $ time debian/rules debian/licensecheck.copyright
    licensecheck -l250 -i ^sage/build/ -r --deb-machine --merge-licenses sage > "debian/licensecheck.copyright"

    real    0m31.168s
    user    0m31.040s
    sys     0m0.076s

X
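The indent rule described in this message (join a following line only when it is indented further than the line carrying "Copyright") can be sketched outside of a regex. The code below is a Python illustration under stated assumptions, not the attached Perl patch: it assumes comment markers have already been stripped from the input, and the names `indent_of` and `join_indented` as well as the sample lines are hypothetical.

```python
import re

COPYRIGHT_RE = re.compile(r"copyright", re.IGNORECASE)


def indent_of(line: str) -> int:
    """Number of leading whitespace characters on a line."""
    return len(line) - len(line.lstrip())


def join_indented(lines):
    """Join lines that are indented deeper than the preceding 'Copyright' line.

    Assumes comment markers (#, //, ...) were stripped beforehand.
    """
    out = []
    main_indent = None  # indent of the current copyright line, if any
    for line in lines:
        stripped = line.strip()
        if COPYRIGHT_RE.search(line):
            # A new statement starts here; remember how far it is indented.
            main_indent = indent_of(line)
            out.append(line.rstrip())
        elif main_indent is not None and stripped and indent_of(line) > main_indent:
            # Deeper indent than the main line: treat as a continuation.
            out[-1] += " " + stripped
        else:
            # Same or shallower indent, or a blank line: the statement has ended.
            main_indent = None
            out.append(line.rstrip())
    return out


# Hypothetical input, modelled on the Sage notices mentioned in this thread.
sample = [
    " Copyright (C) 2006 Alex Clemesha <clemesha@gmail.com>,",
    "                    William Stein <wstein@gmail.com>,",
    " Distributed under the terms of the GNU General Public License (GPL)",
]

for joined in join_indented(sample):
    print(joined)
```

Because the license line is not indented further than the copyright line, it stays separate, which is the conservative behaviour the message is aiming for.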
Jonas Smedegaard:

Yes, if you look at copyright-test.sh that I just sent in that other email, and run it in the sage/ directory of the sagemath package, you'll see that this $dash_re is useful for src/sage/modular/modform/all.py:

    # Copyright (C) 2004--2006 William Stein <wstein@gmail.com>

X
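To illustrate the double-dash case, here is a small Python sketch of a year-range pattern that accepts one or more hyphens. It is not the actual $dash_re from String::Copyright (whose definition is not shown in this thread); the pattern and the name `normalize_year_ranges` are assumptions for illustration only.

```python
import re

# Two four-digit years separated by one or more ASCII hyphens,
# with optional whitespace around the hyphens.
YEAR_RANGE_RE = re.compile(r"\b(\d{4})\s*-+\s*(\d{4})\b")


def normalize_year_ranges(text: str) -> str:
    """Rewrite ranges such as 2004--2006 to the single-dash form 2004-2006."""
    return YEAR_RANGE_RE.sub(r"\1-\2", text)


line = "# Copyright (C) 2004--2006 William Stein <wstein@gmail.com>"
print(normalize_year_ranges(line))
```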
Quoting Ximin Luo (2017-07-05 21:17:00)

I did see your other emails, but only after I posted my initial reply (I am slow at writing emails).

I have now published App::Licensecheck 3.0.30 to CPAN, and if it survives CPANtesters inspections then I will release that to Debian.

That release does not fix the topic of this bugreport, but it does fix a bug in that String::Copyright expects plain text as input but was passed text with comment markers by App::Licensecheck. That seems to be what complicates your patch, so I will ask you to please try again with that newer App::Licensecheck to see how much you can reduce the patch.

If you want to try with the 3.0.30 release before it gets packaged for Debian, you can do it like this:

    sudo apt install cpanminus
    cpanm App::Licensecheck
    export PATH="$HOME/perl5/bin:$PATH"
    export PERL5LIB="$HOME/perl5/lib/perl5"

...and when done exploring (assuming you want _any_ local CPAN gone):

    rm -rf ~/perl5 ~/.cpanm

NB! It is easiest for me if you file a new bugreport for each separate issue - e.g. the one of not matching double-dashed year ranges. Fine if you work on a patch that addresses multiple issues, but still safer to report the issues separately, so that I don't accidentally miss fixing some of it, e.g. if I choose to resolve things differently than with your tested patch.

Thanks :-)

- Jonas