#472199 licensecheck: DEP-5 output is imperfect

Package:
licensecheck
Source:
licensecheck
Submitter:
Jan Hauke Rahm
Date:
2025-04-08 11:39:02 UTC
Severity:
wishlist
Tags:
Blocked By:
Bug Title
828948

  3

licensecheck: file parsing: extract strings from binary files with binwalk or hachoir

wishlist stable testing unstable about 1 year ago

960694

  4

licensecheck: detect and skip binary files by default

wishlist stable testing unstable about 1 year ago

828941

  5

licensecheck: file parsing: extract strings from binary files with binwalk or hachoir

wishlist stable testing unstable about 1 year ago

960695

  4

licensecheck: add usage option, to cover more than current "only look for source code"

wishlist stable testing unstable about 1 year ago

837973

  4

licensecheck: file parsing: extract metadata from fonts

normal stable testing unstable about 1 year ago

526701

  5

licensecheck: options: recursive scan should check all files (--check=".*" by default)

normal stable testing unstable about 3 years ago

545193

  5

licensecheck: options: recursive scan should check all files (--recursive should imply --check=".*")

normal stable testing unstable about 3 years ago

#472199#5
Date:
2008-03-22 16:42:29 UTC
From:
To:
Dear devscripts devel team,

since there is a proposal to change debian/copyright files to a
machine-interpretable format (described at [1]) I'd like to see
licensecheck to autogenerate such a file.
This would of course not prevent the maintainer from checking the
license and copyright information by himself, but it could do the
groundwork for him.

If there is already something planned like that, I'm sorry for this bug.

Thanks for your work!

Hauke

[1] http://wiki.debian.org/Proposals/CopyrightFormat

#472199#10
Date:
2008-04-21 14:07:46 UTC
From:
To:

Dear devscripts developers,

I'm not really a Perl hacker, but I made an attempt at adding this feature to
licensecheck myself. It is *NOT* ready and does *NOT* do what it should.
It just adds a possible way to output license information as requested
by the copyright proposal.

Well, I tried to comment on every sensible line I've added so that you
can clearly understand what I did. Furthermore I tried to not touch
anything you wrote so I don't think that I broke anything.
Although I added an option "createfile" my patch does not create a new
copyright file. It just prints whatever could be put into such a file. I
guess, doing the file handling is not the most difficult part of the
job. :)

I'd like to hear your comments on my patch. If you feel it is not worth
working on it just tell me. I can stop at any time.

Cheers,
Hauke

#472199#15
Date:
2009-04-01 14:30:37 UTC
From:
To:
Hi,
I've come up with a reworked patch [attached] which might help. I've
intentionally removed the grouping of similarly licensed/copyrighted files to
make it clear that it is a skeleton and must be checked.

Also, I've added only the license keywords currently present in DEP5
http://dep.debian.net/deps/dep5/

thanks,
filippo
--
Filippo Giunchedi - http://esaurito.net - 0x6B79D401

It's not that I'm afraid to die, I just don't want to be there
when it happens.
-- Woody Allen

#472199#22
Date:
2010-01-12 21:33:40 UTC
From:
To:
Hi,

quite some time has passed without any news in this bug report. I'm
trying again with a new patch.
It uses parts of the previously submitted patches, but differs in some aspects:
 * I wrote it first, and then checked the BTS for patches - my bad ;-).
 * I re-added the file grouping according to license+copyright similarity,
   as IMVHO the removal of this feature defeats the whole purpose
   of automating this information retrieval.
 * It parses all licenses (known to licensecheck), irrespective of their
   mentioning in DEP5, as otherwise the previous patch might have missed
   some licenses completely. Non-standardized (as seen from DEP5) license
   abbreviations are still prefixed by "other", for clarity.
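
To illustrate the grouping described in the list above, here is a minimal
sketch in Python (not the attached Perl patch). It assumes per-file results
are already available as (path, license, copyright) tuples; the set of
standardized short names, the "other-" separator and all function names are
hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Assumption: a small illustrative set of standardized short names.
# The authoritative list is in the DEP5 document, not here.
KNOWN_SHORT_NAMES = {"GPL-2+", "GPL-3+", "LGPL-2.1+", "Expat", "Apache-2.0"}


def short_name(license_name):
    """Keep standardized names as-is; prefix anything else with 'other-'."""
    if license_name in KNOWN_SHORT_NAMES:
        return license_name
    return "other-" + license_name


def group_files(records):
    """Group file paths that share the same license and copyright notice."""
    groups = defaultdict(list)
    for path, license_name, copyright_notice in records:
        key = (short_name(license_name), copyright_notice)
        groups[key].append(path)
    return dict(groups)


def render_stanzas(groups):
    """Render one Files stanza per group, sorted for stable output."""
    stanzas = []
    for (license_name, copyright_notice), paths in sorted(groups.items()):
        # Continuation lines of a multi-line field start with a space.
        files = "\n ".join(sorted(paths))
        stanzas.append(
            "Files: {}\nCopyright: {}\nLicense: {}".format(
                files, copyright_notice, license_name
            )
        )
    return "\n\n".join(stanzas)
```

Every stanza produced this way is still only a skeleton: the maintainer has
to verify each group against the actual source files.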

It works for me with a real world example, without breaking current
functionality. Of course, an arbitrary amount of further polishing is
possible, but I think it would already be worth adding in its current
state, as it at least helps a bit in the creation of a machine-readable
debian/copyright file. Even with this, there is of course enough work left
to do for the maintainer to get her/his debian/copyright file into shape.

Thanks for consideration!

I am of course very open to any kind of comments!

Best Regards,
Jan

#472199#27
Date:
2010-04-05 07:48:53 UTC
From:
To:
While discussing the topic of using cdbs on pkg-multimedia, I found that
there are some interesting bits in cdbs that might have a better home in
the devscripts package. Among others is the licensecheck2dep5 script,
which can be found attached. It is driven by this rule in
1/rules/utils.mk, which can be trivially translated to shell or perl:

,----[ excerpt from 1/rules/utils.mk.in ]
| debian/stamp-copyright-check:
| 	@set -e; if [ ! -f debian/copyright_hints ]; then \
| 		echo; \
| 		echo '$(if $(DEB_COPYRIGHT_CHECK_STRICT),ERROR,WARNING): copyright-check disabled - touch debian/copyright_hints to enable.'; \
| 		echo; \
| 		$(if $(DEB_COPYRIGHT_CHECK_STRICT),exit 1,:); \
| 	elif [ licensecheck = $(DEB_COPYRIGHT_CHECK_SCRIPT) ] && ! which licensecheck > /dev/null; then \
| 		echo; \
| 		echo '$(if $(DEB_COPYRIGHT_CHECK_STRICT),ERROR,WARNING): copyright-check disabled - licensecheck (from devscripts package) is missing.'; \
| 		echo; \
| 		$(if $(DEB_COPYRIGHT_CHECK_STRICT),exit 1,:); \
| 	elif [ licensecheck = $(DEB_COPYRIGHT_CHECK_SCRIPT) ] && ! licensecheck --help | grep -qv -- --copyright; then \
| 		echo; \
| 		echo '$(if $(DEB_COPYRIGHT_CHECK_STRICT),ERROR,WARNING): copyright-check disabled - licensecheck (from devscripts package) seems older than needed 2.10.7.'; \
| 		echo; \
| 		$(if $(DEB_COPYRIGHT_CHECK_STRICT),exit 1,:); \
| 	else \
| 		echo; \
| 		echo 'Scanning upstream source for new/changed copyright notices...'; \
| 		echo; \
| 		echo "$(DEB_COPYRIGHT_CHECK_INVOKE) | $(_cdbs_scripts_path)/licensecheck2dep5 > debian/copyright_newhints"; \
| 		export LC_ALL=C; \
| 		$(DEB_COPYRIGHT_CHECK_INVOKE) | $(_cdbs_scripts_path)/licensecheck2dep5 > debian/copyright_newhints; \
| 		echo "`grep -c ^Files: debian/copyright_hints` combinations of copyright and licensing found."; \
| 		newstrings=`diff -a -u debian/copyright_hints debian/copyright_newhints | sed '1,2d' | egrep -a '^\+' - | sed 's/^\+//'`; \
| 		if [ -n "$$newstrings" ]; then \
| 			echo "$(if $(DEB_COPYRIGHT_CHECK_STRICT),ERROR,WARNING): The following (and possibly more) new or changed notices discovered:"; \
| 			echo; \
| 			echo "$$newstrings" \
| 				| perl -ne '/^.{0,60}$$/ or s/^(.{0,60})\b.*$$/$$1…/;s/[^[:print:][:space:]…]//g;$$_ ne $$prev and (($$prev) = $$_) and print' \
| 				| sort -m \
| 				| head -n 200; \
| 			echo; \
| 			echo "To fix the situation please do the following:"; \
| 			echo "  1) Fully compare debian/copyright_hints with debian/copyright_newhints"; \
| 			echo "  2) Update debian/copyright as needed"; \
| 			echo "  3) Replace debian/copyright_hints with debian/copyright_newhints"; \
| 			$(if $(DEB_COPYRIGHT_CHECK_STRICT),exit 1,:); \
| 		else \
| 			echo 'No new copyright notices found - assuming no news is good news...'; \
| 		fi; \
| 		rm -f debian/copyright_newhints; \
| 	fi
| 	touch $@
`----
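
The core of the rule above is the comparison of the stored hints with the
freshly generated ones. A rough sketch of that step in Python follows; it is
not a translation of the whole rule. The function names are hypothetical,
the hints are passed in as strings rather than read from debian/, and the
truncation, de-duplication and 200-line limit of the original are left out.

```python
import difflib
import itertools


def new_notice_lines(old_hints, new_hints):
    """Return lines that a unified diff marks as added in new_hints."""
    diff = difflib.unified_diff(
        old_hints.splitlines(), new_hints.splitlines(), lineterm=""
    )
    # Skip the two header lines ("---" and "+++"), like sed '1,2d' does.
    body = itertools.islice(diff, 2, None)
    return [line[1:] for line in body if line.startswith("+")]


def check_copyright(old_hints, new_hints, strict=False):
    """Report new or changed notices; return 1 only in strict mode."""
    added = new_notice_lines(old_hints, new_hints)
    if not added:
        print("No new copyright notices found - assuming no news is good news...")
        return 0
    level = "ERROR" if strict else "WARNING"
    print(level + ": The following new or changed notices discovered:")
    for line in added:
        print(line)
    return 1 if strict else 0
```

The strict flag plays the role of DEB_COPYRIGHT_CHECK_STRICT: without it,
new notices only produce a warning and the return value stays 0.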

Please consider merging this into the devscripts package.

I also note that an alternate implementation has been posted to this bug,
which I haven't examined yet. This shows that there is (some) common
interest in this functionality.

#472199#32
Date:
2010-04-06 21:43:31 UTC
From:
To:
Hi,

I developed the CDBS licensecheck2dep5 wrapper mentioned by Reinhard,
and would like to both integrate those, merge with the other patches in
this bugreport, and develop licensecheck further.

I have requested membership of the devscripts team, and am in the
process of subscribing to the mailing list now.

Hope you are fine letting me work on this.  And thanks to Reinhard for
pushing me out of my cave :-P


   - Jonas

#472199#37
Date:
2010-04-07 08:04:05 UTC
From:
To:
Wonderful, thanks!

Extra helping hands with devscripts are always welcome, especially if
you plan to work not only on licensecheck but also to help out with some
other devscripts :-P I'm no admin of the project though, so I cannot
directly give you commit access ...

Cheers.

#472199#40
Date:
2010-08-14 12:12:09 UTC
From:
To:
Hi Jonas,

our DPL is thankful and happy about your work and would welcome your
helping hands with devscripts although he is no project admin and thus
could not grant you commit access ... but he forgot to CC you in his
reply to #472199 (possibly because he assumed you did already subscribe
to the list).


I'm just a devscripts user, but it looks like the standalone perl script
licensecheck2dep5 Jonas wrote for CDBS could be added to devscripts in
its current state if there would be a manpage for it.

There are two ways I would prefer over adding another script to /usr/bin:

 * Placing this script in /usr/share/devscripts and add an option to
   licensecheck that would filter its output through licensecheck2dep5.
 * Moving the script into a sub of licensecheck and adding an
   appropriate option to licensecheck.


Regards
Carsten

#472199#45
Date:
2010-08-14 15:00:54 UTC
From:
To:
Hi Carsten (and Zack),

Indeed I did not receive the email from Zack (I blame my own mailinglist
subscription fumbling for that, however - will try again after sending
off this message).

My intended plan for licensecheck2dep5 (currently part of cdbs) is not
to ship it as-is with devscripts, but rather do as you propose as the
second option above: integrate with licensecheck itself, controlled by
a commandline option.


I notice that someone silently(?) added me to the devscripts team
already, so take that as an approval of my proposal to work on this -
even more official than a go from our DPL ;-)

Thanks for pushing me!  Please repeat if I seem too slow progressing :-)


 - Jonas

#472199#54
Date:
2012-01-16 08:12:58 UTC
From:
To:
I was manually generating dep5 copyright file for ruby-sigar and
thought it would be nice to be able to auto-generate it (almost all
files have different copyright terms, though most of them are Apache
2.0). I was going to file a wishlist bug, but found this. I would like
to know if it is planned for a new devscripts version.

Thanks
Praveen

#472199#59
Date:
2012-08-28 20:02:53 UTC
From:
To:
Hi Jonas,
I've just discovered /usr/lib/cdbs/licensecheck2dep5, thanks to a reply
from Vincent Cheng [1].

[1] https://lists.debian.org/debian-mentors/2012/08/msg00367.html

As soon as I noticed that licensecheck2dep5 is part of the
cdbs package, I wondered why it was not part of devscripts (which looks
like the most obvious package where this tool should be shipped).
Even better, I think it should be integrated as a possible output
format for licensecheck itself...

Then I found out that I am not the only one with this opinion.
In bug #472199 (I am Cc:ing its e-mail address), a number of different
people asked for this feature to be implemented in licensecheck and you
stepped in, expressing the intention to integrate this capability into
licensecheck.
This sounds really great! Thanks a lot!

Unfortunately there seems to be no visible progress since August 2010...
Is there any well-hidden progress, by chance?!?

I am afraid that my Perl knowledge is just a smattering and it is
rusty, too... As a consequence, I will probably not be able to help
much with the implementation.
But I may help with some testing and feedback, if you are interested.

Please let me know.
Thanks for your time!

#472199#64
Date:
2012-08-29 08:24:19 UTC
From:
To:
Hi Francesco,

Unfortunately the only progress is in my head. :-/

I have been granted write access to devscripts and have followed its
mailinglist since back then, but not yet taken time to actually do this
work.

I am still interested in this, and expect to make time for it in the
upcoming months.  I have some ideas on how to support both very strict
and (as it seems most others than myself are interested in) more relaxed
scanning.

Your nudging me like this is helpful for me to bump up the priority.
So thank you for that!

I shall make noise on the bugreport when reaching something concrete to
test, so please subscribe to that if you wanna help out.

I also kinda expect others that might wanna dive in to tackle this to
make noise at the bugreport.  That way hopefully we avoid wasting time
on multiple concurrent efforts (in case I continue to not come up with
something).


 - Jonas

#472199#69
Date:
2012-08-29 17:12:45 UTC
From:
To:
On Wed, 29 Aug 2012 10:24:19 +0200 Jonas Smedegaard wrote:

[...]
[...]

That's definitely well-hidden!   ;-)
I hope you have a good backup strategy for your head...   ;-D

Good, so you are ready to start...

That's great news, indeed!

If I understand you correctly, I think the most important mode is the
"strict" one: each file has to be scanned and classified as accurately
as possible.

You're welcome.
Or, at least, I hope I managed to do so: I am not used to subscribing
to bug reports, this is the first time I try...

Thanks for your time!

Looking forward to having something to test.
Bye.

#472199#74
Date:
2012-10-24 19:37:43 UTC
From:
To:
[...]

Hi again, Jonas!
Any news?

I've just discovered bug #597861, which seems to be an interesting read
for someone (like you) willing to integrate licensecheck2dep5 into
licensecheck. Especially the part [1] where a fork of licensecheck2dep5
is talked about...

[1] http://bugs.debian.org/597861#31

Please consider joining forces with other people willing to address
this same issue, if possible.

Looking forward to receiving some (good) news from you.
Bye.

#472199#79
Date:
2015-05-14 15:22:35 UTC
From:
To:
This functionality is now provided by 'cme update dpkg-copyright'
with cme and libconfig-model-dpkg-perl.

See https://ddumont.wordpress.com/2015/04/05/improving-creation-of-debian-copyright-file/

Note that licensecheck provides one entry per scanned file.

On the other hand, 'cme' coalesces similar entries into the same DEP-5 Files entry. This greatly
reduces the size of the generated file.

All the best

#472199#88
Date:
2015-11-19 20:31:39 UTC
From:
To:
Thoughts below

First of all I think backwards compatibility should be maintained so any
change should be an option.

I was only asking about the translation of the license tags?
But then if you translate the license tags it would cost nothing to go
to a crude bloated DEP-5 format. But since I think we agree that we do
not wish to attempt to make licensecheck to produce "good" DEP-5 it
might be best to avoid moving it in that direction.
Yes, this would prompt people to raise wishlist bugs for
consolidation. Slippery slope.

Yes. Still I can't make my mind up.

done!

I am slightly puzzled why
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519080 is a wishlist
bug and not a full blown bug.

#472199#107
Date:
2018-05-22 00:17:49 UTC
From:
To:
Dear Maintainer,

currently the following methods exist for creating a DEP-5 compliant copyright
file:

 debmake -c
 cme check dpkg-copyright
 licensecheck -r --merge-licenses --deb-machine .

Thus this wishlist bug can probably be closed.

Cheers,
Lars

#472199#112
Date:
2020-05-15 12:54:52 UTC
From:
To:
Quoting Lars Kruse (2018-05-22 02:17:49)

Let's instead repurpose this bugreport to track making _ideal_ DEP5
output with licensecheck.

What we have now is this:

  licensecheck -r --merge-licenses --deb-machine --lines 0 .

The plan is to integrate these scripts:

  cdbs:/usr/lib/cdbs/license-miner
  cdbs:/usr/lib/cdbs/licensecheck2dep5

When that's done, we will likely have different opinions on what is
ideal - how aggressively (and thus slowly) licensecheck should scan, and
how the DEP-5 output should be laid out (e.g. if same-licensed files by
different authors should be grouped or not).  Hopefully we can identify
some broad patterns that groups of us can agree on being ideal, and
licensecheck can offer such "profiles" under a --usage option.


 - Jonas
