#880556 parse upstream metadata files like package.json, setup.py, Cargo.toml or .gemspec files

#880556#5
Date:
2017-11-02 08:58:12 UTC
From:
To:
package: libconfig-model-dpkg-perl
version: 2.102
severity: wishlist

Most Ruby gems carry their license information in the .gemspec file,
and most Node.js modules carry it in package.json (other languages
likely have similar files). It would be good to parse these files and
use that information.
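
To make the package.json case concrete, here is a minimal sketch in Python (not the Perl code used by libconfig-model-dpkg-perl). The function name `license_from_package_json` is hypothetical; the sketch only relies on the npm conventions that `license` normally holds an SPDX expression string and that older packages may use a `{"type": ...}` object or a `licenses` array.

```python
import json


def license_from_package_json(text):
    """Return the license identifiers declared in a package.json document."""
    data = json.loads(text)

    # Legacy form: "licenses" is an array (or a single object) of
    # {"type": ..., "url": ...} entries.
    legacy = data.get("licenses") or []
    if not isinstance(legacy, list):
        legacy = [legacy]

    found = []
    # Current form: "license" is an SPDX expression string.
    for entry in [data.get("license")] + legacy:
        if isinstance(entry, dict):
            entry = entry.get("type")
        if isinstance(entry, str) and entry and entry not in found:
            found.append(entry)
    return found


if __name__ == "__main__":
    sample = '{"name": "example", "version": "1.0.0", "license": "MIT"}'
    print(license_from_package_json(sample))
```

The result is a list because the legacy `licenses` form can name more than one license; a caller would still have to decide how to map it to a single License field.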

#880556#10
Date:
2017-11-05 10:49:29 UTC
From:
To:
Yes, good idea. I'm also thinking of parsing the content of META.yml to
retrieve the same kind of information.

Do you have examples of one Ruby and one Node.js package that could be used as
a reference?

Then I just need to find time to do this...

All the best

#880556#19
Date:
2019-12-13 15:11:23 UTC
From:
To:
Hi,

Please find attached a proof-of-concept scanner for Rust Cargo.toml files.
I’m not fluent in Perl, and I wasn’t sure where exactly to put this
piece of code, but this is something you can probably polish up a bit.

Thanks for considering this.

#880556#24
Date:
2020-12-12 18:32:54 UTC
From:
To:
Hi,

Almost exactly a year later I decided it’d be great to write a
Cargo.toml parser for scan-copyrights. I fought with Perl for an hour or
so and then checked the BTS… Damn it, I *have* already written one, a
year ago :D

Have you had a chance to have a look at it? :)

#880556#29
Date:
2020-12-12 18:38:14 UTC
From:
To:
For your convenience, I’ve rebased it to the current HEAD.

#880556#34
Date:
2020-12-13 14:43:14 UTC
From:
To:
Hi

Sorry for the delay.

That's a good start. At least, this patch tells me how to retrieve the
relevant information from the toml file.

However, your patch implies that the information from the toml file applies to
all files that do not have a copyright header. This is often correct, except
when a directory is a component from another author.

I think it would be better to treat toml data like the information contained
in the main README file (if present), i.e. as a hint for the top directory.

I'll change your patch to that effect.
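
One possible reading of the difference, sketched in Python rather than in the scanner's Perl: instead of a catch-all entry for every header-less file, the metadata is attached to the directory where it was found, and a header-less file takes the hint of its nearest enclosing directory. The function `apply_hints`, the data layout, and the nearest-directory rule are all assumptions made for illustration, not the scanner's actual behaviour.

```python
from pathlib import PurePosixPath


def apply_hints(files, hints):
    """Fill in missing license info from the nearest directory hint.

    files: dict mapping relative path -> license string, or None when the
           file has no copyright header.
    hints: dict mapping directory ('.' for the top directory) -> license
           string taken from a metadata file found in that directory.
    """
    resolved = {}
    for path, info in files.items():
        if info is None:
            # Walk up from the file's directory towards the top directory
            # and stop at the first directory that carries a hint.
            for parent in PurePosixPath(path).parents:
                if str(parent) in hints:
                    info = hints[str(parent)]
                    break
        resolved[path] = info
    return resolved


if __name__ == "__main__":
    files = {"src/lib.rs": None, "vendor/foo/a.c": None, "build.rs": "Apache-2.0"}
    hints = {".": "MIT or Apache-2.0", "vendor/foo": "BSD-3-clause"}
    for path, info in apply_hints(files, hints).items():
        print(path, "->", info)
```

In this sketch a bundled component is only kept apart if its directory carries its own hint; a directory from another author with no metadata file would still inherit the top-level value.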

Do you have an example of a Rust package that I could add to my test suite?

All the best

Dod

#880556#39
Date:
2020-12-15 12:37:22 UTC
From:
To:
In Rust crates, upstream authors most often put copyright and licensing
information into Cargo.toml only and rarely add it to the sources,
unless those come from elsewhere. The issue with this is that the
information about the years is usually missing.

I don’t think I’m aware of the distinction between a hint and what I
did: I thought that '.*' basically was a hint of a sort :)

I guess almost any Rust package in Debian could serve as an example, but
let’s say this one is quite typical:

https://sources.debian.org/src/rust-num-traits/0.2.14-1/

#880556#44
Date:
2020-12-20 18:12:45 UTC
From:
To:
Done. The latest version of libconfig-model-dpkg-perl can parse Cargo.toml files.

Please check if this fits your requirements.

All the best

Dod

#880556#49
Date:
2020-12-21 11:05:20 UTC
From:
To:
Files: *
Copyright: MIT or Apache-2.0
License: The Rust Project Developers

I guess it should be the other way around? :)
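
For reference, the intended mapping can be sketched as follows (Python, purely illustrative): the authors listed in Cargo.toml go into the Copyright field and the license expression goes into the License field. `render_files_stanza` is a hypothetical helper, and the sample values are the ones from the output quoted above.

```python
def render_files_stanza(authors, license_expr, pattern="*"):
    """Render a debian/copyright Files stanza from manifest metadata."""
    if not authors:
        raise ValueError("at least one author is needed for Copyright")
    lines = [f"Files: {pattern}"]
    # Authors belong in Copyright: the first goes on the field line and the
    # rest on continuation lines, which must start with whitespace.
    first, *rest = authors
    lines.append(f"Copyright: {first}")
    lines.extend(f"           {name}" for name in rest)
    # The license expression belongs in License, not in Copyright.
    lines.append(f"License: {license_expr}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_files_stanza(["The Rust Project Developers"], "MIT or Apache-2.0"))
```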

#880556#54
Date:
2020-12-21 13:39:52 UTC
From:
To:
oh my.... I can't believe I did not see this bug ...

ok, I'm going to fix this.

#880556#59
Date:
2020-12-23 16:55:32 UTC
From:
To:
.gemspec files are written in Ruby. I don't really know how to extract the
relevant information from such a file. I don't think using a regexp to parse
the gemspec file would be reliable.

Do you have other ideas?

All the best