#880556 parse upstream metadata files like package.json, setup.py, Cargo.toml or .gemspec files
- Package: libconfig-model-dpkg-perl
- Source: libconfig-model-dpkg-perl
- Submitter: Pirate Praveen
- Date: 2020-12-23 16:57:03 UTC
- Severity: wishlist
- Tags:
package: libconfig-model-dpkg-perl
version: 2.102
severity: wishlist

Most Ruby gems carry their license information in the .gemspec file, and most Node.js modules carry it in package.json (other languages likely have similar files). It would be good to parse these files and use that information.
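As an illustration of the request above, here is a minimal Python sketch of reading the license from a package.json file. It is not the code used by libconfig-model-dpkg-perl; the function name `license_from_package_json` is hypothetical. It assumes the usual npm conventions: a `license` string, or the older object form (`{"type": ...}`) and `licenses` array that some packages still ship.

```python
import json


def license_from_package_json(text):
    """Return a license string found in package.json content, or None."""
    data = json.loads(text)

    lic = data.get("license")
    if isinstance(lic, str):
        # Current convention: a plain string, usually an SPDX expression.
        return lic
    if isinstance(lic, dict) and "type" in lic:
        # Older object form: {"type": "MIT", "url": "..."}.
        return lic["type"]

    # Older array form: "licenses": [{"type": "MIT"}, {"type": "Apache-2.0"}].
    entries = data.get("licenses")
    if isinstance(entries, list):
        types = [e["type"] for e in entries if isinstance(e, dict) and "type" in e]
        if types:
            return " or ".join(types)

    return None
```

Joining the entries of the array form with "or" is an assumption made for this sketch; a real scanner would need to decide how to interpret multiple entries.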
Yes, good idea. I'm also thinking of parsing the content of META.yml to retrieve the same kind of information. Do you have examples of one Ruby and one Node.js package that could be used as a reference? Then I just need to find time to do this... All the best
Hi, Please find attached a proof-of-concept scanner for Rust Cargo.toml files. I’m not fluent in Perl, and I wasn’t sure where exactly to put this piece of code, but this is something you can probably polish up a bit. Thanks for considering this.
Hi, Almost exactly a year later I decided it’d be great to write a Cargo.toml parser for scan-copyrights. I fought with Perl for an hour or so and then checked the BTS… Damn it, I *had* already written one a year ago :D Have you had a chance to have a look at it? :)
For your convenience, I’ve rebased it to the current HEAD.
Hi, Sorry for the delay. That's a good start. At least, this patch tells me how to retrieve the relevant information from the toml file. However, your patch implies that the information from the toml file applies to all files that do not have a copyright header. This is often correct, except when a directory is a component from another author. I think it would be better to treat toml data like the information contained in the main README file (if present), i.e. as a hint for the top directory. I'll change your patch to that effect. Do you have an example of a Rust package that I could add to my test suite? All the best Dod
In Rust crates, upstream authors most often put copyright and licensing information only into Cargo.toml and rarely add it to the sources, unless those come from elsewhere. The issue with this is that the information about the years is usually missing. I don’t think I’m aware of the distinction between a hint and what I did: I thought that '.*' basically was a hint of a sort :) I guess almost any Rust package in Debian could serve as an example, but let’s say this one is quite typical: https://sources.debian.org/src/rust-num-traits/0.2.14-1/
Done. The latest version of libconfig-model-dpkg-perl can parse Cargo.toml files. Please check if this fits your requirements. All the best Dod
Files: *
Copyright: MIT or Apache-2.0
License: The Rust Project Developers

I guess it should be the other way around? :)
oh my.... I can't believe I did not see this bug ... ok, I'm going to fix this.
.gemspec files are written in Ruby. I don't really know how to extract the relevant information from such a file. I don't think using a regexp to parse the gemspec file would be reliable. Do you have other ideas? All the best