I’m interested in getting some large-scale data concerning RPM packages, and I’d like to know if it already exists somewhere, to avoid reinventing the wheel. I’m looking for something similar to the now-defunct debtags for Debian.
In particular, I’d like to know, for a given package, which programming languages it is written in.
I currently believe no such information exists, and that I’d have to develop my own tools: get a list of existing packages, download their sources, then count lines of code and classify them by language. Are there existing tools or projects that already do this?
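The last two steps of that pipeline (count lines, classify them) could look roughly like the sketch below, assuming each package’s sources have already been unpacked into a directory. The extension table and the `count_lines_by_language` / `dominant_languages` helpers are hypothetical and far cruder than what dedicated counters do (no comment stripping, no shebang detection, `.h` treated as C).

```python
import os
from collections import Counter

# Hypothetical extension-to-language table; real tools such as cloc or tokei
# use much richer detection (shebangs, file names, ambiguous extensions).
EXTENSION_LANGUAGES = {
    ".c": "C",
    ".h": "C",  # ambiguous in practice: C++ headers often use .h too
    ".cc": "C++",
    ".cpp": "C++",
    ".hpp": "C++",
    ".py": "Python",
    ".rs": "Rust",
    ".go": "Go",
    ".java": "Java",
    ".pl": "Perl",
    ".sh": "Shell",
}


def count_lines_by_language(source_dir):
    """Count non-blank lines per language under source_dir, keyed by file extension."""
    totals = Counter()
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            language = EXTENSION_LANGUAGES.get(ext)
            if language is None:
                continue  # unknown extension: skip rather than guess
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                totals[language] += sum(1 for line in handle if line.strip())
    return totals


def dominant_languages(totals, threshold=0.1):
    """Return languages accounting for at least `threshold` of all counted lines."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [lang for lang, n in totals.most_common() if n / grand_total >= threshold]
```

The threshold keeps a package from being tagged with every language that contributes a stray helper script; 10% is an arbitrary choice.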
SLOC is not a particularly great metric, but I assume you know this.
If you’re familiar with Rust, you can use tokei (see its documentation for reference); if you merely want something handy (and fairly fast), cloc is available in the repos.
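To drive cloc from a script over many unpacked source trees, its `--json` output is convenient: every top-level key is a language name except the `header` and `SUM` summary entries, and each language entry carries `nFiles`, `blank`, `comment` and `code` counts. The sketch below assumes cloc is on `PATH`; the helper names are hypothetical.

```python
import json
import shutil
import subprocess


def parse_cloc_json(raw):
    """Turn cloc's --json output into {language: code_lines}, dropping summary keys."""
    if not raw.strip():
        return {}  # no output, e.g. when cloc found nothing it recognised
    data = json.loads(raw)
    return {
        language: stats["code"]
        for language, stats in data.items()
        if language not in ("header", "SUM")
    }


def cloc_languages(source_dir):
    """Run cloc on an unpacked source tree and return code lines per language."""
    if shutil.which("cloc") is None:
        raise RuntimeError("cloc is not installed (e.g. `dnf install cloc`)")
    result = subprocess.run(
        ["cloc", "--json", "--quiet", source_dir],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cloc_json(result.stdout)
```

Unlike a naive line counter, cloc separates code from comments and blank lines, so the `code` figure is closer to what people usually mean by SLOC.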
For a package “A”, you could get the “A.src” RPM, extract the spec file from the source package, find all the BuildRequires lines in it, work out which of those entries is the compiler, and infer the language from that. For example, I got “alpine.src”, and the BuildRequires were:
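The BuildRequires approach can be sketched in Python once the spec file is extracted (for instance with `rpm2cpio A.src.rpm | cpio -idm '*.spec'`). The compiler-to-language table and helper names below are hypothetical and incomplete; macros are not expanded and `%if` conditionals are not evaluated (running the spec through `rpmspec -P` first would handle that). Also note that a `gcc` dependency only hints at C, since it is pulled in by many non-C builds too.

```python
import re

# Hypothetical, deliberately incomplete map from Fedora-style build
# dependency names to languages; extend it for your distribution.
COMPILER_LANGUAGES = {
    "gcc": "C",
    "gcc-c++": "C++",
    "gcc-gfortran": "Fortran",
    "rust": "Rust",
    "cargo": "Rust",
    "golang": "Go",
    "java-devel": "Java",
    "maven": "Java",
    "python3-devel": "Python",
    "ghc": "Haskell",
    "ocaml": "OCaml",
}

BUILDREQUIRES_RE = re.compile(r"^\s*BuildRequires\s*:\s*(.+)$", re.IGNORECASE)
VERSION_OPERATORS = {"<", "<=", "=", ">=", ">"}


def build_requires(spec_text):
    """Collect package names from BuildRequires lines, ignoring version constraints."""
    names = []
    for line in spec_text.splitlines():
        match = BUILDREQUIRES_RE.match(line)
        if not match:
            continue  # also skips commented-out lines such as "# BuildRequires: ..."
        tokens = match.group(1).replace(",", " ").split()
        skip_next = False
        for token in tokens:
            if skip_next:  # the version number following ">=", "=", etc.
                skip_next = False
            elif token in VERSION_OPERATORS:
                skip_next = True
            else:
                names.append(token)
    return names


def languages_from_spec(spec_text):
    """Guess languages from the compilers a spec file requires at build time."""
    found = []
    for name in build_requires(spec_text):
        language = COMPILER_LANGUAGES.get(name)
        if language and language not in found:
            found.append(language)
    return found
```

This is cheap (no source download or unpacking beyond the spec), but it only sees what the packager declared, so it pairs well with a line counter as a cross-check.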