This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.
Summary
License texts for a subset of commonly used, approved licenses will be shipped in common-licenses. Packages can opt to reference this rather than shipping their own license texts.
Email: michel AT michel-slm DOT name, ngompa13 AT gmail DOT com
Detailed Description
A few common licenses are used by a majority of Fedora packages (e.g. GPL/LGPL, BSD, Apache). Many of these licenses have standardized text (MIT notably does not, since the license file contains a copyright notice specific to the distributed software), and our guidelines currently require shipping the license text if it is provided by upstream.
This causes a lot of duplication - as a back of envelope calculation:
[https://www.gnu.org/licenses/gpl-3.0.txt GPL-3.0 text] is 35K
Assuming 25K source packages, each with only one binary package that carries the license (other subpackages inherit the license file via dependency), and a 20K average license size, we are wasting up to 500 MB on disk (the actual figure will be far lower, since most people won't have all packages installed).
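The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the package count and average size are the assumed figures from the text, and the function name is made up for illustration.

```python
# Back-of-envelope estimate using the assumed figures quoted in the proposal.
SOURCE_PACKAGES = 25_000   # assumed number of source packages
AVG_LICENSE_KB = 20        # assumed average license text size, in KB


def wasted_megabytes(packages: int, avg_kb: int) -> float:
    """Upper bound on duplicated license text, in MB (taking 1 MB = 1000 KB)."""
    return packages * avg_kb / 1000


print(wasted_megabytes(SOURCE_PACKAGES, AVG_LICENSE_KB))  # 500.0
```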
We propose pre-shipping common license texts in an RPM, common-licenses. This will have the texts in the following layout, using GPL-3.0 as an example:
license text in /usr/share/common-licenses/GPL-3.0/LICENSE
valid identifiers in /usr/share/common-licenses/GPL-3.0/identifiers.json - in this case, ["GPL-3.0-only", "GPL-3.0-or-later"]
a package that wants to use this license text can do so with Requires: common-license(GPL-3.0-or-later); the package comes with a dependency generator that will index all identifiers.json files shipped
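As a rough illustration of what indexing the identifiers.json files could look like, here is a minimal Python sketch. It is not the actual dependency generator: the function names are hypothetical, the tree is built in a scratch directory rather than /usr/share/common-licenses, and the LICENSE content is a placeholder.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def common_license_provides(root: Path) -> List[str]:
    """Collect one common-license(<identifier>) capability per listed identifier."""
    provides = []
    for ident_file in sorted(root.glob("*/identifiers.json")):
        for identifier in json.loads(ident_file.read_text()):
            provides.append(f"common-license({identifier})")
    return provides


def build_example_tree(root: Path) -> None:
    """Recreate the GPL-3.0 example layout under a scratch directory."""
    license_dir = root / "GPL-3.0"
    license_dir.mkdir(parents=True)
    (license_dir / "LICENSE").write_text("(license text placeholder)\n")
    (license_dir / "identifiers.json").write_text(
        json.dumps(["GPL-3.0-only", "GPL-3.0-or-later"])
    )


with tempfile.TemporaryDirectory() as tmp:
    build_example_tree(Path(tmp))
    for capability in common_license_provides(Path(tmp)):
        print(capability)
```

With the example tree, this lists common-license(GPL-3.0-only) and common-license(GPL-3.0-or-later), which is what a package's Requires would then match against.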
Feedback
Benefit to Fedora
Once this gets adopted more broadly we will start seeing a non-trivial amount of disk space saving.
This will also benefit packagers - they will have a set of standard license texts on disk, rather than having to look them up in Fedora documentation, or SPDX, etc.
In cases where the license requires shipping a copy of the text but upstream does not do this, the issue should still be addressed upstream; in the meantime, we can unblock packagers by having the source RPM declare BuildRequires: common-license(the-license-identifier) for each missing license.
Scope
Proposal owners:
** package common-licenses
** submit packaging guidelines changes to allow referencing common licenses in specific situations
** update rpmlint to accept requiring a common license in lieu of shipping license texts
Other developers:
** FPC - review packaging guidelines changes
** packagers - dogfood the new process
Policies and guidelines: N/A (not needed for this Change)
Trademark approval: N/A (not needed for this Change)
Alignment with the Fedora Strategy:
Upgrade/compatibility impact
Early Testing (Optional)
Do you require "QA Blueprint" support? Y/N
How To Test
User Experience
Dependencies
Contingency Plan
Contingency mechanism: (What to do? Who will do it?)
The RPM can still be shipped, as it just ships license files. Until we get guidelines updated to specify how packagers can make use of them, packages simply have to ship license texts as they do now and should not reference the license texts shipped in common-licenses yet.
If you are in favor but have reservations, or are opposed but something could change your mind, please explain in a reply.
We want everyone to be heard, but many posts repeating the same thing actually make that harder. If you have something new to say, please say it. If, instead, you find someone has already covered what you'd like to express, please simply give that post a heart instead of reiterating. You can even do this by email, by replying with the heart emoji or just "+1". This will make long topics easier to follow.
Please note that this is an advisory "straw poll" meant to gauge sentiment. It isn't a vote or a scientific survey. See About the Change Proposals category for more about the Change Process and moderation policy.
How do I use this as a packager? Could you provide a simple example? E.g. do I create some symbolic links, or simply omit the LICENSE file entirely and replace it with a runtime Requires?
I'm concerned that anything that requires packagers to manually replace license files with symbolic links and/or add manual Requires will be too labor-intensive to be useful. I'm also concerned about the possibility of accidentally mixing up similar license texts in a manual process.
I remember some time in the past that there was an effort to deduplicate these files by using automatic detection and then hardlinking them. Wouldn't that be a more efficient solution (and require less packager effort)?
So 3064 easily tested for dupes and 1600-ish unique files.
Without getting into fine details, simple deduping using fdupes-style discovery would yield roughly two-thirds savings in both space and file count on my system.
The 1600 or so unique files are most likely indicative of small typographical differences in the same intended license. How many of those remaining 1600 are actually unique licenses, rather than unintended typographical variants? That's harder to determine.
Example from my system:
NetworkManager/COPYING NetworkManager-vpnc/COPYING
are both version 2 GPL with only minor typographical differences and show up in different sets of duplicates according to fdupes.
In fact it looks like there are 3 different minor typographical variants for the GPLv2 in use for the NetworkManager* packages installed on my system when looking closely at the duplicate set information.
Also note that according to fdupes the systemd/LICENSE.GPL2 file is a unique typographical variant of the GPLv2, and not a duplicate. It's nearly the same as the NetworkManager/COPYING file... with mostly indentation differences and an interesting difference of opinion as to what the correct mailing address for the FSF Boston office is. That's a weird difference...
I've got 59 files that use the 59 Temple address and 1224 files that use the 51 Franklin address. I guess the mailing address changed at some point. But this brings up an interesting question: since the mailing address is not merely a typographical difference, would common-licenses provide variants with both mailing addresses? I'm not sure it's a material difference.
Anyway, you don't have to get into that level of detail of trying to sort out the typographical variance or trying to resolve intent. The bulk of the savings on my system comes from simply deduping the 3k down to 401 unique files in the same manner fdupes did its duplicate discovery. There's no interpretation of intent needed for that, as the match is a checksum and/or bitwise dedupe.
Can we do what fdupes is doing with existing packaging mechanisms in an automatic fashion and just live with the small number of variants of each license type?
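For readers unfamiliar with this style of discovery, here is a minimal Python sketch in the spirit of fdupes. It is a simplification: it groups files purely by SHA-256 digest (fdupes itself also compares sizes and does a byte-by-byte check), and the directory names and file contents in the demo are invented for illustration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def group_identical_files(root: Path) -> List[List[Path]]:
    """Return groups of regular files under root whose contents are byte-identical."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only digests seen more than once are duplicates worth linking.
    return [paths for paths in by_digest.values() if len(paths) > 1]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical package directories; the texts are placeholders, not real licenses.
    for name, text in [("pkg-a", "variant one\n"),
                       ("pkg-b", "variant one\n"),
                       ("pkg-c", "variant  one\n")]:  # differs only in whitespace
        (root / name).mkdir()
        (root / name / "COPYING").write_text(text)
    for group in group_identical_files(root):
        print([str(p.relative_to(root)) for p in group])
```

Note that pkg-c is not grouped with the other two: a whitespace-only difference already yields a different digest, which is exactly the "typographical variant" situation described above.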
There is one additional long-term benefit in making the licensing a set of requirements instead of just a metadata string: it may help us and our downstreams deal with compliance issues by making sure licenses are on disk on systems that get Fedora installed on them.
If all the binary Fedora packages had to have an explicit set of license requires that could be fulfilled by a set of packages, it may help us prevent and backstop corner-case mistakes with subpackages that technically don't need the main package to function.
There are subpackages out there that by policy are required to pull in the main package just to get the license file. It's difficult to imagine a linting scheme that can deal with the current policy because it's highly contextual. Policy makes license files a reason to establish a package dependency without actually making licensing an explicit set of dependencies that rule-based tooling can lint for.
But if licensing was explicitly an RPM dependency requirement for every single binary package, we could lint on that, at the cost of additional complexity in the dep resolution system.
If we express the license file requirements explicitly as a set of deps, then we can better ensure (sub)packages get installed with the license files they are supposed to have, and we can dedupe. Whether the requirements are fulfilled by one common package or a whole bunch of different packages is less important to me than making license files a requirement to be fulfilled as part of the package build and release system's mandate, to help prevent mistakes on this.
One of the items I see with this sort of package is that including it in Fedora at first isn't a problem as a reference package which could be installed. Then the work of how things could be/would be used by packagers can be worked out by FESCo for a different release. However, this is a legal-oriented package, so who will keep this package up to date for address changes, license additions/changes, format issues and SPDX redefinitions of names? This starts to make it a political package to keep in shape, which may need more review than a single packager.
Also what is the benefit for this? Slimming down containers with one RPM versus having some sort of dedup step? Or something else?
I think the gain here is not at all worth all this churn.
The amount of duplicated space is trivial.
Unless I am missing something, this also doesn't change anything about requirements, does it? We already require packages to ship license text where available. Where they do not, it's already a bug and should be fixed.
Thank you for writing this up, and I agree this is a problem, but I'm not sure common-licenses is going to make it much better. As others said, this seems like it would create a lot of extra manual work for packagers (unless the Change comes along with additional automation and Guidelines, which it does not currently) and would also require more decision making (what is the official license text, are minor textual differences significant, etc.) and potentially cause confusion.
1. It would be better if the package manager itself or a file trigger could reflink or hardlink duplicate files in the /usr/share/licenses directory as part of the transaction (if that's not too performance intensive).
2. Or maybe we could compress license texts automatically during the build process like we do with manpages.
3. Or perhaps the kiwi container image build process itself should deduplicate licenses (do Docker/OCI images support hardlinks?) since that's where saving a couple megabytes seems to matter most.
Option 1 would especially help out the new Go tooling, since each package now includes the license text for each vendored dependency. I guess license texts are deduplicated within an individual package by the hardlink BRP, but it would be nice to deduplicate license texts (including duplicated MIT/BSD variants) across the boundary of a single package.
Saving 30MB of space is not a convincing argument for making every packager do extra work, and neither is the reduction of inodes. Modern file systems already do something similar.
Reducing redundancy would convince me, though. Indeed: during SPDX conversion, we already checked that packages use licenses from a certain standard set. So, for a package with a confirmed SPDX license, we do not need to check again: the license tag says it all. It's only the licenses' requirement to ship the file which forces us to ship the actual file (in addition to the tag).
Someone with "legal" in their job description would need to clarify whether any of the following is still compliant with the licenses' requirements:
ship the license verbatim, but only indirectly (dependency) rather than as part of the package
ship the license in an "equivalent form" (for some definition of "equivalent") rather than verbatim
They moved to Temple Place in 1995 and then to Franklin St in 2005. They're now fully remote, so neither address is in use now, and the current version of the license just gives a URL.
This is on purpose. If you look at the Change description, we wrote out that it's a common base prefix and there's a JSON file that enumerates the valid identifiers.
One thing @salimma and I are looking into is whether we could auto-generate dependencies on licenses from common-licenses if their identifiers are present in the License: field. It would be a nice optimization to reduce the toil.
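To make the idea concrete, here is a hedged Python sketch of how identifiers might be pulled out of a License: field and turned into dependencies. It is not the generator being worked on: the function name is hypothetical, the tokenizer is deliberately naive (it only skips AND/OR/WITH and the exception name following WITH), and the set of identifiers covered by common-licenses is assumed to be supplied by the caller.

```python
import re
from typing import List, Set


def license_requires(license_field: str, known_identifiers: Set[str]) -> List[str]:
    """Map identifiers in an SPDX-style License: value to common-license() deps.

    Only identifiers present in known_identifiers produce a dependency;
    anything else would still need its license text shipped in the package.
    """
    requires: List[str] = []
    skip_next = False
    for token in re.findall(r"[A-Za-z0-9.+-]+", license_field):
        if skip_next:              # token after WITH is an exception, not a license
            skip_next = False
            continue
        keyword = token.upper()
        if keyword == "WITH":
            skip_next = True
            continue
        if keyword in ("AND", "OR"):
            continue
        dep = f"common-license({token})"
        if token in known_identifiers and dep not in requires:
            requires.append(dep)
    return requires


known = {"GPL-3.0-only", "GPL-3.0-or-later", "Apache-2.0"}
print(license_requires("(GPL-3.0-or-later AND MIT) OR Apache-2.0", known))
```

In this example MIT yields no dependency because it is not in the assumed set, which matches the point in the proposal that MIT has no single standardized text.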
Indirect dependency on verbatim duplicates is allowed under current subpackaging policy. The question of "equivalent form" is probably something that needs to be considered.
If this change is accepted, I would encourage packagers to take a conservative approach to the use of the common license package, and only replace verbatim duplicates (testable with fdupes), until there is better clarity with regard to the question of the use of equivalent license text. Better still, if there was some sort of linter logic that helped ensure the common licenses dep was used to only replace strict duplicates (testable via fdupes), that would be a very interesting guardrail to ensure packagers who opt in to use the common licenses package aren't overstepping and replacing something that isn't a strict dupe.
Assuming for the rest of this post that the indirect dep verbatim license file is acceptable, but an equivalent license text is not...
I can envision the common licenses package containing several variants of the same license with checksum information. I can also envision a dep generator/rpm macro making use of that checksum info to ensure the common licenses package dep is used when there is a verbatim file copy available, and if the verbatim copy isn't available then a copy of the license is installed instead.
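A minimal Python sketch of that decision, under stated assumptions: the table of known digests per identifier is hypothetical (nothing in the proposal defines such a file), the function names are invented, and the license texts in the demo are placeholders.

```python
import hashlib
from typing import Dict, Optional, Set


def sha256_hex(content: bytes) -> str:
    """Hex SHA-256 digest of a license file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def common_license_requirement(
    content: bytes, license_id: str, known_digests: Dict[str, Set[str]]
) -> Optional[str]:
    """Return a Requires line if content is a verbatim copy of a known variant.

    None means no verbatim copy is available, so the package should keep
    installing its own license file.
    """
    if sha256_hex(content) in known_digests.get(license_id, set()):
        return f"Requires: common-license({license_id})"
    return None


# Placeholder texts standing in for two typographical variants of one license.
variant_a = b"license text, variant A\n"
variant_b = b"license text,  variant B\n"
known = {"GPL-2.0-only": {sha256_hex(variant_a)}}

print(common_license_requirement(variant_a, "GPL-2.0-only", known))
print(common_license_requirement(variant_b, "GPL-2.0-only", known))
```

The first call yields a Requires line and the second yields None, so only the strict duplicate is replaced by the dependency.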
So the first thing I thought I would use this for would not work without having to use a command line tool versus my eyes. I would normally just want to look at a spec file License tag and then look up what it was with ls /usr/share/common-licenses/${SPDX ID}/LICENSE but instead I will need some sort of Captain Crunch decoder ring to map things instead.
My second item would be using the directory names as a short cut to remember what the SPDX might be. I would instead put GPL-3.0 in the spec file to get dinged for it not being the correct format. I would expect this would happen a lot for both new and experienced spec writers.
Another way it could be done is with symlinks, but either way, a basename is needed for licenses that have alternative tags instead of suffixes in SPDX parlance.
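The lookup from a License tag identifier to the base directory does not have to be done by eye. Here is a hedged Python sketch of such a helper, assuming only the layout described in the Change text; the function name is hypothetical and the demo builds a scratch tree instead of reading /usr/share/common-licenses.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def find_license_text(root: Path, identifier: str) -> Optional[Path]:
    """Return the LICENSE path whose identifiers.json lists identifier, if any."""
    for ident_file in sorted(root.glob("*/identifiers.json")):
        if identifier in json.loads(ident_file.read_text()):
            return ident_file.parent / "LICENSE"
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    base = root / "GPL-3.0"
    base.mkdir()
    (base / "LICENSE").write_text("(license text placeholder)\n")
    (base / "identifiers.json").write_text(
        json.dumps(["GPL-3.0-only", "GPL-3.0-or-later"])
    )
    found = find_license_text(root, "GPL-3.0-or-later")
    print(found.relative_to(root) if found else "not covered")
```

Both GPL-3.0-only and GPL-3.0-or-later resolve to the same GPL-3.0/LICENSE file here, which is the reason for the shared base name.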
And the GPL-3.0 identifier is valid, even if it is "deprecated" with SPDX v3.0.