F41 Change Proposal - Reproducible Package Builds (System-Wide)

Reproducible Package Builds

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

Summary

A post-build cleanup step is integrated into the RPM build process to remove common causes of build irreproducibility, making most Fedora packages reproducible.

Owner

Detailed Description

As of 2023 there is an active effort to implement reproducible builds in Fedora. Reproducible builds will allow our users to independently verify that the RPMs have not been tampered with (either maliciously or via hardware/software fault): someone can do an independent rebuild of a package and confirm that they get identical binaries when building with the same versions of the compiler and other tools. This Change allows us to move forward in this direction by removing the common sources of irreproducibility.
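
The independent check described above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: it assumes the two builds have already been unpacked into two directory trees, and the names `file_digest` and `compare_builds` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root: Path) -> set:
    """Collect the relative paths of all regular files under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_builds(first: Path, second: Path) -> dict:
    """Map each relative path that differs between two trees to a short reason."""
    files_first, files_second = list_files(first), list_files(second)
    report = {}
    for name in sorted(files_first | files_second):
        if name not in files_second:
            report[name] = "only in first build"
        elif name not in files_first:
            report[name] = "only in second build"
        elif file_digest(first / name) != file_digest(second / name):
            report[name] = "content differs"
    return report
```

An empty result means the two trees are bit-for-bit identical. For a detailed view of what differs inside a file, a tool such as diffoscope is the better choice.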

add-determinism is a Rust program which, as its name suggests, adds determinism to the files it is given as input: it attempts to standardize metadata contained in binary or source files and clamps timestamps to $SOURCE_DATE_EPOCH. add-determinism is the “Fedora version” of strip-nondeterminism from the Debian project. Since strip-nondeterminism is written in Perl, it is undesirable for use in Fedora, as we don’t want to pull Perl into the buildroot for every package.
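
To make “clamping to $SOURCE_DATE_EPOCH” concrete, here is a minimal Python sketch. It is not taken from add-determinism; the function names are hypothetical, and the choice to leave timestamps alone when the variable is unset is an assumption of this sketch.

```python
import os
from typing import Mapping, Optional


def source_date_epoch(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Read SOURCE_DATE_EPOCH (seconds since the Unix epoch), or None if unset."""
    raw = environ.get("SOURCE_DATE_EPOCH")
    return int(raw) if raw else None


def clamp_timestamp(timestamp: int, epoch: Optional[int]) -> int:
    """Clamp a timestamp so that it is never newer than the epoch.

    Timestamps older than the epoch are kept as they are. If no epoch is
    given, the timestamp is returned unchanged (an assumption of this sketch).
    """
    if epoch is None:
        return timestamp
    return min(timestamp, epoch)
```

Clamping, as opposed to overwriting, keeps timestamps that are already older than the reference date and only pulls back the ones that were set during the build.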

It’s worth noting that this Change does not intend to impose any specific reproducibility requirements on Fedora packages. Once this Change is implemented and we have been through a mass rebuild and can verify that the common causes of irreproducibility have indeed been removed, we can consider further steps. But that will be at least one release later.

This change does add a small amount of time to the processing of RPMs at the end of a build. Accordingly, packages containing a large number of files or very large files will be slower to build, but this effect is not expected to be noticeable. add-determinism takes steps to ensure it does not interfere with other buildroot post-processors like mangle-shebangs, python-hardlink, python-bytecompile. It defaults to not doing any modifications in case it doesn’t understand the input file or there are any other problems.

A mechanism to opt out will be provided: to either completely disable the postprocessing step or to disable specific “handlers” (i.e. implementations of cleanup for specific file types, for example static archives). See macros.build-reproducibility.

Related Changes

Feedback

Benefit to Fedora

Adding determinism (i.e., removing non-determinism) enables the Fedora community to have confidence that, if given the same source code, build environment, build instructions, and metadata from the build artifacts, any party can recreate copies of the artifacts that are identical except for the signatures and some parts of metadata.

Reproducibility of builds leads to packages of higher quality. It turns out that quite often those irreproducible bits are caused by an error or sloppiness in the code. In particular, any dependence on architecture in noarch packages is almost always unwanted and/or a bug. Test builds that check reproducibility will expose such instances.

Reproducibility of builds makes it easier to develop packages: when a small change is made and a package is rebuilt (in the same environment), then with a reproducible package, the only difference is directly caused by the change. If the package is different every time it is rebuilt, making a comparison is much harder.

Build reproducibility for noarch subpackages solves the problem where package builds on different architectures are different, causing mock to reject the whole build. In particular, this issue occurs for pyc files. This will now be solved without requiring opt-in from individual packages.

Scope

  • Proposal Owners:

    • Integrate add-determinism as a BuildRoot Policy script
    • Add a dependency on marshalparser to python3 (probably conditionalized on rpm-build)
  • Other Developers:

    • Test their packages with the additional phase, report problems
    • Potentially integrate changes to packages to enable reproducibility
  • Release Engineering: Ideally we want this to happen before the mass rebuild, but that is not strictly required.

  • Policies and Guidelines: Fedora Packaging Guidelines should be updated to include information on the add-determinism BuildRoot Policy. User documentation should be amended to include instructions on how to verify reproducibility for a given package, which packages are known to be non-reproducible, and how to opt out.

  • Trademark approval: N/A (not needed for this Change)

  • Alignment with Community Initiatives: All software and requests are consistent with the decision process and similar across other groups in Fedora. The Fedora Reproducibility Working Group began at Flock 2023 in Cork.

Upgrade/compatibility impact

No impact is expected.

How To Test

To test on the level of individual files:

  • install add-determinism
  • call SOURCE_DATE_EPOCH=… add-determinism -v ./path/to/file

To test package builds:

(This can be done on a normal system or in a mock chroot.)

User Experience

No impact is expected.

Dependencies

Contingency Plan

  • Contingency mechanism:
    • In case of major problems, disable the change in redhat-rpm-config.
    • In case of problems with specific packages, opt-out by setting a macro.
  • Contingency deadline: No limit really.
  • Blocks release? No.

Documentation

Release Notes

Fedora package builds are now more deterministic, bringing the distribution closer to the goal of achieving fully reproducible builds for all of its packages.

Last edited by @amoloney 2024-04-12T21:45:15Z

How do you feel about the proposal as written?
  • Strongly in favor
  • In favor, with reservations
  • Neutral
  • Opposed, but could be convinced
  • Strongly opposed

If you are in favor but have reservations, or are opposed but something could change your mind, please explain in a reply.

We want everyone to be heard, but many posts repeating the same thing actually make that harder. If you have something new to say, please say it. If, instead, you find someone has already covered what you’d like to express, please simply give that post a :heart: instead of reiterating. You can even do this by email, by replying with the heart emoji or just “+1”. This will make long topics easier to follow.

Please note that this is an advisory “straw poll” meant to gauge sentiment. It isn’t a vote or a scientific survey. See About the Change Proposals category for more about the Change Process and moderation policy.

Related and worth reading (or at least skimming): the current discussions on the devel mailing list:

Related topics:
reprodubible builds (re)introduction
F41 Change Proposal - Reproducible Package Builds (System-Wide)

Discussions that contain a related discussion tree (related “sub-discussion” contained):
Three steps we could take to make supply chain attacks a bit harder

Rather than rewriting something that works, especially if it’s doing textual and environmental modification and written in a scripting language that excels at both, I’d really prefer the upstream package were used as-is instead of being rewritten in Rust. The drive to remove Perl from the buildroot was born of a perhaps overzealous attempt to optimize away every tiny bit of size from the buildroots, and I don’t know that adding complexity from forks and reinterpretations is a great substitution.

This looks reasonable to me overall.

We should perhaps try and get an audit of the upstream code, and make sure any changes to it are inspected very closely. This package will be used on a vast number of packages, so would be a very juicy target. :frowning:

Agreed - I have reviewed the add-determinism code, but I’m not an expert in Rust (or security). That said, the strategy taken is sound (imo) and is a lot more maintainable than the alternative tool.

More thorough inspection is likely warranted, as would be review of future updates to this package. That is something I think is a bit of a risk due to the nature of the crates being used as dependencies, but as discussed on the mailing list, the crates are fairly “standard”. (On the other hand, infecting dependent crates would have impacts wider than this single package.)

This is just some marketing speak, but what does it do in reality?

The readme from the links above gives more technical details. As of this post, that is:

Build postprocessor to reset metadata fields for build reproducibility

This crate provides a binary add-determinism that takes one or more paths,
and will recursively process those paths,
attempting to run the handlers on any files with extensions that match.

For each processed file, a temporary file is opened,
the contents are rewritten,
the modification timestamp is copied from the original file to the temporary copy,
and the copy is renamed over the original.

If processing fails, a warning is emitted,
but no modifications are made and the program returns success.
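
The rewrite strategy described above (temporary file, copied modification timestamp, rename over the original, and no changes on failure) can be sketched in Python as follows. This is an illustration of the general technique, not the tool’s actual code; `rewrite_in_place` and the `transform` callback are hypothetical names.

```python
import contextlib
import os
import stat
import sys
import tempfile
from typing import Callable, Optional


def rewrite_in_place(path: str, transform: Callable[[bytes], Optional[bytes]]) -> bool:
    """Rewrite a file through a temporary copy; return True if it was replaced.

    Any failure produces a warning and leaves the original file untouched.
    """
    try:
        st = os.stat(path)
        with open(path, "rb") as src:
            original = src.read()
        updated = transform(original)
        if updated is None or updated == original:
            return False  # nothing to normalize
        # Create the temporary file next to the original so the rename
        # stays on the same filesystem and is atomic.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "wb") as out:
                out.write(updated)
            os.chmod(tmp, stat.S_IMODE(st.st_mode))
            # Copy the timestamps from the original to the temporary copy.
            os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns))
            os.replace(tmp, path)
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp)
            raise
        return True
    except Exception as exc:
        print(f"warning: {path}: {exc}", file=sys.stderr)
        return False
```

Writing to a temporary file and renaming it means a reader never sees a half-written file, and an error in the handler cannot corrupt the build output.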

Processors

ar

Accepts *.a.

Resets the embedded modification times to $SOURCE_DATE_EPOCH and owner:group to 0:0.
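
As a rough illustration of what such a handler has to do, here is a Python sketch that walks the 60-byte member headers of an ar archive and rewrites the timestamp and owner fields. It is not the Rust implementation. Two choices here are assumptions of the sketch: timestamps are clamped (never moved forward), and the special GNU members `/` and `//` are left untouched.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # name[16] mtime[12] uid[6] gid[6] mode[8] size[10] magic[2]


def normalize_ar(data: bytes, epoch: int) -> bytes:
    """Return a copy of an ar archive with clamped mtimes and owner:group 0:0."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    out = bytearray(data)
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(out):
        header = bytes(out[pos:pos + HEADER_LEN])
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        size = int(header[48:58].decode("ascii").strip())
        # "/" (symbol table) and "//" (long file names) are special members.
        if header[0:16].rstrip() not in (b"/", b"//"):
            mtime_field = header[16:28].decode("ascii").strip()
            mtime = int(mtime_field) if mtime_field else 0
            # Fields are ASCII decimal numbers, left-aligned and space-padded.
            out[pos + 16:pos + 28] = b"%-12d" % min(mtime, epoch)
            out[pos + 28:pos + 34] = b"%-6d" % 0  # uid
            out[pos + 34:pos + 40] = b"%-6d" % 0  # gid
        # Member data is padded to an even length.
        pos += HEADER_LEN + size + (size % 2)
    return bytes(out)
```

Because the fields have a fixed width, the archive can be patched in place without moving any member data.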

jar

Accepts *.jar.

This rewrites the zip file using the zip crate.
The modification times of archive entries are clamped to $SOURCE_DATE_EPOCH.
Extra metadata, i.e. primarily timestamps in UNIX format and DOS permissions,
are stripped (also because the crate does not support them).
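
The same idea can be sketched with Python’s standard `zipfile` module: copy every entry into a fresh archive, clamp its timestamp, and drop the extra fields by building a new `ZipInfo`. This is an illustration only, not a port of the handler. The sketch interprets stored timestamps as UTC and copies the permission bits unchanged; both are assumptions of the sketch.

```python
import io
import time
import zipfile

ZIP_MIN_DATE = (1980, 1, 1, 0, 0, 0)  # earliest date the zip format can store


def normalize_zip(data: bytes, epoch: int) -> bytes:
    """Rewrite a zip archive with entry timestamps clamped to the epoch."""
    limit = max(tuple(time.gmtime(epoch)[:6]), ZIP_MIN_DATE)
    buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(buffer, "w") as dst:
        for info in src.infolist():
            # A fresh ZipInfo carries no extra fields or comments.
            clean = zipfile.ZipInfo(info.filename, date_time=min(info.date_time, limit))
            clean.compress_type = info.compress_type
            clean.external_attr = info.external_attr
            dst.writestr(clean, src.read(info))
    return buffer.getvalue()
```

Entries are written in their original order, so running the function twice on the same input gives byte-identical output.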

javadoc

Accepts *.html.

This looks at the <head> portion of an HTML file and finds standard
lines inserted by Javadoc that specify the file creation date.
For example,
<!-- Generated by javadoc (<version>) on <date> --> is replaced by a version without the version and date,
and <meta name="dc.created" content="<date>"> is replaced by a version with $SOURCE_DATE_EPOCH.
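
A small Python sketch of this substitution, for illustration only. The regular expressions are guesses based on the two example lines above rather than the patterns the tool uses, and `normalize_javadoc_head` is a hypothetical name.

```python
import re
import time

# Patterns modelled on the two example lines; assumptions of this sketch.
GENERATED_RE = re.compile(r"<!-- Generated by javadoc \([^)]*\) on [^>]*-->")
CREATED_RE = re.compile(r'(<meta name="dc\.created" content=")[^"]*(">)')


def normalize_javadoc_head(html: str, epoch: int) -> str:
    """Normalize Javadoc creation markers in the <head> part of an HTML page."""
    head_end = html.find("</head>")
    if head_end < 0:
        head, rest = html, ""
    else:
        head, rest = html[:head_end], html[head_end:]
    date = time.strftime("%Y-%m-%d", time.gmtime(epoch))
    # Drop the javadoc version and the generation date from the comment.
    head = GENERATED_RE.sub("<!-- Generated by javadoc -->", head)
    # Replace the creation date with the date of SOURCE_DATE_EPOCH (UTC).
    head = CREATED_RE.sub(lambda m: m.group(1) + date + m.group(2), head)
    return head + rest
```

Only the text before `</head>` is touched, so page content that happens to look like one of these lines is left alone.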

pyc

Accepts *.pyc.

Uses the MarshalParser Python module
to clean up the internal Python object serialization in cache files.

Notes

This project is inspired by
strip-nondeterminism,
but is written from scratch in Rust.
For Debian, build tools are written in Perl and more Perl is not an issue.
But in Fedora/RHEL/…, tools are written in Bash, Python, or compiled,
and we don’t want to pull in Perl into all buildroots.

Or perhaps reach out to the Debian folks and see if they would also benefit from the same?

Or perhaps reach out to the Debian folks and see if they would also benefit from the [same]?

[I assume that by “the same” you mean this tool.]

The considerations for Debian are different: their packaging stack is based on Perl, in particular debhelper. So adding new tools in Perl is essentially free, and also they by necessity have a bunch of folks who are fluent in Perl. Their tooling was written a long time ago, and in particular strip-nondeterminism is 10 years old. It’s likely that they’d pick a different implementation language if they were starting from scratch.

There’s another aspect: the “handlers”, i.e. the specific implementations that we’ll need, are slightly different. For example, Debian does not package pre-generated pyc files; they are generated on the end system, so they never hit this particular issue. Also, Fedora does much more extensive preparation of debuginfo data. We hit some issues with how that’s generated that were not encountered by other distributions working on build reproducibility. In the end this was solved in gdb, but if we were adding a handler for this, it’d again be completely new code. So it’s actually not the case that we could take their tooling and plug it into our build system; extensive modifications would need to be made.

OTOH, AFAIK, Debian still has issues with Rust. (At least in the past they had long discussions about some Rust libraries being required by the core system.) I’m not sure if they are ready to put Rust so deep in their build system. Maybe if it turns out to work well, they can consider switching. But that means we’d need to add the handlers that they need. It’s possible, but probably not in the short term.

[Discourse is being “helpful” and removing the quote I inserted at the top. I added “” to confuse it.]

This is just some marketing speak, but what does it do in reality? [confuse discourse]

To add to what @mattdm wrote: the tool can be called just fine on any file, there is no magic.

For example:

$ cp /usr/lib64/libresolv.a /tmp/ 
$ add-determinism -v /tmp/libresolv.a
Initialized logging with log level DEBUG
Requested handlers: ar, jar, javadoc, pyc (strict=false)
SOURCE_DATE_EPOCH timestamp: (unset)
Running as add-determinism… (brp=false)
Initialized handler ar.
Initialized handler jar.
Initialized handler javadoc.
Initialized handler pyc.
Looking at /tmp/libresolv.a…
/tmp/libresolv.a: handler ar: true
/tmp/libresolv.a: reading file header at offset 8
...
/tmp/libresolv.a: replacing with normalized version
/tmp/libresolv.a: handler jar: false
/tmp/libresolv.a: handler javadoc: false
/tmp/libresolv.a: handler pyc: false

$ diffoscope --text-color=always /usr/lib64/libresolv.a /tmp/libresolv.a
--- /usr/lib64/libresolv.a
+++ /tmp/libresolv.a
├── file list
│ @@ -1,21 +1,21 @@
│  ----------   0        0        0     1018 2024-03-26 23:54:58.000000 /
│  ----------   0        0        0        0 1970-01-01 00:00:00.000000 //
│ --rw-r--r--   0     1000      425     3120 2024-03-26 23:54:58.000000 base64.o
│ --rw-r--r--   0     1000      425     1048 2024-03-26 23:54:58.000000 compat-gethnamaddr.o
│ --rw-r--r--   0     1000      425     1032 2024-03-26 23:54:58.000000 compat-hooks.o
│ --rw-r--r--   0     1000      425     2632 2024-03-26 23:54:58.000000 inet_net_ntop.o
│ --rw-r--r--   0     1000      425     3960 2024-03-26 23:54:58.000000 inet_net_pton.o
│ --rw-r--r--   0     1000      425     2216 2024-03-26 23:54:58.000000 inet_neta.o
│ --rw-r--r--   0     1000      425     2784 2024-03-26 23:54:58.000000 ns_date.o
│ --rw-r--r--   0     1000      425     2256 2024-03-26 23:54:58.000000 ns_name.o
│ --rw-r--r--   0     1000      425     1808 2024-03-26 23:54:58.000000 ns_netint.o
│ --rw-r--r--   0     1000      425     3696 2024-03-26 23:54:58.000000 ns_parse.o
│ --rw-r--r--   0     1000      425    19792 2024-03-26 23:54:58.000000 ns_print.o
│ --rw-r--r--   0     1000      425     2448 2024-03-26 23:54:58.000000 ns_samedomain.o
│ --rw-r--r--   0     1000      425     4384 2024-03-26 23:54:58.000000 ns_ttl.o
│ --rw-r--r--   0     1000      425     2088 2024-03-26 23:54:58.000000 res-putget.o
│ --rw-r--r--   0     1000      425     1736 2024-03-26 23:54:58.000000 res_data.o
│ --rw-r--r--   0     1000      425    28064 2024-03-26 23:54:58.000000 res_debug.o
│ --rw-r--r--   0     1000      425     2600 2024-03-26 23:54:58.000000 res_hostalias.o
│ --rw-r--r--   0     1000      425     2136 2024-03-26 23:54:58.000000 res_isourserver.o
│ --rw-r--r--   0     1000      425     2128 2024-03-26 23:54:58.000000 resolv-deprecated.o
│ +-rw-r--r--   0        0        0     3120 2024-03-26 23:54:58.000000 base64.o
│ +-rw-r--r--   0        0        0     1048 2024-03-26 23:54:58.000000 compat-gethnamaddr.o
│ +-rw-r--r--   0        0        0     1032 2024-03-26 23:54:58.000000 compat-hooks.o
│ +-rw-r--r--   0        0        0     2632 2024-03-26 23:54:58.000000 inet_net_ntop.o
│ +-rw-r--r--   0        0        0     3960 2024-03-26 23:54:58.000000 inet_net_pton.o
│ +-rw-r--r--   0        0        0     2216 2024-03-26 23:54:58.000000 inet_neta.o
│ +-rw-r--r--   0        0        0     2784 2024-03-26 23:54:58.000000 ns_date.o
│ +-rw-r--r--   0        0        0     2256 2024-03-26 23:54:58.000000 ns_name.o
│ +-rw-r--r--   0        0        0     1808 2024-03-26 23:54:58.000000 ns_netint.o
│ +-rw-r--r--   0        0        0     3696 2024-03-26 23:54:58.000000 ns_parse.o
│ +-rw-r--r--   0        0        0    19792 2024-03-26 23:54:58.000000 ns_print.o
│ +-rw-r--r--   0        0        0     2448 2024-03-26 23:54:58.000000 ns_samedomain.o
│ +-rw-r--r--   0        0        0     4384 2024-03-26 23:54:58.000000 ns_ttl.o
│ +-rw-r--r--   0        0        0     2088 2024-03-26 23:54:58.000000 res-putget.o
│ +-rw-r--r--   0        0        0     1736 2024-03-26 23:54:58.000000 res_data.o
│ +-rw-r--r--   0        0        0    28064 2024-03-26 23:54:58.000000 res_debug.o
│ +-rw-r--r--   0        0        0     2600 2024-03-26 23:54:58.000000 res_hostalias.o
│ +-rw-r--r--   0        0        0     2136 2024-03-26 23:54:58.000000 res_isourserver.o
│ +-rw-r--r--   0        0        0     2128 2024-03-26 23:54:58.000000 resolv-deprecated.o

And the same for the other types… If you have doubts about what the tool is doing, please check! In fact, it’s probably better if people do tests, rather than trusting that I didn’t make any mistakes :wink:

For the javadoc html handler:

$ cp /usr/share/javadoc/aqute-bnd/aQute/bnd/annotation/Cardinality.html /tmp
$ SOURCE_DATE_EPOCH=1234567 add-determinism /tmp/Cardinality.html
/tmp/Cardinality.html: replacing with normalized version
$ diffoscope /usr/share/javadoc/aqute-bnd/aQute/bnd/annotation/Cardinality.html /tmp/Cardinality.html
--- /usr/share/javadoc/aqute-bnd/aQute/bnd/annotation/Cardinality.html
+++ /tmp/Cardinality.html
@@ -1,15 +1,15 @@
 <!DOCTYPE HTML>
 <html lang="en">
 <head>
-<!-- Generated by javadoc (21) on Sat Mar 02 16:07:41 UTC 2024 -->
+<!-- Generated by javadoc -->
 <title>Cardinality</title>
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-<meta name="dc.created" content="2024-03-02">
+<meta name="dc.created" content="1970-01-15">
 <meta name="description" content="declaration: package: aQute.bnd.annotation, enum: Cardinality">
 <meta name="generator" content="javadoc/ClassWriterImpl">

This change proposal has now been submitted to FESCo with ticket #3201 for voting.

To find out more, please visit our Changes Policy documentation.

This change has been accepted by FESCo for Fedora Linux 41. A full list of approved changes to date can be found on the Change Set Page.

To find out more about how our changes policy works, please visit our docs site.