Completely agree this is important, and you are definitely not the first one to mention this… As with other new features requested for Fedora dist-git CI, we would like to feature-match the existing one first and then add new stuff on top.
I don’t think people routinely check tarballs. I’m not sure how much that was ever the case, but the effort to check even a single tarball by eyeballing it is just too large. (I occasionally use diffoscope, either on the source tarball or on the binary rpms, if I have a more obscure upstream and I think a spot check is useful, but that only works for small packages, e.g. Python ones, that haven’t changed too much between releases.) I think we should accept that maintainers don’t do this.
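(For anyone unfamiliar with that kind of spot check, a minimal sketch of what I mean; the package names, versions, and file names below are just placeholders:)

```
# compare the previous release's tarball against the new one before updating
diffoscope foo-1.2.2.tar.gz foo-1.2.3.tar.gz

# or compare old and new binary rpms after a scratch build
diffoscope foo-1.2.2-1.fc42.x86_64.rpm foo-1.2.3-1.fc42.x86_64.rpm
```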
So the benefit of having a human first download the tarball to their machine, calculate the checksum, commit the checksum to a file, and then upload the tarball to the look-aside cache, versus having a machine do the same steps, is small. And with packit automation, we’re already doing this.
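Concretely, the manual steps being compared are roughly these (a sketch of the usual fedpkg flow; package name, version, and URL are placeholders):

```
curl -LO https://example.org/releases/foo-1.2.3.tar.gz
sha512sum foo-1.2.3.tar.gz              # eyeball the checksum against upstream, if published
fedpkg new-sources foo-1.2.3.tar.gz     # uploads to the look-aside cache and rewrites the sources file
git commit -am "Update to 1.2.3"
```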
That’s a good question. I think an extra commit would be fine.
This is a very individual matter: some people do this, while others do not. It is a very important question that we should discuss elsewhere. I don’t believe we should accept that maintainers don’t review tarballs; they are responsible for the code they bring into Fedora. If we are to change any related process, we should be cautious about any change that even remotely demotivates the people who still do review.
Is this asking for any policy change?
The Forgejo movement is a great thing. Let’s not make it bitter because of this.
Similarly to the last time something like this was discussed, I feel like it is optimizing for the wrong thing. As a packager, by far the main source of toil is coordinated rebuilds and updates, where I need to juggle half a dozen to a hundred packages to build in the right order inside a side-tag. This is very easy to screw up, and in my experience is one of the major barriers to getting new people onboarded on packaging. It’s also a mostly automatable workflow, and as @decathorpe mentioned most packagers have their own collection of shell scripts and stuff to reduce toil here. I have nothing against a pull request workflow as proposed, and I think it’d definitely be nice to have CI improvements in general, but I don’t think the proposed approach makes things better here (and if we were to forbid direct pushes it would actively make things worse).
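For reference, the kind of sequence I mean, done by hand today (tag, package, and NVR names below are made up):

```
fedpkg request-side-tag                   # e.g. returns f42-build-side-12345
fedpkg build --target f42-build-side-12345                       # from the first package's checkout
koji wait-repo f42-build-side-12345 --build libfoo-1.2.3-1.fc42  # wait until it lands in the buildroot
fedpkg build --target f42-build-side-12345                       # from the next package's checkout, in order, and so on
bodhi updates new --from-tag f42-build-side-12345 --notes "Rebuild for libfoo 1.2.3"
```

Multiply that by dozens of packages that have to land in the right order, and the toil adds up quickly.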
I concur, the sources file should contain the updated hashes, and that should be good enough.
BTW, not all sources can be downloaded. In that case, I’d propose allowing a link to an SRPM, which would already contain the sources pre-prepared.
And no, I am not a fan of a “preprocess_sources” script or whatnot, because it goes against the idea that the .spec file contains all the required information. If custom tarballs are included, that information is typically quite important.
Is this something that is in scope for onboarding someone? Really? Or is this an advanced packaging topic?
Because when I think of onboarding, I’m thinking about having a process where someone gets to the serotonin hit of contributing value to the cause fast enough that they raise their hand for future toil, again and again. Juggling dozens or hundreds of packages sounds more like a veteran issue, for people who know where all the bodies are buried. And I’m not discounting that it’s important to address that toil… but I wouldn’t call it an onboarding consideration.
What if direct pushes were an earned-trust sort of thing, and SIGs, as part of self-organizing their work, got to decide who gets that access?
Again, I would expect that people who are “onboarding” wouldn’t get access to it, and SIGs would generally want to use a more interactive process for onboarding or drive-by contributions. But SIG veterans might, because there is understood process complexity that the tooling doesn’t handle yet and that requires additional human trust to work through.
I’d much prefer to have active groups of people, SIGs in good standing, take responsibility for deciding who needs direct-push access, because there is some rough consensus in that group about how the more complex parts of the work need to get done.
Yes, in my experience it’s something one will hit very quickly when trying to contribute. Almost any non-trivial package has dependencies, often those dependencies are also not packaged, and they need to be dealt with via side tags; it’s absolutely not an advanced topic, IMO. This is even more common for language ecosystems like Rust that tend to split things into small components. I remember having to work with side tags very early on when I started packaging in Fedora, and I’ve personally had to teach this to multiple people to get them unblocked. I also know of at least one person I was working with who ended up giving up because of the amount of toil involved.
There is one very important aspect that I was just reminded of (oh well…): forbidding direct pushes prevents accidental ones, i.e. pushes that were meant to go to a fork but end up in the main repo instead.
The problem is that the main repo allows fast-forward pushes only. It does so for a good reason, but it means you cannot correct a mistake by force-pushing a new ref (e.g. a rewritten HEAD). Ideally, those repos or branches should not be pushed to directly.
I know some people are worried about their tooling. Let’s put it differently: we have one repo or branch A (say rpms/foo.git:rawhide) which “drives koji”, i.e. where you push for a release build. And we have another repo or branch B which you push to for CI. Currently, some of us use B=A, while others have B on a fork. If working with B is as simple as working with A (and then some), and ff-merging B into A is frictionless enough, then current A-users need not worry about their tooling, while we reap the benefits of a rewritable branch B and CI there.
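A rough sketch of what that could look like day-to-day (remote and branch names are purely illustrative):

```
git checkout -b wip-update rawhide
# ...commit, force-push to the fork (B) as often as needed, let CI run there...
git push --force fork wip-update

# once CI is green, fast-forward the main branch (A) and push for the release build:
git checkout rawhide
git merge --ff-only wip-update    # refuses anything that is not a fast-forward
git push origin rawhide           # this is the push that "drives koji"
```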
No. The only real piece missing from Dist-Git is that we don’t integrate with the Git CLI the way Git LFS and Git Annex do, and that is only because that facility wasn’t really there when Dist-Git was created. We could do it now if we wanted to, but we already encourage people to use fedpkg instead anyway. Dist-Git is optimized for us, and critically, it makes it easy to fork Fedora to build derived distributions. Red Hat, Amazon, and others leverage this to have pristine connections between Fedora and their downstream distributions (CentOS Stream, Amazon Linux, etc.).
This is dependent on a number of factors, particularly around how LFS is implemented in the other Git server. And Git LFS mirroring is quirky since the LFS server is encoded in the extended data for the Git repository, and changing the LFS server location changes the Git repo data itself too. I’ve personally experienced a number of issues mirroring repositories with LFS data between GitHub and GitLab, much less with the Gogs family and other forge systems.
I wouldn’t want to go down this road.
(And “IMHO” doesn’t make sense in this context because that’s about opinions, FYI)
Given that I’m primarily an openSUSE guy and am used to how the Open Build Service handles things (yeet your .spec and a tarball at OBS and it just figures out the build order and whatnot on the backend), yeah, I’d say it’s an onboarding consideration.
I stepped up to help maintain the LXQt stack in Fedora, and the process is just painful if you have to update more than one package.