Hello everyone. I’d like to get some feedback about the future of a service that is currently not working.
A bit of history first. When Fedora Extras was set up, we also had a mailing list, called at the time ‘fedora-extras-commits’. After the Fedora Extras / Core merge happened, it became the ‘scm-commits’ list. This list would get an email via a git hook with a diff of what changed in each commit on packages. Note that this was only packages, no other git repos.
This had some good uses:
Interested folks could watch this flow of emails for commits and review/comment on them (back to the author).
Interested parties could have a copy of all those changes in the event something happened to the cvs/git server. Tampering with the server only changes one place, but once all the emails had gone out, people would have a copy of what really was committed.
Interested parties could see when things were happening, i.e., “oh, the python rebuild is happening now” or “oh, I see someone is rebuilding a stack for a soname change, I wanted to mention something to them about that.”
It was a way to watch ALL the package commits in one easy place.
It was a way to have copies of emails so you could use whatever tools you liked locally to search them/find history/etc.
It also had bad aspects:
It was a lot of emails; some people would sign up and swamp their ISP.
It was a lot of emails, and some providers (Google) would throttle emails from fedoraproject because of the volume of scm-commits.
The volume is likely even higher these days.
The service broke when we moved to the new notification service. Our old service had a way to just say ‘please give me all messages about packages git commits’, but the new one does not have that. It has been broken for about 2 years now. See: https://forge.fedoraproject.org/infra/tickets/issues/11641 It’s turned out to be difficult/annoying to just get the list posting working again, and we want to move away from lists anyhow.
Recently @smoliicek noticed that ticket and asked about moving it forward. We talked about it in some infra meetings and I suggested that perhaps instead of a mailing list (which turns out to be difficult to get working again in this case) we look at using public-inbox. This is something kernel.org folks have developed that stuffs emails into a git repo. So, you can clone it and have it locally, or “Readers may read via NNTP, IMAP, Atom feeds or HTML archives.” He’s set up a proof of concept of this and it works fine.
Then @mwinters noted that the FDWG project that he’s working on can likely do a lot of the things that the list did, or that public-inbox could do. He also points out that a public-inbox of all the commits could be… very very large, very very quickly.
I have my own feelings on what might be useful or not to me, but I’d really love some more data from packagers. I know @mwinters and @smoliicek will want to chime in here with their thoughts too.
scm-commits replacement options
I don’t think we need anything here
I would use a public-inbox of packaging git commits
I would not use it, but having that backup of commits could come in very handy in case of certain breaches. A public-inbox would be great. FDWG would be good too, though I would be unsure of the timeline for getting it done there.
As mentioned in the linked issue, we solve the “analysis” use cases very well today – you can query for anything you can imagine, and we can build dashboards if you can tell us what you want to see.
However, we’re currently dependent upon a nightly batch load of datanommer data, so we can’t provide alerts for what happened 5 seconds ago. It’s on our roadmap to switch from a nightly batch to a continuous feed, but that’s indeed a ways off.
I agree with @mwinters that the git repository of public-inbox could grow big really fast, and that integrating it with the FDWG project we are working on would probably be better. However, I also think that the public-inbox idea was good, and people who were subscribed to the list may want to get emails again. My idea is:
Integrate the commits into the FDWG project - for easier searching / filtering / archiving / other stuff people may want to do
Keep ~6 months of commit history in public-inbox - for people who would like to connect it to their mail client, using an atom feed or something else
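To illustrate the retention idea, here is a minimal sketch of the "is this email older than the window?" check. The 182-day value for "~6 months" and the function name are assumptions made up for this example, and it says nothing about how expired messages would actually be removed from public-inbox.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: "~6 months" is approximated as 182 days.
RETENTION = timedelta(days=182)


def is_expired(date_header: str, now: datetime) -> bool:
    """Return True if the email's Date header is outside the retention window."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # Treat dates without a usable timezone as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent > RETENTION


# Example with the timestamp of the first message in the proof of concept.
first = "Fri, 29 May 2026 00:17:06 +0000"
print(is_expired(first, datetime(2026, 6, 30, tzinfo=timezone.utc)))
print(is_expired(first, datetime(2026, 12, 1, tzinfo=timezone.utc)))
```

Something like this would only decide which messages fall out of the window; the actual pruning step would depend on what public-inbox supports.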
I know this is probably a really niche mailing list and a niche use case, and I’m not arguing that every small request justifies implementation work. But the cost of implementing this is pretty low: the current POC of public-inbox is using 3.4m of CPU resources and 128 MB of RAM. The only higher usage would be storage, which should be mitigated by limiting the number of emails kept there.
I know contributors who would use this, and would benefit from this hybrid approach.
FDWG = Fedora Data Working Group. We’re new, founded late last year.
In short, we’re focused on building analytical capabilities for our data. Actual analysis is mostly TODO, though we’re seeking use cases including those from packagers.
I agree. As I’ve said elsewhere, there are two different use cases here:
Analysis
Alerts
It seems like these would each be best served by separate systems. My thought was that FDWG can provide analysis (today!) and public-inbox can provide alerts.
@kevin seemed to disagree with that separation of use cases, but he didn’t have time in the moment to say why. Kevin, would you like to chime in with your thoughts on a hybrid approach like this?
The timestamp of the first message to come into the public-inbox is 2026-05-29 00:17:06 UTC.
Since then we have had 7488 commits (1 commit = 1 commit to src.fp.o = 1 email in the public inbox), which works out to 22 464 git objects, 3x the number of commits.
Details are in the linked ticket, but we had ~500,000 package commits in 2025. Trying to stuff all of that into one git repo and keep it forever seems crazy to me, but I’ve never tried it.
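As a rough back-of-the-envelope check on those numbers, here is a small sketch. It assumes the 3 git objects per email seen in the proof of concept keeps holding, and it uses the approximate ~500,000 commits per year figure; the function name is made up for illustration.

```python
# Ratio observed in the proof of concept: 22,464 git objects for 7,488 emails.
# Assumption: this ratio stays the same as the inbox grows.
OBJECTS_PER_EMAIL = 22_464 / 7_488


def projected_git_objects(emails: int, objects_per_email: float = OBJECTS_PER_EMAIL) -> int:
    """Estimate how many git objects a public-inbox would hold for `emails` messages."""
    return round(emails * objects_per_email)


commits_per_year = 500_000  # approximate 2025 figure

print(projected_git_objects(commits_per_year))       # keeping a full year
print(projected_git_objects(commits_per_year // 2))  # keeping a ~6 month window
```

That comes to about 1.5 million objects for a full year and about 750,000 for a six-month window, before counting any growth in commit volume.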
It sounds like public-inbox only supports a git repo as backend … given the amount of data, something like a sqlite database would probably scale better (and be easier to query)?
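To make the "easier to query" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and sample rows are all invented for illustration; this is not how public-inbox or any Fedora service stores its data.

```python
import sqlite3

# Hypothetical schema: one row per package commit email.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE commits (
        id           INTEGER PRIMARY KEY,
        package      TEXT NOT NULL,
        author       TEXT NOT NULL,
        committed_at TEXT NOT NULL,  -- ISO 8601 in UTC, so string order is time order
        subject      TEXT NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_commits_package ON commits (package, committed_at)")

# Made-up sample data.
rows = [
    ("python-foo", "alice", "2026-05-29T00:17:06Z", "Rebuild for new Python"),
    ("python-foo", "bob", "2026-06-02T10:00:00Z", "Update to 1.2.3"),
    ("libbar", "carol", "2026-06-01T08:30:00Z", "Rebuild for soname bump"),
]
with conn:  # commits the inserts as one transaction
    conn.executemany(
        "INSERT INTO commits (package, author, committed_at, subject) VALUES (?, ?, ?, ?)",
        rows,
    )


def commits_for_package(conn: sqlite3.Connection, package: str, since: str) -> list:
    """Return (committed_at, author, subject) for one package since a given time."""
    cur = conn.execute(
        "SELECT committed_at, author, subject FROM commits "
        "WHERE package = ? AND committed_at >= ? ORDER BY committed_at",
        (package, since),
    )
    return cur.fetchall()


print(commits_for_package(conn, "python-foo", "2026-06-01T00:00:00Z"))
```

The composite index on (package, committed_at) is what would keep a per-package, time-bounded lookup cheap as the table grows.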
This is probably orthogonal to your question, but in case it was unclear from the FDWG diagram: we are providing data file downloads of message bus data which are very similar to sqlite.
Today we only provide “all bus data for a given timerange” but FYI we could easily create a subset for “just package commits” if there is interest.