Was the most recent Mesa push to stable (mesa-25.2.7-2.fc43) reasonable?

Isn’t it a bit hypocritical to say RPMFusion isn’t part of Fedora, when a lot of Fedora users depend on it for the NVIDIA drivers, which are widespread on Linux nowadays?

Even you yourselves relied on that:

Please don’t cut off the branch you are sitting on; it could hurt when you fall.
Inclusion (maybe also in $’s) would probably suit better than just blaming someone.

@ilikelinux and @gtb, this kind of conversation is why I advised against such generalisations. Is anything specific actually being discussed here? Obviously, nobody’s a pawn to anyone else, so I don’t see the value in this, especially since everyone subscribed receives a notification for each of these messages.

Similarly, I don’t envision much good coming of discourse conducted in this manner. Do you, really? Perhaps I’m assuming too much negative intent. If so, I’ll shut up hereafter.

That’s not really the right framing. Disabling autopush is the conservative, restrictive choice.

Manual push is always ‘enabled’, for all updates. It’s not actually a setting; it’s the basic flow of the update system. If the update meets the applicable policy requirements (which vary depending on whether it is critical path or not, and on what phase the release it’s for is currently in), it may be manually pushed stable by someone with the necessary privileges (a maintainer of the package, or a proven packager).

Autopush is an additional, optional and configurable convenience feature that was engineered in later. Anyone with the power to edit the update can set a karma autopush threshold and/or a time autopush threshold. These cannot be set lower than the policy minimums (well… in fact there are a couple of sneaky ways you could make it happen, but we protect against those). If either autopush threshold is reached, the update gets pushed stable automatically. So, say this update had had a 14-day autopush threshold set: once it reached 14 days in testing, it would have been pushed stable automatically regardless of the negative feedback.

The autopush thresholds both default to on - when you create a new update it has a default 3 karma autopush threshold, and a default time autopush threshold of 7 days for non-critpath or 14 days for critpath. So turning them off is, as I said, the conservative choice - it’s basically the maintainer saying “I don’t want to trust autopush for this update, I intend for it only to be pushed stable by a human for whatever reason”.
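A minimal Python sketch of the autopush behaviour described above may help. The default thresholds (3 karma; 7 days for non-critpath, 14 days for critpath) are taken from this post; the class, field and function names are hypothetical, this is not Bodhi’s actual code, and it deliberately leaves out the policy minimums and gating checks.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults as described in the post; all names here are hypothetical.
DEFAULT_KARMA_THRESHOLD = 3
DEFAULT_DAYS = {False: 7, True: 14}  # keyed by "is it critical path?"


@dataclass
class Update:
    critpath: bool
    karma: int = 0             # net sum of +1/-1 feedback
    days_in_testing: int = 0
    # None means that autopush threshold has been switched off.
    karma_threshold: Optional[int] = None
    days_threshold: Optional[int] = None


def new_update(critpath: bool) -> Update:
    """Create an update with both autopush thresholds on by default."""
    return Update(
        critpath=critpath,
        karma_threshold=DEFAULT_KARMA_THRESHOLD,
        days_threshold=DEFAULT_DAYS[critpath],
    )


def disable_autopush(update: Update) -> None:
    """The conservative choice: only a human can push it stable."""
    update.karma_threshold = None
    update.days_threshold = None


def should_autopush(update: Update) -> bool:
    """Reaching either enabled threshold triggers an automatic push."""
    karma_hit = (update.karma_threshold is not None
                 and update.karma >= update.karma_threshold)
    # As described in the post, the time threshold ignores karma entirely.
    time_hit = (update.days_threshold is not None
                and update.days_in_testing >= update.days_threshold)
    return karma_hit or time_hit


u = new_update(critpath=True)
u.karma = -2            # negative feedback...
u.days_in_testing = 14  # ...but 14 days in testing
print(should_autopush(u))  # True
disable_autopush(u)
print(should_autopush(u))  # False
```

The sketch shows why disabling autopush is the restrictive option: with both thresholds set to `None`, `should_autopush` can never return `True`, so only a manual push remains.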

The rules we have are a pragmatic compromise between “try not to break stuff” and “allow updates to flow without annoying maintainers too much”. Every step along the way from “just allow maintainers to push whatever they like whenever” got some pushback. Any attempt to tighten the policy further would also experience pushback.

It’s important to remember a lot of Fedora maintainers are volunteers so there are practical limits to how many onerous ‘mandatory’ processes they can be put through. If we implement a formal review program for “bad” updates (and for a start you’d have to define that), what do you do if the packager just doesn’t show up? The only lever we’d really have would be to remove packager privileges, which is fine, but…now you have some orphan packages you have to retire or find someone else to maintain, and eventually you wind up with no volunteer packagers, maybe. And removing privileges from people also causes blowback (see the discussion earlier this year around proposed sanctions on a proven packager). It’s never free to poke stuff.

We can definitely consider steps to take in this situation; I just want to emphasize that it’s part of a very long-running history and there are practical constraints.

I definitely wouldn’t say you’re crazy, but I would say “hard cases make bad law” - it’s generally a mistake to generalize from specific cases without considering all cases. There definitely are cases where it’s correct to push an update stable despite negative feedback. Negative feedback isn’t always “correct”, and it doesn’t always consider the wider picture.

For instance, it’s not uncommon for people to give an update negative feedback because it doesn’t fix a bug they were already experiencing. But that’s never a sensible reason not to push the update stable - after all, doing so isn’t going to make their situation worse. If the only purpose of the update was to fix that one specific bug, the negative feedback is justified, but if the update is meant to fix 10 things and only fixes 9 - or if it was only meant to fix 9, but someone is complaining about bug 10 which the update never claimed to fix at all - the feedback should properly be ignored.

Yes, if we had a “two people must sign off” rule or something, then two people could just sign off, but it’d make the flow slower. Not every package is something like mesa, which lots of people care about immediately; if the package is more obscure, a good update could be held up unnecessarily while the maintainer tries to find someone else to sign off on ignoring the irrelevant negative feedback.

And there’s the cost of building this whole mechanism in the first place, which would require rather fundamental surgery to Bodhi. Bodhi doesn’t have any concept of ‘maintainer voting to push updates’ or anything like that. Pushing an update is just an API action with a permission gate: the client sends an API request, the server acts on it (if the user who sent the request has the appropriate permissions) or rejects it (if not). That’s it, that’s the whole mechanism. (OK, yes, there’s a “does the update meet the karma/time/gating requirements” check too). Rewriting that to “there is a maintainer-only voting system separate from the karma system and Bodhi checks its requirements have been met before deciding how to act in response to the API call” is a substantial piece of engineering. Bodhi has one official (volunteer) maintainer who is absolutely not working on it full time, ATM. I’m probably the second most experienced kinda-active Bodhi contributor ATM and I have a hundred other things I’m supposed to be doing (and working on Bodhi isn’t my job either). Bodhi is also vaguely supposed to be getting replaced by a whole new Konflux-y pipeline at some point, so finding the resources to do major surgery on Bodhi is tricky.
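To make “an API action with a permission gate” concrete, here is a rough Python sketch. It is not Bodhi’s code: the `User`/`Update` shapes, the exception names and the single `meets_requirements` flag (standing in for the karma/time/gating check) are all hypothetical. Note that nothing in it counts maintainer votes, which is the part that would need the fundamental surgery.

```python
from dataclasses import dataclass
from typing import Optional


class PermissionDenied(Exception):
    """The requesting user may not push this update."""


class RequirementsNotMet(Exception):
    """The update fails the karma/time/gating check."""


@dataclass
class User:
    name: str
    proven_packager: bool = False


@dataclass
class Update:
    maintainers: frozenset[str]
    meets_requirements: bool        # stand-in for the karma/time/gating check
    request: Optional[str] = None   # set to "stable" once a push is requested


def handle_push_request(user: User, update: Update) -> None:
    """Act on a 'push stable' API call, or reject it."""
    # Permission gate: a maintainer of the package, or a proven packager.
    if user.name not in update.maintainers and not user.proven_packager:
        raise PermissionDenied(f"{user.name} cannot push this update")
    # The one extra check mentioned in the post.
    if not update.meets_requirements:
        raise RequirementsNotMet("update does not meet policy requirements")
    # No maintainer vote is consulted: one authorized request is enough.
    update.request = "stable"


u = Update(maintainers=frozenset({"airlied"}), meets_requirements=True)
handle_push_request(User("airlied"), u)
print(u.request)  # stable
```

Adding a sign-off rule would mean storing per-update approvals and checking them inside `handle_push_request`, i.e. new state and new logic, not just a tweak to the existing gate.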

BTW, as a meta note, the classic counterpoint to this situation is the “OMG you didn’t push that CVE fix stable yet?!??!” case, where some CVE gets a cute nickname and does a round of global press. When that happens, there’s a ton of pressure to ship a fix as fast as possible, and anything in the process which can cause a “delay” - like quality requirements - gets framed as a problem.

So these are the two classic cases of “public opinion” creating a pressure on the update process, and they work in opposite directions. The CVE case tends to cause feedback like “omg why do you have all this red tape when all we need to do is push out this OBVIOUSLY IMPORTANT FIX as fast as possible”. The broken update case tends to produce feedback like “why isn’t it harder to push out updates?!” So, there’s obviously a need to balance those two contrary pressures.

Some people need a reminder that they are using an incredibly awesome Fedora Linux operating system basically for free, thanks to real-life people :laptop: :bluethumb:

What’s the gist of what happened? Did breakage only affect NVIDIA proprietary drivers? Did it only affect Steam/games running through it and/or Proton?

It sounds like a core component of the graphics stack was updated through automation, before any human testing?


Was there a delay in “undoing” the update? If the update was pushed so easily, it seems like an undo update could have been pushed just as quickly once an issue was reported?

I can’t say I’d be happy with a broken graphics stack on a Sunday, but I might be encouraged to find something else to do for an hour or so if there were reports of issues shortly after the push and an undo update queued minutes later (I was watching this thread yesterday and only glanced at the quick dnf downgrade workarounds mentioned).

I’m not sure if this is entirely known yet. Some of the reports seem to indicate AMD was also affected. The problem involved layers, and it seems you may have needed some kind of ‘layer’ in the sense of e.g. an FPS or HUD overlay in Steam to trigger the problem, at least the “my game doesn’t run” incarnation of it.

No, in fact exactly the opposite. Everything significant that happened here was manual. No automation was involved.

“Undoing” updates is a rare and manual operation which would usually be done by folks who work for RH and thus aren’t reliably around on weekends. An “undo update” is something a maintainer could have done, sure, but there aren’t many people who understand the internals of mesa well enough to be confident about poking it. I would not have wanted to leap in and push a reversion in this case, for instance, until at least a couple of days had passed with no apparent action.

In this case, the bad update was pushed stable at 01:37 on 2025-11-22 - it would have taken a few hours after that for it to make its way out to mirrors and start showing up on the systems of people without updates-testing enabled. @airlied submitted the fixed update at 01:07 on 2025-11-24. So just under 48 hours, over a weekend. If this had happened during the week those of us who work for RH would probably have jumped on it that much sooner (I for one spent most of the weekend playing Blue Prince and I’m not apologizing for it :>). I can’t speak for anyone who doesn’t work for RH who had the power to do anything about it.
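The “just under 48 hours” figure can be checked with Python’s standard `datetime` module, assuming both timestamps are in the same time zone (the post doesn’t say which):

```python
from datetime import datetime

# Timestamps from the post; the same (unstated) time zone is assumed for both.
pushed_stable = datetime(2025, 11, 22, 1, 37)
fix_submitted = datetime(2025, 11, 24, 1, 7)

gap = fix_submitted - pushed_stable
print(gap)                         # 1 day, 23:30:00
print(gap.total_seconds() / 3600)  # 47.5
```

Mirror propagation (“a few hours”) would shorten the window in which non-testing users actually had the bad build.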

I wrote some patches so that multi-GPU laptops wouldn’t hang the desktop on monitor hotplug, and I pushed them to Fedora 43 because the bug was blocking a laptop from shipping with Fedora 43. Unfortunately Vulkan layers are messy: you have to do something nasty, and I removed the nasty part because it seemed wrong. It wasn’t. I’ve just put it back and built the update.

@leigh123linux, can we please not push mesa updates with negative karma in the future? Please allow the mesa team to actively figure out bugs before pushing. I’m fine with pushing updates with positive or neutral karma to keep rpmfusion happy, but if an update has negative karma, please ping someone if it’s causing problems. We will push fixes to mesa for things that aren’t fully upstream, but we rely on Bodhi karma to avoid regressions, so please work with that system. Under normal circumstances what you did was fine and helpful.

Just for reference, it definitely wasn’t NVIDIA-only; it affected AMD GPUs too.

Not my problem anymore, I have quit as rpmfusion admin.

Is it possible to introduce or improve internal sanity/smoke testing to cover various scenarios (like Steam/Proton), so that the development process doesn’t rely exclusively on the released (and supposedly stable???) Fedora version? It is not healthy if a single button click is the only thing separating OK from TROUBLE.

The Bodhi karma is based on users testing packages deployed into the “updates-testing” repo. At that stage, the updates are not part of the stable Fedora and aren’t released to any user who hasn’t voluntarily opted into using “updates-testing”.

In this case, users who opted in to testing did indeed discover and report the issues, but the problematic update was released into stable Fedora regardless of that feedback.

Unless they are.

I will repeat myself: it is not healthy if a single button click is the only thing separating OK from TROUBLE.

I’m sorry to hear that Leigh.

To folks here who use the -freeworld variant of the package: it now needs a maintainer in RPM Fusion, so please consider volunteering to maintain it:

If you have a concrete suggestion or plan, please post it in a new thread as a proposal that the community can then discuss. Please do note what it will need in terms of both technical requirements and human resources.

Please also refrain from using all caps on forums; it implies screaming.

I don’t think it is possible, because it appears that Red Hat is paying for @airlied’s time.

I don’t know how Red Hat works inside.

This is why I am asking the person who knows better.

I am sorry, my intention was to denote different states of affairs, not scream.

To add to my previous post, considering that @leigh123linux characterized his experience as

It would be far more responsible to focus on actual process improvements first, instead of passive-aggressive tone policing and attempting to groom someone else into the same abusive relationship.

It isn’t a maintainer shortage; I have quit the infra duties:

1: signing and package push.

2: branching and release tasks.

3: processing new package requests.

As noted above, please make suggestions on how the process can be improved. If it needs input from other parties, one will have to wait for them to respond.

I do not see “passive aggressive tone policing” here. I don’t think we could be more explicit about what is expected from you and everyone else on the community channels.

I also do not know who is being groomed into an abusive relationship. I will disregard that remark.

Ah, cool, thanks for clarifying Leigh.
