Well, it is useless because the driver doesn’t work without the other part that we don’t have or offer. We can of course avoid preinstalling it, which avoids the issue of the driver being used automatically, but then it really changes nothing for the user experience.
No, it’s pretty much guaranteed that will be the case, because NVIDIA officially supports openSUSE Tumbleweed alongside SUSE Linux Enterprise. I have also talked to the NVIDIA folks before in another context, and they are actively committed to supporting the latest mainline kernels.
Not really? Firstly, there’s no such thing as a “stable” kernel by your definition (frozen API/ABI, etc.). Even the longterm kernels do not provide that guarantee, which is why Red Hat had to do extra work in the RHEL kernel for it, and they stopped doing that in RHEL 9. Secondly, all the examples you gave of out-of-tree kernel modules people want to use (OpenRM, VirtualBox, ZFS) regularly validate and track the latest mainline kernels. As a former contributor to the OpenZFS project, I know for a fact that they use Fedora specifically to track and support newer Linux kernels.
And other community out-of-tree modules like v4l2loopback and such are developed specifically targeting distributions like Arch and Fedora, so they too support the latest kernels fairly quickly.
Well, I disagree here, because Red Hat is notably the only one of the big three commercial Linux vendors to not do this. Both SUSE and Canonical have their Linux kernel teams actively engaged and supporting their community platforms (openSUSE Tumbleweed and Ubuntu STS, respectively). I have personally experienced getting assistance and upstream kernel fixes delivered in both platforms simply by filing bug reports in their community spaces. This is something Red Hat currently fails at and it’s really baffling.
Because Fedora is, by definition, the upstream for their products, it’s better for Red Hat to supply resources there to improve quality at the source. It benefits RHEL too, because their kernel regularly rebases whole subsystems for every minor release. In this sense, even the RHEL kernel “rolls”.
So there is no project or product rationale here that justifies the “longterm” kernel in Fedora. It doesn’t improve stability, it doesn’t improve quality, it just creates a “holding pattern” that doesn’t actually fix the problem.
We have one kernel maintainer in Fedora, and as far as I’ve been told, only Red Hatters can be maintainers of the kernel in Fedora right now because of the assumptions built into the infrastructure (between Secure Boot and CKI, there seems to be no opportunity for community co-maintenance there). Most of our problems with things like hardware support regressions come from the anemic support when users file reports of issues.
We get away with this because things eventually get fixed as new kernels roll out: sooner or later the problem affects everyone, and some kernel hacker notices. But that’s basically the “hopes and dreams” method, which has mostly, but not always, worked for us.
This project is meant to provide systems that include the support necessary for hardware-accelerated inferencing, and to support community-building for AI and ML applications.
LLMs fit within that class of applications.
I think it’s important to understand the difference between the inference engine and the model. I don’t think there are any moral or ethical issues that are inherent to LLMs or inference engines.
That’s separate from models. Some models were/are built from data of questionable provenance, while other models (e.g. Granite) are built from data that is licensed in a way that suggests that this is an appropriate use of that data.
Models, one way or the other, are out of scope for this project.
Personally, I would really like to see signing keys kept in an HSM so that they can’t be exfiltrated if a developer’s system is compromised. That would require executing the workflows in a private runner (though it could still be a GitHub or other CI workflow).
I was referring to DKMS and similar systems when I wrote, “Reliable systems generally follow a pattern of build, test, deploy, but many users today are using software that requires a deploy, build, test pattern”
I see users frequently report problems that are caused by compiling a kernel module in the background. Most of the time, I suspect they are disrupting the process because they believe the system has stopped working.
Distributing drivers as source means that the drivers are deployed before they are built or tested, and no matter how you slice it, that’s not a recipe for reliability.
I believe you, but if everyone believed that, then the simplest path forward would be to simply include OpenRM in the Fedora kernel build. Right now, that idea has been rejected due to the risk that it wouldn’t build.
I don’t think it’s reasonable to reject building OpenRM with the mainline kernel because it might delay kernel minor updates AND to reject stable kernels because OpenRM can reliably track mainline.
Offhand, those seem like logically contradictory statements.
Do we see more people report hardware support regressions after patch updates or minor updates to the kernel package? I think it’s the latter, and I think that suggests that a stable kernel would improve stability (for whichever definition of “stable” you choose.)
As for quality: If that were literally true, we probably wouldn’t need “test days” for the kernel. We don’t need test days for patch updates.
I also think it’s worth contrasting the QE for kernel updates to the QE for the release overall.
As Fedora conducts testing for the release, we have lists of features to test, we have classifications for bugs that block the release vs other types of bugs, we have rollback plans for features that aren’t ready, we can delay a release until the features that will be included work reliably, and (most critically) we continue to support the previous release for the benefit of users for whom the new release doesn’t work yet in spite of all of our testing.
Most of that isn’t true for the kernel packages. We have test days. We can invite users to test the release before it’s GA. But our options after that are actually pretty limited. We can’t roll back changes that don’t work. We can’t delay the release much in the case of hardware support regressions, because by the time we test the new minor release series, the previous series is either EOL or very close to EOL. And in part because there are no more updates for the old release series, we can’t continue to ship the old release series to users who experience hardware support regressions.
The kernel package doesn’t and can’t have the kind of QE processes that support a Fedora release. I don’t want anyone to think I’m pointing fingers at the Fedora kernel maintainer or our QE people. This is a structural problem that isn’t their fault.
But a stable kernel option would absolutely improve quality.
Sometimes we “get away with this” because users just hop to another distribution and we stop hearing about the problems.
I never did figure out why my trackpad stopped working with 6.15+ in F42.
I have never considered that as a reason. The older driver, sure, but the current one? No. As someone who has been building and maintaining KMPs for years, I don’t think the OpenRM driver is in the bucket of those where this risk is high.
We could embed it in the kernel package (à la Ubuntu’s ZFS), but I’d rather focus on fixing the KMP infrastructure and figuring out a path to restoring KMPs in Fedora. That would result in a better experience for everyone: we could basically get rid of akmods, and DKMS could probably reliably use that infrastructure again, like it did long ago.
What I think probably makes sense is making an arrangement similar to the one SUSE has with NVIDIA where the KMP is built in Fedora infrastructure and synced to publish externally where the KMP and userspace are in the same repository. As an example, this is how the driver stack is made available for openSUSE. NVIDIA hosts the repository built by the openSUSE Build Service. And a variant of this strategy is used by AlmaLinux as well, though the repositories are hosted by AlmaLinux themselves. We could work with NVIDIA directly, or with a partner like Fyra Labs. I’d probably prefer working with a partner so that we can more effectively coordinate update schedules.
These aren’t contradictory in context. Red Hat used to provide that as part of the value proposition for Red Hat Enterprise Linux. It wound up getting harder and harder to do and to justify alongside the desire to fix things, offer new features, and enable hardware. So with RHEL 9, they stopped offering API guarantees across RHEL minor versions.
This isn’t true for nearly all packages in Fedora. In practice, rolling back a change (executing contingency plans and whatnot) winds up being highly infeasible after a certain point. That’s why we require checkpoints to identify failure states. It’s one reason why the bar to accept changes has gotten higher in recent years.
This is clearly not true. As I stated earlier, the other two commercial Linux vendors don’t have this problem. Filing bugs on the openSUSE kernel package reaches the SUSE kernel team, and they promptly respond. Filing bugs on the Ubuntu linux package has a similar effect.
I’m not saying that this problem is the Fedora kernel maintainer’s fault. The fact that the other Fedora kernel maintainer slots were not backfilled and the RHEL kernel folks never stepped up to contribute to Fedora as part of the CKI/ARK effort is the problem. And that occurs because the Red Hat kernel team does not prioritize Fedora problems.
We absolutely can have the kind of quality engineering resources required for it, because companies that are significantly smaller than Red Hat are able to do it for their community projects. Just because Fedora is a “project” doesn’t mean it doesn’t have “product impact.” It’s a frustrating dynamic when “Fedora is a project” is used as a justification when it doesn’t have to be this way. And it wasn’t always like this either! The fedora-kernel-<subsystem>@ BZ/email aliases are an artifact of the era when it was normal for there to be much more engagement than there is now.
Giving the Fedora kernel package subcomponents, where people can classify the bugs they report and have them routed to a relevant Red Hat kernel engineer who has to look at them as part of their day-to-day work (as they do with RHEL bug reports), would significantly change the dynamics of things.
(As it is right now, I get alerted of Btrfs bugs either by the Fedora kernel maintainer or when I run queries on the kernel package bug reports to find them. Or if someone bubbles it up to the Btrfs SIG. This is less than ideal, but it’s at least better than all the other subsystems that get nothing right now.)
I’m not sure I understand your reply, so I’ll rephrase the point I was trying to make:
Rollback, deferral, and continued delivery of the older stable release series are quality processes that are not generally an option for the mainline kernel, because there is not enough overlap between the maintenance windows of successive release series.
Those quality processes are an option for the stable release series.
They are always on the table. They aren’t desirable outcomes, which is different.
Fedora kernels already track the upstream “stable” release series. You’re talking about the “longterm” series, which is a different track.
What I’m saying is that it doesn’t actually matter what kernels we have, how many of them there are, and what processes we have. Because we have no people to do anything. Red Hat does not commit resources to fix problems discovered by Fedora, which means it doesn’t matter if the problem is in 7.0-rc6, 6.19-stable, or 6.18-longterm. They are still not getting fixed.
And combined with the architectural complexity of actually making the packaging, install, and bootloader infrastructure sane and stable with multiple kernel variants, it does not make sense to go down that rabbit-hole.
I’m saying this as someone who is maintaining kernel trees and alternative kernel flavors for Fedora Asahi Remix and CentOS Stream Hyperscale: it’s a bad place to be, and I would rather not be here if it wasn’t absolutely required.
I would rather focus on the core problem with the Fedora kernel, which is that the sole kernel maintainer is massively overworked and isn’t able to engage on Fedora kernel bugs, and we need more people who engage with the different subsystems in the kernel to support the Fedora kernel. Then things like “Vulkan crashing applications on NVIDIA hardware” and your touchpad issue can actually have hope of being fixed. The upstream Linux kernel project has no single formal bug tracker, and there is no reliable way to report bugs other than emailing the relevant subsystem mailing lists. Instead, the upstream Linux kernel project relies on Linux distributions to have their own maintainer teams to funnel those bug reports and engage productively upstream. Fedora needs to be doing more of that, like we do for Btrfs.
You mention making a remix for part of this. Why not just do it entirely as a remix? There are other remixes out there that maintain their own kernel and ship things Fedora cannot (Asahi, the Mobility SIG images, etc.). It seems like this would be a good fit for this use case?
They are still Fedora SIGs, part of the Fedora community, and work within the project as much as they can…
As for the LTS kernel, the “one kernel only” rule came about not only because of the maintenance burden of such a build, but also because of the increasing burden it would add for the existing kernel maintainers. For example, someone hits a bug or issue with the LTS kernel and files it against the mainline Fedora kernel, etc.
It seems like there are two different threads here around the kernel: one about needing an LTS kernel because support for out-of-tree things will be more workable, and a second about wanting Fedora kernels to not move so quickly, for user preference or documentation needs. Might you focus on just the first for this?
I assume this proposal is only talking about using that LTS kernel for these deliverables, but I am sure some users would use it for other things and also ask for images with it.
Anyhow, thanks for opening the discussion around this…
I’m not sure offhand, to be honest. It might be something that you need to announce and then gauge interest. There is precedent for that; for example, release-monitoring.org is used to track non-Fedora packages. Maybe ask on the distributions mailing list?
Arch does offer the LTS kernel, and so does Ubuntu.
We use … Fedora. So believe me, I do share your pain with regressions in the latest kernels; we always have coworkers who are affected.
Could you explain why, based on that, Granite wouldn’t plagiarize GPL, AGPL, … code without attribution, just as Copilot apparently does?
I’m not aware of a single code model with CC0 data sources. Non-questionable models seem to be a hypothetical myth at this point. (I’m not a lawyer, though; this isn’t legal advice.)
Hence I wonder how encouraging any LLM-generated code could be safe in practice.
After you replied, I updated the post with links to the ostree config, the atomic image I built with it, and the kernel COPR. I’d definitely welcome any testing or contributions you’d like to make!
That’s where I’ve started, but beyond that… A remix has a connection to Fedora, but I would like Fedora to be connected to the initiative, because I believe that everything that Fedora builds up, builds up Fedora.
I would like Fedora to provide the platforms and frameworks that machine learning (/AI) developers need, because developers will recommend the systems that they use.
I would like Fedora to build channels that direct developers to a space where they can collaborate, and channels that direct changes back upstream into Fedora, whenever possible. Upstreaming successful changes into Fedora should be an explicit goal, as it is an explicit goal for Fedora to upstream changes we make to components.
I would like Fedora to take part in community building, helping developers find interested users, and helping users find interesting projects, because I believe that the communities we promote will promote the project in return.
I really like this proposal, because it provides a number of things that result in goodness for Fedora:
A better experience for AI developers who need a working system and don’t want to fight with driver setup & config. Reduced friction here means enabling more folks to build with Fedora.
In making this possible, a number of systems across Fedora will need to be improved. This provides fantastic motivation and reason to put in the effort on what has, up to this point, been deferred maintenance. Improving how we handle kernel modules, and automating processes that are currently manual but shouldn’t be, will be wins across the entire project.
The ability to learn from what ublue / Bluefin has built and to collaborate with them is going to be a big boost in terms of our community health. There are a lot of motivated, next-generation open source hackers in those projects, and I would love to see their energy and perspective reflected in Fedora as well.
You may want to check out Bob, the Magical Jinn. He can already open terminals, connect to machines, spit out code into vim, and run all sorts of OS-level commands.
#1 simply download, #2 burn, #3 plug into something with an NVIDIA card in it (he’s ephemeral).
There is no step 4; Bob will magically appear to serve.
P.S. I can convert it to a Fedora desktop if you like.
openSUSE provides the “longterm” kernel for Tumbleweed users via the kernel-longterm package:
[sfalken@mustang ~]$ zypper info kernel-longterm
Loading repository data...
Reading installed packages...
Information for package kernel-longterm:
----------------------------------------
Repository : openSUSE-Tumbleweed-Oss
Name : kernel-longterm
Version : 6.18.20-1.2
(I don’t actually use it, but I know it exists)
If you want to have a look at what it’s built out of