If you're on the kernel mailing list (LKML), aren't you concerned about AI? Would you be interested in joining the discussion?

Dear person reading who is perhaps on the kernel mailing list (LKML),

I’m surprised to see that nobody else has spoken up in my LKML thread yet. It seems that previously there was no systematic, sourced list of wider concerns with AI, and nobody was requesting that the kernel developers address them. I tried to fill this gap, but it seems people have to jump in and bring attention to it to get any response.

I have seen hints that part of the sentiment might be that the kernel team is using AI just to rewrite its own code and backport patches, avoiding the plagiarism part. But:

  1. The data seems to suggest that this may not be true; see the Neowin article “Linus Torvalds declares massive AI-fueled code surges as the new normal for Linux”. There is so much more new code that it seems unlikely this is just backporting and rearranging.

  2. The data on plagiarism doesn’t seem to suggest to me that rewrite prompts are necessarily safe from training data injections.

  3. The kernel’s LLM policy doesn’t actually seem to say that backporting patches and rearranging existing code is the only allowed use.

    (Edit: I’m not saying that is my favorite solution, I’d rather have AI code out of the kernel and have it limited to review without code proposals, but it’d at least have backed up that sentiment.)

I would be happy to see more people join the discussion on the LKML.

Regards,

Ellie

PS: Sorry if I’m posting this in the wrong section.

I think this should be in the water cooler. It’s not a Fedora project topic in my view.

It could be, but given we also have a much-discussed Initiative proposing that we ship an LTS kernel, both have something in common:

Fedora struggles with validating kernels and stopping regressions from happening. What this post highlighted is that the rate of change in the kernel is ramping up, thanks to AI, and related to that, the rate of CVEs being filed is also going up.

We can’t carry on as we are right now. Ideally we get more kernel maintainers, better testing, and maybe for those who don’t want to be on the bleeding edge we can then also support an LTS kernel. Wearing my work hat, I’d rather not deal with kernel regressions during the lifetime of a Fedora release, and I know many users who don’t (and the team that internally supports Fedora certainly doesn’t).

I don’t disagree. But I would also assume that the “extreme” we currently experience is a temporary phenomenon as far as it concerns CVEs etc.

We now have more knowledge than before, and as such, it can show us what we have not seen before, throughout everything that we have created so far. Now all subsystems of the kernel, with all their code, need to be processed with this new knowledge to bring them up to the current state of knowledge.

In the military, this is a more rationalized phenomenon: at some point, someone found a way to better penetrate their tanks’ armor with some new technology. When it happened the first time, they learned their tanks needed adjusted armor. The key point: they needed to add adjusted armor not just to the damaged tank, but to all tanks in their army.

For us, armor is testing, and the many tanks to upgrade are our subsystems etc. It surely will take some time until the kernel is “covered” throughout. But my expectation (or, hope?) is that things become smoother once all parts of the kernel have been widely AI-reviewed, and/or tested otherwise for new means of disruption enabled by new AI etc.

Adding AI means to our testing, ideally, is a very good idea though, and maybe a prerequisite for getting rid of these phenomena over time. The new knowledge must be applied before release, not after. Imho, that part might be the most important discussion. That said, not sure if we need a dedicated maintainer for this, or another one for the kernel, or both, or neither?

Regarding the main topic here, I don’t want to underestimate the implications of AI in mailing lists → I have experienced it here in Discourse many times, and the AIs get more and more people into conversations before they are discovered, etc etc. How big an issue that can prove to be (plus the code they can submit) for LKML, I think I am too passive in LKML to make a guess…

My guess is that no one responded to your email on LKML because there is no desire to re-litigate/re-debate this policy after it was already debated for months and they agreed on the current policy as a middle ground. Concerns like this probably come up every once in a while, but while I was not part of the months-long debates around what the LLM policy should be, I imagine these points were considered.

(Personally, though, I think you raise some fair points.)

From the kernel folks I do talk to, most of them don’t consider the longterm series to be better than the stable kernel series (which is what Fedora currently ships). Several of them actually say that longterm kernels are of worse quality, and would prefer people not use them because there are far fewer people actually paying attention to them, and lately the way changes are getting shipped into longterm kernels creates even more regressions than what goes into stable kernels.

I think we’d be signing ourselves up for much more pain with the longterm kernels. Even the stable kernels that “become” longterm ones have been worse in my experience, so I usually like it when we upgrade out of those versions.

Timeliness is always a factor when raising concerns. Being late with raising them doesn’t invalidate them, but it does impact the level of consideration that should be expected.

Given that a policy decision was already made, after significant discussion, the discussion should now focus on concern mitigation strategies.

Just as an example, anyone who is legitimately concerned about the issue of plagiarism should be asking what scanning tools are already being used to scan the kernel for human plagiarism, as a way to baseline how often code is structurally similar enough to other existing code to rise to a legitimate concern. While it’s reasonable to conclude that AI-assisted workflows raise the likelihood of it happening, that newfound awareness that it might be happening more frequently with AI-assisted workflows doesn’t change the fact that it could be happening already in unassisted workflows. So understanding what scanning tools are employed, or could be employed, to mitigate that risk regardless of whether the commits are AI-assisted or not should now be the focus of that concern. How does kernel development identify and derisk human plagiarism? Answer that question and you have probably identified the mitigation strategy for the AI-assisted development risk.
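To make “structurally similar” a bit more concrete, here is a minimal sketch of one common idea behind such scanners: compare overlapping token windows after renaming identifiers to a placeholder. This is only an illustration under my own assumptions. The function names (`shingles`, `jaccard`), the keyword list, and the window size are hypothetical, and I am not claiming this is what the kernel community or any particular tool actually uses.

```python
import re

# Hypothetical, deliberately tiny keyword list; a real tool would use a proper lexer.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "char", "struct"}


def shingles(source: str, k: int = 5) -> set:
    """Return the set of k-token windows, with identifiers replaced by 'ID'."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    normalized = [
        tok if tok in KEYWORDS or not re.match(r"[A-Za-z_]", tok) else "ID"
        for tok in tokens
    ]
    return {tuple(normalized[i:i + k]) for i in range(len(normalized) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "int add(int x, int y) { return x + y; }"
renamed = "int sum(int a, int b) { return a + b; }"
unrelated = "while (n) { n = n - 1; }"

print(jaccard(original, renamed))    # renaming variables does not hide the structure
print(jaccard(original, unrelated))  # structurally different code scores low
```

Running something like this over existing, human-written commits would give the kind of baseline I mean: how often does ordinary code already score high against other code, before AI enters the picture?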

The technical quality concerns are similarly likely mitigated by the use of scanning tools in the CI/CD workflows that should already be in place looking at unassisted commits made by humans. I’m pretty sure there are static analyzers employed already. If AI slop is as sloppy as people expect, I would expect the analyzers to catch the AI slop as easily as they would catch my human-generated slop.

I personally think it’s a backward approach. Even if it worked, and with plagiarism testing services apparently being notoriously unreliable I assume it won’t, this is putting out the fire after you watched the arsonist without intervention.

Establishing copying things without attribution as normal seems toxic for FOSS, and that seems like the actual issue, not the cleaning up of the fallout afterward.

I should add that the timeliness points are totally fair. However, I emailed the Linux Foundation a long time ago given I’m not a kernel developer. But it doesn’t seem like they’re reading any of their e-mail.

I think the piece you’re probably missing is nobody reads linux-kernel@ anymore. In order to reach kernel developers, you have to email the respective subsystem mailing lists.
