I think that’s a good point; maybe the policy should be split into smaller chunks and discussed separately. For example, I think the one about scraping would be quite uncontroversial and could be in a separate document (even if also likely inconsequential, because Fedora Infra is already dealing with scrapers and acting on it).
And to add to the debate on what really counts as open, this 2024 article also points out that the EU AI Act obligations are lessened for “open source”.
I have not read it yet, but this indeed points to the importance of the concept for Fedora as a project that might deploy AI systems, but also to downstream users and integrators that might rely on Fedora’s diligence so they don’t have to deal with burdensome legal obligations, in the same way that people trust us and other distributions not to ship proprietary software they can’t use because the license prevents them. We rightfully refuse licenses pushed by vendors, like the ELv2, even when some people still call that “open source” (most recent example I found, this project).
I fully agree, Fedora should take a clearer position, as failure to do so will just push the problems onto everybody else building on Fedora, with a risk of duplicated work, which will create a lack of consensus and likely hinder innovation. Just imagine how the world would look if Fedora had decided to just ship everything that could be put in an RPM and be done with it.
Firstly, thank you everyone for the engagement on this proposal already. I think many valid points have been made and an iteration (or two) of this proposal is needed. It may be in our best interest to wait until next week before considering a redraft of the proposal to incorporate the feedback received so far, as it was only posted yesterday and there may be others who have not had the opportunity to read it or offer their thoughts.
Secondly, this proposal originated mainly from this post, and from numerous conversations in council meetings, which I won’t link them all as there were many, but feel free to check back on the meetbot logs. The most recent discussion we had on this was in our last meeting on September 10, but we have had this conversation in our meetings many times. There may also have been informal conversations at Flock this year too, however it was not a conference talk. The vast majority of the discussion on this policy happened in the linked discussion post over many months.
Thirdly, thank you for recognising that this is intended to be a policy that covers ‘everyone’ and tries to meet the middle ground between those who wish to use and experiment with AI in Fedora and those who do not. It was intended to be as general as possible, but it’s clear from the discussion that this won’t hold up, so that in itself is very valuable to know, and a stronger position can be taken in the next iteration. Please understand that whatever policy is put forward and ratified will not please everyone. That might actually be impossible in Fedora.
But the council have recognised that there is, and has been, a need for some sort of official guidance on the use of AI in Fedora, and we intend to action one.
Fourthly, this is the work of the collective council. Council members received this draft last week via email from me with an ask to review the policy and give any feedback they had. Jason had kindly taken the initiative to compile the discussion from the previously linked discussion post to form this proposal. I, and I’m sure others on the council, certainly read it, offered feedback, and asked questions before ‘signing off’ that this was ready to at least propose. The plan I put forward to the council was to take that time to review the policy and propose it under the Policy Change Policy. Should any feedback be received that requires significant changes to the proposal (which there has been), we would withdraw, redraft, and re-propose. So this proposal is absolutely the shared responsibility of all council members. And I would personally like to thank Jason for pulling it together on our behalf.
And fifthly and finally, I’d like to reiterate my thanks for the feedback so far on this proposal. It’s clear a redraft is needed, but we will continue with the process for the next few days to ensure we capture maximum feedback before withdrawing the current proposal and re-proposing another soon.
Ah, this is a great explanation. If that is the sole reason for this clause it would be helpful to include that insight in the policy itself. This reinforces the importance of leaning on written legal guidance and referencing that guidance in the policy. In the absence of such connections the reader is left to wonder “why can’t I do this?”, limiting people’s ability to propose amendments to the policy in the future. Thanks,
-Brendan
As @gordonmessmer suggested, this is actually multiple policies; I suggest they be handled as such.
The first policy regards using AI for contributions. The second is the use of AI within the project. The third is policies for AI tools in packages. The last one, perhaps, should also include a policy about telemetry, etc, which has come up in other conversations over the years. Those seem tightly related.
Some parts of the proposal aren’t policy, they’re preferences. Specifically, “We SHOULD explore AI for accessibility” and “Package AI tools and frameworks” don’t belong in a policy, they belong in a strategic plan. The “honor our licenses” section seems redundant to the licenses themselves. “Aggressive scraping is prohibited” seems more like part of an acceptable use policy than an AI policy.
I mostly like the intent here, but I’m opposed to this proposal as written because it’s not focused and concise. I’d like to see this split into multiple policy proposals that could be discussed and advanced independently.
I think it’s fine for a policy to address multiple audiences… Though it might be helpful to organize the policy in a way that more explicitly reflects the audience of various statements. And it might even be useful to discuss the sections in separate threads and combine the policies later, but that seems like a stretch.
I can’t differentiate between the first and the second. Can you rephrase one of them? Does your second audience mean “AI backend services hosted by the project?”
Telemetry does seem related in that it implies a service operated by Fedora which receives data from user systems. But I think telemetry is a very different class of data, because it is generally not data that the user has created. Telemetry can have privacy implications, but it’s not usually something like the content of a user’s email archive, or the content of their calendar. They’re related in that we should care about user privacy, receive no more than is necessary, and store as little as possible. But I think that’s getting a little meta; it’s more like guidelines for us that inform how we write policy about related concepts like telemetry, and users’ input and content. This policy probably doesn’t need to address telemetry. Even if an AI system is processing telemetry somewhere, separate policies should be coherent.
That’s probably true, unless there’s uncertainty about what can be packaged and redistributed, and especially about what can be installed by default on various configurations.
Maybe. There are differing opinions about how copyright licenses apply when copyrighted works aren’t being redistributed, per se.
It’s probably useful to express our policy on using data for training in both written policies on AI and written policies on acceptable use, as long as it’s consistent.
If Wesnoth were to come preinstalled and configured to run at startup with AI features enabled, that’d be an issue. The intention of this is the Council saying, no, you won’t be surprised to find hidden AI in your desktop. If it’s there, it’ll be because you turned it on, installed it, etc. yourself.
Is the issue that AI-assisted spam filtering could be viewed as a review of a contribution? We could clarify that moving forward, as in the past, automated spam filtering is allowed.
The first is “using AI to develop code/docs/etc” for Fedora. The second is things like “using AI to flag Discussion posts as spam or abusive”.
I’m thinking of telemetry here as “data that gets automatically shipped elsewhere”. We should have a holistic policy about that which would cover both “an AI app that analyzes your email as a Thunderbird plugin” and “Audacity is sending usage data”. In this case, AI is an implementation detail.
You misunderstand. Aoife’s comment below characterizes the sources of this accurately.
Thanks everyone for your thoughtful and detailed feedback so far on the draft AI policy. It’s clear that the initial draft was trying to do too much at once. It was trying to be both a point-of-view on AI and an AI policy, and the result was a lot of ambiguity.
Based on your feedback, I’ve split out the POV bits, which I suggest we take up separately, and pared the policy part into a slim, strong, and (hopefully) unambiguous set of rules.
In addition to removing the non-policy POV content, here are the key changes I’m suggesting for the policy:
- A Universal Accountability Principle: I’ve removed the specific, biased-sounding rules against “AI slop.” The policy now begins with a single, foundational principle that all contributors are accountable for the quality and license compliance of their submissions, regardless of what tools they use.
- Stronger Transparency (SHOULD): I replaced the weak “encourage” language with a firm SHOULD, adopting the RFC 2119 standard to set a clear community expectation for disclosure, while acknowledging that we’re still figuring out the details of how this should work in practice. I didn’t add a definition of AI, but I added more specificity in the “Contributor Accountability” section by calling out “large language models (LLMs) or other generative AI tools.”
- Clearer Prohibitions (MUST NOT): I clarified the rule against using AI to evaluate people to ensure it doesn’t accidentally prohibit necessary tools like spam filters.
- Legally Sound Licensing: Following Richard Fontana’s feedback, I removed the legally murky clause about “honoring our licenses” in favor of the universal accountability principle in the preamble.
AI-Assistance Transparency: In the spirit of the transparency this policy recommends, I want to note that I used AI tools extensively in this revision process. I used notebooklm to analyze and synthesize the large volume of community feedback, and Gemini to help draft the refined language. With that said, I read and edited and tweaked every part of this contribution, I stand by it, and I am eager to participate in more discussion of it!
Policy on AI-Assisted Contributions
Contributor Accountability
Contributing to Fedora means vouching for the quality, license compliance, and utility of your submission. All contributions, whether wholly human-authored or assisted by large language models (LLMs) or other generative AI tools, must meet the project’s standards for inclusion. The contributor is always the author and is fully accountable for their contributions.
Policy Rules
- Transparency: Contributors SHOULD disclose the use of AI assistance. This is a strong recommendation in line with RFC 2119; non-disclosure should be exceptional and justifiable. Disclosure should be made where authorship is normally indicated. For contributions tracked in git, the recommended method is the `Assisted-by:` commit message trailer (see the example after this list); for other contributions, this may include document preambles, design file metadata, or translation notes.
- Unacceptable Uses: AI MUST NOT be used to make the final determination on the acceptance of a contribution or to evaluate a person’s standing within the community (e.g., for funding, leadership roles, or Code of Conduct matters). This does not prohibit the use of automated tooling for technical pre-screening tasks, such as spam filtering or checking for common packaging errors.
- User-Facing Features: Any feature that sends user data to a remote service for processing by a machine learning model, or that modifies user content with machine-generated output, MUST be opt-in by default and require explicit user consent before activation.
- Infrastructure: Aggressive or disruptive scraping of project infrastructure is prohibited. For efficient data access, please contact the Fedora Infrastructure team.
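For the git case, here is a minimal sketch of what that disclosure could look like; the `Assisted-by:` trailer name comes from the Transparency rule above, while the commit subject, tool name, and author are purely hypothetical:

```
Fix typo in package description

Tidy up the wording of the %description section.

Assisted-by: ExampleAI code assistant
Signed-off-by: Jane Contributor <jane@example.org>
```

Because trailers are part of the commit message itself, the disclosure stays attached to the change wherever the commit is pushed or cherry-picked, and standard tooling such as `git interpret-trailers` can extract it without any extra infrastructure.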
Now let’s consider an ambiguity: would this prevent us from using AI to analyze ABRT reports? Let’s consider only ABRT reports collected by default, not the ones submitted to Bugzilla, because the Bugzilla reports created by ABRT often contain sensitive personal data and plainly that shouldn’t ever be submitted by default. In contrast, the ABRT reports collected by default should never contain personal data, as it should be only data from Fedora packages like function names, file names, line numbers, etc. We probably do want to allow processing those with AI, but do they constitute “user data”? Debatable. Sorry for bringing up an edge case, but it seems interesting to discuss.
I’m also not sure whether restrictions on modifying user content are actually necessary? That sounds like it would prohibit, say, a non-remote grammar check feature in a text editor. If the feature is local-only and works well, then it shouldn’t be controversial.
(I also don’t like bad use of AI. What is bad? “I know it when I see it.” But I think that’s already well-covered by the contributor accountability rule.)
Could this even be a MUST instead of SHOULD?
Because I wonder if there actually are any situations in which not disclosing the use of “AI assistance” would be reasonable and justifiable.
Is ABRT enabled by default?
It could be. I thought Brendan’s comments about how SHOULD isn’t necessarily a weak requirement, in this RFC 2119 sense, were convincing, but we could easily shift to MUST. Since we’re early in all this, it seems reasonable to have some wiggle room, but I’m not married to it.
Yes, ABRT has been enabled by default for as long as I remember. (It doesn’t actually use AI, though.)
Why? In the original draft the suggested reason for having a policy about this at all was for analytics. If some people do and some people don’t, will analytics be invalidated? It seems like SHOULD is going to provide a useful degree of rigor. I think this because many people in open source spaces are already doing it.
I personally hold a grudge against the AI craze that’s going on in this world, but I will try not to let my personal bias affect my opinion.
I think this point should be made clearer in the policy. If someone (code contributors, reviewers, etc.) blindly adds AI-generated content to their contributions (or trusts AI output, or lets AI make decisions, you know what I mean) and messes things up, the only ones who’ll take the blame are themselves and no one else. If we’re to treat AI as a tool, then make sure it is treated as a tool.
P.S. Most of my concerns here are with the shiny new LLMs, genAI, and “agentic” tools, but this should also apply to machine learning, Markov chains, and other more fundamental technologies.
“Encourage” won’t do. It should be required that contributors note the presence of AI in metadata so others can be aware of it. There should also be punishment if someone deliberately fails to disclose the use of AI in their contributions.
This could turn complex quite fast, because what if it was a tool (let’s say an editor like Zed) that I installed manually two years ago using a copr repo, and that suddenly grew AI features long after the initial installation, during a routine upgrade (and three or four Fedora version upgrades)? As a user, I would suddenly find “AI thingies” on my desktop, which seems to violate the spirit of the policy, while the letter of the policy wouldn’t be violated, since this wasn’t installed by default on the current Fedora.
To be clear, I am personally fine with AI arriving with an upgrade, but my personal opinion is not what the proposed policy says. If the council’s goal is to not have any AI turned on by default in the default installation of an edition, I think this should be more clearly articulated, as it will bring more clarity.
But we could also argue that since most AI integrations would require an API key at the moment, nothing can be turned on without explicit consent, and so that part of the policy would be limited to not running any LLMs locally without consent (as I guess no one is going to give out API keys to be used by random people on the internet, given the cost of running inference).
This is exactly the kind of thing we need to avoid. There are only 31 replies that take about an hour to read. It is unclear if you have read them, or if notebooklm read them for you.
In essence, notebooklm has evaluated the weight of the contributions. To me this stands against the principle that human understanding is the better tool for judgement.
We must not have LLMs write policy on using and accepting LLMs.
In the policy WIP thread, AI policy in Fedora - WIP, we have the statement:
“Embrace your human side
Contributing to Fedora is hard, especially when you have to overcome language and cultural barriers. And it might be tempting to use AI tools to polish the language and “look more professional”. In the Fedora Project we would generally prefer people to use their own voice, however imperfect, than to communicate via a bland averaged LLM tool.”
and
“Do NOT use LLM tools to expand your talking points. If you want to write a one line question, do write a one-line question, not a generated one-pager from it. You will be perceived more professional and trustworthy in FOSS projects if you do not water down your main message.”
It feels to me like you are pushing a boundary here, testing the waters to see how the community responds to your statement about using AI extensively.