Council Policy Proposal: Policy on AI-Assisted Contributions

Artificial Intelligence is a transformative technology, and as a leader in open source, the Fedora Project needs a thoughtful position to guide innovation and protect our community’s values.

For the past year, we have been on a journey with the community to define what that position should be. This process began in the summer of 2024, when we asked for the community’s thoughts in an AI survey. The results, which we discussed openly at Flock and in Council meetings, gave us a clear message: we see the potential for AI to help us build a better platform, but we also have valid concerns about privacy, ethics, and quality.

The draft we are proposing below is our best effort to synthesize the Fedora community’s input into a set of clear, actionable guidelines. It is designed to empower our contributors to explore the positive uses of AI we identified, while creating clear guardrails to protect the project and its values from the risks we highlighted.

Next Steps

In accordance with the official Policy Change policy, we are now opening a formal two-week period for community review and feedback. We encourage you to read the full draft and share your thoughts.

The policy proposal is also available to read on [the community blog](https://communityblog.fedoraproject.org/council-policy-proposal-policy-on-ai-assisted-contributions/).

After the two-week feedback period, the Fedora Council will hold a formal vote on ratifying the policy via ticket voting. Thank you for your thoughtful engagement throughout this process. We look forward to hearing your feedback as we take this important next step together.

Fedora Project Policy on AI-Assisted Contributions

Our Philosophy: AI as a Tool to Advance Free Software

The Fedora Project is a community built on four foundations: Freedom, Friends, Features, and First. We envision a world where everyone benefits from free and open source software built by inclusive, open-minded communities. In the spirit of our “First” and “Features” foundations, we see Artificial Intelligence as a tool to empower our contributors to make a more positive impact on the world.

We recognize the ongoing and important global discussions about how AI models are trained. Our policy focuses on the responsible use of these tools. The responsibility for respecting the work of others and adhering to open source licenses always rests with the contributor. AI assistants, like any other tool, must be used in a way that upholds these principles.

This policy provides a framework to help our contributors innovate confidently while upholding the project’s standards for quality, security, and open collaboration. It is a living document, reflecting our commitment to learning and adapting as this technology evolves.

1. AI-Assisted Project Contributions

We encourage the use of AI assistants as an evolution of the contributor toolkit. However, human oversight remains critical. The contributor is always the author and is fully accountable for their contributions.

  • You are responsible for your contributions. AI-generated content must be treated as a suggestion, not as final code or text. It is your responsibility to review, test, and understand everything you submit. Submitting unverified or low-quality machine-generated content (sometimes called “AI slop”) creates an unfair review burden on the community and is not an acceptable contribution.
  • Be transparent about your use of AI. When a contribution has been significantly assisted by an AI tool, we encourage you to note this in your pull request description, commit message, or wherever authorship is normally indicated for the work. For instance, use a commit message trailer like Assisted-by: <name of code assistant>, as in the example after this list. This transparency helps the community develop best practices and understand the role of these new tools.
  • Fedora values Your Voice. Clear, concise, and authentic communication is our goal. Using AI tools to translate your thoughts or overcome language barriers is a welcome and encouraged practice, but keep in mind, we value your unique voice and perspective.
  • Limit AI Tools for Reviewing. As with creating code, documentation, and other contributions, reviewers may use AI tools to assist in providing feedback, but not to wholly automate the review process. Particularly, AI should not make the final determination on whether a contribution is accepted or not.
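
For illustration only (the subject line, assistant name, and author below are hypothetical placeholders, not endorsed tools or real commits), a commit message carrying such a trailer might look like:

```
Fix off-by-one error in cache expiry check

The expiry comparison used <= instead of <, which refreshed the
metadata cache one cycle too early.

Assisted-by: ExampleCodeAssistant
Signed-off-by: Jane Contributor <jane@example.org>
```

Recent versions of Git can also append a trailer for you, e.g. git commit --trailer "Assisted-by: ExampleCodeAssistant".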

2. AI In Fedora Project Management

To avoid introducing uncontrollable bias, AI/ML tools must not be used to score or evaluate submissions for things like code of conduct matters, funding requests, conference talks, or leadership positions. This does not prohibit the use of automated tooling for tasks like spam filtering and note-taking.

3. AI Tools for Fedora Users

Our commitment is to our users’ privacy and security. AI-powered features can offer significant benefits, but they must be implemented in a way that respects user consent and control.

  • AI features MUST be opt-in. Any user-facing AI assistant, especially one that sends data to a remote service, must not be enabled by default and requires explicit, informed consent from the user.
  • We SHOULD explore AI for accessibility. We actively encourage exploring the use of AI/ML tools for accessibility improvements, such as for translation, transcription, and text-to-speech.

3. Fedora as a Platform for AI Development

One of our key goals is to make Fedora the destination for Linux platform innovation, including for AI.

  • Package AI tools and frameworks. We encourage the packaging of tools and frameworks needed for AI research and development in Fedora, provided they comply with all existing Fedora Packaging and Licensing guidelines.

4. Use of Fedora Project Data

The data generated by the Fedora Project is a valuable community asset. Its use in training AI models must respect our infrastructure and our open principles.

  • Aggressive scraping is prohibited. Scraping data in a way that causes a significant load on Fedora Infrastructure is not allowed. Please contact the Fedora Infrastructure team to arrange for efficient data access.
  • Honor our licenses. When using Fedora project data to train a model, we expect that any use of this data honors the principles of attribution and sharing inherent in those licenses.
12 Likes

This statement is not strong enough. When writing a policy like this, the phrase “should not” carries no weight. This statement ought to be “Particularly, AI must not make the final determination on whether a contribution is accepted or not.”

A “should not” here would be advisory rather than proscriptive and therefore would be ignored.

This language is even less committal. I question why it’s in here at all, except to say “We considered this, so don’t ask us about it, but we couldn’t decide what to do about it”. Either own it and say “The Fedora Project does not require that AI tool usage be identified in git commits and/or merge requests” (which is what this phrasing is trying desperately not to admit to meaning) or else take a stand and mandate the annotation and note that intentional violation will be met with penalties up to and including loss of packaging rights. Just don’t try to imply something other than what is intended in this policy.

23 Likes

IMO this looks pretty good, though there are some points that I would disagree with – and which are, I think, also a bit contradictory in the current proposal.

While this policy would require models that ingest Fedora Project data to honor the respective licenses (how?), the same cannot be required of almost all existing “AI” systems - they are basically universally considered copyright / license laundering machines that are unable to properly attribute the origins of their outputs. In my opinion, this creates a kind of double-standard:

You can use Fedora Project data to train your “AI”, but you must ensure proper attribution and abide by our license terms – yet when using existing “AI” systems that cannot do this, because they cannot provide proper attribution and / or they ingested data without abiding by the original input’s license terms, this is apparently “fine”?

So, given that when using “AI” assistant technologies you still can’t guarantee that you’re not inadvertently infringing copyright and / or violating licenses of other FOSS projects, I don’t think the encouragement here can be justified – unless it is qualified with something like “training data for the ‘AI’ assistant / model was obtained in a legal and ethical way and does not violate licenses”:

… which probably excludes almost all widely, commercially available “assistants”.

But I think it would be fine to encourage use of “ethically trained AI” systems (if you can find one) :wink:

14 Likes

This is the first time in all the AI threads that I have heard this stated. The general gist of the community responses has been quite the opposite.

The middle-ground that I have heard Jef talk of is, if I may paraphrase him, “people will use AI anyway and if we disallow it they will lie”.

I think the community sentiment is “we discourage use of AI, but understand that some of you may need it for valid cases like translation”.

12 Likes

In general, I’m pretty happy with this proposal, and it largely aligns with similar statements I’ve made in the past. I have a few concerns with some specific phrasings within though:

Particularly, AI should not make the final determination on whether a contribution is accepted or not.

This needs to replace “should not” with “must not”. If you’re defining a policy, a “should not” will be taken as a suggestion, not a rule. In other words, if this isn’t “must not”, I doubt it’ll be three weeks before someone is using AI for decision-making.

we encourage you to note this in your pull request description, commit message, or wherever authorship is normally indicated for the work

Similarly, this is just weasel-words. You need to either commit to requiring the commit message trailer (or similar) and note that intentionally violating it will carry penalties up to and including loss of packager privileges, or else explicitly state that Fedora does not require annotation of AI-assisted code. A weak statement like this basically guarantees that people won’t bother.

19 Likes

So, that policy does not really define AI. Given the constant blurring of meaning between machine learning, AI, generative AI, LLMs, agents, etc., I think we should bring some clarity, or at least point to an existing definition.

For example, the policy says “AI features MUST be opt-in”, so does that mean that we MUST add an opt-in to Wesnoth to play against the computer in the main campaign (because historically, AI was used to mean that too, and that’s the exact term used in the doc)?

I also want to point to my previous point on AI/ML used in spam prevention ( AI policy in Fedora - WIP - #8 by misc ), especially since a mail is a contribution, but our current setup for mail uses RH servers managed by RH IT, who use Mimecast, who proudly display on their company website “Email & Collaboration Secured by AI”.

So an AI is in the final step of a contribution; hence I think we need to bring clarity on what we mean by “AI”.

11 Likes

The results, which we discussed openly at Flock and in Council meetings

Were the results ever posted online, or only in closed forums? I recall the survey being discussed, but I don’t think I’ve ever seen the results posted online. It would be good to see the data. Thanks!

2 Likes

This whole thread is weird to me. I don’t know how the conclusions for this policy proposal were reached, because they don’t seem to jibe with the community sentiment from most contributors.

6 Likes

While the broad structure of the AI policy as currently stated has some merit, as a desktop user I want to chime in and ensure that community concerns are being met:

1. Regarding the language around opt-in: “AI features MUST be opt-in. Any user-facing AI assistant, especially one that sends data to a remote service, must not be enabled by default and requires explicit, informed consent from the user.”

I believe this language is not strong enough or specific enough to protect users.

I would like to see language that incorporates this:

Any user-facing AI assistant should be an optional component that is “opt-in” for the user to install. The AI assistant will be first and foremost an on-device or local LLM. The user will have the “opt-in” choice to add web-connected components to enhance the experience/performance of the AI assistant; however, the AI assistant should always be functional as standalone software that the user can choose to update manually at their discretion.

Additionally, I believe AI functionality should NOT be added to the terminal as a default option. This provides too much surface area for attacks, exploitation, and potential data exfiltration from the system.

Question: IBM/Red Hat currently has an open source AI project, which looks interesting :+1:. Is this planned for Fedora?

I am adding a link for those who want to try/test an online version of it:

Suggestions for local model:

- I strongly recommend the addition of a Flatpak version in the future, which would allow for more granular user permissions. Perhaps it could even be added to Flathub in the future.

- I also strongly recommend having more repositories hosting the code to make it widely available to the general public. These projects, after all, really work best at scale, with everybody having the opportunity to contribute and review the code.

So, why I am chiming in:

- The current regulatory structures surrounding AI are insufficient to protect user privacy and security. We are currently in a “move fast and break things” phase of development for AI in general, with companies racing to be the ‘King of the Hill’.

- I chose the Fedora Project because user choice is respected here. That is something I am very grateful for; thank you :heart:. After 30 years of using the Windows desktop environment, these past 10 years had become progressively more challenging:

Software being added without permission, firewall rules added or changed without permission, telemetry incorporated without permission, forced updates for ‘built-in’ software that is not needed or used, and now the ‘forced’ (or find a hacky workaround) addition of AI for desktop users as laid out in Microsoft’s 2030 plans.

- Watch for yourselves as Microsoft apparently wants to be a part of your business, and be in your business 24/7:

4 Likes

“AI must not make the final determination on whether a contribution is accepted or not.”

Why not, though? If we design a system and it is sufficiently successful, why would we want Fedora to lag behind RHEL, or even CentOS Stream for example, where no such policies exist?

Note, that’s different than saying “let’s just accept whatever AI says”. But if someone builds a system that is proven to be 99.999% correct in its contributions, why bother making a policy against it? Is this a concern with ethics or quality? If it’s quality, that’s a technical issue that I don’t think we need a policy for. What about for testing mass rebuilds? Banned for simple docs bumps? A ban of this kind seems to step away from Fedora’s “First” policies without even having done the experiment to see what it would look like today.

At a bare minimum I’d like to see Fedora get in front of RHEL again with a more aggressive approach to AI. Not just in how we build Fedora, but by formally opening the doors to AI contributors and data scientists so that Fedora is their first-stop shop. I’d hate to see a scenario where Fedora’s policies make it a less attractive innovation engine than even CentOS Stream :-/

2 Likes

For many people, the “innovation” you seem to be talking about here needs some gigantic air-quotes. Just because it’s en vogue with the business people right now doesn’t mean that you won’t “innovate” yourself into a corner (or into the ground) by adopting these systems. Red Hat / IBM can experiment with “AI” all they want, but as far as I can tell, it’s pretty clear that most Fedora contributors don’t want this to happen in Fedora itself.

7 Likes

I think that’s been part of the issue. It’s been a while, but I was once a top contributor to Fedora, and I know these roads well. What I don’t understand is why Fedora, and particularly its governing boards, have become so well known for what they don’t want. I don’t have a clue what they do want. Is Fedora truly complete at this point?

The proposal as written seems to be a compromise / middle ground between Fedora’s actual mission and the nay-sayers. I just think it sends a weak message at a time when Fedora could be leading and shaping the future and saying “AI, our doors are open, let’s invent the future again”.

For those that haven’t seen it, I think Fedora’s values align very well with this (and yes, I’ve signed it) - https://amplified.dev/. If the authors agree, then it might be worth re-writing this whole proposal with Fedora’s values at the center of it.

FWIW, I think Topic 1 is too limiting to apply globally. Topic 2 is fine. There are two topic 3’s for some reason, and I think they should both be more open and encouraging, especially allowing for things to happen outside of Fedora Linux proper (think of a project like Atomic or Silverblue, but for AI). Section 4 doesn’t seem to really say anything beyond “don’t DDoS us”, which really isn’t an AI policy :slight_smile:

4 Likes

I think of the policy as a collection of statements that are not all directed at the same audience. In this case, “Honor our licenses” appears to be directed at the individuals and organizations collecting data and training models, rather than the people contributing to Fedora.

We don’t have technical means to control the scraping of Fedora’s data, and it’s currently unclear whether machine-generated output will be protected by the copyright system. Because those two things are true, I think that at a minimum, we should do whatever we can to state our position on training and data use, as early as possible, and in whatever forms are available to us.

But I think it would be fine to encourage use of “ethically trained AI” systems (if you can find one) :wink:

Do you think a locally running AI system using the Granite foundation models would fit that description?

1 Like

I agree, and I think it would be really difficult to define “AI” or “ML” in a way that is useful to the policy.

If the definition is too broad, then it may prohibit the sorts of systems that you described, which have been included and allowed for a long time. And if the definition is too specific, then I think we’ll also miss systems that we want to include in the policy. For example, we could state that systems that output machine-generated content must be opt-in, but that definition might not include something like an organization and planning assistant that doesn’t really output much of anything, but does read your email and calendar data to prompt you to do work.

It might be more useful to focus on what a system does rather than what a system is. For example: we don’t want systems sending data from a user’s system to a remote service unless the user opts in. We don’t want machine-generated content retrieved from a remote service unless the user opts in. We don’t want users’ data replaced or modified by machine-generated content without opt-in.

1 Like

A few comments on the policy:

  • It’s a little disappointing to me that Fedora is not taking a position on the fact that the tools in question (the models, I mean) are at best only semi-’open’ and in the more typical case are proprietary. That doesn’t mean that they aren’t useful or shouldn’t be used but it would be nice to see Fedora say something aspirational about how a future state with AI tools that are more aligned with free software would be desirable, even if Fedora can’t meaningfully contribute to that effort.
  • The “AI Tools for Fedora Users” section was confusing to me - what is the scope of this section? Is it about third-party SaaS tools that Fedora is thinking of making use of to aid the development of Fedora? Is it about packaging of third-party open source AI tools for inclusion in Fedora Linux releases?
  • Regarding the “Honor our licenses” bullet point: This is problematic because the licenses allowed by Fedora do not (or at least are not understood to) require attribution and certainly not sharing in the mere context of training models. Or am I misunderstanding what this is saying? Any licenses that did require “sharing” as a condition of training would probably not be allowed in Fedora because they wouldn’t be open source. Furthermore, in at least some jurisdictions, it is hopefully likely that doctrines limiting copyright such as fair use would enable training without being restricted by the copyright owner by license conditions or otherwise. Limits on copyright are generally a good thing, and copyright maximalism is not a good thing and is contrary to general traditions in open source. I would note that we justify a lot of stuff in Fedora packages based not on licensing but on fair use and other doctrines that place limits on copyright.
    • Edit: There have been atypical cases/experimental initiatives where the model training workflow has a distribution component: BigScience/BLOOM apparently involved training datasets shared across institutions. In that kind of setting, open source license conditions would perhaps be triggered. But it isn’t the training itself that is the trigger, it’s the sharing.
4 Likes

Hi Jason,

Thank you for putting something together for feedback. Here are some detailed thoughts. While I’m replying to your message, I understand this proposal to be the work of the full council, so the comments are directed at the full council.

Yes, this is good. We want to evolve while staying true to the ideals that make Fedora’s distribution and community enduringly robust.

(…snip…)

Yes.
(…snip…)

Yes… but what follows seems problematic.

Isn’t this true even when code is human-generated? Why are we encoding the idea that machine-assisted code is de facto low quality and human-generated code is de facto high quality? What happens when the reverse is true? I suggest removing this clause. Keep it simple. The fewer words the better in a policy document. Make it easy to understand that contributing something is vouching for its inclusion. That’s on many grounds: quality, license compliance, utility, and so forth. This isn’t a generative AI topic. If you make it one, it will age badly.

Would you consider formulating the language in this proposal to comply with RFC 2119? This feels like a “should”. If that is what is meant, be explicit.

Is the idea to sprinkle in known places where LLMs are useful? Structure-wise I would keep this in a separate section, or structure each of the 4 major areas of the policy to lead with examples of utility that are the sort that we want to encourage, then trail with areas where we need to be thoughtful.

Again, this is de facto encoding that LLMs=bad and humans=good. Even if that is true >50% of the time today, what if it’s true <50% of the time tomorrow? Why encode that in a policy right now? Let’s take the bias out of these sections and ground ourselves in what gives the best results, irrespective of origin. We don’t give humans a free pass today; we expect transparency. Let’s simply keep that expectation.

I understand the intent on this one, but I don’t understand why we need a policy on it right now. Shouldn’t we evaluate proposals at the time they’re made by the transparency and effects of the results they provide? We really don’t need to encode AI=suspicious when our change process puts everything up to evaluation already. Seriously, Fedora’s change process, even with its imperfections, is an excellent benchmark. Why does AI matter here? What are we trying to prevent? Did somebody propose it? Wouldn’t it already be subject to the same scrutiny without this policy element?

Yes!

No. The “especially” clause is the only part that makes sense to me. Self-contained models are self-contained. If some package contains a model that it uses internally, that would be a violation of this section, even though it doesn’t send data to a remote service. Informed consent with mandatory opt-in for anything that talks to a remote service is fine and good. Why go further?

(…snip…)

If we want to include content that doesn’t meet existing guidelines, what is the path forward for such material? There are multiple classes of content that people doing cutting-edge AI R&D use that cannot be included in Fedora under current guidelines. I would like to know how the Council proposes to make Fedora key in this space when our policies prevent its inclusion. I cannot overstate how pivotal this is to the success or failure of the goal.

Thanks for this. I appreciate the importance of limiting the negative performance consequences of scraping, thank you for including it. I hope this clause can link to a fleshed out process on how we make those arrangements.

At the same time it seems like a missed opportunity to set an expectation that Fedora Infrastructure may run its own models to harness the value of its community asset for the benefit of community. Things like summarizing meetings, training on our community documentation, and so forth. Can we make it explicit that this kind of use has the kinds of positive benefits the council would support community members to pursue?

This seems really murky and I cannot understand why we would say this rather than adhere to Red Hat legal’s guidance. It would be better to simply get in writing what the legal guidance is at this time, and refer to it. This is a rapidly evolving field, the guidance is going to change as case-law grows. Let’s keep our policies aligned to the reality that unfolds.


Wrapping up this feedback: what’s been proposed seems like an attempt at a healthy compromise, but there are a few structural elements that need to be addressed to make it a more robust policy:

  1. Consistent formatting and voice. Embrace RFC 2119.
  2. Consistent structure per section: Lead with go energy, be explicit about the positive things the Council sees as opportunities.
  3. Remove Human=Good results, LLM=Bad results framing. Instead, lead with the expectation that we will prefer methods leading to good results, like transparency and rigor. This will remove almost all of the “you can’t use AI for this” stuff and help evolve over time.
  4. In the event you still need a “shall not” type section, please be certain that it is explicit to the AI use case. I’m guessing that at least 90% of them are already covered by existing policies that simply need to be extended to say “… and using an LLM for this doesn’t make it OK either.”

I think if you follow these guidelines you’ll be able to remove about 60% of the policy’s content, and have a stronger document that more people can agree on.

Thanks,

-Brendan

5 Likes

I think it comes down to trust. Or more importantly a framework on how to establish trust.

In RHEL and CentOS, where contribution access is hierarchical, I would assume that how trust is established for both the humans and the systems they build is different than in Fedora. For example, trust as a human packager in RHEL doesn’t imply the same level of trust in Fedora or vice-versa, even if it’s the same exact human working in both contexts. How humans gain/lose trust across the boundary between RHEL and Fedora already differs. Human trust has to be established differently because work is organized differently. I would expect the same of the trust imparted to the non-deterministic systems.

I don’t think the existing Fedora experience around how humans gain enough trust to get commit access provides a workable framework by which to establish the same level of trust for non-deterministic systems, even systems that are provably correct to several 9’s better than I would be capable of achieving as a packager at this point.

But I will say, I think the policy as drafted would allow people to run the experiments and start trying to answer the questions around how to evaluate when a non-deterministic system is trusted enough to operate without human review. And I’d rather find a way to move forward, knowing we have to come back and address a narrow set of prohibited uses via experimentation that has measurable outputs. What I don’t want to do is grind on a policy discussion for another year without any progress, stuck on the things we don’t have enough experience with to feel comfortable. I would much prefer to iterate on this every year.

Is the “must not” statement too strong? Maybe. I do not want to discourage experimentation that challenges our current skepticism and biases. But I think it would be very dishonest not to be explicit about which use cases this existing community is currently most uncomfortable with exploring. Because when we do invite people from the AI community, for it to go well we have to be honest about where we are right now in terms of comfort with a technology we collectively aren’t experts in yet. This is, to some extent, a ‘meet us where we are’ situation.

2 Likes

To clarify, the proposal is mostly the work of Jason. As a member of the council, I agreed to go forward with the proposal because I believe things like this should be discussed publicly and not behind closed doors. However, I did not write this proposal, nor did I sign it as “proposed by the Council”.

2 Likes

I would rather have AI deal with the abrt bug reports and close any that don’t include steps to reproduce.

Maybe AI could be used on these reports to prompt the user to provide better info.
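
Purely as an illustration of the triage flow being suggested (nothing like this exists in Fedora infrastructure), a minimal sketch might look like the following. The report fields and the need-info wording are hypothetical, and the keyword check is only a stand-in for wherever a model would actually be consulted:

```python
# Rough sketch, not a real bug-tracker integration: flag abrt reports that
# appear to lack reproduction steps and suggest a need-info comment instead
# of closing them outright. Field names and wording are hypothetical.
import re

NEEDINFO_TEXT = (
    "Thanks for the report! To make this actionable, please add the exact "
    "steps to reproduce the crash (what you ran or clicked, and in what order)."
)

STEP_PATTERNS = [
    r"steps? to reproduce",
    r"how reproducible",
    r"^\s*\d+\.\s",  # a numbered list usually means steps were provided
]

def has_reproduction_steps(text: str) -> bool:
    """Keyword heuristic standing in for wherever a model would be consulted."""
    return any(re.search(p, text, re.IGNORECASE | re.MULTILINE) for p in STEP_PATTERNS)

def triage(reports):
    """Yield (report_id, suggested_action) pairs for a human to review,
    rather than acting on the bug tracker directly."""
    for report in reports:
        if has_reproduction_steps(report["comment"]):
            yield report["id"], "leave open"
        else:
            yield report["id"], f"request info: {NEEDINFO_TEXT}"

if __name__ == "__main__":
    demo_reports = [
        {"id": 101, "comment": "Steps to reproduce:\n1. open the app\n2. click Save"},
        {"id": 102, "comment": "It just crashed, backtrace attached."},
    ]
    for report_id, action in triage(demo_reports):
        print(report_id, "->", action)
```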

I currently ignore them and let the EOL process close them.

1 Like

But there are also legal obligations. For example, GDPR Article 22 might be important depending on what we do. There are already 50 decisions listed on GDPRHub for that article, so that would be a rate of about 7 per year on average. I also suspect that EU Regulation 2024/1689 (aka the EU AI Act) may concern us in some specific cases, even if, unlike the GDPR, I have not read the whole text in detail, so maybe there is nothing substantial on that front.

And on this topic, there is also some debate on the interplay of the Digital Services Act and the GDPR; see this article on something we should keep an eye on, as in theory it could concern us, since Fedora is distributing software, even if we are not officially designated as a gatekeeper for the purposes of the DSA/DMA because we are very far away from the required threshold.

1 Like