This is a DRAFT, not an announcement. This topic is a work in progress. We encourage you to join this work in a constructive way.
The top post will be updated. If you comment on something specific, quote the text you are commenting on.
Stay on point. This is the topic for working on the AI Policy for the Fedora Project. Generic conversations about AI tools and use cases should go in other threads. We will move comments out of the topic if we don’t find them relevant.
Notes
We use the term “AI” loosely; it may include topics like LLMs, ML, and other similar or related technologies.
Some items in the policy are strict rules and some are just a call to the community. We may need some structure around that.
Fedora AI policy
We are going to structure the AI Policy around several areas:
use of AI tools to make project contributions;
AI tools being used by Fedora users;
Fedora being used as a platform for AI development;
Fedora data being used for AI development.
1. Project contributions with help from AI
Embrace your human side. Contributing to Fedora is hard, especially when you have to overcome language and cultural barriers, and it might be tempting to use AI tools to polish your language and “look more professional”. In the Fedora Project we would generally prefer people to use their own voice, however imperfect, rather than communicate via a bland, averaged LLM tool.
Do NOT post AI/LLM slop in Fedora communication channels. Do not post an unverified dump of generated content as a reply to a Fedora Ask question, as a draft for an article, in a bug tracker item, in a discussion, or in any other Fedora communication channel.
Do NOT use LLM tools to expand your talking points. If you want to ask a one-line question, write a one-line question, not a generated one-pager built from it. You will be perceived as more professional and trustworthy in FOSS projects if you do not water down your main message.
Do NOT use AI/ML tools to score submissions for CfPs, funding requests, internships, and related items. It is hard to track and prevent the bias built into most AI/ML tools by design, so the Fedora Project will not rely on such tools for evaluating and choosing candidates for speaking at Fedora events and other roles in the project.
TBA
2. AI tools for Fedora users
Do explore AI/ML/LLM use for accessibility improvements. Consider use cases like translation, transcription, and voice generation.
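For example, transcription can run entirely on the user's machine, so no data leaves the system. A minimal sketch, assuming the open `openai-whisper` Python package and a hypothetical local recording:

```python
# Minimal local-transcription sketch (assumes the `openai-whisper`
# package; "talk.ogg" is a hypothetical local recording).
import whisper

model = whisper.load_model("base")     # small multilingual model, runs locally
result = model.transcribe("talk.ogg")  # the audio never leaves the machine
print(result["text"])                  # plain-text transcript
```

Do NOT enable any AI/ML/LLM assistants by default in Fedora Editions. Any such service, especially one that sends user data to a remote location, requires explicit informed consent.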
TBA
3. Fedora as an AI Development platform
Do package tools and frameworks needed for AI research and development in Fedora, as long as they comply with Fedora Packaging and Licensing policies.
TBA
4. Use of the Fedora Project data
Aggressive scraping is strictly prohibited. Scraping that causes significant load and costs on Fedora Infrastructure is strictly prohibited, no matter the cause. If you need to fetch project data for some reason, please reach out to the Fedora Infrastructure team and figure out an optimal and non-destructive way to do so (a minimal example of polite fetching follows below).
Respect the share-alike nature of the Fedora Project license on the project data. A lot of the Fedora Project's non-code content is covered by CC BY-SA. When you use open data from the project to run research or train a model, please respect the open nature of that data, share the outcome, and give credit.
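As an illustration of the scraping rule above: fetching politely mostly means honouring robots.txt and rate-limiting requests. A minimal sketch using only Python's standard library; the page list, user agent, and delay are hypothetical, not Fedora-endorsed values:

```python
# Sketch of polite, rate-limited fetching; the URLs, user agent, and
# delay are hypothetical examples, not Fedora-endorsed values.
import time
import urllib.request
import urllib.robotparser

BASE = "https://fedoraproject.org"

# Honour robots.txt before fetching anything.
rp = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
rp.read()

for url in [BASE + "/wiki/Some_page"]:           # hypothetical page list
    if not rp.can_fetch("my-research-bot", url):
        continue                                 # skip disallowed paths
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    time.sleep(5)                                # one request every 5 seconds
```

For anything heavier than this, the rule above applies: talk to the Fedora Infrastructure team first.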
Looking at each individual item in the text above, it is hard to write a statement in a strictly enforceable way. Every rule has an exception; every use case has a good and a bad side.
So instead of writing a policy, it seems I am writing a sort of manifesto for what we want from the community and why.
I will keep adding the items we have discussed during the meetup, but I wonder if we should split the text into two parts, like:
Responsible use guidelines
… three pages of text with various explanations and recommendations…
You will be banned for (including but not limited to):
Posting AI slop for the sake of slop
Breaking infra or imposing high costs on it while scraping the open data
These two items, while both saying “do not”, carry a different weight. I don’t really want to write this in a MUST/SHALL RFC style, but I need a way to highlight the difference.
Hi Aleksandra, thanks for putting this policy up.
It addresses a lot of concerns that I and many contributors have about AI use in our community and software. As stated, I fully support what you have, particularly
With regard to
how about “We recommend not to use LLM tools”
I’m actually OK with “Do Not”. It is really clear!
Would answering a CfP or reporting bugs count as project contributions for the purpose of this policy?
I remember there was some chatter around automated bug reports made with an LLM not so long ago (but that was The Register; they are likely inflating the concerns for clicks and outrage bait). And I also have vague memories of my OSPO coworkers having concerns about automated CfP proposals made using GenAI clogging conferences.
Yes, and the cases you describe should follow the “Do NOT post AI/LLM slop in Fedora communication channels” rule.
The issue here is that we don’t want a blanket ban on the use of AI tools in general, because people may use them for inspiration or as a writing aid. But we want to prevent people from pushing the responsibility of verifying generated content onto the reader or reviewer of that content.
Wouldn’t this also mean that any spam prevention system that uses machine learning (like spamassassin and rspamd, or Gmail’s native one) would be out of line? Since funding requests could come by email (or go to a mailing list), and the email could be scored by a spam filter using a Bayesian filter (which is an ML system), it would fall under that policy.
It also seems that a few people are exploring GenAI to detect phishing emails, so it might be added to the spam-fighting toolbox (or just a GPTZero integration) sooner or later.
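For context, this is why a Bayesian filter counts as an ML system: it scores a message from per-word probabilities learned on labelled mail. A toy sketch (illustrative only; the word probabilities are invented, and real filters like spamassassin or rspamd are far more elaborate):

```python
# Toy Bayesian spam score, illustrative only; real filters such as
# spamassassin or rspamd are far more elaborate.
import math

# Hypothetical per-word spam probabilities learned from labelled mail.
P_SPAM = {"free": 0.9, "meeting": 0.1, "winner": 0.95}

def spam_score(message: str) -> float:
    """Combine per-word probabilities naive-Bayes style into one score."""
    log_odds = 0.0
    for word in message.lower().split():
        p = P_SPAM.get(word, 0.5)              # unseen words are neutral
        log_odds += math.log(p / (1.0 - p))    # sum the log likelihood ratios
    return 1.0 / (1.0 + math.exp(-log_odds))   # map back to a probability

print(spam_score("you are a winner of free money"))  # close to 1.0 -> spam
```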
So, if someone thinks a submission is AI generated (ergo, in violation of that policy), what should/would happen? For now, there are no sanctions, but I assume those will come later?
Use the intended reporting mechanism of the channel in question. Here on Discourse: flag it. Then a moderator will review it. Depending on the case, it gets deleted or hidden until the user has changed whatever violated the rules. At worst, the posting user gets warned, and later suspended if they keep breaking the rules.
Other channels have different means and/or ways of reporting.
An interesting question: should AI read and filter emails/spam?
I would argue that Fedora should not make use of LLM-based email filters that outsource the reading of emails to an external provider, at the very minimum.
Well, we likely already do, as Fedora uses the RH mail gateway, and RH IT uses Mimecast, which has a big “we use AI for cyberprotection” on its page (not sure of the exact setup):
$ host -t MX fedoraproject.org
fedoraproject.org mail is handled by 20 us-smtp-inbound-2.mimecast.com.
fedoraproject.org mail is handled by 10 us-smtp-inbound-1.mimecast.com.
Discourse has some AI features (and they are enabled; I used the summarizer on my own text to see how good it was). The code is here, and I guess some of it might be important to consider for the policy.
I probably don’t have enough information about this issue, but from what I know, fedoraproject.org mails are aliases that redirect to a user’s own mail host. Outgoing mails are sent through a user’s SMTP. Are incoming mails scanned by Mimecast or simply redirected?
The Mimecast system is comprehensive. It analyses user relationships, scans mails for words, and much more. From reading the website you linked, it looks like Mimecast is largely focused on preventing data exfiltration, which I imagine Red Hat has active. But if fedoraproject.org mails use the user’s SMTP, then I don’t see how outgoing mails could be scanned. Email is very insecure anyway; I am not very concerned, though I do have views.
I have seen that some threads are currently summarised by Llama. The Discourse AI page also has options for, say, image generation. It would seem that these features would not be allowed under the proposed policy.
Is a summary ‘enhancing accessibility’? Not in the normal sense.
Is a summary ‘posting slop’? Maybe.
Is a summary ‘embracing one’s human side’? No.
Is a summary ‘expanding a talking point’? No, it is shrinking a talking point, which I would say is prone to similar issues as expanding.
There is some scanning of incoming mail (at least for RH stuff). From what I know and what I have seen after sending an email to myself, the setup is: internet → Mimecast → bastion01.iad2.fedoraproject.org → RH-IT-owned MX → Mimecast → user server.
I am not concerned about abuse, but if we have a policy against scanning with AI and we do not respect it for whatever reason, I think the policy should likely be amended to explain that and be clearer. For example, I can imagine Mimecast starting to filter mail that scores too high on a GPTZero-like classifier if GenAI tools are used more for scams.
So I guess we need to decide whether they should be disabled or not, and how (e.g., not sure if that’s possible).
We also need to decide what happens if something is not covered by the policy. While I assume that anything not forbidden is authorized, some people might have a different view on that, which should IMHO be written down to remove ambiguity.
We have some of these set up as experiments. They are all running on open-ish models directly in CDCK’s infrastructure — that’s the Discourse company, and they’re also hosting the site itself. They’re also not using post data here for training.
I definitely want to try the spam filter. The current one, Akismet, has way too many false positives for posts which contain code or log output.