This is a DRAFT, not an announcement. This topic is a work in progress. We encourage you to join this work in a constructive way.
The top post will be updated. If you comment on something specific, quote the text you are commenting on.
Stay on point. This is the topic for working on the AI Policy for the Fedora Project. Generic conversations about AI tools and use cases should go in other threads. We will move comments out of the topic if we don’t find them relevant.
Notes
We use the term “AI” loosely; it may include topics like LLMs, ML, and other similar or related technologies.
Some items in the policy are strict rules and some are just a call to the community. We may need some structure around that.
Fedora AI policy
We are going to structure the AI Policy around several areas:
use of AI tools to make project contributions;
AI tools being used by Fedora users;
Fedora being used as a platform for AI development;
Fedora data being used for AI development.
1. Project contributions with help from AI
Embrace your human side. Contributing to Fedora is hard, especially when you have to overcome language and cultural barriers, and it might be tempting to use AI tools to polish your language and “look more professional”. In the Fedora Project we would generally prefer people to use their own voice, however imperfect, rather than communicate via a bland, averaged LLM tool.
Do NOT post AI/LLM slop in Fedora communication channels. Do not post an unverified dump of generated content as a reply to a Fedora Ask question, as a draft for an article, in a bug tracker item, in a discussion, or in any other Fedora communication channel.
Do NOT use LLM tools to expand your talking points. If you want to ask a one-line question, write a one-line question, not a generated one-pager built from it. You will be perceived as more professional and trustworthy in FOSS projects if you do not water down your main message.
Do NOT use AI/ML tools to score submissions for CfPs, funding requests, internships, and related items. It is hard to track and prevent the bias built into most AI/ML tools by design, so the Fedora Project will not rely on such tools for evaluating and choosing candidates for speaking at Fedora events and other roles in the project.
TBA
2. AI tools for Fedora users
Do explore AI/ML/LLM use for accessibility improvements. Consider use cases like translation, transcription, and voice generation.
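For example, transcription can run entirely on the user's machine, so no data leaves the system. A minimal sketch, assuming the open `openai-whisper` Python package and a hypothetical local recording:

```python
# Minimal local-transcription sketch (assumes the `openai-whisper`
# package; "talk.ogg" is a hypothetical local recording).
import whisper

model = whisper.load_model("base")     # small multilingual model, runs locally
result = model.transcribe("talk.ogg")  # the audio never leaves the machine
print(result["text"])                  # plain-text transcript
```

Do NOT enable any AI/ML/LLM assistants by default in Fedora Editions. Any such service, especially one that sends user data to a remote location, requires explicit informed consent.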
TBA
3. Fedora as an AI Development platform
Do package tools and frameworks needed for AI research and development in Fedora, as long as they comply with Fedora Packaging and Licensing policies.
TBA
4. Use of the Fedora Project data
Aggressive scraping is strictly prohibited. Scraping that causes significant load and costs on Fedora Infrastructure is strictly prohibited, no matter the cause. If you need to fetch project data for some reason, please reach out to the Fedora Infrastructure team and figure out an optimal and non-destructive way to do so (a minimal example of polite fetching follows below).
Respect the share-alike nature of the Fedora Project license on the project data. A lot of the Fedora Project's non-code content is covered by CC BY-SA. When you use open data from the project to run research or train a model, please respect the open nature of that data, share the outcome, and give credit.
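As an illustration of the scraping rule above: fetching politely mostly means honouring robots.txt and rate-limiting requests. A minimal sketch using only Python's standard library; the page list, user agent, and delay are hypothetical, not Fedora-endorsed values:

```python
# Sketch of polite, rate-limited fetching; the URLs, user agent, and
# delay are hypothetical examples, not Fedora-endorsed values.
import time
import urllib.request
import urllib.robotparser

BASE = "https://fedoraproject.org"

# Honour robots.txt before fetching anything.
rp = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
rp.read()

for url in [BASE + "/wiki/Some_page"]:           # hypothetical page list
    if not rp.can_fetch("my-research-bot", url):
        continue                                 # skip disallowed paths
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    time.sleep(5)                                # one request every 5 seconds
```

For anything heavier than this, the rule above applies: talk to the Fedora Infrastructure team first.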
Looking at each individual item in the text above, it is hard to write a statement in a strictly enforceable way. Every rule has an exception; every use case has a good and a bad side.
So instead of writing a policy, it seems I am writing a sort of manifesto for what we want from the community and why.
I will keep adding the items we have discussed during the meetup, but I wonder if we should split the text into two parts, like:
Responsible use guidelines
… three pages of text with various explanations and recommendations…
You will be banned for (including but not limited to):
Posting AI slop for the sake of slop
Breaking infra or imposing high costs on it while scraping the open data
These two items, while both saying “do not”, carry a different weight. I don’t really want to write this in a MUST/SHALL RFC style, but I need a way to highlight the difference.
Hi Aleksandra, thanks for putting this policy up.
It addresses a lot of concerns that I and many contributors have about AI use in our community and software. As stated, I fully support what you have, particularly
With regard to
how about “We recommend not to use LLM tools”
I’m actually OK with “Do Not”. It is really clear!
Would answering a CfP or reporting bugs count as project contributions for the purpose of this policy?
I remember there was some chatter around automated bug reports made with an LLM not so long ago (but that was The Register; they are likely inflating the concerns for clicks and outrage bait). And I also have vague memories of my OSPO coworkers having concerns about automated CfP proposals made using GenAI clogging conferences.
Yes, and the cases you describe should follow the “Do NOT post AI/LLM slop in Fedora communication channels” rule.
The issue here is that we don’t want a blanket ban on the use of AI tools in general, because people may use them for inspiration or as a writing aid. But we want to prevent people from pushing the responsibility of verifying generated content onto the reader or reviewer of that content.
Wouldn’t this also mean that any spam prevention system that uses machine learning (like spamassassin and rspamd, or Gmail’s native one) would be out of line? Since funding requests could come by email (or go to a mailing list), and the email could be scored by a spam filter using a Bayesian filter (which is an ML system), it would fall under that policy.
It also seems that a few people are exploring GenAI to detect phishing emails, so it might be added to the spam-fighting toolbox (or just a GPTZero integration) sooner or later.
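For context, this is why a Bayesian filter counts as an ML system: it scores a message from per-word probabilities learned on labelled mail. A toy sketch (illustrative only; the word probabilities are invented, and real filters like spamassassin or rspamd are far more elaborate):

```python
# Toy Bayesian spam score, illustrative only; real filters such as
# spamassassin or rspamd are far more elaborate.
import math

# Hypothetical per-word spam probabilities learned from labelled mail.
P_SPAM = {"free": 0.9, "meeting": 0.1, "winner": 0.95}

def spam_score(message: str) -> float:
    """Combine per-word probabilities naive-Bayes style into one score."""
    log_odds = 0.0
    for word in message.lower().split():
        p = P_SPAM.get(word, 0.5)              # unseen words are neutral
        log_odds += math.log(p / (1.0 - p))    # sum the log likelihood ratios
    return 1.0 / (1.0 + math.exp(-log_odds))   # map back to a probability

print(spam_score("you are a winner of free money"))  # close to 1.0 -> spam
```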
So, if someone thinks a submission is AI generated (ergo, in violation of that policy), what should/would happen? For now, there are no sanctions, but I assume those will come later?
Use the intended reporting mechanism of the channel in question. Here on Discourse: flag it. Then a moderator will review it. Depending on the case, it gets deleted or hidden until the user has changed whatever violated the rules. At worst, the posting user gets warned, and later suspended if they keep breaking the rules.
Other channels have different means and/or ways of reporting.
An interesting question: should AI read and filter emails/spam?
I would argue that Fedora should not make use of LLM-based email filters that outsource the reading of emails to an external provider, at the very minimum.
Well, we likely already do, as Fedora uses the RH mail gateway, and RH IT uses Mimecast, which has a big “we use AI for cyberprotection” on its page (not sure of the exact setup):
$ host -t MX fedoraproject.org
fedoraproject.org mail is handled by 20 us-smtp-inbound-2.mimecast.com.
fedoraproject.org mail is handled by 10 us-smtp-inbound-1.mimecast.com.
Discourse has some AI features (and they are enabled; I used the summarizer on my own text to see how good it was). The code is here, and I guess some of it might be important to consider for the policy.
I probably don’t have enough information about this issue, but from what I know, fedoraproject.org mails are aliases that redirect to a user’s own mail host. Outgoing mails are sent through a user’s SMTP. Are incoming mails scanned by Mimecast or simply redirected?
The Mimecast system is comprehensive. It analyses user relationships, scans mails for words, and much more. From reading the website you linked, it looks like Mimecast is largely focused on preventing data exfiltration, which I imagine Red Hat has active. But if fedoraproject.org mails use the user’s SMTP, then I don’t see how outgoing mails could be scanned. Email is very insecure anyway; I am not very concerned, though I do have views.
I have seen that some threads are currently summarised by Llama. The Discourse AI page also has options for, say, image generation. It would seem that these features would not be allowed under the proposed policy.
Is a summary ‘enhancing accessibility’? Not in the normal sense.
Is a summary ‘posting slop’? Maybe.
Is a summary ‘embracing one’s human side’? No.
Is a summary ‘expanding a talking point’? No, it is shrinking a talking point, which I would say is prone to similar issues as expanding.
There is some scanning of incoming mail (at least for RH stuff). From what I know and what I have seen after sending an email to myself, the setup is: internet → Mimecast → bastion01.iad2.fedoraproject.org → RH-IT-owned MX → Mimecast → user server.
I am not concerned about abuse, but if we have a policy against scanning with AI and we do not respect it for whatever reason, I think the policy should likely be amended to explain that and be clearer. For example, I can imagine Mimecast starting to filter mail that scores too high on a GPTZero-like classifier if GenAI tools are used more for scams.
So I guess we need to decide whether they should be disabled or not, and how (e.g., not sure if that’s possible).
We also need to decide what happens if something is not covered by the policy. While I assume that anything not forbidden is authorized, some people might have a different view on that, which should IMHO be written down to remove ambiguity.
We have some of these set up as experiments. They are all running on open-ish models directly in CDCK’s infrastructure — that’s the Discourse company, and they’re also hosting the site itself. They’re also not using post data here for training.
I definitely want to try the spam filter. The current one, Akismet, has way too many false positives for posts which contain code or log output.