"xz" lessons learned: if/how to involve Fedora Magazine in CVE handling?

I see your point: the content might be formally wrong because the malicious code was introduced even in the beta, but the impression the article leaves is correct, because Richard’s actions ensured that the malicious code did not work, and thus the betas were effectively not vulnerable.

However, when in doubt, malicious code should be removed ASAP, so I tend to tell users to get rid of it, especially as I am not sure the author(s) knew about Richard’s fix for certain when that sentence was written. The sentence also adds complexity (and a risk of misinterpretation) to the content without any need. If it is mentioned at all, the high likelihood that the beta was effectively not vulnerable should be stated explicitly, and not be the main message.

But I just saw that the article was updated once again, and now it looks good to me: there is little room for misinterpretation, and all users who might have downloaded something they surely don’t want will be warned to get rid of it.

I might have made the context of “stable NOT impacted” more explicit, but it is now simple, comprehensible and carries little risk of users misinterpreting its context. Yet it took days to get it there. And @rlengland’s point is definitely a valid one, too.

I just removed the warning about the article in the other topic.

Well, now this is just wrong, because there is no known backdoor in 5.6.0-3.fc40. The backdoor was disabled at configure time. Downgrading is a good idea anyway, but it won’t get rid of any known backdoor.

I don’t understand why this is so hard to get right. It’s very simple. 5.6.0-1.fc40 and 5.6.0-2.fc40 are backdoored. 5.6.0-3.fc40 is not. All engineers who worked on the problem agree with this. I will give up now.

(The Red Hat blog post seems wrong too; we are still trying to get that fixed.)

I didn’t verify this level of detail :rofl:

But I try to take it with humor. At least it motivates all affected users to take the mitigation steps, even if some do more than necessary (an update doesn’t hurt :wink:). Yet it leads back to the problem of how to avoid such issues in the future :see_no_evil:

I think the best way to handle the communication of sensitive security exploits requiring action from users is for the responsible security team to have an outlet for communication. They should then have a relationship with the marketing, social media, or communications people (for us it’s Fedora Marketing) to distribute the news with a link to the source. The blog or other source of updates can then be linked alongside the initial advisory, and further updates can be found at that source. Ideally the security team and communications team would agree on milestones on the way back to normalcy. In this case I’m waiting for the all-clear on using Rawhide and Fedora 40 as normal.

Unfortunately, what we’re seeing with the Fedora Magazine is what happens with any form of news dissemination: accuracy and real-time updates go down the further you get from the source of the news. Without communication with the team, it’s hard to know when to update, what to say, and whether we’re even explaining it right. So we wait until things stabilize enough or until we can get in touch with the people who know.

As a result, I think it’s better to first communicate the safest solution or fix that users can apply, and then not give updates until we have reasonable confidence that users can go back to normal. You just have to take it in chunks when you are a step or two removed from the source. The security team can make live updates, but the comms team downstream can’t do that so easily.

Now, this could change and improve. I just don’t know who to talk to about improving this process.

Sure, but my point was to write the announcement post without (all of) the technical details and to reference (with an appropriate link) wherever the project tracks CVEs for said details and potential mitigations. Having information on a straightforward workaround, like downgrading to a specific version, is fine on its own too, I guess. In that case, I think the follow-up post, if one is desired, should come once definitive information is available.
The source of truth for something like this should be the Fedora Project’s CVE tracking, I would think, not the Magazine post. The project must be tracking security issues of this nature somewhere, no?

I’m +1 for a follow-up Fedora Magazine article to give an “all clear” message and perhaps a brief “post mortem” of what exactly occurred.

As of now at least, we don’t really know a lot of the crucial details beyond speculation. Some we may never know.

Well, apparently you know that “we” don’t know, which is more than I know. :slightly_smiling_face:

Based on the experience from the discussion topic, the devel mailing list has proven sufficient for that so far (or at least an acceptable starting point). The information was up to date, people there were willing to clarify what was unclear, and we had a feedback loop in both directions. As long as there was no consensus that a group was not vulnerable, that group kept being warned, except for a short period when I thought the magazine had more up-to-date information and added the “testing = disabled by default” note. But the feedback loops worked and Adam made me aware of it early. I could also report inconsistencies I noticed (or was made aware of) in other sources back to the list, to check which information was up to date and where we needed to adjust.

However, the devel mailing list has two issues as a source when it comes to a security incident:

  • it is plain email without authentication/signatures, and thus without any guarantee of content integrity or author authenticity, so …
    • content has to be evaluated on whether it actually makes sense as a mitigation of the issue (admittedly a radical stance, but in such a situation one should expect the worst until things are clarified, and a mailing list without signatures is an easy target IF Fedora is a target)
  • also, a lot is written there, and one has to verify what the consensus is, what is still debated, and among whom: everyone can write there. But again, Michael, Kevin and Richard can be seen as reliable sources, and if they reach a consensus, I take it for granted (as long as I consider their messages to be authentic)

It’s not a perfect solution, but in the current organization it seems to have produced the best results. Still, there is room for improvement.

I don’t know what the magazine’s source was. Some people on the mailing list said they had pointed out issues before publication, but somehow their information did not reach the authors. So the magazine team might investigate internally, ideally together with the engineers, how that happened, in order to prevent it in the future.

Yet I would also like to point out that this situation was unexpected, and a lot of people had to respond without a clear framework and without clear knowledge of how to respond: it was improvised. That the devel mailing list was a proper point to rely on was an “educated guess”, since I didn’t know for sure that the list is kept up to date by reliable sources in these circumstances. It was more hope than knowledge, but I concluded that relying on it was better than hoping that average users read the Red Hat blog, which also contained some inconsistencies at that time anyway. My own improvisation is reflected in the fact that I was not sure myself whether the “testing = disabled by default” information was up to date when I read it outside the devel mailing list.

Maybe it is worth considering more broadly how to mitigate such issues in the future, not just with regard to the magazine. We currently rely on RH. They had their own issues, but they still did a good job considering how this began. However, I think we need our own security incident handling process that connects developers/engineering with those who reach the majority of users. Having slept on it, I think that topic should take precedence over the magazine issue. The latter is perhaps more of a symptom.

A magazine article and a Discourse topic make sense and are the two Fedora means of reaching most users immediately, although it is debatable how much depth of information each should contain and how they should align with each other and with upstream information.

In any case, thanks to everyone who responded early and took the risk of making mistakes and making decisions in an unclear situation, including explicit decisions to refrain from action due to a lack of knowledge. I was grateful for every response and every suggestion, from confirmations that action was not possible in some places to confirmations of which actions had been taken.

That is already philosophical :smiley:

I agree with Chris that the devel@ list is the winner here. Without that mailing list, I’m not sure how we would have sorted out fact vs. fiction.

Yes, but unfortunately the security team doesn’t seem to be an accurate source either. Their blog post still says Fedora 40 is not affected by the malware exploit but we still do not know why they think that. Maybe it’s true, but all engineers who worked on this issue believe Fedora 40 is affected, and we haven’t received an explanation for this discrepancy yet. Product Security is aware of this and is still investigating.

I wonder if perhaps we should establish a ‘communications hub’ Matrix room or something and have people from all the different areas meet and sync up there? But I suppose we could just bless the admin room or something with that role.

Thanks to everyone for coordinating and getting information flowing on this…

Good idea, I didn’t think about Matrix, but it’s a good place to connect. However, I would not create a new channel: if the next incident is in two years, that channel will have gone unused for two years, so I would not rely on people remembering it. The fewer channels we have, the less communication can split.

But we could make a wiki page or similar about behavior, steps and responses in security incidents… something where everyone knows “hey, that’s where I find out how we have to respond” when they realize a security incident has happened. So not a wiki page about this particular incident, but a page about incidents in general, e.g., explaining that we meet in the admin room and keep watching the devel mailing list (to bring everyone onto the same page).

You mean like this one? Security Bugs - Fedora Project Wiki

No, that page is not about immediate actions after such an incident has occurred. It explains how to report something security-critical one has found, but not the subsequent steps or processes if it proves to be critical (it helps users understand the topic, whether out of interest or because of a potential case, and to report in the latter case, but it doesn’t tell the teams how to respond to a critical CVE or similar). It gives users some explanation of how we work, how security incidents are handled in general (CVEs etc.) and how/where we prepare to ensure security, but that is different information from an incident response plan for those who have to implement it.

What I meant is an incident response plan, even if it is abstract/generic and just ensures that people end up communicating with each other instead of splitting into groups with different knowledge.

However, the idea of a Fedora-specific security/response team is nice as well. So far, this team seems to have been part of RH and not to effectively exist in Fedora at the moment, and it remains unclear whether that worked out, given that their information seems to have differed from that of the engineers on the devel list, while I GUESS (afaik!) the RH security team acted in isolation from Fedora. But I would wait for them (the RH security team and the engineers) to work out internally whether and where the problem was before responding to it.

It’s the Fedora Project’s Security Bugs wiki page. It already exists, can be edited to suit the “new ideas” and doesn’t need to be recreated, maybe just advertised a bit more. I don’t see why it would be “wrong” to revamp it.

The existing Release Day Matrix room could be a good fit for this. We use it already every release and it has many folks who have roles in various parts of Fedora.

Sure…

Yeah, a process might be nice, but of course it’s hard to keep that up to date if you don’t use it much.

Indeed, I guess that only makes sense with people who are responsible for such situations, which effectively means having an incident/security team, and even then it is questionable whether everyone would turn to that place unless someone from an incident response team is always available to coordinate. I guess that’s not realistic so far.

However, thinking about it, the same applies to a channel: if we agree to use a specific Matrix channel, it is questionable whether everyone will know about it in two years, especially those who happen to be available to respond at that time.

The key question is how to ensure that the people needed for the response, and those who can contribute to it, know where to verify information, connect and coordinate.

Theoretically, this leads back to the devel mailing list, which, given the current experience, seems to get involved naturally (and is thus a natural starting point). Maybe we can check whether we can add some useful additions and prompts in a simple way that people will remember and that is likely to be triggered if this happens again.

Suggestion:

We have three native channels that are meant for quick, flexible communication. A simple response plan in case of an incident:

→ designate one Matrix channel as the default in advance (#admin)
→ agree that anyone should open a Discussion topic about the case
→ agree that anyone should open a thread on devel@
→ always verify information in a second channel (i.e., in at least two of the above)
→ anyone should link the three channels to each other with a message or similar, to remind others
→ anyone is encouraged to make other channels aware of new/other information

I am quite sure the devel@ thread will appear on its own anyway, based on this experience.

If we create a simple, short wiki page (or something similar elsewhere) that just summarizes these simple steps, we can distribute that plan to teams and channels, have a good chance that in the worst case someone remembers it, and also have a page people can check to be sure (the page would also show up in search engine results). But we are talking about a generic page that people can consult when needed, nothing more complex: it can be summed up in two or three sentences, so no updates are necessary.

If anyone in any channel remembers this simple idea, they can remind others and make the connection available to everyone.

Addition: if someone from the magazine is available, they might be encouraged to verify what the consensus is across the three channels and publish something, although I still think the magazine should remain generic.