F41 Change Proposal: Mark Fedora KDE AArch64 as Release-Blocking (System-Wide)

Mark Fedora KDE AArch64 as Release-Blocking

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.


Summary

Mark Fedora KDE AArch64 deliverables as release-blocking, leveraging the same criteria for Fedora on AArch64 and Fedora KDE on x86_64.

Owner

Detailed Description

The Fedora KDE SIG already considers Fedora KDE on AArch64 at the same level of importance as Fedora KDE on x86_64, and the SIG wants the Fedora KDE AArch64 deliverables (such as the disk images and ISOs) to be release-blocking alongside existing Fedora KDE x86_64 deliverables.

Feedback

Benefit to Fedora

This allows the Fedora KDE SIG to reinforce its commitment to the highest-quality KDE Plasma experience possible on Fedora, on AArch64 as on x86_64, and it lets Fedora better support the larger ecosystem around Fedora that leverages KDE Plasma (one notable member being the Fedora Asahi Remix).

Scope

  • Proposal owners: Update pungi-fedora to mark AArch64 artifacts as release blocking (pagureio#pungi-fedora#1290)

  • Other developers: N/A (not needed for this Change)

  • Release engineering: #12165

  • Policies and guidelines: N/A (not needed for this Change)

  • Trademark approval: N/A (not needed for this Change)

  • Alignment with the Fedora Strategy: This aligns with expanding and improving Fedora’s connections to the larger ecosystem by supporting upstream KDE with our expertise on AArch64 and downstream distributions like the Fedora Asahi Remix using Fedora KDE on AArch64.
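
The pungi-fedora update in the first bullet can be pictured roughly as follows. This is a minimal sketch, assuming the compose configuration marks non-blocking deliverables with a `failable` list of architectures (with `*` meaning all of them). The dictionary layout, the deliverable name, and the `blocking_arches` helper are hypothetical and are not taken from the referenced pull request, which may look different.

```python
# Hypothetical sketch: how a "failable" list of architectures relates to
# release-blocking status. Names and layout are illustrative only.

ARCHES = ["x86_64", "aarch64"]


def blocking_arches(arches, failable):
    """Return the architectures on which a failed build fails the compose."""
    if "*" in failable:
        return []  # failable everywhere, so the deliverable never blocks
    return [arch for arch in arches if arch not in failable]


# Before: the (hypothetically named) KDE disk image may fail on aarch64.
before = {"name": "Fedora-KDE-disk", "failable": ["aarch64"]}
# After: nothing is failable, so aarch64 blocks just like x86_64.
after = {"name": "Fedora-KDE-disk", "failable": []}

print(blocking_arches(ARCHES, before["failable"]))
print(blocking_arches(ARCHES, after["failable"]))
```

Under this assumption, making a deliverable release-blocking on an architecture amounts to removing that architecture from the list of places where it is allowed to fail.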

Upgrade/compatibility impact

This has no impact on users.

How To Test

Fedora KDE can be tested on AArch64 the same way Fedora Workstation is:

  • Fedora KDE AArch64 ISO on either QEMU/KVM or a device with UEFI CD boot such as the Raspberry Pi using UEFI firmware
  • Fedora KDE AArch64 raw disk image on existing supported AArch64 devices
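
For the QEMU/KVM case, the invocation might look something like the sketch below. It only builds and prints a `qemu-system-aarch64` command line; it is not an official test procedure. The ISO file name, the UEFI firmware path (assumed here to be the one shipped by Fedora's edk2 AArch64 package), and the memory and CPU sizes are all assumptions to adjust for the local setup.

```python
import shlex


def qemu_command(iso_path,
                 firmware="/usr/share/edk2/aarch64/QEMU_EFI.fd",  # assumed path
                 use_kvm=True, memory_mib=4096, cpus=4):
    """Build an argv list that boots an AArch64 ISO under UEFI in QEMU."""
    if use_kvm:
        # Native virtualization on an AArch64 host.
        accel = ["-accel", "kvm", "-cpu", "host"]
    else:
        # Full emulation, e.g. on an x86_64 host (much slower).
        accel = ["-accel", "tcg", "-cpu", "max"]
    return [
        "qemu-system-aarch64",
        "-machine", "virt",
        *accel,
        "-m", str(memory_mib),
        "-smp", str(cpus),
        "-bios", firmware,
        # Attach the ISO as a SCSI CD-ROM so the UEFI firmware can boot it.
        "-device", "virtio-scsi-pci",
        "-drive", f"file={iso_path},format=raw,if=none,id=cd0,media=cdrom,readonly=on",
        "-device", "scsi-cd,drive=cd0",
        # Graphics and input devices for a desktop session.
        "-device", "virtio-gpu-pci",
        "-device", "qemu-xhci",
        "-device", "usb-kbd",
        "-device", "usb-tablet",
        "-nic", "user,model=virtio-net-pci",
    ]


# Hypothetical ISO name; substitute the image actually being tested.
print(shlex.join(qemu_command("Fedora-KDE-Live-aarch64.iso")))
```

Building the argument list in a function keeps the KVM and emulated variants side by side, which is convenient when the same image is checked on both AArch64 and x86_64 hosts.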

User Experience

This does not change anything for the user experience.

Dependencies

There are no extra dependencies.

Contingency Plan

  • Contingency mechanism: Revert release-blocking status for Fedora KDE on AArch64
  • Contingency deadline: Final Freeze
  • Blocks release? Yes

Documentation

There is no user-facing documentation to update. Fedora QA documentation on release-blocking artifacts will note Fedora KDE AArch64 artifacts as release-blocking.

Release Notes

Not applicable as this is not a user-facing Change.

Last edited by @amoloney 2024-06-16T16:09:35Z

How do you feel about the proposal as written?

  • Strongly in favor
  • In favor, with reservations
  • Neutral
  • Opposed, but could be convinced
  • Strongly opposed

If you are in favor but have reservations, or are opposed but something could change your mind, please explain in a reply.

We want everyone to be heard, but many posts repeating the same thing actually make that harder. If you have something new to say, please say it. If, instead, you find someone has already covered what you’d like to express, please simply give that post a :heart: instead of reiterating. You can even do this by email, by replying with the heart emoji or just “+1”. This will make long topics easier to follow.

Please note that this is an advisory “straw poll” meant to gauge sentiment. It isn’t a vote or a scientific survey. See About the Change Proposals category for more about the Change Process and moderation policy.

Could the Change be updated to point to a specific list of aarch64 devices that are release-blocking? I see some things under “How to test” that imply it should be tested against the same hardware that Fedora Workstation is, but this doesn’t make it clear exactly what hardware must work in order to ship.

The intent is that it should be the same hardware that Fedora Workstation is qualified against. That is, all the devices listed on Fedora ARM that work with Workstation should work with KDE.

Offhand, the main device I’m aware of that’s release blocking for ARM is the Raspberry Pi 3/4 series.


I’m missing any mention of whether any people have volunteered to perform release validation testing (especially the Desktop part; imagine an extra KDE column where appropriate) on this image. It would require occasional testing after Branching, and regular and timely testing for release candidates before Beta and Final.

The release validation process expects involvement from SIGs in testing, the QA team can’t do it all. Especially with aarch64 hardware, which is very time-consuming (slow hardware, slow storage, various usability obstacles). QA is very much utilized now and I can’t imagine we would like to cover another aarch64 image at this moment. Unless there are people who step up to perform the testing and can be relied on.

I thought I would be able to add the quality-team tag to this topic, but I can’t. @ngompa can you please add it? Thanks.

Members of the KDE SIG are already doing some testing for Fedora KDE on ARM platforms. Aside from me doing it on Apple Silicon platforms with Fedora Asahi Remix and in AArch64 KVM on Apple Silicon, @marcdeop from the KDE SIG is willing to do this on Raspberry Pi hardware.

I’m also hopeful that we can get AArch64 testing in OpenQA soon to make a lot of this easier for us.

I cannot either. I’m pretty sure this tag is not available in this section.

One person testing is nowhere near sufficient for making a DE release blocking, especially when it’s limited to a single piece of hardware. There have already been issues with getting enough testing on rpi4, adding to the things which demand testing before release isn’t the way to fix that.

What are you aiming to get by making KDE release blocking for aarch64? This feels like we’re putting the cart before the horse.

IMHO, if there are enough people testing quickly enough for every release for at least 2 releases in a row, we can talk about making it release blocking but making something release blocking before we get to that point is going to make life more difficult; adding release requirements does not make testers materialize.


There’s quite a bit more than one person testing KDE on AArch64, especially with Fedora Asahi Remix using Fedora KDE as the flagship experience. And as far as I know today, there are no Workstation folks actively testing Fedora Workstation on AArch64. So on that point, we’d be doing more.

My goal is to ensure that we never have a Fedora KDE AArch64 deliverable missing from the website by GA. I also want to shift more of the Fedora KDE AArch64 qualification work from FAR to Fedora proper. People rely on Fedora KDE to be good on AArch64, and Fedora KDE already treats it the same as x86_64.

I agree with the rest of the quality team that this risks substantially increasing load on us and we’re not really able to cope with that.

Currently we don’t run KDE tests on aarch64 on openQA (because it isn’t release blocking). I can add it, but that’s some extra work for me right off the top - turn them on, run them, check the results, fix any problems that show up. It’s also extra load on the aarch64 workers, which we’re quite thin on. We have one very powerful worker host for prod, and three or four very underpowered old ones for stg, which give false failures all the time because they’re just old and underpowered and unreliable. Even on prod, we get a lot more flakiness on aarch64 than on x86_64; it’s manageable with just GNOME to care about, but adding KDE to the list makes things harder.

openQA does not cover all the tests anyhow; there is still substantial manual testing required, for things that cannot be automated or are hard to automate. As Tim and Kamil mentioned, this is slow and frustrating work given the capabilities and reliability of the hardware we have to test on.


The way you worded your comment, it sounded like “I test with this hardware that isn’t quite supported and this one other guy will test with an rpi4”.

I’m not trying to give other groups a pass but at the same time, we don’t want more deliverables that we’re stuck having to test at the last minute to get a release out the door. There are exceptions but overall, the SIGs don’t have a great history of testing the stuff that they “should” be testing.

I’m not saying that it should never be release blocking, I’m just wary of being burned by the “take on this commitment, we promise we won’t leave you holding the bag” nature of declaring something as blocking instead of changing that status after a few releases of the responsible group making sure that there are tests, that the matrices exist and that the matrices are filled out in a timely fashion even when the demanded test turnaround time is less than ideal.

And that’s just talking about the manual bits that would be required if we made another DE release blocking on aarch64. As Adam says, there are also the automated testing parts, and those have two limitations: the hardware used to run the tests and the human bandwidth needed to parse and triage any failures.

I’d like to see the process to make something release blocking change. Instead of just declaring something to be release blocking and that there is sufficient interest to justify making it release blocking, have a sort of “trial period” where Fedora will not release without the proposed thing.

During this “trial period”, Fedora will not release without the proposed thing so long as the responsible group continues to do all the testing required of the thing they want to see made release blocking. This includes

  • adding (or at least helping to add) any new test cases
  • making sure that the test matrices are updated to include the new thing
  • making sure that the new thing sees sufficient test coverage leading up to release so that we’re not scrambling to find someone who has the hardware and inclination to test the new bits at the last minute before go/no-go.

If and only if all of those items happen and continue to happen for a period of 2-3 releases, something would become actually release blocking.

If the group behind the proposed release blocking thing fails to hold up their end of the bargain, that thing is no longer treated as release blocking and there is no requirement for Fedora Quality to do the missed testing or testing related work.
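
The rule described above can be written down as a small decision function. This is only a sketch of the proposal as worded in this post, not an existing Fedora process or tool: the `ReleaseRecord` fields mirror the three listed commitments, the names are hypothetical, and `required_releases=2` assumes the lower end of the suggested 2-3 releases.

```python
from dataclasses import dataclass


@dataclass
class ReleaseRecord:
    """What the responsible group delivered for one Fedora release."""
    test_cases_added: bool
    matrices_updated: bool
    coverage_sufficient: bool

    def commitments_met(self):
        return (self.test_cases_added
                and self.matrices_updated
                and self.coverage_sufficient)


def trial_status(history, required_releases=2):
    """Classify a deliverable from its per-release trial history."""
    if any(not record.commitments_met() for record in history):
        # A missed commitment ends the trial; QA owes no catch-up testing.
        return "not blocking"
    if len(history) >= required_releases:
        # Every commitment was met for enough consecutive releases.
        return "release blocking"
    return "trial"


good = ReleaseRecord(True, True, True)
missed = ReleaseRecord(True, True, False)
print(trial_status([good]))
print(trial_status([good, good]))
print(trial_status([good, missed]))
```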

Has this standard been applied to previous bits declared to be release blocking? No, it hasn’t and at this point, that’s water under the bridge. I’m interested in making sure we don’t repeat some of what I consider mistakes from the past.


That’s true. At the same time, though, most of the ARM tests used to be performed by Paul Whalen, so we didn’t have to do much work on it, in the past. Then he transitioned elsewhere and we inherited much more related work (and we quickly learned that it’s very painful work). That can easily happen again with other desktops/environments.

Also, I have a feeling that the Workstation WG is not very happy that they need to deal with it at all. The related Change was driven not by them but by the ARM SIG: the XFCE desktop didn’t have developer resources to fix issues, while GNOME did, so I think it somewhat just fell into their laps. That doesn’t justify the lack of their testing involvement, but it perhaps illustrates why we are where we are.

I would rather resolve this problem by different means than having all affected groups request that their image be release blocking. Release blocking is a huge superset of the “image not missing” requirement.

I understand it feels very unfair if a spin image is missing at GA because of an infra bug, and we have had a lot of these infra issues lately. But I think we can find a better way to resolve it than going with the ultimate hammer, release blocking. Perhaps we can commit to respinning the failed composes (when caused by an infra bug) until they build and then maybe store them in a different directory on mirrors (no longer as the same compose, since it’s hard to stitch composes together); at least the image would be available, and users visiting the Fedora homepage don’t really care where it is stored. Or perhaps we could have two separate release blocking criteria sets, “image builds” and “content passes functional testing”, and only require the former for certain images? Those are just immediate ideas, but I think it’s possible to find a middle ground that makes the groups happy and also stays realistic about how many things we can test and verify. (If we go the other route and keep adding to the release blocking list, that will just force us to look for ways to lower the quality bar to keep it manageable.)

(Personally, I’m not too happy about the aarch64 arch being release blocking, and think we should detract from it rather than add more, but that’s probably for a different discussion.)

I think this could be reasonable. But at least from the KDE SIG perspective, the status quo of having artifacts that the SIG cares about randomly missing is not acceptable, so we will do whatever we can to fix that. And if that means beefing up direct Fedora KDE AArch64 testing, we can do that.

The challenge right now is that only a few of our members have the hardware to do testing. At least one has committed to actually buying hardware to do this. I do plenty of testing of KDE on AArch64 as part of Fedora Asahi Remix, so I do know it all works once we get past the boot part and the drivers part. But the boot and drivers part is also not really in our purview either, so I don’t know what we should do here…

@kevin Do you think that Releng/Infra could make sure that all images are available somehow (not necessarily part of the main candidate compose) for Beta and Final, even if they are not release blocking? (assuming the compose failure happens due to a releng/infra bug, see my proposal quoted in the previous post).

No, “all images” is much too high a bar. We build something like 70+ images. Quite frequently at least some of them are broken.

We could potentially have a list of “not quite release blocking” images that we respin for until they work, but then the problem is we might wind up having to play a lot of Compose Roulette.

I missed that apparently a motivating factor for this change is the live ISO being missing. If anything, that makes me more opposed to it. The Workstation live ISO for aarch64 is not release blocking either, so that was missing for F40 GA too; see the index of /pub/fedora/linux/releases/40/Workstation/aarch64/iso. The only ISO there is the osbuild one, which is not the “official” live image; the official one also failed. So making both KDE’s ARM disk image and live image release blocking would mean we have more release blocking desktop images for KDE than Workstation, and I don’t think that’s reasonable without more resources.

It’s always been an intentional choice that the disk images are the release-blocking media for ARM, not the live images, because deployment via disk image has always been the ‘standard’ for ARM. I really would like to avoid a world where we have to test both, for any desktop.

There is also a practical concern, which is: nobody knows how to fix the freaking aarch64 live image build bug. It’s been open for months (bug 2247319, “sporadic sigill with python3.12 in livemedia composes”). We’ve had Top People digging into it. We even thought we’d figured it out at one point, but nope. These images fail all the time and we can’t fix it. If we make one of them release blocking, we’re essentially dooming a large proportion of all our composes. Looking through the last couple of weeks of composes, the aarch64 KDE live failed in just over half of them, so making it release blocking would doom about half of our composes. We can’t do things that way around. The bug must be fixed (or, I guess, we must switch to osbuild or Kiwi for building the image, if that works around the problem) before we can even consider making an affected image release-blocking.
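
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each blocking image fails independently with a fixed probability and that compose attempts are independent of one another; neither is guaranteed in practice. The 0.5 figure approximates the “just over half” failure rate mentioned above, and the function names are made up for illustration.

```python
def compose_success_rate(failure_rates):
    """Chance that every release-blocking image builds in one compose."""
    rate = 1.0
    for p in failure_rates:
        rate *= 1.0 - p
    return rate


def expected_attempts(success_rate):
    """Average number of composes needed to get one that succeeds."""
    return 1.0 / success_rate


# One additional blocking image that fails about half the time.
rate = compose_success_rate([0.5])
print(rate)                     # 0.5
print(expected_attempts(rate))  # 2.0
```

Under these assumptions, whatever the compose success rate is today, adding this one image as release blocking would halve it, and would double the average number of composes needed to get a usable one.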

No, I sure wouldn’t want to commit to something like that.

Sometimes infra/releng bugs are just sporadic, or we haven’t figured out the failure and can retry, but sometimes they are worse. Additionally, doing that kind of thing bypasses a lot of our process: would QE actually test those images as fully as they normally would? And how much time would we leave for community testing, etc.?

I’m happy to try to make images available where there is a bug we couldn’t fix or that was sporadic, but I don’t think we can say “we will always make every image”.

Ugh, that’s awful. Yeah, okay. I’m happy to take the condition that the Fedora KDE live ISO needs to move away from Lorax for this to work. It’s already on my TODO to figure out the remaining gaps for us to produce Fedora live ISOs with kiwi “properly”. This sort of bumps it up in priority.

This change proposal has now been submitted to FESCo with ticket #3232 for voting.

To find out more, please visit our Changes Policy documentation.