I think there are several parts to this, assuming that the issue of obtaining specialized hardware is taken care of (for ROCm and/or any other stacks we may pursue in the future). The order may seem backwards, but I think it’ll make more sense as I expand on each part:
- How will we coordinate/run the testing?
- Where will those host machines live?
- Where do we get host machines to facilitate testing?
How Will We Coordinate/Run Testing of ROCm?
I’m aware of three existing systems for running automated tests in Fedora at the moment: fedora-ci, Zuul and openQA.
As far as I know, none of those systems is set up to run hardware-specific tests in Fedora beyond CPU architecture. I’ve heard that upstream openQA might add some support for that in the future. I’m not sure whether the systems backing fedora-ci have any concept of hardware-specific testing, but if they do, I don’t believe any of it is set up in Fedora at this time. @adamwill and @mvadkert would be the best references for this. The Zuul setup we have access to is hosted outside of Fedora, so I doubt it is an option for specialized hardware at the moment, and I’m not clear on whether upstream Zuul has the hardware-testing support we’d need.
If my suspicions are correct and the three existing systems are not options at this time, I can think of four remaining options for automated testing of ROCm:
- Ask Red Hat to host the hardware and use their systems to coordinate and run the hardware-specific tests
  - This has the obvious problems of access for non-RH folks and overall visibility
- Put together a Beaker instance
  - I’m not aware of another setup that can do hardware coordination and management and is relatively compatible with Fedora from the get-go
  - I did put together a Beaker instance for Fedora in the past, but it was never used
- Work to add support to one of the existing systems
- Roll our own setup
  - Even if we minimize the amount of work (i.e. use an existing runner and build on an existing base system), this would be a lot of work
If we’re choosing among the options that don’t exist yet, my instinct would be to start by exploring the possibility of setting up a Beaker instance in/for Fedora.
No matter what we do, there are options for getting results where they need to be. Whether there’s bandwidth available to maintain that glue code and whether they’re needed at this time are different questions that can be left for later.
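To illustrate what I mean by hardware-specific coordination beyond arch, here is a minimal Python sketch of the matching step that any of these options would need. Everything in it is hypothetical: the `Host` and `TestJob` types, the `amd-gpu` tag and the host names are made up for illustration, and this is not how Beaker or any of the existing systems actually models hardware.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Host:
    name: str
    arch: str
    # Hypothetical free-form hardware tags, e.g. a GPU family.
    hardware: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class TestJob:
    name: str
    arch: str
    # Hardware tags the job needs; empty means "any host of this arch".
    requires: frozenset = field(default_factory=frozenset)


def eligible_hosts(job, hosts):
    """Return hosts whose arch matches and that carry every required tag."""
    return [
        h for h in hosts
        if h.arch == job.arch and job.requires <= h.hardware
    ]


# Made-up inventory, for illustration only.
HOSTS = [
    Host("builder01", "x86_64"),
    Host("gpu01", "x86_64", frozenset({"amd-gpu"})),
    Host("arm01", "aarch64"),
]

if __name__ == "__main__":
    rocm_job = TestJob("rocm-smoke", "x86_64", frozenset({"amd-gpu"}))
    generic_job = TestJob("generic", "x86_64")
    print([h.name for h in eligible_hosts(rocm_job, HOSTS)])
    print([h.name for h in eligible_hosts(generic_job, HOSTS)])
```

Matching on arch alone would send the ROCm job to any x86_64 machine. The extra tag check is the part that, as far as I know, is not set up in Fedora today.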
Where Will the Test Infrastructure Live?
The easiest answer is “in Fedora infra”, but I have zero visibility into how much rack space they have available or whether this is even a potential option. I assume that @kevin is the best reference for whether hosting more test machines in Fedora infra is an option.
If that’s not an option, we’d have to find another host, whether that’s Red Hat, some other company or a contributor. There are issues with any of those options, and I’d rather not explore alternatives until we have an answer to whether Fedora infra is an option.
Where Will the Host Machines Come From?
No matter what we do, some form of host machine will be required. The specifics of what we’re looking for will depend on the answers to how and where the tests are run, but unless there’s hardware lying around that I’m not aware of, it’s going to be a budget or sponsor issue.
I’m of the mind to leave this alone until we answer the other two questions. Once we have some (at least potential) answers to those, we can worry about finding sponsors for the host hardware.