RHEL7 status in Fedora infra - early 2024 edition

Hey everyone.

I thought I would go over the rhel7 instances we still have left. rhel7 goes end of life on june 30th, which is less than 6 months away now and we MUST have all these taken care of before then. I don’t know the status on some of these, so if you do, or you know someone who does, it would be great to chime in.

We have a total of 46 rhel7 instances left.

badges. What’s the status of moving this off?

badges-backend01.iad2.fedoraproject.org
badges-backend01.stg.iad2.fedoraproject.org
badges-web01.iad2.fedoraproject.org
badges-web01.stg.iad2.fedoraproject.org

These are fedmsg gateways. I guess we could move them forward, but… perhaps we could also think about finally sunsetting fedmsg? I think the last 2 things using fedmsg are badges and github2fedmsg. However, there may be end users still listening there.

busgateway01.iad2.fedoraproject.org
busgateway01.stg.iad2.fedoraproject.org

For these I just need an outage to migrate forward. I tentatively plan on doing it the week after next.

db-fas01.iad2.fedoraproject.org
ibiblio05.fedoraproject.org
virthost-cc-rdu01.fedoraproject.org
virthost-cc-rdu02.fedoraproject.org

These can just be done by me anytime:

dedicatedsolutions01.fedoraproject.org

There has long been a plan by the cloud folks to replace fedimg.
Will it be ready before time runs out? What can we do to help this along?

fedimg01.iad2.fedoraproject.org

This was planned to be re-written into a more generic ‘webhook to fedora-messaging’ thing. However, we haven’t yet done it and time is running out. ;(

github2fedmsg01.iad2.fedoraproject.org
github2fedmsg01.stg.iad2.fedoraproject.org

This we planned on re-writing, but haven’t yet. Time is running out. ;(

kerneltest01.iad2.fedoraproject.org

This is all finally now in epel9 (many thanks to all the hard work on this!)
So, I plan to start on deploying to stg very soon and look at migrating prod after that’s sorted out.

mailman01.iad2.fedoraproject.org

This needs to stay around until f38 EOL. We do have an epel8 version available, so I guess I will look at migrating these to 8 sometime.

mbs-backend01.iad2.fedoraproject.org
mbs-backend01.stg.iad2.fedoraproject.org
mbs-frontend01.iad2.fedoraproject.org
mbs-frontend01.stg.iad2.fedoraproject.org

This was ported to python3 long ago, but never deployed.
@abompard was looking at this, possibly deploying in openshift?
Any news on the status here?

mm-backend01.iad2.fedoraproject.org
mm-backend01.stg.iad2.fedoraproject.org
mm-crawler01.iad2.fedoraproject.org
mm-crawler01.stg.iad2.fedoraproject.org
mm-crawler02.iad2.fedoraproject.org
mm-frontend01.iad2.fedoraproject.org
mm-frontend01.stg.iad2.fedoraproject.org
mm-frontend-checkin01.iad2.fedoraproject.org

These will all go away as soon as the flatpak sig can finish migrating the last of the f38 flatpaks forward. There are still issues with libreoffice that they are trying to solve. Hopefully this will happen soonish.

osbs-control01.iad2.fedoraproject.org
osbs-control01.stg.iad2.fedoraproject.org
osbs-master01.iad2.fedoraproject.org
osbs-master01.stg.iad2.fedoraproject.org
osbs-node01.iad2.fedoraproject.org
osbs-node01.stg.iad2.fedoraproject.org
osbs-node02.iad2.fedoraproject.org
osbs-node02.stg.iad2.fedoraproject.org

These can be redone as soon as osbs goes away:

os-control01.iad2.fedoraproject.org
os-control01.stg.iad2.fedoraproject.org

This box has a bad disk and is out of warranty. The replacement should hopefully be coming in Q1. So, we will install that, move things off this and retire it.

osuosl01.fedoraproject.org

Lots of work has been ongoing to move things off pdc. So, hopefully we can retire it soon. What’s left to do here? @humaton ?

pdc-web01.iad2.fedoraproject.org
pdc-web01.stg.iad2.fedoraproject.org
pdc-web02.iad2.fedoraproject.org

This is waiting on migrating planet.fedoraproject.org off to an openshift pod.
phsmoura has been working on this and working on deploying in staging right now.
Once that works in stg, we will roll to prod, but then we need to advertise the change and set a date to drop the old one.

people02.fedoraproject.org

These need to be sorted out. They have a bunch of small scripts and sync to proxies stuff. It might be we could drop them, but failing that we should be able to move them to rhel9 hopefully easily enough.

sundries01.iad2.fedoraproject.org
sundries01.stg.iad2.fedoraproject.org
sundries02.iad2.fedoraproject.org

Hopefully we can get everything sorted before EOL time. :wink:

We haven’t got anything ready yet as far as I know. I’d love to switch
badges away from fedmsg, but nobody wants to touch that ancient code.

We’ll meet again next week. I’ll put the topic on the agenda. Maybe
someone else has an idea for an interim solution.

Edit: I replied by e-mail quoting the badges section. I don’t know what the magic is to make Discourse not drop quoted text in e-mail replies. :frowning_face:

Please do bring it up. If nothing is done by rhel7 EOL time, we will
have to just take down the badges service, which I really would not want
to do. ;(

I’m probably going off-topic and it is better to discuss this in another thread, but I’ll be brief. I had a look at Tahrir (the Fedora badges app) and, apart from migrating it away from fedmsg, it looks like we’ll soon also need to adjust it for Pyramid 2 and SQLAlchemy 2.
I couldn’t find any other open source platform like it for issuing open badges, so if we decide to continue supporting Tahrir it would make sense to do a more complete rewrite so that it would also be usable outside Fedora. Maybe RH could be interested in putting some workforce into it and getting an app they can use for their customers?

Badges is being rewritten from the ground up. At least that’s what we
still aim for. Unfortunately, we had a bit of a setback. While the plans
and goals are all there, we were unable to attract enough developers
to implement them.

For people driving by, the project lives on GitLab. Feel free to join in.

I’m not sure if we will have anything ready replacing Tahrir before EL7
goes EOL. I will bring it up next meeting and report back here. I
totally agree with Kevin, having to take down Badges would be the least
favorable outcome.

Yeah I still think this would be an interesting project to take on. I’m happy to help mentor it if somebody from the community is interested. An Outreachy intern would be ideal, but I suppose that’ll be too late?
There’s even been an ARC investigation: webhook2fedmsg — ARC notes documentation

Otherwise, I have a branch where I’ve ported it to use fedora-messaging, but it would still be better to rewrite it so we can get rid of Pyramid.

Why did it need to be rewritten? I can’t remember.

Yeah I was planning on getting back to it this month, but some other things took priority. Most of the scripts in there never had any tests, and I wanted to add some before doing a giant refactor, however that’s just too complex, it took me too long in december. The scripts are really not designed to be tested in unit tests. And there’s a lot of copy-pasted-and-slightly-modified code in there. I think I’ll just deploy a new version on staging and test there, before I add tests to the refactored code.

Most of the app will be able to run in Openshift, but Fabian pointed out that the crawler will need IPv6 connectivity and apparently we don’t have that in Openshift.

I don’t know a lot about what the MirrorManager app is supposed to be doing, I mean I can guess from the UI and the code but there are quite a few subtleties in the scripts, so if somebody wants to work with me on that it’d be great.

There is an ARC investigation available for this one as well. The main reason for rewriting is the tech debt.

Current tahrir is python2 only using fedmsg. It also runs in dedicated
vms. So, ideally any update would be python3, use fedora-messaging and
run in OpenShift. :wink:

Yeah I still think this would be an interesting project to take on. I’m happy to help mentor it if somebody from the community is interested. An Outreachy intern would be ideal, but I suppose that’ll be too late?
There’s even been an ARC investigation: webhook2fedmsg — ARC notes documentation

Otherwise, I have a branch where I’ve ported it to use fedora-messaging, but it would still be better to rewrite it so we can get rid of Pyramid.

Yeah, like I said, the deadline looms, but I agree a more generic
webhook gateway could be much nicer.

Why did it need to be rewritten? I can’t remember.

It’s python2/fedmsg running in a vm, and we wanted to move it to
python3/fedora-messaging and openshift.

Yeah I was planning on getting back to it this month, but some other things took priority. Most of the scripts in there never had any tests, and I wanted to add some before doing a giant refactor, however that’s just too complex, it took me too long in december. The scripts are really not designed to be tested in unit tests. And there’s a lot of copy-pasted-and-slightly-modified code in there. I think I’ll just deploy a new version on staging and test there, before I add tests to the refactored code.

Sounds good. Let me know if I can help any…

Most of the app will be able to run in Openshift, but Fabian pointed out that the crawler will need IPv6 connectivity and apparently we don’t have that in Openshift.

Well, we don’t have it… at all. There’s no ipv6 in that datacenter, so
even the current vm doesn’t crawl ipv6. ;(

I should ping them again about enabling ipv6 for us, but I don’t think
this should be a factor in our deployment.

I don’t know a lot about what the MirrorManager app is supposed to be doing, I mean I can guess from the UI and the code but there are quite a few subtleties in the scripts, so if somebody wants to work with me on that it’d be great.

I’m happy to answer any questions I can… and adrian should hopefully
be able to too.

Thanks for taking that on!

I will be at CentOS Connect. Let’s talk there. Happy to help.

“Tech debt” means too many things all at once, and ends up conveying different things to different people.

Instead:

  1. What is the current “maintenance cost” in terms of person-hours/year? [1]

  2. With the current code,[2] what is a reasonable estimate for the cost of:

    • maintenance at a healthy level?[3]
    • maintenance at a minimal level?[4]
  3. For each of current, minimal, and healthy levels of maintenance, calculate the likelihood of downtime or serious disruption (due to security incidents or otherwise) in a given year.

  4. What would that disruption[5] cost Fedora in terms of:

    • loss of productivity for users of the service?
    • loss of productivity from cascading effects?
    • person-hours to mitigate?
    • disruption to higher-priority work because there’s a fire?
    • other fallout (bad press, bad community feelings, etc.)?

The “risk formula” is “likelihood × consequence”. So, for each maintenance level (current, minimal, healthy[6]), make a best estimate for the risk-associated potential costs, and add that to the maintenance costs.

Next, make these maintenance assessments for an updated or replaced codebase, at both the “minimal” and “healthy” levels.

And, estimate the cost of the work to do the upgrade.[7]

Considering the cost-to-replace and hopefully-lowered ongoing maintenance, how many years would it take for the replacement work to pay off?


  1. dollars would be better, but hard to figure in an open source project! ↩︎

  2. application, tool, service, whatever ↩︎

  3. Users are satisfied, bugs get fixed, appropriate new features can be added, security is reasonable, infrastructure is robust ↩︎

  4. Users aren’t angry, serious bugs are fixed, no new features, basic security mitigations are in place, infrastructure is … in support, at least ↩︎

  5. again, security or otherwise ↩︎

  6. and add in “ideal”, if you want! ↩︎

  7. With some cynicism from 25 years in IT, maybe double that best reasonable estimate? Estimating is hard! ↩︎
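
To make the arithmetic above concrete, here is a minimal sketch of the calculation in Python. Every number in it is a hypothetical placeholder, the class and function names are made up for this illustration, and everything is measured in person-hours because, as noted, dollars are hard to pin down. It is only one way to combine the estimates, not an agreed method.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceLevel:
    """One maintenance scenario; all figures are rough person-hour estimates."""
    name: str
    maintenance_hours: float      # routine upkeep per year
    disruption_likelihood: float  # chance of a serious disruption in a given year (0..1)
    disruption_hours: float       # total cost of one disruption, in person-hours

    def risk_hours(self) -> float:
        # risk = likelihood x consequence
        return self.disruption_likelihood * self.disruption_hours

    def annual_cost(self) -> float:
        # maintenance cost plus the risk-associated potential cost
        return self.maintenance_hours + self.risk_hours()


def payback_years(current: MaintenanceLevel, replacement: MaintenanceLevel,
                  upgrade_hours: float) -> float:
    """Years until the one-time upgrade cost is recovered by the lower annual cost."""
    saving = current.annual_cost() - replacement.annual_cost()
    if saving <= 0:
        return float("inf")  # the replacement never pays for itself
    return upgrade_hours / saving


# Hypothetical numbers, purely for illustration.
legacy = MaintenanceLevel("current code, minimal", 80, 0.5, 200)
rewrite = MaintenanceLevel("replacement, minimal", 40, 0.25, 200)

print(legacy.annual_cost())                 # 80 + 0.5 * 200
print(rewrite.annual_cost())                # 40 + 0.25 * 200
print(payback_years(legacy, rewrite, 450))  # best-guess upgrade estimate
print(payback_years(legacy, rewrite, 900))  # the same estimate, doubled
```

Running the same comparison for the "healthy" level only needs another pair of `MaintenanceLevel` objects.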

“Tech debt” means too many things all at once, and ends up conveying different things to different people.

Yeah, agreed. It’s a pretty overloaded term many people use to describe
lots of different things. ;(

Instead:

  1. What is the current “maintenance cost” in terms of person-hours/year?

  2. With the current code, what is a reasonable estimate for the cost of:

    • maintenance at a healthy level?
    • maintenance at a minimal level?
  3. For each of current, minimal, and healthy levels of maintenance, calculate the likelihood of downtime or serious disruption (due to security incidents or otherwise) in a given year.

  4. What would that disruption cost Fedora in terms of:

    • loss of productivity for users of the service?
    • loss of productivity from cascading effects?
    • person-hours to mitigate?
    • disruption to higher-priority work because there’s a fire?
    • other fallout (bad press, bad community feelings, etc.)?

The “risk formula” is “likelihood × consequence”. So, for each maintenance level (current, minimal, healthy), make a best estimate for the risk-associated potential costs, and add that to the maintenance costs.

Next, make these maintenance assessments for an updated or replaced codebase, at both the “minimal” and “healthy” levels.

And, estimate the cost of the work to do the upgrade.

Considering the cost-to-replace and hopefully-lowered ongoing maintenance, how many years would it take for the replacement work to pay off?

Well, in this case the platform it’s running on is going end of life.

So, IMHO, our alternatives are:

  1. Turn it off and no longer offer the service.

  2. Put some effort into moving it to a supported platform (and I mean by
    platform here: python3 instead of EOL python2, rhel9 instead of rhel7,
    fedora-messaging instead of fedmsg).

I guess we need to see how important this service is and how much effort
it might take to move it to something supported. :frowning:

If I understand right, if we bumped Badges/Tahrir today to RHEL 8, the single biggest issue we are facing is Python 3 support. Is that correct?

This is a huge dilemma of whether we invest a non-trivial amount of time in hobbling the legacy system along another year or two, or if we invest in building a new MVP for something that is “good enough” to fix the old system. Either pathway we go, it requires a significant amount of time and investment. The challenge we have today is that there is no way for the community to actually do the heavy-lifting needed to get us onto a new system, or even hobble the old along.

The friction point we have is not app development, but the infrastructure and deployment. We don’t have a clear way today for someone to build and prototype an application with the intention of running it in official Fedora Infrastructure. The way I see it, there is a heavy lift required to do this, and right now, the burden sits on the Fedora Infrastructure / CPE team.

We need to start discussing how to get attention and eyes on this, and what the sustainable pathway might be. We have six months, but we have Fedora 40 ahead of us, which will be a more packed release than our typical cycle. We are going to have to make some tough decisions about what to prioritize and when.

Actually it should be possible for anyone to prototype an application, since all the messages going through the bus are publicly broadcast outside the Fedora infrastructure. The old badges system also made direct calls to the datanommer database, but these can be prototyped by using REST API calls to datagrepper. I don’t see anything that badges uses that is restricted to within the infra.
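
As a rough illustration of prototyping against datagrepper from outside the infra, here is a small Python sketch. The endpoint path, the query parameter names (`topic`, `rows_per_page`, `delta`) and the `raw_messages` response key are my assumptions about the public datagrepper API and should be checked against its documentation. The topic in the usage line is only an example, and the function names are made up for this sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint; verify against the datagrepper docs before relying on it.
DATAGREPPER_SEARCH = "https://apps.fedoraproject.org/datagrepper/v2/search"


def build_query(topic: str, rows_per_page: int = 20, delta_seconds: int = 86400) -> str:
    """Build a search URL for messages on one topic within the last delta_seconds."""
    params = {"topic": topic, "rows_per_page": rows_per_page, "delta": delta_seconds}
    return f"{DATAGREPPER_SEARCH}?{urlencode(params)}"


def fetch_messages(url: str) -> list:
    """Fetch one page of results; needs network access, so it is not called here."""
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # "raw_messages" is the key I believe datagrepper uses for the message list.
    return payload.get("raw_messages", [])


# Only build and show the URL, so the sketch runs without touching the network.
print(build_query("org.fedoraproject.prod.bodhi.update.comment"))
```

A badge-rule prototype could call `fetch_messages()` on such a URL wherever the old code queried the datanommer database directly.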

Actually, it seems that all components support python3 just fine. The only one that doesn’t is fedmsg, but we need to get rid of that anyway. So the dependency on fedmsg is the immediate problem. But looking at e.g. tahrir, tahrir/tahrir/notifications.py at 0852ee0f38102a0ca08e4a8eb762bdf61bde4f3a · fedora-infra/tahrir · GitHub is all of the code that uses fedmsg. Surely it can be ported to fedora-messaging.

Considering that we’ve been talking about a rewrite for a few years, without it actually happening, I would very much try to do the minimal update to keep things working on modern systems instead.

Tahrir is the web UI of the badges system, and indeed its dependency on fedmsg is not heavy. However if we want to port the whole badges system to Fedora Messaging, there are other components such as fedbadges that are more tied to fedmsg.
It’s not impossible to port though, and I’d happily help anyone who wants to take that on.
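
For anyone wondering what the publishing side of such a port might look like, here is a hedged Python sketch. It is not taken from Tahrir or fedbadges: the topic string, the body layout and the helper names are hypothetical, and the old fedmsg call shown in the comment is only my guess at its shape. The only fedora-messaging calls used are `fedora_messaging.api.publish()` and `fedora_messaging.message.Message(topic=..., body=...)`. A real port would also want a proper message schema.

```python
from typing import Callable, Optional

Publisher = Callable[[str, dict], None]


def fedora_messaging_publisher(topic: str, body: dict) -> None:
    """Publish via fedora-messaging; imported lazily so the sketch loads without it."""
    from fedora_messaging import api, message

    api.publish(message.Message(topic=topic, body=body))


def notify_badge_award(badge_id: str, username: str,
                       publish: Optional[Publisher] = None) -> dict:
    """Send a 'badge awarded' notification and return the body that was sent."""
    # Assumed shape of the old call this replaces (not copied from the real code):
    #   fedmsg.publish(modname="fedbadges", topic="badge.award", msg=body)
    body = {"badge": {"badge_id": badge_id}, "user": {"username": username}}
    (publish or fedora_messaging_publisher)("fedbadges.badge.award", body)
    return body


# Exercise the sketch with a stand-in publisher, so no broker or library is needed.
sent = []
notify_badge_award("hypothetical-badge", "hypothetical-user",
                   publish=lambda topic, body: sent.append((topic, body)))
print(sent)
```

Keeping the publisher injectable like this would also make the notification code testable without a running broker.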

Really, it needs to be brought up to python 3 and made to work in a container. It also uses fedmsg and likely could use an auth update. The functionality of the app is fine, it just needs to be ported to modern infrastructure…

The source code at GitHub - jmflinuxtx/kerneltest-harness: Fedora automated kernel test harness seems to already be based on Python 3, Fedora Messaging and OIDC. Does it just need to be deployed in a container, or does it still need more testing and/or development?

That was Jeremy’s rewrite. It was not a direct replacement for current functionality, and is not what is currently deployed. It was supposed to be fixed up, but got forgotten due to lower priority and the existing site still working. If we want to deploy that in testing, I am happy to take a look and see what needs to be done to bring it up to what is needed. It has been 5 years, and I don’t recall the details. Unfortunately I am running against a hard deadline where I will be out for a rather long time (1-2 months) in ~3 weeks for medical reasons.

We already looked into that during ARC investigation. kerneltest — ARC notes documentation