High CPU usage from rpm-ostreed

I have a stock FCOS instance on GCP (version 33.20210217.3.0). Since booting, with no additional programs running, I have been seeing periodic CPU usage spikes of up to 15% every couple of minutes (see picture below). After some investigation, I found that they are caused by the rpm-ostreed daemon.

image

This is the output of systemctl status rpm-ostreed:

Mar 07 07:34:09 instance-2 systemd[1]: Started rpm-ostree System Management Daemon.
Mar 07 07:34:09 instance-2 rpm-ostree[7238]: In idle state; will auto-exit in 63 seconds
Mar 07 07:35:13 instance-2 rpm-ostree[7238]: In idle state; will auto-exit in 60 seconds
Mar 07 07:35:13 instance-2 systemd[1]: rpm-ostreed.service: Succeeded.
Mar 07 07:39:23 instance-2 systemd[1]: Starting rpm-ostree System Management Daemon...
Mar 07 07:39:23 instance-2 rpm-ostree[8371]: Reading config file '/etc/rpm-ostreed.conf'
Mar 07 07:39:25 instance-2 systemd[1]: Started rpm-ostree System Management Daemon.
Mar 07 07:39:25 instance-2 rpm-ostree[8371]: In idle state; will auto-exit in 63 seconds
Mar 07 07:40:29 instance-2 rpm-ostree[8371]: In idle state; will auto-exit in 63 seconds
Mar 07 07:40:29 instance-2 systemd[1]: rpm-ostreed.service: Succeeded.

And this goes on and on every couple of minutes.
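
To put a number on "every couple of minutes", the gap between daemon starts can be read off the log above. This is only a minimal Python sketch: it assumes all lines fall on the same day (so only the HH:MM:SS field is compared), and the helper names are made up for illustration.

```python
# Sketch: measure the gap between consecutive rpm-ostreed starts.
# Assumption: every line is from the same day, so only HH:MM:SS is compared.

LOG = """\
Mar 07 07:34:09 instance-2 systemd[1]: Started rpm-ostree System Management Daemon.
Mar 07 07:35:13 instance-2 systemd[1]: rpm-ostreed.service: Succeeded.
Mar 07 07:39:25 instance-2 systemd[1]: Started rpm-ostree System Management Daemon.
Mar 07 07:40:29 instance-2 systemd[1]: rpm-ostreed.service: Succeeded.
"""


def seconds_of_day(stamp):
    """Convert an 'HH:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def start_gaps(log_text):
    """Return the seconds elapsed between consecutive 'Started' lines."""
    starts = [
        seconds_of_day(line.split()[2])  # third field is the HH:MM:SS stamp
        for line in log_text.splitlines()
        if "Started rpm-ostree System Management Daemon" in line
    ]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


print(start_gaps(LOG))  # [316]
```

The two starts shown are a little over five minutes apart.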

I was wondering if this is expected behavior or if there is a problem. I am also confused about the role of rpm-ostreed vs the zincati service. I have zincati configured to check for updates in a weekend window, so I am wondering why rpm-ostreed is constantly checking for updates every couple of minutes (I assume that is what it is doing, but I am not sure because my /etc/rpm-ostreed.conf is the unmodified default which is supposed to not check for automatic updates).

If anyone could assist with this problem, I would really appreciate it.

Thank you.

This is possibly due to Zincati periodically checking the deployment status, which wakes up the rpm-ostree daemon.

Better caching has been implemented very recently in Zincati; see "rpm-ostree: cache status to minimize daemon wakeups" (Pull Request #472, coreos/zincati, by kelvinfan001). The improvement is part of the 0.0.18 release, which is on its way to the testing stream.

I have zincati configured to check for updates in a weekend window

Minor correction: Zincati finalizes updates during that time window, as explained in the relevant docpage.
It still checks for updates every few minutes, and tries to download and stage one as soon as it is available.

Thank you for your reply. You are right, as a test I stopped the Zincati service and the problem went away.

That’s excellent! I will give the testing stream a shot once it’s released.

Is it possible to tell Zincati to reduce the frequency of update checks, like once per hour, for example?

Thank you so much for your help.

Perhaps we should scale the update query time to the finalization window? Checking for updates every few minutes seems overkill if we’re only going to finalize on the weekend.

I agree. Unless I’m missing something, there’s no point checking for updates every few minutes when the update windows are confined to a specific day or set of days, as in my case.

For now, I think I will disable the Zincati service until the weekend around the time of my update windows. If I am missing some other functionality or increasing security risk by doing this please let me know.

I’m also wondering why the rpm-ostree daemon uses so much CPU. Is that expected, or might there be something wrong with my configuration? Thank you for your help.

This is something I’ve noticed recently-ish as well but hadn’t filed a bug for (in my case it was the delay in a cold rpm-ostree status that was annoying me). Did so now: rpm-ostree startup delays because of GPG key loading · Issue #761 · coreos/fedora-coreos-tracker · GitHub.

That explains it, thanks for looking into it. Will be following along.

Is it possible to tell Zincati to reduce the frequency of update checks, like once per hour, for example?

Yep, the default is 300 seconds, as seen in /usr/lib/zincati/config.d/10-agent.toml; but to override that, you can drop in e.g. /etc/zincati/config.d/90-update-check.toml:

[agent.timing]
# Pausing interval between updates checks in steady mode, in seconds.
steady_interval_secs = 3600

See the docs here for more about configuring Zincati.
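
For anyone who wants to script this, here is a rough Python sketch that renders the drop-in shown above. The path, file name and key are the ones quoted in this post; the helper names (render_dropin, write_dropin) are hypothetical, and writing under /etc needs root, so the example only prints the content when run directly.

```python
from pathlib import Path

# Path and file name as quoted above; adjust if your layout differs.
DROPIN_DIR = Path("/etc/zincati/config.d")
DROPIN_NAME = "90-update-check.toml"


def render_dropin(steady_interval_secs):
    """Return the TOML text for a steady-state check interval override."""
    if steady_interval_secs <= 0:
        raise ValueError("steady_interval_secs must be positive")
    return (
        "[agent.timing]\n"
        "# Pausing interval between updates checks in steady mode, in seconds.\n"
        f"steady_interval_secs = {steady_interval_secs}\n"
    )


def write_dropin(steady_interval_secs, directory=DROPIN_DIR):
    """Write the drop-in into `directory` and return the path written."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / DROPIN_NAME
    target.write_text(render_dropin(steady_interval_secs))
    return target


if __name__ == "__main__":
    # Print only; call write_dropin(3600) as root to actually install it.
    print(render_dropin(3600), end="")
```

After the file is in place, restart the Zincati service (for example with systemctl restart zincati) so that it picks up the new interval.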

Thank you, that is very helpful. I modified my Zincati config file to space out my update checks and restarted the Zincati service. This works nicely for my setup.

@kfan @lucab Checking for updates every five minutes seems like a high frequency. Is there any particular reason against moving to 1h by default?

Five minutes isn’t a particularly high frequency; the real issue is that we had to wake up rpm-ostreed on every check.

The background polling loop is better kept short for many things in the agent to work properly (deadend signaling, rollouts spreading, abandoning buggy/undeployable updates, metrics refresh, etc).

The steady_interval_secs knob is mostly meant to tweak the agent into a shorter interval (i.e. refreshing more often). I’d recommend against making that value too large. In particular, when the periodic strategy is in use, each defined maintenance window should be at least twice as long as that interval.
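
As a quick sanity check of the "twice the interval" rule of thumb, here is a small Python sketch. It is plain arithmetic, not anything Zincati itself runs; the function names are hypothetical and the window length is assumed to be given in minutes.

```python
import math


def window_is_long_enough(window_minutes, steady_interval_secs):
    """True if the window is at least twice the steady check interval."""
    return window_minutes * 60 >= 2 * steady_interval_secs


def min_window_minutes(steady_interval_secs):
    """Shortest window, in whole minutes, that satisfies the rule above."""
    return math.ceil(2 * steady_interval_secs / 60)


print(window_is_long_enough(60, 300))    # True: default 300 s interval
print(window_is_long_enough(60, 3600))   # False: 1 h window, 1 h interval
print(window_is_long_enough(120, 3600))  # True
print(min_window_minutes(3600))          # 120
```

So with steady_interval_secs = 3600, a one-hour window would be too short by this rule, and two hours would be the minimum.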
