How do we kick off manual updates?

In the older CoreOS, the strategy was to stop the update-engine and locksmithd services from running automatically. When you wanted to update, you started the update-engine service and kicked off an update using update_engine_client -update.

In FCOS, all that was replaced with Zincati. There is documentation on how to disable Zincati to prevent autoupdate, but the question remains on how to kick it off manually.

What is the recommended way of doing this?

May I ask what is your final goal in the steady state?

Do you want to opt-out auto-updates and just manually perform selective updates? If so, do you care about which version you are manually updating to?

I have two use cases:

Use Case #1

I have customers who run CoreOS in very secure environments that have no access to the internet. Enabling auto-update means they will be alerted to expected failures because it can’t reach the outside world. So we disable auto-update. If they need to update their systems, they obtain a temporary firewall exception to allow the system to reach the internet for a specified period of time. During that time, the admin can manually kick off updates to bring them to the latest version of CoreOS.

Use Case #2

I have customers who are very sensitive to reboots. So for those customers, we employ the same strategy of disabling auto-update and allowing them to manually kick off updates during their maintenance windows.

In both cases, our desire is to bring them up to the same state they would have been in if they were running auto-update all along.

Thanks, those were indeed the details I was looking for. In particular:

This seems to hint at a few things:

  • you don’t want manual updates (i.e. fiddling with rpm-ostree), you want to restrict when auto-updates can happen. Going through Zincati seems fine.
  • you have some human/automation logging into each node to trigger actions.
  • for case #1, you want to restrict when the system is checking for updates.
  • for case #2, you want to restrict when a host can go and reboot itself for updates.

My personal approach would be different in the two cases:

  1. Disable zincati.service so that it is not started by default. Manually start/stop it based on firewall exception being on/off. (Alternatively you can just let it run, it will keep polling and will only work when allowed by the firewall, but it will spam logs and metrics due to errors).
  2. Tune your finalization strategy. It looks like you are waiting for coreos/zincati issue #34 (“strategy: add weekly maintenance windows mode, configuration and logic”) to be implemented. While that is still in progress, you can already implement a simple fleet_lock backend which allows or denies reboots based on the current time.
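A time-based fleet_lock backend like the one suggested in point 2 could look roughly like the Python sketch below. It is a minimal sketch under assumptions, not a reference implementation: the endpoint paths (/v1/pre-reboot, /v1/steady-state), the fleet-lock-protocol header and the kind/value error body follow my reading of the FleetLock protocol documentation and should be checked against it. The Saturday 02:00-04:00 window, the port, the error kind strings and the names reboot_allowed, FleetLockHandler and serve are all made up for illustration.

```python
import json
from datetime import datetime, time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical maintenance window: Saturdays, 02:00-04:00 local time.
WINDOW_WEEKDAY = 5  # datetime.weekday(): Monday is 0, Saturday is 5
WINDOW_START = time(2, 0)
WINDOW_END = time(4, 0)


def reboot_allowed(now):
    """Return True if `now` falls inside the maintenance window."""
    return now.weekday() == WINDOW_WEEKDAY and WINDOW_START <= now.time() < WINDOW_END


class FleetLockHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body (client parameters); this sketch ignores it.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.headers.get("fleet-lock-protocol") != "true":
            self._reply(400, "bad_request", "missing fleet-lock-protocol header")
        elif self.path == "/v1/steady-state":
            # No per-node lock is tracked here, so releasing always succeeds.
            self._reply(200)
        elif self.path == "/v1/pre-reboot":
            if reboot_allowed(datetime.now()):
                self._reply(200)
            else:
                self._reply(409, "outside_window", "reboots are not allowed right now")
        else:
            self._reply(404, "not_found", "unknown endpoint")

    def _reply(self, status, kind=None, value=None):
        # Errors carry a JSON object with `kind` and `value`; success has no body.
        body = b"" if kind is None else json.dumps({"kind": kind, "value": value}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the backend until interrupted (call this explicitly to start it)."""
    HTTPServer(("", port), FleetLockHandler).serve_forever()
```

Calling serve() starts the backend; each node would then need its Zincati update strategy set to fleet_lock and pointed at this server. Note that this sketch only gates on time and does not limit how many nodes reboot at once.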

Thank you for your reply! I have questions about this:

Disable zincati.service so that it is not started by default. Manually start/stop it based on firewall exception being on/off.

  1. With the update_engine_client -update method the update ran in the user’s session so they had immediate feedback for the update process. For Zincati, I assume starting the service kicks off updates immediately. How does a user get progress and completion notices for the update?

  2. By disabling the service, do you mean disabling auto-update as specified here or do you mean disabling the service via ignition’s systemd.units.enabled: false.

  1. Starting the service kicks off checking for updates. There may be no updates available at the time, or other reasons (e.g. config or lock-manager) not allowing updates at that point. The service will keep running anyway, hoping to eventually apply an update. If you want to wait for that, you can systemctl start --wait, but there is no guarantee that it will eventually return (i.e. there can be no more updates to apply, forever). If you want to track progress, you can tail the journal (journalctl -f -u zincati.service) or monitor service metrics.
  2. The latter. If you disable the Zincati service at the systemd level, you can still systemctl start it and auto-updates will work. Instead, if you set updates.enabled = false in the Zincati configuration, the service will run (e.g. you can query its metrics) but the auto-updating logic is disabled.

It sounds almost like you want a oneshot mode for zincati to start up, check for updates, apply updates, and give feedback in the terminal window.

Yes. If you guys don’t provide it, I would end up writing a script to do it anyways.

Speaking of which, it looks like that script would start the service, monitor the logs looking for the update completion message, then shut down the service.

It’s worth noting if there is an update then the service will reboot the node.

I’m assuming at the end of the update, correct?

I fear there is a basic misunderstanding going on, due to the transition from update_engine to Zincati which have different (mental) models.

On Container Linux: update-engine was checking and staging updates, either on its own or triggered over command-line. It was not taking care of reboots in order to apply an update. Any reboot (including power outages) would thus activate a staged update. Locksmith was in charge of just rebooting the machine when a new update was staged.

On Fedora CoreOS: Zincati checks, stages and finalizes updates on its own as long as it is running. It cannot be triggered from command-line (open to feature requests here, but not on the immediate radar). It proactively asks for permission to apply an update and reboot. It does not need any external controller to watch its progress and reboot the node. In fact, Zincati finalizes a staged update and reboots the node in a single step. If a spurious reboot occurs while there is a staged update, it doesn’t get applied (because it wasn’t explicitly finalized).

If my client has a firewall exception for the next 12 hours, he needs to get those updates kicked off and done before the window closes. If they are not done, he needs to know this so the window can be extended.

I understand that FCOS’s new update methodology does not support this. However, based on this conversation, I can write a script to start Zincati and it will immediately check for updates. I can tail the logs to provide feedback so my client knows the status. When the system reboots, then the update is done.

Is my thinking incorrect here?


Edit: I wanted to also add that the following ignition config snippet:

"systemd": {
    "units": [
      {
        "name": "zincati.service",
        "enabled": false
      }
    ]
  }

does not seem to disable the zincati service. Masking it works, but setting enabled to false still shows the service running:

zincati.service | loaded active running | Zincati Update Agent

Correct. We don’t support a one shot mode currently. As @lucab mentioned above, you’re welcome to open a feature request for that. You’ll probably have to work around it for now with a custom script.
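Such a custom script might look roughly like the following Python sketch. It only uses commands already mentioned in this thread (systemctl start, journalctl -f -u zincati.service) plus systemctl stop; the function name update_within_window, the return strings and the idea of passing the firewall window as a timeout are assumptions for illustration. Whether it is safe to stop Zincati while it is still staging an update is not something this thread confirms.

```python
import subprocess

UNIT = "zincati.service"


def update_within_window(window_seconds, run=subprocess.run):
    """Start Zincati, stream its journal to the terminal, and stop it again.

    Returns "window-expired" if the window closed without a reboot, or
    "journal-ended" if journalctl exited on its own. If Zincati finalizes
    an update, the node reboots and this script simply ends with it.
    """
    run(["systemctl", "start", UNIT], check=True)
    try:
        # journalctl -f follows the unit's log; its output goes straight to
        # the terminal. The timeout ends the follow when the window closes.
        run(["journalctl", "-f", "-u", UNIT], timeout=window_seconds)
        outcome = "journal-ended"
    except subprocess.TimeoutExpired:
        outcome = "window-expired"
    finally:
        # Stop polling so the node does not log errors once the firewall
        # exception is gone.
        run(["systemctl", "stop", UNIT], check=True)
    return outcome
```

For a 12-hour firewall exception this would be called as update_within_window(12 * 3600), run as root. A "window-expired" result means no reboot happened in that time, which can mean either that no update was available or that one did not finish.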

Your thinking is correct. I think zincati checks for updates on some randomized time interval, but I think it should happen within a few minutes after starting up. The interesting part might be when there are no updates. I don’t see a string that you can grep for in the logs to indicate there was no update. You can increase the logging verbosity but even with that I don’t see a log message when it polls.

Hmm. That’s suspicious. I’m seeing the same thing. Let’s track it over in coreos/fedora-coreos-tracker issue #392 (“Unable to disable zincati.service using Ignition”).

Your thinking is correct. I think zincati checks for updates on some randomized time interval, but I think it should happen within a few minutes after starting up. The interesting part might be when there are no updates. I don’t see a string that you can grep for in the logs to indicate there was no update.

Zincati checks for updates right on startup. Further checks are scheduled periodically, at a configurable rate (since v0.0.8).

The “check for updates… none found” is the steady mode for the daemon so we don’t log anything in that state because 1) everything is alright 2) it would constantly spam the logs 3) it won’t contain any actionable info.

If you max out the logs to trace level (-vvv, please don’t do that on real nodes as it will fill the journal with development/debugging info) you’ll see the following:

[TRACE] trying to check for updates
...
[TRACE] got an update graph with <X> nodes and <Y> edges 
[TRACE] scheduling next agent refresh in <Z> seconds

That said, there is a feature-request ticket open to expose more internal state to systemd status to track the steady state: coreos/zincati issue #94 (“update-agent: expose current state-machine state via sd_notify”).

So if I were to write a script to manually kick off updates by starting the service, that works fine as long as there are updates. In the case of no updates, I guess I can look for successive “trying to check for updates” messages with no “got update” message in between, but that will not be ideal.
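That log-watching heuristic could be sketched in Python roughly as follows. It assumes Zincati is running at trace level and that the trace messages quoted earlier in this thread appear with that wording, which may differ between versions. Treating a check that is followed by “scheduling next agent refresh” as an idle poll is my assumption, not documented behavior, and the function name count_idle_cycles is made up.

```python
# Substrings taken from the trace output quoted earlier in this thread.
CHECK = "trying to check for updates"
SCHEDULED = "scheduling next agent refresh"


def count_idle_cycles(lines):
    """Count poll cycles that went from a check to scheduling the next one.

    Assumption: if the agent schedules its next refresh after a check, no
    update was applied during that cycle.
    """
    cycles = 0
    checking = False
    for line in lines:
        if CHECK in line:
            checking = True
        elif SCHEDULED in line and checking:
            cycles += 1
            checking = False
    return cycles
```

A script could feed this the lines read from journalctl -u zincati.service and report “probably nothing to update” once one or two idle cycles have been seen.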

Is there a command I can run that checks for updates and just tells me if there are any? If so, I can run that and if there are updates, start Zincati and let it do its thing.

Not right now. I’m trying not to sprawl the Zincati design too much for the moment, as it is easy to take wrong turns at the beginning which make things worse later on.

For your case, I don’t have an ETA but I have in mind tackling these two things (in order):

The end goal would be being able to detect the idle state with a single query through systemd.

For the moment, I’m NOT considering adding a new sub-verb (e.g. zincati check-next) as it brings a cascade of further complexity in the software (i.e. an asynchronous client/server mechanism, or some standalone privileged logic to access configuration and state).

That solution works for me. Thank you for considering them!