I have a fresh FC35 install that’s running dhcpd / bind and mounts an nfs share through fstab.
After an update/reboot I noticed that only bind had started but dhcpd and the nfs client had failed to start, digging into the boot log it appears that systemd had tried to start those services -before- it had started NetworkManager even though the service files have ‘After=network-online.target’
I’m not entirely sure why this is happening or how to fix it, manual modifications to the service files to add some delays will just get overwritten when there’s an update.
We need more precise information about the issue. What is the output of journalctl -u dhcp.service? I assume you use the normal dhcp-server from the Fedora repository?
Complementary, you can already go through the /etc/systemd/system/* & /usr/lib/systemd/system/* directories with ls -l and check, which of the files were changed during that update. Changes in the latter directories are only interesting if they are symlinked to the first.
Nothing in the journal funny enough, this is from messages(cleaned up):
Jan 14 09:26:34 Hostname systemd[1]: dhcpd.service: Main process exited, code=exited, status=1/FAILURE
Jan 14 09:26:34 Hostname systemd[1]: dhcpd.service: Failed with result 'exit-code'.
Jan 14 09:26:34 Hostname systemd[1]: Failed to start DHCPv4 Server Daemon.
Jan 14 09:26:34 Hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dhcpd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jan 14 09:26:34 Hostname kernel: FS-Cache: Netfs 'nfs' registered for caching
Jan 14 09:26:34 Hostname kernel: Key type dns_resolver registered
Jan 14 09:26:34 Hostname kernel: NFS: Registering the id_resolver key type
Jan 14 09:26:34 Hostname mount[1482]: mount.nfs: Network is unreachable
Jan 14 09:26:34 Hostname systemd[1]: mnt-nas.mount: Mount process exited, code=exited, status=32/n/a
Jan 14 09:26:34 Hostname systemd[1]: mnt-nas.mount: Failed with result 'exit-code'.
Jan 14 09:26:34 Hostname systemd[1]: Failed to mount /mnt/nas.
Jan 14 09:26:34 Hostname systemd[1]: Dependency failed for Remote File Systems.
Jan 14 09:26:34 Hostname systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.
Jan 14 09:26:37 Hostname NetworkManager[910]: <info> [1642148797.8736] manager: NetworkManager state is now CONNECTED_LOCAL
Jan 14 09:26:37 Hostname NetworkManager[910]: <info> [1642148797.8739] manager: NetworkManager state is now CONNECTED_SITE
Jan 14 09:26:37 Hostname NetworkManager[910]: <info> [1642148797.8740] policy: set 'enp88s0' (enp88s0) as default for IPv4 routing and DNS
Jan 14 09:26:37 Hostname systemd-resolved[807]: enp88s0: Bus client set default route setting: yes
Jan 14 09:26:37 Hostname NetworkManager[910]: <info> [1642148797.8765] device (enp88s0): Activation: successful, device activated.
dhcpd just fails to start because there’s no ipv4 address and this is by design, same for the nfs mount and then a few seconds later NetworkManager comes back with that the interface is ready.
I also don’t think this is related to the recent updates, this seems more of a hit or miss thing.
I originally missed the /usr/lib/systemd/system/* directories in my previous post (just changed it). Was there a change after the last update in these files?
However, your NetworkManager logs are only information about NetworkManager’s state, they do not indicate that NetworkManager has just started - NetworkManager was started before this info was broadcasted (I assume before dhcpd started, you will find systemd[1]: Started Network Manager. in the logs) - so I do not see an issue here. In my logs, NetworkManager’s CONNECTED_LOCAL/SITE messages are also after dhcpd started.
journalctl -u dhcp.service leads to no output? Don’t forget to use it with sudo to get all relevant logs!
If you start dhcpd manually after everything is up, systemctl start dhcpd.service, does the same error appear?
Have there any changes be made in the dhcpd.conf recently?
It only gave the normal running output with journalctl, not the error where it exits due to not having an IP.
Everything starts fine after the interface actually has an IP, this also does not happen with every reboot. Maybe like 1 out of 5 reboots where it’ll fail like this.
There have been no changes to any of the service files, there have also been no updates to dhcp-server, NetworkManager or systemd since the machine was installed which was only just over 2 weeks ago.
The fstab nfs mount failing actually does seem to happen more often than dhcpd.
I did just add an on-failure restart to the dhcpd service file so that it tries again after 10s and that seems to work.
i feel like ‘network-online.target’ is somehow reached too early before the interface is actually fully up.
According to the systemd documentation the conditions that this target is reached at are defined in ‘the network management software’, which would make this a NetworkManager issue ?
It is the normal running output that ends up in the error and thus, that indicates where we have to look and where to adjust.
My suggestion is to check on each boot immediately if dhcpd is running, and if not, then directly get the output of sudo journalctl -b (all logs of the current boot; given what you describe, we don’t know yet where we have to look) and let us know. Feel free to also compare the journal of a working boot with the failed one.
If it sometimes work and sometimes not, while the service/target/socket files remain unchanged for weeks, I tend to exclude them as origin.
That no IP address is allocated to an interface is not the origin of the problem but the outcome, because dhcpd was not able to allocate an IP due to its crash/stop. So, the IP allocation has nothing to do with the origin of the issue.
That no IP address is allocated to an interface is not the origin of the problem but the outcome, because dhcpd was not able to allocate an IP due to its crash/stop. So, the IP allocation has nothing to do with the origin of the issue.
The machine has a fixed ip, dhcpd requires an ip in the same subnet that it is attempting to serve or it will exit unless you have some special things configured for vlans. And this is what is happening, at the time of dhcpd starting there is no ip4 bound to the interface yet and thus it exits like it’s supposed to.
All the NetworkManager systemd things seems to be enabled/normal.
● NetworkManager-wait-online.service - Network Manager Wait Online
Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2022-01-14 19:18:44 CET; 18h ago
Docs: man:nm-online(1)
Process: 907 ExecStart=/usr/bin/nm-online -s -q (code=exited, status=0/SUCCESS)
Main PID: 907 (code=exited, status=0/SUCCESS)
CPU: 11ms
I also tried setting ‘ipv4.may-fail’ to false as according to the NM documentation it’s supposed to wait till it can ping the gateway but that didn’t make any difference either.
If I run systemd-analyze blame the NetworkManager-wait-online.service comes up at just over 6 seconds which is a lot compared to the rest.
Systemd has always been a bit of a grey area for me as I grew up with initd but going over all the things my current main culprit is ‘nm-online’ just somehow giving the OK too early.
I checked all the service files and it’s either a systemd bug, which I am doubting as there would be a lot more people complaining, or some sort of weird network manager issue in combination with my hardware.
This is a shiny new Intel NUC that got released not that long ago.
Without specific/system information, I cannot help you. A rough description of the outcome or assumptions about the origin don’t change that. I still don’t see a reason to assume that the service files are the origin as long as they remain unchanged while the problem sometimes appears at boot and sometimes not. I cannot include/exclude a relation to NetworkManager because I have no information that can include or exclude it. We have to explicitly identify, not implicitly assume