I have a USB NIC that seems to have issues with some kernel updates. When the issue occurs, dhcpd logs “Network is down” and stops working until I manually restart it.
Does anyone know of a way to automatically restart dhcpd if it logs the “Network is down” entry?
Thanks in advance…
Please see whether NetworkManager’s monitor mode fits your purpose:
## Activity Monitor
`nmcli monitor`
Observe NetworkManager activity. Watches for changes in connectivity state, devices or connection profiles.
See also **nmcli connection monitor** and **nmcli device monitor** to watch for changes in certain devices or connections.
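As a rough sketch of how the monitor output could drive a restart, the loop below reads `nmcli device monitor` line by line and restarts dhcpd when one interface reports `connected`. The interface name is a placeholder, the helper names are made up for illustration, and the `IFACE: connected` line format is an assumption, so check what your nmcli version actually prints before relying on it.

```shell
#!/usr/bin/bash
# Sketch only: IFACE, the helper names and the "IFACE: connected"
# line format are assumptions, not verified against your nmcli version.
IFACE="${IFACE:-enp0s20u1c2}"

# Succeeds when a monitor line reports that IFACE reached the connected state.
is_connected_event() {
    [ "$1" = "${IFACE}: connected" ]
}

# Monitor all devices (no interface argument) and filter by name, because
# per the nmcli man page the monitor terminates when all specified devices
# disappear, which is exactly what happens when a USB NIC drops.
watch_device() {
    nmcli device monitor | while IFS= read -r line; do
        if is_connected_event "$line"; then
            systemctl restart dhcpd.service
        fi
    done
}
```

Calling `watch_device` at the end of the script starts the loop; it blocks for as long as nmcli keeps running, so it would need to run as a background service.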
Hey thanks for the reply…
I can have a look at that. Shortly after the disconnect I do see NetworkManager activity so that may be a solution.
I’m currently testing a systemd trigger that monitors messages log for “Network is down” to restart dhcpd.service and send me an email.
Ultimately I hope that whatever changed in the last kernel update will be fixed. This happened a couple of months ago, when it was disconnecting multiple times per day. A new kernel released a few days later fixed it.
Restart the service using a dispatcher script:
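sudo tee /etc/NetworkManager/dispatcher.d/dhcpd.sh << "EOF" > /dev/null
#!/usr/bin/bash
# Restart dhcpd when the named connection comes up and dhcpd has
# logged "Network is down" since the current boot.
# Replace CONNECTION_NAME with your connection's name.
if [ "${NM_DISPATCHER_ACTION}" = "up" ] \
&& [ "${CONNECTION_ID}" = "CONNECTION_NAME" ] \
&& journalctl -b -u dhcpd.service -g "Network is down" > /dev/null
then systemctl restart dhcpd.service
fi
EOF
sudo chmod +x /etc/NetworkManager/dispatcher.d/dhcpd.sh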
You can get the connection name from the output of:
nmcli connection show
See also: NetworkManager-dispatcher: NetworkManager Reference Manual
Wow, thanks for that! This is some great info!
In my situation, the connection does come back on its own. It’s a very brief blip due to the usb nic dropping:
Mar 14 06:56:55 secret kernel: usb 4-1: USB disconnect, device number 4
Mar 14 06:56:55 secret kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Mar 14 06:56:55 secret kernel: cdc_ncm 4-1:2.0 enp0s20u1c2: unregister ‘cdc_ncm’ usb-0000:00:14.0-1, CDC NCM (NO ZLP)
Mar 14 06:56:55 secret kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Mar 14 06:56:55 secret kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Mar 14 06:56:55 secret dhcpd[8649]: receive_packet failed on enp0s20u1c2: Network is down
Mar 14 06:56:55 secret named[979]: no longer listening on 192.168.1.1#53
Mar 14 06:56:55 secret NetworkManager[893]: [1678795015.1500] device (enp0s20u1c2): state change: activated → unmanaged (reason ‘removed’, sys-iface-state: ‘removed’)
Mar 14 06:56:55 secret NetworkManager[893]: [1678795015.1506] device (enp0s20u1c2): set-link: failure to reset link negotiation
Mar 14 06:56:55 secret systemd[1]: Starting NetworkManager-dispatcher.service - Network Manager Script Dispatcher Service…
Mar 14 06:56:55 secret systemd[1]: Started NetworkManager-dispatcher.service - Network Manager Script Dispatcher Service.
Mar 14 06:56:55 secret systemd[1]: Stopping sm-client.service - Sendmail Mail Transport Client…
Mar 14 06:56:55 secret systemd[1]: sm-client.service: Deactivated successfully.
Mar 14 06:56:55 secret systemd[1]: Stopped sm-client.service - Sendmail Mail Transport Client.
Mar 14 06:56:55 secret systemd[1]: Stopping sendmail.service - Sendmail Mail Transport Agent…
Mar 14 06:56:55 secret systemd[1]: sendmail.service: Deactivated successfully.
Mar 14 06:56:55 secret systemd[1]: Stopped sendmail.service - Sendmail Mail Transport Agent.
Mar 14 06:56:55 secret systemd[1]: sendmail.service: Consumed 1.920s CPU time.
Mar 14 06:56:55 secret systemd[1]: Starting sendmail.service - Sendmail Mail Transport Agent…
Mar 14 06:56:55 secret systemd[1]: sendmail.service: Can’t open PID file /run/sendmail.pid (yet?) after start: Operation not permitted
Mar 14 06:56:55 secret systemd[1]: Started sendmail.service - Sendmail Mail Transport Agent.
Mar 14 06:56:55 secret systemd[1]: Starting sm-client.service - Sendmail Mail Transport Client…
Mar 14 06:56:55 secret systemd[1]: sm-client.service: Failed to parse PID from file /run/sm-client.pid: Invalid argument
Mar 14 06:56:55 secret systemd[1]: Started sm-client.service - Sendmail Mail Transport Client.
Mar 14 06:56:55 secret kernel: usb 4-1: new SuperSpeed USB device number 5 using xhci_hcd
Mar 14 06:56:55 secret kernel: usb 4-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=31.00
Mar 14 06:56:55 secret kernel: usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Mar 14 06:56:55 secret kernel: usb 4-1: Product: USB 10/100/1G/2.5G LAN
Mar 14 06:56:55 secret kernel: usb 4-1: Manufacturer: Realtek
Mar 14 06:56:55 secret kernel: usb 4-1: SerialNumber: 001000001
Mar 14 06:56:55 secret kernel: cdc_ncm 4-1:2.0: MAC-Address:
Mar 14 06:56:55 secret kernel: cdc_ncm 4-1:2.0: setting rx_max = 16384
Mar 14 06:56:55 secret kernel: cdc_ncm 4-1:2.0: setting tx_max = 16384
Mar 14 06:56:55 secret kernel: cdc_ncm 4-1:2.0 eth0: register ‘cdc_ncm’ at usb-0000:00:14.0-1, CDC NCM (NO ZLP), 8c:ae:4c:dd:47:11
Mar 14 06:56:55 secret NetworkManager[893]: [1678795015.4109] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/8)
Mar 14 06:56:55 secret kernel: cdc_ncm 4-1:2.0 enp0s20u1c2: renamed from eth0
Mar 14 06:56:55 secret NetworkManager[893]: [1678795015.4379] device (eth0): interface index 8 renamed iface from ‘eth0’ to ‘enp0s20u1c2’
Mar 14 06:56:55 secret NetworkManager[893]: [1678795015.4461] device (enp0s20u1c2): state change: unmanaged → unavailable (reason ‘managed’, sys-iface-state: ‘external’)
Mar 14 06:56:57 secret ModemManager[994]: [base-manager] couldn’t check support for device ‘/sys/devices/pci0000:00/0000:00:14.0/usb4/4-1’: not supported by any plugin
Mar 14 06:56:58 secret kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp0s20u1c2: link becomes ready
Mar 14 06:56:58 secret NetworkManager[893]: [1678795018.9987] device (enp0s20u1c2): carrier: link connected
As you can see, the time from the USB disconnect to reconnect is less than a second. The problem is that the dhcpd service does not recover and has to be restarted. The script that ChatGPT generated for me actually worked this morning when it blipped again. I also see there is another kernel update, so maybe that will fix it.
(Also interesting that named and sendmail both recover)
For now, I’m going to use this:
/bin/bash -c 'tail -n0 -F /var/log/messages | grep --line-buffered "Network is down" | while read line; do echo "Network is down detected at $(date)" && sleep 5 && systemctl restart dhcpd.service && echo -e "Subject: DHCP service restarted\n\nDHCP service restarted at $(date).\n\nLog output:\n$line\n" | /usr/sbin/sendmail user@example.com; done'
Oh, and I will still look at nmcli monitor
Thanks again for your time!
Thank you again! If I wanted to include an email routine similar to what I shared above, would I just add the additional commands separated with && ?
It is possible, although practical utility is questionable as you typically want to receive notifications about something that actually requires your attention.
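For what it’s worth, a minimal sketch of how the mail step could be added to the dispatcher script is below. The `notify_restart` and `restart_and_notify` helpers and the `MAIL_TO` and `SENDMAIL` variables are made-up names for illustration; the sendmail path and address are the ones from the one-liner earlier in the thread.

```shell
#!/usr/bin/bash
# Sketch only: the function and variable names are illustrative assumptions.
MAIL_TO="${MAIL_TO:-user@example.com}"
SENDMAIL="${SENDMAIL:-/usr/sbin/sendmail}"

# Send a short notification: a Subject header, a blank line, then the body.
notify_restart() {
    printf 'Subject: DHCP service restarted\n\nDHCP service restarted at %s.\n' \
        "$(date)" | "$SENDMAIL" "$MAIL_TO"
}

# Restart first; the mail is only sent if the restart succeeded.
restart_and_notify() {
    systemctl restart dhcpd.service && notify_restart
}
```

With these definitions placed near the top of the dispatcher script, the `then systemctl restart dhcpd.service` line would become `then restart_and_notify`. Chaining with `&&` skips the mail when the restart fails; use `;` instead if you want the mail either way.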
True. In this case, however, I do want to know if it’s happening.
Again, thanks for your help…really appreciate it!
I have tried your script and although I have yet to have the network blip that I need this for, I did update my kernel last night and did a reboot.
The only issue I see with this is that when booting, the dispatcher triggers the script 3 times, once for each adapter starting.
Would there be a way to identify that we are in the boot process so that it does not trigger on bootup?
Remove the script, then trigger the issue and check:
systemctl is-active dhcpd.service
systemctl is-failed dhcpd.service
systemctl status dhcpd.service
We need to determine the service condition when it fails.
OK, I will try to make it fail by unplugging the USB device, but I have to wait until nobody is using the Internet, as I’ll knock them offline.
I will reply back here once I have been able to reproduce. (hopefully unplugging briefly will have the same effect)
Thanks
OK so I briefly unplugged the USB device and interestingly, the error message was exactly the same as before:
Mar 15 12:29:02 secret kernel: cdc_ncm 4-1:2.0 enp0s20u1c2: unregister ‘cdc_ncm’ usb-0000:00:14.0-1, CDC NCM (NO ZLP)
Mar 15 12:29:02 secret kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Mar 15 12:29:02 secret dhcpd[1419]: receive_packet failed on enp0s20u1c2: Network is down
Here is the output from the 3 commands…unfortunately, nothing useful
$ systemctl is-active dhcpd.service
active
$ systemctl is-failed dhcpd.service
active
$ systemctl status dhcpd.service
● dhcpd.service - DHCPv4 Server Daemon
Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; enabled; preset: disabled)
Active: active (running) since Wed 2023-03-15 07:12:39 CDT; 5h 17min ago
Docs: man:dhcpd(8)
man:dhcpd.conf(5)
Main PID: 1419 (dhcpd)
Status: “Dispatching packets…”
Tasks: 1 (limit: 38309)
Memory: 10.3M
CPU: 901ms
CGroup: /system.slice/dhcpd.service
└─1419 /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid
FYI, my initial reply was flagged as spam, reviewed and subsequently posted. I don’t think you got notified of that reply, however, as it’s not showing as a reply to your post.
I added a couple of conditions which you can adjust as needed:
- A matching connection name.
- Matching error messages since system boot.
Wow, thank you so much for your help here! I will try the updated script and let you know how it goes. I was away all day and won’t be able to apply changes until tomorrow, but I will be sure to report back.
Again, thank you so much for your help!
FYI, I tested by briefly disconnecting the USB adapter again and the new script works as expected. Based on the new conditions you added, I am confident it will only trigger when there is a USB disconnect and not at boot.
Thank you once again for your help!
(PS…I think the kernel update that I got may have fixed the source of the problem but having this solution as backup will make my network much more reliable.)