F40 Change Proposal: Enable IPv4 Address Conflict Detection by Default

Enable IPv4 Address Conflict Detection by default


This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

Summary

Enable IPv4 Address Conflict Detection by default in NetworkManager.

Owner

Detailed Description

A common source of networking issues is the presence of duplicate IPv4 addresses on the same physical network. Such conflicts are hard for users to diagnose.

To the rescue comes RFC 5227 (“IPv4 Address Conflict Detection”) which provides a mechanism to detect address conflicts. A host implementing Address Conflict Detection (from now on “ACD”) sends ARP probes for each IP address it wants to use; if another host replies, the address is already in use and can’t be configured on the interface.
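To make the probe concrete, here is a minimal Python sketch of the ARP payload that RFC 5227 describes: an ARP request whose sender IP address is all zeroes and whose target IP address is the address being checked. The function name `build_arp_probe` and the example MAC and IP values are illustrative assumptions, not part of NetworkManager; the sketch only builds the 28-byte payload and does not send anything on the wire.

```python
import socket
import struct

ARP_HTYPE_ETHERNET = 1   # hardware type: Ethernet
ARP_PTYPE_IPV4 = 0x0800  # protocol type: IPv4
ARP_OP_REQUEST = 1       # an ACD probe is an ARP request


def build_arp_probe(sender_mac: bytes, target_ip: str) -> bytes:
    """Build the 28-byte ARP payload of an RFC 5227 probe (hypothetical helper)."""
    if len(sender_mac) != 6:
        raise ValueError("sender_mac must be exactly 6 bytes")
    return struct.pack(
        "!HHBBH6s4s6s4s",
        ARP_HTYPE_ETHERNET,
        ARP_PTYPE_IPV4,
        6,                            # hardware address length
        4,                            # protocol address length
        ARP_OP_REQUEST,
        sender_mac,                   # sender hardware address: our own MAC
        socket.inet_aton("0.0.0.0"),  # sender IP: all zeroes in a probe
        b"\x00" * 6,                  # target hardware address: zeroes
        socket.inet_aton(target_ip),  # target IP: the address being probed
    )


if __name__ == "__main__":
    # Example values only: a made-up MAC and a documentation-range address.
    probe = build_arp_probe(bytes.fromhex("525400123456"), "192.0.2.10")
    print(probe.hex())
```

The sender IP is left at zero so that the probe does not pollute the ARP caches of other hosts in case the address turns out to be taken.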

Note that this mechanism applies to both static and DHCP addresses. It might seem unnecessary for DHCP, as a well-behaving server should give out unique leases; however, there could be hosts on the network not using DHCP. Indeed, RFC 2131 (Dynamic Host Configuration Protocol) specifies that the client should probe the newly received address and should send a DHCPDECLINE to the DHCP server if the address is already in use.

In Fedora 39, ACD is disabled by default; it can be enabled by setting property “ipv4.dad-timeout” to a positive value in a connection profile. The property name contains “DAD”, which stands for “duplicate address detection” and is another name for ACD. The property specifies the maximum timeout in milliseconds used to check for the presence of duplicate IP addresses on the network. If a duplicate is found, a warning is logged; in the DHCP case, NetworkManager tries to get a different lease, while in the static case, the address is just skipped.

This change aims at enabling ACD by default in Fedora 40, by setting the default value to 3000 ms. Note that this change is only about IPv4; IPv6 always performs a duplicate check for each address that is configured, as specified by RFC 4862.

Benefit to Fedora

NetworkManager will not configure IPv4 addresses that are detected as duplicate. This will save users from having to debug weird connectivity issues. Instead, NetworkManager will report an error and will indicate the MAC of the conflicting host.

Scope

  • Proposal owners: change the default value, test that no regression is seen in the upstream test suite.

  • Other developers: N/A (not needed for this Change)

  • Release engineering: #Releng issue number

  • Policies and guidelines: N/A (not needed for this Change)

  • Trademark approval: N/A (not needed for this Change)

Upgrade/compatibility impact

The change in default behavior will affect all users that install or upgrade to the new Fedora release.

How To Test

To test the effect of the change on F39, add the following configuration snippet to file /etc/NetworkManager/conf.d/20-ipv4-dad.conf and then restart the NetworkManager service:

[connection-dad-default]
ipv4.dad-timeout=3000

To trigger a conflict, configure the local machine with a static address that is already in use by another host. When bringing up the connection, it will fail and report an address conflict.
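When preparing the snippet from a script, for example in a test setup, something like the following Python sketch may help. It renders the section and key shown above and reads the value back as a sanity check. The helper names `render_dad_snippet` and `read_dad_timeout` are hypothetical, and Python's `configparser` is used only as a stand-in for NetworkManager's own keyfile parser, which is an assumption that holds for a simple snippet like this one.

```python
import configparser

SECTION = "connection-dad-default"  # section name used in this proposal
KEY = "ipv4.dad-timeout"            # timeout in milliseconds; 0 disables ACD


def render_dad_snippet(timeout_ms: int) -> str:
    """Return the text of the configuration snippet (hypothetical helper)."""
    return f"[{SECTION}]\n{KEY}={int(timeout_ms)}\n"


def read_dad_timeout(text: str) -> int:
    """Parse a snippet and return the configured timeout in milliseconds."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint(SECTION, KEY)


if __name__ == "__main__":
    snippet = render_dad_snippet(3000)
    print(snippet, end="")
    print("parsed timeout:", read_dad_timeout(snippet))
```

The rendered text still has to be written to /etc/NetworkManager/conf.d/20-ipv4-dad.conf as root, and the NetworkManager service restarted, as described above.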

User Experience

Enabling ACD will cause an additional delay when bringing up interfaces, because NetworkManager first needs to probe the address. The delay is between 1.5 and 3 seconds, because RFC 5227 requires that the probe interval be randomized. The delay will affect both static and DHCP connections.

In case users want to avoid this delay, ACD can be disabled for the specific connection profile by setting property ipv4.dad-timeout=0, or globally by adding the following configuration snippet to /etc/NetworkManager/conf.d/20-ipv4-dad.conf:

[connection-dad-default]
ipv4.dad-timeout=0

Apart from this small delay, the big advantage of this change is that users will be able to discover the potential conflict immediately. If the address is static, the activation will fail and report an error. For DHCP, NetworkManager will send a DHCPDECLINE message to the server and will try to get a different lease. In all cases, the conflicting address will be skipped and the network will not be brought into an inconsistent state.

Dependencies

N/A

Contingency Plan

  • Contingency mechanism: Revert the change, try again in the next Fedora release.
  • Contingency deadline: Beta freeze
  • Blocks release? No

Documentation

The “nm-settings” man page will indicate the new default value. No other documentation changes are required.

Release Notes

The change needs to be mentioned in the release notes.


Great change and I am generally in favor, but a few questions for you:

  • I assume this will mean all connections? wired, wireless, etc?

  • I assume this is a distro-wide change, affecting all editions?
    I would expect there might be some that wouldn’t want this change
    although I guess the only cost is a slight slowdown in activation.
    (ie, cloud for example might not hit duplicates very much?)

  • What does this look like on workstation or other desktops when it gets
    a duplicate and fails? I guess just a ‘network failed to activate’ and
    users would need to investigate more? Is there any way to get the
    ‘duplicate ip detected’ message up to the GUI? Or is that likely too
    difficult?

Thanks for proposing it!


I agree, this is a nice change.

I had the same question.


I assume this will mean all connections? wired, wireless, etc?

It affects all connections. Users can opt out for specific connections by editing the connection via nmcli, or globally via configuration file.

I assume this is a distro-wide change, affecting all editions?

Yes, the change in default will be in the NM daemon and thus it will affect all editions.

I would expect there might be some that wouldn’t want this change although I guess the only cost is a slight slowdown in activation. (ie, cloud for example might not hit duplicates very much?)

Right, the cost is minimal - the proposal is about using a 3-second timeout, but based on the feedback on the mailing list I think we are going to decrease the timeout to 1 second or less. If the additional delay is not desired on a specific edition (cloud?), it can be disabled via a configuration snippet. But I don’t expect that the delay will cause any particular problem.

What does this look like on workstation or other desktops when it gets a duplicate and fails? I guess just a ‘network failed to activate’ and users would need to investigate more?

If all IPv4 addresses are detected as duplicate, IPv4 is considered failed for the interface. If IPv6 is enabled, NM will try to configure IPv6 via SLAAC/DHCPv6 within 30 seconds. If IPv6 succeeds, the activation will succeed and only the journal will show the warning about the duplicate IPv4 address.
If IPv6 also fails, the connection will fail and so there will be a user-visible error. In case IPv6 is disabled, the connection fails as soon as ACD detects the conflict(s).
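The cases above can be summarized as a small decision function. This is only an illustrative Python model of the behavior described in this reply, not NetworkManager code; the names `Outcome` and `activation_outcome` are made up for the example, and the first branch (IPv4 proceeds when at least one address passes the check) is an assumption drawn from the statement that conflicting addresses are skipped.

```python
from enum import Enum


class Outcome(Enum):
    ACTIVATED = "activation succeeds"
    ACTIVATED_IPV6_ONLY = "activation succeeds via IPv6; IPv4 conflict only in the journal"
    FAILED = "activation fails with a user-visible error"


def activation_outcome(
    all_ipv4_conflicting: bool,
    ipv6_enabled: bool,
    ipv6_succeeds: bool,
) -> Outcome:
    """Model the outcomes described above (hypothetical helper)."""
    if not all_ipv4_conflicting:
        # Assumption: at least one IPv4 address passed ACD, so IPv4 is usable.
        return Outcome.ACTIVATED
    if ipv6_enabled and ipv6_succeeds:
        # IPv4 is considered failed, but IPv6 carries the activation.
        return Outcome.ACTIVATED_IPV6_ONLY
    # Either IPv6 is disabled or it failed as well.
    return Outcome.FAILED


if __name__ == "__main__":
    for args in [(False, True, True), (True, True, True),
                 (True, True, False), (True, False, False)]:
        print(args, "->", activation_outcome(*args).name)
```

The model leaves out timing: with IPv6 enabled the failure is only reported after IPv6 has had its chance, while with IPv6 disabled it is reported as soon as ACD detects the conflict.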

Is there any way to get the ‘duplicate ip detected’ message up to the GUI? Or is that likely too difficult?

It’s possible to set the activation failure reason to a specific error code that means “duplicate address detected”. Then, when (and if) the activation fails, GUI tools can potentially display that reason; however, at the moment GNOME Shell and the GTK applet don’t display the reason, they just say “activation failed”.

Another possibility would be to extend the D-Bus interface of NM with a new signal that GUIs can subscribe to in order to get important messages for users (like the duplicate IP). This has the advantage that it would be independent of the activation result - for example, if IPv4 fails due to the conflict and IPv6 succeeds, NM could still signal the conflict to users via this new API.

In practice, for now users will have to check logs. But that still seems an improvement over the previous situation, where conflicts are not detected and connectivity just breaks without any warning.


This change proposal has now been submitted to FESCo with ticket #3131 for voting.

To find out more, please visit our Changes Policy documentation.


Talked with @bengal about this change. He pointed out that the timeout value was adjusted from the original proposal from 3s down to 200ms. I think this alleviates some concern I had where in situations like cloud environments (tightly controlled, conflicts unlikely) we’d be paying a 1-3s penalty for no reason. 200ms seems much more reasonable.
