X11 huge delay for each process (VMware Workstation); regression

Recent changes made by Fedora to the X11 subsystem have created a major performance regression for users of VMware Workstation 17.5:

Every single GUI process in a Fedora Linux guest installation on X11 gets a startup delay of approximately two seconds.

I’d be very happy to learn how to fix this, as I’d really like to stay on Fedora Linux.

To reproduce the problem,

  • install a fresh Fedora Linux guest into VMware Workstation, preferably the KDE spin (Fedora KDE Plasma Desktop | The Fedora Project)
  • log into X11 session
  • sudo dnf update --refresh
  • reboot
  • log into X11 session
  • sudo dnf install ltrace
  • ltrace -T -o glxinfo64.ltrace glxinfo64 -B
  • grep glXChooseVisual glxinfo64.ltrace

For me, this yields

glXChooseVisual(0x555b64bb9da0, 0, 0x555b63650020, 0)              = 0x555b64bda0e0 <2.272554>

which means that the call to “glXChooseVisual” took 2.272554 seconds to complete.
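For anyone who wants to pull these timings out of a larger ltrace log, here is a minimal sketch. It assumes the `ltrace -T` line layout shown above (call, return value, then the elapsed seconds in angle brackets); the helper name `call_duration` is made up for illustration.

```python
import re

# ltrace -T appends the time spent inside each call as "<seconds>" at the end of the line.
LTRACE_LINE = re.compile(r"^(?P<name>\w+)\(.*\)\s*=\s*\S+\s*<(?P<secs>\d+\.\d+)>$")

def call_duration(line):
    """Return (function_name, seconds) for one ltrace -T line, or None if it does not match."""
    m = LTRACE_LINE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), float(m.group("secs"))

# Sample line copied from the output above.
sample = "glXChooseVisual(0x555b64bb9da0, 0, 0x555b63650020, 0)              = 0x555b64bda0e0 <2.272554>"
name, secs = call_duration(sample)
print(f"{name} took {secs:.3f} s")
```

Running the same helper over every line of glxinfo64.ltrace and sorting by the second element would show whether glXChooseVisual is the only slow call.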

This call is made by almost every single X11 application on startup, to initialize the X11 client. It is supposed to return almost immediately.

The net effect of this behaviour is that

  • KDE Plasma startup is very slow on X11
  • launching any GUI process on X11 takes that two second startup penalty

This behaviour is unique to Fedora and its provisioning of X11; it has crept in over the last couple of weeks (on F38, now F39).

An Archlinux installation, on the same physical and virtual hardware combination, shows no such delay: glXChooseVisual returns almost immediately. As a result, KDE on X11 is a joy to use on Archlinux, while this is much less true of Fedora since recent updates.

Note that an Ubuntu 22.04 LTS install also shows undesirable performance characteristics - glXChooseVisual takes some substantial amount of time (on the order of 500ms), but this is not as massive as on current Fedora.

While I would love to switch to Wayland, alas, KDE / Plasma 5 only works well with X11 on the VMware Workstation 17.5 virtualization stack. I am not keen on switching to GNOME (Wayland).

Any suggestions?

FWIW, on a fresh and then fully patched F37 installation, glXChooseVisual returns within reasonable time: 0.08 seconds.

So F39 is slower than F37 by a factor of roughly 27 (2.27 seconds versus 0.083 seconds). And that really hurts.

glXChooseVisual(0x556f3e454da0, 0, 0x556f3c914040, 0)              = 0x556f3e475da0 <0.082861>

Flamegraphs produced by hotspot (sudo dnf install perf hotspot) for F39 and F37 were attached here (images not preserved).

While I understand the frustration with the slowdown, I must note that you are using VMware, which is proprietary and does not follow along with the rapid changes made in Fedora.

If this is a production machine then maybe you should consider slowing down the upgrades until everything has time to be tested and problems worked out before you do the version upgrades. Enterprise IT staff often lag several months behind the vendor upgrade timing to allow planning and testing to ensure the upgrades do not cause problems within the enterprise itself.

I suspect this is something within the VMware software that does not like the newer kernels and libraries on F39.

If you wish to avoid the potential problems that seem to come from VMware, I might suggest that you switch to libvirt (native on Fedora) or VirtualBox (from Oracle Linux), which is also packaged for direct install on Fedora. Both are 100% free to use for as many VMs as you choose.

Appreciate your insight on proprietary vs Free software :slight_smile: VMware Workstation really has the best implementation of a virtual graphics adapter, so VMware Workstation it is for that X11 use case. In other scenarios, libvirt (kvm) or other hypervisors are better.

On the technical side regarding the challenge at hand, compared to F39, Archlinux has an even more current kernel and mesa - and everything flies on Archlinux.

The major difference between F39 and Archlinux is the packaging - i.e. “the distro work”. Archlinux out of the box is very light in that department, F39 does a whole lot more. I suspect that some patch or some config option added 2-3 months ago(?) on Fedora (F38/F39) may have down-tuned things.

Having thought about it, I might actually try going for an F38 Silverblue installation and then roll this back and forth to hit the point where F38 “flipped” (distro-bisecting - is that a thing?).

For me I get

glXChooseVisual(0x5643edfe4da0, 0, 0x5643ec1e6020, 0)                           = 0x5643ee008230 <0.056001>

on bare metal.

Without ltrace the command returns within a fraction of a second.

Silverblue F38 “initial release” (the ISO) with GNOME on X11 is not suffering from the problem.

Silverblue F38 “current” is suffering from the problem (and, reminder, Archlinux is even more current than that and works fine).

Let’s learn about the suitability of Updates, Upgrades & Rollbacks :: Fedora Docs for bisecting :slight_smile:


glXChooseVisual working fine on bare metal is expected, I guess :wink:

This challenge seems to be tied to distribution-specific choices made when deploying X11, affecting performance under VMware Workstation. Archlinux, with its minimal, rolling, everything-is-current approach, works fine.

Right now I can tell that in Fedora Linux this regression was introduced between April 2023 and “today”, having looked at Silverblue F38. I’ll try to find the time to bisect that properly.

Having thought about it, I might actually try going for an F38 Silverblue installation and then roll this back and forth to hit the point where F38 “flipped” (distro-bisecting - is that a thing?).

Silverblue and OSTree systems in general are uniquely well suited to “distro-bisecting”, as you call it. I have done the same thing myself to track down a regression in CUPS.

# 38.20230901.0 good
# 38.20231018.0 good
# 38.20231025.0 good
# 38.20231027.0 good
# 38.20231028.0 good 47f696a1af532ec01243e412f728a6c7ff7f217096b64af5c514d5ccb5ffd972
# 38.20231029.0 bad  472c26c865d1fa428f1153d692fc6097245783843899a2ac9d54fa1f72c20d4d
# 38.20231030.0 bad
# 38.20231113.0 bad
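The search that produced the list above is an ordinary bisection over an ordered list of snapshots. A minimal sketch follows, assuming the snapshots are sorted oldest to newest, the oldest is known good, the newest is known bad, and the regression never disappears once introduced. The `observed` table simply replays the results recorded above; in practice `is_bad` would mean deploying that snapshot, rebooting, and timing glXChooseVisual.

```python
def first_bad(versions, is_bad):
    """Return (last_good, first_bad) by binary search.

    Assumes versions[0] is good, versions[-1] is bad, and every version
    after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[lo], versions[hi]

# Results as recorded in the bisect log above (True means "bad").
observed = {
    "38.20230901.0": False,
    "38.20231018.0": False,
    "38.20231025.0": False,
    "38.20231027.0": False,
    "38.20231028.0": False,
    "38.20231029.0": True,
    "38.20231030.0": True,
    "38.20231113.0": True,
}
versions = list(observed)  # dicts keep insertion order, so this is oldest to newest

tested = []
def is_bad(version):
    tested.append(version)  # record which snapshots would need a reboot-and-measure cycle
    return observed[version]

print(first_bad(versions, is_bad), "after testing", len(tested), "snapshots")
```

With eight candidate snapshots, the search needs only three test cycles, which matters when each cycle involves a reboot.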

[stefan@fedora ~]$ rpm-ostree db diff 47f696a1af532ec01243e412f728a6c7ff7f217096b64af5c514d5ccb5ffd972 472c26c865d1fa428f1153d692fc6097245783843899a2ac9d54fa1f72c20d4d
ostree diff commit from: 47f696a1af532ec01243e412f728a6c7ff7f217096b64af5c514d5ccb5ffd972
ostree diff commit to:   472c26c865d1fa428f1153d692fc6097245783843899a2ac9d54fa1f72c20d4d
Upgraded:
  libdrm 2.4.114-2.fc38 -> 2.4.117-1.fc38
  mesa-dri-drivers 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-filesystem 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libEGL 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libGL 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libgbm 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libglapi 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libxatracker 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-va-drivers 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-vulkan-drivers 23.1.8-1.fc38 -> 23.1.9-1.fc38
  python3-regex 2023.6.3-1.fc38 -> 2023.10.3-1.fc38
  xorg-x11-server-Xorg 1.20.14-23.fc38 -> 1.20.14-26.fc38
  xorg-x11-server-Xwayland 22.1.9-2.fc38 -> 22.1.9-3.fc38
  xorg-x11-server-common 1.20.14-23.fc38 -> 1.20.14-26.fc38

I am not surprised by the set of packages.

Could someone please provide input on how to best isolate this further?

The finest unit of granularity here is the Silverblue snapshot, but, alas, this combines

  • a (most likely irrelevant) Python module (“regex”) update
  • libdrm upgrade from 2.4.114-2 to 2.4.117-1
  • mesa from 23.1.8-1 to 23.1.9-1
  • X11 from 1.20.14-23 to 1.20.14-26
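The candidate list above can also be derived mechanically from the `rpm-ostree db diff` output. The sketch below groups upgraded packages by their (old, new) version pair, on the assumption that packages built from one source package share a version. That is a heuristic for illustration, not something rpm-ostree itself reports, and `group_upgrades` is a made-up helper name.

```python
import re
from collections import defaultdict

# One "Upgraded:" entry looks like: "  libdrm 2.4.114-2.fc38 -> 2.4.117-1.fc38"
UPGRADE = re.compile(r"^\s*(?P<pkg>\S+)\s+(?P<old>\S+)\s+->\s+(?P<new>\S+)\s*$")

def group_upgrades(diff_text):
    """Map (old_version, new_version) -> list of package names sharing that upgrade."""
    groups = defaultdict(list)
    for line in diff_text.splitlines():
        m = UPGRADE.match(line)
        if m:
            groups[(m.group("old"), m.group("new"))].append(m.group("pkg"))
    return dict(groups)

# Upgrade entries copied from the db diff output above.
DIFF = """Upgraded:
  libdrm 2.4.114-2.fc38 -> 2.4.117-1.fc38
  mesa-dri-drivers 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-filesystem 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libEGL 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libGL 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libgbm 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libglapi 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-libxatracker 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-va-drivers 23.1.8-1.fc38 -> 23.1.9-1.fc38
  mesa-vulkan-drivers 23.1.8-1.fc38 -> 23.1.9-1.fc38
  python3-regex 2023.6.3-1.fc38 -> 2023.10.3-1.fc38
  xorg-x11-server-Xorg 1.20.14-23.fc38 -> 1.20.14-26.fc38
  xorg-x11-server-Xwayland 22.1.9-2.fc38 -> 22.1.9-3.fc38
  xorg-x11-server-common 1.20.14-23.fc38 -> 1.20.14-26.fc38
"""

groups = group_upgrades(DIFF)
for (old, new), pkgs in groups.items():
    print(f"{old} -> {new}: {', '.join(pkgs)}")
```

Xwayland carries its own version and therefore lands in a separate group from the Xorg server packages; for an X11 session it is probably the least interesting of the groups.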

How can I just go, say, libdrm 2.4.114-2 on Silverblue?


From 38.20231028.0, you can selectively add individual updates as overrides. First, find the update on bodhi: https://bodhi.fedoraproject.org/updates/?search=&packages=libdrm&releases=F38

Then:

rpm-ostree override replace https://bodhi.fedoraproject.org/updates/FEDORA-2023-5c58d0641b

The update to libdrm from 2.4.114 to 2.4.117 changed the behaviour significantly, and massively for the worse.

(EDIT: For reference, the bodhi commit to enable this upgrade is Commit - rpms/libdrm - 0bc13e3b32f5fa9165aa4e751a03b71991add30a - src.fedoraproject.org and that really only switched to the other libdrm tag, nothing on top)

Given Fedora Updates System (fedoraproject.org) and with

rpm-ostree override replace https://bodhi.fedoraproject.org/updates/FEDORA-2023-5c58d0641b

applied, the time to complete the call to glXChooseVisual nearly quadruples (from about 0.8 seconds to about 3 seconds; the individual durations vary depending on my current environment)


Now 2.4.117 is very very fine on Archlinux, with

uftrace -P record glxinfo -B | grep glXChooseVisual

            [  9028] | glXChooseVisual() {
  24.501 ms [  9028] | } /* glXChooseVisual */

Note that Archlinux really returns after 0.024 seconds, as opposed to the Fedora F38 installation in the good state returning after 0.8 seconds. This is really noticeable for me, as a human interacting with the system.

The reason for that difference between Archlinux and F38, I suspect, will be rooted in the same cause that made Fedora F38 regress from 0.8 seconds to almost 3 seconds.

Right now I am only looking at the F38 performance regression, though.

For full disclosure, this is the running Fedora system with the performance regression:

rpm-ostree status

State: idle
Deployments:
● fedora:fedora/38/x86_64/silverblue
                  Version: 38.20231028.0 (2023-10-28T02:35:21Z)
               BaseCommit: 47f696a1af532ec01243e412f728a6c7ff7f217096b64af5c514d5ccb5ffd972
             GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
           LocalOverrides: libdrm 2.4.114-2.fc38 -> 2.4.117-1.fc38
          LayeredPackages: langpacks-en ltrace
...

This is fresh out of the box, with the only configuration change being suppression of automatic software updates.


The last three commits of Commits · 5254fd1146b95a86fef1bb8e950d0146d829f3c4 · Mesa / drm · GitLab seem to be the only candidates - everything else in the diff between libdrm 2.4.114 and 2.4.117 appears to be either harmless or not applicable.

xf86drm: use drm device name to identify drm node type (3bc3cca2) · Commits · Mesa / drm · GitLab smells like it might have an impact; previously this was just a few CPU instructions, now it’s a lot of code calling somewhere - in particular the calls to drmGetDeviceName and access.

I’ll take a look at the before-and-after flamegraphs, I guess?

All additional ideas / insights much appreciated!

Disclosure: I have zero domain competence in the libdrm / X11 area.

It might be interesting to build locally, revert that single commit, then LD_PRELOAD=my-local-libdrm.so glxinfo64 -B?

What is the best way to create a reproducible build of libdrm, locally, out of Overview - rpms/libdrm - src.fedoraproject.org on that F38 Silverblue installation?

FWIW, me being unfamiliar with “immutable distro” + “toolbox” + “rpmbuild”, I am concerned about friction. I am totally fine with the concepts and ideas in general, and have successfully built / used custom mesa builds on Microsoft WSL2 once upon a time.

The code change to libdrm seems to expose a challenge in general distro system setup. The flamegraphs below show

  • left side == “good”
  • right side == “bad”

Key insight: in the bad situation, notice the call stack full of “nft_ct_pcpu_template” (nft_ct.c - net/netfilter/nft_ct.c - Linux source code (v6.5.11) - Bootlin), whereas none of that is present in the good state.

Hypothesis: the code change to libdrm causes the “network synchronization primitive” code path to be hit more frequently; that introduces a lot of latency.

Why is the network synchronization primitive being hit at all? (F38 goes 0.8 secs to 3 secs) And is that also the root cause for the pre-existing performance difference between Archlinux and F38? (Archlinux == 0.03 secs, F38 == 0.8 secs)


libdrm sits between the X server and the kernel. X clients communicate with the X server via a socket. This is part of the communication:

writev@SYS(3, 0x7fffcc34a080, 1)                                                                          = 12 <0.000057>
poll@SYS(0x7fffcc349fe8, 1, -1)                                                                           = 1 <2.529654>

Effectively, the X client makes a request on file descriptor 3 (which happens to be the X socket) and blocks, waiting for a response. That response (above) arrives 2.5 seconds later.

Nicer rendition from strace with parameter resolution:

writev(3, [{iov_base="\226\1\3\0\343\4\0\0\0\0\0\0", iov_len=12}], 1) = 12 <0.000>
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}]) <2.744>
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\1\30\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], msg_controllen=24, msg_flags=0}, 0) = 32 <0.000>

That writev is the X protocol. I am unable to resolve what that is, as the tooling I have found did not work out on first sight.
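Even without a protocol tracer, the fixed part of an X11 request can be decoded by hand from the bytes strace printed. A minimal sketch follows, assuming little-endian byte order (as an x86_64 client would negotiate). Every X11 request starts with a major opcode byte, one data byte (extensions use it as their minor opcode), and a 16-bit length in 4-byte units. Major opcodes of 128 and above belong to extensions and are assigned by the server at runtime, so the number alone does not name the extension; `xdpyinfo -queryExtensions` shows the assignment on a given server.

```python
import struct

# Payload copied from the strace writev line above; strace prints bytes as octal escapes.
payload = b"\226\1\3\0\343\4\0\0\0\0\0\0"

# Request header: major opcode (1 byte), data/minor opcode (1 byte),
# request length in 4-byte units (2 bytes). Assumption: little-endian client.
major, minor, length_words = struct.unpack_from("<BBH", payload, 0)

# The remaining 8 bytes are request-specific; here they are read as two 32-bit values.
arg0, arg1 = struct.unpack_from("<II", payload, 4)

kind = "extension" if major >= 128 else "core"
print(f"major opcode : {major} ({kind})")
print(f"minor opcode : {minor}")
print(f"length       : {length_words} words = {length_words * 4} bytes")
print(f"arguments    : {arg0:#x}, {arg1:#x}")
```

The decoded shape (extension request, minor opcode 1, three words, two 32-bit arguments, and a reply that carries a file descriptor via SCM_RIGHTS) would be consistent with a DRI3 Open request, in which the server opens a DRM device on the client's behalf. That reading is an assumption until the opcode is checked against the server's extension list.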

How to debug that further?

Tracing the X server is challenging. Looking at the changes, I am quite keen on reverting that call to access, but I’d have to be able to build libdrm first, and then inject it into some recent plain Fedora (doesn’t have to be immutable).

How to Easily Patch Fedora Packages | Dan Aloni and Package Maintenance Guide :: Fedora Docs (fedoraproject.org) look like rather useful instructions?

You can use Fedora Workstation to reproduce the issue and verify that it disappears after downgrading the relevant package, then report a regression in that package on the bug tracker and contact the maintainer or open a PR reverting the change that causes the problem.

Good point: 2249838 – Regression performance VMware Workstation X11

Root-caused to a single line change, increasing DRM_MAX_MINOR (see Bugzilla at 2249838 – Regression performance VMware Workstation X11 for relevant commit and patch)

But, seriously, that only makes Fedora performance tolerable again; Archlinux works fine with that change, so that change actually hits a Fedora weakness elsewhere. I lack the background knowledge to identify what is different between Archlinux and Fedora.

For the record, this general slowness problem exists on at least

  • OpenSUSE Tumbleweed (rolling, includes latest libdrm, so the patch will take this back into “tolerable” there, too - most likely)
  • Ubuntu 22.04 LTS (which doesn’t have libdrm at 2.4.117)

which implies that those distributions do the same as Fedora does, with the same negative consequences.


The performance difference at large is becoming clearer now:

Fedora libdrm has been built with UDEV enabled.
Archlinux libdrm has been built in state !UDEV.

The material effect of this can then be found in

static int drmOpenDevice(dev_t dev, int minor, int type)

Fedora will keep polling in a relatively tight loop (with constant back-off of usleep(20);) for a render device to appear, for up to 50 iterations (wait_for_udev).

Archlinux simply does one stat and is done.

The fun thing here is that these are the minor type render nodes that are being checked for, and for VMware Workstation there is apparently only ever one of those.

So Fedora Linux bangs on DRM_MAX_MINOR count of minor nodes for “quite a while”, with futility, and, yes that takes non-trivial amounts of time. Quadruple DRM_MAX_MINOR (which is what happened from libdrm 2.4.114 to 2.4.117) and that futile activity … quadruples … in duration from 0.8 seconds to approx 3 seconds.

Archlinux does all the same futile banging; it simply is much faster in getting the “nope” out there.

And that is the root cause for slow X11 application start on Fedora in general, topped by egregiously slow starts thanks to DRM_MAX_MINOR*4.
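The scaling argument can be checked with a small model. This is a sketch under stated assumptions, not libdrm's actual code: it counts how many sleeps the retry loop described above performs when every missing node costs up to 50 tries. The values 16 and 64 for DRM_MAX_MINOR are assumed here (the thread only establishes that the constant quadrupled), as is the guess that exactly one node exists.

```python
MAX_TRIES = 50  # retry budget per device node, as described for the UDEV build above

def sleeps_for_scan(max_minor, present, max_tries=MAX_TRIES):
    """Count usleep() calls made while probing minors 0..max_minor-1.

    present: set of minor indices whose device node exists (hypothetical layout).
    """
    sleeps = 0
    for minor in range(max_minor):
        tries = 0
        while minor not in present:  # stands in for a failing stat() on the node
            sleeps += 1              # stands in for usleep(20)
            tries += 1
            if tries == max_tries:   # give up on this minor and move on
                break
    return sleeps

# Assumptions: DRM_MAX_MINOR 16 before and 64 after; only one node present.
old_sleeps = sleeps_for_scan(16, {0})
new_sleeps = sleeps_for_scan(64, {0})

# Back out the real cost per sleep implied by the ~0.8 s measured before the change,
# then predict the duration after the change from the same per-sleep cost.
implied_cost = 0.8 / old_sleeps
predicted_new = new_sleeps * implied_cost

print(f"sleeps before: {old_sleeps}, after: {new_sleeps}")
print(f"implied cost per sleep: {implied_cost * 1000:.2f} ms")
print(f"predicted duration after the change: {predicted_new:.2f} s")
```

Under these assumptions each nominal 20-microsecond sleep would really cost about a millisecond in the guest, and the prediction lands close to the roughly 3 seconds observed. A build without the retry loop performs one check per node and no sleeps at all, which fits the Archlinux numbers.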

Now, building libdrm with DRM_MAX_MINOR == 1 will really speed things up on Fedora - but may cause a maintenance disaster.

Is Archlinux correct in having been built with !UDEV? … Quest to be continued?
