X11 huge delay for each process (VMware Workstation); regression

DRM_MAX_MINOR == 1 on Fedora Linux yields

glXChooseVisual(0x55dcebf0fda0, 0, 0x55dcea489020, 0)                                                                                           = 0x55dcebf30110 <0.112807>

With the default build, that’s 0.7 seconds more.

This is very much a perceptible improvement for anyone who needs / wants to stay on a native X11 session.

And the general fix, and hopefully the generally robust fix, to benefit all users of libdrm alike, switches from using the C define DRM_MAX_MINOR to calling drmGetDevices2 within libdrm.

I’ll try to get something approximating the below upstreamed into libdrm, it makes a Very Pleasant Difference:

diff --git a/xf86drm.c b/xf86drm.c
index ebc60956..55390a62 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -1088,7 +1088,7 @@ static const char *drmGetMinorName(int type)
  * \return a file descriptor on success, or a negative value on error.
  *
  * \internal
- * This function attempts to open every possible minor (up to DRM_MAX_MINOR),
+ * This function attempts to open every possible minor,
  * comparing the device bus ID with the one supplied.
  *
  * \sa drmOpenMinor() and drmGetBusid().
@@ -1097,6 +1097,7 @@ static int drmOpenByBusid(const char *busid, int type)
 {
     int        i, pci_domain_ok = 1;
     int        fd;
+    int        device_count;
     const char *buf;
     drmSetVersion sv;
     int        base = drmGetMinorBase(type);
@@ -1105,7 +1106,14 @@ static int drmOpenByBusid(const char *busid, int type)
         return -1;
 
     drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
-    for (i = base; i < base + DRM_MAX_MINOR; i++) {
+
+    device_count = drmGetDevices2(0, NULL, 0);
+    if (device_count < 0) {
+        drmMsg("drmOpenByBusid: drmGetDevices2 failed with error code %d\n", device_count);
+        return -1;
+    }
+
+    for (i = base; i < base + device_count; i++) {
         fd = drmOpenMinor(i, 1, type);
         drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
         if (fd >= 0) {
@@ -1161,6 +1169,7 @@ static int drmOpenByName(const char *name, int type)
 {
     int           i;
     int           fd;
+    int           device_count;
     drmVersionPtr version;
     char *        id;
     int           base = drmGetMinorBase(type);
@@ -1172,7 +1181,13 @@ static int drmOpenByName(const char *name, int type)
      * Open the first minor number that matches the driver name and isn't
      * already in use.  If it's in use it will have a busid assigned already.
      */
-    for (i = base; i < base + DRM_MAX_MINOR; i++) {
+    device_count = drmGetDevices2(0, NULL, 0);
+    if (device_count < 0) {
+        drmMsg("drmOpenByName: drmGetDevices2 failed with error code %d\n", device_count);
+        return -1;
+    }
+
+    for (i = base; i < base + device_count; i++) {
         if ((fd = drmOpenMinor(i, 1, type)) >= 0) {
             if ((version = drmGetVersion(fd))) {
                 if (!strcmp(version->name, name)) {
@@ -3303,6 +3318,7 @@ drm_public char *drmGetDeviceNameFromFd(int fd)
     struct stat sbuf;
     dev_t d;
     int i;
+    int device_count;
 
     /* The whole drmOpen thing is a fiasco and we need to find a way
      * back to just using open(2).  For now, however, lets just make
@@ -3312,15 +3328,19 @@ drm_public char *drmGetDeviceNameFromFd(int fd)
     fstat(fd, &sbuf);
     d = sbuf.st_rdev;
 
-    for (i = 0; i < DRM_MAX_MINOR; i++) {
+    device_count = drmGetDevices2(0, NULL, 0);
+    if (device_count < 0) {
+        drmMsg("drmGetDeviceNameFromFd: drmGetDevices2 failed with error code %d\n", device_count);
+        return NULL;
+    }
+
+    for (i = 0; i < device_count; i++) {
         snprintf(name, sizeof name, DRM_DEV_NAME, DRM_DIR_NAME, i);
         if (stat(name, &sbuf) == 0 && sbuf.st_rdev == d)
-            break;
+            return strdup(name);
     }
-    if (i == DRM_MAX_MINOR)
-        return NULL;
 
-    return strdup(name);
+    return NULL;
 #endif
 }

I have looked into this some more, paying closer attention to the semantics.

Most of the above patch will not work correctly under udev in the general case.

Basically, in the udev case, the existing setup does a very very implicit dance on “wait until device has been created by udev, then return it”, while drmGetDevices2 outright enumerates and returns.

The key challenge (in the udev case) is to “wait short enough for udev to pop up that device some place in /dev/dri”. For the current looping implementation, and the way things are written, this results in wait times scaling proportionally with DRM_MAX_MINOR. That’s hardly the intent…

Final post on this matter:

These challenges are ultimately down to a combination of rootless Xorg (X11), udevd, virtualization software, and a somewhat crude polling strategy in libdrm. Fedora Linux is affected because the way the distribution has been set up - but the exact same challenges apply to, say, OpenSUSE and Ubuntu.

To scratch my personal itch, I have created Accelerate device open on rootless X11 · shoffmeister/drm@db85c9d (github.com) - and once you notice that the commit message is massively longer than the straight-forward change itself, you’ll either tune out or be totally pleased that this enables you to restore a lovely responsive X11 desktop environment under virtualization.

This is kept under Commits · shoffmeister/drm (github.com). I am not sure whether I will maintain any of that, as I do hope that KDE will start working well on Wayland and VMware Workstation in the near future.

1 Like