SSSD Broken after 41 upgrade on bazzite (Kinoite)

My SSSD setup from Fedora 40 is no longer allowing my domain user to log in after the upgrade to the first Fedora 41 image. This is on a bazzite-desktop install, though I don’t believe this issue could be unique to bazzite. SSSD is joined to a Windows Active Directory with the following /etc/sssd/sssd.conf:

[sssd]
domains = raysdomain.com
config_file_version = 2
services = nss, pam

[domain/raysdomain.com]
debug_level = 9
ad_domain = raysdomain.com
krb5_realm = RAYSDOMAIN.COM
realmd_tags = manages-system joined-with-samba
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = False
fallback_homedir = /home/%u
access_provider = ad
access_provider = simple
simple_allow_groups = linux-admins
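
One thing worth checking in a config like this is repeated keys: access_provider appears twice in the [domain/raysdomain.com] section above, and it is not obvious which value SSSD ends up using. A minimal sketch of a duplicate-key check follows. The function name dup_keys is hypothetical, and the script assumes a plain INI layout with one key = value pair per line.

```shell
# dup_keys FILE
# Print every key that appears more than once within the same [section].
# Hypothetical helper; assumes simple "key = value" lines.
dup_keys() {
    awk -F= '
        /^[ \t]*$/    { next }                 # skip blank lines
        /^[ \t]*[#;]/ { next }                 # skip comments
        /^\[/         { section = $0; next }   # remember the current section
        {
            key = $1
            gsub(/[ \t]/, "", key)             # trim whitespace around the key
            if (seen[section, key]++) print section ": " key
        }
    ' "$1"
}

# Example (as root, since sssd.conf is normally not world-readable):
#   dup_keys /etc/sssd/sssd.conf
```

A key is reported once for each extra occurrence, so a clean file produces no output.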

The initial error was that /etc/krb5.keytab was not readable by the ‘sssd’ user, which I resolved by changing its permissions to 0644.

Oct 29 15:20:44 raypute ldap_child[6031]: krb5_kt_start_seq_get failed: Permission denied
Oct 29 15:20:44 raypute ldap_child[6031]: Failed to read keytab [FILE:/etc/krb5.keytab]: No suitable principal found in keytab

After doing that, SSSD will start but users cannot quite log in. I see this error in /var/log/sssd/krb5_child.log:

(2024-10-29 15:07:45): [krb5_child[4767]] [old_ccache_valid] (0x0040): [RID#124] Cannot check if saved ccache KCM: is valid
(2024-10-29 15:07:45): [krb5_child[4767]] [k5c_check_old_ccache] (0x0040): [RID#124] old_ccache_valid failed.
(2024-10-29 15:07:45): [krb5_child[4767]] [k5c_ccache_setup] (0x0020): [RID#124] Cannot check old ccache [KCM:]: [1][Operation not permitted]. Assuming old cache is invalid and not used.
(2024-10-29 15:07:45): [krb5_child[4767]] [k5c_precreate_ccache] (0x4000): [RID#124] Recreating ccache
(2024-10-29 15:07:45): [krb5_child[4767]] [become_user] (0x0200): [RID#124] Trying to become user [1011201125][1011200513].
(2024-10-29 15:07:45): [krb5_child[4767]] [become_user] (0x0020): [RID#124] setgroups failed [1][Operation not permitted].
(2024-10-29 15:07:45): [krb5_child[4767]] [main] (0x0020): [RID#124] become_user failed.
(2024-10-29 15:07:45): [krb5_child[4767]] [main] (0x0020): [RID#124] krb5_child failed!

I’m stuck at troubleshooting the ‘setgroups failed / Operation not permitted’ error here. Has anyone seen this before?

I’m on Bluefin stable…

If you reboot into the 40-based ostree deployment, are you able to log in? I’m just wondering if this breaks the login altogether (e.g. by corrupting some data file)…

Also, did you try removing the computer from the domain and then rejoining?

Yes, I did try leaving the realm, deleting the Computer object in AD, and rejoining while keeping the default generated sssd.conf file. Same behavior. Will try going back to F40 next, but need to figure out how to do that exactly.

If this was your last update, you should have the previous image in your grub boot menu.

If not, check your current channel with:

# rpm-ostree status
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: last run 21h ago
Deployments:
  ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest
...

You should get something ending with bazzite:latest or similar. Replace :latest with :stable or :40. For instance, in my case for Bluefin I would do:

# rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:stable

Okay, that took me a minute. I used the tool bazzite-rollback-helper (unique to bazzite) to go back to F40. Unfortunately SSSD was still broken, but it was due to the folders under /var/lib/sss and /etc/sssd being owned by sssd:sssd. I had to switch them back to being owned by root:root and then things were working again after leaving the realm and re-joining.

I’ve built two VMs: one on Kinoite 41 and another on Bazzite w/ F41. I’ve configured them with identical SSSD configurations connected to my AD domain. The vanilla Kinoite system has no problems whatsoever, while the bazzite system exhibits the same error with login in my original post.

So, I suppose the issue is somehow unique to bazzite… not sure how that’s possible yet.

I’ve managed to strace the krb5_child process but the output isn’t really telling me anything useful about the failing setgroups() call.

strace -v -s 1000 -f -o /tmp/strace.log /usr/sbin/sssd -i --logger=files

I am by no means an expert with strace so please do let me know if this is in fact helpful in a way I’m not realizing.

446  getsockopt(17, SOL_SOCKET, SO_PEERSEC, "kernel\0", [256 => 7]) = 0
3446  epoll_ctl(3, EPOLL_CTL_ADD, 17, {events=EPOLLIN|EPOLLRDHUP, data={u32=2170684144, u64=94792098906864}}) = 0
3446  epoll_wait(3, [{events=EPOLLIN, data={u32=2170684144, u64=94792098906864}}], 1, 4462) = 1
3446  recvfrom(17, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0", 1536, 0, NULL, NULL) = 20
3446  epoll_ctl(3, EPOLL_CTL_DEL, 17, 0x7ffccd093504) = 0
3446  epoll_ctl(3, EPOLL_CTL_ADD, 17, {events=EPOLLOUT, data={u32=2170684144, u64=94792098906864}}) = 0
3446  epoll_wait(3, [{events=EPOLLOUT, data={u32=2170684144, u64=94792098906864}}], 1, 4462) = 1
3446  sendto(17, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0", 20, 0, NULL, 0) = 20
3454  <... poll resumed>)               = 1 ([{fd=3, revents=POLLIN}])
3446  epoll_ctl(3, EPOLL_CTL_DEL, 17, 0x7ffccd093504 <unfinished ...>
3454  read(3,  <unfinished ...>
3446  <... epoll_ctl resumed>)          = 0
3454  <... read resumed>"\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
3446  epoll_ctl(3, EPOLL_CTL_ADD, 17, {events=EPOLLIN|EPOLLRDHUP, data={u32=2170684144, u64=94792098906864}} <unfinished ...>
3454  poll([{fd=3, events=POLLIN}], 1, 300000 <unfinished ...>
3446  <... epoll_ctl resumed>)          = 0
3454  <... poll resumed>)               = 1 ([{fd=3, revents=POLLIN}])
3446  epoll_wait(3,  <unfinished ...>
3454  read(3, "\1\0\0\0", 4)            = 4
3454  write(24, "(2024-10-30  9:50:37): [krb5_child[3454]] [become_user] (0x0200): [RID#5] Trying to become user [1011201125][1011200513].\n", 122) = 122
3454  geteuid()                         = 995
3454  setgroups(0, NULL)                = -1 EPERM (Operation not permitted)
3454  write(24, "(2024-10-30  9:50:37): [krb5_child[3454]] [become_user] (0x0020): [RID#5] setgroups failed [1][Operation not permitted].\n", 121) = 121
3454  capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
3454  capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
3454  capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
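
A long strace log like this is easier to read if you pull out only the calls that failed with EPERM. The sketch below is one way to do that. The function name eperm_calls is hypothetical, and it assumes the log was written with -f and -o as in the command above, so that every line starts with a PID.

```shell
# eperm_calls FILE
# Print "PID syscall" for every call in an strace log that returned EPERM.
# Hypothetical helper; assumes "strace -f -o FILE" output (PID in column 1).
eperm_calls() {
    awk '/= -1 EPERM/ {
        # For "<... name resumed>" lines the syscall name is the third field.
        call = ($2 == "<...") ? $3 : $2
        sub(/\(.*/, "", call)      # drop the argument list
        print $1, call
    }' "$1"
}

# Example:
#   eperm_calls /tmp/strace.log
```

Run against the excerpt above, the only match is the setgroups(0, NULL) call made by PID 3454, the krb5_child process.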

I’m so close I can taste it, lol. The SSSD child processes need the CAP_SETGID capability to call setgroups(), but they don’t have it on the bazzite system (the ldap_child logs below show it starting with no capabilities at all). How can I grant this?

Kinoite system:

(2024-10-30 14:28:39): [ldap_child[5257]] [main] (0x0100): Starting under uid=995 (euid=995) : gid=993 (egid=993)
(2024-10-30 14:28:39): [ldap_child[5257]] [main] (0x0100): With following capabilities:
                   CAP_CHOWN: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
            CAP_DAC_OVERRIDE: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
                  CAP_SETGID: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*
                  CAP_SETUID: effective = *1*, permitted = *1*, inheritable =  0 , bounding = *1*

Bazzite system:

(2024-10-30 14:28:40): [ldap_child[2861]] [main] (0x0100): Starting under uid=995 (euid=995) : gid=993 (egid=993)
(2024-10-30 14:28:40): [ldap_child[2861]] [main] (0x0100): With following capabilities:
   (nothing)
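
The same information can be read for any running process from the CapEff line of /proc/PID/status, which holds the effective capability set as a hex bitmask. Here is a rough sketch for testing a single bit in such a mask. The function name has_cap is hypothetical; the bit numbers in the comments are the values from linux/capability.h.

```shell
# has_cap MASK BIT
# Succeed if capability number BIT is set in the hex bitmask MASK.
# Hypothetical helper. Bit numbers (from linux/capability.h):
#   0 = CAP_CHOWN, 1 = CAP_DAC_OVERRIDE, 6 = CAP_SETGID, 7 = CAP_SETUID
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Example: check whether the current shell holds CAP_SETGID.
#   mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
#   has_cap "$mask" 6 && echo "CAP_SETGID present" || echo "CAP_SETGID missing"
```

The four capabilities listed for the Kinoite ldap_child correspond to bits 0, 1, 6 and 7, while the "(nothing)" on the bazzite side corresponds to an all-zero mask.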

It’s not SELinux, and it’s not obvious permissions or user settings… so what is it?

I have looked up the capabilities of the binaries used by SSSD’s child processes using getcap:

kinoite system:

root@kinotest:/usr/libexec/sssd# getcap *
krb5_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
ldap_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
selinux_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
sssd_pam cap_dac_read_search=p

bazzite system:

root@bazzite:/usr/libexec/sssd# getcap *
selinux_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
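
To make this comparison repeatable, the getcap output from each machine can be saved to a file and compared. The following is a small sketch, assuming both files were produced by the same getcap command on the two systems. The function name missing_caps and the file names in the example are hypothetical.

```shell
# missing_caps REFERENCE ACTUAL
# Print every line of REFERENCE (getcap output from the working system)
# that has no identical line in ACTUAL (getcap output from the broken one).
# Hypothetical helper; compares whole lines as fixed strings.
missing_caps() {
    grep -Fxv -f "$2" "$1"
}

# Example:
#   on the working system:  getcap /usr/libexec/sssd/* > kinoite-caps.txt
#   on the broken system:   getcap /usr/libexec/sssd/* > bazzite-caps.txt
#   missing_caps kinoite-caps.txt bazzite-caps.txt
```

With the two listings above as input, this prints the krb5_child, ldap_child and sssd_pam lines, since only selinux_child matches on both systems.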

But I cannot modify these binaries, because they live under /usr, which is mounted read-only on this system.

root@bazzite:/usr/libexec/sssd# setcap 'cap_dac_read_search=p' sssd_pam
Failed to set capabilities on file 'sssd_pam': Read-only file system

I think you need to open a bug report in Bazzite. They may be applying their own security policy? Head over to: https://universal-blue.discourse.group/new

Start thread there and post a link here so that people can follow. I am interested because I am also using SSSD on a ublue system (though mine is Bluefin). If this is working in Aurora, they should be able to trace it quickly and fix.

I actually did already, but forgot to cross-post here: “SSSD binaries missing capabilities in Bazzite 41 vs. Kinoite 41”, Issue #1818 in ublue-os/bazzite on GitHub.