Ceph client (and rbd, rbd-nbd, etc) segfault on Fedora 39


Hey y’all - seems like Ceph is broken on F39 currently, maybe anything which uses librbd is too. Can anyone else confirm?

The cluster is 18.2.0, as is the client. F38 machines have no trouble (though that’s Ceph 17.2.6). ‘ceph’ itself segfaults quietly; rbd is quite loud, so I’m not including all of that output (available if needed though).

Might just need to wait for 18.2.1 but hoping it’s just a missed library or something :slight_smile:

Cheers
-Alex

Same issue here with Fedora 39. It’s not a problem with my client configuration since I tried both a rollback and a Fedora 38 fresh install and everything works smoothly there. Libvirt is also affected so I suspect librbd too.

CephFS mounts and works fine.

Might try compiling the old and new packages manually to compare.

I have this issue as well. It seems to be safe_timer within /usr/lib64/ceph/libceph-common.so.2. Steps to reproduce are basically as follows:

Install Fedora 39
Make sure it’s up to date, on the latest kernel.
Type ceph.
segfault!
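For anyone who wants to confirm this without eyeballing shell output, here is a minimal sketch in Python that runs a command and reports whether it was killed by a signal. The helper names (`describe_exit`, `check_command`) are made up for illustration, and using `--version` as the argument is an assumption; the steps above only say that typing `ceph` crashes.

```python
import shlex
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def check_command(argv: list[str]) -> str:
    """Run argv, discard its output, and describe how it ended."""
    try:
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return "not installed"
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Assumption: "--version" is enough to trigger the crash on an affected box.
    for argv in (["ceph", "--version"], ["rbd", "--version"]):
        print(shlex.join(argv), "->", check_command(argv))
```

On an affected machine the expectation would be a "killed by SIGSEGV" line for each binary, but I have not verified which arguments are needed to hit the crash.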

Any of the underlying commands also segfault. I have tried launching ceph-mon manually as well as other binaries. Everything segfaults. Filling out /etc/ceph/ceph.conf with a basic cluster still segfaults.

I am grabbing the new build files from F40 just as a last resort to try.

Really regretting my upgrade to F39 at this point. I ran into this same issue after also running into the lvm leaks issue. :confused: Did @guinness ever find a solution?

Still nada here. Ceph has upgraded to 18.2.1 and still segfaulting. Lucky I only upgraded my clients…

Filed a bug… pushing it internally
https://bugzilla.redhat.com/show_bug.cgi?id=2252160

Thank you (and apologies for the laziness in not doing so) :slight_smile:

There was a ticket filed already and I added my own trace to it, but I noticed it’s been sitting there for a while: 2241339 – [abrt] ceph-common: std::_Rb_tree_rebalance_for_erase(): python3.12 killed by SIGSEGV

I do regret upgrading my main PC to Fedora 39. I can’t use my RBD-based VMs. Thinking about rolling it back :frowning:

I did not. I pulled in the F40 RPMs, because sometimes you can get away with that. But the dependency tree for Ceph goes quite deep (I’m new to Ceph and just starting to learn), so it didn’t work.

I can at least confirm it is still broken for me with the latest updates.

But that’s the thing about installing a beta release. Sometimes things break :slight_smile: , though I will say Fedora has always been rather stable even in its beta for me.

Fedora 39 isn’t in beta…

It was when I installed it. This issue was present in the beta.

If this was an OS that was proprietary and single source software it would not be released with packages that had significant bugs (Think Microsoft and Apple).

Since Linux in general and Fedora in particular are fully open source and use software written by hundreds or thousands of developers it is impossible to test for all hardware and software combinations so problems do sometimes appear. Report the bugs on bugzilla so the developers are aware of problems and they probably would be able to address the issue. If not reported then they cannot know of the problem.

It was already reported, and linked to, earlier in this thread. Come on y’all.

Yeah, I’m not really sure what he’s preaching about. It was reported a while back. I think he’s just looking for things to preach about. The bug has been reported. RH and Ceph devs are aware of the issue. Fedora is great; everyone simmer down.

This is an LTO bug. The object file has the correct code, but the linked library does not. Changing -flto=auto to -fno-lto fixes the problem. Trying to minimize the linker command line now.
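To make the flag change concrete, here is a rough Python sketch of that substitution applied to a compiler flag string. It is not how the Ceph build was actually patched (that is not described here); `disable_lto` is a hypothetical helper, and dropping `-ffat-lto-objects` alongside `-flto=auto` is my assumption.

```python
import shlex

# Flag prefixes treated as "LTO on". Including -ffat-lto-objects is an assumption.
LTO_PREFIXES = ("-flto", "-ffat-lto-objects")


def disable_lto(flags: str) -> str:
    """Return flags with LTO options removed and a single -fno-lto appended."""
    kept = [
        token
        for token in shlex.split(flags)
        if not token.startswith(LTO_PREFIXES) and token != "-fno-lto"
    ]
    kept.append("-fno-lto")
    return shlex.join(kept)


if __name__ == "__main__":
    print(disable_lto("-O2 -flto=auto -ffat-lto-objects -g"))
```

For an RPM rebuild, Fedora's packaging documentation describes an opt-out macro (`%global _lto_cflags %{nil}` in the spec file), which is probably the more practical route than editing flags by hand.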

So recompile the RPMs and, in theory, it’ll fix the issue?

FWIW - I updated the above BZ with some instructions on how I made ceph packages that worked for me until the real bug is resolved. This is based on Hector Martin’s work. Details are here: 2241339 – [abrt] ceph-common: std::_Rb_tree_rebalance_for_erase(): python3.12 killed by SIGSEGV