I use Fedora Silverblue 35. I need Docker because, as a software developer, I use Testcontainers, which only supports running containers via Docker.
It works: I can start my integration tests, which start a MySQL database via Testcontainers and rootless Docker in a toolbox. The only problem is that it is slower (55 sec) than when I start the same integration test on Windows (35 sec). Setting up the MySQL database via SQL statements takes much longer when starting the integration test on Fedora, so IO seems to be much slower.
If most of the workload really is IO, then the time hit should also show up as higher IO latency. I’m not sure how to measure that on Windows for a given workload in order to compare it to the same setup on Fedora. But bcc-tools provides a variety of latency-related tools to help narrow down where it’s coming from. You could start out with fileslower at its default threshold of 10ms and see if something gets hammered frequently, with higher than expected time (that’s the tricky part: what’s expected?). There are also file-system-specific tools (ext4slower, xfsslower, btrfsslower), so you can narrow down whether it’s a file-system-induced delay. Finally, there’s biolatency to find issues with the block layer and the device itself. Granted, the problem seems unlikely to be there since it’s not happening on Windows, but it might still be worth checking just to be thorough.
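Since the open question above is how to compare the same workload on Windows and Fedora, here is a minimal sketch of a portable probe. It is not a replacement for the bcc-tools; it only times write-plus-fsync pairs in one directory, on the assumption (not confirmed in this thread) that the DDL phase is dominated by synchronous writes. The function names `fsync_latencies` and `summarize` and the defaults of 200 writes of 4096 bytes are made up for illustration.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies(directory, writes=200, size=4096):
    """Return the latency in milliseconds of each write()+fsync() pair."""
    payload = b"\0" * size
    latencies = []
    # The scratch file is created inside the directory under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Reduce the raw samples to a few comparable numbers."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Assumption: replace this with the directory that backs the database,
    # e.g. the container's data volume on the host.
    target = tempfile.gettempdir()
    print(summarize(fsync_latencies(target)))
```

Running it once on the host, once inside the container, and once on the Windows side gives three sets of numbers for the same operation. If they are close to each other, synchronous write latency is probably not where the extra time goes.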
Does the database have WAL enabled persistently? Hopefully it already does, but that’s an important optimization for databases running on COW filesystems like Btrfs. It is possible to disable COW on Btrfs, and see if that improves performance, but this is not intended as a general recommendation. Rather, it’s intended to narrow down the problem, and see if the behavior is expected (there are btrfs kernel developers we can ask about it if it turns out that’s the main source of the slowness).
The bcc-tools also have database-related latency tools, but I’m not familiar with them, so I can’t give any advice there.
The performance hasn’t improved. When I switched to a newer MySQL 8 image, the performance got even worse. Now the test takes about one and a half minutes instead of 55 sec with MySQL 5.7.
What takes long is starting the container and executing the ddl statements at the beginning of the test. I see that because the statements are getting logged.
I had a similar problem using podman, in which a project with a large shared volume took many minutes to start. The problem was that the security context was being set for every file in the shared volume, which took almost three minutes in my case. I don’t know what the solution might be for you, but this is, at least, an avenue to investigate.
Just did a clean reinstall of Fedora Silverblue 35 with the XFS file system. The performance is nearly as bad as before (~ 50 sec instead of 35 sec with Ubuntu in Windows WSL2). So the Btrfs file system shouldn’t be the cause of this issue.
Windows: SkHynix (the original one of the lenovo p15)
Fedora: Samsung 980 PRO
Both are fast. So that shouldn’t be the problem.
It’s like searching for a needle in a haystack. I don’t know what (and how) to investigate further. Due to the performance issues I’ll have to stick with Windows a little bit longer, I think :-(.
Is it slow just with rootless docker and podman? Is it faster if it’s rootful? (I’m a filesystems guy, not a container guy.) I know that at one point, and it might still be the case, podman used fuse-overlayfs for rootless and kernel overlayfs for rootful. I don’t know about docker or moby rootless.
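One way to check which of the two overlay implementations a running container actually sits on is to look at the filesystem types in its mount table. The sketch below only parses text in the `/proc/<pid>/mounts` format. It assumes that kernel overlayfs mounts show the type `overlay` and that fuse-overlayfs mounts show `fuse.fuse-overlayfs`. The helper name `overlay_mount_types` is made up for illustration.

```python
from collections import Counter

# Assumption: these are the type strings used by kernel overlayfs and
# by fuse-overlayfs in the third column of a mounts file.
OVERLAY_TYPES = ("overlay", "fuse.fuse-overlayfs")


def overlay_mount_types(mounts_text):
    """Count overlay mounts per filesystem type in mounts-format text."""
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        fstype = fields[2]  # device, mount point, filesystem type, ...
        if fstype in OVERLAY_TYPES:
            counts[fstype] += 1
    return dict(counts)


if __name__ == "__main__":
    # Rootless mounts live in their own mount namespace, so for a real
    # check read /proc/<pid>/mounts of a process inside the container
    # rather than this process's own table.
    try:
        with open("/proc/self/mounts") as handle:
            print(overlay_mount_types(handle.read()))
    except OSError:
        print("no /proc mount table available on this system")
```

If the result shows `fuse.fuse-overlayfs` for the rootless container and `overlay` for the rootful one, that is a concrete difference worth mentioning when comparing timings.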
@beroset @mschwartau What kind of drive? Spinning HDD? SATA/SAS SSD? Or NVMe? Is dm-crypt being used?
@mschwartau thanks for doing the clean install on XFS to help determine this isn’t btrfs specific.
I am aware of a couple bugs that sound similar, not container related at all, but the SELinux label might be a commonality and it’s affecting at least ext4 and btrfs. When there’s a ton of files and they all need xattr set, this is sometimes an expensive metadata-centric workload.
So is this a one time hit the first time you run the container? Or does it get hit again each time you run it?
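To judge whether per-file labeling could plausibly account for the delay, it helps to know how many entries the shared volume actually contains. A minimal sketch, with the hypothetical helper `count_entries`, that walks a directory tree and counts files and directories, on the assumption that each entry is one candidate for an xattr update:

```python
import os
import tempfile


def count_entries(root):
    """Count files and directories below root without following symlinks."""
    files = 0
    dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


if __name__ == "__main__":
    # Assumption: replace this with the host path of the shared volume.
    volume = tempfile.gettempdir()
    n_files, n_dirs = count_entries(volume)
    print(f"{n_files} files and {n_dirs} directories under {volume}")
```

A count in the tens or hundreds makes a metadata-heavy relabel an unlikely explanation for a delay of many seconds; a count in the hundreds of thousands makes it worth measuring.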
I think it’s worth going further to get this the proper attention by actual developers of these projects:
reproduce the problem with a default clean install of Fedora (any variant any filesystem) with both moby-engine and podman;
determine whether rootful and rootless have the same problem;
Those two things can still be done here in this thread. Next step though is
bring this up on #podman:libera.chat for IRC or #podman:matrix.org and see if this sounds familiar to anyone; if not then
file respective upstream bugs against moby-engine and podman
Note that podman in Fedora is about to be rebased on podman 4. So you could consider jumping straight to bringing this up with the podman developers and asking whether to test further on Fedora 35 with 3.4.4, whether it’s better to test with Fedora 36 and podman 4, or whether it doesn’t matter.
But the execution times are still as bad as before.
Started the MySQL database via podman:
podman run --rm --network=host --privileged=true <Our mysql 5.7 image name>
But executing the DDL statements (create table, add column, …) still took 25 secs. If the image is started via docker, it takes the same time. On Windows it’s much faster (5 - 6 seconds for the same statements).
I already reported this and it was closed as NOTABUG. It was deemed to be working as expected because I had thousands of files in a shared volume that needed a new security label.
Just tried to analyze it instead of just disabling random options and used strace for that. It’s obvious that something is broken. Therefore I started the mysql 5.7 docker image with the parameters --cap-add=SYS_PTRACE --security-opt seccomp=unconfined and installed the packages procps and strace in it.
I started strace just before the execution of the DDL statements and stopped it after they had been executed. That’s the output of using strace with the -c parameter, which shows where the time is spent:
It looks like you tried accessing rootful docker from inside a toolbx, which is a rootless container, so this did not work, and then tried to run docker rootless inside a toolbx. This will most likely have performance issues, as it stacks overlaid filesystems on top of one another.
I would recommend that you try running docker/podman (service) rootless outside of the toolbx and accessing it from inside. That should avoid performance issues.
No, I just started rootless docker / podman outside the toolbox. It is then possible to start docker containers from inside the toolbox, because the docker socket is accessible from there.
Anyway, toolbox shouldn’t be the cause of my problem, because I formatted my SSD, did a clean install of PopOS, and had the same issues with rootless docker. So it seems to be a rootless docker / Linux / SSD (Samsung 980 PRO) problem.