Just tried to analyze it instead of just disabling random options and used strace for that. It’s obvious that something is broken. Therefore I started the mysql 5.7 docker image with the parameters
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined
and installed the packages procps
and strace
in it.
I started straced just before the execution of the ddl statements and stopped strace after they have been executed. That’s the output of using strace with the -c
parameter, which shows where the time is spent:
root@847ab2babba1:/# strace -c -p $(pgrep -x mysqld)
strace: Process 1 attached
^Cstrace: Process 1 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
56.70 0.003059 3059 1 restart_syscall
40.95 0.002209 736 3 poll
1.22 0.000066 16 4 clone
0.43 0.000023 5 4 accept
0.35 0.000019 4 4 mmap
0.35 0.000019 4 4 mprotect
------ ----------- ----------- --------- --------- ----------------
100.00 0.005395 20 total
So a lot of systemcalls are restarted due to timeouts.
Then I did a strace again and wrote the calls into a file:
root@17cd01bcf845:/# strace -s2048 -f -o /tmp/strace.out -p $(pgrep -x mysqld)
...
oot@17cd01bcf845:/# cat /tmp/strace.out | grep restart
157 restart_syscall(<... resuming interrupted read ...> <unfinished ...>
153 restart_syscall(<... resuming interrupted rt_sigtimedwait ...> <unfinished ...>
152 restart_syscall(<... resuming interrupted rt_sigtimedwait ...> <unfinished ...>
146 restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
145 restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
144 restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
143 restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
141 restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
1 restart_syscall(<... resuming interrupted rt_sigtimedwait ...> <unfinished ...>
141 <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
143 <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
144 <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
146 <... restart_syscall resumed> ) = 0
153 <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
152 <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
145 <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
1 <... restart_syscall resumed> ) = 1
157 <... restart_syscall resumed> ) = 1
So something is broken. The retries shouldn’t happen, no wonder it is slow.
Regards
Meinert