Slow boot on AWS

I’m evaluating different distributions for a specific workload that I need to run on AWS EC2, and I’m considering Fedora as well.

The workload requires many short-lived VMs that can run Podman 4.x and that must have relatively short boot times (e.g. < 10 seconds).

All the boot times in this thread have been retrieved using systemd-analyze.

The baseline would be Amazon Linux 2, which boots in 5 to 6 seconds on my instance type, but unfortunately doesn’t have an official repo for Podman 4.x.

I then deployed Fedora Cloud 37 to give it a spin and, to be honest, I was quite disappointed: without any kind of customization or user script, the typical boot time was around 50 seconds.

With Fedora Cloud 38, which is the version I’m trying right now, boot times are noticeably better, but they are still over 20 seconds, which is too much for my use case.

This is all the data I was able to gather:

$ systemd-analyze blame
 8.291s sys-devices-platform-serial8250-tty-ttyS18.device
 8.291s dev-ttyS18.device
 8.281s dev-ttyS17.device
 8.281s sys-devices-platform-serial8250-tty-ttyS17.device
 8.281s sys-devices-platform-serial8250-tty-ttyS14.device
 8.281s dev-ttyS14.device
 8.278s dev-ttyS1.device
 8.278s sys-devices-platform-serial8250-tty-ttyS1.device
 8.278s sys-devices-platform-serial8250-tty-ttyS15.device
 8.278s dev-ttyS15.device
 8.277s dev-ttyS11.device
 8.277s sys-devices-platform-serial8250-tty-ttyS11.device
 8.277s dev-ttyS16.device
 8.277s sys-devices-platform-serial8250-tty-ttyS16.device
 8.276s sys-devices-platform-serial8250-tty-ttyS10.device
 8.276s dev-ttyS10.device
 8.275s dev-ttyS12.device
 8.275s sys-devices-platform-serial8250-tty-ttyS12.device
 8.274s sys-devices-platform-serial8250-tty-ttyS19.device
 8.274s dev-ttyS19.device
 8.274s sys-devices-platform-serial8250-tty-ttyS20.device
 8.274s dev-ttyS20.device
 8.270s sys-devices-platform-serial8250-tty-ttyS21.device
 8.270s dev-ttyS21.device
 8.270s sys-devices-platform-serial8250-tty-ttyS2.device
 8.270s dev-ttyS2.device
 8.270s dev-ttyS13.device
 8.270s sys-devices-platform-serial8250-tty-ttyS13.device
 8.270s sys-devices-platform-serial8250-tty-ttyS23.device
 8.270s dev-ttyS23.device
 8.269s dev-ttyS22.device
 8.269s sys-devices-platform-serial8250-tty-ttyS22.device
 8.267s dev-ttyS24.device
 8.267s sys-devices-platform-serial8250-tty-ttyS24.device
 8.266s sys-devices-platform-serial8250-tty-ttyS25.device
 8.266s dev-ttyS25.device
 8.262s dev-ttyS26.device
 8.262s sys-devices-platform-serial8250-tty-ttyS26.device
 8.261s dev-ttyS28.device
 8.261s sys-devices-platform-serial8250-tty-ttyS28.device
 8.257s sys-devices-platform-serial8250-tty-ttyS31.device
 8.257s dev-ttyS31.device
 8.255s dev-ttyS4.device
 8.255s sys-devices-platform-serial8250-tty-ttyS4.device
 8.255s sys-devices-platform-serial8250-tty-ttyS3.device
 8.255s dev-ttyS3.device
 8.254s sys-devices-platform-serial8250-tty-ttyS7.device
 8.254s dev-ttyS7.device
 8.254s dev-ttyS6.device
 8.254s sys-devices-platform-serial8250-tty-ttyS6.device
 8.253s dev-ttyS27.device
 8.253s sys-devices-platform-serial8250-tty-ttyS27.device
 8.250s dev-ttyS5.device
 8.250s sys-devices-platform-serial8250-tty-ttyS5.device
 8.250s dev-ttyS29.device
 8.250s sys-devices-platform-serial8250-tty-ttyS29.device
 8.249s sys-devices-platform-serial8250-tty-ttyS9.device
 8.249s dev-ttyS9.device
 8.248s sys-devices-platform-serial8250-tty-ttyS8.device
 8.248s dev-ttyS8.device
 8.245s sys-devices-platform-serial8250-tty-ttyS30.device
 8.245s dev-ttyS30.device
 8.169s sys-module-configfs.device
 8.150s dev-ttyS0.device
 8.150s sys-devices-pnp0-00:04-tty-ttyS0.device
 8.046s dev-nvme0n1p5.device
 8.046s dev-disk-by\x2ddiskseq-1\x2dpart5.device
 8.046s dev-disk-by\x2did-nvme\x2dnvme.1d0f\x2d766f6c3039306630636566663839343262313161\x2d416d617a6f6e20456c617374696320426c6f636b2053746f7265\x2d00000001\x2dpart5.device
 8.046s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a\x2dpart5.device
 8.046s dev-disk-by\x2dpartuuid-5781c0f3\x2df283\x2d4126\x2db1ad\x2d90b381c30c22.device
 8.046s dev-disk-by\x2dlabel-fedora.device
 8.046s dev-disk-by\x2duuid-b564ae15\x2d05af\x2d4f5b\x2da6ec\x2d7ce960e5bca7.device
 8.046s dev-disk-by\x2dpath-pci\x2d0000:00:04.0\x2dnvme\x2d1\x2dpart5.device
 8.046s sys-devices-pci0000:00-0000:00:04.0-nvme-nvme0-nvme0n1-nvme0n1p5.device
 8.030s dev-disk-by\x2dpartuuid-406f5a26\x2dbb9c\x2d4675\x2d943a\x2d8b3699d11f1f.device
 8.030s dev-disk-by\x2dpath-pci\x2d0000:00:04.0\x2dnvme\x2d1\x2dpart4.device
 8.030s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a_1\x2dpart4.device
 8.030s dev-nvme0n1p4.device
 8.030s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a\x2dpart4.device
 8.030s dev-disk-by\x2did-nvme\x2dnvme.1d0f\x2d766f6c3039306630636566663839343262313161\x2d416d617a6f6e20456c617374696320426c6f636b2053746f7265\x2d00000001\x2dpart4.device
 8.030s dev-disk-by\x2ddiskseq-1\x2dpart4.device
 8.030s sys-devices-pci0000:00-0000:00:04.0-nvme-nvme0-nvme0n1-nvme0n1p4.device
 8.026s dev-nvme0n1.device
 8.026s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a.device
 8.026s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a_1.device
 8.026s dev-disk-by\x2did-nvme\x2dnvme.1d0f\x2d766f6c3039306630636566663839343262313161\x2d416d617a6f6e20456c617374696320426c6f636b2053746f7265\x2d00000001.device
 8.026s dev-disk-by\x2ddiskseq-1.device
 8.026s sys-devices-pci0000:00-0000:00:04.0-nvme-nvme0-nvme0n1.device
 8.026s dev-disk-by\x2dpath-pci\x2d0000:00:04.0\x2dnvme\x2d1.device
 7.984s dev-disk-by\x2ddiskseq-1\x2dpart2.device
 7.984s dev-disk-by\x2did-nvme\x2dnvme.1d0f\x2d766f6c3039306630636566663839343262313161\x2d416d617a6f6e20456c617374696320426c6f636b2053746f7265\x2d00000001\x2dpart2.device
 7.984s dev-disk-by\x2dlabel-boot.device
 7.984s dev-nvme0n1p2.device
 7.984s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a\x2dpart2.device
 7.984s dev-disk-by\x2duuid-969d3b3f\x2d7060\x2d4ab6\x2d8cb6\x2d95aea16204b2.device
 7.984s sys-devices-pci0000:00-0000:00:04.0-nvme-nvme0-nvme0n1-nvme0n1p2.device
 7.984s dev-disk-by\x2dpartuuid-c90078e3\x2d21e9\x2d465b\x2d9050\x2de14314e3ebd1.device
 7.984s dev-disk-by\x2dpath-pci\x2d0000:00:04.0\x2dnvme\x2d1\x2dpart2.device
 7.984s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a_1\x2dpart2.device
 7.973s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a\x2dpart1.device
 7.973s dev-disk-by\x2did-nvme\x2dnvme.1d0f\x2d766f6c3039306630636566663839343262313161\x2d416d617a6f6e20456c617374696320426c6f636b2053746f7265\x2d00000001\x2dpart1.device
 7.973s dev-disk-by\x2dpartuuid-e482f721\x2d2613\x2d45e6\x2d92e1\x2d4e004cbbe349.device
 7.973s dev-disk-by\x2ddiskseq-1\x2dpart1.device
 7.973s sys-devices-pci0000:00-0000:00:04.0-nvme-nvme0-nvme0n1-nvme0n1p1.device
 7.973s dev-disk-by\x2dpath-pci\x2d0000:00:04.0\x2dnvme\x2d1\x2dpart1.device
 7.973s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a_1\x2dpart1.device
 7.973s dev-nvme0n1p1.device
 7.826s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a_1\x2dpart5.device
 7.730s dev-disk-by\x2dpartuuid-d07a08e4\x2dcca5\x2d49ac\x2d81d7\x2d391f938f6df6.device
 7.730s dev-disk-by\x2dpath-pci\x2d0000:00:04.0\x2dnvme\x2d1\x2dpart3.device
 7.730s dev-disk-by\x2did-nvme\x2dnvme.1d0f\x2d766f6c3039306630636566663839343262313161\x2d416d617a6f6e20456c617374696320426c6f636b2053746f7265\x2d00000001\x2dpart3.device
 7.730s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a\x2dpart3.device
 7.730s sys-devices-pci0000:00-0000:00:04.0-nvme-nvme0-nvme0n1-nvme0n1p3.device
 7.730s dev-nvme0n1p3.device
 7.730s dev-disk-by\x2duuid-83E4\x2d36CF.device
 7.730s dev-disk-by\x2ddiskseq-1\x2dpart3.device
 7.730s dev-disk-by\x2did-nvme\x2dAmazon_Elastic_Block_Store_vol090f0ceff8942b11a_1\x2dpart3.device
 7.010s cloud-init-local.service
 4.389s initrd-switch-root.service
 1.402s cloud-init.service
  907ms dracut-initqueue.service
  714ms systemd-vconsole-setup.service
  642ms cloud-config.service
  619ms cloud-final.service
  527ms systemd-tmpfiles-setup.service
  437ms auditd.service
  418ms systemd-resolved.service
  392ms dracut-mount.service
  374ms systemd-journal-flush.service
  345ms systemd-modules-load.service
  312ms modprobe@fuse.service
  311ms user@1001.service
  285ms systemd-udevd.service
  282ms systemd-binfmt.service
  267ms modprobe@loop.service
  264ms dev-hugepages.mount
  257ms systemd-network-generator.service
  248ms kmod-static-nodes.service
  247ms dev-mqueue.mount
  240ms sys-kernel-debug.mount
  235ms modprobe@dm_mod.service
  235ms modprobe@drm.service
  233ms sys-kernel-tracing.mount
  228ms systemd-remount-fs.service
  221ms systemd-udev-trigger.service
  208ms user@1000.service
  207ms systemd-oomd.service
  168ms initrd-cleanup.service
  164ms chronyd.service
  148ms systemd-logind.service
  118ms systemd-fsck@dev-disk-by\x2duuid-969d3b3f\x2d7060\x2d4ab6\x2d8cb6\x2d95aea16204b2.service
  107ms systemd-fsck@dev-disk-by\x2duuid-83E4\x2d36CF.service
  104ms dbus-broker.service
   90ms systemd-journald.service
   85ms proc-sys-fs-binfmt_misc.mount
   82ms NetworkManager-wait-online.service
   76ms systemd-tmpfiles-setup-dev.service
   72ms NetworkManager.service
   64ms boot-efi.mount
   62ms dracut-cmdline.service
   54ms sshd.service
   49ms dracut-pre-pivot.service
   47ms systemd-update-utmp-runlevel.service
   46ms systemd-tmpfiles-clean.service
   40ms systemd-sysctl.service
   39ms dracut-shutdown.service
   38ms systemd-zram-setup@zram0.service
   37ms dracut-pre-trigger.service
   37ms systemd-random-seed.service
   36ms systemd-userdbd.service
   32ms dev-zram0.swap
   29ms initrd-udevadm-cleanup-db.service
   29ms boot.mount
   27ms user-runtime-dir@1001.service
   27ms dracut-pre-udev.service
   25ms systemd-fsck-root.service
   24ms sys-fs-fuse-connections.mount
   24ms user-runtime-dir@1000.service
   19ms systemd-user-sessions.service
   14ms initrd-parse-etc.service
   12ms systemd-update-utmp.service
   10ms dracut-pre-mount.service
   10ms tmp.mount
    7ms home.mount
    4ms modprobe@configfs.service
$ systemd-analyze plot

(hoping the image shows correctly as it’s huge)

$ cloud-init analyze blame
 -- Boot Record 01 --
      05.58900s (init-local/search-Ec2Local)
      01.77900s (init-network/config-ssh)
      00.47500s (init-network/config-growpart)
      00.19300s (init-network/config-users-groups)
      00.18200s (modules-final/config-keys-to-console)
      00.09800s (modules-final/config-ssh-authkey-fingerprints)
      00.09400s (modules-config/config-set-passwords)
      00.03200s (modules-config/config-locale)
      00.03000s (init-network/config-resizefs)
      00.02000s (modules-final/config-final-message)
      00.01300s (init-network/activate-datasource)
      00.01100s (modules-final/config-reset_rmc)
      00.01100s (init-network/check-cache)
      00.00900s (init-network/config-update_hostname)
      00.00800s (init-network/config-mounts)
      00.00600s (init-network/config-set_hostname)
      00.00400s (modules-final/config-install-hotplug)
      00.00400s (init-network/consume-user-data)
      00.00400s (init-network/config-seed_random)
      00.00200s (modules-final/config-scripts-vendor)
      00.00200s (modules-final/config-scripts-user)
      00.00200s (modules-final/config-scripts-per-once)
      00.00200s (modules-final/config-scripts-per-instance)
      00.00200s (modules-final/config-rightscale_userdata)
      00.00100s (modules-final/config-refresh_rmc_and_interface)
      00.00100s (init-network/setup-datasource)
      00.00100s (init-local/check-cache)
      00.00000s (modules-final/config-scripts-per-boot)
      00.00000s (init-network/consume-vendor-data2)
      00.00000s (init-network/consume-vendor-data)
      00.00000s (init-network/config-migrator)

I understand that cutting the cloud-init times is not feasible, as that would prevent some basic setup (I wish I could do it directly in Packer!), but what about all of those devices taking up to 8 seconds each?

I know they run in parallel, but are they all really needed?
Can somebody help me understand what’s going on?

Configuring that many tty devices seems unnecessary, but note that the same group also includes the disk drives, etc. (almost all the devices that end up in /dev). The whole group takes about 8 seconds.
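One way to look past the device entries is to filter them out of the blame listing so the remaining services are easier to read. This is only a sketch: it assumes blame output has one unit per line with the unit name last, as in the listing above, and the function name blame_without_devices is made up for illustration.

```shell
# Hypothetical helper: drop .device units from `systemd-analyze blame`
# output read on stdin, leaving services, mounts, swaps and so on.
blame_without_devices() {
  grep -v '\.device$'
}

# On the instance itself (not executed here):
#   systemd-analyze blame | blame_without_devices | head -n 10
#   systemd-analyze critical-chain
```

The critical-chain subcommand shown in the comment is a standard part of systemd-analyze and prints the chain of units the boot actually waited on, which is usually more telling than blame for units that start in parallel.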

What I see as time consuming is cloud-init-local (7.010s), which has to complete before the network config, and initrd-switch-root (4.389s), which delays the pending systemd processes.

The total of the various cloud-init processes is 7.010s + 1.402s + 0.642s + 0.619s = 9.673s, or almost 10 seconds combined, and each has to wait for the preceding one to complete before starting. That 10s is almost half the total boot time, and together with the kernel load time of ~2s it accounts for more than half. The 8+ seconds involved in configuring devices is also notable.
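The cloud-init total above can be recomputed from the blame listing rather than by hand. A minimal sketch, assuming every cloud-init unit name starts with "cloud-" and every time is printed in seconds or milliseconds (entries with a minutes part are not handled); the function name sum_cloud_init is hypothetical.

```shell
# Hypothetical helper: sum the times of all cloud-* units found in
# `systemd-analyze blame` output read on stdin, printed in seconds.
sum_cloud_init() {
  awk '
    $NF ~ /^cloud-/ {
      t = $(NF - 1)                       # time token right before the unit name
      if (t ~ /ms$/)     { sub(/ms$/, "", t); total += t / 1000 }
      else if (t ~ /s$/) { sub(/s$/, "", t);  total += t }
    }
    END { printf "%.3f\n", total }
  '
}

# On the instance itself (not executed here):
#   systemd-analyze blame | sum_cloud_init
```

Fed the four cloud-init lines from the Fedora listing, this prints 9.673.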

I guess that if there were some way to reduce the number of devices to be configured, the time involved could shrink and shorten the total boot time. I don’t know how to do that.

Thus, since you said cloud-init is required, I am not sure how you could expect to reduce the boot time to much less than the ~23s shown in that plot.

I understand what you say, but allow me to post the same stats (minus the plot) for a Debian 11 AMI to better show you the differences I’ve been talking about.

$ systemd-analyze blame
 2.524s networking.service
 2.195s cloud-init-local.service
 1.051s cloud-init.service
  859ms cloud-config.service
  821ms dev-nvme0n1p1.device
  517ms apparmor.service
  396ms cloud-final.service
  203ms systemd-udev-trigger.service
  180ms systemd-sysusers.service
  176ms modprobe@fuse.service
  147ms systemd-modules-load.service
  143ms systemd-remount-fs.service
  141ms modprobe@drm.service
  138ms systemd-logind.service
  137ms systemd-journal-flush.service
  133ms user@1000.service
  130ms e2scrub_reap.service
  128ms systemd-journald.service
  120ms ssh.service
  117ms modprobe@configfs.service
  116ms chrony.service
  111ms sys-kernel-debug.mount
  111ms systemd-tmpfiles-clean.service
  109ms sys-kernel-tracing.mount
  108ms kmod-static-nodes.service
  106ms dev-mqueue.mount
  105ms dev-hugepages.mount
  100ms systemd-machine-id-commit.service
   92ms boot-efi.mount
   87ms systemd-sysctl.service
   86ms systemd-growfs@-.service
   80ms systemd-udevd.service
   66ms systemd-random-seed.service
   65ms systemd-tmpfiles-setup-dev.service
   57ms ifupdown-pre.service
   51ms systemd-update-utmp.service
   50ms systemd-tmpfiles-setup.service
   41ms rsyslog.service
   27ms sys-kernel-config.mount
   25ms sys-fs-fuse-connections.mount
   25ms user-runtime-dir@1000.service
   24ms systemd-update-utmp-runlevel.service
   20ms systemd-user-sessions.service

$ cloud-init analyze blame
 -- Boot Record 01 --
      00.33200s (init-local/search-Ec2Local)
      00.30700s (modules-config/config-grub-dpkg)
      00.22000s (init-network/config-ssh)
      00.11500s (init-network/config-users-groups)
      00.10000s (init-network/config-resizefs)
      00.06600s (init-network/config-growpart)
      00.06000s (modules-config/config-apt-configure)
      00.04100s (modules-final/config-keys-to-console)
      00.03400s (init-network/check-cache)
      00.02000s (modules-config/config-locale)
      00.01400s (modules-final/config-ssh-authkey-fingerprints)
      00.00800s (init-network/config-update_etc_hosts)
      00.00300s (modules-final/config-final-message)
      00.00100s (modules-final/config-scripts-per-once)
      00.00100s (modules-final/config-rightscale_userdata)
      00.00100s (modules-final/config-power-state-change)
      00.00100s (modules-final/config-mcollective)
      00.00100s (modules-final/config-fan)
      00.00100s (modules-config/config-ssh-import-id)
      00.00100s (modules-config/config-set-passwords)
      00.00100s (modules-config/config-runcmd)
      00.00100s (modules-config/config-apt-pipelining)
      00.00100s (init-network/consume-user-data)
      00.00100s (init-network/config-update_hostname)
      00.00100s (init-network/config-set_hostname)
      00.00100s (init-network/config-rsyslog)
      00.00100s (init-network/config-mounts)
      00.00100s (init-network/config-migrator)
      00.00100s (init-network/config-disk_setup)
      00.00100s (init-network/config-ca-certs)
      00.00100s (init-network/config-bootcmd)
      00.00100s (init-network/activate-datasource)
      00.00100s (init-local/check-cache)
      00.00000s (modules-final/config-scripts-vendor)
      00.00000s (modules-final/config-scripts-user)
      00.00000s (modules-final/config-scripts-per-instance)
      00.00000s (modules-final/config-scripts-per-boot)
      00.00000s (modules-final/config-salt-minion)
      00.00000s (modules-final/config-puppet)
      00.00000s (modules-final/config-phone-home)
      00.00000s (modules-final/config-package-update-upgrade-install)
      00.00000s (modules-final/config-chef)
      00.00000s (modules-config/config-timezone)
      00.00000s (modules-config/config-ntp)
      00.00000s (modules-config/config-emit_upstart)
      00.00000s (modules-config/config-disable-ec2-metadata)
      00.00000s (modules-config/config-byobu)
      00.00000s (init-network/setup-datasource)
      00.00000s (init-network/consume-vendor-data)
      00.00000s (init-network/config-write-files)
      00.00000s (init-network/config-seed_random)

Besides the first listing being far shorter than on Fedora, here you don’t see that huge list of devices.

The cloud-init log is even weirder: the first entry, init-local/search-Ec2Local, takes about 5.6 seconds on Fedora, while on Debian it’s just 0.33 seconds.

How would you explain this? I really can’t wrap my head around it.
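To put a number on the search-Ec2Local gap, the stage time can be pulled out of each cloud-init analyze blame listing and compared. This is a sketch that assumes the output format shown in this thread (time first, then the stage in parentheses); the function names stage_seconds and ratio are hypothetical.

```shell
# Hypothetical helper: print the seconds reported for one stage in
# `cloud-init analyze blame` output read on stdin.
stage_seconds() {
  awk -v stage="$1" '
    index($2, "/" stage ")") {
      t = $1
      sub(/s$/, "", t)   # strip the trailing unit
      print t + 0        # force numeric, dropping leading zeros
    }
  '
}

# Hypothetical helper: how many times larger the first value is.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# On each instance (not executed here):
#   cloud-init analyze blame | stage_seconds search-Ec2Local
```

With the two values from the listings above, ratio 5.589 0.332 prints 16.8, so the same stage is roughly 17 times slower on the Fedora image.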