We have a problem with Datanommer’s performance at the moment, and I’m not sure what we can do.
Symptoms:
Some queries over the entire database take hours to complete. Badges makes many such queries per minute, so I added caching, but it's far from enough.
The backup dump takes about 20 hours to complete and runs every day (69 GB compressed for a 1.2 TB database).
The load is sometimes pretty high.
What I think we could do:
Run the dump on another host; it's the compression step that causes most of the CPU usage. Of course this means transferring the whole database over the network, unless it's on the same VM server, maybe? Or give it more processors?
Remove some data that we don't need to store, and avoid storing it in the future. The most frequent topic in the DB since 2024 is org.fedoraproject.prod.github.check_run; do we really want that? The next ones are org.fedoraproject.prod.copr.{build.start,build.stop,chroot.start}. I understand that we'd want to keep build.start and build.stop, but maybe we can drop chroot.start. We may want to sit down and choose what we want to keep and what we don't.
Use materialized views and/or tweak PostgreSQL/TimescaleDB, and adjust the Datanommer code (see the sketch just below). I'm not sure this will be enough to solve the issue.
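To make the materialized-view idea concrete, here is a minimal sketch of a TimescaleDB continuous aggregate, assuming the messages live in a hypertable called messages with timestamp and topic columns (table, column, and view names and the connection string are placeholders, not necessarily the real schema):

```python
# Sketch: roll up daily message counts per topic into a continuous aggregate,
# so "count since the beginning" queries scan a small rollup instead of the
# whole hypertable. Table/column names are assumptions.
import psycopg2

DDL = """
CREATE MATERIALIZED VIEW messages_daily_by_topic
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', "timestamp") AS day,
       topic,
       count(*) AS n_messages
FROM messages
GROUP BY day, topic
WITH NO DATA;
"""

POLICY = """
SELECT add_continuous_aggregate_policy(
    'messages_daily_by_topic',
    start_offset      => INTERVAL '3 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);
"""

conn = psycopg2.connect("dbname=datanommer2")  # placeholder DSN
conn.autocommit = True  # continuous aggregates can't be created in a transaction
with conn.cursor() as cur:
    cur.execute(DDL)
    cur.execute(POLICY)
    # One-off backfill of the history (also not allowed in a transaction):
    #   CALL refresh_continuous_aggregate('messages_daily_by_topic', NULL, NULL);
conn.close()
```

Badges-style counts could then sum over messages_daily_by_topic and only touch the raw table for the most recent, not-yet-materialized bucket.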
Do we have a host to let the dump run on? I would prefer just moving the load to another machine before removing any data from the database. But if we have topics that don't make sense to archive, I'm not against it.
At the moment the compressor (pxz) is running with -T4, which makes it use all 4 CPUs. Maybe we could run it with -T2 instead, to leave some CPU power for the actual database? Or run the whole backup script with a high nice and ionice? The risk, of course, is that the backup starts taking more than 24h to complete. But that's an issue we'll have to face soonish anyway, I'd say.
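For illustration, a rough sketch of what the throttled variant could look like: dump to a file, then compress it with pxz held to two threads, with both steps wrapped in nice/ionice (database name and paths are placeholders, not the actual backup script):

```python
# Sketch: dump to a file, then compress it with pxz limited to two threads,
# with both steps deprioritized via nice (CPU) and ionice (idle I/O class).
# Database name and paths are placeholders, not the real backup script.
import subprocess

DB = "datanommer2"                  # placeholder database name
DUMP = "/backups/datanommer2.sql"   # placeholder path; pxz replaces it with .xz

LOW_PRIO = ["nice", "-n", "19", "ionice", "-c", "3"]

with open(DUMP, "wb") as out:
    subprocess.run(LOW_PRIO + ["pg_dump", DB], stdout=out, check=True)

subprocess.run(LOW_PRIO + ["pxz", "-T2", DUMP], check=True)
```

Whether -T2 plus the lower priorities pushes the total past 24h is exactly the trade-off mentioned above.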
If we had unlimited disk space and unlimited CPU power, I would recommend running a read-only mirror of the datanommer DB for Badges to hammer.
So, I am not a fan of adding shards/warm spares unless we can engineer it much better than the last time I tried it (which is possible, but I don't think it's a quick/easy thing).
The biggest problem was that you had to do manual things… you couldn't just reboot one host, have it rejoin, and have everything be fine. You had to fail over, and re-set up replication when re-adding, etc. Anyhow, I am sure it's better now, but it will need careful planning.
In the meantime:
Wow. That VM only has 8 CPUs? I'm fine with bumping it right now to, say… 32?
We could double the memory too, but I guess that's not a bottleneck.
I could live-migrate this VM to a new virthost. That might give it better I/O and a faster CPU. It should in theory be completely transparent… it would just take a while to copy all that storage.
We do need to move this DB to RHEL 9 anyhow. We could do 0, 1, or 2 of the above in the meantime, then get it migrated to RHEL 9? I know RHEL 9 has a newer PostgreSQL that's much nicer, but I don't know if there are newer TimescaleDB versions that would help us any there.
Finally, longer term, I am not sure what the answer is… things are just going to keep growing, so we need to figure out some way to scale it out…
If the compression is most of the issue, then maybe use a different compression algorithm? XZ is one of the most computationally expensive compression algorithms, much more so than alternatives like zstd.
Or, if it's using level 9 or something, drop it down to a lower compression level. For highly structured data it might not make much difference to the file size, but it could save a lot of compression time.
I’m pretty sure that would help, yeah. Memory does not seem to be the bottleneck though.
Better I/O sounds nice indeed.
Maybe add the CPUs, since that's the most transparent change, and then migrate to RHEL 9 on a new virthost?
I have some software ideas to avoid those constant since-the-beginning queries, but I'll need some time to implement them. By definition, historical data does not change; we shouldn't have to query it over and over again just to count the number of messages with topic X from user Y. We could have periodic aggregates (see the sketch below). There are many things we can do software-side to make it scale out.
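A minimal sketch of the periodic-aggregates idea, assuming a messages table with timestamp, topic, and username columns (all names and the DSN are placeholders, not the actual datanommer schema):

```python
# Sketch: keep a per-day rollup of message counts by topic and user, refresh
# it periodically, and answer "since the beginning" counts from the rollup
# plus only today's raw rows. Table/column names are assumptions.
import psycopg2

ROLLUP_DDL = """
CREATE TABLE IF NOT EXISTS msg_counts_daily (
    day      date   NOT NULL,
    topic    text   NOT NULL,
    username text   NOT NULL,
    n        bigint NOT NULL,
    PRIMARY KEY (day, topic, username)
);
"""

# Re-aggregate only recent days; a one-off run without the WHERE clause
# would backfill the history.
REFRESH = """
INSERT INTO msg_counts_daily (day, topic, username, n)
SELECT date_trunc('day', "timestamp")::date, topic, username, count(*)
FROM messages
WHERE "timestamp" >= current_date - INTERVAL '1 day'
GROUP BY 1, 2, 3
ON CONFLICT (day, topic, username) DO UPDATE SET n = EXCLUDED.n;
"""

def refresh_rollup(dsn="dbname=datanommer2"):  # placeholder DSN
    """Run from cron (or a message consumer) every hour or so."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(ROLLUP_DDL)
        cur.execute(REFRESH)

def count_messages(cur, topic, username):
    """Historic count from the rollup; only today's raw rows are scanned."""
    cur.execute(
        "SELECT coalesce(sum(n), 0) FROM msg_counts_daily "
        "WHERE topic = %s AND username = %s AND day < current_date",
        (topic, username),
    )
    historic = cur.fetchone()[0]
    cur.execute(
        'SELECT count(*) FROM messages '
        'WHERE topic = %s AND username = %s AND "timestamp" >= current_date',
        (topic, username),
    )
    return historic + cur.fetchone()[0]
```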
It doesn't seem CPU-bound to me. There's one connection from Badges that's pegging 1 CPU, but it seems very single-threaded, so the other CPUs are just idle most of the time. ;(
How about enabling max_parallel_workers and max_parallel_workers_per_gather?
That should get it to use more worker processes and split big index scans across them.
That will use more CPUs as well… I would think that if these are just long/big queries, it could split them reasonably well.
How about we try setting those to, say, 4 (or more if we do add CPUs)? That should tell us whether this is a good way forward.
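A possible way to try it, sketched below; the values are just the ones discussed here and the EXPLAIN query is a placeholder. Both settings only need a reload, while the hard cap, max_worker_processes, requires a restart to raise:

```python
# Sketch: bump the parallel-query settings and reload, then check with
# EXPLAIN that a big query actually gets a parallel plan (look for
# "Gather" / "Parallel Seq Scan" nodes). Values and query are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=datanommer2")  # placeholder DSN, needs superuser
conn.autocommit = True  # ALTER SYSTEM can't run inside a transaction block
with conn.cursor() as cur:
    cur.execute("ALTER SYSTEM SET max_parallel_workers_per_gather = 4")
    cur.execute("ALTER SYSTEM SET max_parallel_workers = 4")
    cur.execute("SELECT pg_reload_conf()")

    cur.execute("SHOW max_worker_processes")  # hard cap; restart needed to change
    print("max_worker_processes =", cur.fetchone()[0])

    cur.execute("EXPLAIN SELECT count(*) FROM messages WHERE topic = %s",
                ("org.fedoraproject.prod.copr.build.start",))
    for (line,) in cur.fetchall():
        print(line)
conn.close()
```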
On compression, yeah, we could try switching that, but I suspect it will end up using a good deal more disk space, which that server is currently pretty low on. If we migrate it to a new host I could grow the disk, or if we migrate to a RHEL 9 instance we could just have a bigger disk there. Then it might be OK to switch compression. Note also that that script is currently used for all our DB servers, so we would need to either switch them all or separate this one out.
Sounds good! Should I do it? Or do you prefer to do it?
About the backup script, I've tested running VACUUM ANALYZE at the end of the backup, and I think it may be a good idea if we don't already run it periodically.
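Something along these lines at the tail of the backup run, as a sketch (the database name is a placeholder; a vacuumdb --analyze call from the shell would do the same job):

```python
# Sketch: run VACUUM ANALYZE once the dump is done. VACUUM cannot run inside
# a transaction block, so autocommit is required. DB name is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=datanommer2")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("VACUUM ANALYZE")
conn.close()
```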
I can try and do it today.
Yeah, it probably doesn't hurt, but it shouldn't really change much unless the memory/CPUs/config change…