Hey folks!
I’ve noticed that quite a few issues that are raised in the ticket tracker could have been caught earlier by noticing a traceback in an application log. However, with the number of apps we have, we can’t expect someone to look at the logs manually.
So I was looking at a way to gather the logs to a place where they would be searchable. I know I can grep for stuff in log01 but:
it’s not practical to grep all the worker logfiles and then go to the file and then find the line
I’d love to give app owners some sort of access to that as well, to distribute the maintenance workload.
And for apps that don’t traceback wildly for no good reason, we could also have alerting.
Also, it’d be great if we could have that for apps outside of OpenShift too (like Ipsilon & IPA).
I remember that we have Prometheus in OpenShift, but I’m not sure how to access it, or how to integrate with it to send it the logs, or possibly just the tracebacks. I didn’t find any documentation for it in our infra docs. Do we have that somewhere else? Could someone point me in the right direction please?
And if Prometheus isn’t the best solution for application log/error monitoring, I’m interested in your thoughts.
Quoting from the Prometheus Overview page: “Prometheus collects and stores its metrics as time series data”. And it’s great at doing metrics, I use it all the time.
Yeah, I don’t think Prometheus is very well suited to this.
I would think someone would have made some kind of ‘traceback gather
from logs’ app, but when I made a quick search just now… this very
thread came up.
I’m not sure what the best answer is here, but I agree it would be nice
to have something.
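For what it’s worth, the core of a ‘traceback gather from logs’ tool can be sketched in a few lines of Python. This is only a rough sketch, not an existing tool: the function name extract_tracebacks is made up for illustration, and it assumes the tracebacks land in the log unprefixed, exactly as Python prints them (a header line, indented frame lines, then an unindented exception line). Logs that prefix every line with a timestamp or worker name would need that stripped first.

```python
from typing import Iterable, Iterator, List

# Header line that Python prints at the start of every traceback.
TRACEBACK_HEADER = "Traceback (most recent call last):"


def extract_tracebacks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each traceback found in `lines` as a list of lines.

    Hypothetical helper: assumes unprefixed tracebacks, where frame
    lines are indented and the final exception line is not.
    """
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not block:
            # Not inside a traceback yet: wait for a header line.
            if line.startswith(TRACEBACK_HEADER):
                block = [line]
            continue
        block.append(line)
        # The first unindented line after the header is taken to be
        # the exception summary, which closes the traceback.
        if not line.startswith((" ", "\t")):
            yield block
            block = []
    if block:
        # The log ended in the middle of a traceback.
        yield block
```

Something like this could be run over each worker logfile (a file object opened with open() can be passed directly as lines), with the resulting blocks forwarded to whatever ends up doing the searching or alerting.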
Alright, thanks for your answers, I guess I jumped to Prometheus but that may not be the best tool for the job. I suppose some form of lightweight ELK stack would be nice.
Wait, if we didn’t find one after a quick 10-minute search… let’s write one with the language/framework in fashion these days! That always worked for us in the past! I mean, how hard could it be? I’m sure I can have a PoC working in a weekend that we can just deploy to prod on Monday.