Instrumented applications bring in a wealth of information on how they behave. In the previous parts of this blog series, the focus has been mostly on getting applications to expose their metrics and on how to query Prometheus to make sense of these metrics. This exploratory approach is extremely valuable to uncover unknown unknowns, either pro-actively (testing) or reactively (debugging).
Metrics can also be used to help with things we already know and care about: instrumenting those things and knowing what is their normal state, then it’s possible to alert on situations that are judged problematic. Prometheus makes this possible through the definition of alerting rules.
Continue reading