When averages look fine, how can we discover that one customer who’s having a bad experience?

Since completing OSCP in November 2019, I have been refining my penetration testing skills on Hack The Box, a penetration testing lab. Each target tends to be a rollercoaster of frustration and excitement, and a real test of the "Try Harder" philosophy. Here is a step-by-step guide to rooting one of the recently retired machines: Cache.
When I run workshops on practical monitoring with Prometheus, the same kinds of questions tend to come up. Here are a few of them, with my answers and with pointers to other resources (articles, talks) to learn more.
For anybody starting a product today, the idea of deployments requiring downtime is ludicrous. The truth is, a lot of today’s products started their development more than a decade ago, with architectures reflecting a different set of practices and assumptions.
Instrumented applications provide a wealth of information on how they behave. In the previous parts of this blog series, the focus has been mostly on getting applications to expose their metrics and on how to query Prometheus to make sense of these metrics. This exploratory approach is extremely valuable for uncovering unknown unknowns, either proactively (testing) or reactively (debugging).
Metrics can also help with things we already know and care about: once those things are instrumented and their normal state is known, it becomes possible to alert on situations that are judged problematic. Prometheus supports this through the definition of alerting rules.
So far in this Prometheus blog series, we have looked into Prometheus metrics and labels (see Parts 1 and 2), as well as how Prometheus integrates into a distributed architecture (see Part 3). In this fourth part, it is time to look at code to create custom instrumentation. Luckily, client libraries make this pretty easy, which is one of the reasons behind Prometheus' wide adoption.
This post will go through examples in Go and Java. Prometheus supports a number of other languages, through both official and community-maintained libraries.
In Part 1 and Part 2 of this series, we covered the basics of Prometheus metrics and labels. This third part will concentrate on the way Prometheus collects metrics and how clients expose them.