Why You Should Not Rely on StatsD for Monitoring or Optimizing Response Time
In my blog post last week, I outlined the difficulties and importance of accurately measuring and reporting response times, as they are critical for ensuring customer happiness. This week, I will be taking a look (and a rant) at StatsD, a commonly used tool for systems monitoring.
As a brief refresher, measuring and collecting response times can result in a huge number of metrics. A system that sees 70k RPS (requests per second) generates over 4 million response time measurements every minute (70,000 × 60 = 4.2 million). Statistical analysis is required to visualize or summarize these measurements, which inherently introduces a measure of subjectivity. Depending on how the measurements are aggregated, they will give varying impressions of how well clients are being served.
Subjective statistical analysis is actually the best case scenario. Many performance reporting systems resort to simply throwing data away.
Take StatsD, for example. An application with a proper StatsD integration reports every user-defined response time metric to the StatsD server. The server then aggregates them over “flush intervals” and stores only the aggregate values — the raw measurements are discarded. This is highly problematic.
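As a rough sketch of what survives a flush (field names modeled on statsd's timer output, but simplified — this is not the actual statsd code):

```python
import statistics

def flush_timer(samples, pct=50):
    """Simplified sketch of a StatsD timer flush: emit a handful of
    aggregates for the interval, after which the raw samples are gone."""
    s = sorted(samples)
    n = round(len(s) * pct / 100)  # samples at or below the pct threshold
    return {
        "count": len(s),
        "lower": s[0],
        "upper": s[-1],
        "mean": statistics.mean(s),
        f"upper_{pct}": s[n - 1] if n else s[-1],
    }

# Only this dict survives the flush; the four raw samples do not.
print(flush_timer([5, 5, 50, 50]))
# → {'count': 4, 'lower': 5, 'upper': 50, 'mean': 27.5, 'upper_50': 5}
```

Note that the reported `upper_50` is 5 even though half the samples were 50 — a first hint of the problem discussed below.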
When the aggregates are visualized, the aggregates themselves are aggregated again, which usually destroys any meaning. A common usage is to plot the 50th (or 95th) percentile over time. While a plot like this seems to show a trend in the 50th percentile, it doesn’t. It is just a plot of the 50th percentile within each flush interval. For example, consider three consecutive flush intervals containing:
[5, 5, 50, 50]
[100, 100, 5, 5]
[200, 200, 300, 500, 800, 5, 5, 5, 5, 5]
The plotted 50th percentile will be a straight line at 5 even though system performance is clearly degrading. The only StatsD aggregate values that retain any real meaning over time are max and min: they can be compared from one flush interval to the next and rolled up correctly.
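The collapse is easy to reproduce with a simplified version of StatsD's percentile calculation, which keeps only the samples at or below the percentile threshold (a sketch of the approach, not the exact statsd implementation):

```python
def statsd_upper_pct(samples, pct):
    """Approximate StatsD's upper_<pct>: sort the samples, keep the
    lowest round(count * pct / 100) of them, and report the largest
    value in that subset."""
    s = sorted(samples)
    n = round(len(s) * pct / 100)
    return s[n - 1] if n else s[-1]

# The three flush intervals from the example above.
intervals = [
    [5, 5, 50, 50],
    [100, 100, 5, 5],
    [200, 200, 300, 500, 800, 5, 5, 5, 5, 5],
]
print([statsd_upper_pct(i, 50) for i in intervals])  # → [5, 5, 5]
```

Every interval reports a 50th percentile of 5, so the plot is flat while the slow tail of the third interval (200–800) vanishes entirely.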
To use StatsD appropriately, you must have a firm understanding of your application's failure modes, the nuances of response time aggregation, and the underlying StatsD data model. Even then, the visualization of response time data makes it easy to forget how much data StatsD is throwing away — including most of the meaningful bad cases. For this reason, StatsD should only ever be used to show that a system is obviously failing, not to verify that it is working or as a starting point for optimizations.