Search results for: September 2015

Scaling Data Ingestion Systems: From Perl to Go Part 2

In part one of this post, I explored the scaling problems that we encountered with MediaMath’s user data delivery system, which was built initially in Perl without the headroom necessary to scale to our current size. In this post, I outline the way we used goroutines and channels (Go’s built-in concurrency primitives) and interfaces to simplify concurrency and parallelism and scale without complicating deployment. Concurrency in Go, or “Let’s just add more workers!” Tackling our concurrency issues proved to be the easiest part of this exercise, due to Go’s fantastic primitives and the fan-out and fan-in patterns. We can start with, as […]
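For readers unfamiliar with the fan-out and fan-in patterns mentioned in this excerpt, here is a minimal sketch in Go. It is not MediaMath's code: the `process` function, the integer records, and the worker count are all hypothetical placeholders chosen only to show the shape of the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// process is a hypothetical stand-in for whatever per-record work
// a real ingestion worker would do.
func process(record int) int {
	return record * 2
}

// fanOutFanIn starts `workers` goroutines that all read from one input
// channel (fan-out) and write to one shared output channel (fan-in).
// It returns the sum of the processed records.
func fanOutFanIn(records []int, workers int) int {
	in := make(chan int)
	out := make(chan int)

	// Fan-out: every worker competes for records on the same channel.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range in {
				out <- process(r)
			}
		}()
	}

	// Feed the input channel, then close it so the workers can exit.
	go func() {
		for _, r := range records {
			in <- r
		}
		close(in)
	}()

	// Close the output channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	// Fan-in: a single consumer drains the merged results.
	sum := 0
	for v := range out {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(fanOutFanIn([]int{1, 2, 3, 4, 5}, 3)) // prints 30
}
```

In this sketch, "adding more workers" only means raising the `workers` argument; the channels take care of distributing records and collecting results.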

Scaling Data Ingestion Systems: From Perl to Go Part 1

A consequence of MediaMath’s astronomical growth over the past few years is dealing with huge growth in service usage. Rapid growth sometimes means that systems are built quickly, without making hard plans for the future. Systems with headroom can now often become insufficient in as little as six months, and so technical debt becomes a tough challenge to address. We deal with the question, “Do we try to re-write this, or do we modify what we already have to scale with the load we expect to see?” Nowhere has this been clearer than in ingesting user data, which since 2011 […]

Counting at Scale: HyperLogLog to the Rescue

MediaMath processes many terabytes of data each day for the various reports available in T1. One metric we show is the number of unique impressions for each campaign, since there is a big difference between showing an ad to 100 different people and showing the same ad to one person 100 times. While this is conceptually a simple problem, solving it at scale is not quite as straightforward. The canonical way of solving this problem would be for any given campaign to put the ID of each person who saw an ad for that campaign into a set and then check […]
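The set-based approach described in this excerpt can be sketched in Go as follows. The excerpt is cut off before it says what is checked, so this sketch assumes the unique count is simply the size of each campaign's set. The `impression` type, its field names, and the sample data are hypothetical, not taken from the post.

```go
package main

import "fmt"

// impression is a hypothetical record: one ad shown to one user.
type impression struct {
	CampaignID string
	UserID     string
}

// uniqueUsersPerCampaign keeps one set of user IDs per campaign and
// returns the size of each set (assumed here to be the unique count).
func uniqueUsersPerCampaign(imps []impression) map[string]int {
	seen := make(map[string]map[string]struct{})
	for _, imp := range imps {
		users, ok := seen[imp.CampaignID]
		if !ok {
			users = make(map[string]struct{})
			seen[imp.CampaignID] = users
		}
		// Adding an ID that is already present leaves the set unchanged.
		users[imp.UserID] = struct{}{}
	}

	counts := make(map[string]int, len(seen))
	for campaign, users := range seen {
		counts[campaign] = len(users)
	}
	return counts
}

func main() {
	imps := []impression{
		{"c1", "alice"}, {"c1", "bob"}, {"c1", "alice"},
		{"c2", "carol"}, {"c2", "carol"},
	}
	counts := uniqueUsersPerCampaign(imps)
	fmt.Println(counts["c1"], counts["c2"]) // prints 2 1
}
```

Memory use in this sketch grows with the number of distinct users per campaign, which is one reason an exact set becomes hard to sustain at scale.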