Real-time Streaming Attribution Using Apache Flink

In this blog post, I will share a proof of concept for real-time attribution built with Apache Flink on streaming sources of impressions and events. I will also explain how we handled some of the problems inherent in windowing and processing real-time data streams at scale. Our goal was to find out whether we could use Flink to stream impression and event data and compute attribution in real time, so that advertising strategies could be optimized immediately. In digital advertising, we refer to ads served on any channel, whether social networks, mobile, video, or display, as impressions. Once the […]
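To make "attribution" concrete, here is a minimal sketch of last-touch attribution over a mixed stream of impressions and conversion events. It is not the post's Flink job and does not use Flink's API. The per-user dictionary stands in for keyed state, and the `lookback` eviction stands in for windowing. The record types, field names, the last-touch rule, and the one-hour lookback are all illustrative assumptions. The sketch also assumes records arrive in time order for each user; a real stream processor has to deal with out-of-order data, for example by using watermarks.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple, Union


# Hypothetical record types; field names are illustrative only.
@dataclass(frozen=True)
class Impression:
    user_id: str
    ts: int           # event time, in seconds
    campaign_id: str


@dataclass(frozen=True)
class Conversion:
    user_id: str
    ts: int           # event time, in seconds


Record = Union[Impression, Conversion]


def attribute(stream: Iterable[Record], lookback: int) -> Iterator[Tuple[Conversion, Optional[str]]]:
    """Last-touch attribution: credit each conversion to the user's most
    recent impression within `lookback` seconds, or None if there is none.

    Assumes records are time-ordered per user.
    """
    # Per-user state, analogous to keyed state in a stream processor.
    recent = defaultdict(deque)  # user_id -> impressions, oldest first
    for record in stream:
        window = recent[record.user_id]
        # Evict impressions that have fallen out of the lookback window.
        while window and record.ts - window[0].ts > lookback:
            window.popleft()
        if isinstance(record, Impression):
            window.append(record)
        else:
            # Credit the newest impression still inside the window, if any.
            yield record, (window[-1].campaign_id if window else None)


# Usage with made-up data: u1 converts 100s after its latest impression,
# while u2 converts long after its only impression has expired.
events = [
    Impression("u1", 0, "c1"),
    Impression("u2", 50, "c3"),
    Impression("u1", 100, "c2"),
    Conversion("u1", 200),
    Conversion("u2", 5000),
]
results = list(attribute(events, lookback=3600))
```

Here `results` pairs u1's conversion with campaign `c2`, the newest impression in its window. It pairs u2's conversion with `None`, because that user's impression is older than the lookback. Keeping only in-window impressions per user bounds the state held for each key. That is the same concern that windowing addresses at scale.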

Using Design Patterns to Build Flexible and Extensible Software

A software design pattern is a general, repeatable solution to a commonly occurring problem in software design. It describes, and gives guidelines for, solving a problem in a way that can be applied in many different situations. Because building on a proven template speeds up development, developers who use design patterns can improve both their coding efficiency and the readability of the final product. MediaMath’s Engineering team used design patterns to add flexibility, extensibility, and reusability to the components of a greenfield real-time sizing service for the Data Management Platform (DMP). Advertisers use a DMP to store millions of data entries that they have on potential users they would like to […]

Cassandra War Stories: Part 2

In this series, we have been relating some of the adventures MediaMath has had getting the NoSQL database Cassandra to work for our needs as we built out our Data Management Platform service. As mentioned in our previous post, we needed to do a fair amount of tuning to scale Cassandra to our workload. In this post, we’ll focus on some of the techniques we developed (good and bad) to handle the rapid increase in our data ingest. Using a combination of freely available automation tools, building our own custom tooling, and clever utilization of AWS […]

Cassandra War Stories: Part 1

This is part one of a multi-part series exploring the successes (and scars) we’ve had while tuning Cassandra to perform well in MediaMath’s Data Management Platform.

Fast reads on time series data

We use Cassandra as the backend data store for our Data Management Platform (DMP) here at MediaMath. Advertisers use DMPs to store their first-party data, as well as the third-party data segments they buy, so that they can use this data to bid on ad opportunities that reach targeted audiences. This requires hardware that can handle extremely large volumes of data and search them very quickly. We chose to […]

Data Liberation at MediaMath

MediaMath was recently at Amazon Web Services re:Invent 2014, where we presented our open data platform and our data liberation project. Both are enabled by a variety of tools, many of them from AWS. Below is a recording of our presentation, “Data Liberation at MediaMath.” Aggregating and processing terabytes of data per day is a challenge for any technology company. As marketers and brands become more sophisticated consumers of data, giving granular access to targeted subsets of data to users outside your firewalls presents new challenges. In this presentation, VP of Engineering Edward Fagin and Senior Director of Data […]

Extending Play’s validation to work with Big Data tools like DynamoDB, S3, and Spark

In this two-part blog series, we are looking at how MediaMath uses Play’s API to perform data validation on big data pipelines. In part one, we covered data validation with Play’s combinator-based API. In part two, we’ll extend that data validation to work with Amazon Web Services DynamoDB, AWS S3, and Spark.

Extending validation to work with AWS DynamoDB

MediaMath uses a variety of technologies in our analytics stack, including AWS DynamoDB. DynamoDB is a distributed, fault-tolerant key-value store offered as a service, which makes it easy to store and query massive datasets. We use it to power a few internal troubleshooting […]

