Search results for: s3

Data Liberation at MediaMath

// 04.15.2015 // Data

MediaMath was recently at Amazon Web Services Re:invent 2014, where we presented on our open data platform and data liberation project, both of which are enabled by a variety of tools including many AWS tools. Below is a recording of our presentation: Data Liberation at MediaMath. Aggregating and processing terabytes of data per day is a challenge for any technology company. As marketers and brands become more sophisticated consumers of data, enabling granular levels of access to targeted subsets of data from outside your firewalls presents new challenges. In this presentation, VP of Engineering Edward Fagin and Senior Director of Data […]

Extending Play’s validation to work with Big Data tools like DynamoDB, S3, and Spark

// 03.18.2015 // Data

In this two-part blog series, we are looking at how MediaMath uses Play’s API to perform data validation on big data pipelines. In part one, we covered data validation with Play’s combinator-based API. In part two, we’ll extend that data validation to work with Amazon Web Services DynamoDB, AWS S3, and Spark. Extending validation to work with AWS DynamoDB MediaMath uses a variety of technologies in our analytics stack, including AWS DynamoDB. DynamoDB is a distributed, fault-tolerant key value store as a service that makes it easy to store/query massive datasets. We use it to power a few internal troubleshooting […]

Scaling data tools: How Play enables strongly typed big data pipelines

// 03.04.2015 // Data

The other day, I was talking with a colleague about data validation, and the Play web framework came up. Play has a nice API for validating HTML form and JSON submissions. This works great when you’re processing small amounts of data from the web-tier of your application. But could that same tech benefit a Big Data team working on a backend powered by Hadoop or Spark? We decided to find out, and the results were encouraging. The secret sauce? Play’s combinator-based approach to data validation. Whether your data is big or small, garbage in is garbage out MediaMath processes TBs […]