Search results for: amazon

Data Liberation at MediaMath

// 04.15.2015 // Data

MediaMath was recently at Amazon Web Services re:Invent 2014, where we presented on our open data platform and data liberation project, both of which are enabled by a variety of tools, many of them from AWS. Below is a recording of our presentation: Data Liberation at MediaMath. Aggregating and processing terabytes of data per day is a challenge for any technology company. As marketers and brands become more sophisticated consumers of data, enabling granular levels of access to targeted subsets of data from outside your firewalls presents new challenges. In this presentation, VP of Engineering Edward Fagin and Senior Director of Data […]

Extending Play’s validation to work with Big Data tools like DynamoDB, S3, and Spark

// 03.18.2015 // Data

In this two-part blog series, we are looking at how MediaMath uses Play’s API to perform data validation on big data pipelines. In part one, we covered data validation with Play’s combinator-based API. In part two, we’ll extend that data validation to work with AWS DynamoDB, AWS S3, and Spark. Extending validation to work with AWS DynamoDB: MediaMath uses a variety of technologies in our analytics stack, including AWS DynamoDB. DynamoDB is a distributed, fault-tolerant key-value store as a service that makes it easy to store and query massive datasets. We use it to power a few internal troubleshooting […]
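The excerpt cuts off before the DynamoDB details, but the general shape is easy to sketch. Below is a minimal, hypothetical bridge, assuming Play JSON and the AWS SDK for Java v1; the attributeToJson and itemToJson helpers are illustrative, not MediaMath’s actual code. It converts a raw DynamoDB item (a map of AttributeValues) into a JsValue so the same Play validation used elsewhere can run against stored items:

    import com.amazonaws.services.dynamodbv2.model.AttributeValue
    import play.api.libs.json._
    import scala.collection.JavaConverters._

    // Hypothetical helper: flatten a DynamoDB AttributeValue into Play JSON
    // so existing Reads/validation can be reused unchanged on stored items.
    def attributeToJson(av: AttributeValue): JsValue =
      if (av.getS != null) JsString(av.getS)
      else if (av.getN != null) JsNumber(BigDecimal(av.getN))
      else if (av.getBOOL != null) JsBoolean(av.getBOOL)
      else if (av.getM != null) JsObject(av.getM.asScala.mapValues(attributeToJson).toSeq)
      else if (av.getL != null) JsArray(av.getL.asScala.map(attributeToJson).toVector)
      else JsNull

    def itemToJson(item: java.util.Map[String, AttributeValue]): JsValue =
      JsObject(item.asScala.mapValues(attributeToJson).toSeq)

    // A GetItem or Scan result can now be validated with the same Reads[A]
    // defined for the web tier: itemToJson(item).validate[MyRecord]

Going through JsValue is the simplest way to share one validation definition per record type between the web tier and the pipeline, at the cost of an intermediate allocation per item.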

Scaling data tools: How Play enables strongly typed big data pipelines

// 03.04.2015 // Data

The other day, I was talking with a colleague about data validation, and the Play web framework came up. Play has a nice API for validating HTML form and JSON submissions. This works great when you’re processing small amounts of data from the web tier of your application. But could that same tech benefit a big data team working on a backend powered by Hadoop or Spark? We decided to find out, and the results were encouraging. The secret sauce? Play’s combinator-based approach to data validation. Whether your data is big or small, garbage in is garbage out: MediaMath processes TBs […]
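The excerpt only names the combinator-based API, so here is a small illustrative example; the ImpressionEvent record and its field names are hypothetical, not taken from the post. A typical Reads composes per-field rules with `and` and reports every failing path rather than stopping at the first error:

    import play.api.libs.json._
    import play.api.libs.functional.syntax._

    // Hypothetical record type; the post does not show its real schema.
    case class ImpressionEvent(campaignId: Long, cost: BigDecimal, userAgent: String)

    // Combinator-based validation: each field gets its own rule, and the
    // rules compose into a Reads for the whole record.
    implicit val impressionReads: Reads[ImpressionEvent] = (
      (__ \ "campaign_id").read[Long](Reads.min(0L)) and
      (__ \ "cost").read[BigDecimal] and
      (__ \ "user_agent").read[String](Reads.minLength[String](1))
    )(ImpressionEvent.apply _)

    // Json.parse(line).validate[ImpressionEvent] returns either JsSuccess or
    // a JsError that accumulates every failed path, which is handy when
    // triaging bad records in a pipeline.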

VPN, security groups, & more on NATs: Part 2 of our Hybrid Cloud Tips & Tricks Series

// 11.19.2014 // Data

Today’s post picks up where we left off last week. As a refresher, here at MediaMath, we run a hybrid data center/cloud environment. In this two-part blog series, I will be highlighting a couple of the trickier aspects of this integration with a focus on maintaining data integrity between your in-house data centers and the cloud – specifically, an Amazon Web Services cloud. (Go to part 1: VPCs, Jump Boxes, and NATs.) Using a NAT on a Public Subnet: Some of our services need endpoints accessible from our in-house data centers. These services must go in a public subnet with […]
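The configuration details are cut off in the excerpt; for context, here is a rough sketch of the classic NAT-instance setup of that era, assuming the AWS SDK for Java v1 called from Scala and hypothetical resource IDs. This is a generic illustration, not MediaMath’s actual configuration. The two steps people most often miss are disabling the source/destination check on the NAT instance and pointing the private route table’s default route at it:

    import com.amazonaws.services.ec2.AmazonEC2ClientBuilder
    import com.amazonaws.services.ec2.model.{CreateRouteRequest, ModifyInstanceAttributeRequest}

    object NatSetup {
      // Hypothetical IDs; in practice these come from your VPC provisioning.
      val privateRouteTableId = "rtb-0000000a"
      val natInstanceId       = "i-0000000b"

      def main(args: Array[String]): Unit = {
        val ec2 = AmazonEC2ClientBuilder.defaultClient()

        // A NAT instance forwards traffic on behalf of other hosts, so EC2's
        // source/destination check must be disabled on it.
        ec2.modifyInstanceAttribute(
          new ModifyInstanceAttributeRequest()
            .withInstanceId(natInstanceId)
            .withSourceDestCheck(false))

        // Send the private subnet's outbound traffic through the NAT instance
        // sitting in the public subnet.
        ec2.createRoute(
          new CreateRouteRequest()
            .withRouteTableId(privateRouteTableId)
            .withDestinationCidrBlock("0.0.0.0/0")
            .withInstanceId(natInstanceId))
      }
    }

AWS later added managed NAT Gateways, which replace the hand-rolled instance but keep the same routing idea.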

VPCs, jump boxes, & NATs: Part 1 of our Hybrid Cloud Tips & Tricks Series

// 11.12.2014 // Data

Here at MediaMath, we are building out new services for our hundreds of terabytes of data at a lightning-fast pace. This isn’t your grandma’s software development shop. Using cloud services like Amazon EC2 allows us to scale up our infrastructure to match this pace. However, MediaMath doesn’t run a purely cloud-hosted environment. Instead, we run a mixed data center environment, with a number of high-performance components running in a variety of data centers throughout the world. That means that the new AWS-hosted pipelines and data stores we are building must integrate with our in-house data centers, which […]

Breaking the logjam

// 10.15.2014 // Infrastructure

At MediaMath, our infrastructure generates terabytes of business-critical messages every day, such as ad impression logs and tracking beacon events. A service we’ve developed within our TerminalOne technology platform, nicknamed the “MediaMath Firehose,” enables our internal analytics applications and bidding systems to generate meaningful insights and take action on all of the data from these messages in real time. This wasn’t always the case; traditionally, this data was made available in hourly or nightly batches. We needed a significant technical and cultural transformation to move from batching to streaming. When we first began architecting our data delivery systems in the […]