Search results for: Data

Scaling data tools: How Play enables strongly typed big data pipelines

// 03.04.2015 // Data

The other day, I was talking with a colleague about data validation, and the Play web framework came up. Play has a nice API for validating HTML form and JSON submissions. This works great when you’re processing small amounts of data from the web tier of your application. But could that same tech benefit a Big Data team working on a backend powered by Hadoop or Spark? We decided to find out, and the results were encouraging. The secret sauce? Play’s combinator-based approach to data validation. Whether your data is big or small, garbage in is garbage out. MediaMath processes TBs […]

Cut your run time from minutes to seconds with HBase and Algebird

// 02.04.2015 // Data

[Note: Code for this demo is available here:] At MediaMath, our Hadoop data processing pipelines generate various semi-aggregated datasets based on the many terabytes of data our systems generate daily. Those datasets are then imported to a set of relational SQL databases, where internal and external clients query them in real time. When a query involves extra levels of aggregation on an existing dataset at run time, it starts to hog server resources, slowing down response times. However, we have been able to reduce the query time on these terabyte-scale datasets from minutes to seconds by using a combination of […]

VPN, security groups, & more on NATs: Part 2 of our Hybrid Cloud Tips & Tricks Series

// 11.19.2014 // Data

Today’s post picks up where we left off last week. As a refresher, here at MediaMath, we run a hybrid data center-cloud environment. In this two-part blog series, I will be highlighting a couple of the trickier aspects of this integration with a focus on maintaining data integrity between your in-house data centers and the cloud – specifically, an Amazon Web Services cloud. Go to part 1: VPCs, Jump Boxes, and NATs. Using a NAT on a Public Subnet: Some of our services need endpoints accessible from our in-house data centers. These services must go in a public subnet with […]

VPCs, jump boxes, & NATs: Part 1 of our Hybrid Cloud Tips & Tricks Series

// 11.12.2014 // Data

Here at MediaMath, we are building out new services for our hundreds of terabytes of data at a lightning-fast pace. This isn’t your grandma’s software development shop. Using cloud services like Amazon’s EC2 service allows us to scale up our infrastructure to match this pace. However, MediaMath doesn’t run a purely cloud-hosted environment. Instead, we run a mixed data center environment, with a number of high-performance components running in a variety of data centers throughout the world. That means that the new AWS-hosted pipelines and data stores we are building must integrate with our in-house data centers, which […]

Learning how to learn: My summer on the Data Platform Team

// 08.20.2014 // Data

During the summer of 2014, I worked as an intern on the Data Platform team. One of the team’s main initiatives is to develop data workflows and reporting for other internal groups. My first project was to build a report using the programming language Scala. The report I built was for the Site Uniques Workflow, which is the data processing pipeline for all video advertising campaigns. Specifically, this report allows you to group various campaign attributes together to obtain different metrics. For example, you can group by campaign ID, website, ad exchange ID, auction ID, etc. It pulls raw bid […]

MediaMath in 4D: Mapping sound and graphics on the XYZ to interpret data

// 07.31.2014 // Data

A while back, I wrote about some of the new opportunities that data sonification can open up for monitoring and analysis. As a quick refresher, data sonification is the process of making data sound like how it looks. I’d like to share a few more examples of data sonification: one “state of the art” example from California Institute of Technology (Caltech), and one experiment from MediaMath. MediaMath Experiment: MM3D. In 2013, I worked on a project called MM3D. It was an experiment in combining sound and graphics into one single presentation, with multiple MediaMath datasets. The gist of the app […]
