Search results for: data

Learning how to learn: My summer on the Data Platform Team

// 08.20.2014 // Data

During the summer of 2014, I worked as an intern on the Data Platform team. One of the team’s main initiatives is to develop data workflows and reporting for other internal groups. My first project was to build a report using the programming language Scala. The report I built was for the Site Uniques Workflow, which is the data processing pipeline for all video advertising campaigns. Specifically, this report allows you to group various campaign attributes together to obtain different metrics. For example, you can group by campaign ID, website, ad exchange ID, auction ID, etc. It pulls raw bid […]

Monitoring and analyzing data with sound: MediaMath’s audio logo and the Bidder Moog Project

// 07.24.2014 // Data

Generally, if you need to analyze your data, you’ll pull up an Excel spreadsheet or the reports section of your statistics package and look at line graphs, scatter plots, pie charts, etc. Sometimes even in 3D. Those tools do the job, but did you know that there is a whole different way to monitor your data that doesn’t require looking at the screen, and can even add an emotional aspect to your information? It’s an essentially unknown field called “data sonification.” The first question I always get when I mention data sonification is “huh?” The next is always “why?” In […]

Making your local Hadoop more like AWS Elastic MapReduce

// 05.21.2014 // Data

A version of this article originally appeared on Ian’s personal blog here. At MediaMath, we’re big users of Elastic MapReduce (EMR). EMR’s incredible flexibility makes it a great fit for our data analytics team, which processes TBs of data each day to provide insights to our clients, to better understand our own business, and to power the various product back-ends that make Terminal 1 the “marketing operating system” that it is. An extremely important best practice for any analytics project is to ensure the local development and test environments match the production environment as much as possible. This eliminates the […]

Building faster, scalable reporting with Hadoop-Impala

// 05.21.2014 // Infrastructure

As a leading DSP with billions of online ads running through our platform every day, one of our biggest problems is how best to frequently report attribution data (which ad led to which action, like a sale or online signup) to our clients in a reliable way. The problem we are tackling, in numbers: A) 30-day impression volume = 35 – 40 billion records B) 1-hour event/click volume = 15 – 20 million records We need to join B (events) with A (impressions) twice every hour (once for events and once for clicks), find the matching records, perform complex sequencing […]
