Search results for: hbase

Cut your run time from minutes to seconds with HBase and Algebird

// 02.04.2015 // Data

[Note: Code for this demo is available here: https://github.com/MediaMath/hbase-coprocessor-example] At MediaMath, our Hadoop data processing pipelines generate various semi-aggregated datasets from the many terabytes of data our systems produce daily. Those datasets are then imported into a set of relational SQL databases, where internal and external clients query them in real time. When a query involves extra levels of aggregation on an existing dataset at run time, it starts to hog server resources and slow query execution. However, we have been able to reduce the query time on these terabyte-scale datasets from minutes to seconds by using a combination of […]

Learning how to learn: My summer on the Data Platform Team

// 08.20.2014 // Data

During the summer of 2014, I worked as an intern on the Data Platform team. One of the team’s main initiatives is to develop data workflows and reporting for other internal groups. My first project was to build a report in Scala for the Site Uniques Workflow, the data processing pipeline for all video advertising campaigns. The report lets you group by various campaign attributes to obtain different metrics. For example, you can group by campaign ID, website, ad exchange ID, auction ID, etc. It pulls raw bid […]