Search results for: data

Cut your run time from minutes to seconds with HBase and Algebird

// 02.04.2015 // Data

[Note: Code for this demo is available here: https://github.com/MediaMath/hbase-coprocessor-example] At MediaMath, our Hadoop data processing pipelines produce various semi-aggregated datasets from the many terabytes of data our systems generate daily. Those datasets are then imported into a set of relational SQL databases, where internal and external clients query them in real time. When a query involves extra levels of aggregation on an existing dataset at run time, it hogs server resources and drives run times up. However, we have been able to reduce the query time on these terabyte-scale datasets from minutes to seconds by using a combination of […]
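
The full post digs into the HBase coprocessor half; as a taste of the Algebird half, here is a minimal sketch of the client-side merge step, with an invented record shape (per-campaign impression counts and spend) rather than the schema from the linked repo:

```scala
import com.twitter.algebird.Monoid

object MergePartials {
  // Hypothetical partial aggregate: each HBase region, via a coprocessor,
  // returns per-campaign (impression count, spend) pairs.
  type Partial = Map[String, (Long, Double)]

  def main(args: Array[String]): Unit = {
    val fromRegionA: Partial = Map("c1" -> (100L, 2.50), "c2" -> (40L, 1.00))
    val fromRegionB: Partial = Map("c1" -> (60L, 1.50))

    // Algebird derives a Monoid for Map[K, V] whenever V has one, so
    // merging any number of per-region results is a single sum.
    val total: Partial = Monoid.sum(Seq(fromRegionA, fromRegionB))
    println(total) // Map(c1 -> (160,4.0), c2 -> (40,1.0))
  }
}
```

Because the merge is an associative monoid sum, it gives the same answer however the rows are split across regions, which is what makes pushing partial aggregation down into the coprocessors safe.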

VPN, security groups, & more on NATs: Part 2 of our Hybrid Cloud Tips & Tricks Series

// 11.19.2014 // Data

Today’s post picks up where we left off last week. As a refresher, here at MediaMath, we run a hybrid data center-cloud environment. In this two-part blog series, I will be highlighting a couple of the trickier aspects of this integration with a focus on maintaining data integrity between your in-house data centers and the cloud – specifically, an Amazon Web Services cloud.

Go to part 1: VPCs, Jump Boxes, and NATs

Using a NAT on a Public Subnet

Some of our services need endpoints accessible from our in-house data centers. These services must go in a public subnet with […]
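
A hedged sketch of the security-group half of this setup, written against the AWS SDK for Java v2 from Scala: the service keeps its public-subnet endpoint, but ingress is narrowed to the data center’s address range. The group ID and CIDR below are placeholders, not our actual configuration:

```scala
import software.amazon.awssdk.services.ec2.Ec2Client
import software.amazon.awssdk.services.ec2.model.{AuthorizeSecurityGroupIngressRequest, IpPermission, IpRange}

object LockDownPublicEndpoint {
  def main(args: Array[String]): Unit = {
    val ec2 = Ec2Client.create()

    // Allow HTTPS only from the (hypothetical) in-house data center CIDR.
    val rule = IpPermission.builder()
      .ipProtocol("tcp")
      .fromPort(443)
      .toPort(443)
      .ipRanges(IpRange.builder().cidrIp("203.0.113.0/24").build())
      .build()

    ec2.authorizeSecurityGroupIngress(
      AuthorizeSecurityGroupIngressRequest.builder()
        .groupId("sg-0123456789abcdef0") // placeholder group ID
        .ipPermissions(rule)
        .build())
  }
}
```

With a rule like this, the endpoint stays reachable from the data center without being open to the internet at large.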

VPCs, jump boxes, & NATs: Part 1 of our Hybrid Cloud Tips & Tricks Series

// 11.12.2014 // Data

Here at MediaMath, we are building out new services for our hundreds of terabytes of data at a lightning-fast pace. This isn’t your grandma’s software development shop. Using cloud services like Amazon EC2 allows us to scale up our infrastructure to match this pace. However, MediaMath doesn’t run a purely cloud-hosted environment. Instead, we run a mixed data center environment, with a number of high-performance components running in a variety of data centers throughout the world. That means that the new AWS-hosted pipelines and data stores we are building must integrate with our in-house data centers, which […]

Using data to inform product and feature prioritization decisions

// 10.28.2014 // Product

As a Product Manager, I have become comfortable with the idea of having a never-ending to-do list. There will always be more bugs to fix and features to build, and thus knowing how to prioritize that list of feature requests and bug fixes is an important part of a Product Manager’s job. Building feature X might mean that you have to hold off on building feature Y, especially when development resources are at a premium – and let’s be honest, what tech company can say that dev resources aren’t at a premium? At MediaMath, data plays a vital role […]

Breaking the logjam

// 10.15.2014 // Infrastructure

At MediaMath, our infrastructure generates terabytes of business-critical messages every day, such as ad impression logs and tracking beacon events. A service we’ve developed within our TerminalOne technology platform, nicknamed the “MediaMath Firehose,” enables our internal analytics applications and bidding systems to generate meaningful insights and take action on all of the data from these messages in real time. This wasn’t always the case; traditionally, this data was made available in hourly or nightly batches. We needed a significant technical and cultural transformation to move from batching to streaming. When we first began architecting our data delivery systems in the […]
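
The excerpt doesn’t say what the Firehose is built on, so purely as an illustration, here is what tailing such a stream might look like with Kafka’s consumer API in Scala; the broker address, topic name, and string-encoded records are all assumptions:

```scala
import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object FirehoseTail {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker
    props.put("group.id", "analytics-demo")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("impressions").asJava) // assumed topic name

    // Each poll returns whatever impression messages arrived since the
    // last one: seconds of latency instead of an hourly batch drop.
    while (true) {
      for (rec <- consumer.poll(Duration.ofSeconds(1)).asScala)
        println(s"${rec.key}: ${rec.value}")
    }
  }
}
```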

From proof-of-concept to production: Building the centralized logging system using ELK

// 08.27.2014 // Platform API

As an intern on the Platform API team at MediaMath, I worked on developing an initial proof of concept for a centralized logging system using the Elasticsearch, Logstash, and Kibana (ELK) stack. Before a centralized logging system was built, the Platform API team’s logs were scattered across multiple servers. Investigating issues meant searching one server, then the next, and so on, and then stitching the evidence together to form a theory. It was hard enough to investigate an already reported problem; it was pretty much impossible to spot problems ahead of time. The solution: build a centralized […]
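
Once everything lands in Elasticsearch, that server-by-server search collapses into one query. A minimal sketch, assuming the stack’s conventional logstash-* index naming and made-up field names, using only the standard _search endpoint:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object SearchAllLogs {
  def main(args: Array[String]): Unit = {
    // Lucene query-string search across every daily logstash index; the
    // level and request_id fields are illustrative, not the team's schema.
    val uri = URI.create(
      "http://localhost:9200/logstash-*/_search?q=level:ERROR%20AND%20request_id:abc123")

    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder(uri).GET().build()
    val response = client.send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body()) // JSON hits drawn from every server at once
  }
}
```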
