Search results for: Data Science

Apache Flink® at MediaMath: Rescaling Stateful Applications in Production

// 06.13.2017 // Data Science

This article was originally posted by DataArtisans, on June 12, 2017. Every once in a while, Amazon Web Services experiences a service disruption, and millions of internet users around the globe panic as their favorite apps and websites cease to function. A short time later, the issue is resolved, and it’s back to business as usual. Most people move along with their day, eventually forgetting the micro-crisis altogether. But it’s not so simple for the software engineers whose companies are built on top of AWS and who are responsible for recovering from the disruption. Such was the case for MediaMath, a programmatic marketing company […]

Queryable States in Apache Flink – Part 2: Implementation

This is part 2 of the blog Queryable States in Apache Flink. In the previous blog, we saw how Apache Flink enabled Queryable States. In this part, we will create a Streaming Job with Queryable States and create a QueryClient to query the state. I assume that Flink is already installed and set up. If not, you can check out my earlier blog post on installation here. I will be using a Tumbling window in this example; to read about Windows in Flink, please read this blog post. All the code used in this blog post will be available on my GitHub. Creating the Pipeline Let […]

QueryableState in Apache Flink – Part 1

QueryableState allows users to run real-time queries on the internal state of the stream without having to store the result in any external storage. This opens up many interesting possibilities, since we no longer need to wait for the system to write to the external storage (which has always been one of the main bottlenecks in these kinds of systems). It might even be possible to do without any kind of database and have the user-facing applications query the stream directly, which would make the application faster and cheaper. This might not be applicable to all the use […]

A Tale of TwoTails – Mutual tail recursion in Scala

TwoTails is a compiler plugin written to add support to Scala for mutual tail recursion. While trampolines or trampolined-style recursion solve the direct need, they require explicit construction by a developer and add overhead in the form of additional data structures. Unfortunately, building a “native” solution directly into Scalac without using trampolines is not a straightforward task, even with basic tail recursion. In the latest version, a second compilation scheme has been introduced, solving an issue peculiar to the JVM which the first scheme was not able to properly address. I’ll discuss both the motivation behind this new scheme […]
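
To illustrate the trampolined style the excerpt mentions, here is a minimal sketch in Python. It is not TwoTails itself and not code from the original post; the names is_even, is_odd, and trampoline are hypothetical and chosen only for illustration. Each mutually recursive function returns a thunk instead of calling its partner directly, and a driver loop runs the thunks one at a time so the call stack stays flat. The thunk objects are the "additional data structures" overhead the excerpt refers to.

```python
from typing import Any, Callable, Union

# A step is either a final bool or a zero-argument thunk producing the next step.
Step = Union[bool, Callable[[], Any]]


def is_even(n: int) -> Step:
    """Hypothetical example: return the answer, or a thunk deferring to is_odd."""
    if n == 0:
        return True
    return lambda: is_odd(n - 1)


def is_odd(n: int) -> Step:
    """Hypothetical example: return the answer, or a thunk deferring to is_even."""
    if n == 0:
        return False
    return lambda: is_even(n - 1)


def trampoline(step: Step) -> bool:
    """Run thunks in a loop until a non-callable result appears."""
    while callable(step):
        step = step()
    return step
```

Calling trampoline(is_even(100000)) completes without exhausting the stack, whereas direct mutual recursion of the same depth would exceed Python's default recursion limit. In Scala, the standard library's scala.util.control.TailCalls offers a comparable trampolining facility.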

Take Reports From Concept to Production with PySpark and Databricks

// 04.19.2017 // Data Science

This article was originally published on the Databricks blog on April 3rd, 2017. Introduction: What is MediaMath? MediaMath is a demand-side media buying and data management platform. This means that brands and ad agencies can use our software to programmatically buy advertisements as well as manage and use the data that they have collected from their users. We serve over a billion ads each day, and track over 4 billion events that occur on the sites of our customers on a busy day. This wealth of data makes it easy to imagine novel reports in response to nearly any situation. Turning […]

Video: Extreme-scale Data Science Using Spark

// 11.14.2016 // Data Science

At the Spark Summit in Brussels, MediaMath’s SVP of Data Science, Prasad Chalasani, gave an invited keynote talk, Extreme Scale Ad-Tech at MediaMath with Spark and Databricks. MediaMath’s demand-side platform responds to over 200 billion ad opportunities daily, and leverages massive amounts of data to power smarter digital marketing. We use Spark heavily both in production and R&D to develop innovative, proprietary, and scalable solutions to multiple large-scale data problems, such as: training machine-learning models for predicting conversion probability given an ad impression; measuring causal effectiveness of advertising using randomized tests; estimating audience reach for specified targeting criteria; finding deviceIDs belonging to the same user based on […]