Search results for: Data Science

QueryableState in Apache Flink – Part 1

Queryable state allows users to run real-time queries against the internal state of a stream without first writing the results to external storage. This opens up many interesting possibilities, since we no longer need to wait for the system to write to external storage (which has always been one of the main bottlenecks in these kinds of systems). It might even be possible to drop the database entirely and have user-facing applications query the stream directly, which would make the application faster and cheaper. This might not be applicable to all the use […]

A Tale of TwoTails – Mutual tail recursion in Scala

TwoTails is a compiler plugin that adds support for mutual tail recursion to Scala. While trampolines (trampolined-style recursion) solve the immediate need, they require explicit construction by the developer and add overhead in the form of additional data structures. Unfortunately, building a "native" solution directly into scalac without using trampolines is not a straightforward task, even for basic tail recursion. The latest version introduces a second compilation scheme that solves an issue peculiar to the JVM, one the first scheme could not properly address. I'll discuss both the motivation behind this new scheme […]
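For readers unfamiliar with the trampolined style mentioned above, here is a minimal sketch (not TwoTails itself) using the Scala standard library's `scala.util.control.TailCalls`. The `EvenOdd` object and its functions are a hypothetical illustration. Note that every mutually recursive call has to be wrapped explicitly in `tailcall` and the final value extracted with `.result`. This is the developer-side construction, plus per-step allocation, that the plugin aims to avoid.

```scala
import scala.util.control.TailCalls._

// Hypothetical example: mutually recursive even/odd checks written in
// trampolined style. Each step returns a TailRec value describing the next
// call instead of making that call directly, so the JVM stack does not grow.
// Assumes n >= 0.
object EvenOdd {
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))

  def main(args: Array[String]): Unit = {
    // .result runs the trampoline loop until a done(...) value is reached.
    println(isEven(1000000).result) // prints "true"
  }
}
```

Written as plain mutually recursive functions, the same pair would very likely throw a `StackOverflowError` at this depth on the JVM, because scalac's `@tailrec` only optimizes self-recursive calls.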

Take Reports From Concept to Production with PySpark and Databricks

// 04.19.2017 // Data Science

This article was originally published on the Databricks blog on April 3rd, 2017.

Introduction: What is MediaMath?

MediaMath is a demand-side media buying and data management platform. This means that brands and ad agencies can use our software to programmatically buy advertisements, as well as manage and use the data they have collected from their users. We serve over a billion ads each day, and on a busy day we track over 4 billion events on our customers' sites. This wealth of data makes it easy to imagine novel reports in response to nearly any situation. Turning […]

Video: Extreme-scale Data Science Using Spark

// 11.14.2016 // Data Science

At the Spark Summit in Brussels, MediaMath's SVP of Data Science, Prasad Chalasani, gave an invited keynote talk, Extreme Scale Ad-Tech at MediaMath with Spark and Databricks. MediaMath's demand-side platform responds to over 200 billion ad opportunities daily and leverages massive amounts of data to power smarter digital marketing. We use Spark heavily, both in production and in R&D, to develop innovative, proprietary, and scalable solutions to multiple large-scale data problems, such as: training machine-learning models to predict conversion probability given an ad impression; measuring the causal effectiveness of advertising using randomized tests; estimating audience reach for specified targeting criteria; finding device IDs belonging to the same user based on […]

Video: Fast Distributed Online Classification and Clustering

// 05.05.2016 // Data Science

Thousands of software developers, full-stack engineers, consultants and systems architects flocked to Dublin, Ireland this past April for the 2016 Hadoop Summit. Hosted by Hortonworks (a major Apache Hadoop distributor) and Yahoo, the Hadoop Summit offered three full days packed with Hadoop and big data innovations, straight from the elephant's mouth. The first day featured our own Prasad Chalasani, SVP of Data Science at MediaMath, and his talk on Fast Distributed Online Classification and Clustering. He outlines how he and Ram Sriharsha at Databricks leveraged recent machine-learning research to develop a fast, practical, scalable, online, distributed, […]

Video: Monte Carlo Simulations in Ad-lift Measurement Using Spark

// 03.08.2016 // Data Science

Two weeks ago, engineers, developers and data scientists from all over the country packed into the Midtown Hilton in New York for Spark Summit East 2016, the largest big data event focused on Apache Spark. MediaMath's SVP of Data Science, Prasad Chalasani, partnered with Ram Sriharsha, a Senior Member of Technical Staff at Hortonworks, to demonstrate how and why they used Spark in Monte Carlo simulations to measure ad lift, the behavioral effect that advertisements can have on consumers. Watch Prasad's presentation in its entirety below: Most traditional applications of Spark involve massive datasets that already exist. A less commonly encountered use case, but […]