Video: Fast Distributed Online Classification and Clustering

// 05.05.2016 // Data Science

Thousands of software developers, full-stack engineers, consultants and systems architects flocked to Dublin, Ireland, this past April for the 2016 Hadoop Summit. Hosted by Hortonworks – a major Apache Hadoop distributor – and Yahoo, the Hadoop Summit packed three full days with Hadoop and big data innovations – straight from the elephant’s mouth.

The first day featured our own Prasad Chalasani, SVP of Data Science at MediaMath, and his talk on Fast Distributed Online Classification and Clustering. In it, he outlines how he and Ram Sriharsha of Databricks drew on recent machine-learning research to develop a fast, practical, scalable classifier that is online, distributed and single-pass, with significant advantages over most similar ML packages. Built in Scala on Apache Spark, the package supports distributed supervised learning with up to hundreds of millions of sparse features.
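For readers new to the terms, here is a minimal sketch of what "online" and "single-pass" mean for a classifier over sparse features. It is a generic logistic regression trained with stochastic gradient descent in plain Python, not the algorithm or API of the package from the talk. The function names, the learning rate and the toy data are all assumptions made for illustration, and the distributed side is left out entirely.

```python
import math


def predict(weights, features):
    """Return P(label = 1) for a sparse example given as {feature_id: value}."""
    z = sum(weights.get(i, 0.0) * v for i, v in features.items())
    z = max(-35.0, min(35.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_single_pass(stream, learning_rate=0.1):
    """Read each (features, label) pair exactly once, updating as it arrives.

    Only the features present in an example are touched, so the cost of an
    update depends on how sparse the example is, not on the total number of
    features. The learning rate is an illustrative assumption.
    """
    weights = {}
    for features, label in stream:
        error = predict(weights, features) - label
        for i, v in features.items():
            weights[i] = weights.get(i, 0.0) - learning_rate * error * v
    return weights


# Hypothetical toy stream: feature 0 signals the positive class,
# feature 1 the negative class.
toy_stream = [({0: 1.0}, 1), ({1: 1.0}, 0)] * 100
weights = train_single_pass(toy_stream)
```

Because the model is updated one example at a time and never revisits earlier data, memory use is bounded by the number of distinct features seen rather than by the size of the data set.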

Watch Prasad’s presentation in its entirety, or check out his slides on SlideShare.

Annie Fei is a Marketing Coordinator with MediaMath based in San Francisco, where she supports the communications team in sharing company goals with the wider public. She holds a B.A. in Integrative Biology and Theater & Performance Studies from the University of California, Berkeley. When not in the lab or performing on stage, she is usually on the hunt for the best hot pot place in town and feeding her inner foodie.
