Video: Fast Distributed Online Classification and Clustering
Thousands of software developers, full-stack engineers, consultants and systems architects flocked to Dublin, Ireland this past April for the 2016 Hadoop Summit. Hosted by Hortonworks (a major Apache Hadoop distributor) and Yahoo, the Hadoop Summit featured three full days packed with Hadoop and big data innovations, straight from the elephant’s mouth.
The first day featured our own Prasad Chalasani, SVP of Data Science at MediaMath, and his talk on Fast Distributed Online Classification and Clustering. In it, he outlines how he and Ram Sriharsha at Databricks leveraged recent machine-learning research to develop a fast, practical, scalable, online, distributed, single-pass ML classifier that has significant advantages over most similar ML packages. Built in Scala on Apache Spark, the package supports distributed supervised machine learning with up to hundreds of millions of sparse features.
Watch Prasad’s presentation in its entirety, or check out his slides on SlideShare.