[Note: Code for this demo is available here: https://github.com/MediaMath/hbase-coprocessor-example] At MediaMath, our Hadoop data processing pipelines produce various semi-aggregated datasets from the many terabytes of data our systems generate daily. Those datasets are then imported into a set of relational SQL databases, where internal and external clients query them in real time. When a query requires extra levels of aggregation on an existing dataset at run time, it starts to hog server resources and slows down response times. However, we have been able to reduce the query time on these terabyte-scale datasets from minutes to seconds by using a combination of […]
Keshi Dai is a senior data engineer at MediaMath. He builds big data tools and platforms that power MediaMath’s reporting and analytics products. Before MediaMath, he worked at eBay, where he built a collaborative filtering recommendation system for eBay.com. He received his BS in Computer Science from Zhejiang Sci-Tech University in China and his PhD, specializing in information retrieval, from Northeastern University.