Using Design Patterns to Build Flexible and Extensible Software

// 07.26.2016 // Data

A software design pattern is a general, repeatable solution to a commonly occurring problem in software design. It provides a description and guidelines for solving a problem that can be applied in many different situations. Because working from a proven template speeds up development, developers who use design patterns can improve both coding efficiency and the readability of the final product.

MediaMath’s Engineering team used design patterns to add flexibility, extensibility and reusability to components of a greenfield real-time sizing service for the Data Management Platform (DMP). Advertisers use a DMP to store millions of data entries about potential users they would like to target. To help them spend their advertising budget as effectively as possible, we wanted to build a service that lets advertisers and marketers analyze and estimate the size of a detailed user segment in real time, before they start spending money on targeting.

The end result was “Adaptive Segment Sizing,” which accepts an input query string written in Segment Definition Language (SDL), a language we developed in-house. SDL defines a set of filters, with recency, aggregates and time sequencing, that apply Boolean operators and operands to event data, enabling us to extract very specific user lists. Here is how we used various patterns in Python (with pared-down code examples):

Decorators for on-demand throttling and caching of app-database interactions

To prevent the downstream systems from being overwhelmed by a barrage of database requests, the sizing service needed a throttling and fair-sharing mechanism that could be applied to downstream client connections and objects. We also wanted to cache responses from the database, depending on whether the API client configures the incoming query to enable or disable caching of its results. The on-demand nature of these requirements suggested that, instead of building throttling and caching responsibilities into our database client objects, we should add them only when required, on a per-query basis. This is where we used the Decorator pattern, which allows additional responsibilities to be attached to an object either statically or dynamically. Decorators provide a flexible alternative to subclassing for extending functionality. We used this pattern to create Throttling and Caching decorators for the database driver objects, which enabled our application to throttle and cache dynamically at runtime.
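As a rough illustration of the idea, here is a minimal sketch of object-wrapping decorators. All of the names (`DatabaseClient`, `CachingClient`, `ThrottlingClient`, `build_client`) and the simple minimum-interval throttle are assumptions made for this example, not the service's actual classes or throttling policy.

```python
import time


class DatabaseClient:
    """Hypothetical stand-in for a real database driver."""

    def __init__(self):
        self.calls = 0

    def query(self, sdl):
        self.calls += 1
        return f"result for {sdl}"


class CachingClient:
    """Decorator: remembers results so repeated queries skip the database."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._cache = {}

    def query(self, sdl):
        if sdl not in self._cache:
            self._cache[sdl] = self._wrapped.query(sdl)
        return self._cache[sdl]


class ThrottlingClient:
    """Decorator: enforces a minimum interval between forwarded queries."""

    def __init__(self, wrapped, min_interval=0.01):
        self._wrapped = wrapped
        self._min_interval = min_interval
        self._last_call = None

    def query(self, sdl):
        if self._last_call is not None:
            wait = self._min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        self._last_call = time.monotonic()
        return self._wrapped.query(sdl)


def build_client(base, use_cache, throttle):
    """Wrap the base client per query, only with what the request asks for."""
    client = base
    if throttle:
        client = ThrottlingClient(client)
    if use_cache:
        client = CachingClient(client)
    return client
```

Because every wrapper exposes the same `query` method as the driver it wraps, calling code does not need to know which responsibilities were attached. For example, `build_client(DatabaseClient(), use_cache=True, throttle=False)` returns a cached but unthrottled client.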

Factories help program to an interface, not to a type

One of the constructs of the SDL query string that the sizing service receives is the recency clause, which finds users matching event criteria within client-provided date ranges. Recency comes in different flavors, including “After”, “Before”, and “Between” given date/time ranges. The sizing service processes these SDL recency clauses to generate a date range, which is then used to query database indices. We designed this using a parameterized factory method with a base recency class, which helped us localize the creation of the different concrete recency implementations. Any part of the application code that needs to deal with recency then uses the recency factory to generate the correct concrete type and gets the correct date range by calling a common interface. The Factory pattern decoupled creation of the different recency types from the application code and made it easier to add more recency flavors without affecting dependent modules.
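A parameterized factory method along these lines might look like the sketch below. The class names mirror the recency flavors named above, but `make_recency`, the `date_range` method, and the use of `datetime.min` and `datetime.max` as open-ended bounds are assumptions made for illustration.

```python
from datetime import datetime


class Recency:
    """Base recency type: every flavor exposes date_range()."""

    def date_range(self):
        raise NotImplementedError


class After(Recency):
    def __init__(self, start):
        self.start = start

    def date_range(self):
        return (self.start, datetime.max)


class Before(Recency):
    def __init__(self, end):
        self.end = end

    def date_range(self):
        return (datetime.min, self.end)


class Between(Recency):
    def __init__(self, start, end):
        self.start, self.end = start, end

    def date_range(self):
        return (self.start, self.end)


# Adding a new flavor means adding one class and one entry here.
_RECENCY_TYPES = {"after": After, "before": Before, "between": Between}


def make_recency(kind, *dates):
    """Parameterized factory method: maps a clause keyword to a concrete class."""
    try:
        cls = _RECENCY_TYPES[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown recency type: {kind!r}") from None
    return cls(*dates)
```

Callers only ever see the `Recency` interface: `make_recency("between", start, end).date_range()` and `make_recency("after", start).date_range()` are used the same way, regardless of the concrete class behind them.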

Hollywood principle (Template method pattern) for publishing server stats

The sizing service reports numeric stats about the server (e.g., number of requests received) and non-numeric query stats (e.g., query strings, timestamps). Both go to a home-built, open-sourced system called Qasino, which makes publishing and querying stats easy. To publish stats to Qasino, an application needs to generate a CSV file, which is then consumed by the Qasino publisher. Application-generated CSV files are required to follow the fixed format that Qasino expects. For the server stats, we generate a CSV file in which a row containing metric values is refreshed (overwritten) every 60 seconds, but for query stats, we write the aggregate stat values for the latest N queries received.

In other words, the overall algorithm for generating the CSV files is the same except for the one step that deals with aggregating and publishing the metric values. This presented an opportunity to apply the Template Method pattern: a base Stats class holds the skeleton of the CSV publishing algorithm in a template method, and subclasses override the one step that differs in order to publish their metric values in a customized way.
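The shape of that design can be sketched as follows. The column names and the plain header-plus-rows layout are placeholders chosen for this example; they are not Qasino's actual file format, and the class names other than the base Stats class are hypothetical.

```python
import csv
import io
from collections import deque


class Stats:
    """Base class: owns the skeleton of the CSV publishing algorithm."""

    header = ()

    def publish(self, out):
        # Template method: the steps are fixed, only rows() varies.
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(self.header)
        writer.writerows(self.rows())

    def rows(self):
        """The step that subclasses override."""
        raise NotImplementedError


class ServerStats(Stats):
    """Publishes a single row that reflects the current counter values."""

    header = ("requests_received",)

    def __init__(self):
        self.requests_received = 0

    def rows(self):
        return [(self.requests_received,)]


class QueryStats(Stats):
    """Publishes one row for each of the latest N recorded queries."""

    header = ("timestamp", "query")

    def __init__(self, keep_last):
        self._queries = deque(maxlen=keep_last)

    def record(self, timestamp, query):
        self._queries.append((timestamp, query))

    def rows(self):
        return list(self._queries)
```

The base class calls down into the subclass rather than the other way around, which is why the pattern is associated with the Hollywood principle ("don't call us, we'll call you"). Writing to `io.StringIO()` instead of a file, as in `ServerStats().publish(io.StringIO())`, is a convenient way to try it out.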

Strategy Pattern for testing role-based auth

The sizing service was capable of doing role-based authorization to decide if a user is able to size a given segment. The task of adding role-based auth support was straightforward, but we needed a better way to test the service and ensure our health checks were working. We used the Strategy pattern to implement different types of auth strategies that could be applied to the sizing service at startup:

  • AllowAll – treats all requests as authorized
  • DenyAll – the opposite: treats all requests as unauthorized
  • SessionArbac – treats a request as authorized or not based on user permissions

This allowed the service to use different auth levels in dev, QA, and production environments for a variety of use/test cases.
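A minimal sketch of these strategies is shown below. The strategy names come from the list above; the `is_authorized` method, the `SizingService` constructor, and the shape of the permissions mapping are assumptions made for this example.

```python
class AllowAll:
    """Treats every request as authorized."""

    def is_authorized(self, user, segment):
        return True


class DenyAll:
    """Treats every request as unauthorized."""

    def is_authorized(self, user, segment):
        return False


class SessionArbac:
    """Authorizes based on a user -> permitted-segments mapping (assumed shape)."""

    def __init__(self, permissions):
        self._permissions = permissions

    def is_authorized(self, user, segment):
        return segment in self._permissions.get(user, set())


class SizingService:
    """Hypothetical service shell that delegates auth to its strategy."""

    def __init__(self, auth):
        self._auth = auth  # strategy chosen once, at startup

    def size(self, user, segment):
        if not self._auth.is_authorized(user, segment):
            raise PermissionError(f"{user} may not size {segment}")
        return 0  # placeholder; real sizing is out of scope here
```

A dev environment could then start the service with `SizingService(AllowAll())`, a health-check test could use `DenyAll()` to exercise the rejection path, and production could pass a `SessionArbac` instance, all without changing the service code.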

Façade and Visitor for query optimization

Our real-time segment size estimation engine relies on sampling, statistics and optimizations to generate approximate queries and evaluate them quickly. The sizing service does this by parsing the SDL query string into an abstract syntax tree (AST) and then transforming the AST nodes to apply certain optimizations. About a dozen transformation classes come into play during SDL query processing. For example, one transformation canonicalizes comparisons so that variables are on the left-hand side of every comparison and values are on the right (e.g. “$5 > sale_price” gets converted to “sale_price < $5”). Separate transformations flatten multi-level Boolean trees, such as And(a, And(b, c)) -> And(a, b, c), to simplify user set evaluations, and so on. All of these transformation classes are exposed through a simple, unified Façade interface, which calls the appropriate transformations in the right order. Another interesting pattern available in the AST library for manipulations is the Visitor pattern, which allows us to “visit” and process different types of AST nodes.
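The two transformations described above, and a façade over them, can be sketched on a toy AST. The node classes, the `Transformer` base class, and the `optimize` function are all hypothetical and written for this example; the `visit_<ClassName>` dispatch style is borrowed from the standard library's `ast.NodeVisitor`, not from the service's own AST library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Compare:
    op: str
    left: object
    right: object


@dataclass(frozen=True)
class And:
    operands: tuple


class Transformer:
    """Visitor: dispatches on node type and returns a (possibly new) node."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        # Default behavior: recurse into And children, leave leaves untouched.
        if isinstance(node, And):
            return And(tuple(self.visit(child) for child in node.operands))
        return node


class CanonicalizeComparisons(Transformer):
    """Moves the variable to the left-hand side, mirroring the operator."""

    _MIRROR = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "==": "==", "!=": "!="}

    def visit_Compare(self, node):
        if isinstance(node.left, Const) and isinstance(node.right, Var):
            return Compare(self._MIRROR[node.op], node.right, node.left)
        return node


class FlattenAnd(Transformer):
    """Rewrites And(a, And(b, c)) as And(a, b, c)."""

    def visit_And(self, node):
        flat = []
        for child in node.operands:
            child = self.visit(child)  # flatten nested levels first
            if isinstance(child, And):
                flat.extend(child.operands)
            else:
                flat.append(child)
        return And(tuple(flat))


def optimize(tree):
    """Facade: one entry point that runs the transformations in a fixed order."""
    for transformer in (CanonicalizeComparisons(), FlattenAnd()):
        tree = transformer.visit(tree)
    return tree
```

Callers only need `optimize`. Given `And((Compare(">", Const(5), Var("sale_price")), And((Var("b"), Var("c")))))`, it returns a single flat `And` whose first operand is `Compare("<", Var("sale_price"), Const(5))`.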

Conclusion

Besides the above, the sizing service is sprinkled with other patterns: it uses Singleton for a global stats registry, Chain of Responsibility for URL parsing, and Composite objects, which are built into the Python Twisted framework. Most of the patterns mentioned in this post are adaptations of classic design patterns, which is perfectly fine, since design patterns are guidelines and not a protocol that developers have to stick to. In the end, despite the value of design patterns, make sure to keep it simple (the KISS principle). Developers should strive for readable and maintainable code without worrying about applying classic design patterns that may add unnecessary complexity and inefficiency. Happy designing!

NAVPREET DHILLON

Navpreet Dhillon is a Senior Software Engineer in the Data Services group at MediaMath. He spends his time designing and building the real-time estimation systems in MediaMath's data management platform, mainly using Go and Python. Nav has an MS in Computer Science from Northeastern University.