Dealing with discrepancies or how I learned to stop worrying and love TCP

As online advertising has grown from an experiment on a marketer’s checklist to a critical tool in the proverbial toolbox, so has the demand for actionable metrics of performance.

At first, measuring engagement was straightforward. A site serves a user an ad (delivered by an unbiased third party, the ad server), and the user clicks on that ad to go to whatever page the marketer desires. Ad servers then collect the number of clicks and impressions, and those counts serve two primary purposes. First, marketers use these numbers to draw insights into how their campaigns are performing. Second, marketers pay their advertising partners based on things like number of clicks.

Soon, marketers clamored to gain deeper insights. Technology vendors introduced cookies to attribute actions on the site, such as a product purchase or online signup, called a “conversion,” to an ad impression or click. It’s this process — attributing actions on a site to ad impressions and clicks — where things get tricky, and which this blog post will attempt to explain.

Even the most novice marketing campaign manager has, at some point, dealt with the headaches of managing an ad campaign’s results. Often, when working with multiple partners, the reported number of clicks and ad impressions varies, sometimes greatly. When you’re paying partners based on things like number of clicks, this can quickly become a headache for the campaign manager.

“Why did this happen?” the campaign manager wonders. Was there an issue with the ad trafficking or setup? Is one party collecting data incorrectly? Is there questionable activity the two parties are filtering differently? Are moths crawling into the servers and messing with the data (1)?

The answer could turn out to be a combination of all of the above. Discrepancies in impression, click, and conversion numbers can come about from a myriad of factors, including:

  • garden-variety mistakes (such as trafficking errors or fat-fingering)
  • infrastructure mismatches (such as when dealing with multiple ad verification companies)
  • fraudulent/bot activity (common with click discrepancies)

In this series, I will take a detailed look at the most common cases of discrepancies, including a closer look at the impact TCP/IP itself has on our perceived numbers. We will see that tracking down the exact source of a discrepancy can often be such a Sisyphean task that it is worth setting limits.

What is a “discrepancy”?

Let’s start with the basics. What constitutes a discrepancy? What does it mean to have a discrepancy? Why are there discrepancies in such basic metrics as ad impression counts?

Generally, a discrepancy is declared when one party’s figure for some metric differs by more than 20% from the figure reported by another party involved in the same campaign. These discrepancies can include:

  • Ad impression discrepancies are usually mild, since an impression is nothing more than “render this ad on the page,” though we’ll see cases in which those, too, can blow up.
  • Conversion discrepancies can often arise from differences in attribution — one partner thinks it deserves credit for a conversion, but a partner that has more visibility into the entire plan attributes it to someone else.
  • Click discrepancies are typically harder to nail down, as they sometimes involve the murky world of fraudulent activity.
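The 20% rule of thumb above can be written down as a small check. This is only a sketch: the post does not say which party's number serves as the baseline, so dividing by the larger of the two counts is an assumption here, and the function names are made up for illustration.

```python
def relative_discrepancy(ours: int, theirs: int) -> float:
    """Return the gap between two reported counts as a fraction of the larger one.

    Assumption: the larger count is the baseline. Some teams divide by
    their own number or by the ad server's number instead.
    """
    baseline = max(ours, theirs)
    if baseline == 0:
        return 0.0  # neither party reported anything, so nothing to dispute
    return abs(ours - theirs) / baseline


def is_discrepancy(ours: int, theirs: int, threshold: float = 0.20) -> bool:
    """Flag the pair when the gap exceeds the threshold (20% by default)."""
    return relative_discrepancy(ours, theirs) > threshold
```

With this definition, 1,000 clicks reported by one party against 750 by another is a 25% gap and would be flagged, while 1,000 against 900 would not.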

Infrastructure & its effect on discrepancies

The infrastructure of online advertising has made great strides in recent years, and a number of companies have popped up to deal with many issues advertisers face. For instance, how does an advertiser know that the ads are running as expected? Perhaps they are appearing on a different part of the website, either by mistake or intentionally. While we, and many other companies, have measures in place to manage many of these concerns, an opportunity opened in the ad tech ecosystem for companies to specialize solely in online ad verification, and those companies have become critical partners of ours.

Ad Verification Partners

These companies, at a basic level, put a piece of code within the ads that identifies the site on which the ad is run. Often, they employ sophisticated schemes to penetrate iframes in order to see the underlying page (2). They wrap the original ad tag in a script that detects the page. If the page matches a predetermined list of allowed pages (or, alternatively, does not match a predetermined blacklist of websites the ad should not appear on), the impression is delivered. If not, it is blocked.
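The deliver-or-block decision described above boils down to a membership test on the detected page. The sketch below shows only that decision logic; a real verification wrapper runs as script in the browser, and the function names, the host-level matching, and the choice to block when no page can be detected are all assumptions made for illustration.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def page_host(page_url: str) -> str:
    """Extract the lowercase host name from a detected page URL ('' if none)."""
    return (urlsplit(page_url).hostname or "").lower()


def should_deliver(
    page_url: str,
    allowed: Optional[Set[str]] = None,
    blocked: Optional[Set[str]] = None,
) -> bool:
    """Decide whether to deliver the impression for the detected page."""
    host = page_host(page_url)
    if not host:
        return False  # assumption: block when the page cannot be detected
    if allowed is not None:
        return host in allowed  # whitelist mode: only listed hosts pass
    if blocked is not None:
        return host not in blocked  # blacklist mode: listed hosts are blocked
    return True  # no list configured, so nothing to enforce
```

For example, `should_deliver("https://news.example.com/sports", allowed={"news.example.com"})` returns `True`, while the same call with a page on a blacklisted host returns `False`.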

As a first pass, it is essential that the ad verification company’s whitelist/blacklist agrees with the corresponding list held by the demand-side platform (DSP). As sites are flagged by the ad verification partner, so too should the DSP’s list be updated.
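Checking that the two lists agree is a set comparison. The following is a minimal sketch, assuming both lists can be exported as plain sets of host names; the function and key names are hypothetical.

```python
from typing import Dict, Set


def list_drift(verifier_blocked: Set[str], dsp_blocked: Set[str]) -> Dict[str, Set[str]]:
    """Report entries that appear on one blacklist but not on the other."""
    return {
        # sites the verification partner flagged that the DSP still allows
        "missing_from_dsp": verifier_blocked - dsp_blocked,
        # sites the DSP blocks that the verification partner does not
        "missing_from_verifier": dsp_blocked - verifier_blocked,
    }


drift = list_drift(
    verifier_blocked={"bad.example.net", "worse.example.org"},
    dsp_blocked={"bad.example.net"},
)
```

Anything under `missing_from_dsp` is a site the DSP may still buy on even though the verification partner will block the impression there, which is one direct source of impression discrepancies.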

The Exchange Landscape

However, in the advertising exchange landscape, this verification process gets even more complicated. Real-time bidding exchanges work by sending a bid request to many DSPs. This is usually in the form of a JSON POST containing information about the request (e.g., IP address, URL). If the supply partner is not employing the same reporting methods as the ad verification partner, there will be instances where the buying platform and the verification company disagree on which site the ad appeared on.
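To make the disagreement concrete, here is a rough sketch of reading the site out of a bid request and comparing it with the page a verification script detected. The field names (`site.page`, `site.domain`, `device.ip`) follow the OpenRTB convention, which is an assumption since the post does not name a specification, and the sample payload and function names are invented for illustration.

```python
import json
from urllib.parse import urlsplit

# Hypothetical, heavily trimmed bid request body as a DSP might receive it.
SAMPLE_BID_REQUEST = """
{
  "id": "req-123",
  "site": {"domain": "news.example.com", "page": "https://news.example.com/sports"},
  "device": {"ip": "203.0.113.7"}
}
"""


def declared_host(bid_request_json: str) -> str:
    """Return the host the supply partner declared in the bid request."""
    request = json.loads(bid_request_json)
    site = request.get("site", {})
    # Prefer the full page URL; fall back to the bare domain field.
    host = urlsplit(site.get("page", "")).hostname or site.get("domain", "")
    return host.lower()


def sites_agree(bid_request_json: str, detected_page_url: str) -> bool:
    """Compare the declared host with the one the verification script saw."""
    detected = (urlsplit(detected_page_url).hostname or "").lower()
    return declared_host(bid_request_json) == detected
```

When `sites_agree` returns `False`, the buying platform will report the impression against the declared site while the verification company reports it against the detected one.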

In cases like these, it is essential to get as much detail as possible from all partners in order to resolve the discrepancy. If Advertiser 1 says, “I served an impression on X site at Y timestamp,” and Verification Company 2 says, “Actually, at that timestamp Y, I detected a request for nefarious site Z,” it can become apparent where the problem originated. Updating the blacklist afterward can keep further instances from occurring.
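Lining up the two partners' records can be sketched as a lookup by timestamp. This assumes both logs share an exactly matching timestamp for the same impression, which real logs rarely do (a shared impression ID or a tolerance window would be safer); the log layout, sample values, and function name are hypothetical.

```python
from typing import List, Tuple

LogEntry = Tuple[str, str]  # (timestamp, site)


def find_site_mismatches(
    buyer_log: List[LogEntry], verifier_log: List[LogEntry]
) -> List[Tuple[str, str, str]]:
    """Return (timestamp, served_site, detected_site) where the two logs disagree."""
    detected_by_time = {ts: site for ts, site in verifier_log}
    mismatches = []
    for ts, served_site in buyer_log:
        detected_site = detected_by_time.get(ts)
        # Skip timestamps the verifier never saw; only report real conflicts.
        if detected_site is not None and detected_site != served_site:
            mismatches.append((ts, served_site, detected_site))
    return mismatches


buyer_log = [("12:00:00", "x.example.com"), ("12:00:05", "x.example.com")]
verifier_log = [("12:00:00", "z.example.net"), ("12:00:05", "x.example.com")]

mismatches = find_site_mismatches(buyer_log, verifier_log)
blacklist_candidates = {detected for _, _, detected in mismatches}
```

Here the first impression was sold as `x.example.com` but detected on `z.example.net`, so `z.example.net` becomes a candidate for the blacklist update.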

This seems to be a random, sporadic occurrence. But could something more insidious be at work?

Check back for part 2 in this series, when I take a deeper dive into bot activity, its impact on click discrepancies, and how we work to root it out.

 

1. Bug: Interestingly, the use of “bug” to refer to errors or glitches in a program is often traced to a literal incident in which a moth was found trapped in a hardware relay.
2. iframe: Many publishers, in an effort to control how the page is laid out, will employ iframes to place ads in the proper spots. As an iframe creates a new HTML document, it can be difficult to discern the iframe’s parent. Different rendering engines offer different tools that can be used here. For instance, WebKit offers the ancestorOrigins property on the window.location object (a list of origins, not a method), allowing some visibility into what document created the iframe.

PRASANNA SWAMINATHAN

Director, Developer Relations

Prasanna Swaminathan is the Director of Developer Relations at MediaMath. In addition to teaching people how to pronounce his name, he leads a team that works with clients across the globe to open our API and build on our platform. Prasanna holds a BA in Physics from the University of Pennsylvania. Prior to joining MediaMath, he worked on power relay firmware systems, though he likes to think that prior to joining MediaMath, the matter/anti-matter ratio was even.