When an advertising network is turned off and traffic plummets (top) while conversions stay steady (bottom), investigators have pinpointed fraudulent ad traffic. Source: Augustine Fou

A brand manager at a medical device company was examining a chart of daily traffic to his website. On the chart he saw that one day in August, web traffic plunged, and locked in at a new normal about half of its previous level. For someone responsible for a website’s traffic, the scale and swiftness of the drop should have made him gasp, but the manager nodded knowingly.

His counterpart, Augustine Fou, a digital marketing consultant, returned the nods and grinned. The consultant and his client both knew what happened on August 15. They had turned off one of the advertising networks that sent traffic to the site from a patchwork of websites managed by the ad network. The planned shutdown was a test to confirm a hypothesis about the site’s traffic specifically, and digital marketing in general. As expected, half of the incoming traffic vanished.

But the real revelation would be whether conversions also declined in lockstep. If they did, then the lost traffic was “human,” because a predictable proportion of true site visitors is expected to respond to the advertiser’s call to action. But if conversions didn’t contract, then the traffic that disappeared was plausibly machine generated, inserted by an ad network infected with what Fou calls “bad guy sites.”

Fou produced a second chart. This one showed that conversions did not contract.
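The logic of the shutoff test can be sketched in a few lines of Python. This is only an illustration of the reasoning described above, not Fou's actual tool: the function names, the 10 percent tolerance, and the visit and conversion counts are all hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (-0.5 means a 50% drop)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


def shutoff_verdict(visits_before, visits_after,
                    conversions_before, conversions_after,
                    tolerance=0.10):
    """Compare traffic and conversions around an ad-network shutoff.

    `tolerance` is an assumed noise band: changes smaller than this
    are treated as "no change".
    """
    traffic_change = relative_change(visits_before, visits_after)
    conversion_change = relative_change(conversions_before, conversions_after)

    if traffic_change > -tolerance:
        return "no meaningful traffic drop"
    if conversion_change >= -tolerance:
        # Traffic fell but conversions held: the lost visits converted nobody.
        return "lost traffic was plausibly machine generated"
    return "lost traffic included real visitors"


# Hypothetical daily averages before and after the network was turned off.
print(shutoff_verdict(10_000, 5_000, 200, 198))
print(shutoff_verdict(10_000, 5_000, 200, 100))
```

The first call mirrors the medical-device case (traffic halves, conversions hold); the second shows what the charts would have looked like had the lost visitors been human.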

Fou’s “bad guys” are yet another incarnation of Internet scammers, sharing certain characteristics with more familiar foes such as email spammers and phishers. Their honey pot is the $50-billion-a-year market for digital advertising. These bad guys write computer code designed to illicitly divert a significant chunk of advertising dollars to line their own pockets. In so doing, they destroy the promise that digital advertising has thrived on—that it could make advertising efficient and accountable.

 

Hunting the Bots

Fou, a prodigy who earned a Ph.D. from MIT at 23, belongs to the generation that witnessed the rise of digital marketers, having learned his trade at American Express, one of the most successful American consumer brands, and at Omnicom, one of the largest global advertising agencies. Eventually stepping away from corporate life, Fou started his own practice, focusing on digital marketing fraud investigation.

Fou’s clients install a small piece of code on their websites, which tracks and analyzes the traffic into these properties. Marketing managers pay close attention to the “referrer.” Examples of referrers are a search engine, a blog post, and a Facebook message. When used properly, the referrer data inform marketers about the source of incoming traffic, allowing them to optimize spending across different channels. In practice, such data are easy to fake and manipulate, causing widespread misallocation of resources. Fou’s tool classifies these referrers into several types, based on how likely the supplied traffic is to be “fake.”

Fake traffic is colored red in Fou’s tracking charts, such as the top graph above. (This chart uses data from a real website but the data have been obscured to preserve confidentiality.) Prior to the experiment, about 25 percent of the traffic bled red. “This traffic came from the bad guy sites,” Fou explained. “The blue traffic are legitimate–certified human visitors.” For the medical-device client, Fou discovered that most of the fake traffic originated from one advertising network. This was why he advised the client to turn off the network so they could observe any changes to the traffic and conversions.

Fou’s experiment proved that fake traffic is unproductive traffic. The fake visitors inflated the traffic statistics but contributed nothing to conversions, which stayed steady even after the traffic plummeted (bottom chart). Fake traffic is generated by “bad-guy bots.” A bot is computer code that runs automated tasks. Many bots are benign, such as the ones created by search engines to discover web pages. The challenge facing fraud investigators like Fou is to distinguish between legitimate bots and bad-guy bots.

To see how the bad guys operate, we follow the money trail.

The medical-device company spends multiple millions of dollars on digital advertising each year. In the display advertising world, advertisers pay per thousand ad impressions, a rate known as CPM (cost per thousand). For a given campaign, the manager might spend $100,000 with an advertising network, which we call DeepNet. At an average CPM of $1, this budget buys 100 million ad impressions. Each impression is an ad that loads when someone browses a webpage belonging to one of the member websites of DeepNet. DeepNet acts as an exchange on which its members, or publishers in industry-speak, bid for the advertisers’ dollars.

For every thousand ad impressions delivered to browsers, the advertiser pays DeepNet one dollar. DeepNet then splits the revenues with its member-publishers. At a 50-50 split, the publisher earns 50 cents. The business relies on volume. Many websites, especially new sites, have come to rely on such advertising revenues as their lifeblood.
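The CPM arithmetic is simple enough to check in a short Python sketch. The function names are ours, chosen for illustration; the figures are the ones used in the DeepNet example.

```python
def impressions_bought(budget_dollars: float, cpm_dollars: float) -> float:
    """Number of ad impressions a budget buys at a given CPM."""
    return budget_dollars / cpm_dollars * 1000


def publisher_earnings(impressions: float, cpm_dollars: float,
                       publisher_share: float) -> float:
    """Publisher's cut, in dollars, for a number of delivered impressions."""
    return impressions / 1000 * cpm_dollars * publisher_share


# A $100,000 budget at a $1 CPM.
print(impressions_bought(100_000, 1.0))

# A publisher's earnings per thousand impressions at a 50-50 split.
print(publisher_earnings(1_000, 1.0, 0.5))
```

At fractions of a dollar per thousand impressions, a publisher needs enormous volume to earn meaningful revenue, which is what makes manufactured volume so tempting.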

Now, imagine DeepNet is infected with fake websites, which compete for the advertisers’ dollars, side by side with legitimate websites. Traffic reports from DeepNet indicate that lots of people have seen the display ads but in fact, a large portion of these “eyeballs” aren’t real. They’re bots creating the appearance of an ad impression. Not long ago, these bots operated from computers taken over by malware but the fraudsters now prefer to generate fake browsing sessions in the cloud, because it’s more efficient.

 

Fake Clicks on Fake Sites

Only large-scale, deep-pocketed publishers, such as the New York Times, can afford a direct sales staff. Most digital publishers are matched to advertisers on exchanges. Even legitimate publishers frequently send their less desirable “remnant” traffic to the exchanges (traffic that comes in at odd hours, for example, and is unlikely to produce advertiser conversions).

Though relied upon for traffic, some of the advertising networks themselves are uncontrolled and opaque.

With the proliferation of cloud services, such as Amazon’s EC2, almost anyone can install a website, register it with an ad network, and start earning dollars from advertisers. The bad guys write code that generates fake websites, typically simple webpages stuffed with banner ads, and then start collecting on ad impressions, most of which do not come from real humans visiting the site. The exchanges are designed to run themselves without much intervention, so the bad guy activities often elude detection.

Predictably, the bad guys have written code to generate hundreds or thousands of fake sites. Using a visual clustering algorithm, Fou can find these code-generated sites, many of which look generic and nearly identical to one another.

This is a lucrative business. Imagine a scammer who runs a network of 1,000 fake sites. Each site reports average traffic of one million page views per month. Conservatively, assume 10 ads populate each page. At an 80 percent sell-through, a CPM of 75 cents, and a 40 percent revenue share, such a network would generate $2.4 million per month. The profit margin is 99 percent, as it costs only a minuscule 0.13 cents per thousand impressions to implement the scam on reputable cloud services, and even less on lower-echelon ones.
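To make the scammer's math explicit, here is a back-of-the-envelope sketch in Python using the figures from the scenario. The function and field names are hypothetical, and one detail is our assumption: the cloud cost is applied to every impression generated, not only to the 80 percent that sell.

```python
def scam_economics(sites, page_views_per_site, ads_per_page,
                   sell_through, cpm_dollars, revenue_share,
                   cost_per_thousand_dollars):
    """Monthly revenue, cost, and margin for a network of fake sites."""
    impressions = sites * page_views_per_site * ads_per_page
    sold = impressions * sell_through
    revenue = sold / 1000 * cpm_dollars * revenue_share
    # Assumption: the scammer pays to generate all impressions, sold or not.
    cost = impressions / 1000 * cost_per_thousand_dollars
    margin = (revenue - cost) / revenue
    return {"impressions": impressions, "revenue": revenue,
            "cost": cost, "margin": margin}


result = scam_economics(
    sites=1_000,
    page_views_per_site=1_000_000,
    ads_per_page=10,
    sell_through=0.80,
    cpm_dollars=0.75,
    revenue_share=0.40,
    cost_per_thousand_dollars=0.0013,  # 0.13 cents, expressed in dollars
)
print(f"impressions per month: {result['impressions']:,}")
print(f"scammer revenue:       ${result['revenue']:,.0f}")
print(f"cloud cost:            ${result['cost']:,.0f}")
print(f"profit margin:         {result['margin']:.1%}")
```

Under these inputs the revenue works out to roughly $2.4 million against a cloud bill in the low tens of thousands of dollars, which is where the 99 percent margin comes from.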

Spotting these networks of bad guy sites isn’t difficult. They show up to 100 percent click-through rates! In some cases, the data say every ad on a site has been clicked—an impossible scenario for a human visiting a site.
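A crude version of this check is easy to write. The Python sketch below flags sites whose click-through rate is implausibly high; the site names, the counts, and the 50 percent threshold are all made up for illustration, and a real detection system would need far more than a single cutoff.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks divided by impressions for one site."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def flag_suspicious_sites(stats: dict, threshold: float = 0.5) -> list:
    """Return sites whose click-through rate is at or above `threshold`.

    `stats` maps a site name to a (clicks, impressions) pair.
    The default threshold is an arbitrary, assumed cutoff.
    """
    return [site for site, (clicks, impressions) in stats.items()
            if click_through_rate(clicks, impressions) >= threshold]


# Hypothetical per-site report from an ad network.
report = {
    "site-a.example": (12, 10_000),      # a small fraction of ads clicked
    "site-b.example": (10_000, 10_000),  # every single ad clicked
    "site-c.example": (9_100, 10_000),
}
print(flag_suspicious_sites(report))
```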

 

If It’s Broke, Why Don’t We Fix It?

Even though it’s easy to see and measure the problem, it persists.

Some of the reasons are technical. Spotting and shutting down offending site networks is a bit like playing whack-a-mole. Knock one out, another pops up. The Interactive Advertising Bureau maintains a list of a few hundred known-to-be-nefarious bots, but the number of scam sites is many orders of magnitude larger. Blacklists that block offending sites were a popular tool among the first generation of spam fighters too, but that approach is known to be unacceptably inaccurate.

Currently, there is little pressure on the bad guys to disguise their activities. If and when the industry gets serious about combating fraud, a technological arms race will develop, similar to the one between spammers and anti-spammers. Despite the best efforts there, spam never goes away.

Changing the business model from tracking ad impressions to counting click-throughs won’t work, either. In fact, a scheme similar to the one described here infects the equally vast search advertising market. The scammers simply add a little code that provides the appearance of a click-through.

But perhaps the most insidious reason the problem is growing unabated is the set of disincentives to fix it. Marketers have little motivation to deflate their own statistics. Currently, pay-per-lead and pay-per-sale models account for only 15 percent of the $50-billion-a-year digital advertising market. The rest relies on robust traffic.

Also, this illicit activity generates unintended profits for everyone up and down the supply chain. In the scenario above, the $2.4 million monthly revenue represented only 40% of the money generated. The rest went to the advertising network and other intermediaries. The advertising agencies who manage the budgets for advertisers also benefitted. As Fou remarked, “It’s very difficult to get anyone to pay attention to this problem. Everyone has something to gain from the fake traffic.”

As long as that’s the case, a see-no-evil attitude will persist. And, by Fou’s estimate, half of the $50-billion digital marketing industry will remain fraudulent. So much for making advertising more efficient and accountable.

Image and article via Harvard Business Review