Like many of yours, my company had a holiday party, and about two weeks before the happy event my wife was online shopping for a dress. She soon found one and asked me over to her computer to look it over. I did and thought nothing of the event until twenty-four hours later when, to my surprise, I began to see ads for the exact same dress from the exact same merchant pop up on my computer. Now, I consider myself a relatively tech-savvy person, but this was something new even for me. After all, I had not searched for the dress on my computer nor even landed on the merchant’s page in over a year. How on earth, I wondered, did this merchant manage to target ads on my browser for something I had only looked at on someone else’s computer? It could not be a coincidence, so how did it happen? How did a glance follow me from one person’s device to another?
Of course, this kind of mystery was too good to leave alone so I started to do some research and soon discovered that my experience was the outcome of some of the most clever technology in the e-commerce world: cross-device matching. The best explanation I found for this phenomenon is on the blog of Gartner analyst Martin Kihn, who explains it all over a few excellent posts.
Kihn traces the origin of this technique to an obscure innovation (U.S. Patent 8,438,184 B) by a company called Adelphic. This company’s job is to help marketers target online ads. Marketers know that people today move across devices all the time, so their challenge is to make sure, regardless of device, that their ad is getting to the right person. In other words, Marketer A is happy to pay for ads targeting Person X but wants those ads to reach her on Devices 1…N. To match a consumer across devices, the marketer must be able to (a) distinguish one device from another and (b) match each unique device to a specific consumer.
As Kihn explains, there are two broad strategies for solving this challenge: deterministic and probabilistic. The former means knowing who owns which device and then matching that owner record to the targeted ad. The latter means estimating the likelihood that the user of a given device is the target customer and acting accordingly. Kihn notes that “all of the major stand-alone third-party matching vendors — Tapad, Drawbridge, Oracle’s Crosswise, Adbrain — use both deterministic and probabilistic methods.” Indeed, the deterministic approach was easier when device makers associated a unique device ID with each server call. But Apple, for example, stopped doing this a few years ago and replaced that very specific matching data with something new:
In its place, both Apple and Android created a unique, consumer-controlled ID available to apps selling ads. This is called IDFA and AdID, respectively. So apps can choose to share this ID if they want, but only if (1) the consumer downloaded their app, (2) is using it, (3) has not manually opted out of ad tracking (which they can do with both IDFA and AdID), and (4) the app really sells ads.
However, matching on IDFA or AdID poses problems for marketers:
…it’s not available to mobile browsers, people the app doesn’t want to share with, opted-out consumers, and non-ad-sellers. In other words, it’s not a cure-all. And it is tied to a device, not a person.
This is where Adelphic’s patent enters our story:
The patent refers to something it calls a “signature.” This is a combination of attributes that collectively may be used to identify a unique device. These attributes are pieces of information that are shared in the course of routine communications with the app publisher or mobile website owner.
We described some of these attributes last time. They include:
- “system-type” data such as OS version, local time, phone model
- “usage-type” data such as headers, user query parameters, referrer, plug-in data, location, URLs viewed
How is this “signature” created? The patent describes it as a kind of list that contains some or all of the above-mentioned system and usage attributes that have been encoded in such a way that they can be quickly compared to similar signatures sitting in the master ID database. The attributes can be encoded as numbers, categories, or even distributions. Of course, it is not simply a list of all attributes. Much of any data set is noise. It is a selection of those attributes most likely to mean something to the system (the selection process is described below).
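To make the idea of a “signature” concrete, here is a minimal sketch in Python. It is not the patent’s actual encoding, which this post does not reproduce. The attribute names, the `build_signature` and `agreement_vector` helpers, and the sample values are all hypothetical, chosen only to show how a fixed selection of attributes can be compared position by position.

```python
# Hypothetical attribute names, chosen for illustration only.
# A real system would pick these through the selection process the patent describes.
SELECTED_ATTRIBUTES = ("os_version", "phone_model", "utc_offset_hours", "referrer_domain")


def build_signature(raw_attributes):
    """Keep only the selected attributes, in a fixed order.

    Attributes the device did not report are stored as None.
    Anything outside SELECTED_ATTRIBUTES is treated as noise and dropped.
    """
    return tuple(raw_attributes.get(name) for name in SELECTED_ATTRIBUTES)


def agreement_vector(sig_a, sig_b):
    """For each position, report whether both signatures hold the same known value."""
    return tuple(a is not None and a == b for a, b in zip(sig_a, sig_b))


# An unmatched device, with one extra attribute that is not part of the signature.
new_device = build_signature({
    "os_version": "10.3",
    "phone_model": "A1660",
    "utc_offset_hours": -5,
    "screen_brightness": 0.4,  # not selected, so it is dropped
})

# A signature already sitting in the (hypothetical) master ID database.
known_device = build_signature({
    "os_version": "10.3",
    "phone_model": "A1660",
    "utc_offset_hours": -8,
    "referrer_domain": "news.example",
})

print(agreement_vector(new_device, known_device))
```

The agreement vector is only raw material. Deciding how much each agreement or disagreement should count is the job of the weighting step discussed next.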
Adelphic’s patent, then, describes a method for looking at a whole set of data points and then calculating the likelihood that User X on Device Y is the person the marketer’s ad is trying to reach. In his analysis Kihn notes that Adelphic’s patent refers to a 1969 research paper called “A Theory for Record Linkage” that, in the patent’s words, “takes into account a wider range of potential identifiers, computing weights for each identifier based on its estimated ability to correctly identify a match or a non-match, and using these weights to calculate the probability that two given records refer to the same entity.” In other words, Record Linkage is a kind of hybrid two-step approach that combines known attributes with probabilistic models to complete the match:
Step 1: Take all the available attributes and figure out which ones deserve more weight, depending on how well they identify people
Step 2: Go through the master ID database full of signatures and figure out whether the particular device matches any of them
Kihn does a nice job of explaining these two steps further:
Step (1) here is a classic machine learning problem and can be done either on labeled or unlabeled data (i.e., records that we have already matched to people or ones that we haven’t). The preferred method mentioned is to take the master ID database and look at devices that have already been matched using deterministic methods (e.g., by email or phone no.).
The system can then look at all the various attribute data also captured with those devices and run machine learning algorithms to estimate the weights for different attributes. (If you’re interested, the specific algorithm mentioned is EM, or Expectation Maximization.)
The output of step (1) is a “rank score function,” or a formula that can take the attributes on the unmatched device and bump them up against the device attributes in the master ID database (that is, for already-known devices) and calculate a score. This score is a number from 0-1, with 0 meaning definitely-no-match and 1 meaning oh-yes-match-baby. A higher number means more probably a match. This is step (2).
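For readers who like to see the arithmetic, here is a rough sketch of how a score like this could be computed in the spirit of the record linkage paper. It is not Adelphic’s formula. The attribute names, the probabilities in `M_U`, the `prior`, and the `match_probability` function are all assumptions for illustration. In particular, the weights here are set by hand, where the patent reportedly estimates them with EM.

```python
import math

# Hypothetical weights, set by hand for illustration.
# For each attribute: (m, u), where
#   m = chance the attribute agrees when two records ARE the same user
#   u = chance the attribute agrees when they are NOT the same user
M_U = {
    "ip_prefix": (0.90, 0.01),
    "news_site": (0.80, 0.10),
    "hour_of_day": (0.70, 0.20),
}


def match_probability(agreements, prior=0.01):
    """Turn a set of agree/disagree observations into a score between 0 and 1.

    `agreements` maps attribute name -> True (agrees) or False (disagrees).
    `prior` is an assumed baseline chance that two random records match.
    """
    log_odds = math.log(prior / (1 - prior))
    for name, agrees in agreements.items():
        m, u = M_U[name]
        if agrees:
            # Agreement on a rarely-shared attribute is strong evidence for a match.
            log_odds += math.log(m / u)
        else:
            # Disagreement pushes the score down.
            log_odds += math.log((1 - m) / (1 - u))
    # Convert log-odds back to a probability in the range 0 to 1.
    return 1 / (1 + math.exp(-log_odds))


all_agree = match_probability({"ip_prefix": True, "news_site": True, "hour_of_day": True})
none_agree = match_probability({"ip_prefix": False, "news_site": False, "hour_of_day": False})
print(f"all attributes agree:    {all_agree:.3f}")
print(f"all attributes disagree: {none_agree:.6f}")
```

With these made-up numbers, agreement on a shared IP prefix moves the score far more than agreement on time of day, which is the point of weighting attributes by how well they identify people.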
How does this work in the real world? I can’t say exactly how I got matched but let’s speculate that the marketer of my wife’s dress (or its agent) has extensive models that say that the best way to match a person’s advertising identifier across devices is to match four variables:
- Web Site
- IP Address
- Time of Day
- Location
Let’s say, then, that on a Monday evening at my house my wife went on a fashion site and looked at a dress that she first came across on a banner ad on a news web site. The next day I visit the same news web site at about the same time on a computer located at the same address coordinates and sharing a core IP address. The marketer’s software now has a four-way identity match and decides that my wife is now browsing on my computer and so serves up the same ad for the same dress. As Kihn notes, “this is where the fun begins,” which it certainly did for me. It might also be lots of fun for people to know that their browsing and shopping history are serving up ads on their spouse’s or children’s devices in almost real time.
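My speculative scenario can be sketched in a few lines of Python. Everything here is hypothetical: the `Visit` record, the helper functions, the thresholds (same first three octets of the IP address, within an hour, coordinates rounded to three decimals), and the sample data. I am also assuming the four variables are web site, IP address, time of day, and location, the last one inferred from the scenario itself. This is how such a check might look, not how any real ad platform does it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Visit:
    """One page view as an ad platform might log it (hypothetical fields)."""
    site: str
    ip: str
    when: datetime
    lat: float
    lon: float


def ip_prefix(ip):
    """First three octets of an IPv4 address: my stand-in for a 'core' IP address."""
    return ".".join(ip.split(".")[:3])


def minutes_apart(a, b):
    """Difference in time of day, ignoring the date and wrapping around midnight."""
    diff = abs((a.hour * 60 + a.minute) - (b.hour * 60 + b.minute))
    return min(diff, 24 * 60 - diff)


def matching_variables(a, b):
    """Return the names of the variables on which two visits agree."""
    matched = []
    if a.site == b.site:
        matched.append("web_site")
    if ip_prefix(a.ip) == ip_prefix(b.ip):
        matched.append("ip_address")
    if minutes_apart(a.when, b.when) <= 60:
        matched.append("time_of_day")
    if round(a.lat, 3) == round(b.lat, 3) and round(a.lon, 3) == round(b.lon, 3):
        matched.append("location")
    return matched


# Monday evening: my wife's computer. Tuesday evening: mine. Sample data only.
wife = Visit("news.example", "203.0.113.7", datetime(2017, 12, 4, 20, 15), 40.7128, -74.0060)
me = Visit("news.example", "203.0.113.9", datetime(2017, 12, 5, 20, 40), 40.7128, -74.0060)

matched = matching_variables(wife, me)
print(matched)
print("serve the dress ad" if len(matched) == 4 else "no match")
```

A real system would more likely feed these agreements into a weighted score than demand that all four line up, but the all-or-nothing version is enough to show why two computers in one house can look like one shopper.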
Now, just to be clear: the advertising industry is adamant that all this is not as bad as it may sound, as one 2015 MIT Tech Review article noted:
Advertisers stress they match potential buyers across platforms without gathering individual names. “People freak out over retargeting. People think someone is watching them. No one is watching anyone. The machine has a number,” says Roland Cozzolino, chief technology officer at MediaMath, a digital advertising company that last year bought Tactads, a cross-device targeting agency. “I don’t know who you are, I don’t know any personal information about you. I just know that these devices are controlled by the same user.”
I believe Cozzolino but perhaps he’s missing the point. What is not possible/permissible today is often possible/permissible tomorrow. Who’s to say that Adelphic or one of its competitors won’t soon be able to match devices not just to consumer identifiers but to a name and address? That’s not very hard to imagine or risky to predict, given where things stand with internet privacy. Indeed, it strikes me that there may be a lot of people out there who do not realize that the ads they are seeing may be triggered by the online activities of spouses or children (or even houseguests). Seen in that light, can anyone see a strange ad and not wonder now what its source really is?
All to say that as the 2017 holiday season comes to an end, the idea of shopping privately for gifts has also come to an end. As in Dickens’ classic tale, we each carry an invisible chain that follows us through life and from which it’s impossible to detach ourselves. Whatever our browsing history, good or bad, it is now forever present, forever shareable, and forever a part of our future as much as our past.
PS: For a different take on this issue, see this video clip from the band Big Data on AdAge. Maybe some people will want their history out in the open….