March 30, 2020
The Attribution Problem | Strata Data Conference 2019

Good afternoon, everybody. I know post-lunch sessions are always a struggle, so I'll try to keep this as engaging as possible. And I think whenever you talk about money, it becomes interesting. I work for a company called Lenovo. Apparently, it is the largest-selling PC company, but a lot of people are still not aware of the brand as such. Whenever we talk about PCs in America, we generally talk about Apple, and obviously there are Dell and HP. I have been with Lenovo for close to 9 years now, and I have been leading its analytics functions as far as web analytics and digital marketing are concerned. One of the key projects (or shall I say business problems) that we face day to day is understanding which demand generation channel in the digital space is the most effective, and how much money we should invest in those programs or channels. So today's topic is mostly about how we found a way to build a merit-based, multi-touch attribution solution, whereby we have a clear idea of how much money we should invest in which programs to drive the maximum ROI. The key talking points of this presentation are: what attribution actually is, and how it relates to media spend in organizations that are more direct and digitally transformed; the most common methodologies used in the industry today, and their shortcomings; and then, in detail, the case study I mentioned, which takes a very different approach to solving the attribution problem. So, before we go into the problem statement, I would like to know how many of you actually know what attribution
means. You know, Mike. That's fine. Yeah, go ahead. Can you be a little more specific about what you mean by intervention? That's correct. That's correct. So basically we are talking about the touchpoints. What attribution strives to do is find a quantifiable way to connect the effort we invest in reaching our customers to the success of getting those customers on board, or actually making a sale. And one of the most important questions we keep asking is: if I have, say, X dollars, in which programs (or, as he said, interventions or touchpoints) should I invest those X dollars judiciously so that we derive the maximum return on investment? In other words, where can we drive the most sales? Is it search engine marketing? Is it paid social? Is it affiliate? Is it email? Tele? You name it. So, the commonly used methods in the
industry are either first touch or last touch. What do I mean by first touch? First touch is when we make someone aware of the brand, or of the product, through our marketing engines: a touchpoint where somebody sees, say, an ad in a display campaign or a social media campaign and clicks through, thinking, okay, I'm interested in a PC for my company, and I would like to explore this brand further. That's the first touch. But there are some common misconceptions, as you would understand. If I give 100% of the weight to the first touch, the lead generation team might be very happy, but the merchandisers won't be, because it does not always lead to a conversion. It might bring traffic to your website, but it does not drive or influence the conversion in any way. What it completely discounts are the other players in the customer journey, in particular the last touchpoint, where a person actually puts the product in the cart. The most popular method used in the industry is last touch. I'm hoping everybody is aware of Adobe's analytics suite. When you try to do attribution modeling there, this is what you get: last touch, which gives 100% of the credit to the converting channel. What it, in turn, discounts is the lead, or initiating, touchpoint, and even the influencing touchpoints. These are the two most widely practiced methods of arriving at attribution. But there are others. For example, when the business question is how to distribute credit equally across all our touchpoints, and what that yields, we use what we call a linear model. When weight is given to both first and last touch, but less to the influencing channels in between, we use U-shaped models. 
And when more weight is given to the channels closer to the cart, we generally use the J-shaped model. But what all these widely practiced methods completely discount, or do not take into account, is the customer journey. In the digital space the customer journey is not a single touch, nor linear, nor heavily skewed toward one end. It is pretty random. There is no method to this madness unless we build a model that tries to predict or forecast some of that behavior and then assigns weights accordingly. So what do I mean by a
customer journey? Here is a simple example. On day 1, our customer clicks on a display ad. After he has looked at a ThinkPad in a browser, we are able to use cookie data to do, say, look-alike modeling, and we think this person might be interested in a PC. We send out that content through our display campaign and voila: we have a customer coming to our website to explore the ThinkPad. But he does not put the product in the cart. He browses. He explores. He researches. He tries to understand the configurations better. He might also leave the website, read reviews, and look at other content in the public domain to decide whether this is a brand to side with. But he does not convert: he does not put the product in the cart or check out. Then, say, after 4 days or so, he comes back and does more research, after getting a feel for the brand from sources outside of lenovo.com. Still, he does not buy. Then on day 9, now that we have enough lookback-window data through cookies, we send an email. Maybe this person registered on our website to get notifications, so we have his ID. We send an email saying, hey, there is a 10% discount on the ThinkPad you were looking at; why don't you go ahead and check it out? Discounts are always lovely, so he goes and converts immediately. Now, the question is: in this entire customer journey there have been several touchpoints. There was a display touchpoint that triggered brand or product awareness. There was a self-initiated research process in between, which is a direct interaction. And then an email campaign facilitated the final purchase. How do we weigh these? Whom do we credit, and how much? Do we credit display more? The direct channel? Or the email channel? 
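To make the contrast concrete, here is a minimal Python sketch of the rule-based models applied to this journey (display, then direct, then email). The first-touch, last-touch, and linear rules follow the descriptions above; the 40/20/40 U-shaped split and the 20/20/60 J-shaped split are common industry conventions assumed here, not Lenovo's settings, and `position_weights` is a hypothetical helper.

```python
MODELS = ("first_touch", "last_touch", "linear", "u_shaped", "j_shaped")


def position_weights(n, model):
    """Return the share of conversion credit for each of n ordered touchpoints.

    The weights depend only on position in the journey, never on observed data.
    """
    if n < 1:
        raise ValueError("a journey needs at least one touchpoint")
    if n == 1:
        return [1.0]  # a single touch gets all the credit under every rule
    if model == "first_touch":
        return [1.0] + [0.0] * (n - 1)
    if model == "last_touch":
        return [0.0] * (n - 1) + [1.0]
    if model == "linear":
        return [1.0 / n] * n
    if model == "u_shaped":
        # Assumed convention: 40% first, 40% last, 20% shared by the middle.
        if n == 2:
            return [0.5, 0.5]
        return [0.4] + [0.2 / (n - 2)] * (n - 2) + [0.4]
    if model == "j_shaped":
        # Assumed convention: 20% first, 60% last, 20% shared by the middle.
        if n == 2:
            return [0.25, 0.75]
        return [0.2] + [0.2 / (n - 2)] * (n - 2) + [0.6]
    raise ValueError(f"unknown model: {model}")


journey = ["display", "direct", "email"]
for model in MODELS:
    credit = dict(zip(journey, position_weights(len(journey), model)))
    print(model, {ch: round(w, 2) for ch, w in credit.items()})
```

Every one of these rules hands out the same split for any three-touch journey, whatever the data says, which is exactly the blind spot described above.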
The popular belief, if I go by last touch, would be: hey, email converted it. But if that display ad had never shown up, he wouldn't even have been aware of the brand, let alone purchased the product. So we cannot say display had no role to play, and we cannot say direct had no role to play either. Each made some contribution to the overall purchase. This brings me to a somewhat comical analogy: a soccer game. We know a soccer team has a goalkeeper, defenders, midfielders, and strikers. In this simple example, the goalie passes the ball to a defender, who passes it to a second defender. Then it goes to a midfielder, who passes it to the striker, and the striker scores the goal. If I compare my defenders to the display campaign in the previous use case, the midfielder to the direct channel, and the striker to email, you will see there were two defenders who passed the ball on to the midfielder, who then passed it to the striker to score. Each of them played a part. So how do we weigh them? How do we understand who contributed more, who had more impact? And that's where, after we had explored several solutions and tested
these methods, we ultimately arrived at a method called a Markov chain. A Markov chain is a probabilistic model in which the probability of the next state depends on the current state. In other words, it is a conditional-probability approach in which each touchpoint's impact is weighed based on where it occurred in the sequence. So what are we doing with it? Here is a simple example of why this is so compelling. Let's assume display, email, and direct are the only events that lead to a purchase, and these are the probabilities of success or failure. How does the Markov model weigh success through display versus email? To calculate the contribution of display to the final purchase, we first compute the probability of success with display as one of the channels. In this example, there are two routes to success through display: display goes to direct, or display goes to email. From direct it goes to email, and from email it ends in either a purchase or no purchase. The numbers like .5, .5, 1, 0.1 are the transition probabilities between steps. Similarly, on the other route, display goes to email and then directly to a purchase. And there are other cases where display plays no part at all: the journey goes through email, then direct, and then to a purchase. So let's first calculate the probability with display as a contributor. For the first customer journey, display, direct, email, we multiply the probabilities and get .025, or 2.5%. For the other successful path through display, display, email, purchase, multiplying the probabilities again gives 0.025. Now we add those two up. 
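The path arithmetic above can be sketched in a few lines of Python. The transcript does not include the slide, so the step probabilities below are assumptions chosen only so that each path multiplies out to the 2.5% quoted in the talk; each list starts with the entry step into the first channel.

```python
from math import prod

# Hypothetical step probabilities (the slide is not in the transcript). Each list
# starts with the entry step into the first channel and is chosen only so that
# every path multiplies out to the 2.5% quoted in the talk.
SUCCESS_PATHS = {
    ("display", "direct", "email", "purchase"): [0.5, 0.5, 1.0, 0.1],
    ("display", "email", "purchase"): [0.5, 0.5, 0.1],
    ("email", "direct", "purchase"): [0.5, 0.5, 0.1],
}


def path_probability(step_probs):
    """Probability of one path: the product of its step probabilities."""
    return prod(step_probs)


def success_probability(paths, channel, include):
    """Sum the success paths that do (include=True) or do not contain channel."""
    return sum(
        path_probability(steps)
        for path, steps in paths.items()
        if (channel in path) == include
    )


p_with = success_probability(SUCCESS_PATHS, "display", include=True)
p_without = success_probability(SUCCESS_PATHS, "display", include=False)
print(f"with display: {p_with:.3f}, without display: {p_without:.3f}")
# prints: with display: 0.050, without display: 0.025
```

The `(channel in path) == include` test simply splits the successful paths into those that pass through display and those that do not.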
So collectively, the probability of success with display as a channel is about 5%. Now we compute the probability of success without display as one of the touchpoints, which is the lower pathway. We again multiply the probabilities along email (50%), direct, and purchase, and get 2.5% again. Then we compute the removal effect. What is the removal effect? It lets us infer the true value of a channel: it assesses the credit a channel should get based on its quality, not its position. To compute it, we take one minus the ratio of the probability without display to the probability with display: 1 − 0.025/0.05 = 0.5. So in this example, display has a removal effect of 50%. That means if these pathways produced an $1,800 sale of a certain product, display would have contributed $900, or half of it. By doing this, we can infer the right weights from the occurrence or non-occurrence of a particular touchpoint. Now, this is a very simplistic example. When you deal with thousands of orders and a multitude of touchpoints, it's not easy. So how do we do it? As I said, we capture our clickstream data in Adobe Analytics. We get the data in raw format, and then we have a partner called Syntasa whose platform we use to take this unstructured behavioral data and structure it against key metrics, whether revenue, profit margin, or investment, you name it. That produces a format central to what we have been talking about: the customer journey. 
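At scale, the standard first-order Markov formulation estimates transition probabilities from many journeys, computes each channel's removal effect against the full model's conversion probability, and splits revenue in proportion to the normalized removal effects. Below is a minimal sketch of that textbook approach, assuming journeys arrive as (ordered touchpoints, converted?) pairs; it is not the code running on the Syntasa platform, which the talk does not show, and the journeys are made up.

```python
from collections import defaultdict

START, CONVERSION, NULL = "start", "conversion", "null"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = [START, *path, CONVERSION if converted else NULL]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def conversion_prob(probs, removed=None, iterations=100):
    """Probability of reaching CONVERSION from START.

    A removed channel keeps a value of 0, so any traffic entering it is lost,
    exactly as if it had been redirected to NULL.
    """
    value = defaultdict(float, {CONVERSION: 1.0})  # NULL stays at 0.0
    for _ in range(iterations):  # fixed-point iteration; also handles loops
        for state, row in probs.items():
            if state != removed:
                value[state] = sum(p * value[nxt] for nxt, p in row.items())
    return value[START]


def markov_attribution(journeys, revenue):
    """Split revenue across channels in proportion to their removal effects."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)  # assumes at least one journey converted
    channels = {ch for path, _ in journeys for ch in path}
    removal = {ch: 1 - conversion_prob(probs, removed=ch) / base for ch in channels}
    total = sum(removal.values())
    return {ch: revenue * effect / total for ch, effect in removal.items()}


# Hypothetical journeys: (ordered touchpoints, converted?)
journeys = [
    (["display", "direct", "email"], True),
    (["display", "email"], True),
    (["email", "direct"], True),
    (["display"], False),
    (["direct"], False),
]
for channel, credit in sorted(markov_attribution(journeys, 1800.0).items()):
    print(f"{channel}: ${credit:,.2f}")
```

With these invented journeys the removal effects are 0.5 (display), 7/9 (email), and 5/9 (direct), so an $1,800 sale splits into roughly $491, $764, and $546. Because the effects are normalized across all channels, this version will generally not reproduce the single-channel $900 shortcut from the toy example.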
Then we have code for this Markov chain residing on the same platform, through which we run the structured customer-journey data, and we can tell, by channel (paid search, organic search, or social), what each program is contributing. The good part is that, because the clickstream data captures geography, product, price band, and campaign, we can slice and dice the results to that level of granularity. So this is the basic architecture of how we process the data before it is published to the staging tables and feeds the dashboards. There are two applications. One is the dashboard, which helps us understand trends by different parameters; I'll explain it in a bit. The other is what we call a Budget Simulator or Media Mix Optimizer, with which we can answer the very first question: if we had $2,000, in what proportion should we distribute it among all the programs, and within the programs among the campaigns, to drive the most ROI? One of the things processing the data this way lets us do is identify the
starter, or trigger point, which is generally the first touch; the influencers in between (the players), who play the midfielder role; and the closer, by which we mean the converting channel. The way this works: we can select, say, display as the closing channel and see how much other programs contribute as triggers or influencers when display closes. It then tells us how much money we should also put behind those starter and player programs alongside display, not for the conversion itself, but for moving the purchase forward in the customer journey so that display can be the converter. The other application is more granular. We have a lot of campaigns within each program and channel, and by slicing the data at the campaign level we know which campaigns are working within, say, a search engine marketing program. So we know where to put money, and where not to put any, given the traction each campaign is gaining. And finally, as I just mentioned, it can also give insights at the granular level of product category and price band. In this example, if I look at attribution for the Yoga 370 within the $1,000 to $1,500 range, SEO is the biggest contributor. But within the $1,500 to $2,000 range, SEO and SEM play almost equal parts. So we know how much money to put into each of these campaigns based on the price band we are targeting and the product we are advertising. So we have come this far using a statistical, probabilistic approach, where attribution is based not on a single touchpoint but on multiple touchpoints. But I think we still have a long way to go. In the coming days, we intend to incorporate some of the
more relevant factors that we face every day, like seasonality, what-if forecasting for when things change (for example, when inventory management goes for a toss), and diminishing return curves. What do I mean by a diminishing return curve? You saw in the SEM example that we were able to tell which campaign drives more money and which does not. For every campaign and every program there is a threshold, a ceiling, beyond which incremental dollars will not drive an appreciable return, and that's where we need to stop. Understanding the diminishing return curve and incorporating it into the model will let us know those ceilings, so that we don't overspend on a channel just because it is lucrative. We can spend optimally and make the most of the dollars we have on the table, and that will lead us to the more nuanced, more sophisticated version of the Media Mix Optimizer we are looking for. Thank you very much. I'm happy to field any questions you might have.
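One way to picture diminishing returns feeding a budget simulator is a greedy allocator over saturating response curves. This is a sketch under stated assumptions: the exponential curve form, the channel names, and every number in `CURVES` are invented for illustration, and greedy marginal allocation is a standard technique swapped in here, not the Budget Simulator or Media Mix Optimizer described in the talk. Only the $2,000 budget comes from the talk.

```python
import math

# Hypothetical saturating response curves: revenue(s) = cap * (1 - exp(-s / scale)).
# cap is the ceiling a channel approaches; a smaller scale saturates sooner.
CURVES = {
    "paid_search": (5000.0, 800.0),
    "display": (2500.0, 400.0),
    "email": (1500.0, 150.0),
}


def revenue(channel, spend):
    """Modeled revenue for a channel at a given spend (assumed curve form)."""
    cap, scale = CURVES[channel]
    return cap * (1.0 - math.exp(-spend / scale))


def allocate(budget, step=50.0):
    """Greedily give each increment of spend to the channel with the best marginal return.

    Stops early once the best next increment returns no more than it costs.
    Returns (spend per channel, unspent budget).
    """
    spend = {channel: 0.0 for channel in CURVES}
    remaining = budget
    while remaining >= step:
        gains = {c: revenue(c, s + step) - revenue(c, s) for c, s in spend.items()}
        best = max(gains, key=gains.get)
        if gains[best] <= step:  # every channel is past its useful ceiling
            break
        spend[best] += step
        remaining -= step
    return spend, remaining


spend, unspent = allocate(2000.0)
for channel, dollars in spend.items():
    print(f"{channel}: ${dollars:,.0f}")
print(f"unspent: ${unspent:,.0f}")
```

Because each curve is concave, handing every $50 increment to the channel with the largest marginal return spreads the budget until marginal returns are roughly equal. The `gains[best] <= step` check plays the role of the ceiling from the talk: with a much larger budget, the allocator stops and leaves money unspent rather than overspending on a saturated channel.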
