LTV Curve by Channel

Why you should start linking your pre and post purchase data today

In the loan industry – where I first cut my teeth in marketing and analytics – it’s typical that customers who actually read the terms and conditions before hitting the ‘apply’ button are less risky. By less risky, I mean less likely to default on their loan.

For the lending institution, this is a big freaking deal. First, they have to decide whether to approve the borrower at all. Second, they have to decide what line of credit to extend, and at what interest rate. That, combined with their assessment of the borrower’s riskiness, dictates capital requirements. At an aggregate level, we’re talking billions of dollars. And all of this is influenced by a few click and scroll events indicating whether a person read the terms or not.

So what’s my point?  Well for one, be sure to take 5 minutes to read (or pretend to read) the terms when applying for a credit card.

But the bigger point is, pre-purchase behavior can be highly correlated with post-purchase behavior, and that correlation can make a really freaking big difference.

The problem: very few startups actually link both sides

Most early-stage companies have trouble setting up analytics, period. Either they aren’t using the tools correctly, or the tools are plain set up wrong.

But at one point or another, a company will get its act together and set up a proper funnel, typically using an out-of-the-box tool like Google Analytics or Mixpanel, and will hopefully start A/B testing and optimizing for conversion rate. Unfortunately, that’s typically where it stops.

And while these tools are great, they typically offer no way to connect pre- and post-purchase behavior. The following are two very real, and very common, instances where failing to link up this data can cost your company big time.

Example #1: You overspend on paid acquisition

Let’s suppose you’re running growth for a startup that’s just starting to get traction. After a bunch of Excel finagling, you come to the conclusion that each customer will be worth $28 after year one. You decide to target a 1-year breakeven (a whole debate in and of itself that we can save for another post).

After a month of testing and tweaking, you think you have 4 viable channels, as all of them cost $28 or less to acquire a customer. Here are their costs to acquire (CAC), broken down by channel:

[Chart: cost to acquire (CAC) by channel]

One year down the road, you plot your cumulative LTV curve, and it turns out your initial estimate was correct: your 1-year LTV is in fact around $28.

[Chart: cumulative 1-year LTV, all channels combined]

Everything looks good, right?

  • Adwords – your most expensive channel – is breaking even, and you know you can easily optimize that and/or adjust your bids if need be.
  • Facebook and Twitter are both $2 below your target CAC, suggesting a little room to bid more.
  • Your cheesy SXSW Gimmicks yielded the lowest CAC, so you gear up to tour nerd-fests around the country.

All seems good – right?

But suppose we decide to break these LTV curves down by channel, and see something like this:

[Chart: cumulative LTV curves, broken down by channel]

This tells a very different story:

  • Facebook customers are crushing it, with a $38 1-year LTV against a $26 CAC. You can afford to spend more!
  • Adwords customers are pretty darn close to breakeven, so you’re probably bidding right. LTV also appears to still be increasing at a fairly linear pace, so you may even be able to increase your spend a bit.
  • Twitter customers’ LTV plateaus at $24, under your cost per acquisition. You definitely didn’t hit 1-year breakeven, and you may never break even.
  • Your SXSW gimmicks were actually an utter failure. Their LTV plateaued around month 9 at $20 and will probably never break even.

While in aggregate it appears as if you’re doing well, breaking things down by channel shows that you’re missing out on some channels and overspending on others.

If you never connect pre-purchase data (the customer’s source) to post-purchase activity (the customer’s spend), you’d never know that two of your channels are flawed and that you’re missing out on a third.
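To make the channel breakdown concrete, here’s a minimal sketch of computing cumulative LTV by channel in pandas. It assumes a hypothetical `transactions` table, with one row per payment, that has already been joined to each customer’s lead source – exactly the pre/post-purchase link this post is about. The numbers are made up for illustration.

```python
import pandas as pd

# One row per payment, already tagged with the customer's acquisition channel.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "channel":     ["Facebook", "Facebook", "Twitter", "Twitter", "Adwords"],
    "month":       [1, 2, 1, 2, 1],          # months since acquisition
    "revenue":     [10.0, 12.0, 8.0, 6.0, 9.0],
})

# Average revenue per customer in each month-since-acquisition, by channel...
customers = transactions.groupby("channel")["customer_id"].nunique()
monthly = (transactions.groupby(["channel", "month"])["revenue"].sum()
           .div(customers, level="channel"))

# ...then accumulate it: each point is the average LTV at that month,
# which is exactly what the per-channel curves above plot.
cumulative_ltv = monthly.groupby(level="channel").cumsum()
print(cumulative_ltv)
```

Plotting `cumulative_ltv` per channel against each channel’s CAC is the whole analysis – the hard part, as discussed below, is getting the `channel` column onto the transactions in the first place.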

This is a fairly straightforward example, and I debated even including it because I feel like most people understand the concept intuitively. However, very few companies I know actually practice it, especially consumer companies that don’t use an out-of-the-box CRM.

In case this example was too basic, let’s take a look at a more nuanced example.

Example #2: Losing predictive power

Around the time we raised our Series A round of financing last spring, I got interested in digging into how to increase retention.

Fortunately, I knew of a fellow Techstars company called Data Robot, which allows people like me (non-data scientists) to build machine learning models pretty easily.


To build your model, you just upload a tabular CSV file that contains the output variable you want to predict and whatever input variables you think might be correlated with it.

In my case, the predicted output was whether the person had canceled during the last month, and my inputs were all sorts of things: lot size, price, average rating, NPS score, location, lead source, whether they joined via our app or online, and a variety of activity metrics.

Data Robot then builds a bunch of machine learning models for you and recommends the best one to use based on a variety of metrics. It also runs a univariate analysis on your dataset and shows which variables play the biggest role in the output. I won’t get too far into the details here, but it’s a pretty cool tool.
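Data Robot automates all of this end to end, but the core idea can be sketched in a few lines of scikit-learn. The feature names below are hypothetical stand-ins for the kinds of pre- and post-purchase variables mentioned above, not our actual schema, and the data is fabricated for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training set: one row per customer; `churned` is the output to
# predict, the other columns are pre- and post-purchase features.
df = pd.DataFrame({
    "lead_source_paid": [1, 0, 1, 0, 1, 0, 1, 0],  # pre-purchase: channel type
    "joined_via_app":   [1, 1, 0, 0, 1, 0, 0, 1],  # pre-purchase: signup path
    "nps_score":        [9, 7, 3, 8, 2, 9, 4, 6],  # post-purchase
    "churned":          [0, 0, 1, 0, 1, 0, 1, 0],
})

X, y = df.drop(columns="churned"), df["churned"]
model = LogisticRegression().fit(X, y)

# A crude look at which variables carry the most weight (Data Robot does
# this far more rigorously, with per-feature impact analysis).
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```

The point isn’t the model choice – it’s that the pre-purchase columns can only show up as predictors if they’ve been linked to the post-purchase outcome in one table.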

Turns out the data we had collected actually had a good amount of predictive power.

Creating this model, I learned a few things:

  1. You never know what factors are going to matter – it turns out that people who spell out “street” instead of abbreviating it “st” are more likely to churn…weird. Be sure to collect everything possible.
  2. You don’t need a ginormous dataset to start using predictive analytics – I always sort of assumed you needed to be at Pinterest scale to warrant doing data science. Yet there we were, barely a Series A-stage company, making models that had a real impact.
  3. Data Robot is really freaking cool – seriously, you should check it out. In one night, a guy who slept through stats class made a real machine learning model.

But…potentially the biggest lesson I learned was what this whole post is about: pre-purchase behavior ended up being one of the biggest predictors of churn. At the time, I had to hack the two sides together, and thus missed out on a lot of the pre-purchase data. That’s when I realized we needed to make the investment in connecting pre-purchase and post-purchase data.

Retention isn’t the only thing you can use predictive analytics for – depending on your business, you might want to model upsell potential, free-trial conversions, or even support burden. And there’s a good chance the pre-purchase data will be valuable – so make sure you’re collecting and linking it.

How to link your data together in a usable form

First, you have to realize how important this is and decide to make the investment. If you’re early stage, you may not think it’s worth it since your volume is so low. But once you get traction, that point will arrive faster than you think, so make a plan for when you’ll make the investment.

We decided to invest in a solution called Fivetran. It’s an incredible tool that syncs data from third-party sources like Google Analytics, Mixpanel, Salesforce, Zendesk, and your own databases, and loads it nice and cleanly into a Redshift database. From there we can link the sources together and access them via SQL or Tableau. The important thing is to make sure, ahead of time, that you have the database columns needed to join all these datasets together.
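Once those linking columns exist, the join itself is simple. Here’s a toy pandas version of the kind of query you’d run against the warehouse; the table and column names are illustrative, not our actual schema.

```python
import pandas as pd

# Pre-purchase: web analytics events, keyed by the tracker's anonymous id.
events = pd.DataFrame({
    "distinct_id": ["a1", "a1", "b2"],
    "event":       ["viewed_pricing", "read_terms", "viewed_pricing"],
})

# Post-purchase: customers, with the tracker id stored at signup --
# that stored column is the investment that makes the link possible.
customers = pd.DataFrame({
    "customer_id":  [101, 102],
    "distinct_id":  ["a1", "b2"],
    "ltv_year_one": [38.0, 24.0],
})

# One join and every pre-purchase event is tied to a post-purchase outcome.
linked = events.merge(customers, on="distinct_id", how="left")
print(linked[["customer_id", "event", "ltv_year_one"]])
```

In Redshift this is just the equivalent SQL join – the work is entirely in making sure `distinct_id` (or whatever your linking key is) gets written to both sides.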

Whatever you do, be sure to link your pre-purchase and post-purchase data together, or you’ll one day regret it.

How has your pre-purchase data helped you grow?  Let me know in the comments!






  • Jenny Bebout

    You’ve got me intrigued by this Data Robot. Many questions about to come your way. You’re my hero, Ryan.

    • Ryan Farley

      Thanks Jenny!

  • Tanner Corbin

    Great post Ryan, thanks for sharing. I’m curious what “database columns to join all these datasets together” do you use? How do you know you’re comparing the same person across datasets? Thank you!

    • Ryan Farley

      That’s probably the hardest part about it.

For Mixpanel, for example, we store Mixpanel’s distinct id in a MySQL database column as soon as we can link a user to a lead. This requires not only a little bit of a hack on the Mixpanel side, but also using their aliasing. As a sidenote, Mixpanel sucks when it comes to big boy data collection; I can’t wait to switch to Segment.

Another challenging one was Zendesk. Zendesk uses email address to identify a customer. However, a customer might use a different email address to create a ticket, or they might call in. So we had to create ops processes to ensure we always link tickets up correctly.

      Basically, you have to get the entire organization aligned in order to make it happen, but it’s well worth it in the end.

      Thanks for reading!
