A high-level overview which will help you to analyze customer churn from scratch.
In this series of articles, we are going to shed some light on churn prediction and customer lifetime value usage. We will cover the following topics:
Let’s start by defining some common terms:
Before going into calculation details, let’s discuss why churn rate matters in the first place:
How to reduce churn:
There are at least three approaches to calculating churn:
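The first approach to calculating churn rate (CR) is straightforward. We only need three numbers for it: the number of customers at the beginning of the period (Customers_0), the number of customers at the end of the period (Customers_1) and the number of new customers acquired during the period (NewCustomers).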
$$CR = \frac{\text{Customers}_0 - (\text{Customers}_1 - \text{NewCustomers})}{\text{Customers}_0}$$
For example, if you had 21,000 customers at the beginning of April, 40,000 customers at the end of April and 29,000 new customers in April, then your churn rate for April would be (21,000 - (40,000 - 29,000)) / 21,000 ≈ 0.48, or 48%. This is a good start for understanding your customer base, and churn rate is a valuable metric to track in your reports. Once you start monitoring it, you can see whether it changes over time and whether your actions affect it.
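The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `churn_rate` and its parameter names are our own choices, and the inputs are the April figures from the example.

```python
def churn_rate(customers_start: int, customers_end: int, new_customers: int) -> float:
    """Share of the starting customer base that was lost during the period."""
    if customers_start <= 0:
        raise ValueError("customers_start must be positive")
    # Customers who were already active at the start and are still active at the end.
    retained = customers_end - new_customers
    return (customers_start - retained) / customers_start


# April figures from the example in the text.
april_churn = churn_rate(customers_start=21_000, customers_end=40_000, new_customers=29_000)
print(f"{april_churn:.0%}")  # prints "48%"
```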
However, this approach has a downside, as we are mixing all the active customers into one basket. To illustrate this potential pitfall, consider that customers with a two-month tenure (relatively new) likely have a much higher churn rate than customers with one-year tenure. Moreover, relatively new customers can have different behavior patterns. They could have been attracted through other marketing channels, which means their churn dynamics can be significantly different.
To tackle this problem, we can use the second approach to calculating churn rate: cohort-based.
In this approach, the idea is to attribute all customers to the month during which they were acquired and then calculate the churn rate separately for each month of acquisition. This month (or any other period) of acquisition is called a cohort.
Let’s describe the calculations using a synthetic example of some e-commerce shop. For simplicity, let’s assume that this business was founded four months ago. In table #1, we have the number of customers split by cohort. Each row represents dynamics for one cohort and each column represents a slice of our customer base for a particular month. From this table, we can see the overall number of customers, new customers and retained customers monthly. Our active customer base resembles a pie with detailed layers for each cohort.
The advantage over the first approach is that we can still calculate the general churn rate from this table, but we can get churn rates for each cohort separately as well. The cohort’s churn rates are calculated in table #2. Now we can see that the 48% churn rate for April consists of 10%, 30% and 53% churn rates for the first, second and third months, respectively. Pretty different, right?
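As a rough sketch of how per-cohort and overall churn rates can come from the same data, consider the snippet below. The counts are invented for illustration and are not the values from table #1. The dictionary layout and function names are assumptions made for this example.

```python
# Hypothetical active-customer counts: cohort -> {calendar month -> active customers}.
# These numbers are made up and do NOT reproduce the article's tables.
active = {
    "2021-01": {"2021-01": 1000, "2021-02": 600, "2021-03": 400},
    "2021-02": {"2021-02": 1200, "2021-03": 480},
    "2021-03": {"2021-03": 900},
}


def cohort_churn(active, prev_month, month):
    """Churn rate of each cohort between two consecutive calendar months."""
    rates = {}
    for cohort, counts in active.items():
        if prev_month in counts:  # cohort already existed in the previous month
            start = counts[prev_month]
            rates[cohort] = (start - counts.get(month, 0)) / start
    return rates


def overall_churn(active, prev_month, month):
    """General churn rate across all cohorts that existed in the previous month."""
    start = sum(c[prev_month] for c in active.values() if prev_month in c)
    retained = sum(c.get(month, 0) for c in active.values() if prev_month in c)
    return (start - retained) / start


per_cohort = cohort_churn(active, "2021-02", "2021-03")
total = overall_churn(active, "2021-02", "2021-03")
print(per_cohort, round(total, 3))
```

In this made-up data, one overall figure hides two quite different cohort-level rates, which is the effect described above.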
With this one simple change, we can begin comparing apples to apples. Instead of looking at calendar months in columns, we can look at the cohort’s life month (the number of months since acquisition) such that March 2021 will become the first, second and third periods of life for the March, February and January cohorts, respectively. By rearranging the table in this manner, we arrive at table #3. Now we can see that, for some reason, the February cohort had much worse retention than the January and March cohorts. You can dig deeper and investigate the root cause of this change: promotions, acquisition of different types of customers, etc.
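The rearrangement from calendar months to life months could look roughly like this. It is a minimal sketch with invented counts (not the values from the article's tables), and it assumes cohorts and months are labelled as "YYYY-MM" strings.

```python
# Hypothetical counts: cohort -> {calendar month -> active customers}.
active = {
    "2021-01": {"2021-01": 1000, "2021-02": 600, "2021-03": 400},
    "2021-02": {"2021-02": 1200, "2021-03": 480},
    "2021-03": {"2021-03": 900},
}


def months_between(start: str, end: str) -> int:
    """Number of whole calendar months from 'YYYY-MM' start to 'YYYY-MM' end."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


def by_life_month(active):
    """Re-key each cohort's counts by life month (1 = month of acquisition)."""
    return {
        cohort: {months_between(cohort, month) + 1: n for month, n in counts.items()}
        for cohort, counts in active.items()
    }


life_table = by_life_month(active)
for cohort, counts in life_table.items():
    print(cohort, counts)
```

With this indexing, March 2021 is life month 3 for the January cohort, 2 for the February cohort and 1 for the March cohort, so the same column now compares cohorts at the same age.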
It is good to remember that worse retention in a particular cohort is not necessarily a bad sign. For example, it is common to observe increased churn during periods of high growth.
The last thing worth mentioning in this cohort-based approach is that we can additionally calculate retention curves for our customers. This is shown in table #4. From these curves, you can see that you typically keep only ⅓ of your customers in the third month. The retention rate (RR) for period n is calculated as follows, where Customers_n is the number of customers of this cohort active in period n and Customers_1 is the number of customers in this cohort in its first period (also referred to as the cohort size):

$$RR_n = \frac{\text{Customers}_n}{\text{Customers}_1}$$
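A retention curve for a single cohort, computed as each period's count divided by the cohort size, could be sketched as follows. The counts are hypothetical and the function name `retention_curve` is our own.

```python
def retention_curve(counts_by_life_month):
    """Retention rate per life month: Customers_n / Customers_1.

    counts_by_life_month[0] is the cohort size (life month 1).
    """
    if not counts_by_life_month or counts_by_life_month[0] <= 0:
        raise ValueError("cohort size must be positive")
    cohort_size = counts_by_life_month[0]
    return [n / cohort_size for n in counts_by_life_month]


# Hypothetical cohort: 1,000 customers acquired, 600 active in month 2, 400 in month 3.
curve = retention_curve([1000, 600, 400])
print(curve)  # prints [1.0, 0.6, 0.4]
```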
The third approach to calculating churn rate is to do it individually for each customer. Although this is a much harder task to accomplish, it opens up a wide range of opportunities for handling your customer base.
You can do this based on some simple heuristics. For example, customer A is in the January 2021 cohort and this is the fourth month of life for that cohort. From the historical data, we know that, on average, X% of customers active in the fourth month stop being active in the fifth month. Then customer A has an X% churn rate (i.e. an X% probability of stopping activity in the next month). This is an artificial example. Actual individual churn models are much more complicated, but the example gives a general sense of how they work. The idea here is to use historical data about your clients to predict the probability of being active in the following period.
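The heuristic above can be illustrated with a small lookup table built from historical cohorts. This is a sketch under stated assumptions: the counts are invented, the function name is our own, and the average is pooled across cohorts (weighted by cohort size), which is only one of several reasonable ways to average.

```python
def avg_churn_by_life_month(cohorts):
    """Pooled share of customers active in life month m who are inactive in month m + 1.

    cohorts: list of lists; each inner list holds one cohort's active counts,
    where index 0 is life month 1.
    """
    base, lost = {}, {}
    for counts in cohorts:
        for i in range(len(counts) - 1):
            life_month = i + 1
            base[life_month] = base.get(life_month, 0) + counts[i]
            lost[life_month] = lost.get(life_month, 0) + (counts[i] - counts[i + 1])
    return {m: lost[m] / base[m] for m in base if base[m] > 0}


# Hypothetical history for two older cohorts (not taken from the article).
history = [
    [1000, 600, 400, 320, 288],
    [1200, 480, 300, 240],
]
rates = avg_churn_by_life_month(history)

# Customer A is in the fourth month of life, so the heuristic assigns
# the historical churn rate between life months 4 and 5.
customer_a_churn_probability = rates[4]
print(f"{customer_a_churn_probability:.0%}")  # prints "10%"
```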
You have many options when going the individual route. For example, the model can be based on average heuristics (as illustrated above), statistical models such as Pareto/NBD, survival analysis models or machine learning models. There are many choices and nuances, so we will cover this topic in greater detail in a future blog post.
Accurate individual churn rates have many advantages over general ones:
When we are building a model for individual-level churn rates, we need to work with definitions. What is the churn period for our business? After how long without any activity from a customer should we mark them as churned? Here are some clues on how we can define it:
Frankly, there is no ideal approach to defining churn. One way or another, you will have false positives: customers who were marked as churned based on your definition but who suddenly become active again. As you cannot avoid having this group, you can call them reactivated customers and track them as another metric for your customer base. For churn analysis and LTV prediction, however, you can put this group aside and move forward. Just make sure the reactivated group is not huge, and do not forget about it completely in your customer analytics reports.
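One possible way to make such a definition concrete is an inactivity window: a customer counts as churned once a fixed number of days has passed since their last order, and as reactivated if they ever returned after such a gap. The sketch below assumes this inactivity-window definition, which is our own illustrative choice. The 90-day threshold, the function name `classify_customer` and the order dates are all hypothetical.

```python
from datetime import date


def classify_customer(order_dates, as_of, churn_days):
    """Flag a customer as churned and/or reactivated under an inactivity-window rule."""
    if not order_dates:
        raise ValueError("customer has no orders")
    dates = sorted(order_dates)
    # Gaps (in days) between consecutive orders.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        # No order for longer than the window, counted back from the reference date.
        "churned": (as_of - dates[-1]).days > churn_days,
        # The customer came back at least once after exceeding the window.
        "reactivated": any(gap > churn_days for gap in gaps),
    }


# Hypothetical customer: ordered in January, then again in late May.
status = classify_customer(
    order_dates=[date(2021, 1, 5), date(2021, 5, 20)],
    as_of=date(2021, 6, 1),
    churn_days=90,  # assumed threshold; the right value depends on the business
)
print(status)  # prints {'churned': False, 'reactivated': True}
```

Counting the customers flagged as reactivated gives a quick check on how many false positives a chosen threshold produces.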
Keep in mind that you can have unregistered customers (quick checkout). In such cases, you can either try to stitch their orders through some secondary attributes or analyze them separately.
This blog post was an introduction to churn rate prediction. In the following posts, we will cover how churn rate is related to LTV in greater detail and which data science approaches can be used to predict churn rates and LTV.