For any business, cash flow is an important quantity to track and predict. Customer churn can have a disproportionate effect on future revenue, so predicting which customers will churn, and why, can be crucial to a business's continued success. Churn can be modelled for two distinct types of business model, and each requires a slightly different approach.

In subscription businesses, revenue arrives at regular intervals following a customer's signup, and a churn event is observed when a customer cancels their subscription. Signup and cancellation are the only events common to all subscription businesses, so it is often a good idea to include other user engagement metrics in the analysis, as detailed in the next section.
On the other hand, in transactional businesses such as e-commerce companies, there is less need for auxiliary data sources, because each transaction is a data point and together these can be used to train a model that can accurately predict whether a customer is likely to return, given their purchasing history. In this case, however, churn events are not explicitly observed, and the best a model can do is return, for each customer, a likelihood of having churned.
NStack has models for both cases, which allows specific actions to be taken depending on a customer's risk of churning. For example, one of our clients has set up an email campaign that sends a special offer to customers whose risk has just exceeded a given threshold. In this way the client avoids both fruitlessly spamming customers who have definitely churned and cutting into their margins by sending offers to customers who would have repurchased anyway.
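As a rough illustration of the threshold-crossing idea in the email campaign example, the sketch below compares two successive sets of risk scores and selects only the customers whose score has just moved above the threshold. The function name, the dictionary layout, and the treatment of previously unseen customers are assumptions made for this example; they are not part of the NStack modules.

```python
def newly_at_risk(previous, current, threshold):
    """Return the customers whose risk crossed the threshold since the last run.

    `previous` and `current` map a customer id to a risk score (hypothetical layout).
    """
    crossed = []
    for customer_id, risk in current.items():
        # Assumption: a customer with no earlier score is treated as zero risk.
        before = previous.get(customer_id, 0.0)
        if before <= threshold < risk:
            crossed.append(customer_id)
    return sorted(crossed)


previous_scores = {"c1": 0.20, "c2": 0.55, "c3": 0.90}
current_scores = {"c1": 0.25, "c2": 0.72, "c3": 0.95, "c4": 0.80}

targets = newly_at_risk(previous_scores, current_scores, threshold=0.6)
print(targets)  # ['c2', 'c4']
```

Customer `c3` was already above the threshold on the previous run, so it is not selected again, which mirrors the goal of not repeatedly contacting customers who have most likely already churned.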
In the next two sections, we give an overview of each model, what the inputs and outputs look like, and how to interpret the results. In each case, links to the detailed user manual are also provided.
Subscription Churn Prediction
For subscription businesses, churn is one of the most important metrics to track and improve. It is defined as the percentage of customers from a given month who unsubscribe in the following month, and a difference of a few percentage points can have a large effect on the financial metrics of the business.
As a consequence, predicting churn events and figuring out which factors contribute to them is a potentially crucial activity for any subscription business.
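To make the churn definition above concrete, here is a minimal sketch that computes a monthly churn rate from two sets of active customer ids, and then shows how a small change in that rate compounds over a year. The function names and the sample data are hypothetical, and the retention calculation assumes a constant monthly churn rate, which real businesses rarely have.

```python
def monthly_churn_rate(active_this_month, active_next_month):
    """Percentage of this month's customers who are no longer active next month."""
    if not active_this_month:
        raise ValueError("no active customers in the reference month")
    churned = active_this_month - active_next_month
    return 100.0 * len(churned) / len(active_this_month)


def retained_fraction(churn_pct, months):
    """Fraction of a cohort still subscribed after `months`, assuming constant churn."""
    return (1.0 - churn_pct / 100.0) ** months


january = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
february = {"a", "b", "c", "d", "e", "f", "g", "h", "k"}  # "i" and "j" left, "k" joined

rate = monthly_churn_rate(january, february)
print(rate)  # 20.0

# Compare the share of a cohort left after a year at 3% and at 5% monthly churn.
for pct in (3.0, 5.0):
    print(pct, round(retained_fraction(pct, 12), 2))
```

Under the constant-rate assumption, moving from 3% to 5% monthly churn reduces the share of a cohort remaining after twelve months from roughly 69% to roughly 54%.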
Unfortunately, simple rule-based methods for churn prediction can be inaccurate and don't fully leverage the large quantities of auxiliary data that are often available, such as website views and social media activity. By contrast, regular machine learning approaches often do not perform well even on the combined data sets, both because the data are highly imbalanced (there are far fewer instances of customers churning than not churning) and because churn usually depends not just on the dynamic factors but also on the length of time for which the customer has been subscribed.
The subscription churn prediction module provided by NStack addresses all these problems. Internally, it leverages a statistical model that captures both the effect of dynamic quantitative features, such as engagement levels, and the effect of the age of the subscription. This makes it adaptable to a wide range of data sources, and in addition to predictions it can also indicate which of the dynamic factors are most significant in a given prediction.
As an example, a few rows of input data would look like the following:
Note that this example uses monthly data: the timestamps are in the ISO format YYYY-MM-DD, but only the first day of the month is ever given. If your organization has usage and subscription data at daily resolution, that can be used as well.
The output will look as follows:
The hazard column is a measure of the risk that the customer will churn; a high value indicates a high churn risk. The hazard_bucket column is a bucketing of hazard scores into "high" (top quartile), "low" (bottom quartile), and "medium" (middle two quartiles). The additional columns are also buckets: there is one for each dynamic feature in the input, and each indicates whether or not that feature contributed significantly to that user's hazard.
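The quartile bucketing described for hazard_bucket can be sketched in a few lines of Python. This is only an illustration of the idea, assuming hazards are held in a dictionary keyed by a hypothetical user id; how the NStack module computes quartiles or handles values that fall exactly on a boundary is not specified here.

```python
import statistics


def hazard_buckets(hazards):
    """Label each hazard score as "low", "medium", or "high" by quartile."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(list(hazards.values()), n=4)
    buckets = {}
    for user_id, hazard in hazards.items():
        if hazard > q3:
            buckets[user_id] = "high"    # top quartile
        elif hazard < q1:
            buckets[user_id] = "low"     # bottom quartile
        else:
            buckets[user_id] = "medium"  # middle two quartiles
    return buckets


# Made-up hazard scores for eight users.
example_hazards = {
    "u1": 0.1, "u2": 0.2, "u3": 0.3, "u4": 0.4,
    "u5": 0.5, "u6": 0.6, "u7": 0.7, "u8": 0.8,
}
example_buckets = hazard_buckets(example_hazards)
print(example_buckets)
```

With these eight evenly spaced scores, the two lowest users land in "low", the two highest in "high", and the remaining four in "medium".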
Additional guidance on how to use this model can be found in the [manual][subs-manual].
Transactional Churn Prediction
As explained in the introduction, for transactional businesses there are no explicitly observed churn events, and the model output, p_alive, is the model's estimate of the likelihood that a given customer will purchase again at some time in the future.
It takes into account not just the recency of the latest purchase but also the frequency with which purchases were made in the past. For a customer who made their last purchase four weeks ago, p_alive will be quite low if they had purchased daily until then, and higher if they have historically purchased roughly once a month.
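The interplay between recency and frequency can be illustrated with a deliberately simplified toy model. This is not the model used by NStack; it assumes that after every purchase a customer churns with a fixed probability, and that a customer who is still active waits an exponentially distributed time, with a known mean gap, before buying again. The function name and the `churn_prob` default are invented for the example.

```python
import math


def toy_p_alive(days_since_last, mean_gap_days, churn_prob=0.1):
    """Toy estimate of P(customer is still active | no purchase for days_since_last).

    Assumptions (illustrative only): the customer churned right after the last
    purchase with probability `churn_prob`; otherwise the wait until the next
    purchase is exponential with mean `mean_gap_days`.
    """
    # Probability that an active customer stays silent for this long.
    silent_if_alive = math.exp(-days_since_last / mean_gap_days)
    alive_and_silent = (1.0 - churn_prob) * silent_if_alive
    churned_and_silent = churn_prob  # a churned customer is always silent
    return alive_and_silent / (alive_and_silent + churned_and_silent)


# Both customers last purchased four weeks ago.
daily_buyer = toy_p_alive(days_since_last=28, mean_gap_days=1)
monthly_buyer = toy_p_alive(days_since_last=28, mean_gap_days=30)
print(f"daily buyer:   {daily_buyer:.4f}")
print(f"monthly buyer: {monthly_buyer:.4f}")
```

In this toy model, four weeks of silence is overwhelming evidence of churn for the daily buyer but is entirely ordinary for the monthly buyer, so the second value comes out far higher than the first.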
The input to the model is formatted as in the following example:
And as output, we get something like
The meanings of the columns are explained in the manual, but the most important one is p_alive, which, as explained above, represents the likelihood that a customer hasn't churned, and can be used to trigger a marketing action with the goal of winning them back.