Important Components to Consider with Cohort Analysis

Understanding the health of your business and product utilization often begins with cohort analysis. This method simplifies the measurement of user behaviors crucial to a business, but its apparent simplicity can mask pitfalls that lead to misguided insights and poor decisions. Cohort analysis in its simplest and most widely known form focuses on a fixed set of components:

Definition of a cohort based on user characteristics
The n periods considered in measuring behavior after a cohort has begun
Churn, (lost customers / total customers for a given period) * 100, representing the percentage of a cohort's users lost in a specific cohort period
Retention, 100 - churn, representing the percentage of a cohort's users retained in a specific cohort period
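As a minimal sketch of the two formulas above, the following computes churn and retention for one cohort in one cohort period. The function name and the example numbers are hypothetical, and it assumes churn is expressed as a percentage so that retention is 100 minus churn.

```python
def churn_and_retention(users_at_start, users_lost):
    """Return (churn %, retention %) for one cohort in one cohort period."""
    if users_at_start <= 0:
        raise ValueError("users_at_start must be positive")
    churn_pct = 100 * users_lost / users_at_start
    retention_pct = 100 - churn_pct
    return churn_pct, retention_pct


# Hypothetical cohort: 200 users at the start of the period, 30 lost during it.
churn_pct, retention_pct = churn_and_retention(200, 30)
print(f"churn={churn_pct:.1f}%  retention={retention_pct:.1f}%")
# prints: churn=15.0%  retention=85.0%
```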

At first glance, this framework may seem sufficient to make informed product and business decisions. Over time, however, data points and reports may emerge that contradict the assumptions about user behavior and product health drawn from this analysis. These inconsistencies expose flaws in how the analysis was designed and undermine the reliability of this widely adopted method.

To address these challenges, a more nuanced approach is necessary: one that goes beyond taking the previously defined components at face value and instead leverages your understanding of both the product and your customers to define cohorts that align with business impact, apply the right curve functions, and identify actionable insights from the results. In the rest of this article, I will explore the key considerations and knowledge needed to build a more effective cohort analysis.

Cohort Definition

A cohort is defined by the activity, event, or customer segment you choose to group users by. While there are no limits on what you might treat as the defining characteristic, the value of your cohort measurement depends heavily on the relevance of the selection you make. The guiding principle when defining your cohort is to select something that closely correlates with the health of your business. For SaaS businesses, that might be paying subscribers. For transactional businesses, this could be daily active users (DAU). Remember that each choice highlights a different aspect of user behavior. To maximize the impact of the analysis, make sure your choice is tied to revenue-generating activities.

Retention Bounding

Retention bounding determines how long after a user enters a given cohort they continue to be counted in your analysis. You can bound retention with a fixed time constraint (an n-day window) or leave it continuous and unbounded. The n-day approach is best suited for time-sensitive metrics like trial periods, subscription lengths, or critical actions a user takes to realize product utility. In contrast, the unbounded approach is more appropriate for understanding broader groups of users who interact with the product at irregular intervals or in less predictable patterns. Your choice between the two approaches should be based on the question you are trying to answer about the cohort.
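To make the difference concrete, here is a small sketch of the two bounding approaches. It assumes one common convention that the text above does not spell out: n-day retention counts users active exactly on day n, while unbounded retention counts users active on day n or any later day. The function names and the activity data are hypothetical.

```python
def n_day_retention(activity_days, n):
    """Share of cohort users who were active exactly on day n."""
    active = sum(1 for days in activity_days.values() if n in days)
    return active / len(activity_days)


def unbounded_retention(activity_days, n):
    """Share of cohort users who were active on day n or on any later day."""
    active = sum(1 for days in activity_days.values() if any(d >= n for d in days))
    return active / len(activity_days)


# Hypothetical cohort: days since joining on which each user was active.
activity_days = {
    "u1": {0, 1, 7},
    "u2": {0, 3, 10},
    "u3": {0},
    "u4": {0, 7, 8},
}

print(n_day_retention(activity_days, 7))      # 0.5  (u1 and u4)
print(unbounded_retention(activity_days, 7))  # 0.75 (u1, u2 and u4)
```

The user who returns on day 10 but not on day 7 is counted only by the unbounded measure, which is why it suits products with irregular usage.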

Churn Distribution Curve

When a single churn rate is used as an input to other formulas such as lifetime value (LTV), the result assumes that users are equally likely to leave in every period. This rarely holds in real-world scenarios, where churn is dynamic and varies over time. A constant rate implies a retention curve that decays at the same pace forever, so it tends to overestimate or underestimate user behavior at different points in the lifecycle. In many products, churn is highest soon after acquisition and slows as the remaining users become more committed. To account for these dynamics, it is often more effective to fit a distribution whose churn risk can change over time, such as the Weibull or Lomax distribution. These functions represent churn behavior within your cohorts more faithfully and yield more reliable probabilities for use in other formulas like LTV and forecasting.
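The sketch below contrasts a constant churn rate with a Weibull survival curve when both feed a simple LTV estimate. All parameter values are hypothetical and not fitted to any data, and LTV is simplified here to margin per period multiplied by the expected number of active periods over a fixed horizon.

```python
import math


def constant_churn_survival(t, churn_rate):
    """Share of a cohort still active after t periods if churn never changes."""
    return (1 - churn_rate) ** t


def weibull_survival(t, scale, shape):
    """Weibull survival function; shape < 1 means churn risk falls over time."""
    return math.exp(-((t / scale) ** shape))


def expected_active_periods(survival, horizon):
    """Sum survival over periods 0..horizon-1 (expected periods a user stays active)."""
    return sum(survival(t) for t in range(horizon))


# Hypothetical parameters for illustration only.
CHURN_RATE = 0.20
SCALE, SHAPE = 5.0, 0.6
MARGIN_PER_PERIOD = 10.0
HORIZON = 36

flat = expected_active_periods(lambda t: constant_churn_survival(t, CHURN_RATE), HORIZON)
curved = expected_active_periods(lambda t: weibull_survival(t, SCALE, SHAPE), HORIZON)
print(f"constant-churn LTV: {MARGIN_PER_PERIOD * flat:.2f}")
print(f"Weibull LTV:        {MARGIN_PER_PERIOD * curved:.2f}")
```

With a shape parameter below 1, the Weibull curve loses users quickly at first and then flattens, so the two estimates can diverge noticeably over a long horizon. In practice the parameters would be estimated from your own cohort data rather than chosen by hand.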

Retention Smile Curve

(source: Sophia Young - Towards Data Science)

While a steadily declining retention curve is often expected, retention does not always decrease from one period to the next. In some cases, you may observe a ‘smile curve’ pattern in retention data, where retention falls, bottoms out, and then rises again. This phenomenon typically occurs when products benefit from strong network effects or when usage is seasonal by design. For example, fitness applications may see high levels of engagement upon acquisition, followed by a decline, and then a reactivation around New Year as users set resolutions and stay engaged in the following months. This smile-shaped pattern highlights the unique dynamics of user engagement over time and emphasizes the importance of understanding the specific factors influencing your product's retention patterns.
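One rough way to flag this pattern in a cohort's retention series is to check whether retention bottoms out in a middle period and then recovers. The helper below is a hypothetical heuristic, not a standard test, and the 0.02 rebound threshold is an arbitrary assumption.

```python
def has_smile(retention, min_rebound=0.02):
    """True if retention bottoms out at an interior period and later rises
    by at least min_rebound above that low point."""
    low = min(range(len(retention)), key=retention.__getitem__)
    if low == 0 or low == len(retention) - 1:
        return False  # lowest point is at an edge, so there is no rebound
    return max(retention[low + 1:]) - retention[low] >= min_rebound


# Hypothetical retention by period (as fractions of the original cohort).
print(has_smile([1.0, 0.6, 0.4, 0.35, 0.42, 0.5]))  # True
print(has_smile([1.0, 0.6, 0.4, 0.3]))              # False
```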

Good vs Bad Churn

Not all churn is inherently bad. A common pitfall in churn analysis is treating all churn equally, without considering the underlying differences between users who leave. Cohort analysis serves as the starting point for deeper investigation, rather than the last step. For example, you can segment your cohorts to differentiate between users who align with your ideal customer profile and those who don’t. This granularity will reveal whether churn is occurring among valuable users or those who are less aligned with business goals. Even with more detailed segmentation, you should approach the results with curiosity and investigate the reasons behind the churn further, for example through product utilization data or feedback surveys.
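As an illustration of this kind of segmentation, the sketch below splits a cohort's churn by whether users match the ideal customer profile. The segment labels, record layout, and data are hypothetical.

```python
from collections import defaultdict


def churn_by_segment(users):
    """Return churn % per segment from records with 'segment' and 'churned' keys."""
    totals = defaultdict(int)
    lost = defaultdict(int)
    for user in users:
        totals[user["segment"]] += 1
        lost[user["segment"]] += int(user["churned"])
    return {segment: 100 * lost[segment] / totals[segment] for segment in totals}


# Hypothetical cohort of eight users: overall churn is 50%.
users = [
    {"segment": "icp", "churned": False},
    {"segment": "icp", "churned": False},
    {"segment": "icp", "churned": True},
    {"segment": "icp", "churned": False},
    {"segment": "non_icp", "churned": True},
    {"segment": "non_icp", "churned": True},
    {"segment": "non_icp", "churned": False},
    {"segment": "non_icp", "churned": True},
]

print(churn_by_segment(users))  # {'icp': 25.0, 'non_icp': 75.0}
```

In this made-up example, the same 50% headline churn hides a 25% rate among ideal-profile users and a 75% rate among the rest, which would lead to very different conclusions.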

Survival Analysis Or Lifetimes of Non-Churned Users

Cohort analysis often neglects a key aspect of user behavior: the expected lifetime of users who haven’t churned. Traditional formulas summarize the behavior of users who have churned during a period, providing a view of retention up to that point. However, they fall short of offering insight into when the retained users from a cohort can be expected to churn in the future.

This is where survival analysis becomes invaluable. Survival analysis is a statistical framework that estimates the time until an event, such as churn, will occur. It is very unlikely that every user in your cohorts churns within the periods covered by your analysis, so users who have not yet churned (known as censored observations) are left out of a traditional analysis, along with part of the true picture of user behavior. Predicting the remaining lifetime of these users will lead to more accurate forecasting and better-informed business decisions.
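One widely used survival estimator is Kaplan-Meier, which keeps non-churned users in the denominator for as long as they have been observed. The article does not prescribe a specific estimator, so treat the following pure-Python version as an illustrative sketch with hypothetical data rather than a production implementation.

```python
def kaplan_meier(durations, churned):
    """Return [(t, S(t))] at each period where at least one churn was observed.

    durations: periods each user was observed for
    churned:   True if the user churned at that time, False if still active
               (censored) when observation ended
    """
    event_times = sorted({d for d, c in zip(durations, churned) if c})
    survival = 1.0
    curve = []
    for t in event_times:
        # Users still under observation at time t, including censored ones.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, c in zip(durations, churned) if c and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical cohort of six users; three are still active (censored).
durations = [1, 2, 2, 3, 4, 4]
churned = [True, True, False, True, False, False]

for t, s in kaplan_meier(durations, churned):
    print(f"period {t}: estimated survival {s:.3f}")
# period 1: estimated survival 0.833
# period 2: estimated survival 0.667
# period 3: estimated survival 0.444
```

A naive count would report 50% of this cohort retained, because three of six users churned. The estimator instead accounts for how long each non-churned user was actually observed, which is why its final value differs.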

Conclusion

Cohort analysis is a foundational tool for understanding your business’s health, but relying on only its basic components can lead to misguided insights. To fully capture and accurately interpret the behaviors of your users, you need to understand the more advanced components covered in this article. By adopting a more nuanced approach, you can create analyses that are more precise and actionable and that reveal deeper insights, increasing the likelihood of driving meaningful strategic decisions.
