Get started with Propel's Serverless ClickHouse forever-free plan today. Propel is the only Serverless ClickHouse with a true pay-per-query pricing and instant auto-scaling.
Funnel analysis is a technique used to understand how users progress through a series of steps, often in a sales or conversion process. The goal is to determine where users drop off, helping you identify bottlenecks and opportunities to optimize the flow. When it comes to large-scale user event data, ClickHouse provides robust tools to perform funnel analysis due to its speed, efficiency, and real-time capabilities. This post will show you how to do a funnel analysis in ClickHouse, focusing on a practical, real-world example.
How to perform funnel analysis in ClickHouse
The most effective way to perform funnel analysis in ClickHouse is by using the windowFunnel
function. This function allows you to see how many users go through a predefined series of events within a specific time window. Let’s look at how to do this with an example.
Funnel analysis example
Imagine you run an online taco shop and want to understand how users move through the following funnel:
- Viewed Taco Menu
- Added Taco to Cart
- Purchased Taco
You have a table named taco_events
that tracks user behavior, with the following columns:
user_id
: The unique identifier for the user.event_time
: Timestamp of when the event occurred.event_type
: Type of event (e.g.,view_menu
,add_to_cart
,purchase
).
To analyze how many users complete all three steps of the funnel within a two-hour window, you can use the windowFunnel
function as follows:
Creating the dataset
First, create the table to store the taco events:
CREATE TABLE taco_events (
user_id UInt32,
event_type String,
event_time DateTime
) ENGINE = MergeTree()
ORDER BY (user_id, event_time);
Now let’s assume your table contains the following data:
| user_id | event_type | event_time |
|---------|------------|----------------------|
| 1 | view_menu | 2024-10-01 12:00:00 |
| 1 | add_to_cart| 2024-10-01 12:10:00 |
| 1 | purchase | 2024-10-01 12:20:00 |
| 2 | view_menu | 2024-10-01 13:00:00 |
| 2 | add_to_cart| 2024-10-01 13:30:00 |
| 3 | view_menu | 2024-10-01 14:00:00 |
| 3 | add_to_cart| 2024-10-01 14:30:00 |
| 3 | purchase | 2024-10-01 15:45:00 |
| 4 | view_menu | 2024-10-01 16:00:00 |
| 4 | add_to_cart| 2024-10-01 16:20:00 |
| 5 | view_menu | 2024-10-01 17:00:00 |
| 5 | add_to_cart| 2024-10-01 17:30:00 |
| 5 | purchase | 2024-10-01 17:45:00 |
| 6 | view_menu | 2024-10-01 18:00:00 |
Querying the funnel
Use the windowFunnel()
function to analyze the funnel:
SELECT
user_id,
windowFunnel (7200) (
event_time,
event_type = 'view_menu',
event_type = 'add_to_cart',
event_type = 'purchase'
) AS funnel_stage
FROM
taco_events
GROUP BY
user_id;
Explanation:
windowFunnel(7200)
: This function looks at events within a 7200-second (2-hour) window to track the progression through the funnel.event_time
: The timestamp column used to order events.event_type = 'view_menu'
,event_type = 'add_to_cart'
,event_type = 'purchase'
Specifies the steps in the funnel: viewing the taco menu, adding a taco to the cart, and making a purchase.
The query will return the following:
| user_id | funnel_stage |
|---------|--------------|
| 1 | 3 |
| 2 | 2 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 1 |
Interpretation:
- User 1: Completed all three stages (viewed menu, added to cart, purchased) within the 2-hour window, so the
funnel_stage
is 3. - User 2: Viewed the menu and added to the cart but did not purchase within the 2-hour window, so the
funnel_stage
is 2. - User 3: Viewed the menu and added to the cart, but the purchase occurred outside the 2-hour window, resulting in a
funnel_stage
of 2. - User 4: Viewed the menu and added to the cart but did not purchase within the 2-hour window, resulting in a
funnel_stage
of 2. - User 5: Completed all three stages (viewed menu, added to cart, purchased) within the 2-hour window, so the
funnel_stage
is 3. - User 6: Only viewed the menu, resulting in a
funnel_stage
of 1.
This result shows you which users completed each stage of the funnel and highlights where users are dropping off. In this case, Users 2, 3, and 4 added items to their carts but did not complete the purchase within the desired timeframe, and User 6 only viewed the menu.
Counting users per stage
To understand how many users are at each stage of the funnel, you can use the following query:
SELECT
funnel_stage,
COUNT(*) AS user_count
FROM
(
SELECT
user_id,
windowFunnel (7200) (
event_time,
event_type = 'view_menu',
event_type = 'add_to_cart',
event_type = 'purchase'
) AS funnel_stage
FROM
taco_events
GROUP BY
user_id
) AS funnel_results
GROUP BY
funnel_stage;
Explanation:
- This query first calculates the
funnel_stage
for each user in the subquery. - Then it counts how many users are at each
funnel_stage
.
Query results for users per stage:
| funnel_stage | user_count |
|--------------|-------------|
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |
When to use sequenceMatch
vs windowFunnel
ClickHouse provides two powerful functions for analyzing user behavior in sequences of events: windowFunnel
and sequenceMatch
. While both can be used for similar purposes, each function is better suited for specific types of analysis.
- Use
windowFunnel
when you need to track users through a fixed sequence of events within a specific time window. This function is great for scenarios where you are interested in understanding conversion funnels, such as tracking how many users move through different steps in a purchasing process, as we saw in our taco example.windowFunnel
is optimized for scenarios where each step must happen in a specific order within a limited timeframe. - Use
sequenceMatch
when you need more flexibility in the pattern of events you are tracking. This function allows for more complex conditions and expressions, making it ideal if your analysis needs to account for optional steps, different possible event orders, or more nuanced patterns of user behavior.sequenceMatch
is particularly useful for identifying more complex user journeys, such as detecting various behavioral patterns that may include loops or alternative paths.
For example, if you want to track users who may view the taco menu, optionally add tacos to their cart, and then purchase tacos, sequenceMatch would be a better choice. It allows you to capture multiple potential sequences, such as users who skip the cart step and purchase directly.
The choice between windowFunnel
and sequenceMatch
depends on your specific analytical needs. If your funnel involves a straightforward sequence with a clear time boundary, windowFunnel
is your best option. However, if your analysis needs to capture flexible, non-linear paths or multiple optional steps, sequenceMatch
is more suitable.
Conclusion
Funnel analysis in ClickHouse is a highly effective way to understand user behavior and optimize your processes. By leveraging windowFunnel
, you can easily track user journeys like our taco example, identifying where drop-offs occur and improving the experience to maximize conversions. ClickHouse's efficiency allows you to perform large-scale, real-time funnel analysis, ensuring you always have the insights you need to your user events.
Get started with Propel's Serverless ClickHouse forever-free plan today. Propel is the only Serverless ClickHouse with a true pay-per-query pricing and instant auto-scaling. Contact us to learn more about our volume-based discounts. Visit our pricing page for details.