arrayJoin function
Unpack arrays into separate rows.
The arrayJoin
function is unique in its behavior compared to other SQL functions in ClickHouse.
Purpose and behavior
- Transforms a single row with an array into multiple rows
- Each new row contains one element from the original array
- Other column values are duplicated for each new row
Basic usage
Example 1: Unpack an array into separate rows
Result:
In this example:
-
Array creation:
['salsa', 'guacamole', 'cheese'] AS toppings
creates an array and aliases it astoppings
. -
arrayJoin
function:arrayJoin(toppings) AS topping
takes thetoppings
array and “unpacks” it. This is the key operation that transforms a single row with an array into multiple rows, each containing one element from the array. -
Additional columns:
'Taco'
is a literal string value that will be repeated in each resulting row.toppings
is the original array, which will also be repeated in each row.
This example demonstrates how arrayJoin
can be used to expand array data into individual rows, which is useful for various data analysis tasks, such as:
- Analyzing individual array elements
- Creating more granular data from aggregated array information
- Preparing data for joins or further processing that requires non-array formats
Example 2: Count orders with specific toppings
Result:
This query is counting the number of taco orders that include either ‘salsa’ or ‘guacamole’ as toppings. Here’s how it works:
-
Subquery:
This creates a single row with an array column named
toppings
containing three elements. -
arrayJoin(toppings)
: This function unpacks thetoppings
array into separate rows. So now we have: -
WHERE
clause:This filters the unpacked rows to include only ‘salsa’ and ‘guacamole’, excluding ‘cheese’.
-
SELECT sum(1) AS taco_orders
: This counts the number of rows that pass the WHERE condition.sum(1)
is a common way to count rows in ClickHouse.
The result is 2 because out of the three original array elements, two (‘salsa’ and ‘guacamole’) match the condition in the WHERE clause.
This query demonstrates how arrayJoin
can be used to transform array data into individual rows, which can then be filtered and counted.
Example 3: Combine toppings and sauces
Result:
This query demonstrates how to use multiple arrayJoin
functions together to create combinations of taco toppings and sauces. Here’s what it does:
- It starts with two arrays:
taco_toppings
andsauces
. - The
arrayJoin
function is applied to both arrays, creating a row for each combination of topping and sauce. - The
sum(1)
aggregation counts the occurrences of each combination. - The result is a table showing all possible combinations of toppings and sauces, with each combination appearing once (
taco_orders = 1
).
This approach is useful for generating all possible combinations from multiple arrays.
Using multiple arrayJoin
functions with the same expression may not produce the expected results due to optimizations. In such cases, consider modifying the repeated array expression with extra operations that do not affect the join result, such as arrayJoin(arraySort(arr))
or arrayJoin(arrayConcat(arr, []))
.
Example 4: Create a Cartesian product of ingredients
Result:
This example demonstrates how to create a Cartesian product of taco ingredients using the arrayJoin
function in ClickHouse. Let’s break it down:
-
The query starts with a subquery that creates an array of taco ingredients:
-
The main query then uses
arrayJoin
twice to create all possible pairs of ingredients: -
The first
arrayJoin
creates a row for each ingredient in thefirst_ingredient
column. -
The second
arrayJoin
is slightly modified:This modification is crucial. If we used
arrayJoin(taco_ingredients)
directly, ClickHouse might optimize it away, resulting in an unexpected output. By usingarrayConcat(taco_ingredients, [])
, we force ClickHouse to re-evaluate the array for each row, ensuring we get all combinations. -
The result is a Cartesian product of the ingredients with themselves. Each ingredient is paired with every other ingredient (including itself), resulting in 6 x 6 = 36 rows.
This technique is useful when you need to generate all possible combinations of elements from a single array.
Example 5: Taco orders with fillings and sauces
Result:
Or you can use Tuple:
Example 6: Alternative syntax using arrayZip
Result:
These examples demonstrate how to use ARRAY JOIN
and arrayZip
with arrayJoin
to combine multiple arrays. The query counts taco orders with specific combinations of fillings and sauces.
In Example 5, we use the ARRAY JOIN
syntax to combine the fillings
and sauces
arrays. This creates a row for each corresponding pair of filling and sauce.
In Example 6, we achieve the same result using arrayZip
to combine the arrays into tuples, then use arrayJoin
to expand these tuples into individual rows.
Both methods produce the same result: a count of taco orders for each unique combination of filling and sauce.