arrayJoin
function is unique in its behavior compared to other SQL functions in ClickHouse.
Purpose and behavior
- Transforms a single row with an array into multiple rows
- Each new row contains one element from the original array
- Other column values are duplicated for each new row
Basic usage
Example 1: Unpack an array into separate rows
-
Array creation:
['salsa', 'guacamole', 'cheese'] AS toppings
creates an array and aliases it astoppings
. -
arrayJoin
function:arrayJoin(toppings) AS topping
takes thetoppings
array and “unpacks” it. This is the key operation that transforms a single row with an array into multiple rows, each containing one element from the array. -
Additional columns:
'Taco'
is a literal string value that will be repeated in each resulting row.toppings
is the original array, which will also be repeated in each row.
arrayJoin
can be used to expand array data into individual rows, which is useful for various data analysis tasks, such as:
- Analyzing individual array elements
- Creating more granular data from aggregated array information
- Preparing data for joins or further processing that requires non-array formats
Example 2: Count orders with specific toppings
-
Subquery:
This creates a single row with an array column named
toppings
containing three elements. -
arrayJoin(toppings)
: This function unpacks thetoppings
array into separate rows. So now we have: -
WHERE
clause:This filters the unpacked rows to include only ‘salsa’ and ‘guacamole’, excluding ‘cheese’. -
SELECT sum(1) AS taco_orders
: This counts the number of rows that pass the WHERE condition.sum(1)
is a common way to count rows in ClickHouse.
arrayJoin
can be used to transform array data into individual rows, which can then be filtered and counted.
Example 3: Combine toppings and sauces
arrayJoin
functions together to create combinations of taco toppings and sauces. Here’s what it does:
- It starts with two arrays:
taco_toppings
andsauces
. - The
arrayJoin
function is applied to both arrays, creating a row for each combination of topping and sauce. - The
sum(1)
aggregation counts the occurrences of each combination. - The result is a table showing all possible combinations of toppings and sauces, with each combination appearing once (
taco_orders = 1
).
Using multiple
arrayJoin
functions with the same expression may not produce the expected results due to optimizations. In such cases, consider modifying the repeated array expression with extra operations that do not affect the join result, such as arrayJoin(arraySort(arr))
or arrayJoin(arrayConcat(arr, []))
.Example 4: Create a Cartesian product of ingredients
arrayJoin
function in ClickHouse. Let’s break it down:
-
The query starts with a subquery that creates an array of taco ingredients:
-
The main query then uses
arrayJoin
twice to create all possible pairs of ingredients: -
The first
arrayJoin
creates a row for each ingredient in thefirst_ingredient
column. -
The second
arrayJoin
is slightly modified:This modification is crucial. If we usedarrayJoin(taco_ingredients)
directly, ClickHouse might optimize it away, resulting in an unexpected output. By usingarrayConcat(taco_ingredients, [])
, we force ClickHouse to re-evaluate the array for each row, ensuring we get all combinations. - The result is a Cartesian product of the ingredients with themselves. Each ingredient is paired with every other ingredient (including itself), resulting in 6 x 6 = 36 rows.
Example 5: Taco orders with fillings and sauces
Example 6: Alternative syntax using arrayZip
ARRAY JOIN
and arrayZip
with arrayJoin
to combine multiple arrays. The query counts taco orders with specific combinations of fillings and sauces.
In Example 5, we use the ARRAY JOIN
syntax to combine the fillings
and sauces
arrays. This creates a row for each corresponding pair of filling and sauce.
In Example 6, we achieve the same result using arrayZip
to combine the arrays into tuples, then use arrayJoin
to expand these tuples into individual rows.
Both methods produce the same result: a count of taco orders for each unique combination of filling and sauce.