The arrayJoin function is unique in its behavior compared to other SQL functions in ClickHouse.

Purpose and behavior

  • Transforms a single row with an array into multiple rows
  • Each new row contains one element from the original array
  • Other column values are duplicated for each new row

Basic usage

Example 1: Unpack an array into separate rows

SELECT
    arrayJoin(['salsa', 'guacamole', 'cheese'] AS toppings) AS topping,
    'Taco',
    toppings

Result:

| topping   | Taco | toppings                      |
|-----------|------|-------------------------------|
| salsa     | Taco | ['salsa','guacamole','cheese']|
| guacamole | Taco | ['salsa','guacamole','cheese']|
| cheese    | Taco | ['salsa','guacamole','cheese']|

In this example:

  1. Array creation: ['salsa', 'guacamole', 'cheese'] AS toppings creates an array and aliases it as toppings.

  2. arrayJoin function: arrayJoin(toppings) AS topping takes the toppings array and “unpacks” it. This is the key operation that transforms a single row with an array into multiple rows, each containing one element from the array.

  3. Additional columns:

    • 'Taco' is a literal string value that will be repeated in each resulting row.
    • toppings is the original array, which will also be repeated in each row.

This example demonstrates how arrayJoin can be used to expand array data into individual rows, which is useful for various data analysis tasks, such as:

  • Analyzing individual array elements
  • Creating more granular data from aggregated array information
  • Preparing data for joins or further processing that requires non-array formats

Example 2: Count orders with specific toppings

SELECT sum(1) AS taco_orders
FROM
(
    SELECT ['salsa', 'guacamole', 'cheese'] AS toppings
)
WHERE arrayJoin(toppings) IN ['salsa', 'guacamole'];

Result:

| taco_orders |
|-------------|
| 2           |

This query is counting the number of taco orders that include either ‘salsa’ or ‘guacamole’ as toppings. Here’s how it works:

  1. Subquery:

    SELECT ['salsa', 'guacamole', 'cheese'] AS toppings
    

    This creates a single row with an array column named toppings containing three elements.

  2. arrayJoin(toppings): This function unpacks the toppings array into separate rows. So now we have:

    salsa
    guacamole
    cheese
    
  3. WHERE clause:

    WHERE arrayJoin(toppings) IN ['salsa', 'guacamole']
    

    This filters the unpacked rows to include only ‘salsa’ and ‘guacamole’, excluding ‘cheese’.

  4. SELECT sum(1) AS taco_orders: This counts the number of rows that pass the WHERE condition. sum(1) is a common way to count rows in ClickHouse.

The result is 2 because out of the three original array elements, two (‘salsa’ and ‘guacamole’) match the condition in the WHERE clause.

This query demonstrates how arrayJoin can be used to transform array data into individual rows, which can then be filtered and counted.

Example 3: Combine toppings and sauces

SELECT
    sum(1) AS taco_orders,
    arrayJoin(taco_toppings) AS topping,
    arrayJoin(sauces) AS sauce
FROM
(
    SELECT
        ['salsa', 'guacamole', 'cheese'] AS taco_toppings,
        ['mild', 'medium', 'hot'] AS sauces
)
GROUP BY
    2,
    3

Result:

| taco_orders | topping   | sauce  |
|-------------|-----------|--------|
| 1           | guacamole | medium |
| 1           | guacamole | mild   |
| 1           | cheese    | medium |
| 1           | salsa     | medium |
| 1           | cheese    | mild   |
| 1           | cheese    | hot    |
| 1           | salsa     | hot    |
| 1           | salsa     | mild   |

This query demonstrates how to use multiple arrayJoin functions together to create combinations of taco toppings and sauces. Here’s what it does:

  1. It starts with two arrays: taco_toppings and sauces.
  2. The arrayJoin function is applied to both arrays, creating a row for each combination of topping and sauce.
  3. The sum(1) aggregation counts the occurrences of each combination.
  4. The result is a table showing all possible combinations of toppings and sauces, with each combination appearing once (taco_orders = 1).

This approach is useful for generating all possible combinations from multiple arrays.

Using multiple arrayJoin functions with the same expression may not produce the expected results due to optimizations. In such cases, consider modifying the repeated array expression with extra operations that do not affect the join result, such as arrayJoin(arraySort(arr)) or arrayJoin(arrayConcat(arr, [])).

Example 4: Create a Cartesian product of ingredients

SELECT
    arrayJoin(taco_ingredients) as first_ingredient,
    /* arrayJoin(taco_ingredients) as second_ingredient */ -- is technically correct, but will annihilate result set
    arrayJoin(arrayConcat(taco_ingredients, [])) as second_ingredient -- intentionally changed expression to force re-evaluation
FROM (
    SELECT ['tortilla', 'beef', 'lettuce', 'cheese', 'salsa', 'guacamole'] as taco_ingredients
);

Result:

| first_ingredient | second_ingredient |
|------------------|-------------------|
| tortilla         | tortilla          |
| tortilla         | beef              |
| tortilla         | lettuce           |
| tortilla         | cheese            |
| tortilla         | salsa             |
| tortilla         | guacamole         |
| beef             | tortilla          |
| beef             | beef              |
| beef             | lettuce           |
| beef             | cheese            |
| beef             | salsa             |
| beef             | guacamole         |
| lettuce          | tortilla          |
| lettuce          | beef              |
| lettuce          | lettuce           |
| lettuce          | cheese            |
| lettuce          | salsa             |
| lettuce          | guacamole         |
| cheese           | tortilla          |
| cheese           | beef              |
| cheese           | lettuce           |
| cheese           | cheese            |
| cheese           | salsa             |
| cheese           | guacamole         |
| salsa            | tortilla          |
| salsa            | beef              |
| salsa            | lettuce           |
| salsa            | cheese            |
| salsa            | salsa             |
| salsa            | guacamole         |
| guacamole        | tortilla          |
| guacamole        | beef              |
| guacamole        | lettuce           |
| guacamole        | cheese            |
| guacamole        | salsa             |
| guacamole        | guacamole         |

This example demonstrates how to create a Cartesian product of taco ingredients using the arrayJoin function in ClickHouse. Let’s break it down:

  1. The query starts with a subquery that creates an array of taco ingredients:

    SELECT ['tortilla', 'beef', 'lettuce', 'cheese', 'salsa', 'guacamole'] as taco_ingredients
    
  2. The main query then uses arrayJoin twice to create all possible pairs of ingredients:

    SELECT
        arrayJoin(taco_ingredients) as first_ingredient,
        arrayJoin(arrayConcat(taco_ingredients, [])) as second_ingredient
    
  3. The first arrayJoin creates a row for each ingredient in the first_ingredient column.

  4. The second arrayJoin is slightly modified:

    arrayJoin(arrayConcat(taco_ingredients, []))
    

    This modification is crucial. If we used arrayJoin(taco_ingredients) directly, ClickHouse might optimize it away, resulting in an unexpected output. By using arrayConcat(taco_ingredients, []), we force ClickHouse to re-evaluate the array for each row, ensuring we get all combinations.

  5. The result is a Cartesian product of the ingredients with themselves. Each ingredient is paired with every other ingredient (including itself), resulting in 6 x 6 = 36 rows.

This technique is useful when you need to generate all possible combinations of elements from a single array.

Example 5: Taco orders with fillings and sauces

SELECT
    sum(1) AS taco_orders,
    filling,
    sauce
FROM
(
    SELECT
        ['beef', 'chicken', 'fish'] AS fillings,
        ['salsa', 'guacamole', 'sour cream'] AS sauces
)
ARRAY JOIN
    fillings AS filling,
    sauces AS sauce
GROUP BY
    2,
    3

Result:

| taco_orders | filling | sauce      |
|-------------|---------|------------|
|           1 | beef    | salsa      |
|           1 | chicken | guacamole  |
|           1 | fish    | sour cream |

Or you can use Tuple:

Example 6: Alternative syntax using arrayZip

SELECT
    sum(1) AS taco_orders,
    (arrayJoin(arrayZip(fillings, sauces)) AS t).1 AS filling,
    t.2 AS sauce
FROM
(
    SELECT
        ['beef', 'chicken', 'fish'] AS fillings,
        ['salsa', 'guacamole', 'sour cream'] AS sauces
)
GROUP BY
    2,
    3

Result:

| taco_orders | filling | sauce      |
|-------------|---------|------------|
|           1 | beef    | salsa      |
|           1 | chicken | guacamole  |
|           1 | fish    | sour cream |

These examples demonstrate how to use ARRAY JOIN and arrayZip with arrayJoin to combine multiple arrays. The query counts taco orders with specific combinations of fillings and sauces.

In Example 5, we use the ARRAY JOIN syntax to combine the fillings and sauces arrays. This creates a row for each corresponding pair of filling and sauce.

In Example 6, we achieve the same result using arrayZip to combine the arrays into tuples, then use arrayJoin to expand these tuples into individual rows.

Both methods produce the same result: a count of taco orders for each unique combination of filling and sauce.