ClickHouse WHERE clause with array values

Filtering rows with a value in an array within the WHERE clause

ClickHouse WHERE clause with array values

Propel

Filtering rows efficiently is crucial for any software developer working with databases. In ClickHouse, using array values in the WHERE clause can significantly streamline your queries without the need of further data transformation.

Imagine you have a table of taco orders, and each order has a column called toppings that stores an array of toppings.

CREATE TABLE taco_orders (
    order_id String,
    order_timestamp DateTime,
    taco_type String,
    quantity UInt32,
    price_per_taco Float32,
    customer_id UInt32,
    special_instructions String,
    toppings Array(String)
) ENGINE = MergeTree()
ORDER BY (order_timestamp, order_id);

You want to list all orders that include a specific topping, say "jalapenos". This is a common scenario where filtering with array values becomes essential.

Without proper techniques, you might worry about the complexity of filtering data correctly or having to perform additional data transformations, which means more work. Fortunately, ClickHouse offers straightforward functions to handle this, making your job easier and your queries more efficient.

Filtering with array values in the WHERE clause in ClickHouse

When working with ClickHouse, filtering data using array values can be efficiently achieved with the has function. This function checks if a specific value exists within an array, making it a powerful tool for querying complex datasets.

Basic example using the has function

To filter orders that include "jalapenos," you can use the following query:

SELECT *
FROM taco_orders
WHERE has(toppings, 'jalapenos');

This query returns all rows where the toppings array contains the value 'jalapenos'. It's a straightforward and effective way to filter data based on array values.

Handling arrays stored as strings or JSON

In some cases, your arrays might be stored as strings or JSON columns (in Propel). ClickHouse provides functions to handle these scenarios seamlessly.  Here's how you can filter rows where the array contains a specific value:

SELECT *
FROM taco_orders
WHERE has(JSONExtractArrayRaw(assumeNotNull(toppings)), '"jalapenos"')

In this query:

  • JSONExtractArrayRaw extracts an array of JSON values from the toppings column.
  • assumeNotNull ensures that the toppings column is treated as non-nullable.
  • The has function checks if the array contains the specified value, which in this case is a double-quoted ID.

This approach allows you to filter data, even when dealing with JSON structures.

Equivalent functions in Snowflake and PostgreSQL

While ClickHouse provides the has function to filter arrays in the WHERE clause, other database systems like Snowflake and PostgreSQL offer their own methods to achieve similar functionality. Understanding these equivalents can be useful for developers working across different platforms.

Snowflake: Using ARRAY_CONTAINS

In Snowflake, the ARRAY_CONTAINS function is used to check if an array includes a specific value. This function simplifies querying arrays, making it easier to filter data based on array contents.

Suppose you have a table taco_orders with a column toppings that stores an array of toppings. To find orders that include "jalapenos," you would write:

SELECT *
FROM taco_orders
WHERE ARRAY_CONTAINS('jalapenos', toppings);

This query returns all rows where the toppings array contains the value 'jalapenos'. The ARRAY_CONTAINS function checks each element in the array and ensures efficient filtering.

PostgreSQL: Using the ANY operator

In PostgreSQL, the ANY operator is used to determine if a value is present in an array. This operator is versatile and integrates well with PostgreSQL's array functionalities.

For a table taco_orders with a column toppings as an array, the query to filter orders containing "jalapenos" would be:

SELECT *
FROM taco_orders
WHERE 'jalapenos' = ANY(toppings);

This query returns all rows where the toppings array includes 'jalapenos'. The ANY operator simplifies array filtering, making it a powerful tool for querying complex datasets.

Other approaches in ClickHouse

Beyond the has function, ClickHouse offers additional methods to filter data using array values, providing flexibility and efficiency for various use cases. One such method is the arrayJoin function, which can be particularly useful for more complex queries.

Using arrayJoin for array filtering

The arrayJoin function transforms an array into a set of rows, allowing you to filter data based on array elements more flexibly. This function is ideal when you need to work with individual elements of an array as separate rows.

Let's revisit our taco_orders table with a toppings array column. To find all orders that include "jalapenos," you can use the following query:

SELECT *
FROM taco_orders
ARRAY JOIN toppings AS topping
WHERE topping = 'jalapenos';

In this query:

  • arrayJoin expands the toppings array into individual rows, with each element becoming a separate row.
  • The WHERE clause then filters these rows to find those containing 'jalapenos'.

This method is particularly effective when you need to perform more granular operations on array elements, such as aggregations or additional filtering.

Using arrayJoin offers several advantages:

  • Granular control: By treating array elements as individual rows, you gain more control over your data, enabling complex transformations and analyses.
  • Enhanced flexibility: This approach allows for more sophisticated queries, such as combining array elements with other columns or performing aggregations on array elements.

Developers might worry about the performance impact of expanding arrays into rows. However, ClickHouse is designed to handle such operations efficiently. The arrayJoin function leverages ClickHouse's powerful query engine, ensuring that even large datasets are processed quickly.

Using indexOf

The indexOf function returns the position of a value within an array. If the value is found, it returns a positive integer; otherwise, it returns zero. This can be used in combination with the WHERE clause to filter data.

SELECT *
FROM taco_orders
WHERE indexOf(toppings, 'jalapenos') > 0;

In this query:

  • indexOf(toppings, 'jalapenos') > 0 checks if 'jalapenos' is present in the toppings array.
  • The query returns rows where 'jalapenos' is found in the array.

Using arrayExists

The arrayExists function checks if any element in an array satisfies a given condition. This can be useful for more complex conditions.

SELECT *
FROM taco_orders
WHERE arrayExists(x -> x = 'jalapenos', toppings);

In this query:

  • arrayExists(x -> x = 'jalapenos', toppings) checks if any element in the toppings array equals 'jalapenos'.
  • The query returns rows where the condition is met.

Final thoughts

Filtering rows with array values in the WHERE clause is a valuable technique for developers working with semi structured datasets in ClickHouse. This post has covered various methods to achieve this, including the has function, arrayJoin, indexOf, and arrayExists. Each method offers distinct advantages and can be tailored to fit specific use cases, ensuring your queries are both efficient and accurate.

As you continue to work with ClickHouse, experiment with the techniques discussed. Whether you're building internal tools or customer-facing products, these methods give you a lot of flexibility when filtering.

You can get started with Propel's Serverless ClickHouse today and receive $15 in monthly credits forever. At any point, upgrade to pay-as-you-go, or contact us to learn more about our volume-based discounts. Visit our pricing page for details.

Related posts

What's new in ClickHouse version 24.10

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

Funnel analysis in ClickHouse

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

What's new in ClickHouse version 24.9

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

Start shipping today

Deliver the analytics your customers have been asking for.