Filtering rows efficiently is crucial for any software developer working with databases. In ClickHouse®, using array values in the WHERE clause can significantly streamline your queries without the need of further data transformation.
Imagine you have a table of taco orders, and each order has a column called toppings
that stores an array of toppings.
CREATE TABLE taco_orders (
order_id String,
order_timestamp DateTime,
taco_type String,
quantity UInt32,
price_per_taco Float32,
customer_id UInt32,
special_instructions String,
toppings Array(String)
) ENGINE = MergeTree()
ORDER BY (order_timestamp, order_id);
You want to list all orders that include a specific topping, say "jalapenos". This is a common scenario where filtering with array values becomes essential.
Without proper techniques, you might worry about the complexity of filtering data correctly or having to perform additional data transformations, which means more work. Fortunately, ClickHouse offers straightforward functions to handle this, making your job easier and your queries more efficient.
Filtering with array values in the WHERE clause in ClickHouse®
When working with ClickHouse, filtering data using array values can be efficiently achieved with the has
function. This function checks if a specific value exists within an array, making it a powerful tool for querying complex datasets.
Basic example using the has
function
To filter orders that include "jalapenos," you can use the following query:
SELECT *
FROM taco_orders
WHERE has(toppings, 'jalapenos');
This query returns all rows where the toppings
array contains the value 'jalapenos'. It's a straightforward and effective way to filter data based on array values.
Handling arrays stored as strings or JSON
In some cases, your arrays might be stored as strings or JSON columns (in Propel). ClickHouse provides functions to handle these scenarios seamlessly. Here's how you can filter rows where the array contains a specific value:
SELECT *
FROM taco_orders
WHERE has(JSONExtractArrayRaw(assumeNotNull(toppings)), '"jalapenos"')
In this query:
JSONExtractArrayRaw
extracts an array of JSON values from thetoppings
column.assumeNotNull
ensures that thetoppings
column is treated as non-nullable.- The
has
function checks if the array contains the specified value, which in this case is a double-quoted ID.
This approach allows you to filter data, even when dealing with JSON structures.
Equivalent functions in Snowflake and PostgreSQL
While ClickHouse provides the has
function to filter arrays in the WHERE clause, other database systems like Snowflake and PostgreSQL offer their own methods to achieve similar functionality. Understanding these equivalents can be useful for developers working across different platforms.
Snowflake: Using ARRAY_CONTAINS
In Snowflake, the ARRAY_CONTAINS
function is used to check if an array includes a specific value. This function simplifies querying arrays, making it easier to filter data based on array contents.
Suppose you have a table taco_orders
with a column toppings
that stores an array of toppings. To find orders that include "jalapenos," you would write:
SELECT *
FROM taco_orders
WHERE ARRAY_CONTAINS('jalapenos', toppings);
This query returns all rows where the toppings
array contains the value 'jalapenos'. The ARRAY_CONTAINS
function checks each element in the array and ensures efficient filtering.
PostgreSQL: Using the ANY
operator
In PostgreSQL, the ANY
operator is used to determine if a value is present in an array. This operator is versatile and integrates well with PostgreSQL's array functionalities.
For a table taco_orders
with a column toppings
as an array, the query to filter orders containing "jalapenos" would be:
SELECT *
FROM taco_orders
WHERE 'jalapenos' = ANY(toppings);
This query returns all rows where the toppings
array includes 'jalapenos'. The ANY
operator simplifies array filtering, making it a powerful tool for querying complex datasets.
Other approaches in ClickHouse
Beyond the has
function, ClickHouse offers additional methods to filter data using array values, providing flexibility and efficiency for various use cases. One such method is the arrayJoin
function, which can be particularly useful for more complex queries.
Using arrayJoin
for array filtering
The arrayJoin
function transforms an array into a set of rows, allowing you to filter data based on array elements more flexibly. This function is ideal when you need to work with individual elements of an array as separate rows.
Let's revisit our taco_orders
table with a toppings
array column. To find all orders that include "jalapenos," you can use the following query:
SELECT *
FROM taco_orders
ARRAY JOIN toppings AS topping
WHERE topping = 'jalapenos';
In this query:
arrayJoin
expands thetoppings
array into individual rows, with each element becoming a separate row.- The
WHERE
clause then filters these rows to find those containing 'jalapenos'.
This method is particularly effective when you need to perform more granular operations on array elements, such as aggregations or additional filtering.
Using arrayJoin
offers several advantages:
- Granular control: By treating array elements as individual rows, you gain more control over your data, enabling complex transformations and analyses.
- Enhanced flexibility: This approach allows for more sophisticated queries, such as combining array elements with other columns or performing aggregations on array elements.
Developers might worry about the performance impact of expanding arrays into rows. However, ClickHouse is designed to handle such operations efficiently. The arrayJoin
function leverages ClickHouse's powerful query engine, ensuring that even large datasets are processed quickly.
Using indexOf
The indexOf
function returns the position of a value within an array. If the value is found, it returns a positive integer; otherwise, it returns zero. This can be used in combination with the WHERE
clause to filter data.
SELECT *
FROM taco_orders
WHERE indexOf(toppings, 'jalapenos') > 0;
In this query:
indexOf(toppings, 'jalapenos') > 0
checks if 'jalapenos' is present in thetoppings
array.- The query returns rows where 'jalapenos' is found in the array.
Using arrayExists
The arrayExists
function checks if any element in an array satisfies a given condition. This can be useful for more complex conditions.
SELECT *
FROM taco_orders
WHERE arrayExists(x -> x = 'jalapenos', toppings);
In this query:
arrayExists(x -> x = 'jalapenos', toppings)
checks if any element in thetoppings
array equals 'jalapenos'.- The query returns rows where the condition is met.
Final thoughts
Filtering rows with array values in the WHERE clause is a valuable technique for developers working with semi structured datasets in ClickHouse. This post has covered various methods to achieve this, including the has
function, arrayJoin
, indexOf
, and arrayExists
. Each method offers distinct advantages and can be tailored to fit specific use cases, ensuring your queries are both efficient and accurate.
As you continue to work with ClickHouse, experiment with the techniques discussed. Whether you're building internal tools or customer-facing products, these methods give you a lot of flexibility when filtering.
You can get started with Propel's Serverless ClickHouse today and receive $15 in monthly credits forever. At any point, upgrade to pay-as-you-go, or contact us to learn more about our volume-based discounts. Visit our pricing page for details.