Essential string functions in ClickHouse®

They key ClickHouse® string functions you need know

Propel

String functions in ClickHouse® are key tools for developers and data engineers. These functions enable efficient data manipulation, cleaning, and querying, which are crucial for building data applications. In this article, we'll explore the essential string functions in ClickHouse®, providing detailed explanations and practical examples you can use to learn and as a reference.

Introduction to ClickHouse® string functions

In this post, you will learn about various string functions available in ClickHouse®. We'll cover basic functions like length and empty, as well as more advanced functions like replaceRegexpAll and concatWithSeparator. Each function will be explained with practical examples, ensuring you can apply them effectively in your projects.

String functions are crucial for data cleaning, transformation, and querying. They allow you to standardize data formats, remove unwanted characters, and extract meaningful information from raw data. By understanding these functions, you can ensure your data is accurate, consistent, and ready for analysis.

A word about the examples

To illustrate various string functions in ClickHouse, we'll use a consistent example throughout this article. The theme is an imaginary SaaS company that tracks orders for taquerias. We'll create a table to store order data and use this schema to demonstrate each function.

Let's start by creating a table called orders to store our data.

CREATE TABLE orders (
    order_id UInt32,
    customer_name String,
    order_details String,
    order_date DateTime
) ENGINE = MergeTree()
ORDER BY order_id;

Next, we'll insert some sample data into the orders table. This data will be used in all the examples that follow.

INSERT INTO orders VALUES
(1, 'John Doe', '3 Tacos, 2 Burritos', '2023-01-15 12:34:56'),
(2, 'Jane Smith', '1 Taco, 1 Quesadilla', '2023-01-16 13:45:07'),
(3, 'Carlos Garcia', '5 Tacos, 1 Nacho', '2023-01-17 14:56:18'),
(4, 'Maria Rodriguez', '2 Burritos, 3 Quesadillas', '2023-01-18 15:07:29');

This table and data will serve as the foundation for demonstrating the various string functions in ClickHouse.

Next, we will explore basic string functions like determining the length of strings and checking for emptiness. These fundamental operations are crucial for validating and processing string data effectively.

Basic string functions: length and emptiness

We'll explore functions to determine string length and check for emptiness, helping you avoid common errors and improve efficiency in your data operations.

`length`

The length function returns the length of a string in bytes. This is useful for validating data and ensuring that strings meet certain length requirements.

Attributes:

s (String): The string whose length you want to determine.

Returns:

An integer representing the length of the string in bytes.

Example:

Let's find the length of the customer_name column in our orders table.

SELECT
    order_id,
    customer_name,
    length(customer_name) AS name_length
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─name_length─┐
│        1 │ John Doe          │           8 │
│        2 │ Jane Smith        │          10 │
│        3 │ Carlos Garcia     │          13 │
│        4 │ Maria Rodriguez   │          15 │
└──────────┴───────────────────┴─────────────┘

The equivalent functions for calculating string length in other platforms are:

Postgres: length(customer_name)
DuckDB: length(customer_name)
Snowflake: length(customer_name)
Databricks: length(customer_name)

`lengthUTF8`

The lengthUTF8 function returns the length of a string in Unicode code points, which is particularly important for strings containing multi-byte characters.

Attributes:

s (String): The string whose length you want to determine.

Returns:

An integer representing the length of the string in Unicode code points.

Example:

Let's find the length of the customer_name column in Unicode code points.

SELECT
    order_id,
    customer_name,
    lengthUTF8(customer_name) AS name_length_utf8
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─name_length_utf8─┐
│        1 │ John Doe          │                8 │
│        2 │ Jane Smith        │               10 │
│        3 │ Carlos Garcia     │               13 │
│        4 │ Maria Rodriguez   │               15 │
└──────────┴───────────────────┴──────────────────┘

The equivalent functions for lengthUTF8 in other platforms are:

Postgres: length(customer_name). Postgres handles UTF-8 natively, so the standard length function works for UTF-8 strings.
DuckDB: length(customer_name). DuckDB also handles UTF-8 natively.
Snowflake: length(customer_name). Snowflake's length function is UTF-8 aware.
Databricks: length(customer_name). Databricks SQL also uses UTF-8 by default.

`empty`

The empty function checks whether a given string is empty. This is useful for data validation, ensuring that required fields are not left blank.

Attributes:

x (String): The string to check for emptiness.

Returns:

A boolean value: 1 if the string is empty, 0 otherwise.

Example:

Check if any customer_name entries are empty.

SELECT
    order_id,
    customer_name,
    empty(customer_name) AS is_empty
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─is_empty─┐
│        1 │ John Doe          │        0 │
│        2 │ Jane Smith        │        0 │
│        3 │ Carlos Garcia     │        0 │
│        4 │ Maria Rodriguez   │        0 │
└──────────┴───────────────────┴──────────┘

The equivalent functions for empty in other platforms are:

Postgres: customer_name = '' or length(customer_name) = 0
DuckDB: customer_name = '' or length(customer_name) = 0
Snowflake: customer_name = '' or length(customer_name) = 0
Databricks: customer_name = '' or length(customer_name) = 0

For these platforms, you would modify the query as follows:

SELECT
    order_id,
    customer_name,
    CASE WHEN customer_name = '' THEN 1 ELSE 0 END AS is_empty
FROM
    orders;

Note that these platforms don't have a direct equivalent to ClickHouse's empty function, so we use a comparison to an empty string or a length check to achieve the same result.

`notEmpty`

The notEmpty function checks whether a given string is not empty. This is the inverse of the empty function and is equally useful for data validation.

Attributes:

x (String): The string to check for non-emptiness.

Returns:

A boolean value: 1 if the string is not empty, 0 otherwise.

Example:

Check if all customer_name entries are not empty.

SELECT
    order_id,
    customer_name,
    notEmpty(customer_name) AS is_not_empty
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─is_not_empty─┐
│        1 │ John Doe          │            1 │
│        2 │ Jane Smith        │            1 │
│        3 │ Carlos Garcia     │            1 │
│        4 │ Maria Rodriguez   │            1 │
└──────────┴───────────────────┴──────────────┘

The equivalent functions for notEmpty in other platforms are:

Postgres: customer_name != '' or length(customer_name) > 0
DuckDB: customer_name != '' or length(customer_name) > 0
Snowflake: customer_name != '' or length(customer_name) > 0
Databricks: customer_name != '' or length(customer_name) > 0

For these platforms, you would modify the query as follows:

SELECT
    order_id,
    customer_name,
    CASE WHEN customer_name != '' THEN 1 ELSE 0 END AS is_not_empty
FROM
    orders;

Note that these platforms don't have a direct equivalent to ClickHouse's notEmpty function, so we use a comparison to an empty string or a length check to achieve the same result.

Common pitfalls

When working with string functions, it's essential to be aware of common pitfalls to avoid errors and inefficiencies:

Byte Length vs. Character Length: The length function returns the length in bytes, which may differ from the character count, especially for multi-byte characters. Use lengthUTF8 for accurate character counts.
Empty Strings: Ensure that fields expected to contain data are not empty. Use the empty and notEmpty functions to validate data before processing.

String replacement functions

ClickHouse provides robust string replacement functions for data cleaning and transformation. This section examines essential replacement functions with real-world examples.

`replaceOne`

The replaceOne function replaces the first occurrence of a specified pattern in a string with a replacement string. This is useful for correcting single instances of incorrect data.

Attributes:

haystack (String): The original string.
pattern (String): The substring to be replaced.
replacement (String): The string to replace the pattern.

Returns:

A new string with the first occurrence of the pattern replaced.

Example:

Replace the first occurrence of "Tacos" with "Taco" in the order_details column.

SELECT
    order_id,
    order_details,
    replaceOne(order_details, 'Tacos', 'Taco') AS corrected_order_details
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─corrected_order_details──────────┐
│        1 │ 3 Tacos, 2 Burritos      │ 3 Taco, 2 Burritos               │
│        2 │ 1 Taco, 1 Quesadilla     │ 1 Taco, 1 Quesadilla             │
│        3 │ 5 Tacos, 1 Nacho         │ 5 Taco, 1 Nacho                  │
│        4 │ 2 Burritos, 3 Quesadillas│ 2 Burritos, 3 Quesadillas        │
└──────────┴──────────────────────────┴──────────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: regexp_replace(order_details, 'Tacos', 'Taco', 1). The fourth parameter '1' limits the replacement to one occurrence.
DuckDB: regexp_replace(order_details, 'Tacos', 'Taco', 1). Similar to Postgres, the fourth parameter '1' limits the replacement to one occurrence.
Snowflake: regexp_replace(order_details, 'Tacos', 'Taco', 1, 1). The fourth parameter '1' specifies to start at the first character, and the fifth parameter '1' limits the replacement to one occurrence.
Databricks: regexp_replace(order_details, 'Tacos', 'Taco', 1). Like Postgres and DuckDB, this limits the replacement to the first occurrence.

`replaceAll`

The replaceAll function replaces all occurrences of a specified pattern in a string with a replacement string. This is ideal for thorough data cleaning.

Attributes:

haystack (String): The original string.
pattern (String): The substring to be replaced.
replacement (String): The string to replace the pattern.

Returns:

A new string with all occurrences of the pattern replaced.

Example:‍

Replace all occurrences of "Tacos" with "Taco" in the order_details column.

SELECT
    order_id,
    order_details,
    replaceAll(order_details, 'Tacos', 'Taco') AS corrected_order_details
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─corrected_order_details──────────┐
│        1 │ 3 Tacos, 2 Burritos      │ 3 Taco, 2 Burritos               │
│        2 │ 1 Taco, 1 Quesadilla     │ 1 Taco, 1 Quesadilla             │
│        3 │ 5 Tacos, 1 Nacho         │ 5 Taco, 1 Nacho                  │
│        4 │ 2 Burritos, 3 Quesadillas│ 2 Burritos, 3 Quesadillas        │
└──────────┴──────────────────────────┴──────────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: regexp_replace(order_details, 'Tacos', 'Taco', 'g'). The 'g' flag ensures all occurrences are replaced.
DuckDB: regexp_replace(order_details, 'Tacos', 'Taco'). By default, this replaces all occurrences.
Snowflake: regexp_replace(order_details, 'Tacos', 'Taco'). This replaces all occurrences by default.
Databricks: regexp_replace(order_details, 'Tacos', 'Taco'). This also replaces all occurrences by default.

Note that while ClickHouse uses a specific replaceAll function, many other platforms use regexp_replace with default behavior or flags to achieve the same result.

`replaceRegexpOne`

The replaceRegexpOne function replaces the first occurrence of a pattern matching a regular expression in a string. This is useful for more complex replacements based on patterns.

Attributes:

haystack (String): The original string.
pattern (String): The regular expression pattern to be replaced.
replacement (String): The string to replace the pattern.

Returns:

A new string with the first occurrence of the pattern replaced.

Example:

Replace the first occurrence of any digit with "X" in the order_details column.

SELECT
    order_id,
    order_details,
    replaceRegexpOne(order_details, '\\d', 'X') AS corrected_order_details
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─corrected_order_details──────────┐
│        1 │ 3 Tacos, 2 Burritos      │ X Tacos, 2 Burritos              │
│        2 │ 1 Taco, 1 Quesadilla     │ X Taco, 1 Quesadilla             │
│        3 │ 5 Tacos, 1 Nacho         │ X Tacos, 1 Nacho                 │
│        4 │ 2 Burritos, 3 Quesadillas│ X Burritos, 3 Quesadillas        │
└──────────┴──────────────────────────┴──────────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: regexp_replace(order_details, '\\d', 'X'). By default, Postgres replaces only the first occurrence, so no 'g' flag is needed.
DuckDB: regexp_replace(order_details, '\\d', 'X'). This function will replace the first occurrence by default.
Snowflake: regexp_replace(order_details, '\\d', 'X', 1). This limits the replacement to the first occurrence.
Databricks: regexp_replace(order_details, '\\d', 'X'). By default, this function replaces the first occurrence.

`replaceRegexpAll`

The replaceRegexpAll function replaces all occurrences of a pattern matching a regular expression in a string. This is useful for comprehensive data cleaning based on complex patterns.

Attributes:

haystack (String): The original string.
pattern (String): The regular expression pattern to be replaced.
replacement (String): The string to replace the pattern.

Returns:

A new string with all occurrences of the pattern replaced.

Example:

Replace all occurrences of any digit with "X" in the order_details column.

SELECT
    order_id,
    order_details,
    replaceRegexpAll(order_details, '\\d', 'X') AS corrected_order_details
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─corrected_order_details──────────┐
│        1 │ 3 Tacos, 2 Burritos      │ X Tacos, X Burritos              │
│        2 │ 1 Taco, 1 Quesadilla     │ X Taco, X Quesadilla             │
│        3 │ 5 Tacos, 1 Nacho         │ X Tacos, X Nacho                 │
│        4 │ 2 Burritos, 3 Quesadillas│ X Burritos, X Quesadillas        │
└──────────┴──────────────────────────┴──────────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: regexp_replace(order_details, '\\d', 'X', 'g'). In Postgres, the 'g' flag in the regexp_replace() function stands for “global”, meaning that it replaces all occurrences of the pattern in the string. Without the 'g' flag, only the first occurrence of the pattern would be replaced.
DuckDB: regexp_replace(order_details, '\\d', 'X').
Snowflake: regexp_replace(order_details, '\\d', 'X').
Databricks: regexp_replace(order_details, '\\d', 'X').

Efficiency tips

To optimize replacement operations and avoid performance hits:

Use specific patterns: When using regular expressions, be as specific as possible to avoid unnecessary replacements and improve performance.
Limit scope: Apply replacement functions only to the necessary columns and rows to minimize the processing load.

Substring functions

Extracting substrings is a common requirement in data manipulation, and precision is crucial to avoid errors. This section covers essential substring functions in ClickHouse, helping you perform these operations accurately and efficiently.

`substring`

The substring function returns a portion of a string starting from a specified offset, with an optional length parameter. This is useful for extracting specific segments of data for further analysis or reporting.

Attributes:

s (String): The original string.
offset (UInt32): The starting position for the substring (1-based index).
length (UInt32, optional): The number of characters to extract. If omitted, the substring extends to the end of the string.

Returns:

A substring of the original string.

Example:

Extract the first three characters from the order_details column.

SELECT
    order_id,
    order_details,
    substring(order_details, 1, 3) AS order_prefix
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─order_prefix─┐
│        1 │ 3 Tacos, 2 Burritos      │ 3 T          │
│        2 │ 1 Taco, 1 Quesadilla     │ 1 T          │
│        3 │ 5 Tacos, 1 Nacho         │ 5 T          │
│        4 │ 2 Burritos, 3 Quesadillas│ 2 B          │
└──────────┴──────────────────────────┴──────────────┘

The equivalent functions in other platforms are:

Postgres: substring(order_details, 1, 3).
DuckDB: substring(order_details, 1, 3).
Snowflake: substring(order_details, 1, 3).
Databricks: substring(order_details, 1, 3).

`substringUTF8`

The substringUTF8 function operates similarly to substring but considers Unicode code points, making it suitable for strings containing multi-byte characters.

Attributes:

s (String): The original string.
offset (UInt32): The starting position for the substring (1-based index).
length (UInt32, optional): The number of characters to extract. If omitted, the substring extends to the end of the string.

Returns:

A substring of the original string, considering Unicode code points.

Example:

Extract the first three characters from the customer_name column, considering Unicode code points.

SELECT
    order_id,
    customer_name,
    substringUTF8(customer_name, 1, 3) AS name_prefix
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─name_prefix─┐
│        1 │ John Doe          │ Joh         │
│        2 │ Jane Smith        │ Jan         │
│        3 │ Carlos Garcia     │ Car         │
│        4 │ Maria Rodriguez   │ Mar         │
└──────────┴───────────────────┴─────────────┘

The equivalent functions in other platforms are:

Postgres: substring(customer_name, 1, 3).
DuckDB: substring(customer_name, 1, 3).
Snowflake: substring(customer_name, 1, 3).
Databricks: substring(customer_name, 1, 3).

These platforms natively handle UTF-8, so no separate substringUTF8() function is needed.

Advanced extraction

In more complex scenarios, you might need to extract substrings based on patterns or dynamic conditions. Here are a few advanced techniques:

Extracting dynamic segments:
- Use functions like position to find the starting point of a substring dynamically.
- Combine substring with other functions to extract variable-length segments.‍

Example:‍
Extract the part of order_details after the first comma.

SELECT
    order_id,
    order_details,
    substring(order_details, position(order_details, ',') + 2) AS details_after_comma
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─details_after_comma─┐
│        1 │ 3 Tacos, 2 Burritos      │ 2 Burritos          │
│        2 │ 1 Taco, 1 Quesadilla     │ 1 Quesadilla        │
│        3 │ 5 Tacos, 1 Nacho         │ 1 Nacho             │
│        4 │ 2 Burritos, 3 Quesadillas│ 3 Quesadillas       │
└──────────┴──────────────────────────┴─────────────────────┘

The equivalent functions in other platforms are:

Postgres: substring(order_details, position(order_details, ',') + 1).
DuckDB: substring(order_details, instr(order_details, ',') + 2).
Snowflake: substring(order_details, position(order_details, ',') + 2).
Databricks: substring(order_details, instr(order_details, ',') + 2).

Using regular expressions:

Combine regexpExtract with substring for more sophisticated extraction based on patterns.

Example:

Extract the first numeric value from order_details.

SELECT
    order_id,
    order_details,
    substring(order_details, position(order_details, regexpExtract(order_details, '\\d+', 0)), length(regexpExtract(order_details, '\\d+', 0))) AS first_number
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─first_number─┐
│        1 │ 3 Tacos, 2 Burritos      │            3 │
│        2 │ 1 Taco, 1 Quesadilla     │            1 │
│        3 │ 5 Tacos, 1 Nacho         │            5 │
│        4 │ 2 Burritos, 3 Quesadillas│            2 │
└──────────┴──────────────────────────┴──────────────┘

The equivalent functions in other platforms are:

Postgres: Use substring() with position() and regexp_matches() for extracting the first number. Example: substring(order_details, position(order_details, (regexp_matches(order_details, '\\d+'))[1])).
DuckDB: Use substring() with instr() and regexp_extract() for extracting the first number.
Snowflake: Use substring() with position() and regexp_substr() for extracting the first number.
Databricks: Use substring() with instr() and regexp_extract() for extracting the first number.

String concatenation

Merging strings is often come in handy in data processing. This section explores key ClickHouse functions for combining strings effectively, while highlighting potential issues to avoid.

`concat`

The concat function combines multiple strings into one without any separator. This is useful for merging fields or constructing new strings from existing data.

Attributes:

s1, s2, ... (String): The strings to be concatenated.

Returns:

A single string resulting from the concatenation of all input strings.

Example:

Concatenate customer_name and order_details into a single string.

SELECT
    order_id,
    customer_name,
    order_details,
    concat(customer_name, ' - ', order_details) AS full_order_description
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─order_details────────────┬─full_order_description────────────────────────────┐
│        1 │ John Doe          │ 3 Tacos, 2 Burritos      │ John Doe - 3 Tacos, 2 Burritos                    │
│        2 │ Jane Smith        │ 1 Taco, 1 Quesadilla     │ Jane Smith - 1 Taco, 1 Quesadilla                 │
│        3 │ Carlos Garcia     │ 5 Tacos, 1 Nacho         │ Carlos Garcia - 5 Tacos, 1 Nacho                  │
│        4 │ Maria Rodriguez   │ 2 Burritos, 3 Quesadillas│ Maria Rodriguez - 2 Burritos, 3 Quesadillas       │
└──────────┴───────────────────┴──────────────────────────┴──────────────────────────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: concat(customer_name, ' - ', order_details).
DuckDB: concat(customer_name, ' - ', order_details).
Snowflake: concat(customer_name, ' - ', order_details).
Databricks: concat(customer_name, ' - ', order_details).

`concatWithSeparator`

The concatWithSeparator function concatenates multiple strings using a specified separator. This is particularly useful for creating delimited strings or formatted outputs.

Attributes:

sep (String): The separator to be used between concatenated strings.
expr1, expr2, expr3... (String): The strings to be concatenated.

Returns:

A single string resulting from the concatenation of all input strings with the specified separator.

Example:

Concatenate customer_name and order_details with a comma separator.

SELECT
    order_id,
    customer_name,
    order_details,
    concatWithSeparator(', ', customer_name, order_details) AS full_order_description
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─order_details────────────┬─full_order_description────────────────────────┐
│        1 │ John Doe          │ 3 Tacos, 2 Burritos      │ John Doe, 3 Tacos, 2 Burritos                 │
│        2 │ Jane Smith        │ 1 Taco, 1 Quesadilla     │ Jane Smith, 1 Taco, 1 Quesadilla              │
│        3 │ Carlos Garcia     │ 5 Tacos, 1 Nacho         │ Carlos Garcia, 5 Tacos, 1 Nacho               │
│        4 │ Maria Rodriguez   │ 2 Burritos, 3 Quesadillas│ Maria Rodriguez, 2 Burritos, 3 Quesadillas    │
└──────────┴───────────────────┴──────────────────────────┴───────────────────────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: concat_ws(', ', customer_name, order_details).
DuckDB: concat_ws(', ', customer_name, order_details).
Snowflake: concat_ws(', ', customer_name, order_details).
Databricks: concat_ws(', ', customer_name, order_details).

Common mistakes

When using string concatenation functions, it's important to be aware of common pitfalls to avoid unexpected results:

Incorrect separator usage: Ensure the separator is correctly specified in concatWithSeparator. An incorrect or missing separator can lead to improperly formatted strings.
Unexpected results: Be mindful of null values in the strings being concatenated. ClickHouse may handle nulls differently, potentially leading to unexpected results in the final concatenated string.

Case conversion functions

Case conversion functions in ClickHouse allow you to change character case for data formatting and standardization. This section explores key functions for accurate and efficient case conversion operations.

`upper`

The upper function converts ASCII Latin symbols in a string to uppercase. This is useful for standardizing data formats, such as converting all names to uppercase for consistency.

Attributes:

input (String): The string to be converted to uppercase.

Returns:

A new string with all ASCII Latin characters converted to uppercase.

Example:

Convert the customer_name column to uppercase.

SELECT
    order_id,
    customer_name,
    upper(customer_name) AS customer_name_upper
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─customer_name_upper─┐
│        1 │ John Doe          │ JOHN DOE            │
│        2 │ Jane Smith        │ JANE SMITH          │
│        3 │ Carlos Garcia     │ CARLOS GARCIA       │
│        4 │ Maria Rodriguez   │ MARIA RODRIGUEZ     │
└──────────┴───────────────────┴─────────────────────┘

The equivalent functions in other platforms are:

Postgres: upper(customer_name).
DuckDB: upper(customer_name).
Snowflake: upper(customer_name).
Databricks: upper(customer_name).

`lower`

The lower function converts ASCII Latin symbols in a string to lowercase. This is useful for ensuring uniformity in data entry, such as converting all email addresses to lowercase.

Attributes:

input (String): The string to be converted to lowercase.

Returns:

A new string with all ASCII Latin characters converted to lowercase.

Example:

Convert the customer_name column to lowercase.

SELECT
    order_id,
    customer_name,
    lower(customer_name) AS customer_name_lower
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─customer_name_lower─┐
│        1 │ John Doe          │ john doe            │
│        2 │ Jane Smith        │ jane smith          │
│        3 │ Carlos Garcia     │ carlos garcia       │
│        4 │ Maria Rodriguez   │ maria rodriguez     │
└──────────┴───────────────────┴─────────────────────┘

The equivalent functions in other platforms are:

Postgres: lower(customer_name).
DuckDB: lower(customer_name).
Snowflake: lower(customer_name).
Databricks: lower(customer_name).

`upperUTF8`

The upperUTF8 function converts characters in a UTF-8 encoded string to uppercase. This is particularly important for strings containing non-ASCII characters.

Attributes:

input (String): The string to be converted to uppercase.

Returns:

A new string with all characters converted to uppercase, considering UTF-8 encoding.

Example:

Convert the customer_name column to uppercase, considering UTF-8 encoding.

SELECT
    order_id,
    customer_name,
    upperUTF8(customer_name) AS customer_name_upper_utf8
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─customer_name_upper_utf8─┐
│        1 │ John Doe          │ JOHN DOE                 │
│        2 │ Jane Smith        │ JANE SMITH               │
│        3 │ Carlos Garcia     │ CARLOS GARCIA            │
│        4 │ Maria Rodriguez   │ MARIA RODRIGUEZ          │
└──────────┴───────────────────┴──────────────────────────┘

The equivalent functions in other platforms are:

Postgres: upper(customer_name).
DuckDB: upper(customer_name).
Snowflake: upper(customer_name).
Databricks: upper(customer_name).

These platforms natively handle UTF-8, so no separate upperUTF8() function is needed.

`lowerUTF8`

The lowerUTF8 function converts characters in a UTF-8 encoded string to lowercase. This ensures that all characters, including non-ASCII ones, are uniformly converted.

Attributes:

input (String): The string to be converted to lowercase.

Returns:

A new string with all characters converted to lowercase, considering UTF-8 encoding.

Example:

Convert the customer_name column to lowercase, considering UTF-8 encoding.

SELECT
    order_id,
    customer_name,
    lowerUTF8(customer_name) AS customer_name_lower_utf8
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─customer_name_lower_utf8─┐
│        1 │ John Doe          │ john doe                 │
│        2 │ Jane Smith        │ jane smith               │
│        3 │ Carlos Garcia     │ carlos garcia            │
│        4 │ Maria Rodriguez   │ maria rodriguez          │
└──────────┴───────────────────┴──────────────────────────┘

The equivalent functions in other platforms are:

Postgres: lower(customer_name).
DuckDB: lower(customer_name).
Snowflake: lower(customer_name).
Databricks: lower(customer_name).

These platforms natively handle UTF-8, so no separate lowerUTF8() function is needed.

Padding functions

Padding helps format and align data, ensuring strings meet specific length requirements and are presented consistently. This section covers essential padding functions in ClickHouse.

`leftPad`

The leftPad function pads a string on the left with a specified character or string until it reaches a desired length. This is useful for aligning data in reports or ensuring string fields meet minimum length requirements.

Attributes:

string (String): The original string to be padded.
length (UInt32): The desired total length of the padded string.
pad_string (String, optional): The string to pad with. If omitted, spaces are used by default.

Returns:

A new string padded on the left to the specified length.

Example:

Pad the customer_name column to a length of 10 characters with spaces.

SELECT
    order_id,
    customer_name,
    leftPad(customer_name, 10) AS padded_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─padded_customer_name─┐
│        1 │ John Doe          │    John Doe          │
│        2 │ Jane Smith        │  Jane Smith          │
│        3 │ Carlos Garcia     │ Carlos Gar           │
│        4 │ Maria Rodriguez   │ Maria Rodr           │
└──────────┴───────────────────┴──────────────────────┘

The equivalent functions in other platforms are:

Postgres: lpad(customer_name, 10).
DuckDB: lpad(customer_name, 10).
Snowflake: lpad(customer_name, 10).
Databricks: lpad(customer_name, 10).

`rightPad`

The rightPad function pads a string on the right with a specified character or string until it reaches a desired length. This is useful for formatting data for display or ensuring string fields have a uniform length.

Attributes:

string (String): The original string to be padded.
length (UInt32): The desired total length of the padded string.
pad_string (String, optional): The string to pad with. If omitted, spaces are used by default.

Returns:

A new string padded on the right to the specified length.

Example:

Pad the customer_name column to a length of 0 characters with spaces.

SELECT
    order_id,
    customer_name,
    rightPad(customer_name, 10) AS padded_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─padded_customer_name─┐
│        1 │ John Doe          │ John Doe             │
│        2 │ Jane Smith        │ Jane Smith           │
│        3 │ Carlos Garcia     │ Carlos Gar           │
│        4 │ Maria Rodriguez   │ Maria Rodr           │
└──────────┴───────────────────┴──────────────────────┘

The equivalent functions in other platforms are:

Postgres: rpad(customer_name, 10).
DuckDB: rpad(customer_name, 10).
Snowflake: rpad(customer_name, 10).
Databricks: rpad(customer_name, 10).

`leftPadUTF8`

The leftPadUTF8 function pads a string on the left with a specified character or string, considering UTF-8 encoding. This is important for strings containing multi-byte characters.

Attributes:

string (String): The original string to be padded.
length (UInt32): The desired total length of the padded string.
pad_string (String, optional): The string to pad with. If omitted, spaces are used by default.

Returns:

A new string padded on the left to the specified length, considering UTF-8 encoding.

Example:

Pad the customer_name column to a length of 20 characters with spaces, considering UTF-8 encoding.

SELECT
    order_id,
    customer_name,
    leftPadUTF8(customer_name, 20) AS padded_customer_name_utf8
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─padded_customer_name_utf8─┐
│        1 │ John Doe          │             John Doe      │
│        2 │ Jane Smith        │           Jane Smith      │
│        3 │ Carlos Garcia     │        Carlos Garcia      │
│        4 │ Maria Rodriguez   │      Maria Rodriguez      │
└──────────┴───────────────────┴───────────────────────────┘

The equivalent functions in other platforms are:

Postgres: lpad(customer_name, 20).
DuckDB: lpad(customer_name, 20).
Snowflake: lpad(customer_name, 20).
Databricks: lpad(customer_name, 20).

These platforms natively handle UTF-8, so no separate leftPadUTF8() function is needed.

`rightPadUTF8`

The rightPadUTF8 function pads a string on the right with a specified character or string, considering UTF-8 encoding. This ensures that all characters, including multi-byte ones, are uniformly padded.

Attributes:

string (String): The original string to be padded.
length (UInt32): The desired total length of the padded string.
pad_string (String, optional): The string to pad with. If omitted, spaces are used by default.

Returns:

A new string padded on the right to the specified length, considering UTF-8 encoding.

Example:

Pad the customer_name column to a length of 20 characters with spaces, considering UTF-8 encoding.

SELECT
    order_id,
    customer_name,
    rightPadUTF8(customer_name, 20) AS padded_customer_name_utf8
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─padded_customer_name_utf8─┐
│        1 │ John Doe          │ John Doe                  │
│        2 │ Jane Smith        │ Jane Smith                │
│        3 │ Carlos Garcia     │ Carlos Garcia             │
│        4 │ Maria Rodriguez   │ Maria Rodriguez           │
└──────────┴───────────────────┴───────────────────────────┘

The equivalent functions in other platforms are:

Postgres: rpad(customer_name, 20).
DuckDB: rpad(customer_name, 20).
Snowflake: rpad(customer_name, 20).
Databricks: rpad(customer_name, 20).

These platforms natively handle UTF-8, so no separate rightPadUTF8() function is needed.

String reversal and repetition

String reversal and repetition functions in ClickHouse allow for unique string manipulations. These tools ensure data accuracy and meet specific formatting needs. This section explores key reversal and repetition functions, providing practical examples to demonstrate their use.

`reverse`

The reverse function reverses the characters in a string. This can be useful for tasks such as creating palindromes, reversing order information, or preparing data for specific analytical needs.

Attributes:

s (String): The string to be reversed.

Returns:

A new string with the characters in reverse order.

Example:

Reverse the customer_name column.

SELECT
    order_id,
    customer_name,
    reverse(customer_name) AS reversed_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─reversed_customer_name_utf8─┐
│        1 │ John Doe          │ eoD nhoJ                    │
│        2 │ Jane Smith        │ htimS enaJ                  │
│        3 │ Carlos Garcia     │ aicraG solraC               │
│        4 │ Maria Rodriguez   │ zeugirdoR airaM             │
└──────────┴───────────────────┴─────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: reverse(customer_name).
DuckDB: reverse(customer_name).
Snowflake: reverse(customer_name).
Databricks: reverse(customer_name).

`reverseUTF8`

The reverseUTF8 function reverses the characters in a UTF-8 encoded string. This is important for strings containing multi-byte characters, ensuring the reversal is accurate and maintains character integrity.

Attributes:

s (String): The UTF-8 encoded string to be reversed.

Returns:

A new string with the characters in reverse order, considering UTF-8 encoding.

Example:

Reverse the customer_name column, considering UTF-8 encoding.

SELECT
    order_id,
    customer_name,
    reverseUTF8(customer_name) AS reversed_customer_name_utf8
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─reversed_customer_name_utf8─┐
│        1 │ John Doe          │ eoD nhoJ                    │
│        2 │ Jane Smith        │ htimS enaJ                  │
│        3 │ Carlos Garcia     │ aicraG solraC               │
│        4 │ Maria Rodriguez   │ zeugirdoR airaM             │
└──────────┴───────────────────┴─────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: reverse(customer_name).
DuckDB: reverse(customer_name).
Snowflake: reverse(customer_name).
Databricks: reverse(customer_name).

These platforms natively handle UTF-8, so no separate reverseUTF8() function is needed.

`repeat`

The repeat function repeats a string a specified number of times. This is useful for generating repeated patterns, creating test data, or formatting strings for display.

Attributes:

s (String): The string to be repeated.
n (UInt32): The number of times to repeat the string.

Returns:

A new string consisting of the original string repeated n times.

Example:

Repeat the customer_name column three times.

SELECT
    order_id,
    customer_name,
    repeat(customer_name, 3) AS repeated_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─repeated_customer_name───────────────────────────────┐
│        1 │ John Doe          │ John DoeJohn DoeJohn Doe                             │
│        2 │ Jane Smith        │ Jane SmithJane SmithJane Smith                       │
│        3 │ Carlos Garcia     │ Carlos GarciaCarlos GarciaCarlos Garcia              │
│        4 │ Maria Rodriguez   │ Maria RodriguezMaria RodriguezMaria Rodriguez        │
└──────────┴───────────────────┴──────────────────────────────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: repeat(customer_name, 3).
DuckDB: repeat(customer_name, 3).
Snowflake: repeat(customer_name, 3).
Databricks: repeat(customer_name, 3).

Potential issues

When using string reversal and repetition functions, be aware of common pitfalls to avoid unexpected results:

Incorrect usage: Ensure the correct function is used for the desired operation. For example, use reverseUTF8 for UTF-8 encoded strings to maintain character integrity.
Output length: Be mindful of the output length when using the repeat function, as repeating a string many times can result in very long strings, potentially leading to performance issues or truncation.

String search functions

String search functions are vital tools for developers working with data in ClickHouse. These functions enable precise substring location within larger text fields. This section explores key string search capabilities in ClickHouse, providing you with the means to efficiently locate and analyze specific data patterns.

`position`

The position function returns the position of the first occurrence of a substring within a string. This is useful for locating specific data points within larger strings.

Attributes:

s (String): The original string.
substring (String): The substring to search for.

Returns:

An integer representing the 1-based index of the first occurrence of the substring. If the substring is not found, it returns 0.

Example:

Find the position of "Tacos" in the order_details column.

SELECT
    order_id,
    order_details,
    position(order_details, 'Tacos') AS tacos_position
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─tacos_position─┐
│        1 │ 3 Tacos, 2 Burritos      │              3 │
│        2 │ 1 Taco, 1 Quesadilla     │              0 │
│        3 │ 5 Tacos, 1 Nacho         │              3 │
│        4 │ 2 Burritos, 3 Quesadillas│              0 │
└──────────┴──────────────────────────┴────────────────┘

The equivalent functions in other platforms are:

Postgres: position('Tacos' IN order_details).
DuckDB: instr(order_details, 'Tacos').
Snowflake: position('Tacos' IN order_details).
Databricks: instr(order_details, 'Tacos').

`multiSearchAny`

The multiSearchAny function checks if any of the given substrings are present in the string. This is useful for validating the presence of multiple potential matches.

Attributes:

s (String): The original string.
substrings (Array of Strings): The substrings to search for.

Returns:

A boolean value: 1 if at least one substring matches, otherwise 0.

Example:

Check if any of the substrings "Tacos" or "Burritos" are present in the order_details column.

SELECT
    order_id,
    order_details,
    multiSearchAny(order_details, ['Tacos', 'Burritos']) AS contains_tacos_or_burritos
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─contains_tacos_or_burritos─┐
│        1 │ 3 Tacos, 2 Burritos      │                          1 │
│        2 │ 1 Taco, 1 Quesadilla     │                          0 │
│        3 │ 5 Tacos, 1 Nacho         │                          1 │
│        4 │ 2 Burritos, 3 Quesadillas│                          1 │
└──────────┴──────────────────────────┴────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: No direct equivalent. You would use position() or strpos() with OR, e.g., position(order_details, 'Tacos') > 0 OR position(order_details, 'Burritos') > 0.
DuckDB: No direct equivalent. You would use instr() with OR conditions, e.g., instr(order_details, 'Tacos') > 0 OR instr(order_details, 'Burritos') > 0.
Snowflake: No direct equivalent. You would use position() with OR, e.g., position(order_details, 'Tacos') > 0 OR position(order_details, 'Burritos') > 0.
Databricks: No direct equivalent. You would use instr() with OR, e.g., instr(order_details, 'Tacos') > 0 OR instr(order_details, 'Burritos') > 0.

`multiSearchAllPositions`

The multiSearchAllPositions function returns an array of positions for each substring found within the string. This is useful for detailed analysis of where each substring occurs.

Attributes:

s (String): The original string.
substrings (Array of Strings): The substrings to search for.

Returns:

An array of integers representing the positions of each substring. If a substring is not found, its position is 0.

Example:

Find the positions of "Tacos" and "Burritos" in the order_details column.

SELECT
    order_id,
    order_details,
    multiSearchAllPositions(order_details, ['Tacos', 'Burritos']) AS positions
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─positions───┐
│        1 │ 3 Tacos, 2 Burritos      │ [3, 12]     │
│        2 │ 1 Taco, 1 Quesadilla     │ [0, 0]      │
│        3 │ 5 Tacos, 1 Nacho         │ [3, 0]      │
│        4 │ 2 Burritos, 3 Quesadillas│ [0, 3]      │
└──────────┴──────────────────────────┴─────────────┘

The equivalent functions in other platforms are:

Postgres: No direct equivalent. You would need to use multiple position() or strpos() calls and store the positions in an array manually.
DuckDB: No direct equivalent. You would have to manually capture positions using instr() and create an array of results.
Snowflake: No direct equivalent. Multiple position() calls would need to be combined into an array manually.
Databricks: No direct equivalent. You would use multiple instr() calls and manually build the array of positions.

`multiSearchFirstIndex`

The multiSearchFirstIndex function returns the index of the first found substring in the array. If no substring matches, it returns 0. This is useful for quickly identifying the first match among multiple potential substrings.

Attributes:

s (String): The original string.
substrings (Array of Strings): The substrings to search for.

Returns:

An integer representing the index of the first found substring in the array.

Example:

Find the first occurrence of either "Tacos" or "Burritos" in the order_details column.

SELECT
    order_id,
    order_details,
    multiSearchFirstIndex(order_details, ['Tacos', 'Burritos']) AS first_match_index
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─first_match_index─┐
│        1 │ 3 Tacos, 2 Burritos      │                 1 │
│        2 │ 1 Taco, 1 Quesadilla     │                 0 │
│        3 │ 5 Tacos, 1 Nacho         │                 1 │
│        4 │ 2 Burritos, 3 Quesadillas│                 2 │
└──────────┴──────────────────────────┴───────────────────┘

The equivalent functions in other platforms are:

Postgres: No direct equivalent. You would need to use multiple position() or strpos() calls and compare the results with a CASE or LEAST() statement.
DuckDB: No direct equivalent. You would manually compare positions using instr() for each search term.
Snowflake: No direct equivalent. You would use position() for each term and compare the results.
Databricks: No direct equivalent. Use instr() for each term and manually calculate the first match.

Trimming functions

Trimming functions clean and format data by removing unwanted whitespace or characters from strings. This section covers how to use these functions effectively to improve data quality and consistency.

`trim`

The trim function removes all specified characters from the start, end, or both ends of a string. By default, it removes common whitespace characters.

Attributes:

s (String): The string to be trimmed.
chars (String, optional): The characters to be removed. If omitted, common whitespace characters are removed by default.

Returns:

A new string with the specified characters removed from both ends.

Example:

Trim whitespace from both ends of the customer_name column.

SELECT
    order_id,
    customer_name,
    trim(customer_name) AS trimmed_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─trimmed_customer_name─┐
│        1 │ John Doe          │ John Doe              │
│        2 │ Jane Smith        │ Jane Smith            │
│        3 │ Carlos Garcia     │ Carlos Garcia         │
│        4 │ Maria Rodriguez   │ Maria Rodriguez       │
└──────────┴───────────────────┴───────────────────────┘

The equivalent functions in other platforms are:

Postgres: trim(customer_name).
DuckDB: trim(customer_name).
Snowflake: trim(customer_name).
Databricks: trim(customer_name).

`trimLeft`

The trimLeft function removes all specified characters from the beginning of a string. This is useful for cleaning up leading whitespace or other unwanted characters.

Attributes:

s (String): The string to be trimmed.
chars (String, optional): The characters to be removed. If omitted, common whitespace characters are removed by default.

Returns:

A new string with the specified characters removed from the beginning.

Example:

Trim leading whitespace from the customer_name column.

SELECT
    order_id,
    customer_name,
    trimLeft(customer_name) AS left_trimmed_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─left_trimmed_customer_name─┐
│        1 │ John Doe          │ John Doe                   │
│        2 │ Jane Smith        │ Jane Smith                 │
│        3 │ Carlos Garcia     │ Carlos Garcia              │
│        4 │ Maria Rodriguez   │ Maria Rodriguez            │
└──────────┴───────────────────┴────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: ltrim(customer_name).
DuckDB: ltrim(customer_name).
Snowflake: ltrim(customer_name).
Databricks: ltrim(customer_name).

`trimRight`

The trimRight function removes all specified characters from the end of a string. This is particularly useful for cleaning up trailing whitespace or other unwanted characters.

Attributes:

s (String): The string to be trimmed.
chars (String, optional): The characters to be removed. If omitted, common whitespace characters are removed by default.

Returns:

A new string with the specified characters removed from the end.

Example:

Trim trailing whitespace from the customer_name column.

SELECT
    order_id,
    customer_name,
    trimRight(customer_name) AS right_trimmed_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─right_trimmed_customer_name─┐
│        1 │ John Doe          │ John Doe                    │
│        2 │ Jane Smith        │ Jane Smith                  │
│        3 │ Carlos Garcia     │ Carlos Garcia               │
│        4 │ Maria Rodriguez   │ Maria Rodriguez             │
└──────────┴───────────────────┴─────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: rtrim(customer_name).
DuckDB: rtrim(customer_name).
Snowflake: rtrim(customer_name).
Databricks: rtrim(customer_name).

`trimBoth`

The trimBoth function removes all specified characters from both ends of a string. This function is useful for thorough cleaning of strings, ensuring no unwanted characters remain at either end.

Attributes:

s (String): The string to be trimmed.
chars (String, optional): The characters to be removed. If omitted, common whitespace characters are removed by default.

Returns:

A new string with the specified characters removed from both ends.

Example:

Trim whitespace from both ends of the customer_name column.

SELECT
    order_id,
    customer_name,
    trimBoth(customer_name) AS both_trimmed_customer_name
FROM
    orders;

Output:

┌─order_id─┬─customer_name─────┬─both_trimmed_customer_name─┐
│        1 │ John Doe          │ John Doe                   │
│        2 │ Jane Smith        │ Jane Smith                 │
│        3 │ Carlos Garcia     │ Carlos Garcia              │
│        4 │ Maria Rodriguez   │ Maria Rodriguez            │
└──────────┴───────────────────┴────────────────────────────┘

The equivalent functions in other platforms are:

Postgres: trim(customer_name).
DuckDB: trim(customer_name).
Snowflake: trim(customer_name).
Databricks: trim(customer_name).

Regular expressions: pattern matching

Regular expressions (regex) are powerful pattern-matching tools for string manipulation. They enable complex searches, replacements, and extractions based on specific patterns. This section covers regex capabilities in ClickHouse.

`like`

The like function checks whether a string matches a simple pattern containing % (wildcard for any number of characters) and _ (wildcard for a single character). This function is useful for basic pattern matching without the complexity of full regex.

Attributes:

s (String): The string to be checked.
pattern (String): The pattern to match against.

Returns:

A boolean value: 1 if the string matches the pattern, otherwise 0.

Example:

Check if order_details contains the word "Tacos".

SELECT
    order_id,
    order_details,
    order_details LIKE '%Tacos%' AS contains_tacos
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─contains_tacos─┐
│        1 │ 3 Tacos, 2 Burritos      │              1 │
│        2 │ 1 Taco, 1 Quesadilla     │              0 │
│        3 │ 5 Tacos, 1 Nacho         │              1 │
│        4 │ 2 Burritos, 3 Quesadillas│              0 │
└──────────┴──────────────────────────┴────────────────┘

The equivalent functions in other platforms are:

Postgres: order_details LIKE '%Tacos%'.
DuckDB: order_details LIKE '%Tacos%'.
Snowflake: order_details LIKE '%Tacos%'.
Databricks: order_details LIKE '%Tacos%'.

`regexpExtract`

The regexpExtract function extracts a substring from a string that matches a regular expression pattern. This function is useful for extracting specific parts of a string based on complex patterns.

Attributes:

s (String): The original string.
pattern (String): The regular expression pattern to match.
index (UInt32, optional): The index of the capturing group to return. Defaults to 1.

Returns:

A substring that matches the specified pattern.

Example:

Extract the first numeric value from order_details.

SELECT
    order_id,
    order_details,
    regexpExtract(order_details, '\\d+', 0) AS first_number
FROM
    orders;

Output:

┌─order_id─┬─order_details────────────┬─first_number─┐
│        1 │ 3 Tacos, 2 Burritos      │            3 │
│        2 │ 1 Taco, 1 Quesadilla     │            1 │
│        3 │ 5 Tacos, 1 Nacho         │            5 │
│        4 │ 2 Burritos, 3 Quesadillas│            2 │
└──────────┴──────────────────────────┴──────────────┘

The equivalent functions in other platforms are:

Postgres: regexp_matches() combined with array access.
DuckDB: regexp_extract() with the capture group index.
Snowflake: regexp_substr() for extracting the match.
Databricks: regexp_extract() with the capture group index.

Conclusion

In this post, we've explored essential string functions in ClickHouse, covering a wide range of operations from basic manipulations to complex pattern matching. We've learned about:

Case conversion functions like lower, upper, and capitalize
String manipulation functions such as substring, concat, and replace
Trimming functions including trimLeft, trimRight, and trimBoth
Pattern matching capabilities with like and regexpExtract

We explained each function with its syntax, usage, and practical examples, along with comparisons to equivalent functions in other popular database systems like Postgres, DuckDB, Snowflake, and Databricks.

This post serves as a valuable reference for developers and data analysts working with ClickHouse. By bookmarking this guide, you'll have quick access to these essential string functions, their usage, and how they compare across different platforms.

You can get started with Propel's Serverless ClickHouse today and receive $15 in monthly credits forever. At any point, upgrade to pay-as-you-go, or contact us to learn more about our volume-based discounts. Visit our pricing page for details.

What's new in ClickHouse® version 24.10

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5

Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.