String Search functions

ClickHouse function reference

position

Returns the position (in bytes, starting from 1) of a substring in a string.

Syntax

position(haystack, needle[, start_pos])

Alias:

position(needle IN haystack)

Arguments

haystack (String): String to search in.
needle (String): Substring to search for.
start_pos (UInt, optional): Position (1-based) in haystack to start the search.

Returns

Starting position in bytes (counting from 1) if the substring was found. (UInt64)
0 if the substring was not found. (UInt64)

If needle is empty, these rules apply:

If no start_pos specified: return 1
If start_pos = 0: return 1
If start_pos >= 1 and start_pos <= length(haystack) + 1: return start_pos
Otherwise: return 0

Example

SELECT
	position('Carne asada tacos are delicious!', 'taco'),
	position('Carne asada tacos are delicious!', 'burrito'),
	position('Carne asada tacos are delicious!', 'a', 10);

Result:

| position('Carne asada tacos are delicious!', 'taco') | position('Carne asada tacos are delicious!', 'burrito') | position('Carne asada tacos are delicious!', 'a', 10) |
|------------------------------------------------------|---------------------------------------------------------|-------------------------------------------------------|
| 13                                                   | 0                                                       | 15                                                    |

In this example:

‘taco’ is found at position 13
‘burrito’ is not found, so 0 is returned
Starting from position 10, the next ‘a’ is found at position 15

Example with empty needle:

SELECT
	position('Taco Tuesday', ''),
	position('Taco Tuesday', '', 0),
	position('Taco Tuesday', '', 1),
	position('Taco Tuesday', '', 5),
	position('Taco Tuesday', '', 13),
	position('Taco Tuesday', '', 14);

Result:

| position('Taco Tuesday', '') | position('Taco Tuesday', '', 0) | position('Taco Tuesday', '', 1) | position('Taco Tuesday', '', 5) | position('Taco Tuesday', '', 13) | position('Taco Tuesday', '', 14) |
|------------------------------|---------------------------------|---------------------------------|---------------------------------|----------------------------------|----------------------------------|
| 1                            | 1                               | 1                               | 5                               | 13                               | 0                                |

This example demonstrates the behavior with an empty needle string and various start positions.

locate

Returns the position of a substring within a string.

Syntax:

locate(needle, haystack[, start_pos])

Arguments:

needle (String): Substring to search for.
haystack (String): String to search in.
start_pos (UInt, optional): Position to start the search from (1-based).

Returns:

The position (in bytes) of the first occurrence of needle in haystack, starting from 1.
0 if the substring is not found.

If needle is empty, the function follows these rules:
- If no start_pos is specified: returns 1
- If start_pos = 0: returns 1
- If start_pos >= 1 and start_pos <= length(haystack) + 1: returns start_pos
- Otherwise: returns 0

Example:

SELECT
	locate('salsa', 'spicy salsa verde'),
	locate('guac', 'spicy salsa verde'),
	locate('a', 'spicy salsa verde', 8)

Result:

| locate('salsa', 'spicy salsa verde') | locate('guac', 'spicy salsa verde') | locate('a', 'spicy salsa verde', 8) |
|--------------------------------------|-------------------------------------|-------------------------------------|
| 7                                    | 0                                   | 13                                  |

In this example:

‘salsa’ is found at position 7 in the string.
‘guac’ is not found, so it returns 0.
‘a’ is found at position 13 when starting the search from position 8.

The locate function is case-sensitive. For case-insensitive searches, consider using locateCaseInsensitive or locateCaseInsensitiveUTF8 functions.

positionCaseInsensitive

Returns the position (in bytes, starting from 1) of a substring in a string, ignoring case.

Syntax

positionCaseInsensitive(haystack, needle[, start_pos])

Arguments

haystack (String): String to search in.
needle (String): Substring to search for.
start_pos (UInt, optional): Position to start the search from (1-based).

Returns

The position (in bytes) of the found substring, starting from 1.
0 if the substring was not found.

Example

SELECT
  positionCaseInsensitive('Taco Tuesday', 'taco');

Result:

| positionCaseInsensitive('Taco Tuesday', 'taco') |
|-------------------------------------------------|
| 1                                               |

This function performs a case-insensitive search, so ‘taco’ matches ‘Taco’ at position 1.

Example with start_pos

SELECT
  positionCaseInsensitive('Taco Tuesday Taco Thursday', 'taco', 7);

Result:

| positionCaseInsensitive('Taco Tuesday Taco Thursday', 'taco', 7) |
|------------------------------------------------------------------|
| 14                                                               |

Here, the search starts from position 7, finding the second occurrence of ‘Taco’ at position 14.

The function follows the same rules for empty substrings as the position function. If the needle is an empty string, these rules apply:

If no start_pos was specified: return 1
If start_pos = 0: return 1
If start_pos >= 1 and start_pos <= length(haystack) + 1: return start_pos
Otherwise: return 0

This function is useful when you need to find a substring within a larger string without worrying about the case of the characters, such as searching for ingredient names in a recipe database regardless of capitalization.

positionUTF8

Returns the position (in Unicode code points, starting from 1) of a substring in a string, assuming both the substring and the string are UTF-8 encoded.

Syntax

positionUTF8(haystack, needle[, start_pos])

Arguments

haystack (String): The string to search in.
needle (String): The substring to search for.
start_pos (UInt, optional): The position (1-based) in haystack at which the search starts.

Returns

The starting position in Unicode code points (1-based) if the substring was found.
0 if the substring was not found.

Example

SELECT
  positionUTF8('Jalapeño and Habanero Tacos', 'ñ');

Result:

| positionUTF8('Jalapeño and Habanero Tacos', 'ñ') |
|--------------------------------------------------|
| 6                                                |

In this example, positionUTF8 correctly counts the Unicode character ‘ñ’ as a single code point, returning its position as 6.

If the needle substring is empty, these rules apply:

If no start_pos was specified: return 1
If start_pos = 0: return 1
If start_pos >= 1 and start_pos <= length(haystack) + 1: return start_pos
Otherwise: return 0

positionCaseInsensitiveUTF8

Returns the position (in Unicode code points) of a substring in a string, ignoring case and assuming UTF-8 encoding.

Syntax

positionCaseInsensitiveUTF8(haystack, needle[, start_pos])

Arguments

haystack (String): The string to search in.
needle (String): The substring to search for.
start_pos (UInt, optional): The position to start searching from (1-based).

Returns

The position (in Unicode code points) of the first occurrence of needle in haystack starting from start_pos, if found.
0, if needle is not found.

Type: UInt64

Example

SELECT
  positionCaseInsensitiveUTF8('Jalapeño and Habanero Tacos', 'ñO');

Result:

| positionCaseInsensitiveUTF8('Jalapeño and Habanero Tacos', 'ñO') |
|------------------------------------------------------------------|
| 6                                                                |

This example finds the position of ‘ñO’ (case-insensitive) in the UTF-8 encoded string, returning 6 as the position of ‘ño’ in ‘Jalapeño’.

The function is case-insensitive, so ‘A’ and ‘a’ are considered the same.
The function counts positions in Unicode code points, not bytes.
If needle is empty, the function returns 1.
If start_pos is 0, it’s treated as 1.
If start_pos is greater than the length of haystack, the function returns 0.

This function is useful for searching within text that may contain non-ASCII characters and where case should be ignored, such as when dealing with user-generated content or multilingual data.

multiSearchAllPositions

Returns an array of positions (in bytes, starting from 1) of substrings in a string.

Syntax

multiSearchAllPositions(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): String to search in.
needle (Array(String)): Array of substrings to search for.

Returns:

Array of positions.
For each needle:
- If found, the position (in bytes from 1) of the first occurrence.
- If not found, 0.

Type: Array(UInt64)

Example

Query:

SELECT
	multiSearchAllPositions(
    'Carne asada tacos are delicious!',
    ['taco', 'salsa', 'guacamole']
  ) AS positions

Result:

| positions     |
|---------------|
| [13,0,0]      |

In this example:

‘taco’ is found at position 13
‘salsa’ and ‘guacamole’ are not found, so their positions are 0

The function supports up to 2^8 needles.

multiSearchAllPositionsCaseInsensitive

Returns an array of positions (in bytes, starting from 1) of all occurrences of the needle substrings in the haystack string, ignoring case.

Syntax

multiSearchAllPositionsCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): String in which to search.
needle (Array(String)): Array of substrings to search for.

Returns:

Array of the same size as the needle array, where each element is:
- The position (in bytes) of the corresponding needle in haystack, if found.
- 0 if the substring was not found.

[Array(UInt64)]

Example

SELECT
	multiSearchAllPositionsCaseInsensitive(
    'The quick brown fox jumps over the lazy dog',
    ['THE', 'dog', 'quick', 'TACO']
) AS positions

Result:

| positions     |
|---------------|
| [1, 41, 5, 0] |

In this example:

‘THE’ is found at position 1 (case-insensitive match with ‘The’)
‘dog’ is found at position 41
‘quick’ is found at position 5
‘TACO’ is not found, so its position is 0

This function is useful for simultaneously searching for multiple substrings in a given text, ignoring case differences. It can be particularly handy in text analysis, log parsing, or when working with user-generated content where case sensitivity is not important.

The function supports up to 2^8 (256) needles.

multiSearchAllPositionsUTF8

Searches for multiple substrings in a UTF-8 encoded string and returns an array of positions.

Syntax

multiSearchAllPositionsUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): UTF-8 encoded string in which to search.
needle (Array(String)): Array of UTF-8 encoded substrings to search for.

Returns

An array of the same size as the needle array, where each element is:
- The position (in bytes, starting from 1) of the corresponding needle if found.
- 0 if the substring was not found.

[Array(UInt64)]

Example

SELECT
	multiSearchAllPositionsUTF8(
    'Spicy jalapeño and habanero tacos',
    ['jalapeño', 'habanero', 'salsa']
);

Result:

| multiSearchAllPositionsUTF8('Spicy jalapeño and habanero tacos', ['jalapeño', 'habanero', 'salsa']) |
|-----------------------------------------------------------------------------------------------------|
| [7,21,0]                                                                                            |

In this example:

‘jalapeño’ is found at position 7
‘habanero’ is found at position 21
‘salsa’ is not found, so its position is 0

This function is UTF-8 aware, so it correctly handles multi-byte characters.
The function supports up to 2^8 needles.
If any non-constant haystack or needle arguments are more than 32KB in size, the function will throw an exception.

multiSearchAllPositionsCaseInsensitiveUTF8

Returns an array of positions (in bytes, starting from 1) of all occurrences of multiple needle substrings in a haystack string. The search is case-insensitive and assumes UTF-8 encoding.

Syntax

multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): UTF-8 encoded string in which the search is performed.
needle (Array(String)): Array of UTF-8 encoded substrings to be searched.

Returns

Array of arrays, where each inner array contains:
- The starting positions (in bytes, counting from 1) of the corresponding needle if found.
- 0 if the needle was not found.

Type: Array(Array(UInt64))

Example

SELECT
  multiSearchAllPositionsCaseInsensitiveUTF8(
    'Spicy Taco Tuesday',
    ['taco', 'SPICY', 'wednesday']
  ) AS positions

Result:

| positions     |
|---------------|
| [[7],[1],[0]] |

In this example:

‘taco’ is found at position 7 (case-insensitive)
‘SPICY’ is found at position 1 (case-insensitive)
‘wednesday’ is not found, so 0 is returned

This function supports up to 2^8 needles.
The function is optimized for searching multiple substrings in a single pass through the haystack.
It’s particularly useful for tasks like keyword matching or filtering based on multiple criteria.

multiSearchFirstPosition

Returns the position of the leftmost occurrence of any of the substrings in the haystack string.

Syntax

multiSearchFirstPosition(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): String in which to search.
needle (Array(String)): Array of substrings to search for.

Returns

The position (in bytes) of the leftmost occurrence of any needle substring in the haystack string.
0 if none of the needles are found.

Type: UInt64

Example

SELECT
  multiSearchFirstPosition(
    'I love tacos and burritos',
    ['tacos', 'burritos', 'nachos']
  ) AS position

Result:

| position |
|----------|
| 8        |

In this example, ‘tacos’ is the leftmost occurrence among the search terms, starting at position 8 in the haystack string.

This function is case-sensitive. For case-insensitive searches, use multiSearchFirstPositionCaseInsensitive.

multiSearchFirstPositionCaseInsensitive

Returns the position of the first occurrence of any of the specified substrings in the given string, ignoring case. The search is performed case-insensitively.

Syntax

multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): The string to search in.
needle (Array(String)): Array of substrings to search for.

Returns

The position (in bytes) of the first occurrence of any matched substring (1-based).
0 if none of the substrings are found.

Type: UInt64

Example

SELECT
  multiSearchFirstPositionCaseInsensitive(
    'Crunchy Taco Supreme with Hot Sauce',
    ['TACO', 'sauce', 'CHEESE']
  ) AS position

Result:

| position |
|----------|
| 9        |

In this example, ‘TACO’ is found first at position 9, ignoring case.

This function is useful for efficiently finding the first occurrence of multiple substrings in a given text, without being sensitive to letter case.

multiSearchFirstPositionUTF8

Returns the position of the first occurrence of any of the substrings in the string, assuming UTF-8 encoding.

Syntax

multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): The UTF-8 encoded string to search in.
needle (Array(String)): An array of UTF-8 encoded substrings to search for.

Returns

The position (in bytes) of the leftmost match of any of the needles within the haystack.
0 if none of the needles are found.

Type: UInt64

Example

SELECT
  multiSearchFirstPositionUTF8(
    'Spicy jalapeño and habanero tacos',
    ['jalapeño', 'habanero', 'salsa']
  ) AS position

Result:

| position |
|----------|
| 7        |

In this example, ‘jalapeño’ is found at position 7 (counting bytes, not characters) in the UTF-8 encoded string.

This function is case-sensitive.
If you need case-insensitive search, use multiSearchFirstPositionCaseInsensitiveUTF8 instead.
The function returns the position in bytes, not in characters, which is important for UTF-8 strings where characters can occupy multiple bytes.
The function supports up to 2^8 needles in the array.

multiSearchFirstPositionCaseInsensitiveUTF8

Returns the position of the first occurrence of any of the substrings in the string, ignoring case and assuming UTF-8 encoding.

Syntax

multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): The UTF-8 encoded string to search in.
needle (Array(String)): Array of UTF-8 encoded substrings to search for.

Returns:

The position (in bytes) of the leftmost match of any of the needles.
0 if none of the needles are found.

Type: UInt64

Example

SELECT
  multiSearchFirstPositionCaseInsensitiveUTF8(
    'Carnitas Tacos, Carne Asada Tacos, Pescado Tacos',
    ['TACO', 'BURRITO']
  ) AS position

Result:

| position |
|----------|
| 9        |

This function finds the first occurrence of either ‘TACO’ or ‘BURRITO’ (case-insensitive) in the UTF-8 encoded string of taco types. It returns 9, which is the byte position of the first ‘Tacos’ in ‘Carnitas Tacos’.

The function is case-insensitive, so ‘TACO’ matches ‘Tacos’.
The search is performed on UTF-8 encoded strings, so it correctly handles multi-byte characters.
If none of the needles are found, the function returns 0.
The function supports up to 2^8 needles in the array.

This function is useful for efficiently finding the first occurrence of multiple substrings in a larger string, especially when dealing with case-insensitive, UTF-8 encoded text.

multiSearchFirstIndex

Returns the index of the first matching needle string found in the haystack string.

Syntax

multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): String to search in.
needle (Array(String)): Array of strings to search for.

Returns

The index (1-based) of the first matching needle string.
0 if none of the needle strings are found.

Type: UInt8

Example

SELECT
  multiSearchFirstIndex(
    'Carne asada tacos are the best',
    ['tacos', 'burritos', 'quesadillas']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, ‘tacos’ is the first needle found in the haystack, so the function returns 1 (the index of ‘tacos’ in the needle array).

The search is case-sensitive.
If you need case-insensitive search, use multiSearchFirstIndexCaseInsensitive instead.
For UTF-8 encoded strings, use multiSearchFirstIndexUTF8.
The function returns as soon as it finds the first match, which can be more efficient than searching for all occurrences.

multiSearchFirstIndexCaseInsensitive

Returns the index of the first matched needle in the haystack string, ignoring case. The index is 1-based, meaning the first position is 1, not 0.

Syntax

multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): The string to search in.
needle (Array(String)): Array of substrings to search for.

Returns

The index of the first matched needle (1-based).
0 if none of the needles are found.

Type: UInt8

Example

SELECT
  multiSearchFirstIndexCaseInsensitive(
    'Crunchy Taco Supreme', ['TACO', 'burrito', 'nacho']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, ‘TACO’ is the first needle that matches (case-insensitively) in the haystack, and its index is 1.

multiSearchFirstIndexUTF8

Returns the index of the first found needle in the haystack string, assuming both are UTF-8 encoded. The index is 1-based, meaning the first position is 1, not 0.

Syntax

multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): UTF-8 encoded string in which to search.
needle (Array(String)): Array of UTF-8 encoded substrings to search for.

Returns

The index (starting from 1) of the leftmost found needle.
0 if none of the needles are found.

Type: UInt8

Example

SELECT
  multiSearchFirstIndexUTF8(
    'Spicy jalapeño and habanero tacos',
    ['jalapeño', 'habanero', 'salsa']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, ‘jalapeño’ is the first needle found in the haystack, so the function returns 1.

This function is case-sensitive. For a case-insensitive version, use multiSearchFirstIndexCaseInsensitiveUTF8.

multiSearchFirstIndexCaseInsensitiveUTF8

Returns the index of the first found needle from the array in the haystack string, ignoring case and assuming UTF-8 encoding.

Syntax

multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): UTF-8 encoded string in which to search.
needle (Array(String)): Array of UTF-8 encoded strings to search for.

Returns:

Index (starting from 1) of the leftmost found needle.
0 if none of the needles are found.

Type: UInt8

Example

Query:

SELECT
	multiSearchFirstIndexCaseInsensitiveUTF8(
    'Spicy Taco Tuesday',
    ['TACO', 'burrito', 'NACHO']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, ‘TACO’ (index 1) is found first in the haystack, ignoring case.

This function is useful for efficiently finding the first occurrence of multiple strings within a larger text, while ignoring case differences and properly handling UTF-8 encoded strings.

multiSearchAny

Returns 1 if at least one of the strings in the needles array matches the haystack string, and 0 otherwise.

Syntax

multiSearchAny(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): String in which to search.
needles (Array(String)): Array of substrings to search for.

Returns

1 if there was at least one match.
0 if there was no match.

Type: UInt8

Example

SELECT
  multiSearchAny(
    'Carne asada tacos are the best',
    ['burrito', 'taco', 'quesadilla']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, the function returns 1 because the substring ‘taco’ is found in the haystack string.

This function is case-sensitive. For case-insensitive searches, use multiSearchAnyCaseInsensitive. For UTF-8 encoded strings, use multiSearchAnyUTF8 or multiSearchAnyCaseInsensitiveUTF8.

multiSearchAnyCaseInsensitive

Returns 1 if at least one of the strings in the needles array matches the haystack string case-insensitively, and 0 otherwise.

Syntax

multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): String in which to search.
needles (Array(String)): Array of strings to search for.

Returns

1 if there was at least one case-insensitive match.
0 if there was no match.

Type: UInt8

Example

Query:

SELECT
  multiSearchAnyCaseInsensitive(
    'Crunchy Taco Supreme',
    ['TACO', 'burrito']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, the function returns 1 because ‘TACO’ matches ‘Taco’ in the haystack string, ignoring case.

This function is useful for efficient case-insensitive searches of multiple substrings within a larger string. It’s particularly handy when dealing with user input or when case sensitivity is not important in your search criteria.

multiSearchAnyUTF8

Checks if any of the substrings from the needles array are present in the haystack string. Assumes both haystack and needles are UTF-8 encoded strings.

Syntax

multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): UTF-8 encoded string to search in.
needles (Array(String)): Array of UTF-8 encoded substrings to search for.

Returns

1 if at least one match is found.
0 if no matches are found.

Type: UInt8

Example

Query:

SELECT
  multiSearchAnyUTF8(
    'Carne asada tacos are the best',
    ['burrito', 'taco', 'quesadilla']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, the function returns 1 because the substring ‘taco’ is found in the haystack string.

This function is case-sensitive. For a case-insensitive version, use multiSearchAnyCaseInsensitiveUTF8.

multiSearchAnyCaseInsensitiveUTF8

Checks if any of the UTF-8 encoded substrings are present in the given UTF-8 encoded string, ignoring case.

Syntax

multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

haystack (String): UTF-8 encoded string in which to search.
needle (Array(String)): Array of UTF-8 encoded substrings to search for.

Returns

1 if any of the substrings are found (ignoring case).
0 if none of the substrings are found.

Type: UInt8

Example

Query:

SELECT
  multiSearchAnyCaseInsensitiveUTF8(
    'Crunchy Taco Supreme',
    ['SUPREME', 'burrito', 'nacho']
  ) AS position

Result:

| position |
|----------|
| 1        |

In this example, the function returns 1 because ‘SUPREME’ is found in the haystack ‘Crunchy Taco Supreme’, ignoring case.

This function is case-insensitive and works with UTF-8 encoded strings. It’s useful for checking the presence of multiple substrings in a given text, regardless of their case or Unicode representation.

match

Checks whether a string matches a regular expression pattern.

Syntax

match(haystack, pattern)

Alias:

REGEXP

Arguments

haystack (String): The string to search in.
pattern (String): The regular expression pattern with re2 syntax.

Returns

1 if the string matches the pattern.
0 if the string does not match the pattern.

[UInt8]

The function uses UTF-8 encoding. The regular expression must not contain null bytes.
Unlike re2’s default behavior, . matches line breaks. To disable this, prepend the pattern with (?-s).
For simple substring searches, consider using like or position functions for better performance.

Example

SELECT
	match('Crunchy Taco Supreme', '.*Taco.*') AS has_taco,
	match('Burrito Bowl', '.*Taco.*') AS no_taco;

Result:

| has_taco | no_taco |
|----------|---------|
| 1        | 0       |

This example checks if the menu items contain the word “Taco”.

Alias

The REGEXP operator is an alias for match:

haystack REGEXP pattern

multiMatchAllIndices

Returns an array of all indices that match the haystack string using multiple regular expression patterns.

Syntax

multiMatchAllIndices(haystack, [pattern1, pattern2, ..., patternN])

Arguments

haystack (String): String to search in.
pattern1, pattern2, ..., patternN (Array(String)): Regular expressions with re2 syntax.

Returns

Array of indices (1-based) of patterns that match the haystack, in the order they appear in the haystack. (Array(UInt64))

Example

SELECT
	multiMatchAllIndices(
		'The quick brown fox jumps over the lazy dog',
		['quick', 'brown', 'lazy', 'taco']
	) AS matched_indices;

Result:

| matched_indices |
|-----------------|
| [1,2,8]         |

In this example:

Index 1 corresponds to ‘quick’
Index 2 corresponds to ‘brown’
Index 8 corresponds to ‘lazy’
‘taco’ is not found, so it’s not included in the result

extract

Extracts a fragment of a string using a regular expression. If the haystack does not match the pattern, an empty string is returned.

For regex without subpatterns, the function uses the fragment that matches the entire regex. Otherwise, it uses the fragment that matches the first subpattern.

Syntax:

extract(haystack, pattern)

Arguments:

haystack (String): String in which the search is performed.
pattern (String): Regular expression with re2 syntax.

Returns:

The matched fragment, or an empty string if no match is found. [String]

Example:

SELECT
	extract('Spicy beef taco: $5.99', '\\$\\d+\\.\\d+') AS price;

Result:

| price |
|-------|
| $5.99 |

In this example, extract finds the price pattern in the taco description string.

If you only want to search for substrings in a string, consider using functions like position or like instead, as they are generally faster than regular expression-based functions.

extractAll

Extracts all fragments of a string that match a regular expression pattern.

Syntax:

extractAll(haystack, pattern)

Arguments:

haystack (String) - The string to search in.
pattern (String) - The regular expression pattern to match.

Returns:

An array of strings containing all matches of the pattern in the haystack. If no matches are found, an empty array is returned. [Array(String)]

Example:

SELECT
	extractAll('I love tacos, burritos, and quesadillas', '\\w+dillas?') AS mexican_dishes;

Result:

| mexican_dishes  |
|-----------------|
| ['quesadillas'] |

In this example, extractAll finds all words ending in “dillas” or “dilla” in the given string about Mexican food.

The regular expression pattern uses \w+ to match one or more word characters, followed by “dillas” with an optional “s” at the end. The backslashes are escaped in the SQL string.

extractAllGroupsHorizontal

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc.

Syntax:

extractAllGroupsHorizontal(haystack, pattern)

Arguments:

haystack (String) — Input string.
pattern (String) — Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown.

Returns:

Array of arrays of matches. (Array)

If haystack does not match the pattern regex, an array of empty arrays is returned.

Example:

SELECT
	extractAllGroupsHorizontal(
    'Carne asada: $10, Al pastor: $12, Chorizo: $11',
    '(\w+\s?\w+):\s?\$(\d+)'
  ) AS taco_info;

Result:

| taco_info                                                |
|----------------------------------------------------------|
| [['Carne asada','Al pastor','Chorizo'],['10','12','11']] |

In this example:

The first array contains all matched taco names: ['Carne asada','Al pastor','Chorizo']
The second array contains all matched prices: ['10','12','11']

This function is useful for extracting structured data from strings, such as parsing menu items and prices from a text description.

extractAllGroupsVertical

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the haystack.

Syntax:

extractAllGroupsVertical(haystack, pattern)

Arguments:

haystack (String): Input string.
pattern (String): Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown.

Returns:

Array of arrays of matches. (Array)

If haystack does not match the pattern regex, an empty array is returned.

Example:

SELECT
	extractAllGroupsVertical(
    'Taco Tuesday: $5 tacos, $3 guacamole, $2 salsa',
    '(\$\d+)\s+(\w+)'
  ) AS taco_info;

Result:

| taco_info                                                                                 |
|-------------------------------------------------------------------------------------------|
| [['$5','tacos'],['$3','guacamole'],['$2','salsa']]                                        |

In this example:

The function extracts all price-item pairs from the taco special menu.
Each inner array contains the price (first group) and the item name (second group).
The results are grouped vertically, meaning each inner array represents a complete match of both groups.

This function is particularly useful when you need to extract and group multiple related pieces of information from a string using a regular expression with multiple capture groups.

like

Checks if a string matches a pattern.

Syntax

like(haystack, pattern)

Alias:

haystack LIKE pattern (operator)

Arguments

haystack (String): The string to search in.
pattern (String): The LIKE pattern to match against.

Returns

1 if the string matches the pattern, 0 otherwise. [UInt8]

The LIKE pattern can contain the following special characters:
- % matches any number of characters (including zero characters)
- _ matches any single character
- \ escapes special characters (%, _, and \)
Matching is based on UTF-8 encoding
For patterns of the form %needle%, the function is as fast as the position function
Other patterns are internally converted to regular expressions

Example

SELECT
	'Carne Asada' AS taco_type,
	like(taco_type, '%Asada') AS ends_with_asada,
	like(taco_type, 'Carne%') AS starts_with_carne,
	like(taco_type, '%rne A%') AS contains_rne_a

Result:

| taco_type   | ends_with_asada | starts_with_carne | contains_rne_a |
|-------------|-----------------|-------------------|----------------|
| Carne Asada | 1               | 1                 | 1              |

In this example, we check if the taco type ‘Carne Asada’ matches various LIKE patterns.

To match against literal %, _, and \ characters, prepend them with a backslash: \%, \_, and \\. Note that in SQL strings, backslashes must be escaped as well, so you would actually write \\%, \\_, and \\\\.

notLike

Returns whether a string does not match a pattern. This function is the negation of the like function.

Syntax:

notLike(haystack, pattern)

Alias:

haystack NOT LIKE pattern (operator)

Arguments:

haystack (String): The string to search in.
pattern (String): The pattern to search for, which can contain the following wildcards:
- % — Matches any number of characters (including zero characters)
- _ — Matches any single character

Returns:

1 if the string does not match the pattern.
0 if the string matches the pattern.

[UInt8]

Example:

SELECT
	notLike('Spicy Beef Taco', 'Spicy%') AS not_spicy,
	notLike('Veggie Taco', 'Spicy%') AS not_spicy_veggie;

Result:

| not_spicy | not_spicy_veggie |
|-----------|------------------|
| 0         | 1                |

In this example:

'Spicy Beef Taco' matches the pattern 'Spicy%', so notLike returns 0.
'Veggie Taco' does not match the pattern 'Spicy%', so notLike returns 1.

This function is useful for filtering out rows that do not match a certain pattern.

The notLike function is case-sensitive. For case-insensitive matching, use notILike.

ilike

Performs a case-insensitive pattern matching operation.

Syntax:

ilike(haystack, pattern)

Alias:

haystack ILIKE pattern (operator)

Arguments:

haystack (String): The string to search in.
pattern (String): The LIKE pattern to match against.

Returns:

Returns 1 if the string matches the pattern, 0 otherwise. [UInt8]

Example:

SELECT
	taco_name,
	ilike(taco_name, '%spicy%') AS is_spicy
FROM
	taco_menu;

Result:

| taco_name         | is_spicy |
|-------------------|----------|
| Spicy Beef Taco   | 1        |
| Mild Chicken Taco | 0        |
| SPICY Pork Taco   | 1        |

In this example, ilike checks if each taco_name contains the word “spicy” (case-insensitive). The function returns 1 for names containing “spicy” in any case, and 0 otherwise.

The ilike function is similar to like, but it ignores case when matching. It’s particularly useful when you need to perform case-insensitive string matching operations in your queries.

The ilike function follows the same pattern matching rules as like, where % matches any number of characters and _ matches any single character.

notILike

Returns whether the string haystack does not match the case-insensitive LIKE expression pattern.

Syntax:

notILike(haystack, pattern)

Alias:

haystack NOT ILIKE pattern (operator)

Arguments:

haystack (String): The string to search in.
pattern (String): The LIKE expression pattern to match against.

Returns:

1 if haystack does not match pattern, 0 otherwise. [UInt8]

Example:

SELECT
  notILike('Spicy Taco', '%taco%') AS result;

Result:

| result |
|--------|
| 0      |

In this example, notILike returns 0 because ‘Spicy Taco’ does match the case-insensitive pattern ‘%taco%’.

The ILIKE pattern matching is case-insensitive.
% matches any number of characters (including zero characters).
_ matches any single character.
To escape special characters (%, _, \), precede them with a backslash (\).
This function is the negation of the ilike function.

When working with literal backslashes in SQL strings, you may need to escape them twice: once for the SQL string literal, and once for the LIKE pattern. For example, to match a literal underscore, you might use '\\\_'.

ngramDistance

Calculates the 4-gram distance between two strings. Returns a value between 0 and 1, where a smaller value indicates more similar strings.

Syntax

ngramDistance(haystack, needle)

Arguments

haystack (String) — First string for comparison.
needle (String) — Second string for comparison.

Returns:

A value between 0 and 1 representing the similarity between the two strings. (Float32)

Details

The function counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. A result closer to 0 indicates more similar strings.

This function will throw an exception if constant needle or haystack arguments are more than 32Kb in size. For non-constant arguments larger than 32Kb, the distance is always 1.

Example

SELECT
  ngramDistance('Carne Asada Taco', 'Carnitas Taco');

Result:

| ngramDistance('Carne Asada Taco', 'Carnitas Taco') |
|----------------------------------------------------|
| 0.5555556                                          |

In this example, the 4-gram distance between “Carne Asada Taco” and “Carnitas Taco” is calculated. The result of 0.5555556 indicates a moderate level of similarity between the two taco names.

See Also

ngramDistanceCaseInsensitive
ngramDistanceUTF8
ngramDistanceCaseInsensitiveUTF8

ngramDistanceCaseInsensitive

Calculates the 4-gram distance between two strings, ignoring case. Returns a value between 0 and 1, where 0 indicates identical strings and 1 indicates completely different strings.

Syntax

ngramDistanceCaseInsensitive(haystack, needle)

Arguments

haystack (String): First string for comparison.
needle (String): Second string for comparison.

Returns

The 4-gram distance between the strings, ignoring case. (Float32)

Example

SELECT
  ngramDistanceCaseInsensitive('Spicy Beef Taco', 'spicy chicken taco');

Result:

| ngramDistanceCaseInsensitive('Spicy Beef Taco', 'spicy chicken taco') |
|-----------------------------------------------------------------------|
| 0.23529412                                                            |

This example compares two taco descriptions, ignoring case. The result (0.23529412) indicates that the strings are quite similar, with the main difference being “Beef” vs “chicken”.

The function is case-insensitive, so “Taco” and “taco” are treated as identical.
If constant arguments are more than 32KB in size, an exception is thrown.
For non-constant arguments larger than 32KB, the distance is always 1.

This function is useful for fuzzy string matching and comparing similar strings while ignoring differences in capitalization.

ngramDistanceUTF8

Calculates the 4-gram distance between two UTF-8 encoded strings.

Syntax:

ngramDistanceUTF8(haystack, needle)

Arguments:

haystack (String): First UTF-8 encoded string for comparison.
needle (String): Second UTF-8 encoded string for comparison.

Returns:

A value between 0 and 1 representing the similarity between the two strings. The smaller the result, the more similar the strings are to each other. [Float32]

Example:

SELECT
  ngramDistanceUTF8('Spicy Carnitas Taco', 'Crunchy Fish Taco');

Result:

| ngramDistanceUTF8('Spicy Carnitas Taco', 'Crunchy Fish Taco') |
|---------------------------------------------------------------|
| 0.6666667                                                     |

This example calculates the 4-gram distance between two different taco names. The result of 0.6666667 indicates that the strings are somewhat different, but have some similarity.

Implementation details:

The function uses the 3-gram distance for UTF-8 strings. It’s not a perfectly fair n-gram distance. The function uses 2-byte hashes to hash n-grams and then calculates the symmetric difference between these hash tables – collisions may occur.

If constant haystack or needle arguments are more than 32KB in size, an exception will be thrown. For non-constant arguments exceeding 32KB, the distance is always 1.

This function is useful for fuzzy string matching and comparison, especially when dealing with UTF-8 encoded text such as taco menu items in different languages or with special characters.

ngramDistanceCaseInsensitiveUTF8

Calculates the 4-gram distance between two UTF-8 encoded strings, ignoring case. Returns a value between 0 and 1, where 0 indicates identical strings and 1 indicates completely different strings.

Syntax

ngramDistanceCaseInsensitiveUTF8(haystack, needle)

Arguments

haystack (String): First UTF-8 encoded string for comparison.
needle (String): Second UTF-8 encoded string for comparison.

Returns

A value between 0 and 1 representing the similarity between the two strings. [Float32]

Example

SELECT
  ngramDistanceCaseInsensitiveUTF8('Crunchy Taco', 'crunchy taco');

Result:

| ngramDistanceCaseInsensitiveUTF8('Crunchy Taco', 'crunchy taco') |
|------------------------------------------------------------------|
| 0                                                                |

This example demonstrates that the function is case-insensitive, returning 0 for identical strings that differ only in case.

This function uses the 3-gram distance for UTF-8 strings.
The comparison is not a perfectly fair n-gram distance. It uses 2-byte hashes to hash n-grams and then calculates the symmetric difference between these hash tables, which may result in collisions.
For case-insensitive UTF-8 comparison, it doesn’t use a fair tolower function. Instead, it zeroes the 5th bit (starting from zero) of each codepoint byte and the first bit of the zeroth byte if there are more than one byte. This works for Latin and most Cyrillic letters.

If you need to compare strings without considering UTF-8 encoding or case sensitivity, you can use the ngramDistance function instead.

ngramSearch

Calculates the non-symmetric difference between a needle string and a haystack string. This function is useful for fuzzy string search.

Syntax

ngramSearch(haystack, needle)

Arguments

haystack (String): String in which the search is performed.
needle (String): String to be searched.

Returns

A value between 0 and 1 representing the likelihood of the needle being in the haystack. The bigger the result, the more likely needle is in haystack. (Float32)

Example

SELECT
  ngramSearch('Crunchy Taco Supreme', 'Supreme Taco');

Result:

| ngramSearch('Crunchy Taco Supreme', 'Supreme Taco') |
|-----------------------------------------------------|
| 0.6666667                                           |

In this example, we calculate the likelihood of finding ‘Supreme Taco’ in ‘Crunchy Taco Supreme’. The result of 0.6666667 indicates a high likelihood of a match.

The function calculates the number of n-grams from the needle minus the common number of n-grams normalized by the number of needle n-grams. It uses 4-grams by default.

This function is part of a family of ngram functions. Case-insensitive and UTF-8 variants are available: ngramSearchCaseInsensitive, ngramSearchUTF8, and ngramSearchCaseInsensitiveUTF8.

ngramSearchCaseInsensitive

Calculates the non-symmetric difference between a needle string and a haystack string, ignoring case. Returns a value between 0 and 1, where a higher value indicates a higher likelihood of the needle being in the haystack.

This function is useful for fuzzy string searching.

Syntax

ngramSearchCaseInsensitive(haystack, needle)

Arguments

haystack (String): String in which to search.
needle (String): Substring to search for.

Returns:

A value between 0 and 1. (Float32)
- Closer to 0: Strings are less similar.
- Closer to 1: Strings are more similar or likely to contain the needle.

Example

SELECT
  ngramSearchCaseInsensitive('Crunchy Taco Supreme', 'taco');

Result:

| ngramSearchCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|-----------------------------------------------------------|
| 1                                                         |

This example shows that ‘taco’ is found in ‘Crunchy Taco Supreme’, ignoring the case difference.

The function uses 4-grams for comparison.
For UTF-8 strings, consider using ngramSearchCaseInsensitiveUTF8 instead.
If constant needle or haystack arguments are more than 32KB in size, an exception is thrown.
For non-constant arguments larger than 32KB, the result is always 1.

See also: ngramSearch, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8

ngramSearchUTF8

Calculates the non-symmetric difference between a needle string and a haystack string, assuming both are UTF-8 encoded. This function is useful for fuzzy string searching.

Syntax

ngramSearchUTF8(haystack, needle)

Arguments

haystack (String): UTF-8 encoded string in which to search.
needle (String): UTF-8 encoded string to search for.

Returns

A value between 0 and 1 representing the likelihood of the needle being in the haystack. The larger the result, the more likely the needle is present. (Float32)

Details

This function uses 3-gram distances for UTF-8 strings.
The implementation uses 2-byte hashes to hash n-grams and calculates the non-symmetric difference between these hash tables.
Collisions may occur, which can affect the accuracy of the result.

Example

SELECT
  ngramSearchUTF8('Spicy Jalapeño Taco', 'jalapeño');

Result:

| ngramSearchUTF8('Spicy Jalapeño Taco', 'jalapeño') |
|----------------------------------------------------|
| 0.75                                               |

In this example, the function returns a high value (0.75) indicating a strong likelihood that ‘jalapeño’ is present in the haystack string, despite the difference in case.

This function is more computationally intensive than exact string matching functions. Use it when fuzzy matching is required, such as for user input correction or similarity searches in text data.

ngramSearchCaseInsensitiveUTF8

Performs a case-insensitive n-gram search on UTF-8 encoded strings, calculating the likelihood of a needle being present in a haystack.

Syntax

ngramSearchCaseInsensitiveUTF8(haystack, needle)

Arguments

haystack (String): UTF-8 encoded string to search in.
needle (String): UTF-8 encoded substring to search for.

Returns

A value between 0 and 1 representing the likelihood of the needle being in the haystack. (Float32)

The larger the result, the more likely the needle is present in the haystack.

Example

SELECT
  ngramSearchCaseInsensitiveUTF8('Spicy Carnitas Tacos', 'TACO') AS search_result;

Result:

| search_result |
|---------------|
| 0.625         |

This example searches for ‘TACO’ in ‘Spicy Carnitas Tacos’, ignoring case. The result of 0.625 indicates a high likelihood of the word being present.

This function uses 3-gram distances for UTF-8 strings.
The case-insensitive comparison is not perfectly fair for all Unicode characters.
For non-Latin scripts, the function may not provide accurate results in all cases.

This function is useful for fuzzy string searching in multilingual contexts where case sensitivity is not important.

countSubstrings

Returns the number of occurrences of a substring in a string.

Syntax

countSubstrings(haystack, needle[, start_pos])

Arguments

haystack (String): String in which to search.
needle (String): Substring to search for.
start_pos (UInt, optional): Position (1-based) in haystack at which to start the search.

Returns

The number of occurrences. (UInt64)

Examples

Count occurrences of “taco” in a string:

SELECT
	countSubstrings('I love tacos. Tacos are great!', 'taco');

Result:

| countSubstrings('I love tacos. Tacos are great!', 'taco') |
|-----------------------------------------------------------|
| 2                                                         |

Using the start_pos argument:

SELECT
	countSubstrings('Taco Tuesday: taco, taco, taco!', 'taco', 15);

Result:

| countSubstrings('Taco Tuesday: taco, taco, taco!', 'taco', 15) |
|-----------------------------------------------------------------|
| 3                                                               |

This function is case-sensitive. For case-insensitive counting, use countSubstringsCaseInsensitive.

countSubstringsCaseInsensitive

Returns how often a substring needle occurs in a string haystack, ignoring case.

Syntax

countSubstringsCaseInsensitive(haystack, needle[, start_pos])

Arguments

haystack (String): String in which the search is performed.
needle (String): Substring to be searched.
start_pos (UInt, optional): Position (1-based) in haystack at which the search starts.

Returns

The number of occurrences. (UInt64)

Examples

Query:

SELECT
	countSubstringsCaseInsensitive('Crunchy Taco Supreme', 'taco');

Result:

| countSubstringsCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|----------------------------------------------------------------|
| 1                                                              |

Example with start_pos argument:

Query:

SELECT
	countSubstringsCaseInsensitive('taco_taco_burrito_taco', 'taco', 6);

Result:

| countSubstringsCaseInsensitive('taco_taco_burrito_taco', 'taco', 6)  |
|----------------------------------------------------------------------|
| 2                                                                    |

In this example, the search starts from the 6th character, skipping the first occurrence of ‘taco’.

countSubstringsCaseInsensitiveUTF8

Returns how often a substring needle occurs in a string haystack. Ignores case and assumes that haystack is a UTF-8 string.

Syntax

countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])

Arguments

haystack (String): UTF-8 string in which the search is performed.
needle (String): Substring to be searched.
start_pos (UInt, optional): Position (1-based) in haystack at which the search starts.

Returns

The number of occurrences. (UInt64)

Example

Query:

SELECT
	countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco');

Result:

| countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco') |
|-----------------------------------------------------------------------------------|
| 3                                                                                 |

Example with start_pos argument:

Query:

SELECT
	countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco', 15);

Result:

| countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco', 15) |
|---------------------------------------------------------------------------------------|
| 2                                                                                     |

In this example, the function starts searching from the 15th character, which is after the first “Taco”, so it only counts the last two occurrences.

countMatches

Returns the number of regular expression matches for a pattern in a string.

Syntax:

countMatches(haystack, pattern)

Arguments:

haystack (String): The string to search in.
pattern (String): The regular expression with re2 syntax.

Returns:

The number of matches. (UInt64)

Example:

SELECT
  countMatches('Taco Tuesday: 2 tacos for $5, 3 tacos for $7', 'taco') AS match_count;

Result:

| match_count |
|-------------|
| 3           |

In this example, countMatches finds three occurrences of the word “taco” (case-sensitive) in the given string about a Taco Tuesday special.

The function is case-sensitive. For case-insensitive matching, use countMatchesCaseInsensitive.

countMatchesCaseInsensitive

Returns the number of case-insensitive regular expression matches for a pattern in a string.

Syntax:

countMatchesCaseInsensitive(haystack, pattern)

Arguments:

haystack (String): The string to search in.
pattern (String): The regular expression with re2 syntax.

Returns:

The number of matches. (UInt64)

Example:

SELECT
  countMatchesCaseInsensitive('Crunchy Taco Supreme', 'taco');

Result:

| countMatchesCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|-------------------------------------------------------------|
| 1                                                           |

This function is similar to countMatches, but it performs a case-insensitive search. It’s useful when you need to count occurrences of a pattern regardless of letter case, such as finding mentions of “taco” in menu item names.

regexpExtract

Extracts the first string in a haystack that matches a regular expression pattern and corresponds to the specified regex group index.

Syntax:

regexpExtract(haystack, pattern[, index])

Alias:

REGEXP_EXTRACT(haystack, pattern[, index])

Arguments:

haystack (String): String in which the regexp pattern will be matched.
pattern (String): Regular expression. Must be a constant string.
index (UInt or Int, optional): An integer greater than or equal to 0, with a default value of 1. Represents which regex group to extract.

Returns:

The extracted substring matching the specified group. (String)

If pattern contains multiple regex groups, index indicates which group to extract.
An index of 0 means matching the entire regular expression.

Example:

SELECT
	regexpExtract('Carne Asada Taco: $3.99, Fish Taco: $4.99', '(\w+)\s+Taco:\s+\$(\d+\.\d+)', 1) AS taco_type,
	regexpExtract('Carne Asada Taco: $3.99, Fish Taco: $4.99', '(\w+)\s+Taco:\s+\$(\d+\.\d+)', 2) AS taco_price,
	regexpExtract('Carne Asada Taco: $3.99, Fish Taco: $4.99', '(\w+)\s+Taco:\s+\$(\d+\.\d+)', 0) AS full_match;

Result:

| taco_type   | taco_price | full_match              |
|-------------|------------|-------------------------|
| Carne Asada | 3.99       | Carne Asada Taco: $3.99 |

In this example:

The first regexpExtract call extracts the taco type (group 1).
The second call extracts the price (group 2).
The third call with index 0 returns the entire matched substring.

hasSubsequence

Returns whether a string contains a given subsequence.

Syntax

hasSubsequence(haystack, needle)

Arguments

haystack (String): String to search in.
needle (String): Subsequence to search for.

Returns

1 if needle is a subsequence of haystack, 0 otherwise. [UInt8]

A subsequence is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.

Example

SELECT
  hasSubsequence('Taco Tuesday', 'aco');

Result:

| hasSubsequence('Taco Tuesday', 'aco') |
|---------------------------------------|
| 1                                     |

This example checks if ‘aco’ is a subsequence of ‘Taco Tuesday’. The result is 1 (true) because ‘aco’ can be derived from ‘Taco Tuesday’ by deleting some characters while maintaining the order of the remaining ones.

This function is case-sensitive. For case-insensitive matching, use hasSubsequenceCaseInsensitive.

hasSubsequenceCaseInsensitive

Returns whether a string contains a given subsequence, ignoring case.

Syntax

hasSubsequenceCaseInsensitive(haystack, needle)

Arguments

haystack (String): The string to search in.
needle (String): The subsequence to search for.

Returns

1 if needle is a subsequence of haystack (ignoring case), 0 otherwise. [UInt8]

Example

SELECT
  hasSubsequenceCaseInsensitive('Crunchy Taco Supreme', 'taco');

Result:

| hasSubsequenceCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|---------------------------------------------------------------|
| 1                                                             |

This function is useful for case-insensitive substring matching, particularly when you need to check if a certain sequence of characters appears in a string, regardless of their case.

hasSubsequenceUTF8

Returns 1 if needle is a subsequence of haystack, or 0 otherwise. This function assumes haystack and needle are UTF-8 encoded strings. A subsequence is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.

Syntax

hasSubsequenceUTF8(haystack, needle)

Arguments

haystack (String): String in which the search is performed.
needle (String): Subsequence to be searched.

Returns

1, if needle is a subsequence of haystack
0, otherwise

Type: UInt8

Example

Query:

SELECT
  hasSubsequenceUTF8('Taco Tuesday: Tasty Tacos!', 'Tasty Tacos');

Result:

| hasSubsequenceUTF8('Taco Tuesday: Tasty Tacos!', 'Tasty Tacos')  |
|------------------------------------------------------------------|
| 1                                                                |

This example checks if ‘Tasty Tacos’ is a subsequence of the given string, which it is, so the function returns 1.

hasSubsequenceCaseInsensitiveUTF8

Checks if a given substring is present as a case-insensitive subsequence in a UTF-8 encoded string.

Syntax

hasSubsequenceCaseInsensitiveUTF8(haystack, needle)

Arguments

haystack (String): The UTF-8 encoded string to search in.
needle (String): The UTF-8 encoded subsequence to search for.

Returns

1 if needle is a case-insensitive subsequence of haystack, 0 otherwise. [UInt8]

Example

SELECT
  hasSubsequenceCaseInsensitiveUTF8('Spicy Taco Tuesday', 'taco');

Result:

| hasSubsequenceCaseInsensitiveUTF8('Spicy Taco Tuesday', 'taco')  |
|------------------------------------------------------------------|
| 1                                                                |

This function is useful for case-insensitive substring matching in UTF-8 encoded text, such as finding menu items or ingredients in multilingual taco descriptions.

The function assumes both haystack and needle are valid UTF-8 encoded strings. If they are not, the behavior is undefined.

hasToken

Checks if a given token is present in a string.

Syntax

hasToken(haystack, token)

Arguments

haystack (String): String in which to search.
token (String): Token to search for. Must be a constant string.

Returns

1 if the token is present in the haystack.
0 if the token is not present.

Type: UInt8

Description

A token is defined as a maximal length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack string).

This function is optimized for performance and is supported by the tokenbf_v1 index specialization.

Example

SELECT
  hasToken('Crunchy Taco Supreme', 'Taco');

Result:

| hasToken('Crunchy Taco Supreme', 'Taco') |
|------------------------------------------|
| 1                                        |

In this example, ‘Taco’ is found as a token within the ‘Crunchy Taco Supreme’ string, so the function returns 1.

The function is case-sensitive. Use hasTokenCaseInsensitive for case-insensitive matching.

hasTokenOrNull

Returns whether a given token is present in a string, with special handling for ill-formed tokens.

Syntax

hasTokenOrNull(haystack, token)

Arguments

haystack (String) — String in which to search.
token (String) — Token to search for. Must be a constant string.

Returns

1 if the token is present in the haystack.
0 if the token is not present in the haystack.
NULL if the token is ill-formed.

[Nullable(UInt8)]

Details

This function is similar to hasToken, but returns NULL instead of throwing an error when the token is ill-formed. A token is considered ill-formed if it contains non-alphanumeric ASCII characters.

The function is case-sensitive. For a case-insensitive version, use hasTokenCaseInsensitiveOrNull.

This function is supported by the tokenbf_v1 index specialization.

Example

SELECT
  hasTokenOrNull('Crunchy Taco Supreme', 'Taco');

Result:

| hasTokenOrNull('Crunchy Taco Supreme', 'Taco') |
|------------------------------------------------|
| 1                                              |

SELECT
  hasTokenOrNull('Spicy Bean Burrito', 'Quesadilla');

Result:

| hasTokenOrNull('Spicy Bean Burrito', 'Quesadilla') |
|----------------------------------------------------|
| 0                                                  |

SELECT
  hasTokenOrNull('Nacho Cheese Doritos Locos Tacos', 'Doritos,Locos');

Result:

| hasTokenOrNull('Nacho Cheese Doritos Locos Tacos', 'Doritos,Locos') |
|---------------------------------------------------------------------|
| NULL                                                                |

In this last example, NULL is returned because ‘Doritos,Locos’ is an ill-formed token due to the comma.

hasTokenCaseInsensitive

Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case.

Syntax:

hasTokenCaseInsensitive(haystack, token)

Arguments:

haystack (String): String in which the search is performed.
token (String): Maximal length substring between two non-alphanumeric ASCII characters (or boundaries of haystack).

Returns:

1 if the token is present in the haystack, 0 otherwise. (UInt8)

Implementation details:

Token must be a constant string. Supported by tokenbf_v1 index specialization.

Example:

SELECT
  hasTokenCaseInsensitive('Crunchy Taco Supreme', 'taco');

Result:

| hasTokenCaseInsensitive('Crunchy Taco Supreme', 'taco')|
|--------------------------------------------------------|
| 1                                                      |

In this example, the function returns 1 because ‘taco’ is present in the haystack ‘Crunchy Taco Supreme’, ignoring the case difference.

hasTokenCaseInsensitiveOrNull

Returns whether a given token is present in a string, ignoring case. Returns NULL if the token is ill-formed.

Syntax

hasTokenCaseInsensitiveOrNull(haystack, token)

Arguments

haystack (String): String in which to search.
token (String): Token to search for. Must be a constant string.

Returns

1 if the token is present in the haystack (case-insensitive).
0 if the token is not present.
NULL if the token is ill-formed.

Type: Nullable(UInt8)

A token is defined as a maximal length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack).
This function is supported by the tokenbf_v1 index specialization.

Example

SELECT
  hasTokenCaseInsensitiveOrNull('Crunchy Taco Supreme', 'taco');

Result:

| hasTokenCaseInsensitiveOrNull('Crunchy Taco Supreme', 'taco') |
|---------------------------------------------------------------|
| 1                                                             |

SELECT
  hasTokenCaseInsensitiveOrNull('Taco Bell Menu', 'burrito,taco');

Result:

| hasTokenCaseInsensitiveOrNull('Taco Bell Menu', 'burrito,taco')  |
|------------------------------------------------------------------|
| NULL                                                             |

In this example, the second query returns NULL because ‘burrito,taco’ is not a valid token (it contains a comma).

Introduction

Statements

​ClickHouse function reference

​position

​locate

​positionCaseInsensitive

​positionUTF8

​positionCaseInsensitiveUTF8

​multiSearchAllPositions

​multiSearchAllPositionsCaseInsensitive

​multiSearchAllPositionsUTF8

​multiSearchAllPositionsCaseInsensitiveUTF8

​multiSearchFirstPosition

​multiSearchFirstPositionCaseInsensitive

​multiSearchFirstPositionUTF8

​multiSearchFirstPositionCaseInsensitiveUTF8

​multiSearchFirstIndex

​multiSearchFirstIndexCaseInsensitive

​multiSearchFirstIndexUTF8

​multiSearchFirstIndexCaseInsensitiveUTF8

​multiSearchAny

​multiSearchAnyCaseInsensitive

​multiSearchAnyUTF8

​multiSearchAnyCaseInsensitiveUTF8

​match

​multiMatchAllIndices

​extract

​extractAll

​extractAllGroupsHorizontal

​extractAllGroupsVertical

​like

​notLike

​ilike

​notILike

​ngramDistance

​ngramDistanceCaseInsensitive

​ngramDistanceUTF8

​ngramDistanceCaseInsensitiveUTF8

​ngramSearch

​ngramSearchCaseInsensitive

​ngramSearchUTF8

​ngramSearchCaseInsensitiveUTF8

​countSubstrings

​countSubstringsCaseInsensitive

​countSubstringsCaseInsensitiveUTF8

​countMatches

​countMatchesCaseInsensitive

​regexpExtract

​hasSubsequence

​hasSubsequenceCaseInsensitive

​hasSubsequenceUTF8

​hasSubsequenceCaseInsensitiveUTF8

​hasToken

​hasTokenOrNull

​hasTokenCaseInsensitive

​hasTokenCaseInsensitiveOrNull

ClickHouse function reference

position

locate

positionCaseInsensitive

positionUTF8

positionCaseInsensitiveUTF8

multiSearchAllPositions

multiSearchAllPositionsCaseInsensitive

multiSearchAllPositionsUTF8

multiSearchAllPositionsCaseInsensitiveUTF8

multiSearchFirstPosition

multiSearchFirstPositionCaseInsensitive

multiSearchFirstPositionUTF8

multiSearchFirstPositionCaseInsensitiveUTF8

multiSearchFirstIndex

multiSearchFirstIndexCaseInsensitive

multiSearchFirstIndexUTF8

multiSearchFirstIndexCaseInsensitiveUTF8

multiSearchAny

multiSearchAnyCaseInsensitive

multiSearchAnyUTF8

multiSearchAnyCaseInsensitiveUTF8

match

multiMatchAllIndices

extract

extractAll

extractAllGroupsHorizontal

extractAllGroupsVertical

like

notLike

ilike

notILike

ngramDistance

ngramDistanceCaseInsensitive

ngramDistanceUTF8

ngramDistanceCaseInsensitiveUTF8

ngramSearch

ngramSearchCaseInsensitive

ngramSearchUTF8

ngramSearchCaseInsensitiveUTF8

countSubstrings

countSubstringsCaseInsensitive

countSubstringsCaseInsensitiveUTF8

countMatches

countMatchesCaseInsensitive

regexpExtract

hasSubsequence

hasSubsequenceCaseInsensitive

hasSubsequenceUTF8

hasSubsequenceCaseInsensitiveUTF8

hasToken

hasTokenOrNull

hasTokenCaseInsensitive

hasTokenCaseInsensitiveOrNull