URL functions

URL functions provide tools for working with URL strings. These functions are useful for:

Web analytics
Log processing
Any task involving URL data

With URL functions, you can:

Extract key components of a URL:
- Protocol (e.g., http, https)
- Domain
- Path
- Query parameters
Perform common URL operations:
- Encoding
- Decoding

ClickHouse function reference

protocol

Extracts the protocol from a URL.

Syntax:

protocol(url)

Arguments:

url (String): URL to extract protocol from.

Returns:

The protocol, or an empty string if it cannot be determined. [String]

Example:

SELECT
  protocol('https://tacosoft.com/menu/tacos');

Result:

| protocol('https://tacosoft.com/menu/tacos') |
|---------------------------------------------|
| https                                       |

This function extracts the protocol (e.g., http, https, ftp) from the given URL. If the protocol cannot be determined, it returns an empty string.

This function is optimized for performance and may not strictly follow RFC-3986. For RFC-compliant parsing, use protocolRFC instead.

domain

Extracts the hostname from a URL.

Syntax:

domain(url)

Arguments:

url (String): URL.

Returns:

Host name if the input string can be parsed as a URL, otherwise an empty string. (String)

Example:

SELECT
  domain('https://tacosoft.com/menu/burritos');

Result:

| domain('https://tacosoft.com/menu/burritos') |
|----------------------------------------------|
| tacosoft.com                                 |

This function works with URLs specified with or without a protocol. For example:

SELECT
  domain('tacosoft.com/menu/burritos');

Result:

| domain('tacosoft.com/menu/burritos') |
|--------------------------------------|
| tacosoft.com                         |

The function is optimized for performance and may not strictly follow RFC-3986. For RFC-compliant parsing, use domainRFC().

domainRFC

Extracts the hostname from a URL, conforming to RFC 3986.

Syntax:

domainRFC(url)

Arguments:

url (String): URL.

Returns:

Host name if the input string can be parsed as a URL, otherwise an empty string. (String)

Example:

SELECT
	domainRFC('http://user:password@www.tacosoft.com:8080/menu?category=vegetarian#spicy') AS domain;

Result:

| domain               |
|----------------------|
| www.tacosoft.com     |

This function is similar to domain, but it strictly follows RFC 3986 standards. It’s particularly useful when dealing with URLs that contain special characters or complex structures.

The domainRFC function can handle URLs with user information and non-standard ports, which the non-RFC variant might struggle with.

domainWithoutWWW

Returns the domain name without the leading www. subdomain, if present.

Syntax

domainWithoutWWW(url)

Arguments

url (String): URL.

Returns:

The domain name without the leading www. subdomain, if present. (String)

Example

SELECT domainWithoutWWW('https://www.tacosoft.com/menu');

Result:

| domainWithoutWWW('https://www.tacosoft.com/menu')  |
|----------------------------------------------------|
| tacosoft.com                                       |

This function is useful for normalizing domain names in URL analysis, especially when you want to treat ‘www.example.com’ and ‘example.com’ as the same domain.

If the URL doesn’t contain www. at the beginning of the domain, the function returns the domain as is. If the input is not a valid URL or doesn’t contain a domain, an empty string is returned.

domainWithoutWWWRFC

Returns the domain without leading www. if present. This function conforms to RFC 3986.

Syntax

domainWithoutWWWRFC(url)

Arguments

url (String): URL.

Returns:

Domain name without leading www. if present, otherwise an empty string. (String)

Example

Query:

SELECT
	domainWithoutWWW('http://user:password@www.tacosoft.com:8080/path?query=value#fragment'),
	domainWithoutWWWRFC('http://user:password@www.tacosoft.com:8080/path?query=value#fragment');

Result:

| domainWithoutWWW('http://user:password@www.tacosoft.com:8080/path?query=value#fragment') | domainWithoutWWWRFC('http://user:password@www.tacosoft.com:8080/path?query=value#fragment')|
|------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
|                                                                                          | tacosoft.com                                                                               |

In this example, domainWithoutWWWRFC correctly extracts the domain ‘tacosoft.com’ from the URL, removing the www. prefix and conforming to RFC 3986 standards.

This function is particularly useful when you need to extract the domain from URLs that may contain various components like usernames, passwords, ports, and query parameters, while ensuring compliance with RFC 3986.

topLevelDomain

Extracts the top-level domain from a URL.

Syntax:

topLevelDomain(url)

Arguments:

url (String): URL.

Returns:

The top-level domain if the input string can be parsed as a URL. Otherwise, an empty string. (String)

Example:

SELECT topLevelDomain('https://www.tacosoft.com/menu/tacos');

Result:

| topLevelDomain('https://www.tacosoft.com/menu/tacos') |
|-------------------------------------------------------|
| com                                                   |

This function extracts ‘com’ as the top-level domain from the TacoSoft URL.

The URL can be specified with or without a protocol. For example:

SELECT
  topLevelDomain('tacobell.com/menu/tacos');

will also return ‘com’.

If the function cannot parse the input as a URL or if there’s no valid top-level domain, it returns an empty string.

topLevelDomainRFC

Extracts the top-level domain from a URL. This function conforms to RFC 3986.

Syntax

topLevelDomainRFC(url)

Arguments

url (String): URL.

Returns

The top-level domain name if the input string can be parsed as a URL. Otherwise, an empty string. (String)

Example

SELECT topLevelDomainRFC('https://www.tacosoft.com/menu');

Result:

| topLevelDomainRFC('https://www.tacosoft.com/menu') |
|----------------------------------------------------|
| com                                                |

This function is particularly useful when you need to extract the top-level domain from URLs in a way that strictly adheres to RFC 3986 standards. It can handle complex URLs and properly parse various top-level domains, including country code top-level domains (ccTLDs) and new generic top-level domains (gTLDs).

Unlike its non-RFC counterpart, topLevelDomainRFC can correctly handle URLs with special characters in the user info part (following the @ symbol), such as %, ;, =, &, and others, as defined in RFC 3986.

firstSignificantSubdomain

Returns the “first significant subdomain” of a URL.

Syntax

firstSignificantSubdomain(url)

Arguments

url (String): URL.

Returns

The first significant subdomain. (String)

Description

The first significant subdomain is determined as follows:

For second-level domains like .com, .net, .org, or .co, it returns the third-level domain.
For all other cases, it returns the second-level domain.

Examples

SELECT
	firstSignificantSubdomain('https://news.tacosoft.com/') AS subdomain;

Result:

| subdomain  |
|------------|
| tacosoft   |

SELECT
	firstSignificantSubdomain('https://www.tacosoft.com/') AS subdomain;

Result:

| subdomain   |
|-------------|
| tacosoft    |

The list of “insignificant” second-level domains and other implementation details may change in future versions.

This function is optimized for performance and may not strictly follow URL parsing standards. For RFC-compliant parsing, use firstSignificantSubdomainRFC.

firstSignificantSubdomainRFC

Returns the “first significant subdomain” of a URL, conforming to RFC 3986.

Syntax

firstSignificantSubdomainRFC(url)

Arguments

url (String): URL.

Returns

The first significant subdomain. (String)

Description

The first significant subdomain is determined as follows:

For second-level domains like .com, .net, .org, or .co, it returns the third-level domain.
For other domains, it returns the second-level domain.

This function is similar to firstSignificantSubdomain, but strictly adheres to RFC 3986 for URL parsing.

Examples

SELECT
	firstSignificantSubdomainRFC('https://news.tacosoft.com/') AS subdomain;

Result:

| subdomain |
|-----------|
| tacosoft  |

SELECT
	firstSignificantSubdomainRFC('https://www.tacosoft.com/menu') AS subdomain;

Result:

| subdomain    |
|--------------|
| tacosoft     |

The list of “insignificant” second-level domains and other implementation details may change in future versions.

cutToFirstSignificantSubdomain

Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”.

Syntax

cutToFirstSignificantSubdomain(url)

Arguments

url (String): URL.

Returns

Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. String.

Example

SELECT
	cutToFirstSignificantSubdomain('https://news.tacosoft.com/') AS subdomain,
	cutToFirstSignificantSubdomain('www.tacosoft.com') AS subdomain2,
	cutToFirstSignificantSubdomain('tacosoft.com') AS subdomain3;

Result:

| subdomain           | subdomain2 | subdomain3 |
|---------------------|------------|------------|
| tacosoft.com        | tacosoft   |            |

In this example:

subdomain returns ‘tacosoft.com’, which is the part of the domain up to the first significant subdomain.
subdomain2 returns ‘tacosoft’, as ‘www’ is not considered significant.
subdomain3 returns an empty string, as ‘tacosoft’ alone is not a valid domain with a significant subdomain.

This function is useful for extracting the main part of a domain while ignoring less significant subdomains like ‘www’ or country-specific extensions.

cutToFirstSignificantSubdomainRFC

Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”. Similar to cutToFirstSignificantSubdomain, but conforms to RFC 3986.

Syntax

cutToFirstSignificantSubdomainRFC(url)

Arguments

url (String): URL.

Returns

Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. String.

Example

SELECT
	cutToFirstSignificantSubdomain('http://user:password@www.tacosoft.com:8080/menu') AS non_rfc_result,
	cutToFirstSignificantSubdomainRFC('http://user:password@www.tacosoft.com:8080/menu') AS rfc_result;

Result:

| non_rfc_result | rfc_result        |
|----------------|-------------------|
|                | tacosoft.com      |

In this example, cutToFirstSignificantSubdomainRFC correctly extracts ‘delicious-tacos.com’ from the URL, including the user information and port in the parsing process as per RFC 3986. The non-RFC version fails to parse the URL correctly due to the presence of user information.

This function is particularly useful when dealing with complex URLs that may contain user information, non-standard ports, or other elements that require strict adherence to URL standards.

cutToFirstSignificantSubdomainWithWWW

Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”, without stripping www.

Syntax:

cutToFirstSignificantSubdomainWithWWW(url)

Arguments:

url (String): URL.

Returns:

Part of the domain that includes top-level subdomains up to the first significant subdomain (with www) if possible, otherwise returns an empty string. String.

Example:

SELECT
	cutToFirstSignificantSubdomainWithWWW('https://www.tacosoft.com/menu') AS domain,
	cutToFirstSignificantSubdomainWithWWW('www.tacosoft.co') AS domain2,
	cutToFirstSignificantSubdomainWithWWW('tacosoft.io') AS domain3;

Result:

| domain               | domain2           | domain3     |
|----------------------|-------------------|-------------|
| www.tacosoft.com     | www.tacosoft.co   | tacosoft.io |

In this example:

For domain, the function returns tacosoft.com, preserving the www if it was present.
For domain2, it returns tacosoft.co, keeping the www.
For domain3, it returns tacosoft.io as there’s no subdomain to cut.

This function is useful when you want to extract the main part of a domain while preserving the www subdomain if it exists.

cutToFirstSignificantSubdomainWithWWWRFC

Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”, without stripping www. This function conforms to RFC 3986.

Syntax

cutToFirstSignificantSubdomainWithWWWRFC(url)

Arguments

url (String): URL.

Returns:

Part of the domain that includes top-level subdomains up to the first significant subdomain (with www) if possible, otherwise returns an empty string. String.

Example

SELECT
	cutToFirstSignificantSubdomainWithWWWRFC('http://www.tacosoft.com/menu') AS domain,
	cutToFirstSignificantSubdomainWithWWWRFC('https://order.tacosoft.co.uk/cart') AS uk_domain,
	cutToFirstSignificantSubdomainWithWWWRFC('ftp://files.tacosoft.com.br/recipes') AS br_domain;

Result:

| domain           | uk_domain        | br_domain          |
|------------------|------------------|--------------------|
| www.tacosoft.com | tacosoft.co.uk   | tacosoft.com.br    |

This function is useful for extracting the main part of a domain while preserving the “www” prefix if present. It handles various top-level domains correctly, including country-specific ones.

The function adheres to RFC 3986, ensuring proper handling of special characters and edge cases in URLs.

port

Extracts the port number from a URL, or returns a default port if not specified.

Syntax

port(url [, default_port = 0])

Arguments

url (String): URL to extract the port from.
default_port (UInt16, optional): The default port number to return if no port is specified in the URL.

Returns

The port number from the URL, or the default port if not specified. (UInt16)

Example

SELECT
	port('https://tacosoft.com:8443/docs/') AS port_number,
	port('http://tacosoft.com/menu', 80) AS default_port;

Result:

| port_number | default_port |
|-------------|--------------|
| 8443        | 80           |

In this example:

port_number extracts the explicitly specified port (8443) from the URL.
default_port returns the provided default value (80) since no port is specified in the URL.

If the URL cannot be parsed or doesn’t contain a port, and no default port is provided, the function returns 0.

portRFC

Returns the port number from a URL, or a default port if not specified. This function conforms to RFC 3986.

Syntax

portRFC(url [, default_port = 0])

Arguments

url (String): URL to extract the port from.
default_port (UInt16, optional): The port number to return if no port is specified in the URL. Default: 0.

Returns

The port number from the URL, or the default port if not specified. (UInt16)

Example

SELECT
	portRFC('http://www.tacosoft.com:8080/order') AS port_with_url,
	portRFC('https://www.tacosoft.com/menu', 443) AS port_with_default;

Result:

| port_with_url | port_with_default |
|---------------|-------------------|
| 8080          | 443               |

In this example:

port_with_url returns 8080, which is explicitly specified in the URL.
port_with_default returns 443 (the default HTTPS port) since no port is specified in the URL.

This function is RFC 3986 compliant, which means it correctly handles URLs with special characters or unusual formats. For non-RFC compliant URL parsing, use the port function instead.

path

Extracts the path from a URL without the query string.

Syntax

path(url)

Arguments

url (String): URL.

Returns

The path component of the URL without the query string. (String)

Example

SELECT
	path('https://tacosoft.com/menu/burritos?size=large&extra=guac#nutrition') AS taco_path;

Result:

| taco_path      |
|----------------|
| /menu/burritos |

This function extracts /menu/burritos from the URL, omitting the query string ?size=large&extra=guac and the fragment #nutrition.

If the URL does not contain a path, an empty string is returned.

pathFull

Returns the full path of a URL, including the query string and fragment.

Syntax:

pathFull(url)

Arguments:

url (String): URL.

Returns:

The full path of the URL, including the query string and fragment. (String)

Example:

SELECT
	pathFull('https://tacosoft.com/menu/burritos?size=large#spiciness') AS full_path;

Result:

| full_path                                 |
|-------------------------------------------|
| /menu/burritos?size=large#spiciness       |

In this example, pathFull extracts the complete path from the URL, including the query string ?size=large and the fragment #spiciness.

If the URL doesn’t contain a path, query string, or fragment, an empty string is returned.

protocol

Extracts the protocol from a URL.

Syntax:

protocol(url)

Arguments:

url (String): URL to extract protocol from.

Returns:

The protocol, or an empty string if it cannot be determined. [String]

Example:

SELECT
  protocol('https://tacosoft.com/menu/tacos');

Result:

| protocol('https://tacosoft.com/menu/tacos') |
|---------------------------------------------|
| https                                       |

This function extracts the protocol (e.g., http, https, ftp) from the given URL. If the protocol cannot be determined, it returns an empty string.

This function is optimized for performance and may not strictly follow RFC-3986. For RFC-compliant parsing, use protocolRFC instead.

queryString

Extracts the query string from a URL without the initial question mark, # and everything after #.

Syntax

queryString(url)

Arguments

url (String): URL to extract the query string from.

Returns

The query string without the initial question mark and fragment identifier. (String)

Example

SELECT
  queryString('https://tacosoft.com/order?items=3&sauce=hot#confirm') AS query;

Result:

| query              |
|--------------------|
| items=3&sauce=hot  |

In this example, queryString extracts ‘items=3&sauce=hot’ from the URL, omitting the initial ’?’ and everything after ’#’.

If the URL doesn’t contain a query string, an empty string is returned.

fragment

Extracts the fragment identifier from a URL, without the initial hash symbol.

Syntax

fragment(url)

Arguments

url (String): URL to extract the fragment from.

Returns

The fragment identifier without the initial hash symbol, or an empty string if there is no fragment. (String)

Example

SELECT
	fragment('https://tacosoft.com/menu#spicy-tacos') AS taco_fragment;

Result:

| taco_fragment |
|---------------|
| spicy-tacos   |

In this example, fragment extracts ‘spicy-tacos’ from the URL, which represents the specific section of the taco menu being referenced.

If the URL doesn’t contain a fragment identifier, an empty string is returned:

SELECT
	fragment('https://tacosoft.com/menu') AS no_fragment;

Result:

| no_fragment |
|-------------|
|             |

This function is useful for analyzing user navigation patterns within a webpage or for processing URLs that use fragment identifiers to represent different views or states of a single-page application.

queryStringAndFragment

Returns the query string and fragment identifier from a URL.

Syntax

queryStringAndFragment(url)

Arguments

url (String): URL.

Returns:

The query string and fragment identifier. (String)

Example

SELECT
  queryStringAndFragment('https://tacosoft.com/menu?category=burritos&spice=hot#nutrition') AS query_and_fragment;

Result:

| query_and_fragment                            |
|-----------------------------------------------|
| category=burritos&spice=hot#nutrition         |

This function returns everything after the ’?’ character in the URL, including the ’?’ itself, and the fragment identifier (everything after the ’#’ character, including the ’#’).

If the URL doesn’t contain a query string or fragment identifier, an empty string is returned.

This function does not decode URL-encoded characters.
If only a fragment identifier is present (without a query string), it will still be returned.

extractURLParameter

Extracts the value of a specified parameter from a URL.

Syntax

extractURLParameter(url, name)

Arguments

url (String): The URL to extract the parameter from.
name (String): The name of the parameter to extract.

Returns

The value of the specified parameter if present in the URL, otherwise an empty string. (String)

If there are multiple parameters with the same name, the function returns the value of the first occurrence.
The function assumes that the parameter in the URL is encoded in the same way as in the name argument.

Example

SELECT
	extractURLParameter('https://tacosoft.com/order?size=large&toppings=cheese,salsa&drink=horchata', 'toppings') AS taco_toppings;

Result:

| taco_toppings |
|---------------|
| cheese,salsa  |

In this example, the function extracts the value of the ‘toppings’ parameter from a taco order URL.

extractURLParameters

Extracts all parameters and their values from a URL query string.

Syntax

extractURLParameters(URL)

Arguments

URL (String): The URL to extract parameters from.

Returns

An array of name=value strings corresponding to the URL parameters. (Array(String))

Example

SELECT
	extractURLParameters('https://tacosoft.com/order?items=3&type=carnitas&extra=guacamole') AS params;

Result:

| params                                              |
|----------------------------------------------------|
| ['items=3', 'type=carnitas', 'extra=guacamole']    |

This function extracts all parameters from the URL’s query string, returning them as an array of strings in the format name=value. The values are not decoded.

If the URL doesn’t contain any parameters, an empty array is returned.

extractURLParameterNames

Extracts the names of parameters from a URL.

Syntax

extractURLParameterNames(url)

Arguments

url (String): URL to extract parameter names from.

Returns

An array of strings containing the names of URL parameters. (Array(String))

Example

SELECT
	extractURLParameterNames('https://tacosoft.com/order?filling=beef&salsa=hot&guacamole=true') AS param_names;

Result:

| param_names                     |
|---------------------------------|
| ['filling','salsa','guacamole'] |

This function extracts the names of all parameters from the URL query string. It does not decode the parameter names, so they are returned exactly as they appear in the URL.

If the URL has no parameters, an empty array is returned.
The function does not handle duplicate parameter names in any special way; all occurrences will be included in the result.
The order of parameter names in the resulting array matches their order in the URL.

URLHierarchy

Returns an array containing the URL, truncated at the end by the symbols / and ? in the path and query string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.

Syntax

URLHierarchy(url)

Arguments

url (String): URL.

Returns

An array of strings containing the hierarchical parts of the URL.

Example

SELECT
	URLHierarchy('https://tacosoft.com/menu/burritos?spice=hot#nutrition') AS url_hierarchy;

Result:

| url_hierarchy                                                                 |
|-------------------------------------------------------------------------------|
| ['https://tacosoft.com/', 'https://tacosoft.com/menu/',                       |
|  'https://tacosoft.com/menu/burritos', 'https://tacosoft.com/menu/burritos?'] |

This function is useful for analyzing URL structures and creating hierarchical representations of web pages. It can be particularly helpful in web analytics tasks, site structure analysis, or when working with URL-based data.

The function includes the protocol and domain in the result, unlike URLPathHierarchy which focuses only on the path.

URLPathHierarchy

Returns an array containing the URL path hierarchy, excluding the protocol and host.

Syntax

URLPathHierarchy(url)

Arguments

url (String): URL.

Returns

An array of strings representing the URL path hierarchy. (Array(String))

Example

SELECT
	URLPathHierarchy('https://tacosoft.com/menu/burritos/bean-burrito') AS path_hierarchy;

Result:

| path_hierarchy                                                           |
|--------------------------------------------------------------------------|
| ['/menu/','/menu/burritos/','/menu/burritos/bean-burrito']               |

This function splits the URL path into hierarchical components:

It removes the protocol and domain.
It splits the remaining path at each forward slash (/).
It builds an array where each element is a progressively longer portion of the path.

The function is useful for analyzing URL structures and creating hierarchical reports based on URL paths.

The root path (’/’) is not included in the result array.

encodeURLComponent

Encodes a URL component by replacing certain characters with their percent-encoded equivalents.

Syntax:

encodeURLComponent(url)

Arguments:

url (String): The URL component to encode.

Returns:

The encoded URL component. [String]

Example:

SELECT
	encodeURLComponent('Spicy Jalapeño Taco') AS encoded_taco_name;

Result:

| encoded_taco_name            |
|------------------------------|
| Spicy%20Jalape%C3%B1o%20Taco |

In this example, the function encodes the taco name “Spicy Jalapeño Taco”, replacing spaces with “%20” and the “ñ” character with its UTF-8 percent-encoded equivalent “%C3%B1”.

This function is useful when you need to include special characters or non-ASCII characters in a URL, ensuring that the URL remains valid and properly formatted. It’s particularly helpful when constructing URLs with query parameters that may contain special characters.

decodeURLComponent

Decodes a URL-encoded string.

Syntax:

decodeURLComponent(url)

Arguments:

url (String): The URL-encoded string to decode.

Returns:

The decoded string. [String]

Example:

SELECT
	decodeURLComponent('https%3A%2F%2Ftacosoft.com%2Fmenu%3Fitem%3Dcarnitas%20tacos') AS decoded_url;

Result:

| decoded_url                                       |
|---------------------------------------------------|
| https://tacosoft.com/menu?item=carnitas tacos     |

This function decodes URL-encoded characters in the input string. For example, %20 is decoded to a space, %3A to a colon, and %2F to a forward slash.

The decodeURLComponent function is the inverse of encodeURLComponent. It’s particularly useful when working with URLs or query parameters that may contain special characters or spaces.

encodeURLFormComponent

Encodes a URL component following RFC 1866, where spaces are encoded as plus signs (+).

Syntax:

encodeURLFormComponent(url)

Arguments:

url (String): URL component to encode.

Returns:

The encoded URL component. [String]

Example:

SELECT
	encodeURLFormComponent('Carne Asada Taco with Extra Salsa') AS encoded_taco_order;

Result:

| encoded_taco_order                         |
|--------------------------------------------|
| Carne+Asada+Taco+with+Extra+Salsa          |

This function is particularly useful when constructing URLs for web forms or API requests where spaces and special characters need to be properly encoded. It ensures that the URL remains valid and can be correctly interpreted by web servers.

This function differs from encodeURLComponent in that it specifically encodes spaces as plus signs, which is the expected behavior for URL-encoded form data.

decodeURLFormComponent

Decodes a URL-encoded form component string.

Syntax

decodeURLFormComponent(encoded_string)

Arguments

encoded_string (String): A URL-encoded string.

Returns:

The decoded string. (String)

Description

This function decodes a URL-encoded form component string, following RFC 1866. It specifically handles the following:

Converts + (plus) to a space character.
Decodes percent-encoded sequences (e.g., %20 to space, %2B to +).

Example

SELECT
	decodeURLFormComponent('Spicy+Taco%21') AS decoded_taco_name;

Result:

| decoded_taco_name |
|-------------------|
| Spicy Taco!       |

In this example, decodeURLFormComponent converts the + to a space and decodes %21 to an exclamation mark, resulting in “Spicy Taco!”.

This function is particularly useful when working with form data submitted via HTTP POST requests or when processing URL query parameters.

netloc

Extracts the network locality (username:password@host:port) from a URL.

Syntax:

netloc(url)

Arguments:

url (String): URL.

Returns:

The network locality part of the URL (username:password@host:port). [String]

Example:

SELECT
  netloc('https://juan:secreto@tacosoft.com:8080/menu');

Result:

| netloc('https://juan:secreto@tacosoft.com:8080/menu')  |
|--------------------------------------------------------|
| juan:secreto@tacosoft.com:8080                         |

In this example, netloc extracts the network locality part from a URL for a taco restaurant’s online ordering system, including the username, password, host, and port.

If the URL doesn’t contain any network locality information, an empty string is returned.

cutWWW

Removes the leading www. from a URL’s domain, if present.

Syntax

cutWWW(url)

Arguments

url (String): The URL to process.

Returns

The URL with the leading www. removed from the domain, if present. Otherwise, returns the original URL. [String]

Example

SELECT
  cutWWW('https://www.tacosoft.com/menu') AS url_without_www;

Result:

| url_without_www               |
|-------------------------------|
| https://tacosoft.com/menu     |

In this example, cutWWW removes the www. from the domain of the Taco Bell website URL.

This function only removes the www. prefix if it appears at the beginning of the domain. It does not affect other parts of the URL or remove www. if it appears elsewhere in the URL.

cutQueryString

Removes the query string from a URL, including the question mark.

Syntax:

cutQueryString(url)

Arguments:

url (String): The URL to process.

Returns:

The URL with the query string removed. [String]

Example:

SELECT
  cutQueryString('https://tacosoft.com/menu?category=burritos&spicy=true') AS url_without_query;

Result:

| url_without_query            |
|------------------------------|
| https://tacosoft.com/menu    |

In this example, cutQueryString removes the query string ?category=burritos&spicy=true from the URL, leaving only the base URL.

If the URL does not contain a query string, the function returns the original URL unchanged.

cutFragment

Removes the fragment identifier from a URL, including the hash symbol (#).

Syntax

cutFragment(url)

Arguments

url (String): The URL to process.

Returns

The URL with the fragment identifier removed. [String]

Example

SELECT
  cutFragment('https://tacosoft.com/menu#spicy-tacos') AS url_without_fragment;

Result:

| url_without_fragment           |
|--------------------------------|
| https://tacosoft.com/menu      |

This function is useful when you want to remove the fragment identifier (everything after and including the ’#’ symbol) from a URL. It’s commonly used in web analytics to standardize URLs for better grouping and analysis.

If the URL doesn’t contain a fragment identifier, the function returns the original URL unchanged.

cutQueryStringAndFragment

Removes the query string and fragment identifier from a URL, including the question mark and number sign.

Syntax

cutQueryStringAndFragment(url)

Arguments

url (String): The URL to process.

Returns

The URL with query string and fragment removed. (String)

Example

SELECT
  cutQueryStringAndFragment('https://tacosoft.com/menu?category=burritos#spicy') AS clean_url;

Result:

| clean_url                  |
|----------------------------|
| https://tacosoft.com/menu  |

This function is useful when you need to extract the base URL without any parameters or fragments. It’s particularly handy for cleaning up URLs before analysis or grouping.

If the URL doesn’t contain a query string or fragment, it remains unchanged.

cutURLParameter

Removes a specified parameter from a URL.

Syntax

cutURLParameter(url, name)

Arguments

url (String): The URL to modify.
name (String or Array(String)): The name of the parameter to remove.

Returns

The modified URL with the specified parameter(s) removed. (String)

Example

SELECT
	cutURLParameter('https://tacosoft.com/order?size=large&toppings=cheese&sauce=hot', 'toppings') AS url_without_toppings,
	cutURLParameter('https://tacosoft.com/order?size=large&toppings=cheese&sauce=hot', ['size', 'sauce']) AS url_without_size_and_sauce;

Result:

| url_without_toppings                                | url_without_size_and_sauce                 |
|-----------------------------------------------------|--------------------------------------------|
| https://tacosoft.com/order?size=large&sauce=hot     | https://tacosoft.com/order?toppings=cheese |

In this example:

url_without_toppings removes the ‘toppings’ parameter from the URL.
url_without_size_and_sauce removes both the ‘size’ and ‘sauce’ parameters from the URL.

This function does not encode or decode characters in parameter names. For example, ‘Salsa Type’ and ‘Salsa%20Type’ are treated as different parameter names.

Introduction

Statements

ClickHouse function reference

protocol

domain

domainRFC

domainWithoutWWW

domainWithoutWWWRFC

topLevelDomain

topLevelDomainRFC

firstSignificantSubdomain

firstSignificantSubdomainRFC

cutToFirstSignificantSubdomain

cutToFirstSignificantSubdomainRFC

cutToFirstSignificantSubdomainWithWWW

cutToFirstSignificantSubdomainWithWWWRFC

port

portRFC

path

pathFull

protocol

queryString

fragment

queryStringAndFragment

extractURLParameter

extractURLParameters

extractURLParameterNames

URLHierarchy

URLPathHierarchy

encodeURLComponent

decodeURLComponent

encodeURLFormComponent

decodeURLFormComponent

netloc

cutWWW

cutQueryString

cutFragment

cutQueryStringAndFragment

cutURLParameter

Introduction

Statements

​ClickHouse function reference

​protocol

​domain

​domainRFC

​domainWithoutWWW

​domainWithoutWWWRFC

​topLevelDomain

​topLevelDomainRFC

​firstSignificantSubdomain

​firstSignificantSubdomainRFC

​cutToFirstSignificantSubdomain

​cutToFirstSignificantSubdomainRFC

​cutToFirstSignificantSubdomainWithWWW

​cutToFirstSignificantSubdomainWithWWWRFC

​port

​portRFC

​path

​pathFull

​protocol

​queryString

​fragment

​queryStringAndFragment

​extractURLParameter

​extractURLParameters

​extractURLParameterNames

​URLHierarchy

​URLPathHierarchy

​encodeURLComponent

​decodeURLComponent

​encodeURLFormComponent

​decodeURLFormComponent

​netloc

​cutWWW

​cutQueryString

​cutFragment

​cutQueryStringAndFragment

​cutURLParameter

ClickHouse function reference

protocol

domain

domainRFC

domainWithoutWWW

domainWithoutWWWRFC

topLevelDomain

topLevelDomainRFC

firstSignificantSubdomain

firstSignificantSubdomainRFC

cutToFirstSignificantSubdomain

cutToFirstSignificantSubdomainRFC

cutToFirstSignificantSubdomainWithWWW

cutToFirstSignificantSubdomainWithWWWRFC

port

portRFC

path

pathFull

protocol

queryString

fragment

queryStringAndFragment

extractURLParameter

extractURLParameters

extractURLParameterNames

URLHierarchy

URLPathHierarchy

encodeURLComponent

decodeURLComponent

encodeURLFormComponent

decodeURLFormComponent

netloc

cutWWW

cutQueryString

cutFragment

cutQueryStringAndFragment

cutURLParameter