Encoding functions in help convert data between different character encodings.

These functions enable you to:

  1. Encode and decode strings.
  2. Convert between different character sets.
  3. Transform binary data into readable text.

ClickHouse function reference

char

Returns a string with the length equal to the number of passed arguments, where each byte has the value of the corresponding argument.

Syntax:

char(number_1, [number_2, ..., number_n]);

Arguments:

  • number_1, number_2, …, number_n (numeric): Numerical arguments interpreted as integers. Types: Int, Float.

Returns:

  • A string of given bytes. Type: String.

Example:

SELECT char(104.1, 101, 108.9, 108.9, 111) AS hello_taco;

Result:

| hello_taco |
|------------|
| hello      |

You can construct a string of arbitrary encoding by passing the corresponding bytes. Here’s an example for UTF-8:

SELECT char(0xF0, 0x9F, 0x8C, 0xAE) AS taco_emoji;

Result:

| taco_emoji |
|------------|
| 🌮         |

If the value of an argument is out of the range of the UInt8 data type, it is converted to UInt8 with possible rounding and overflow.

hex

Returns a string containing the argument’s hexadecimal representation.

Syntax

hex(arg)

The function uses uppercase letters A-F and does not include any prefixes (like 0x) or suffixes (like h).

For integer arguments, it prints hex digits from the most significant to least significant (big-endian or “human-readable” order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints both digits of every byte even if the leading digit is zero.

Alias:

  • HEX

Arguments

  • arg (String, UInt, Float, Decimal, Date, DateTime, or UUID): A value to convert to hexadecimal.

Returns

  • A string with the hexadecimal representation of the argument. Type: String.

Examples

Converting an integer to hexadecimal:

SELECT hex(42) AS hex_value;

Result:

| hex_value |
|-----------|
| 2A        |

Converting a string to hexadecimal:

SELECT hex('Taco Tuesday') AS hex_string;

Result:

| hex_string                    |
|-------------------------------|
| 5461636F205475657364617920    |

Converting a UUID to hexadecimal:

SELECT hex(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0')) AS hex_uuid;

Result:

| hex_uuid                                |
|-----------------------------------------|
| 61F0C4045CB311E7907BA6006AD3DBA0        |
  • For Float and Decimal types, the values are encoded as their representation in memory. As ClickHouse supports little-endian architecture, they are encoded in little-endian. Zero leading/trailing bytes are not omitted.
  • For Date and DateTime types, the values are formatted as corresponding integers (the number of days since Epoch for Date and the value of Unix Timestamp for DateTime).
  • For String and FixedString types, all bytes are encoded as two hexadecimal numbers. Zero bytes are not omitted.
  • For UUID type, the values are encoded in big-endian order.

unhex

Performs the opposite operation of hex. It interprets each pair of hexadecimal digits in the argument as a number and converts it to the byte represented by that number. The return value is a binary string (BLOB).

If you want to convert the result to a number, you can use the reverse and reinterpretAs<Type> functions.

Syntax:

unhex(arg)

Alias:

  • UNHEX

Arguments:

  • arg (String, FixedString): A string containing any number of hexadecimal digits.

Supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the least significant half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, some implementation-defined result is returned (an exception isn’t thrown).

Returns:

  • A binary string (BLOB). (String)

Example:

SELECT unhex('303132'), UNHEX('4D7953514C');

Result:

| unhex('303132') | unhex('4D7953514C') |
|-----------------|---------------------|
| 012             | MySQL               |

Converting the result to a number:

SELECT reinterpretAsUInt64(reverse(unhex('FFF'))) AS num;

Result:

| num  |
|------|
| 4095 |

In this example, we use unhex to convert the hexadecimal string ‘FFF’ to its binary representation, reverse the bytes (since ClickHouse uses little-endian), and then reinterpret the result as a UInt64.

For a numeric argument, unhex() does not perform the inverse of hex(N).

bin

Returns a string containing the argument’s binary representation.

Syntax:

bin(arg)
  • For integer arguments, it prints binary digits from the most significant to least significant (big-endian or “human-readable” order).
  • It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints eight digits of every byte if the leading digit is zero.
  • For Date and DateTime, it formats as corresponding integers (the number of days since Epoch for Date and the value of Unix Timestamp for DateTime).
  • For String and FixedString, all bytes are encoded as eight binary numbers. Zero bytes are not omitted.
  • For Float and Decimal types, they are encoded as their representation in memory (little-endian). Zero leading/trailing bytes are not omitted.
  • For UUID type, it is encoded as a big-endian order string.

Alias:

  • BIN

Arguments:

  • arg (String, FixedString, UInt, Float, Decimal, Date, or DateTime): A value to convert to binary.

Returns:

  • A string with the binary representation of the argument. Type: String.

Examples:

In this example, the bin function converts the integer 14 to its binary representation, resulting in the string ‘00001110’.

SELECT bin(14);

Result:

| bin(14)  |
|----------|
| 00001110 |

This second example converts a sequence of floating-point numbers to their binary representations. The third example converts a UUID to its binary representation.

SELECT bin(toFloat32(number)) AS bin_presentation FROM numbers(15, 2);

Result:

| bin_presentation                 |
|----------------------------------|
| 00000000000000000111000001000001 |
| 00000000000000001000000001000001 |

In this example, the bin function converts a UUID to its binary representation. The UUID ‘61f0c404-5cb3-11e7-907b-a6006ad3dba0’ is represented as a 128-bit number, and the bin function outputs its binary form as a string of 128 binary digits. This can be useful for visualizing the binary structure of UUIDs, which are often used as unique identifiers in databases and distributed systems.

SELECT bin(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0')) as bin_uuid

Result:

| bin_uuid                                                                                                                         |
|----------------------------------------------------------------------------------------------------------------------------------|
| 01100001111100001100010000000100010111001011001100010001111001111001000001111011101001100000000001101010110100111101101110100000 |

unbin

Performs the opposite operation of bin. It interprets each pair of binary digits in the argument as a number and converts it to the byte represented by that number.

Syntax:

unbin(arg)

Alias:

  • UNBIN

Arguments:

  • arg (String): A string containing any number of binary digits.

Returns:

  • A binary string (BLOB). (String)
  • Supports binary digits 0 and 1.
  • The number of binary digits does not have to be a multiple of eight.
  • If the argument string contains anything other than binary digits, some implementation-defined result is returned (an exception isn’t thrown).
  • For a numeric argument, unbin() does not return the inverse of bin().
  • If unbin is invoked from within the clickhouse-client, binary strings are displayed using UTF-8.

Example:

SELECT UNBIN('001100000011000100110010'), UNBIN('0100110101111001010100110101000101001100');

Result:

| unbin('001100000011000100110010') | unbin('0100110101111001010100110101000101001100')   |
|-----------------------------------|-----------------------------------------------------|
| 012                               | MySQL                                               |

To convert the result to a number, you can use the reverse and reinterpretAs<Type> functions:

SELECT reinterpretAsUInt64(reverse(unbin('1110'))) AS num;

Result:

| num |
|-----|
| 14  |

This function is particularly useful for decoding binary data or working with binary representations of numbers.

bitmaskToList

Converts an integer bitmask to a comma-separated string of powers of two that sum up to the original number.

Syntax

bitmaskToList(num)

Arguments

  • num (Int/UInt): An integer value representing the bitmask.

Returns

  • A string containing comma-separated powers of two in ascending order. (String)

Example

SELECT bitmaskToList(14) AS taco_toppings;

Result:

| taco_toppings |
|---------------|
| 2,4,8         |

In this example, the bitmask 14 (binary 1110) is converted to its constituent powers of two: 2, 4, and 8, which represent different taco toppings when summed.

This function is useful for converting bitmask-encoded data into a human-readable format, such as displaying selected taco toppings from a bitmask representation.

bitmaskToArray

Converts an integer bitmask to an array of powers of two that sum up to the original number.

Syntax

bitmaskToArray(num)

Arguments

  • num (Int/UInt): An integer value representing the bitmask.

Returns

  • An array of UInt64 numbers containing the list of powers of two that sum up to the source number. The numbers in the array are in ascending order. (Array(UInt64))

Example

SELECT bitmaskToArray(14) AS taco_toppings;

Result:

| taco_toppings |
|---------------|
| [2,4,8]       |

In this example, 14 is represented as 2 + 4 + 8, which could correspond to different taco toppings:

  • 2: Cheese
  • 4: Lettuce
  • 8: Salsa

This function can be useful for decoding bitmask-encoded data, such as menu item customizations or feature flags.

If the input is 0, an empty array is returned.

bitPositionsToArray

Converts an integer to an array containing the positions of set bits (bits equal to 1) in ascending order.

Syntax

bitPositionsToArray(arg)

Arguments

  • arg (Int/UInt): Integer value.

Returns:

  • An array containing the positions of set bits in ascending order. Type: Array(UInt64).

Examples

Query:

SELECT bitPositionsToArray(toInt8(1)) AS bit_positions;

Result:

| bit_positions |
|---------------|
| [0]           |

Query:

SELECT bitPositionsToArray(toInt8(-1)) AS bit_positions;

Result:

| bit_positions     |
|-------------------|
| [0,1,2,3,4,5,6,7] |

This function is useful for analyzing binary data or working with bitmasks in a more readable format.

mortonEncode

Calculates the Morton encoding (Z-curve) for a list of unsigned integers.

The function has two modes of operation:

  1. Simple mode
  2. Expanded mode

Simple mode

Syntax:

mortonEncode(arg1, arg2, ..., argN)

Arguments:

  • arg1, arg2, …, argN (UInt8, UInt16, UInt32, UInt64): Up to 8 unsigned integers or columns of unsigned integer types.

Returns:

A UInt64 code.

Example:

SELECT mortonEncode(1, 2, 3);

Result:

| mortonEncode(1, 2, 3) |
|-----------------------|
|                    53 |

Expanded mode

Syntax:

mortonEncode(range_mask, arg1, arg2, ..., argN)

Arguments:

  • range_mask (Tuple): A tuple specifying the amount of range expansion for each argument (1-8x).
  • arg1, arg2, …, argN (UInt8, UInt16, UInt32, UInt64): Up to 8 unsigned integers or columns of unsigned integer types.

Returns:

A UInt64 code.

Example:

SELECT mortonEncode((1,2), 1024, 16);

Result:

| mortonEncode((1, 2), 1024, 16) |
|--------------------------------|
|                        1572864 |

This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and taco shop IDs (0…FF).

  1. Morton encoding for one argument is always the argument itself:
SELECT mortonEncode(1);

Result:

| mortonEncode(1) |
|-----------------|
|               1 |
  1. The function can accept column names as arguments:
CREATE TABLE taco_shops (
    shop_id UInt32,
    location_x UInt32,
    location_y UInt32
) ENGINE = MergeTree()
ORDER BY shop_id;

INSERT INTO taco_shops VALUES (1, 100, 200);

SELECT mortonEncode(location_x, location_y) AS encoded_location
FROM taco_shops;

Result:

| encoded_location |
|------------------|
|           106292 |

Implementation details: The function can fit only so many bits of information into the Morton code as UInt64 has. Two arguments will have a range of maximum 2^32 (64/2) each, three arguments a range of max 2^21 (64/3) each, and so on. All overflow will be clamped to zero.

mortonDecode

Decodes a Morton encoding (ZCurve) into the corresponding unsigned integer tuple.

The function has two modes of operation:

  1. Simple mode
  2. Expanded mode

Simple mode

Syntax:

mortonDecode(tuple_size, code)

Arguments:

  • tuple_size (Integer): Integer value no more than 8.
  • code (UInt64): UInt64 code.

Returns: A tuple of the specified size containing UInt64 values.

Example:

SELECT mortonDecode(2, 31);

Result:

| mortonDecode(2, 31) |
|---------------------|
| ["3", "4"]          |

Expanded mode

Syntax:

mortonDecode(range_mask, code)

Arguments:

  • range_mask (Tuple): A tuple specifying bit shifts for each dimension.
  • code (UInt64): UInt64 code.

Returns: A tuple containing UInt64 values.

Example:

SELECT mortonDecode(tuple(2), 32768);

Result:

| mortonDecode(tuple(2), 32768) |
|--------------------------------|
| ["128"]                        |

This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and country codes (0…FF).

For a single argument without a tuple, the function returns the argument itself as a single-element tuple.

SELECT mortonDecode(1, 1);

Result:

| mortonDecode(1, 1) |
|---------------------|
| ["1"]               |

The function can accept a column of codes as the second argument:

CREATE TABLE morton_numbers (
    n1 UInt32,
    n2 UInt32
) ENGINE = MergeTree()
ORDER BY n1;

INSERT INTO morton_numbers VALUES (1, 2);

SELECT untuple(mortonDecode(2, mortonEncode(n1, n2))) FROM morton_numbers;

Result:

| untuple(mortonDecode(2, mortonEncode(n1, n2))) |
|------------------------------------------------|
| 1 2                                            |

This function is particularly useful for decoding multi-dimensional data that has been encoded into a single dimension using the Morton curve, often used in geospatial applications or for optimizing multi-dimensional database queries.

hilbertEncode

Calculates the Hilbert curve index for a list of unsigned integers.

The function has two modes of operation:

  1. Simple mode
  2. Expanded mode

Simple mode

Syntax:

hilbertEncode(arg1, arg2)

Arguments:

  • arg1, arg2 (UInt): Up to 2 unsigned integers or columns of unsigned integer type.

Returns: A UInt64 code representing the Hilbert curve index.

Example:

SELECT hilbertEncode(3, 4);

Result:

| hilbertEncode(3, 4)  |
|----------------------|
| 31                   |

Expanded mode

Syntax:

hilbertEncode(range_mask, arg1, arg2)

Arguments:

  • range_mask (Tuple): A tuple specifying bit shifts for each dimension.
  • arg1, arg2 (UInt): Up to 2 unsigned integers or columns of unsigned integer type.

Returns: A UInt64 code representing the Hilbert curve index.

Example:

SELECT hilbertEncode((10, 6), 1024, 16);

Result:

| hilbertEncode((10, 6), 1024, 16) |
|------------------------------------|
| 4031541586602                      |

This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and country codes (0…FF).

For a single argument without a tuple, the function returns the argument itself as the Hilbert index:

SELECT hilbertEncode(1);

Result:

| hilbertEncode(1) |
|-------------------|
| 1                 |

The function can accept columns as arguments:

CREATE TABLE hilbert_numbers (
    n1 UInt32,
    n2 UInt32
) ENGINE = MergeTree()
ORDER BY n1;

INSERT INTO hilbert_numbers VALUES (1, 2);

SELECT hilbertEncode(n1, n2) FROM hilbert_numbers;

Result:

| hilbertEncode(n1, n2) |
|------------------------|
| 13                     |

This function is particularly useful for encoding multi-dimensional data into a single dimension while preserving locality, often used in geospatial applications or for optimizing multi-dimensional database queries.

hilbertDecode

Decodes a Hilbert curve index back into a tuple of unsigned integers, representing coordinates in multi-dimensional space.

The function has two modes of operation:

  1. Simple mode
  2. Expanded mode

Simple mode

Syntax:

hilbertDecode(tuple_size, code)

Arguments:

  • tuple_size (Integer): Integer value no more than 2.
  • code (UInt64): UInt64 code.

Returns: A tuple of the specified size containing UInt64 values.

Example:

SELECT hilbertDecode(2, 31);

Result:

| hilbertDecode(2, 31) |
|----------------------|
| ["3", "4"]           |

Expanded mode

Syntax:

hilbertDecode(range_mask, code)

Arguments:

  • range_mask (Tuple): A tuple specifying bit shifts for each dimension.
  • code (UInt64): UInt64 code.

Returns: A tuple containing UInt64 values.

Example:

SELECT hilbertDecode(tuple(2), 32768);

Result:

| hilbertDecode(tuple(2), 32768) |
|--------------------------------|
| ["128"]                        |

This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and country codes (0…FF).

For a single argument without a tuple, the function returns the argument itself as a single-element tuple.

SELECT hilbertDecode(1, 1);

Result:

| hilbertDecode(1, 1) |
|---------------------|
| ["1"]               |

The function can accept a column of codes as the second argument:

CREATE TABLE hilbert_numbers (
    n1 UInt32,
    n2 UInt32
) ENGINE = MergeTree()
ORDER BY n1;

INSERT INTO hilbert_numbers VALUES (1, 2);

SELECT untuple(hilbertDecode(2, hilbertEncode(n1, n2))) FROM hilbert_numbers;

Result:

| untuple(hilbertDecode(2, hilbertEncode(n1, n2))) |
|--------------------------------------------------|
| 1 2                                              |

This function is particularly useful for decoding multi-dimensional data that has been encoded into a single dimension using the Hilbert curve, often used in geospatial applications or for optimizing multi-dimensional database queries.