Encoding functions
Convert between different character encodings.
Encoding functions in help convert data between different character encodings.
These functions enable you to:
- Encode and decode strings.
- Convert between different character sets.
- Transform binary data into readable text.
ClickHouse function reference
char
Returns a string with the length equal to the number of passed arguments, where each byte has the value of the corresponding argument.
Syntax:
Arguments:
number_1
,number_2
, …,number_n
(numeric): Numerical arguments interpreted as integers. Types:Int
,Float
.
Returns:
- A string of given bytes. Type:
String
.
Example:
Result:
You can construct a string of arbitrary encoding by passing the corresponding bytes. Here’s an example for UTF-8:
Result:
If the value of an argument is out of the range of the UInt8
data type, it is converted to UInt8
with possible rounding and overflow.
hex
Returns a string containing the argument’s hexadecimal representation.
Syntax
The function uses uppercase letters A-F and does not include any prefixes (like 0x
) or suffixes (like h
).
For integer arguments, it prints hex digits from the most significant to least significant (big-endian or “human-readable” order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints both digits of every byte even if the leading digit is zero.
Alias:
- HEX
Arguments
arg
(String
,UInt
,Float
,Decimal
,Date
,DateTime
, orUUID
): A value to convert to hexadecimal.
Returns
- A string with the hexadecimal representation of the argument. Type:
String
.
Examples
Converting an integer to hexadecimal:
Result:
Converting a string to hexadecimal:
Result:
Converting a UUID to hexadecimal:
Result:
- For
Float
andDecimal
types, the values are encoded as their representation in memory. As ClickHouse supports little-endian architecture, they are encoded in little-endian. Zero leading/trailing bytes are not omitted. - For
Date
andDateTime
types, the values are formatted as corresponding integers (the number of days since Epoch forDate
and the value of Unix Timestamp forDateTime
). - For
String
andFixedString
types, all bytes are encoded as two hexadecimal numbers. Zero bytes are not omitted. - For
UUID
type, the values are encoded in big-endian order.
unhex
Performs the opposite operation of hex
. It interprets each pair of hexadecimal digits in the argument as a number and converts it to the byte represented by that number. The return value is a binary string (BLOB).
If you want to convert the result to a number, you can use the reverse
and reinterpretAs<Type>
functions.
Syntax:
Alias:
- UNHEX
Arguments:
arg
(String
,FixedString
): A string containing any number of hexadecimal digits.
Supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the least significant half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, some implementation-defined result is returned (an exception isn’t thrown).
Returns:
- A binary string (BLOB). (
String
)
Example:
Result:
Converting the result to a number:
Result:
In this example, we use unhex
to convert the hexadecimal string ‘FFF’ to its binary representation, reverse the bytes (since ClickHouse uses little-endian), and then reinterpret the result as a UInt64.
For a numeric argument, unhex()
does not perform the inverse of hex(N)
.
bin
Returns a string containing the argument’s binary representation.
Syntax:
- For integer arguments, it prints binary digits from the most significant to least significant (big-endian or “human-readable” order).
- It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints eight digits of every byte if the leading digit is zero.
- For
Date
andDateTime
, it formats as corresponding integers (the number of days since Epoch for Date and the value of Unix Timestamp for DateTime). - For
String
andFixedString
, all bytes are encoded as eight binary numbers. Zero bytes are not omitted. - For
Float
andDecimal
types, they are encoded as their representation in memory (little-endian). Zero leading/trailing bytes are not omitted. - For
UUID
type, it is encoded as a big-endian order string.
Alias:
- BIN
Arguments:
arg
(String
,FixedString
,UInt
,Float
,Decimal
,Date
, orDateTime
): A value to convert to binary.
Returns:
- A string with the binary representation of the argument. Type:
String
.
Examples:
In this example, the bin
function converts the integer 14 to its binary representation, resulting in the string ‘00001110’.
Result:
This second example converts a sequence of floating-point numbers to their binary representations. The third example converts a UUID to its binary representation.
Result:
In this example, the bin
function converts a UUID to its binary representation. The UUID ‘61f0c404-5cb3-11e7-907b-a6006ad3dba0’ is represented as a 128-bit number, and the bin
function outputs its binary form as a string of 128 binary digits. This can be useful for visualizing the binary structure of UUIDs, which are often used as unique identifiers in databases and distributed systems.
Result:
unbin
Performs the opposite operation of bin
. It interprets each pair of binary digits in the argument as a number and converts it to the byte represented by that number.
Syntax:
Alias:
- UNBIN
Arguments:
arg
(String
): A string containing any number of binary digits.
Returns:
- A binary string (BLOB). (
String
)
- Supports binary digits 0 and 1.
- The number of binary digits does not have to be a multiple of eight.
- If the argument string contains anything other than binary digits, some implementation-defined result is returned (an exception isn’t thrown).
- For a numeric argument,
unbin()
does not return the inverse ofbin()
. - If
unbin
is invoked from within the clickhouse-client, binary strings are displayed using UTF-8.
Example:
Result:
To convert the result to a number, you can use the reverse
and reinterpretAs<Type>
functions:
Result:
This function is particularly useful for decoding binary data or working with binary representations of numbers.
bitmaskToList
Converts an integer bitmask to a comma-separated string of powers of two that sum up to the original number.
Syntax
Arguments
num
(Int/UInt
): An integer value representing the bitmask.
Returns
- A string containing comma-separated powers of two in ascending order. (
String
)
Example
Result:
In this example, the bitmask 14 (binary 1110) is converted to its constituent powers of two: 2, 4, and 8, which represent different taco toppings when summed.
This function is useful for converting bitmask-encoded data into a human-readable format, such as displaying selected taco toppings from a bitmask representation.
bitmaskToArray
Converts an integer bitmask to an array of powers of two that sum up to the original number.
Syntax
Arguments
num
(Int/UInt
): An integer value representing the bitmask.
Returns
- An array of
UInt64
numbers containing the list of powers of two that sum up to the source number. The numbers in the array are in ascending order. (Array(UInt64)
)
Example
Result:
In this example, 14 is represented as 2 + 4 + 8, which could correspond to different taco toppings:
- 2: Cheese
- 4: Lettuce
- 8: Salsa
This function can be useful for decoding bitmask-encoded data, such as menu item customizations or feature flags.
If the input is 0, an empty array is returned.
bitPositionsToArray
Converts an integer to an array containing the positions of set bits (bits equal to 1) in ascending order.
Syntax
Arguments
arg
(Int/UInt
): Integer value.
Returns:
- An array containing the positions of set bits in ascending order. Type:
Array(UInt64)
.
Examples
Query:
Result:
Query:
Result:
This function is useful for analyzing binary data or working with bitmasks in a more readable format.
mortonEncode
Calculates the Morton encoding (Z-curve) for a list of unsigned integers.
The function has two modes of operation:
- Simple mode
- Expanded mode
Simple mode
Syntax:
Arguments:
arg1
,arg2
, …,argN
(UInt8
,UInt16
,UInt32
,UInt64
): Up to 8 unsigned integers or columns of unsigned integer types.
Returns:
A UInt64
code.
Example:
Result:
Expanded mode
Syntax:
Arguments:
range_mask
(Tuple
): A tuple specifying the amount of range expansion for each argument (1-8x).arg1
,arg2
, …,argN
(UInt8
,UInt16
,UInt32
,UInt64
): Up to 8 unsigned integers or columns of unsigned integer types.
Returns:
A UInt64
code.
Example:
Result:
This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and taco shop IDs (0…FF).
- Morton encoding for one argument is always the argument itself:
Result:
- The function can accept column names as arguments:
Result:
Implementation details: The function can fit only so many bits of information into the Morton code as UInt64 has. Two arguments will have a range of maximum 2^32 (64/2) each, three arguments a range of max 2^21 (64/3) each, and so on. All overflow will be clamped to zero.
mortonDecode
Decodes a Morton encoding (ZCurve) into the corresponding unsigned integer tuple.
The function has two modes of operation:
- Simple mode
- Expanded mode
Simple mode
Syntax:
Arguments:
tuple_size
(Integer
): Integer value no more than 8.code
(UInt64
): UInt64 code.
Returns: A tuple of the specified size containing UInt64
values.
Example:
Result:
Expanded mode
Syntax:
Arguments:
range_mask
(Tuple
): A tuple specifying bit shifts for each dimension.code
(UInt64
): UInt64 code.
Returns: A tuple containing UInt64
values.
Example:
Result:
This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and country codes (0…FF).
For a single argument without a tuple, the function returns the argument itself as a single-element tuple.
Result:
The function can accept a column of codes as the second argument:
Result:
This function is particularly useful for decoding multi-dimensional data that has been encoded into a single dimension using the Morton curve, often used in geospatial applications or for optimizing multi-dimensional database queries.
hilbertEncode
Calculates the Hilbert curve index for a list of unsigned integers.
The function has two modes of operation:
- Simple mode
- Expanded mode
Simple mode
Syntax:
Arguments:
arg1
,arg2
(UInt
): Up to 2 unsigned integers or columns of unsigned integer type.
Returns: A UInt64
code representing the Hilbert curve index.
Example:
Result:
Expanded mode
Syntax:
Arguments:
range_mask
(Tuple
): A tuple specifying bit shifts for each dimension.arg1
,arg2
(UInt
): Up to 2 unsigned integers or columns of unsigned integer type.
Returns: A UInt64
code representing the Hilbert curve index.
Example:
Result:
This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and country codes (0…FF).
For a single argument without a tuple, the function returns the argument itself as the Hilbert index:
Result:
The function can accept columns as arguments:
Result:
This function is particularly useful for encoding multi-dimensional data into a single dimension while preserving locality, often used in geospatial applications or for optimizing multi-dimensional database queries.
hilbertDecode
Decodes a Hilbert curve index back into a tuple of unsigned integers, representing coordinates in multi-dimensional space.
The function has two modes of operation:
- Simple mode
- Expanded mode
Simple mode
Syntax:
Arguments:
tuple_size
(Integer
): Integer value no more than 2.code
(UInt64
): UInt64 code.
Returns: A tuple of the specified size containing UInt64
values.
Example:
Result:
Expanded mode
Syntax:
Arguments:
range_mask
(Tuple
): A tuple specifying bit shifts for each dimension.code
(UInt64
): UInt64 code.
Returns: A tuple containing UInt64
values.
Example:
Result:
This mode is useful when dealing with arguments of different ranges or cardinality, such as IP addresses (0…FFFFFFFF) and country codes (0…FF).
For a single argument without a tuple, the function returns the argument itself as a single-element tuple.
Result:
The function can accept a column of codes as the second argument:
Result:
This function is particularly useful for decoding multi-dimensional data that has been encoded into a single dimension using the Hilbert curve, often used in geospatial applications or for optimizing multi-dimensional database queries.
Was this page helpful?