Array grouping functions
Manipulate and aggregate array data types.
ClickHouse function reference
array_concat_agg
Concatenates arrays from all rows into a single array.
Syntax:
Alias:
- groupArrayArray
Arguments:
array
(Array
): The input array to be concatenated.
Returns:
An array containing all elements from all input arrays.
Example:
Result:
In this example, array_concat_agg
combines all taco_toppings
arrays from different orders into a single array containing all unique toppings.
The function is case-insensitive, so ARRAY_CONCAT_AGG
and array_concat_agg
are equivalent.
This function is useful when you need to collect all elements from multiple arrays into a single array, which can be helpful for analysis or reporting purposes in taco-related data processing.
groupArrayArray
Concatenates arrays from all rows into a single array.
Syntax:
Alias:
- array_concat_agg
Arguments:
arr
(Array
): The input array to be concatenated.
Returns:
An array containing all elements from all input arrays.
Example:
Result:
In this example, groupArrayArray
concatenates all taco_toppings
arrays from different orders into a single array containing all toppings used across all orders.
The order of elements in the resulting array is not guaranteed and may depend on the order of data processing.
groupArray
Creates an array of argument values. Values can be added to the array in any (indeterminate) order.
Syntax:
Alias:
- array_agg
Arguments:
x
(any data type): The values to be aggregated into an array.max_size
(optional): Limits the size of the resulting array tomax_size
elements.
Returns:
An array containing all the values of x
.
- NULL values are automatically removed from the result.
- The order of elements in the resulting array is not guaranteed unless used with ORDER BY in specific cases.
- When
max_size
is specified, only the firstmax_size
elements are included in the result.
Example:
Result:
In this example, groupArray
collects all toppings for each taco type into an array.
Example with max_size:
Result:
Here, groupArray(2)
limits the result to a maximum of 2 toppings per taco type.
This function is useful for collecting multiple values into a single array column, which can be particularly handy for denormalization or preparing data for further array operations.
groupArrayInsertAt
Inserts a value into an array at the specified position.
Syntax:
Arguments:
x
(any data type): The value to be inserted.pos
(UInt32
): The position at which to insert the value. Array indexing starts at 0.default_x
(optional): The default value for empty positions. Must match the data type ofx
.size
(UInt32
, optional): The length of the resulting array. If specified,default_x
must also be provided.
Returns:
An array with the inserted values.
Notes:
- If multiple values are inserted at the same position in a single-threaded query, the first value is used.
- In multi-threaded queries, the result for duplicate positions is non-deterministic.
Example:
Result:
In this example, we’re constructing a taco recipe by inserting ingredients at specific positions. ‘salsa’ is used as the default value for any empty positions.
Example with size parameter:
Result:
This example creates a fixed-size array of 5 elements, filling any unused positions with ‘empty’.
When using the size
parameter, make sure to provide a default_x
value to fill any unspecified positions.
groupArrayIntersect
Returns an intersection of given arrays (all items that are present in all input arrays).
Syntax:
Arguments:
x
(Array
): An array column or expression.
Returns:
An array containing elements that are present in all input arrays. [Array
]
Example:
Result:
In this example, we find the toppings that are common across different taco orders. The result shows that ‘cheese’ and ‘salsa’ are present in all taco orders.
This function is particularly useful when you need to find common elements across multiple arrays, such as identifying ingredients used in all recipes or features present in all product variants.
groupArrayLast
Creates an array of the last max_size
values of the argument.
Syntax:
Arguments:
max_size
(UInt64
, optional): The maximum number of elements in the resulting array. If not specified, there’s no limit on array size.x
: The value to aggregate.
Returns:
An array of the last max_size
values of x
.
Example:
Result:
In this example, groupArrayLast(2)
returns an array of the last two taco IDs from the taco_orders
table.
- The order of elements in the resulting array depends on the order of data processing, which may be non-deterministic.
- For a guaranteed order, use
ORDER BY
in a subquery, but be aware this may impact performance for large datasets. groupArrayLast(1)(x)
is equivalent to[anyLast(x)]
.
Comparison with groupArray:
While groupArray
collects elements from the beginning, groupArrayLast
collects from the end:
Result:
This function is useful when you need to keep track of the most recent values in a dataset, such as the latest taco orders or the most recent customer interactions.
groupArrayMovingAvg
Calculates the moving average of an array.
Syntax:
Arguments:
numbers_for_summing
(numeric): An expression resulting in a numeric data type value.window_size
(UInt64
, optional): Size of the calculation window. If not specified, the function uses a window size equal to the number of rows in the column.
Returns:
An array of the same size and type as the input data.
- The function uses rounding towards zero. It truncates decimal places insignificant for the resulting data type.
Example:
Result:
In this example:
avg_sales
andavg_revenue
calculate the moving average using all previous values.avg_sales_window2
andavg_revenue_window2
calculate the moving average using a window size of 2.
The function is useful for analyzing trends in time series data, such as daily taco sales or revenue figures.
groupArrayMovingSum
Calculates the moving sum of input values.
Syntax:
Arguments:
numbers_for_summing
(numeric): An expression resulting in a numeric data type value.window_size
(numeric, optional): Size of the calculation window. If not specified, the function uses a window size equal to the number of rows in the column.
Returns:
An array of the same size and type as the input data, containing the moving sums.
Example:
Let’s consider a table of taco sales:
Result:
In this example:
total_sales
shows the cumulative sum of taco sales for each day.two_day_sales
shows the sum of taco sales for the current day and the previous day (2-day moving sum).
This function is useful for analyzing trends and patterns in time-series data, such as daily taco sales or customer traffic in a taco restaurant.
groupArraySample
Creates an array of sample argument values. The size of the resulting array is limited to a specified maximum, with values selected randomly.
Syntax:
Arguments:
max_size
(UInt64
): Maximum size of the resulting array.seed
(UInt64
, optional): Seed for the random number generator. Default: 123456.x
: The argument to sample (column name or expression).
Returns:
An array of randomly selected x
arguments. [Array
]
Example:
Result:
In this example, groupArraySample
selects three random taco fillings.
You can also use an expression as an argument:
Result:
This query creates an array of three randomly selected spicy taco fillings.
The function’s randomness can be controlled by specifying a seed value, which is useful for reproducibility in testing scenarios.
groupArraySorted
Returns an array with the first N items in ascending order.
Syntax:
Arguments:
N
(UInt8
): The number of elements to return.column
(Any): The value to be sorted and grouped. Can be of type Integer, String, Float, or other generic types.
Returns:
An array containing the first N items from the column, sorted in ascending order.
Example:
Get the first 5 taco names sorted alphabetically:
Result:
In this example, groupArraySorted
returns an array of the first 5 taco names, sorted in alphabetical order.
If the number of unique values in the column is less than N, the function will return all available values sorted.
groupUniqArray
Creates an array from unique argument values.
Syntax:
or
Arguments:
x
(any data type): The input values.max_size
(UInt64
, optional): The maximum number of elements in the resulting array.
Returns:
An array of unique values.
Example:
Result:
In this example, groupUniqArray
creates an array of unique taco types for each order.
- Memory consumption is similar to the
uniqExact
function. - When
max_size
is specified, the resulting array is limited to that number of elements. For example,groupUniqArray(1)(x)
is equivalent toany(x)
.
This function is useful when you need to collect unique values into an array, such as gathering distinct product types per order or unique tags per user.
groupConcat
Concatenates strings from a group into a single string, with optional delimiter and limit.
Syntax:
Arguments:
expression
(String
): The column or expression outputting strings to concatenate.delimiter
(String
, optional): The string used to separate concatenated values. Defaults to an empty string.limit
(UInt
, optional): The maximum number of elements to concatenate. Excess elements are ignored.
Returns:
A String
containing the concatenated values.
- If
delimiter
is specified withoutlimit
, it must be the first parameter. - If both
delimiter
andlimit
are specified,delimiter
must precedelimit
. - If the group has no elements or only null elements, the result is a nullable string with a null value.
Example:
Result:
In this example:
toppings
shows all toppings for each order, separated by commas.limited_toppings
shows only the first two toppings for each order.
This function is useful for creating comma-separated lists of values from grouped rows, such as listing taco toppings for each order.
largestTriangleThreeBuckets
Applies the Largest-Triangle-Three-Buckets algorithm to downsample time series data for visualization. This function is particularly useful for reducing the number of points in a large dataset while preserving its visual characteristics.
Syntax:
Alias:
- lttb
Arguments:
x
(Integer
,Float
,Decimal
,Date
,Date32
,DateTime
,DateTime64
): The x-coordinate (typically a timestamp or sequential value).y
(Integer
,Float
,Decimal
,Date
,Date32
,DateTime
,DateTime64
): The y-coordinate (typically the measured value).
Parameters:
n
(UInt64
): The number of points to return in the downsampled series.
Returns:
An Array
of Tuple
s, where each tuple contains two elements representing the x and y coordinates of a point in the downsampled series.
- The function sorts the input data by the x-coordinate before applying the algorithm.
- NaN values in the input series are ignored and excluded from the analysis.
Example:
Result:
In this example, we downsample a day’s worth of taco sales data into 3 representative points. This can be useful for creating a quick visual summary of sales trends throughout the day without plotting every single data point.
The algorithm divides the sorted series into buckets (in this case, 3) and finds the most representative point in each bucket, allowing for efficient visualization of large datasets while maintaining the overall shape of the data.
Was this page helpful?