Combinators
Modify aggregate function behavior using suffixes.
Aggregate function combinators are powerful tools that modify the behavior of aggregate functions.
These combinators are applied as suffixes to aggregate function names, allowing you to customize how the aggregation is performed without changing the core functionality of the function.
Aggregate function combinators
-If
The -If
suffix can be appended to the name of any aggregate function. This creates a conditional aggregate function that processes only the rows that meet a specified condition.
Syntax:
Arguments:
column
(any): The column to aggregate.condition
(UInt8
): A boolean expression that determines which rows to include in the aggregation.
Returns:
The result of the aggregate function, calculated only for rows where the condition is true. If the condition is never true, it returns a default value (usually zero or an empty string).
Example:
Result:
In this example:
total_tacos
counts all tacos.spicy_tacos
counts only the tacos whereis_spicy
is true (1).
Conditional aggregate functions allow you to calculate aggregates for multiple conditions simultaneously without using subqueries or JOINs, which can be more efficient.
-Array
The -Array suffix can be appended to any aggregate function. This modifies the function to work with array arguments instead of scalar values.
Syntax:
Arguments:
arr
(Array
): An array of values to aggregate.
Returns:
The result of applying the aggregate function across all elements of the input arrays.
Example:
Result:
In this example:
sumArray
calculates the total number of tacos by summing all elements in the array.uniqArray
counts the number of unique toppings across all arrays.
When using -Array with multiple argument functions, all arguments must be arrays of equal length.
The -Array suffix can be combined with other combinators, but ‘Array’ must come first. For example: uniqArrayIf(arr, cond)
.
This combinator is particularly useful when working with nested data structures or when you need to perform aggregations on array columns without using arrayJoin
.
-Map
Appends the -Map
suffix to an aggregate function, allowing it to operate on Map
type arguments. This combinator aggregates values for each key in the map separately using the specified aggregate function.
Syntax:
Arguments:
map_column
(Map
): A column ofMap
type.
Returns:
A Map
containing the results of applying the aggregate function to each key’s values.
Example:
Result:
In this example:
sumMap(sales)
calculates the total sales for each taco type within each hour.maxMap(sales)
finds the maximum sales for each taco type within each hour.
The -Map
combinator is particularly useful for aggregating data stored in Map
columns without the need to unnest the map structure.
-SimpleState
Converts an aggregate function to return a SimpleAggregateFunction type instead of an AggregateFunction type.
This is useful when working with AggregatingMergeTree tables.
Syntax:
Arguments:
x
: Parameters of the aggregate function.
Returns:
The value of the aggregate function with the SimpleAggregateFunction(…) type.
Example:
Result:
In this example:
sumSimpleState(taco_price)
returns the sum of the taco prices as a SimpleAggregateFunction.- The
toTypeName
function shows the exact type of the returned value.
This combinator is particularly useful when you need to store intermediate aggregation states in AggregatingMergeTree tables while minimizing storage overhead compared to full AggregateFunction states.
-State
Returns an intermediate state of the aggregation rather than the final result. This allows you to compute partial results of an aggregation, which can then be combined or further processed later.
- Useful when working with the
AggregatingMergeTree
table engine and functions likefinalizeAggregation
andrunningAccumulate
. - Can be combined with other combinators, but must come last (e.g.,
uniqIfState
).
Syntax:
Arguments:
x
: Arguments of the aggregate function.
Returns:
The binary representation an intermediate state of aggregation type AggregateFunction(...)
.
Example:
Result:
In this example, uniqState
returns the binary representation of the intermediate state for calculating unique taco types, which can be used for further aggregation or stored in an AggregatingMergeTree
table.
The -State
combinator is particularly useful when you need to perform multi-stage aggregations or store partial aggregation results for later processing.
-Merge
Returns the final aggregated result from an AggregateFunction(...)
type.
Syntax:
Arguments:
state
(AggregateFunction
): The intermediate aggregation state, typically of typeAggregateFunction
.
Returns:
The final aggregated result, with the same type as the original aggregate function.
Example:
Result:
In this example:
- We create a table
taco_order_states
to store intermediate states of taco order sums. - We insert two separate states for the same date.
- Using
sumMerge
, we combine these states to get the total number of tacos ordered on that date.
The -Merge
combinator is particularly useful when working with pre-aggregated data or when performing distributed aggregations across multiple nodes.
The -Merge
combinator can be applied to any aggregate function that supports intermediate states, not just sum
.
-MergeState
Merges the intermediate aggregation states in the same way as the -Merge
combinator. However, it does not return the resulting value, but an intermediate aggregation state, similar to the -State
combinator.
Syntax:
Arguments:
state
(AggregateFunction
): The intermediate state of an aggregate function.
Returns:
A binary representation of an intermediate aggregation state of type AggregateFunction
.
Example:
Result:
In this example:
avgState
is used to create intermediate states of average taco prices for each taco type.avgMergeState
then merges these states, producing a new intermediate state that represents the combined data.
This function is particularly useful when working with pre-aggregated data or when performing multi-level aggregations. It allows for efficient aggregation of partially aggregated results without fully finalizing the aggregation at each step.
The result of MergeState
is not the final aggregated value, but rather an intermediate state that can be further processed or stored. To obtain the final result, you would typically use the -Merge
combinator or the finalizeAggregation
function.
-ForEach
Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array items and returns an array of results.
Syntax:
Arguments:
array
(Array(T)
): An array of values to aggregate.
Returns:
An array containing the results of applying the aggregate function to each corresponding element across all input arrays.
Example:
Result:
In this example:
- We have three orders with counts for different taco types (beef, chicken, veggie).
sumForEach
adds up the counts for each taco type across all orders.- The result
[10, 9, 5]
represents the total number of beef, chicken, and veggie tacos ordered, respectively.
The -ForEach combinator can be used with various aggregate functions like sum
, avg
, max
, min
, etc., allowing you to perform array-wise aggregations efficiently.
-Distinct
Applies the DISTINCT
modifier to an aggregate function, causing it to only consider unique values in its calculations.
Syntax:
Arguments:
column
(any): The column or expression to aggregate.
Returns:
The result of the aggregate function, considering only distinct values.
Example:
Result:
In this example:
countDistinct
counts the number of unique taco types ordered.avgDistinct
calculates the average price considering only unique prices.
This combinator is useful when you want to perform aggregate calculations on distinct values, eliminating duplicates before aggregation. It can be applied to most aggregate functions, such as sum
, avg
, count
, etc.
Using Distinct
can significantly impact query performance, especially on large datasets, as it requires additional memory and processing to track unique values.
-OrDefault
Changes the behavior of an aggregate function when there are no input values to aggregate.
Syntax:
Arguments:
x
: Parameters of the aggregate function.
Returns:
- If there are values to aggregate: The result of the aggregate function.
- If there are no values to aggregate: The default value for the aggregate function’s return type.
Example:
Result:
In this example, if there are no ‘Ghost Pepper Supreme’ tacos sold:
avg(taco_price)
returnsnan
(Not a Number)avgOrDefault(taco_price)
returns0
(the default value for numeric types)
Combining with other combinators:
Result:
This query demonstrates using -OrDefault
with the -If
combinator. It returns the default value (0.00) because there are no spicy tacos in the subquery result.
-OrNull
Changes the behavior of an aggregate function by converting its result to the Nullable data type. If the aggregate function has no values to calculate, it returns NULL
.
The -OrNull
combinator can be used with other combinators.
Syntax:
Arguments:
x
: Aggregate function parameters.
Returns:
- The result of the aggregate function, converted to the Nullable data type.
NULL
, if there is nothing to aggregate.
Return type: Nullable(aggregate function return type)
Example:
Result:
In this example, if there are no ‘Ghost Pepper Supreme’ tacos sold, sumOrNull
returns NULL
instead of throwing an error or returning 0.
The -OrNull
combinator can also be used with other combinators:
Result:
This query returns NULL
because the condition is_spicy
is not met for any rows.
The -OrNull
combinator is particularly useful when you want to distinguish between “no data” (NULL
) and “zero result” (e.g., 0 for sum
, empty string for groupArray
) in your aggregate calculations.
-Resample
Resamples data into intervals defined by a start, end, and step, and then applies the aggregate function to each interval.
Syntax:
Arguments:
start
(numeric): The starting value of the interval for the resampling key values. It marks the beginning of the range over which the data will be aggregatedend
(numeric): The ending value of the interval. It sets the limit up to which the data will be considered for aggregationstep
(numeric): The size of each subinterval within the specified range. The aggregate function is executed independently over each of these subintervals.resampling_key
(Column
): This is the column whose values are used to separate the data into intervals. It acts as the basis for dividing the data into the specified groupsaggFunction_params
(any):aggFunction
parameters.
Returns:
An array of aggFunction
results for each subinterval.
Example:
Result:
In this example:
taco_groups
shows the tacos grouped by price range.sales_by_price_range
shows the total sales for each price range.
The -Resample
combinator allows for flexible data analysis by price intervals without the need for complex subqueries or joins.
-ArgMin
Process only the rows that have the minimum value of an additional specified expression.
Syntax:
Arguments:
arg
(any type): Argument to return.val
(any type): Value to compare.
Returns:
The value of arg
corresponding to the minimum value of val
.
Example:
When combined with the sum function, it becomes sumArgMin which sums up the values in a column for the rows that have the minimum value in another specified column.
Result:
-ArgMax
Process only the rows that have the maximum value of an additional specified expression.
Syntax:
Arguments:
arg
(any type): Argument to return.val
(any type): Value to compare.
Returns:
The value of arg
corresponding to the maximum value of val
.
Example:
When combined with the sum function, it becomes sumArgMax which sums up the values in a column for the rows that have the maximum value in another specified column.
Result:
Was this page helpful?