Aggregate function combinators are powerful tools that modify the behavior of aggregate functions.

These combinators are applied as suffixes to aggregate function names, allowing you to customize how the aggregation is performed without changing the core functionality of the function.

Aggregate function combinators

-If

The -If suffix can be appended to the name of any aggregate function. This creates a conditional aggregate function that processes only the rows that meet a specified condition.

Syntax:

<aggregate_function>If(column, condition)

Arguments:

  • column (any): The column to aggregate.
  • condition (UInt8): A boolean expression that determines which rows to include in the aggregation.

Returns:

The result of the aggregate function, calculated only for rows where the condition is true. If the condition is never true, it returns a default value (usually zero or an empty string).

Example:

SELECT
	COUNT(taco_id) AS total_tacos,
	COUNTIf(taco_id, is_spicy = 1) AS spicy_tacos
FROM
	(
		SELECT 1 AS taco_id, 1 AS is_spicy
		UNION ALL
		SELECT 2 AS taco_id, 0 AS is_spicy
		UNION ALL
		SELECT 3 AS taco_id, 1 AS is_spicy
		UNION ALL
		SELECT 4 AS taco_id, 0 AS is_spicy
		UNION ALL
		SELECT 5 AS taco_id, 1 AS is_spicy
	) AS taco_orders;

Result:

| total_tacos | spicy_tacos |
|-------------|-------------|
| 5           | 3           |

In this example:

  • total_tacos counts all tacos.
  • spicy_tacos counts only the tacos where is_spicy is true (1).

Conditional aggregate functions allow you to calculate aggregates for multiple conditions simultaneously without using subqueries or JOINs, which can be more efficient.

-Array

The -Array suffix can be appended to any aggregate function. This modifies the function to work with array arguments instead of scalar values.

Syntax:

<aggregate_function>Array(arr)

Arguments:

  • arr (Array): An array of values to aggregate.

Returns:

The result of applying the aggregate function across all elements of the input arrays.

Example:

SELECT
    sumArray([1, 2, 3]) AS total_tacos,
    uniqArray(['salsa', 'guacamole', 'salsa', 'cheese']) AS unique_toppings;

Result:

| total_tacos | unique_toppings |
|-------------|-----------------|
| 6           | 3               |

In this example:

  • sumArray calculates the total number of tacos by summing all elements in the array.
  • uniqArray counts the number of unique toppings across all arrays.

When using -Array with multiple argument functions, all arguments must be arrays of equal length.

The -Array suffix can be combined with other combinators, but ‘Array’ must come first. For example: uniqArrayIf(arr, cond).

This combinator is particularly useful when working with nested data structures or when you need to perform aggregations on array columns without using arrayJoin.

-Map

Appends the -Map suffix to an aggregate function, allowing it to operate on Map type arguments. This combinator aggregates values for each key in the map separately using the specified aggregate function.

Syntax:

<aggFunction>Map(map_column)

Arguments:

  • map_column (Map): A column of Map type.

Returns:

A Map containing the results of applying the aggregate function to each key’s values.

Example:

SELECT
  hour,
  sumMap(types, sales) AS total_sales,
  maxMap(types, sales) AS max_sales
FROM
(
	SELECT '2023-05-01 12:00:00' AS hour, ['beef', 'chicken', 'veggie'] AS types, [5, 3, 2] AS sales
	UNION ALL
	SELECT '2023-05-01 13:00:00' AS hour, ['beef', 'chicken', 'fish'] AS types, [4, 2, 1] AS sales
	UNION ALL
	SELECT '2023-05-01 14:00:00' AS hour, ['chicken', 'veggie', 'fish'] AS types, [3, 2, 2] AS sales
) GROUP BY hour;

Result:

| hour                | total_sales                                | max_sales                                  |
|---------------------|--------------------------------------------|--------------------------------------------|
| 2023-05-01 14:00:00 | (['chicken','fish','veggie'],[3,2,2])      | (['chicken','fish','veggie'],[3,2,2])      |
| 2023-05-01 13:00:00 | (['beef','chicken','fish'],[4,2,1])        | (['beef','chicken','fish'],[4,2,1])        |
| 2023-05-01 12:00:00 | (['beef','chicken','veggie'],[5,3,2])      | (['beef','chicken','veggie'],[5,3,2])      |

In this example:

  • sumMap(sales) calculates the total sales for each taco type within each hour.
  • maxMap(sales) finds the maximum sales for each taco type within each hour.

The -Map combinator is particularly useful for aggregating data stored in Map columns without the need to unnest the map structure.

-SimpleState

Converts an aggregate function to return a SimpleAggregateFunction type instead of an AggregateFunction type.

This is useful when working with AggregatingMergeTree tables.

Syntax:

<aggFunction>SimpleState(x)

Arguments:

  • x: Parameters of the aggregate function.

Returns:

The value of the aggregate function with the SimpleAggregateFunction(…) type.

Example:

SELECT
	toTypeName(sumSimpleState(taco_price)) AS sum_price_type,
	sumSimpleState(taco_price) AS sum_price
FROM
	(
		SELECT 7.50 AS taco_price
		UNION ALL
		SELECT 8.00 AS taco_price
		UNION ALL
		SELECT 6.25 AS taco_price
	) AS taco_sales;

Result:

| sum_price_type                           | sum_price |
|------------------------------------------|-----------|
| SimpleAggregateFunction(sum, Float64)    | 21.75     |

In this example:

  • sumSimpleState(taco_price) returns the sum of the taco prices as a SimpleAggregateFunction.
  • The toTypeName function shows the exact type of the returned value.

This combinator is particularly useful when you need to store intermediate aggregation states in AggregatingMergeTree tables while minimizing storage overhead compared to full AggregateFunction states.

-State

Returns an intermediate state of the aggregation rather than the final result. This allows you to compute partial results of an aggregation, which can then be combined or further processed later.

  • Useful when working with the AggregatingMergeTree table engine and functions like finalizeAggregation and runningAccumulate.
  • Can be combined with other combinators, but must come last (e.g., uniqIfState).

Syntax:

<aggFunction>State(x)

Arguments:

  • x: Arguments of the aggregate function.

Returns:

The binary representation an intermediate state of aggregation type AggregateFunction(...).

Example:

SELECT
	uniqState(taco_type) AS unique_tacos_state
FROM
	(
		SELECT 'Carnitas' AS taco_type
		UNION ALL
		SELECT 'Al Pastor' AS taco_type
		UNION ALL
		SELECT 'Barbacoa' AS taco_type
		UNION ALL
		SELECT 'Carnitas' AS taco_type
		UNION ALL
		SELECT 'Al Pastor' AS taco_type
	) AS taco_orders;

Result:

| unique_tacos_state                |
|-----------------------------------|
| \0� �=������                    |

In this example, uniqState returns the binary representation of the intermediate state for calculating unique taco types, which can be used for further aggregation or stored in an AggregatingMergeTree table.

The -State combinator is particularly useful when you need to perform multi-stage aggregations or store partial aggregation results for later processing.

-Merge

Returns the final aggregated result from an AggregateFunction(...) type.

Syntax:

<aggFunction>Merge(state)

Arguments:

  • state (AggregateFunction): The intermediate aggregation state, typically of type AggregateFunction.

Returns:

The final aggregated result, with the same type as the original aggregate function.

Example:

SELECT
  order_date,
	sumMerge(orders) AS total_tacos
FROM (
	SELECT
		toDate('2023-05-01') AS order_date,
		sumState(toUInt32(10)) AS orders
	UNION ALL
	SELECT
		toDate('2023-05-01') AS order_date,
		sumState(toUInt32(15)) AS orders
) AS taco_order_states
GROUP BY order_date;

Result:

| order_date | total_tacos |
|------------|-------------|
| 2023-05-01 | 25          |

In this example:

  • We create a table taco_order_states to store intermediate states of taco order sums.
  • We insert two separate states for the same date.
  • Using sumMerge, we combine these states to get the total number of tacos ordered on that date.

The -Merge combinator is particularly useful when working with pre-aggregated data or when performing distributed aggregations across multiple nodes.

The -Merge combinator can be applied to any aggregate function that supports intermediate states, not just sum.

-MergeState

Merges the intermediate aggregation states in the same way as the -Merge combinator. However, it does not return the resulting value, but an intermediate aggregation state, similar to the -State combinator.

Syntax:

<aggFunction>MergeState(state)

Arguments:

  • state (AggregateFunction): The intermediate state of an aggregate function.

Returns:

A binary representation of an intermediate aggregation state of type AggregateFunction.

Example:

SELECT
	avgMergeState(state)
FROM (
	SELECT
		avgState(taco_price) AS state
	FROM (
		SELECT 'Carnitas' AS taco_type, 8.5 AS taco_price
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 9.0 AS taco_price
		UNION ALL
		SELECT 'Pescado' AS taco_type, 10.5 AS taco_price
		UNION ALL
		SELECT 'Barbacoa' AS taco_type, 10.0 AS taco_price
	) AS taco_sales
	GROUP BY
		taco_type
);

Result:

| avg_tacos_state                |
|--------------------------------|
| \0\0\0\0\0\0C@                   |

In this example:

  • avgState is used to create intermediate states of average taco prices for each taco type.
  • avgMergeState then merges these states, producing a new intermediate state that represents the combined data.

This function is particularly useful when working with pre-aggregated data or when performing multi-level aggregations. It allows for efficient aggregation of partially aggregated results without fully finalizing the aggregation at each step.

The result of MergeState is not the final aggregated value, but rather an intermediate state that can be further processed or stored. To obtain the final result, you would typically use the -Merge combinator or the finalizeAggregation function.

-ForEach

Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array items and returns an array of results.

Syntax:

<aggFunction>ForEach(array)

Arguments:

  • array (Array(T)): An array of values to aggregate.

Returns:

An array containing the results of applying the aggregate function to each corresponding element across all input arrays.

Example:

SELECT
	sumForEach(taco_counts) AS total_tacos_per_type
FROM
(
	SELECT
		[5, 3, 2] AS taco_counts -- [beef, chicken, veggie]
	UNION ALL
	SELECT
		[2, 4, 1] AS taco_counts
	UNION ALL
	SELECT
		[3, 2, 2] AS taco_counts
);

Result:

| total_tacos_per_type |
|----------------------|
| [10, 9, 5]           |

In this example:

  • We have three orders with counts for different taco types (beef, chicken, veggie).
  • sumForEach adds up the counts for each taco type across all orders.
  • The result [10, 9, 5] represents the total number of beef, chicken, and veggie tacos ordered, respectively.

The -ForEach combinator can be used with various aggregate functions like sum, avg, max, min, etc., allowing you to perform array-wise aggregations efficiently.

-Distinct

Applies the DISTINCT modifier to an aggregate function, causing it to only consider unique values in its calculations.

Syntax:

<aggregate_function>Distinct(column)

Arguments:

  • column (any): The column or expression to aggregate.

Returns:

The result of the aggregate function, considering only distinct values.

Example:

SELECT
	countDistinct(taco_type) AS unique_tacos,
	avgDistinct(price) AS avg_unique_price
FROM
	(
		SELECT 'Carnitas' AS taco_type, 8.5 AS price
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 9.0 AS price
		UNION ALL
		SELECT 'Pescado' AS taco_type, 10.5 AS price
		UNION ALL
		SELECT 'Carnitas' AS taco_type, 8.5 AS price
		UNION ALL
		SELECT 'Barbacoa' AS taco_type, 9.5 AS price
	) AS taco_orders;

Result:

| unique_tacos | avg_unique_price |
|--------------|------------------|
| 4            | 9.375            |

In this example:

  • countDistinct counts the number of unique taco types ordered.
  • avgDistinct calculates the average price considering only unique prices.

This combinator is useful when you want to perform aggregate calculations on distinct values, eliminating duplicates before aggregation. It can be applied to most aggregate functions, such as sum, avg, count, etc.

Using Distinct can significantly impact query performance, especially on large datasets, as it requires additional memory and processing to track unique values.

-OrDefault

Changes the behavior of an aggregate function when there are no input values to aggregate.

Syntax:

<aggFunction>OrDefault(x)

Arguments:

  • x: Parameters of the aggregate function.

Returns:

  • If there are values to aggregate: The result of the aggregate function.
  • If there are no values to aggregate: The default value for the aggregate function’s return type.

Example:

SELECT
	avg(taco_price) AS avg_price,
	avgOrDefault(taco_price) AS avg_price_default
FROM
	(
		SELECT 'Carnitas' AS taco_type, 8.5 AS taco_price
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 9.0 AS taco_price
		UNION ALL
		SELECT 'Pescado' AS taco_type, 10.5 AS taco_price
		UNION ALL
		SELECT 'Barbacoa' AS taco_type, 9.5 AS taco_price
	) AS taco_sales
WHERE
	taco_type = 'Ghost Pepper Supreme';

Result:

| avg_price | avg_price_default |
|-----------|-------------------|
| nan       | 0                 |

In this example, if there are no ‘Ghost Pepper Supreme’ tacos sold:

  • avg(taco_price) returns nan (Not a Number)
  • avgOrDefault(taco_price) returns 0 (the default value for numeric types)

Combining with other combinators:

SELECT
	avgOrDefaultIf(price, is_spicy)
FROM
	(SELECT toDecimal32(4.99, 2) AS price, false AS is_spicy);

Result:

| avgOrDefaultIf(price, is_spicy) |
|---------------------------------|
| 0.00                            |

This query demonstrates using -OrDefault with the -If combinator. It returns the default value (0.00) because there are no spicy tacos in the subquery result.

-OrNull

Changes the behavior of an aggregate function by converting its result to the Nullable data type. If the aggregate function has no values to calculate, it returns NULL.

The -OrNull combinator can be used with other combinators.

Syntax:

<aggFunction>OrNull(x)

Arguments:

  • x: Aggregate function parameters.

Returns:

  • The result of the aggregate function, converted to the Nullable data type.
  • NULL, if there is nothing to aggregate.

Return type: Nullable(aggregate function return type)

Example:

SELECT
	sumOrNull(taco_price) AS total_price,
	toTypeName(sumOrNull(taco_price)) AS type
FROM
	(
		SELECT
			'Carnitas' AS taco_type,
			8.5 AS taco_price
		UNION ALL
		SELECT
			'Al Pastor' AS taco_type,
			9.0 AS taco_price
		UNION ALL
		SELECT
			'Pescado' AS taco_type,
			10.5 AS taco_price
	) AS taco_sales
WHERE
	taco_type = 'Ghost Pepper Supreme';

Result:

| total_price | type              |
|-------------|-------------------|
| NULL        | Nullable(Float64) |

In this example, if there are no ‘Ghost Pepper Supreme’ tacos sold, sumOrNull returns NULL instead of throwing an error or returning 0.

The -OrNull combinator can also be used with other combinators:

SELECT
	avgOrNullIf(price, is_spicy) AS avg_spicy_price
FROM
	(
		SELECT
			toDecimal32(4.99, 2) AS price,
			toUInt8(0) AS is_spicy
	)

Result:

| avg_spicy_price |
|-----------------|
| NULL            |

This query returns NULL because the condition is_spicy is not met for any rows.

The -OrNull combinator is particularly useful when you want to distinguish between “no data” (NULL) and “zero result” (e.g., 0 for sum, empty string for groupArray) in your aggregate calculations.

-Resample

Resamples data into intervals defined by a start, end, and step, and then applies the aggregate function to each interval.

Syntax:

<aggFunction>Resample(start, end, step, resampling_key, aggFunction_params)

Arguments:

  • start (numeric): The starting value of the interval for the resampling key values. It marks the beginning of the range over which the data will be aggregated
  • end (numeric): The ending value of the interval. It sets the limit up to which the data will be considered for aggregation
  • step (numeric): The size of each subinterval within the specified range. The aggregate function is executed independently over each of these subintervals.
  • resampling_key (Column): This is the column whose values are used to separate the data into intervals. It acts as the basis for dividing the data into the specified groups
  • aggFunction_params (any): aggFunction parameters.

Returns:

An array of aggFunction results for each subinterval.

Example:

SELECT
    groupArrayResample(85, 125, 20)(name, toUInt32(price * 10)) AS taco_groups,
    sumResample(85, 125, 20)(sales, toUInt32(price * 10)) AS sales_by_price_range
FROM (
    SELECT 'Carnitas' AS name, 8.5 AS price, 10 AS sales
    UNION ALL
    SELECT 'Al Pastor' AS name, 9.0 AS price, 15 AS sales
    UNION ALL
    SELECT 'Barbacoa' AS name, 10.5 AS price, 8 AS sales
    UNION ALL
    SELECT 'Pescado' AS name, 11.5 AS price, 11 AS sales
    UNION ALL
    SELECT 'Camarones' AS name, 12.0 AS price, 9 AS sales
    UNION ALL
    SELECT 'Lengua' AS name, 11.0 AS price, 16 AS sales
) AS taco_sales;

Result:

| taco_groups                                                            | sales_by_price_range |
|------------------------------------------------------------------------|----------------------|
| [['Carnitas','Al Pastor'],['Barbacoa','Pescado','Camarones','Lengua']] | [25,44]              |

In this example:

  • taco_groups shows the tacos grouped by price range.
  • sales_by_price_range shows the total sales for each price range.

The -Resample combinator allows for flexible data analysis by price intervals without the need for complex subqueries or joins.

-ArgMin

Process only the rows that have the minimum value of an additional specified expression.

Syntax:

<aggFunction>ArgMin(arg, val)

Arguments:

  • arg (any type): Argument to return.
  • val (any type): Value to compare.

Returns:

The value of arg corresponding to the minimum value of val.

Example:

When combined with the sum function, it becomes sumArgMin which sums up the values in a column for the rows that have the minimum value in another specified column.

SELECT
	argMin(taco_name, price) AS cheapest_taco,
	sumArgMin(sales, price) AS total_sales_of_cheapest_taco
FROM
	(
		SELECT
			'Carnitas' AS taco_name,
			8.5 AS price,
			15 AS sales
		UNION ALL
		SELECT
			'Al Pastor' AS taco_name,
			9.0 AS price,
			20 AS sales
		UNION ALL
		SELECT
			'Pescado' AS taco_name,
			10.5 AS price,
			10 AS sales
    UNION ALL
		SELECT
			'Carnitas' AS taco_name,
			8.5 AS price,
			10 AS sales
    UNION ALL
		SELECT
			'Pescado' AS taco_name,
			10.5 AS price,
			9 AS sales
	) AS taco_sales;

Result:

| cheapest_taco | total_sales_of_cheapest_taco |
|---------------|------------------------------|
| Carnitas      | 25                           |

-ArgMax

Process only the rows that have the maximum value of an additional specified expression.

Syntax:

<aggFunction>ArgMax(arg, val)

Arguments:

  • arg (any type): Argument to return.
  • val (any type): Value to compare.

Returns:

The value of arg corresponding to the maximum value of val.

Example:

When combined with the sum function, it becomes sumArgMax which sums up the values in a column for the rows that have the maximum value in another specified column.

SELECT
	argMax(taco_name, price) AS most_expensive_taco,
	sumArgMax(sales, price) AS total_sales_of_most_expensive_taco
FROM
	(
		SELECT
			'Carnitas' AS taco_name,
			8.5 AS price,
			15 AS sales
		UNION ALL
		SELECT
			'Al Pastor' AS taco_name,
			9.0 AS price,
			20 AS sales
		UNION ALL
		SELECT
			'Pescado' AS taco_name,
			10.5 AS price,
			10 AS sales
    UNION ALL
		SELECT
			'Carnitas' AS taco_name,
			8.5 AS price,
			10 AS sales
    UNION ALL
		SELECT
			'Pescado' AS taco_name,
			10.5 AS price,
			9 AS sales
	) AS taco_sales;

Result:

| most_expensive_taco | total_sales_of_most_expensive_taco |
|---------------------|------------------------------------|
| Pescado             | 19                                 |