ClickHouse function reference

deltaSum

Calculates the sum of positive differences between consecutive rows in a sorted dataset.

Syntax:

deltaSum(value)

Arguments:

  • value (numeric): The input values to calculate the delta sum.

Returns:

The sum of positive differences between consecutive rows, with the same type as the input.

Details:

  • The input data must be sorted for this function to work correctly.
  • Negative differences are ignored in the calculation.
  • For use in materialized views, consider using deltaSumTimestamp instead.

Example:

SELECT
	deltaSum(taco_price) AS price_increase
FROM
	(
		SELECT arrayJoin([2.50, 3.00, 2.75, 3.25, 3.50]) AS taco_price
	);

Result:

| price_increase |
|----------------|
| 1.25           |

In this example:

  • arrayJoin produces one row per taco price, in the order the array lists them.
  • deltaSum sums the positive differences between consecutive rows:
    • (3.00 - 2.50) + (3.25 - 2.75) + (3.50 - 3.25) = 0.50 + 0.50 + 0.25 = 1.25
  • The negative difference (2.75 - 3.00) is ignored.

This function is useful for analyzing cumulative increases in sorted data, such as price trends or incremental changes in time series data.
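
The per-row logic can be sketched in Python (an illustrative model of the semantics, not ClickHouse's implementation):

```python
def delta_sum(values):
    """Sum the positive differences between consecutive values."""
    total = 0
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            total += curr - prev
    return total

print(delta_sum([2.50, 3.00, 2.75, 3.25, 3.50]))  # 1.25
```

Negative steps contribute nothing, so a strictly decreasing series yields 0.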

deltaSumTimestamp

Calculates the sum of differences between consecutive values, ordered by a timestamp. It ignores negative differences.

Syntax:

deltaSumTimestamp(value, timestamp)

Arguments:

  • value (numeric, Date, or DateTime): The values to sum.
  • timestamp (numeric, Date, or DateTime): The parameter for ordering values.

Returns:

The accumulated sum of positive differences between consecutive values, ordered by the timestamp parameter. The return type matches the input type of value.

Example:

SELECT
	deltaSumTimestamp(taco_count, order_time) AS total_taco_increase
FROM
	taco_orders
WHERE
	date = '2023-05-05';

Result:

| total_taco_increase |
|---------------------|
| 42                  |

This example calculates the total increase in taco orders on May 5, 2023. It sums only the positive differences between consecutive taco counts, ordered by the order time.

This function is particularly useful for materialized views that store data ordered by time-bucketed timestamps (e.g., rounded to the start of a minute). It maintains the original timestamp order during merges, ensuring correct calculations even when rows have identical bucketed timestamps.

For simple delta sums across an ordered collection without timestamp considerations, use the deltaSum function instead.
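
Conceptually, the aggregation is equivalent to ordering the (timestamp, value) pairs by timestamp and then applying the deltaSum logic. A minimal Python model (illustrative only):

```python
def delta_sum_timestamp(values, timestamps):
    """Order values by timestamp, then sum positive consecutive differences."""
    ordered = [v for _, v in sorted(zip(timestamps, values))]
    return sum(curr - prev
               for prev, curr in zip(ordered, ordered[1:])
               if curr > prev)

# Rows arrive out of order; ordering by timestamp gives 7 -> 12 -> 10,
# so only the +5 step counts.
print(delta_sum_timestamp([10, 7, 12], [3, 1, 2]))  # 5
```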

exponentialMovingAverage

Calculates the exponential moving average of values over a specified time period.

This function calculates a weighted average where older time points have less weight in the final result. It’s particularly useful for time series data where recent values are more significant than older ones.

Syntax:

exponentialMovingAverage(x)(value, timeunit)

Arguments:

  • value (numeric): The value to be averaged.
  • timeunit (numeric): An index representing the time interval, not a timestamp. Can be calculated using intDiv or toRelativeHourNum.

Parameters:

  • x (numeric): The half-life period. This is the time lag at which the exponential weights decay by one-half.

Returns:

An exponentially smoothed moving average of the values for the past x time at the latest point in time. [Float64]

Example:

SELECT
	sale_time,
	sales,
	exponentialMovingAverage(2)(sales, toRelativeHourNum(sale_time)) OVER (ORDER BY sale_time) AS exp_moving_avg
FROM taco_sales
ORDER BY sale_time;

Result:

| sale_time           | sales | exp_moving_avg      |
|---------------------|-------|---------------------|
| 2023-05-01 12:00:00 | 100   | 50                  |
| 2023-05-01 13:00:00 | 120   | 85.35533905932738   |
| 2023-05-01 14:00:00 | 90    | 87.67766952966369   |
| 2023-05-01 15:00:00 | 110   | 98.83883476483184   |
| 2023-05-01 16:00:00 | 130   | 114.4194173824159   |

In this example, we calculate the exponential moving average of taco sales with a 2-hour half-life. The result shows how the average smoothly adapts to changes in sales, giving more weight to recent data.

The timeunit argument should be a monotonically increasing value. Using toRelativeHourNum() or intDiv(toUInt32(sale_time), 3600) ensures this for hourly data.
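
A common way to express a half-life EMA is to decay the running average by 2^(-Δt/half_life) before blending in each new value. The Python sketch below models that idea; it is illustrative and not ClickHouse's exact formula:

```python
def half_life_ema(values, times, half_life):
    """Blend each new value into the average with half-life decay."""
    avg, prev_t = None, None
    for v, t in zip(values, times):
        if avg is None:
            avg = v  # seed with the first observation
        else:
            w = 2 ** (-(t - prev_t) / half_life)  # weight kept by the old state
            avg = avg * w + v * (1 - w)
        prev_t = t
    return avg
```

With a half-life of 2 time units, a value observed 2 units ago keeps exactly half its weight relative to the newest one.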

exponentialTimeDecayedAvg

Calculates the exponentially smoothed weighted moving average of values in a time series.

Syntax:

exponentialTimeDecayedAvg(x)(v, t)

Arguments:

  • v (numeric): Value. Integer, Float, or Decimal.
  • t (numeric or DateTime): Time. Integer, Float, Decimal, DateTime, or DateTime64.

Parameters:

  • x (numeric): Half-life period. Integer, Float, or Decimal.

Returns:

An exponentially smoothed weighted moving average at the given point in time. [Float64]

Example:

SELECT
	taco_sales AS value,
	sale_hour AS time,
	round(exp_smooth, 3) AS smoothed_sales,
	bar(exp_smooth, 0, 50, 20) AS sales_trend
FROM (
	SELECT
		taco_sales,
		toHour(sale_time) AS sale_hour,
		exponentialTimeDecayedAvg(6)(taco_sales, toHour(sale_time))
		OVER (ORDER BY toHour(sale_time)) AS exp_smooth
	FROM taco_sales_hourly
)
ORDER BY time
LIMIT 10;

Result:

| value | time | smoothed_sales | sales_trend       |
|-------|------|----------------|-------------------|
| 10    | 0    | 10.000         | ████              |
| 15    | 1    | 12.500         | █████             |
| 8     | 2    | 10.250         | ████              |
| 12    | 3    | 11.125         | ████▍             |
| 20    | 4    | 15.563         | ██████▏           |
| 18    | 5    | 16.781         | ██████▋           |
| 25    | 6    | 20.891         | ████████▎         |
| 22    | 7    | 21.445         | ████████▌         |
| 30    | 8    | 25.723         | ██████████▎       |
| 28    | 9    | 26.861         | ██████████▋       |

This example calculates the exponentially smoothed average of hourly taco sales. The half-life period is set to 6 hours, meaning that sales from 6 hours ago have half the weight of current sales. The sales_trend column provides a visual representation of the smoothed sales trend.

The function is particularly useful for analyzing time series data where recent values should have more influence on the average than older values.
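
As a mental model, each value is weighted by how old it is relative to the latest time point, and the result is the weighted mean. This Python sketch is illustrative (base-2 half-life decay assumed), not ClickHouse's implementation:

```python
def time_decayed_avg(values, times, half_life):
    """Weighted mean where a value loses half its weight per half_life of age."""
    t_last = max(times)
    weights = [2 ** (-(t_last - t) / half_life) for t in times]
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)

# A value from one half-life ago (weight 0.5) vs. a current value (weight 1):
print(time_decayed_avg([10, 20], [0, 6], 6))  # (10*0.5 + 20*1) / 1.5
```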

exponentialTimeDecayedCount

Calculates the cumulative exponential decay over a time series at a given point in time.

Syntax:

exponentialTimeDecayedCount(x)(t)

Arguments:

  • t (numeric or DateTime): The time value.

Parameters:

  • x (numeric): The half-life period.

Returns:

The cumulative exponential decay at the given point in time. [Float64]

Example:

SELECT
	order_time,
	order_count,
	round(exponentialTimeDecayedCount(10)(toRelativeHourNum(order_time)) OVER (
		ORDER BY order_time
		ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
	), 3) AS decayed_count,
	bar(decayed_count, 0, 10, 20) AS visual_decay
FROM
	taco_orders
ORDER BY order_time;

Result:

| order_time          | order_count | decayed_count | visual_decay          |
|---------------------|-------------|---------------|-----------------------|
| 2023-05-01 12:00:00 | 1           | 1.000         | ██                    |
| 2023-05-01 13:00:00 | 0           | 1.933         | ███▊                  |
| 2023-05-01 14:00:00 | 0           | 2.777         | █████▌                |
| 2023-05-01 15:00:00 | 0           | 3.540         | ███████               |
| 2023-05-01 16:00:00 | 0           | 4.230         | ████████▍             |
| 2023-05-01 17:00:00 | 1           | 4.853         | █████████▋            |
| 2023-05-01 18:00:00 | 0           | 5.416         | ██████████▊           |
| 2023-05-01 19:00:00 | 0           | 5.924         | ███████████▊          |
| 2023-05-01 20:00:00 | 0           | 6.382         | ████████████▋         |
| 2023-05-01 21:00:00 | 0           | 6.796         | █████████████▌        |
| 2023-05-01 22:00:00 | 1           | 7.169         | ██████████████▎       |
| 2023-05-01 23:00:00 | 0           | 7.506         | ███████████████       |
| 2023-05-02 00:00:00 | 0           | 7.811         | ███████████████▌      |
| 2023-05-02 01:00:00 | 0           | 8.087         | ████████████████▏     |
| 2023-05-02 02:00:00 | 0           | 8.337         | ████████████████▋     |
| 2023-05-02 03:00:00 | 1           | 8.563         | █████████████████▏    |
| 2023-05-02 04:00:00 | 0           | 8.768         | █████████████████▌    |
| 2023-05-02 05:00:00 | 0           | 8.953         | █████████████████▉    |
| 2023-05-02 06:00:00 | 0           | 9.121         | ██████████████████▏   |
| 2023-05-02 07:00:00 | 0           | 9.273         | ██████████████████▌   |
| 2023-05-02 08:00:00 | 1           | 9.411         | ██████████████████▊   |
| 2023-05-02 09:00:00 | 0           | 9.535         | ███████████████████   |
| 2023-05-02 10:00:00 | 0           | 9.648         | ███████████████████▎  |
| 2023-05-02 11:00:00 | 0           | 9.750         | ███████████████████▌  |

In this example:

  • The taco_orders table has 24 hours of data, with an order (order_count = 1) every 5 hours.
  • The exponentialTimeDecayedCount function is applied with a half-life of 10 hours.
  • Every row contributes 1 to the running count, and older contributions decay exponentially, so decayed_count climbs by slightly less than 1 per hour. Note that order_count is not an input to the function; it is shown only for context.
  • The visual_decay column provides a bar chart representation of the decayed count.

This function is useful for analyzing time-series data where recent events should have more weight than older ones, such as in trend analysis or anomaly detection in order patterns.
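
The running value can be modeled as each row contributing 1, decayed by its age: with a 10-hour half-life, a row from one hour ago contributes 2^(-1/10) ≈ 0.933, which is where the 1.933 in the second row of the example comes from. A Python sketch (illustrative, base-2 half-life decay assumed):

```python
def time_decayed_count(times, half_life):
    """Each time point contributes 1, halved per half_life of age."""
    t_last = max(times)
    return sum(2 ** (-(t_last - t) / half_life) for t in times)

# Two rows one hour apart, half-life 10 hours:
print(round(time_decayed_count([0, 1], 10), 3))  # 1.933
```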

exponentialTimeDecayedMax

Calculates the maximum of the exponentially smoothed moving average at the current time index compared to the previous one.

Syntax:

exponentialTimeDecayedMax(x)(value, timeunit)

Arguments:

  • value (numeric): The value to be smoothed.
  • timeunit (numeric or DateTime or DateTime64): The time unit associated with each value.

Parameters:

  • x (numeric): The half-life period for the exponential decay.

Returns:

The maximum of the exponentially smoothed weighted moving average at the current time index and the previous one. [Float64]

Example:

SELECT
	order_time,
	taco_price,
	round(exp_smooth, 3) AS smoothed_price,
	bar(exp_smooth, 0, 10, 50) AS price_trend
FROM (
	SELECT
		order_time,
		taco_price,
		exponentialTimeDecayedMax(5)(taco_price, toUnixTimestamp(order_time))
		OVER (ORDER BY order_time) AS exp_smooth
	FROM taco_orders
)
ORDER BY order_time
LIMIT 10;

Result:

| order_time          | taco_price | smoothed_price | price_trend                     |
|---------------------|------------|----------------|---------------------------------|
| 2023-05-01 12:00:00 | 5.99       | 5.990          | ███████████████████████████
| 2023-05-01 12:15:00 | 6.49       | 6.490          | ████████████████████████████
| 2023-05-01 12:30:00 | 5.99       | 6.490          | ████████████████████████████
| 2023-05-01 12:45:00 | 6.99       | 6.990          | ██████████████████████████████
| 2023-05-01 13:00:00 | 5.49       | 6.990          | ██████████████████████████████
| 2023-05-01 13:15:00 | 6.99       | 6.990          | ██████████████████████████████
| 2023-05-01 13:30:00 | 7.49       | 7.490          | ███████████████████████████████
| 2023-05-01 13:45:00 | 6.99       | 7.490          | ███████████████████████████████
| 2023-05-01 14:00:00 | 7.99       | 7.990          | ████████████████████████████████
| 2023-05-01 14:15:00 | 7.49       | 7.990          | ████████████████████████████████

In this example:

  • We calculate the exponentially smoothed maximum price of tacos over time.
  • The smoothed_price column shows the result of exponentialTimeDecayedMax with a half-life of 5 time units.
  • The price_trend column visualizes the smoothed price using a bar chart.

This function is useful for tracking the maximum value of a metric that may fluctuate over time, giving more weight to recent observations while still considering historical data.
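
One plausible mental model is to decay every observed value by its age and then take the maximum. The sketch below is illustrative only, with base-2 half-life decay assumed; it is not ClickHouse's exact formula:

```python
def time_decayed_max(values, times, half_life):
    """Max over values after halving each per half_life of age."""
    t_last = max(times)
    return max(v * 2 ** (-(t_last - t) / half_life)
               for v, t in zip(values, times))

# An old spike of 10 (decayed to 5.0) ties with a current value of 5.
print(time_decayed_max([10, 5], [0, 1], 1))  # 5.0
```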

exponentialTimeDecayedSum

Calculates the sum of exponentially smoothed moving average values of a time series.

Syntax:

exponentialTimeDecayedSum(x)(v, t)

Arguments:

  • v (numeric): The value in the time series.
  • t (numeric or DateTime or DateTime64): The timestamp of the value.

Parameters:

  • x (numeric): The half-life period for the exponential decay.

Returns:

The sum of exponentially smoothed moving average values at the given point in time. [Float64]

Example:

SELECT
	taco_sales AS value,
	sale_ts AS time,
	round(exp_smooth, 3) AS smoothed_sales,
	bar(exp_smooth, 0, 10, 50) AS sales_trend
FROM (
	SELECT
		taco_sales,
		toUnixTimestamp(sale_time) AS sale_ts,
		exponentialTimeDecayedSum(3600)(taco_sales, toUnixTimestamp(sale_time))
		OVER (ORDER BY sale_time) AS exp_smooth
	FROM taco_sales_hourly
	LIMIT 10
);

Result:

| value | time       | smoothed_sales | sales_trend                         |
|-------|------------|----------------|-------------------------------------|
| 5     | 1625097600 | 5.000          | █████                               |
| 7     | 1625101200 | 8.432          | ████████▍                           |
| 3     | 1625104800 | 8.697          | ████████▋                           |
| 8     | 1625108400 | 11.485         | ███████████▍                        |
| 6     | 1625112000 | 12.792         | ████████████▊                       |
| 4     | 1625115600 | 12.903         | ████████████▉                       |
| 9     | 1625119200 | 15.679         | ███████████████▋                    |
| 2     | 1625122800 | 14.603         | ██████████████▌                     |
| 6     | 1625126400 | 15.477         | ███████████████▍                    |
| 7     | 1625130000 | 16.724         | ████████████████▋                   |

In this example:

  • We calculate the exponentially smoothed sum of taco sales over time.
  • The half-life period is set to 3600 seconds (1 hour).
  • The bar function visualizes the trend of smoothed sales.

This function is useful for analyzing time series data with more weight given to recent values, such as recent sales trends or user activity patterns.
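
The decayed sum follows the same weighting as the decayed average, but sums the decayed values instead of normalizing them (illustrative sketch, base-2 half-life decay assumed):

```python
def time_decayed_sum(values, times, half_life):
    """Sum of values, each halved per half_life of age."""
    t_last = max(times)
    return sum(v * 2 ** (-(t_last - t) / half_life)
               for v, t in zip(values, times))

# A value from one half-life ago counts half: 5*0.5 + 7 = 9.5
print(time_decayed_sum([5, 7], [0, 1], 1))  # 9.5
```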

sumCount

Calculates the sum of numbers and counts the number of rows simultaneously. This function is primarily used by the ClickHouse query optimizer to optimize queries with multiple sum, count, or avg functions.

Syntax:

sumCount(x)

Arguments:

  • x (Integer, Float, or Decimal): The numeric value to be summed.

Returns:

A tuple containing two elements:

  1. The sum of the numbers (sum)
  2. The count of rows with non-NULL values (count)

Return type: Tuple(sum, count)

Example:

SELECT
    sumCount(price) AS taco_sales_summary
FROM
    (
        SELECT 3.99 AS price
        UNION ALL
        SELECT 4.50 AS price
        UNION ALL
        SELECT 5.25 AS price
        UNION ALL
        SELECT NULL AS price
        UNION ALL
        SELECT 3.75 AS price
    ) AS taco_sales;

Result:

| taco_sales_summary |
|--------------------|
| (17.49,4)          |

In this example:

  • The sum of all taco prices is 17.49
  • There are 4 rows with non-NULL prices
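
The behavior is equivalent to a single pass that accumulates both aggregates while skipping NULLs; in Python terms (with None standing in for NULL):

```python
def sum_count(values):
    """Return (sum, count) over non-NULL values in one pass."""
    total, count = 0, 0
    for v in values:
        if v is not None:
            total += v
            count += 1
    return total, count

s, c = sum_count([3.99, 4.50, 5.25, None, 3.75])
print(round(s, 2), c)  # 17.49 4
```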

sumKahan

Calculates the sum of numbers using the Kahan compensated summation algorithm, which provides higher accuracy for floating-point arithmetic compared to standard summation.

Syntax:

sumKahan(x)

Arguments:

  • x (Integer, Float, or Decimal): The values to sum.

Returns:

The sum of the input values. The return type matches the input type.

Details:

  • This function is slower than the standard sum function but offers improved accuracy for floating-point calculations.
  • The compensation algorithm is only effective for Float types.

Example:

SELECT
	sum(taco_price) AS regular_sum,
	sumKahan(taco_price) AS kahan_sum
FROM
	taco_orders
WHERE
	order_date = '2023-05-05';

Result:

| regular_sum         | kahan_sum |
|---------------------|-----------|
| 49.9999999999999999 | 50        |

In this example, we compare the regular sum and sumKahan for taco prices ordered on a specific date. The sumKahan function provides a more accurate result (50) compared to the regular sum function, which shows a small floating-point error.

Use sumKahan when dealing with floating-point numbers and high precision is required, especially when summing a large number of small values.
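
Compensated summation is a standard algorithm; here is the classic formulation in Python for reference, showing the kind of error it corrects:

```python
def kahan_sum(values):
    """Kahan compensated summation: track lost low-order bits."""
    total = 0.0
    comp = 0.0  # running compensation
    for v in values:
        y = v - comp            # correct the next addend
        t = total + y
        comp = (t - total) - y  # what the addition just lost
        total = t
    return total

print(sum([0.1] * 10))        # 0.9999999999999999
print(kahan_sum([0.1] * 10))  # 1.0
```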

sumMap

Calculates the sum of array elements for each unique key in a key-value pair of arrays.

Syntax:

sumMap(keys, values)
sumMap(tuple(keys, values))

Alias:

  • sumMappedArrays

Arguments:

  • keys (Array): An array of keys.
  • values (Array): An array of values corresponding to the keys.

Alternatively, you can pass a single tuple containing two arrays: one for keys and one for values.

Returns:

A tuple containing two arrays:

  1. Sorted unique keys
  2. Sum of values for each corresponding key

Both returned arrays are of the same type as the input arrays.

  • The number of elements in keys and values must be the same for each row.
  • If a key is encountered multiple times, the values for that key are summed.

Example:

Let’s calculate the total number of tacos sold by type:

SELECT
	sumMap(
		['beef', 'chicken', 'fish', 'beef', 'fish'],
		[10, 20, 15, 5, 5]
	) AS taco_sales

Result:

| taco_sales                                |
|-------------------------------------------|
| (['beef','chicken','fish'],[15,20,20])    |

In this example:

  • ‘beef’ tacos: 10 + 5 = 15
  • ‘chicken’ tacos: 20
  • ‘fish’ tacos: 15 + 5 = 20

The function returns a tuple with sorted unique keys and their corresponding summed values.
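
The pairing behaves like building a dictionary of per-key totals and returning it with sorted keys; a Python model (illustrative):

```python
def sum_map(keys, values):
    """Sum values per key; return (sorted_keys, totals)."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    sorted_keys = sorted(totals)
    return sorted_keys, [totals[k] for k in sorted_keys]

print(sum_map(['beef', 'chicken', 'fish', 'beef', 'fish'],
              [10, 20, 15, 5, 5]))  # (['beef', 'chicken', 'fish'], [15, 20, 20])
```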

sumMapWithOverflow

Calculates the sum of array elements for each key in the map, handling overflow by maintaining the original data type.

Syntax:

sumMapWithOverflow(keys, values)
sumMapWithOverflow(tuple(keys, values))

Arguments:

  • keys (Array): Array of keys.
  • values (Array): Array of values.

Alternatively, you can pass a single tuple containing both the keys and values arrays.

Returns:

A tuple containing two arrays:

  1. Sorted unique keys
  2. Sum of values for each corresponding key

The data type of the summed values remains the same as the input, allowing for overflow.

Example:

SELECT
    time,
    sumMapWithOverflow(taco_type, quantity) AS taco_totals
FROM (
  SELECT '2023-05-01 12:00:00' AS time, ['Carnitas', 'Al Pastor', 'Pescado'] AS taco_type, [10, 15, 5] AS quantity UNION ALL
  SELECT '2023-05-01 13:00:00' AS time, ['Al Pastor', 'Pescado', 'Pollo'] AS taco_type, [20, 10, 15] AS quantity UNION ALL
  SELECT '2023-05-01 14:00:00' AS time, ['Carnitas', 'Pollo', 'Pescado'] AS taco_type, [5, 25, 10] AS quantity
)
GROUP BY time;

Result:

| time                | taco_totals                                       |
|---------------------|---------------------------------------------------|
| 2023-05-01 12:00:00 | (['Al Pastor','Carnitas','Pescado'],[15,10,5])    |
| 2023-05-01 13:00:00 | (['Al Pastor','Pescado','Pollo'],[20,10,15])      |
| 2023-05-01 14:00:00 | (['Carnitas','Pescado','Pollo'],[5,10,25])        |

In this example, sumMapWithOverflow calculates the total quantity for each taco type, maintaining the original UInt8 data type for quantities. This allows for potential overflow if the sum exceeds 255.

The function differs from sumMap in that it performs summation with overflow, preserving the original data type of the values. Use this function when you want to maintain the input data type and allow for overflow in calculations.
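
The overflow behavior can be modeled by wrapping each per-key total at the type's maximum; for UInt8 values, totals wrap modulo 256 (Python sketch, illustrative):

```python
def sum_map_with_overflow_u8(keys, values):
    """Like sumMap, but per-key totals wrap as UInt8 (modulo 256)."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = (totals.get(k, 0) + v) % 256
    sorted_keys = sorted(totals)
    return sorted_keys, [totals[k] for k in sorted_keys]

# 200 + 100 = 300, which wraps to 44 in UInt8.
print(sum_map_with_overflow_u8(['Al Pastor', 'Al Pastor'], [200, 100]))
```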

sumWithOverflow

Computes the sum of numbers using the same data type for the result as for the input parameters. If the sum exceeds the maximum value for this data type, it is calculated with overflow.

Syntax:

sumWithOverflow(num)

Arguments:

  • num (UInt*, Int*, Float*, or Decimal*): A column of numeric values.

Returns:

The sum of the values, with the same data type as the input.

Example:

SELECT
    sum(daily_sales) AS regular_sum,
    sumWithOverflow(daily_sales) AS overflow_sum,
    toTypeName(sum(daily_sales)) AS regular_sum_type,
    toTypeName(sumWithOverflow(daily_sales)) AS overflow_sum_type
FROM (
    SELECT 30000 AS daily_sales UNION ALL
    SELECT 25000 AS daily_sales UNION ALL
    SELECT 20000 AS daily_sales UNION ALL
    SELECT 15000 AS daily_sales
);

Result:

| regular_sum | overflow_sum | regular_sum_type | overflow_sum_type |
|-------------|--------------|------------------|-------------------|
| 90000       | 24464        | UInt64           | UInt16            |

In this example, we sum the daily taco sales. The sum function automatically widens the result type to UInt64 to accommodate the larger sum. However, sumWithOverflow keeps the original UInt16 type, resulting in an overflow when the sum exceeds 65,535 (the maximum value for UInt16).

This function is useful when you need to maintain the original data type in calculations, even at the risk of overflow. Use with caution and only when overflow behavior is acceptable for your use case.
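
The UInt16 wraparound in the example can be reproduced with modular arithmetic (Python sketch, illustrative):

```python
def sum_with_overflow_u16(values):
    """Accumulate in UInt16: wrap modulo 2**16 on each addition."""
    total = 0
    for v in values:
        total = (total + v) % 65536
    return total

sales = [30000, 25000, 20000, 15000]
print(sum(sales))                    # 90000
print(sum_with_overflow_u16(sales))  # 24464  (90000 mod 65536)
```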