Advanced summations & averages
Perform complex summations and weighted averages.
ClickHouse function reference
deltaSum
Calculates the sum of positive differences between consecutive rows in a sorted dataset.
Syntax:
Arguments:
value
(numeric): The input values to calculate the delta sum.
Returns:
The sum of positive differences between consecutive rows, with the same type as the input.
- The input data must be sorted for this function to work correctly.
- Negative differences are ignored in the calculation.
- For use in materialized views, consider using
deltaSumTimestamp
instead.
Example:
Result:
In this example:
- We create a sorted list of taco prices.
deltaSum
calculates the sum of positive price differences:- (3.00 - 2.50) + (3.25 - 3.00) + (3.50 - 3.25) = 0.50 + 0.25 + 0.25 = 1.25
- The negative difference (2.75 - 3.00) is ignored.
This function is useful for analyzing cumulative increases in sorted data, such as price trends or incremental changes in time series data.
deltaSumTimestamp
Calculates the sum of differences between consecutive values, ordered by a timestamp. It ignores negative differences.
Syntax:
Arguments:
value
(numeric,Date
, orDateTime
): The values to sum.timestamp
(numeric,Date
, orDateTime
): The parameter for ordering values.
Returns:
The accumulated sum of positive differences between consecutive values, ordered by the timestamp parameter. The return type matches the input type of value
.
Example:
Result:
This example calculates the total increase in taco orders for each hour on May 5, 2023. It sums up only the positive differences between consecutive taco counts, ordered by the order time.
This function is particularly useful for materialized views that store data ordered by time-bucketed timestamps (e.g., rounded to the start of a minute). It maintains the original timestamp order during merges, ensuring correct calculations even when rows have identical bucketed timestamps.
For simple delta sums across an ordered collection without timestamp considerations, use the deltaSum
function instead.
exponentialMovingAverage
Calculates the exponential moving average of values over a specified time period.
This function calculates a weighted average where older time points have less weight in the final result. It’s particularly useful for time series data where recent values are more significant than older ones.
Syntax:
Arguments:
value
(numeric): The value to be averaged.timeunit
(numeric): An index representing the time interval, not a timestamp. Can be calculated usingintDiv
ortoRelativeHourNum
.
Parameters:
x
(numeric): The half-life period. This is the time lag at which the exponential weights decay by one-half.
Returns:
An exponentially smoothed moving average of the values for the past x
time at the latest point in time. [Float64
]
Example:
Result:
In this example, we calculate the exponential moving average of taco sales with a 2-hour half-life. The result shows how the average smoothly adapts to changes in sales, giving more weight to recent data.
The timeunit
argument should be a monotonically increasing value. Using toRelativeHourNum()
or intDiv(toUInt32(sale_time), 3600)
ensures this for hourly data.
exponentialTimeDecayedAvg
Calculates the exponentially smoothed weighted moving average of values in a time series.
Syntax:
Arguments:
v
(numeric): Value.Integer
,Float
, orDecimal
.t
(numeric orDateTime
): Time.Integer
,Float
,Decimal
,DateTime
, orDateTime64
.
Parameters:
x
(numeric): Half-life period.Integer
,Float
, orDecimal
.
Returns:
An exponentially smoothed weighted moving average at the given point in time. [Float64
]
Example:
Result:
This example calculates the exponentially smoothed average of hourly taco sales. The half-life period is set to 6 hours, meaning that sales from 6 hours ago have half the weight of current sales. The sales_trend
column provides a visual representation of the smoothed sales trend.
The function is particularly useful for analyzing time series data where recent values should have more influence on the average than older values.
exponentialTimeDecayedCount
Calculates the cumulative exponential decay over a time series at a given point in time.
Syntax:
Arguments:
x
(numeric): The half-life period.t
(numeric orDateTime
): The time value.
Returns:
The cumulative exponential decay at the given point in time. [Float64
]
Example:
Result:
In this example:
- Taco orders table has 24 hours of data with an order (represented by 1) every 5 hours.
- The
exponentialTimeDecayedCount
function is applied with a half-life of 10 hours. - The
decayed_count
shows how the cumulative value decays over time, with new orders increasing the count. - The
visual_decay
column provides a bar chart representation of the decayed count.
This function is useful for analyzing time-series data where recent events should have more weight than older ones, such as in trend analysis or anomaly detection in order patterns.
exponentialTimeDecayedMax
Calculates the maximum of the exponentially smoothed moving average at the current time index compared to the previous one.
Syntax:
Arguments:
value
(numeric): The value to be smoothed.timeunit
(numeric orDateTime
orDateTime64
): The time unit associated with each value.
Parameters:
x
(numeric): The half-life period for the exponential decay.
Returns:
The maximum of the exponentially smoothed weighted moving average at the current time index and the previous one. [Float64
]
Example:
Result:
In this example:
- We calculate the exponentially smoothed maximum price of tacos over time.
- The
smoothed_price
column shows the result ofexponentialTimeDecayedMax
with a half-life of 5 time units. - The
price_trend
column visualizes the smoothed price using a bar chart.
This function is useful for tracking the maximum value of a metric that may fluctuate over time, giving more weight to recent observations while still considering historical data.
exponentialTimeDecayedSum
Calculates the sum of exponentially smoothed moving average values of a time series.
Syntax:
Arguments:
v
(numeric): The value in the time series.t
(numeric orDateTime
orDateTime64
): The timestamp of the value.
Parameters:
x
(numeric): The half-life period for the exponential decay.
Returns:
The sum of exponentially smoothed moving average values at the given point in time. [Float64
]
Example:
Result:
In this example:
- We calculate the exponentially smoothed sum of taco sales over time.
- The half-life period is set to 3600 seconds (1 hour).
- The
bar
function visualizes the trend of smoothed sales.
This function is useful for analyzing time series data with more weight given to recent values, such as recent sales trends or user activity patterns.
sumCount
Calculates the sum of numbers and counts the number of rows simultaneously. This function is primarily used by the ClickHouse query optimizer to optimize queries with multiple sum
, count
, or avg
functions.
Syntax:
Arguments:
x
(Integer
,Float
, orDecimal
): The numeric value to be summed.
Returns:
A tuple containing two elements:
- The sum of the numbers (
sum
) - The count of rows with non-NULL values (
count
)
Return type: Tuple(sum, count)
Example:
Result:
In this example:
- The sum of all taco prices is 17.49
- There are 4 rows with non-NULL prices
sumKahan
Calculates the sum of numbers using the Kahan compensated summation algorithm, which provides higher accuracy for floating-point arithmetic compared to standard summation.
Syntax:
Arguments:
x
(Integer
,Float
, orDecimal
): The values to sum.
Returns:
The sum of the input values. The return type matches the input type.
Details:
- This function is slower than the standard
sum
function but offers improved accuracy for floating-point calculations. - The compensation algorithm is only effective for
Float
types.
Example:
Result:
In this example, we compare the regular sum
and sumKahan
for taco prices ordered on a specific date. The sumKahan
function provides a more accurate result (50) compared to the regular sum
function, which shows a small floating-point error.
Use sumKahan
when dealing with floating-point numbers and high precision is required, especially when summing a large number of small values.
sumMap
Calculates the sum of array elements for each unique key in a key-value pair of arrays.
Syntax:
Alias:
- sumMappedArrays
Arguments:
keys
(Array
): An array of keys.values
(Array
): An array of values corresponding to the keys.
Alternatively, you can pass a single tuple containing two arrays: one for keys and one for values.
Returns:
A tuple containing two arrays:
- Sorted unique keys
- Sum of values for each corresponding key
Both returned arrays are of the same type as the input arrays.
- The number of elements in
keys
andvalues
must be the same for each row. - If a key is encountered multiple times, the values for that key are summed.
Example:
Let’s calculate the total number of tacos sold by type:
Result:
In this example:
- ‘beef’ tacos: 10 + 5 = 15
- ’chicken’ tacos: 20
- ’fish’ tacos: 15 + 5 = 20
The function returns a tuple with sorted unique keys and their corresponding summed values.
sumMapWithOverflow
Calculates the sum of array elements for each key in the map, handling overflow by maintaining the original data type.
Syntax:
Arguments:
keys
(Array
): Array of keys.values
(Array
): Array of values.
Alternatively, you can pass a single tuple
containing both the keys
and values
arrays.
Returns:
A tuple
containing two arrays:
- Sorted unique keys
- Sum of values for each corresponding key
The data type of the summed values remains the same as the input, allowing for overflow.
Example:
Result:
In this example, sumMapWithOverflow
calculates the total quantity for each taco type, maintaining the original UInt8
data type for quantities. This allows for potential overflow if the sum exceeds 255.
The function differs from sumMap
in that it performs summation with overflow, preserving the original data type of the values. Use this function when you want to maintain the input data type and allow for overflow in calculations.
sumWithOverflow
Computes the sum of numbers using the same data type for the result as for the input parameters. If the sum exceeds the maximum value for this data type, it is calculated with overflow.
Syntax:
Arguments:
num
(UInt*
,Int*
,Float*
, orDecimal*
): A column of numeric values.
Returns:
The sum of the values, with the same data type as the input.
Example:
Result:
In this example, we sum the daily taco sales. The sum
function automatically widens the result type to UInt64
to accommodate the larger sum. However, sumWithOverflow
keeps the original UInt16
type, resulting in an overflow when the sum exceeds 65,535 (the maximum value for UInt16
).
This function is useful when you need to maintain the original data type in calculations, even at the risk of overflow. Use with caution and only when overflow behavior is acceptable for your use case.
Was this page helpful?