Basic aggregations
Perform common aggregations like count, min, max, sum, and average.
ClickHouse function reference
any
Selects the first encountered non-NULL value from a column.
Syntax:
Aliases:
any_value
first_value
Arguments:
column
: The column name.
Modifiers:
RESPECT NULLS
: Includes NULL values in the selection.
Returns:
The first non-NULL value encountered in the column. The return type is the same as the input type, except for LowCardinality
which is discarded.
- The
RESPECT NULLS
modifier can be used after the function name to include NULL values in the selection. - If no rows are input, it returns the default value for the column type (0 for integers, NULL for Nullable columns).
- The order of execution is not guaranteed, so results may be indeterminate.
Example:
Result:
In this example, any(taco_type)
returns ‘Carnitas’, the first non-NULL value encountered in the taco_type
column.
- For deterministic results, consider using
min
ormax
functions instead ofany
. - In some cases, you can rely on the order of execution, such as when
SELECT
comes from a subquery that usesORDER BY
. - When a
SELECT
query has aGROUP BY
clause or at least one aggregate function, all expressions inSELECT
,HAVING
, andORDER BY
clauses must be calculated from keys or aggregate functions.
anyHeavy
Selects a frequently occurring value using the heavy hitters algorithm. If there is a value that occurs more than in half the cases in each of the query’s execution threads, this value is returned. The result is typically nondeterministic.
Syntax:
Arguments:
column
(any): The column name.
Returns:
A value from the specified column that occurs frequently.
Example:
Result:
In this example, anyHeavy
selects a frequently occurring taco type from the taco_orders
table. The result shows that “Carnitas” is a popular taco choice, appearing in more than half of the orders in at least one execution thread.
The function is useful for identifying dominant values in a dataset, but it doesn’t guarantee returning the most frequent value overall. It’s particularly efficient for datasets with a clear majority value.
anyLast
Selects the last non-NULL value encountered in a column. If all values are NULL, it returns NULL. The result is indeterminate and may vary between query executions.
Syntax:
Arguments:
column
(any data type): The column to analyze.
Returns:
The last non-NULL value in the column, or NULL if all values are NULL.
Modifier:
RESPECT NULLS
: When specified, the function will return the last value, even if it’s NULL.
Example:
Result:
In this example, anyLast
returns ‘Carnitas’ as the last non-NULL taco filling from the orders.
The function’s behavior is non-deterministic. If you need consistent results, consider using ORDER BY
and LIMIT 1
instead.
argMax
Calculates the arg
value corresponding to the maximum val
value. If multiple rows have the same maximum val
, the returned arg
is not deterministic.
Syntax:
Arguments:
arg
(any type): The argument to return.val
(any type): The value to compare for finding the maximum.
Returns:
The arg
value corresponding to the maximum val
. The return type matches the type of arg
.
- Both
arg
andval
behave as aggregate functions. - NULL values are skipped during processing.
- If all values are NULL, the function returns NULL.
Example:
Result:
In this example, argMax
returns the taco_type
corresponding to the highest price
in the taco_menu
.
argMin
Calculates the arg
value corresponding to the minimum val
value. If multiple rows have the same minimum val
, the returned arg
is not deterministic.
Syntax:
Arguments:
arg
(any type): The argument to return.val
(any type): The value to compare for finding the minimum.
Returns:
The arg
value that corresponds to the minimum val
. The return type matches the type of arg
.
- Both
arg
andval
behave as aggregate functions. - NULL values are skipped during processing.
- If all values are NULL, the function returns NULL.
Example:
Result:
In this example, argMin
returns the taco_type
corresponding to the lowest price
in the taco_menu
.
avg
Calculates the arithmetic mean of a set of values.
Syntax:
Arguments:
x
(numeric): The values to calculate the average of.
Returns:
The arithmetic mean as a Float64
.
- Returns
NaN
if the input is empty.
Example:
Result:
In this example, we calculate the average price of tacos from the taco_menu
table.
Example with empty input:
Result:
This example demonstrates that avg()
returns NaN
when given an empty input.
median
median
— Alias forquantile
.medianDeterministic
— Alias forquantileDeterministic
.medianExact
— Alias forquantileExact
.medianExactWeighted
— Alias forquantileExactWeighted
.medianTiming
— Alias forquantileTiming
.medianTimingWeighted
— Alias forquantileTimingWeighted
.medianTDigest
— Alias forquantileTDigest
.medianTDigestWeighted
— Alias forquantileTDigestWeighted
.medianBFloat16
— Alias forquantileBFloat16
.medianDD
— Alias forquantileDD
.
count
Counts the number of rows or non-NULL values.
Syntax:
Arguments:
expr
(optional): An expression of any type.
Returns:
The number of rows or non-NULL values. (UInt64
)
Details:
count()
orcount(*)
: Counts the total number of rows.count(expr)
: Counts the number of rows whereexpr
is not NULL.count(DISTINCT expr)
: Counts the number of distinct non-NULL values ofexpr
.
The implementation of COUNT(DISTINCT ...)
uses uniqExact
.
SELECT count() FROM table
is optimized using MergeTree metadata by default.
Examples:
Count total rows:
Result:
Count distinct taco types:
Result:
For improved performance on COUNT(DISTINCT expr)
queries, consider adding a GROUP BY
clause or using a projection to create an index on the target column.
first_value
Returns the first value in a group. It is an alias for any
but was introduced for compatibility with window functions.
Syntax:
Arguments:
expr
: The expression to evaluate.
Modifiers:
RESPECT NULLS
: Includes NULL values in the selection.IGNORE NULLS
: Excludes NULL values from the selection.
Returns:
The first value in the group, matching the type of expr
.
- By default,
first_value
ignores NULL values. - Without window functions, the result may be random if the source stream is not ordered.
Example:
Result:
In this example:
first_taco
andfirst_taco_ignore_null
both return ‘Carnitas’, ignoring NULL values.first_taco_with_null
returns NULL, as it’s the first value when considering NULL values.
last_value
Selects the last encountered value in a set of values. It can handle NULL values and is particularly useful with Window Functions.
Syntax:
Arguments:
expr
: The expression to evaluate.
Modifiers:
RESPECT NULLS
: Includes NULL values in the selection.IGNORE NULLS
: Excludes NULL values from the selection.
Returns:
The last value of the expression, based on the order of rows processed.
- Without Window Functions, the result may be unpredictable if the source stream is not ordered.
- By default, NULL values are ignored.
Example:
Result:
In this example:
last_taco_default
andlast_taco_ignore_null
both return ‘Al Pastor’, ignoring NULL values.last_taco_respect_null
returns NULL, as it’s the last value when considering NULL values.
When used without Window Functions, it’s important to use an ordered subquery to ensure consistent results.
max
Calculates the maximum value across a group of values.
Syntax:
Arguments:
expr
: The expression to calculate the maximum value from. Can be of any data type that supports comparison operations.
Returns:
The maximum value of the same type as the input expression.
Example:
Result:
In this example, max(price)
calculates the highest price for each type of taco on the menu.
If you need a non-aggregate function to choose the maximum of two values, use the greatest
function instead:
min
Calculates the minimum value across a group of values.
Syntax:
Arguments:
expr
: The expression to calculate the minimum for. Can be of any data type that supports comparison operations.
Returns:
The minimum value of the same type as the input expression.
Example:
Result:
In this example, min
calculates the lowest price for each type of taco on the menu.
If you need a non-aggregate function to choose the minimum of two values, use the least
function instead:
singleValueOrNull
Returns the single unique non-NULL value in a column, or NULL if there are zero or multiple distinct values.
Syntax:
Arguments:
x
(Any
, exceptMap
,Array
, orTuple
): A column of any data type, exceptMap
,Array
, orTuple
. The column can beNullable
.
Returns:
- The unique non-NULL value if there’s only one in the column.
NULL
if there are zero or multiple distinct values.
This function is often used to implement subquery operators, such as x = ALL (SELECT ...)
.
Example:
Result:
In this example, if all non-NULL taco_filling
values are “Carnitas”, the function returns “Carnitas”. If there are multiple different fillings or no fillings at all, it returns NULL.
This function ignores NULL values in the input column. Only non-NULL values are considered when determining uniqueness.
sum
Calculates the sum of numeric values.
Syntax:
Arguments:
x
(numeric): Column with numeric data type values. Supported data types:Int
,UInt
,Float
,Decimal
.
Returns:
The sum of the values.
- Type: Matches the input data type.
Example:
Result:
This example calculates the total revenue from taco sales by summing up all the taco_price
values.
The sum
function only works with numeric types. For other data types, consider using appropriate type conversions or alternative aggregation functions.
Was this page helpful?