Window functions
Perform calculations over sets of rows related to the current row.
Window functions allow you to perform calculations across sets of rows that are related to the current row, without collapsing them into a single output.
Window functions are useful for a variety of analytical tasks, such as:
- Computing a running totals
- Calculating a moving averages
- Ranking customers by their sales volume
- Computing a cumulative distribution function for a set of values
General syntax
Arguments:
aggregate_function(column_name)
: The aggregate function to apply.PARTITION BY grouping_column
: Divides the result set into partitions to which the window function is applied independently.ORDER BY sorting_column
: Defines the logical order of rows within each partition.ROWS or RANGE
: Specifies the frame over which the function operates relative to the current row.WINDOW clause
: Allows you to name and reuse window definitions.
Visualizing window frames
Frame bounds
Frame bounds define the start and end of the window frame for each row. Here are common examples:
Frame bound | Explanation | Example |
---|---|---|
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | Includes all rows in the partition | SUM(value) OVER (PARTITION BY group) |
No bound expression, no ORDER BY | Same as above, includes all rows in the partition | AVG(price) OVER (PARTITION BY category) |
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | From start of partition to current row | SUM(sales) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) |
No bound expression, with ORDER BY | Same as above, from start of partition to current row | SUM(sales) OVER (ORDER BY date) |
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | From current row to end of partition | MAX(price) OVER (ORDER BY date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) |
BETWEEN 1 PRECEDING AND CURRENT ROW | Sliding frame of current and previous row | AVG(value) OVER (ORDER BY time ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) |
BETWEEN 1 PRECEDING AND 1 FOLLOWING | Sliding frame of 3 rows | MEDIAN(score) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) |
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | From current row to end of partition | LAST_VALUE(price) OVER (ORDER BY date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) |
Some functions like ROW_NUMBER()
do not respect the frame and always operate on the entire partition.
Examples
Compare to category average
Compare each taco’s price to the average for its category.
Returns:
Compare to category max
Compare each taco’s price to the maximum for its category.
Returns:
Partitioning by column
Returns:
Cumulative sum
Returns:
Moving average over 5 rows
Returns:
Moving average over 10 seconds
Returns:
Moving average over 10 days
Returns:
ClickHouse function reference
row_number
Returns a unique number for each row within its partition, starting at 1 for the first row.
Syntax:
Arguments:
None other than the OVER
clause.
Returns:
A unique integer for each row within its partition. (UInt64
)
Example:
Result:
In this example, row_number()
assigns a unique rank to each taco based on its price in descending order. Note that even though “Classic Beef” and “Chicken Fiesta” have the same price, they receive different rank numbers.
Example with PARTITION BY:
Result:
In this example, row_number()
restarts the numbering for each taco category, ranking tacos by price within their respective categories.
Unlike rank()
and dense_rank()
, row_number()
always assigns unique, sequential numbers, even for rows with identical values in the ORDER BY
column.
first_value
Returns the first value evaluated within its ordered frame. By default, NULL arguments are skipped, but this behavior can be modified.
Syntax:
Alias:
- any
Arguments:
column_name
(Any): The column from which to select the first value.
Optional Modifiers:
RESPECT NULLS
: Includes NULL values in the evaluation.IGNORE NULLS
: Skips NULL values (default behavior).
Returns:
The first value in the ordered set of rows.
Example:
Result:
In this example, first_value
identifies the most expensive taco (Supreme Taco) based on the descending order of prices.
first_value
is an alias for theany
function in ClickHouse.- When using
RESPECT NULLS
, NULL values are considered in determining the first value.
This function is particularly useful for finding the top or bottom value within a group or partition in ordered data sets.
last_value
Returns the last value evaluated within its ordered frame. By default, NULL arguments are skipped, but this behavior can be modified using the RESPECT NULLS
clause.
Syntax:
Arguments:
column_name
: The column from which to retrieve the last value.
Optional Modifiers:
RESPECT NULLS
: Includes NULL values in the evaluation.IGNORE NULLS
: Skips NULL values (default behavior).
Returns:
The last value evaluated within the specified window frame.
Example:
Result:
In this example, last_value
is used to find the cheapest taco from our taco menu. The window frame covers the entire dataset, so the last value (when ordered by price descending) is always the cheapest taco.
nth_value
Returns the nth value evaluated within its ordered frame.
Syntax:
Arguments:
x
(Any data type): The column or expression to evaluate.offset
(UInt64
): The nth row to evaluate the current row against (1-based).
Returns:
The first non-NULL value from the nth row in the ordered frame.
Example:
Result:
In this example, nth_value
is used to find the third most expensive taco from a taco menu. The function returns NULL for the first two rows because there isn’t a third row yet in the ordered frame. From the third row onward, it returns ‘Gourmet Taco’, which is the taco name in the third row when ordered by price descending.
The nth_value
function is particularly useful when you need to compare each row with a specific nth value within a sorted set, such as finding benchmark or threshold values in a dataset.
rank
Ranks rows within a partition, allowing for gaps in the ranking sequence.
It assigns a unique rank to each distinct row within a partition based on the specified ordering. If multiple rows have the same values in the ORDER BY
columns, they receive the same rank, and the next rank is skipped. This results in gaps in the ranking sequence.
For ranking without gaps, use the dense_rank()
function instead.
Syntax:
Arguments:
None other than the OVER
clause.
Returns:
A rank value for each row. [UInt64
]
Example:
Result:
In this example:
- Tacos are ranked by price in descending order.
- The “Supreme Taco” and “Deluxe Taco” have the same price, so they both receive rank 1.
- The next rank is 3, skipping 2, demonstrating the gap in ranking.
The rank()
function is often used for competitive rankings or when you need to identify distinct positions while preserving information about ties.
dense_rank
Ranks rows within a partition without gaps in the ranking sequence.
It assigns a unique rank to each distinct row within a partition based on the ORDER BY
clause. If multiple rows have the same values, they receive the same rank, and the next rank is assigned to the next distinct value without gaps.
This function differs from rank()
in that it doesn’t leave gaps in the ranking sequence when there are ties.
Syntax:
Alias:
- denseRank (case-sensitive)
Arguments:
None other than the OVER
clause.
Returns:
A ranking number for each row within its partition, without gaps. [UInt64
]
Example:
Result:
In this example:
- The most expensive tacos (Supreme and Deluxe) both receive rank 1.
- The next distinct price (Veggie Fiesta) receives rank 2, without gaps.
- Classic Beef and Chicken tacos share rank 3.
- The least expensive taco (Basic) receives rank 4.
dense_rank()
is often used in scenarios where you want to assign rankings without gaps, such as in competition results or when creating tier systems based on certain criteria.
percent_rank
Returns the relative rank (i.e., percentile) of rows within a window partition.
Syntax:
Alias:
- percentRank
Arguments:
column_name
(String
): The column to calculate the percent rank for.
Returns:
A float value between 0 and 1, representing the relative rank of each row.
- The default and required window frame definition is
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
. - The result is calculated as
(rank - 1) / (total_rows - 1)
, whererank
is the row’s rank within the partition andtotal_rows
is the number of rows in the partition.
Example:
Result:
In this example, percent_rank()
calculates the price percentile for each taco. The most expensive taco (Supreme Taco) has a percentile of 0, while the least expensive (Value Taco) has a percentile of 1. The other tacos are distributed between these extremes based on their prices.
lagInFrame
Returns a value evaluated at the row that is at a specified physical offset row before the current row within the ordered frame.
The lagInFrame
function is similar to the common lag
function found in databases like PostgreSQL or Snowflake, but it respects the window frame. This means it only considers rows within the defined frame, not the entire partition. To achieve equivalent behavior to the traditional lag
function use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
in the OVER
clause.
Syntax:
Arguments:
x
: Column or expression to evaluate.offset
((U)Int*
, optional): Number of rows to look back. Default is 1.default
(optional): Value to return if the offset exceeds the window frame boundaries. Defaults to the column’s default value.
Returns:
The value of x
from the row offset
rows before the current row within the ordered frame.
Example:
This example calculates the day-to-day change in closing price for a taco restaurant’s stock:
Result:
In this example:
close
is the closing price for each day.previous_day_close
useslagInFrame
to get the previous day’s closing price.price_change
calculates the absolute change in price.percent_change
calculates the percentage change in price.
leadInFrame
Returns a value evaluated at the row that is offset rows after the current row within the ordered frame.
The leadInFrame
function is similar to the common lead
function found in databases like PostgreSQL or Snowflake, but it respects the window frame. This means it only considers rows within the defined frame, not the entire partition. To achieve equivalent behavior to the traditional lead
function use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
in the OVER
clause.
Syntax:
Arguments:
x
: Column or expression to evaluate.offset
(UInt*
, optional): Number of rows to look ahead. Default is 1.default
(optional): Value to return if the offset exceeds the window frame. Defaults to the column’s default value.
Returns:
The value of x
evaluated at the row that is offset
rows after the current row within the ordered frame.
Example:
This example uses leadInFrame
to show the next year’s taco sales for each taco type:
Result:
In this example:
next_year_sales
shows the sales for the following year for each taco type.NULL
appears when there’s no next year in the data for that taco type.
Was this page helpful?