ClickHouse function reference

quantile

Computes an approximate quantile of a numeric data sequence.

Syntax:

quantile(level)(expr)

Alias:

  • median

Arguments:

  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates the median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.

Returns:

Approximate quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example:

SELECT
	quantile(0.75)(taco_price) AS expensive_taco_price
FROM
(
    SELECT 1 AS taco_id, 2.99 AS taco_price UNION ALL
    SELECT 2 AS taco_id, 3.49 AS taco_price UNION ALL
    SELECT 3 AS taco_id, 4.99 AS taco_price UNION ALL
    SELECT 4 AS taco_id, 5.49 AS taco_price UNION ALL
    SELECT 5 AS taco_id, 6.99 AS taco_price UNION ALL
    SELECT 6 AS taco_id, 7.49 AS taco_price UNION ALL
    SELECT 7 AS taco_id, 8.99 AS taco_price UNION ALL
    SELECT 8 AS taco_id, 9.49 AS taco_price UNION ALL
    SELECT 9 AS taco_id, 10.99 AS taco_price UNION ALL
    SELECT 10 AS taco_id, 11.49 AS taco_price
) AS taco_menu;

Result:

| expensive_taco_price |
|----------------------|
| 9.365                |

In this example, we calculate the 75th percentile of taco prices, which represents the price point below which 75% of the tacos on the menu are priced.

<Note>
- This function applies reservoir sampling with a reservoir size up to 8192 and a random number generator for sampling. The result is non-deterministic.
- For an exact quantile, use the `quantileExact` function.
- When using multiple `quantile*` functions with different levels in a query, the internal states are not combined, potentially reducing query efficiency. In such cases, consider using the `quantiles` function.
- For an empty numeric sequence, `quantile` will return `NaN`.
</Note>


### quantileBFloat16

Computes an approximate quantile of a sample consisting of numeric values using the bfloat16 algorithm.

This function uses the bfloat16 (Brain Floating Point) format, which is a 16-bit floating-point format with 1 sign bit, 8 exponent bits, and 7 fraction bits. The function converts input values to 32-bit floats and takes the most significant 16 bits. It then calculates the bfloat16 quantile value and converts the result to a 64-bit float by appending zero bits.

The function is a fast quantile estimator with a relative error no more than 0.390625%.

**Syntax:**

```sql
quantileBFloat16[(level)](expr)

Alias:

  • medianBFloat16

Arguments:

  • expr (numeric): Column with numeric data. Supported types: Integer, Float.

Parameters:

  • level (Float, optional): Level of quantile. Possible values are in the range [0, 1]. Default value: 0.5.

Returns:

Approximate quantile of the specified level. Type: Float64.

Example:

SELECT
    quantileBFloat16(0.5)(price) AS median_price,
    quantileBFloat16(0.75)(price) AS q3_price,
    quantileBFloat16(0.5)(quantity) AS median_quantity
FROM (
    SELECT 2.50 AS price, 100 AS quantity UNION ALL
    SELECT 3.00 AS price, 150 AS quantity UNION ALL
    SELECT 3.50 AS price, 120 AS quantity UNION ALL
    SELECT 4.00 AS price, 80 AS quantity UNION ALL
    SELECT 4.50 AS price, 50 AS quantity UNION ALL
    SELECT 5.00 AS price, 200 AS quantity UNION ALL
    SELECT 5.50 AS price, 180 AS quantity UNION ALL
    SELECT 6.00 AS price, 160 AS quantity UNION ALL
    SELECT 6.50 AS price, 140 AS quantity UNION ALL
    SELECT 7.00 AS price, 130 AS quantity
) AS data;

Result:

| median_price | q3_price | median_quantity |
|--------------|----------|-----------------|
| 4.5          | 6        | 130             |

In this example:

  • median_price calculates the median (0.5 quantile) of taco prices.
  • q3_price calculates the third quartile (0.75 quantile) of taco prices.
  • median_quantity calculates the median quantity of tacos sold.

Note that due to the bfloat16 approximation, results may slightly differ from exact quantile calculations.

quantileDD

Computes an approximate quantile of a sample with relative-error guarantees using a Distributed Deterministic (DD) sketch.

Syntax:

quantileDD(relative_accuracy, [level])(expr)

Arguments:

  • expr (numeric): The column containing numeric data to compute the quantile from.

Parameters:

  • relative_accuracy (Float64): The relative accuracy of the quantile, between 0 and 1. Lower values increase accuracy but require more memory. Recommended value is 0.001 or higher.
  • level (Float64, optional): The quantile level to compute, between 0 and 1. Default: 0.5 (median).

Returns:

Approximate quantile of the specified level. [Float64]

Example:

SELECT
  quantileDD(0.01, 0.75)(taco_price) AS taco_price_75th_percentile,
  quantileDD(0.01, 0.5)(num_tacos_ordered) AS median_tacos_ordered
FROM
(
  SELECT 2.99 AS taco_price, 4 AS num_tacos_ordered UNION ALL
  SELECT 3.49 AS taco_price, 5 AS num_tacos_ordered UNION ALL
  SELECT 4.99 AS taco_price, 3 AS num_tacos_ordered UNION ALL
  SELECT 5.49 AS taco_price, 6 AS num_tacos_ordered UNION ALL
  SELECT 6.99 AS taco_price, 2 AS num_tacos_ordered UNION ALL
  SELECT 7.49 AS taco_price, 7 AS num_tacos_ordered UNION ALL
  SELECT 8.99 AS taco_price, 1 AS num_tacos_ordered UNION ALL
  SELECT 9.49 AS taco_price, 8 AS num_tacos_ordered UNION ALL
  SELECT 10.99 AS taco_price, 9 AS num_tacos_ordered UNION ALL
  SELECT 11.49 AS taco_price, 10 AS num_tacos_ordered
) AS taco_orders;

Result:

| taco_price_75th_percentile | median_tacos_ordered |
|----------------------------|----------------------|
| 8.935418643763574          | 5.002829575110705    |

In this example:

  • taco_price_75th_percentile calculates the 75th percentile of taco prices with 1% relative accuracy.
  • median_tacos_ordered calculates the median number of tacos ordered with 1% relative accuracy.

The memory usage of the sketch depends on the data range and relative accuracy. It’s approximately log(max_value/min_value)/relative_accuracy. Lower relative_accuracy values or wider data ranges increase memory usage.

This function is useful for efficiently calculating approximate quantiles on large datasets where exact calculations would be too resource-intensive.

quantileDeterministic

Computes an approximate quantile of a numeric data sequence using a deterministic algorithm.

This function uses reservoir sampling with a reservoir size up to 8192 and a deterministic sampling algorithm. The result is deterministic for the same inputs and determinator.

Syntax:

quantileDeterministic(level)(expr, determinator)

Alias:

  • medianDeterministic

Arguments:

  • level (numeric, optional): Level of quantile. Constant float from 0 to 1. Default: 0.5 (median).
  • expr (numeric, Date, or DateTime): Expression over column values resulting in numeric, Date, or DateTime types.
  • determinator (numeric): Number whose hash is used instead of a random number generator to make the sampling deterministic.

Returns:

Approximate quantile of the specified level.

  • Float64 for numeric input
  • Date for Date input
  • DateTime for DateTime input

Example:

SELECT
	quantileDeterministic(0.75)(taco_price, customer_id) AS pricey_taco
FROM
(
    SELECT 2.99 AS taco_price, 1 AS customer_id UNION ALL
    SELECT 3.49 AS taco_price, 2 AS customer_id UNION ALL
    SELECT 4.99 AS taco_price, 3 AS customer_id UNION ALL
    SELECT 5.49 AS taco_price, 4 AS customer_id UNION ALL
    SELECT 6.99 AS taco_price, 5 AS customer_id UNION ALL
    SELECT 7.49 AS taco_price, 6 AS customer_id UNION ALL
    SELECT 8.99 AS taco_price, 7 AS customer_id UNION ALL
    SELECT 9.49 AS taco_price, 8 AS customer_id UNION ALL
    SELECT 10.99 AS taco_price, 9 AS customer_id UNION ALL
    SELECT 11.49 AS taco_price, 10 AS customer_id
) AS taco_sales;

Result:

| pricey_taco |
|-------------|
| 9.365       |

This example calculates the 75th percentile of taco prices, using the customer_id as a determinator to ensure consistent results.

When using multiple quantile* functions with different levels in a query, internal states are not combined, potentially reducing query efficiency. In such cases, consider using the quantiles function instead.

quantileExact

Calculates the exact quantile of a numeric data sequence.

Syntax

quantileExact(level)(expr)

Alias:

  • medianExact

Arguments

  • level (numeric): Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date or DateTime.

Returns:

Quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Performance notes:

The function:

  • Calculates the exact quantile by fully sorting the data.
  • Consumes O(n) memory, where n is the number of values passed.
  • Is very effective for a small number of values.

When using multiple quantile* functions with different levels in a query, the internal states are not combined (less efficient). In this case, use the quantiles function instead.

Example

SELECT
	quantileExact(number)
FROM
	numbers(10)

Result:

| quantileExact(number) |
|-----------------------|
|                     5 |

In this example, we calculate the median (50th percentile) of the numbers from 0 to 9.

quantileExactLow

Computes the exact quantile of a numeric data sequence, similar to quantileExact, but with a different method for handling the median and other quantile levels.

Syntax

quantileExactLow(level)(expr)

Alias:

  • medianExactLow

Arguments

  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates the median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.

Returns:

Quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Performance notes:

  • All passed values are combined into an array, which is then fully sorted. The sorting algorithm’s complexity is O(N·log(N)), where N is the number of elements.
  • For the median (level = 0.5):
    • With an even number of elements, returns the lower of the two middle values.
    • With an odd number of elements, returns the middle value.
  • For other levels, returns the element at the index corresponding to level * size_of_array.
  • This function is similar to Python’s median_low implementation.

Example

SELECT quantileExactLow(0.1)(number) FROM numbers(10)

Result:

| quantileExactLow(0.1)(number) |
|--------------------------------|
| 1                              |
SELECT quantileExactLow(number) FROM numbers(10)

Result:

| quantileExactLow(number) |
|--------------------------|
| 4                        |

When using multiple quantile* functions with different levels in a query, the internal states are not combined, potentially reducing query efficiency. In such cases, consider using the quantiles function instead.

quantileExactHigh

Calculates the exact high quantile of a numeric data sequence.

Syntax:

quantileExactHigh(level)(expr)

Alias:

  • medianExactHigh

Arguments:

  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates the median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.

Returns:

Quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Performance notes:

  • All passed values are combined into an array, which is then fully sorted. The sorting algorithm’s complexity is O(N·log(N)), where N is the number of elements.
  • For the median (level = 0.5):
    • With an even number of elements, returns the higher of the two middle values.
    • With an odd number of elements, returns the middle value.
  • For other levels, returns the element at the index corresponding to level * size_of_array.
  • This function is similar to Python’s median_high implementation.

When using multiple quantile* functions with different levels in a query, the internal states are not combined, potentially reducing query efficiency. In such cases, consider using the quantiles function instead.

Example:

SELECT
	quantileExactHigh(0.75)(taco_price) AS expensive_taco_price
FROM
(
    SELECT number AS taco_price
    FROM numbers(10)
    ARRAY JOIN [5.99, 7.99, 8.99, 9.99, 10.99, 11.99, 12.99, 13.99, 14.99, 15.99] AS taco_prices
);

Result:

| expensive_taco_price |
|----------------------|
| 7                   |

In this example, quantileExactHigh calculates the 75th percentile of taco prices, representing the price threshold for more expensive tacos in the dataset.

quantileExactExclusive

Exactly computes the quantile of a numeric data sequence, using an exclusive method.

This function is equivalent to the PERCENTILE.EXC Excel function.

Syntax:

quantileExactExclusive(level)(expr)

Arguments:

  • level (numeric, optional): Level of quantile. Constant floating-point number from (0, 1) - bounds not included. Default value: 0.5. At level=0.5 the function calculates the median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.

Returns:

Quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

To get the exact value, all passed values are combined into an array, which is then partially sorted. Therefore, the function consumes O(n) memory, where n is the number of values passed. However, for a small number of values, the function is very effective.

When using multiple quantileExactExclusive functions with different levels in a query, the internal states are not combined (i.e., the query works less efficiently than it could). In this case, use the quantilesExactExclusive function.

Example:

SELECT
    quantileExactExclusive(0.6)(price) AS taco_price_60th_percentile
FROM (
    SELECT 5.99 AS price UNION ALL
    SELECT 4.99 UNION ALL
    SELECT 6.99 UNION ALL
    SELECT 7.99 UNION ALL
    SELECT 3.99 UNION ALL
    SELECT 8.99 UNION ALL
    SELECT 9.99 UNION ALL
    SELECT 10.99 UNION ALL
    SELECT 11.99 UNION ALL
    SELECT 12.99
);

Result:

| taco_price_60th_percentile |
|----------------------------|
| 9.59                       |

In this example, we calculate the 60th percentile of taco prices using the exclusive method. The result shows that 60% of tacos are priced below $9.59.

quantileExactInclusive

Exactly computes the quantile of a numeric data sequence.

This function is equivalent to the PERCENTILE.INC Excel function.

Syntax

quantileExactInclusive(level)(expr)

Arguments

  • level (Float64, optional): Level of quantile. Possible values: [0, 1] — bounds included. Default value: 0.5. At level=0.5 the function calculates median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.

Returns

Quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example

SELECT
    quantileExactInclusive(0.6)(taco_price)
FROM
(
    SELECT 3.99 AS taco_price UNION ALL
    SELECT 4.99 AS taco_price UNION ALL
    SELECT 5.99 AS taco_price UNION ALL
    SELECT 6.99 AS taco_price UNION ALL
    SELECT 7.99 AS taco_price UNION ALL
    SELECT 8.99 AS taco_price UNION ALL
    SELECT 9.99 AS taco_price UNION ALL
    SELECT 10.99 AS taco_price UNION ALL
    SELECT 11.99 AS taco_price UNION ALL
    SELECT 12.99 AS taco_price
) AS taco_sales;

Result:

| quantileExactInclusive(0.6)(taco_price)  |
|------------------------------------------|
| 9.39                                     |

In this example, we calculate the 60th percentile of taco prices from our sales data.

  • To get the exact value, all passed values are combined into an array, which is then partially sorted. This consumes O(n) memory, where n is the number of values passed.
  • When using multiple quantileExactInclusive functions with different levels in a query, the internal states are not combined, potentially reducing query efficiency. In such cases, consider using the quantilesExactInclusive function instead.

quantileExactWeighted

Exactly computes the quantile of a numeric data sequence, taking into account the weight of each element.

Syntax:

quantileExactWeighted(level)(expr, weight)

Alias:

  • medianExactWeighted

Arguments:

  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date or DateTime.
  • weight (numeric): Column with weights of sequence members. Weight is a number of value occurrences.

Returns:

Quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example:

SELECT
    quantileExactWeighted(0.75)(taco_price, order_quantity) AS expensive_taco
FROM
(
    SELECT 3.99 AS taco_price, 1 AS order_quantity UNION ALL
    SELECT 4.99 AS taco_price, 2 AS order_quantity UNION ALL
    SELECT 5.99 AS taco_price, 3 AS order_quantity UNION ALL
    SELECT 6.99 AS taco_price, 4 AS order_quantity UNION ALL
    SELECT 7.99 AS taco_price, 5 AS order_quantity UNION ALL
    SELECT 8.99 AS taco_price, 6 AS order_quantity UNION ALL
    SELECT 9.99 AS taco_price, 7 AS order_quantity UNION ALL
    SELECT 10.99 AS taco_price, 8 AS order_quantity UNION ALL
    SELECT 11.99 AS taco_price, 9 AS order_quantity UNION ALL
    SELECT 12.99 AS taco_price, 10 AS order_quantity
) AS taco_sales;

Result:

| expensive_taco |
|----------------|
| 11.99          |

In this example, we calculate the 75th percentile of taco prices, weighted by the order quantity. This gives us an idea of what price point represents an “expensive” taco in our sales data.

  • This function uses less memory than quantileExact when passed values are frequently repeated, as it uses a hash table in its algorithm.
  • When using multiple quantile* functions with different levels in a query, the internal states are not combined, potentially reducing query efficiency. In such cases, consider using the quantiles function instead.

quantileGK

Computes the quantile of a numeric data sequence using the Greenwald-Khanna algorithm. This function allows you to control the accuracy of the approximate quantile result.

Syntax:

quantileGK(accuracy, level)(expr)

Alias:

  • medianGK

Arguments:

  • accuracy (UInt8): Accuracy of quantile. Constant positive integer. Larger values mean less error. For example, if set to 100, the computed quantile will have an error no greater than 1% with high probability.
  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. Default value: 0.5. At level=0.5 the function calculates the median.
  • expr (numeric, Date, or DateTime): Expression over column values resulting in numeric data types, Date, or DateTime.

Returns:

Quantile of the specified level and accuracy.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example:

SELECT
	quantileGK(100, 0.25)(taco_price) AS quarter_price
FROM
(
    SELECT 1 AS taco_id, 2.99 AS taco_price UNION ALL
    SELECT 2 AS taco_id, 3.49 AS taco_price UNION ALL
    SELECT 3 AS taco_id, 4.99 AS taco_price UNION ALL
    SELECT 4 AS taco_id, 5.49 AS taco_price UNION ALL
    SELECT 5 AS taco_id, 6.99 AS taco_price UNION ALL
    SELECT 6 AS taco_id, 7.49 AS taco_price UNION ALL
    SELECT 7 AS taco_id, 8.99 AS taco_price UNION ALL
    SELECT 8 AS taco_id, 9.49 AS taco_price UNION ALL
    SELECT 9 AS taco_id, 10.99 AS taco_price UNION ALL
    SELECT 10 AS taco_id, 11.49 AS taco_price
) AS taco_sales;

Result:

| quarter_price |
|---------------|
| 4.99          |

In this example, quantileGK calculates the 25th percentile (first quartile) of taco prices with an accuracy setting of 100.

There’s a trade-off between accuracy and computational complexity. Higher accuracy requires more memory and computational resources but provides more precise results. Lower accuracy allows for faster and more memory-efficient computation but with slightly lower precision.

quantileInterpolatedWeighted

Computes the quantile of a numeric data sequence using linear interpolation, taking into account the weight of each element.

Syntax:

quantileInterpolatedWeighted(level)(expr, weight)

Alias:

  • medianInterpolatedWeighted

Arguments:

  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. Default value: 0.5. At level=0.5 the function calculates the median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.
  • weight (numeric): Column with weights of sequence members. Weight is a number of value occurrences.

Returns:

The quantile of the specified level.

  • For numeric data type input: Float64
  • For Date input: Date
  • For DateTime input: DateTime

Example:

Let’s say we have a table of taco orders with their prices and quantities:

SELECT
	quantileInterpolatedWeighted(0.75)(price, quantity) AS weighted_75th_percentile
FROM
(
    SELECT 2.50 AS price, 100 AS quantity UNION ALL
    SELECT 3.00 AS price, 50 AS quantity UNION ALL
    SELECT 3.50 AS price, 75 AS quantity UNION ALL
    SELECT 4.00 AS price, 25 AS quantity UNION ALL
    SELECT 4.50 AS price, 60 AS quantity UNION ALL
    SELECT 5.00 AS price, 80 AS quantity UNION ALL
    SELECT 5.50 AS price, 90 AS quantity UNION ALL
    SELECT 6.00 AS price, 40 AS quantity UNION ALL
    SELECT 6.50 AS price, 30 AS quantity UNION ALL
    SELECT 7.00 AS price, 20 AS quantity
) AS taco_orders;

Result:

| weighted_75th_percentile |
|--------------------------|
| 5.455882352941177        |

In this example, we calculate the 75th percentile of taco prices, weighted by the quantity ordered. This gives us an idea of the price point at which 75% of the tacos (by quantity) are sold at or below.

When using multiple quantile* functions with different levels in a query, the internal states are not combined, which may impact query efficiency. In such cases, consider using the quantiles function instead.

quantiles

Calculates multiple quantiles of a numeric data sequence in one pass.

Syntax:

quantiles(level1, level2, ...)(expr)

Arguments:

  • level1, level2, ... (numeric): Levels of quantiles to compute. Each level should be a float between 0 and 1.
  • expr (numeric): Expression over the column values. Typically a column name.

Returns:

An array of computed quantiles corresponding to the specified levels.

Example:

SELECT
	quantiles(0.25, 0.5, 0.75)(taco_price) AS taco_price_quantiles
FROM
(
    SELECT 1 AS taco_id, 2.99 AS taco_price UNION ALL
    SELECT 2 AS taco_id, 3.49 AS taco_price UNION ALL
    SELECT 3 AS taco_id, 4.99 AS taco_price UNION ALL
    SELECT 4 AS taco_id, 5.49 AS taco_price UNION ALL
    SELECT 5 AS taco_id, 6.99 AS taco_price UNION ALL
    SELECT 6 AS taco_id, 7.49 AS taco_price UNION ALL
    SELECT 7 AS taco_id, 8.99 AS taco_price UNION ALL
    SELECT 8 AS taco_id, 9.49 AS taco_price UNION ALL
    SELECT 9 AS taco_id, 10.99 AS taco_price UNION ALL
    SELECT 10 AS taco_id, 11.49 AS taco_price
) AS taco_sales;

Result:

| taco_price_quantiles |
|----------------------|
| [5.115,7.24,9.365]   |

In this example, we calculate the 25th, 50th (median), and 75th percentiles of taco prices from the taco_sales table.

This function is more efficient than calling quantile multiple times for different levels, as it computes all quantiles in a single pass over the data.

quantileTDigest

Computes an approximate quantile of a numeric data sequence using the t-digest algorithm.

Syntax:

quantileTDigest(level)(expr)

Alias:

  • medianTDigest

Arguments:

  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.

Returns:

Approximate quantile of the specified level.

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example:

SELECT
	quantileTDigest(0.75)(taco_price) AS expensive_taco_price
FROM
(
    SELECT 1 AS taco_id, 2.99 AS taco_price UNION ALL
    SELECT 2 AS taco_id, 3.49 AS taco_price UNION ALL
    SELECT 3 AS taco_id, 4.99 AS taco_price UNION ALL
    SELECT 4 AS taco_id, 5.49 AS taco_price UNION ALL
    SELECT 5 AS taco_id, 6.99 AS taco_price UNION ALL
    SELECT 6 AS taco_id, 7.49 AS taco_price UNION ALL
    SELECT 7 AS taco_id, 8.99 AS taco_price UNION ALL
    SELECT 8 AS taco_id, 9.49 AS taco_price UNION ALL
    SELECT 9 AS taco_id, 10.99 AS taco_price UNION ALL
    SELECT 10 AS taco_id, 11.49 AS taco_price
) AS taco_sales;

Result:

| expensive_taco_price |
|----------------------|
| 9.49                 |

This example calculates the 75th percentile of taco prices, which could be considered the price point for more expensive tacos.

  • Memory consumption is log(n), where n is the number of values.
  • The result depends on the order of running the query and is nondeterministic.
  • Performance is lower than quantile or quantileTiming, but it offers a better ratio of state size to precision.
  • When using multiple quantile* functions with different levels in a query, the internal states are not combined. In this case, use the quantiles function for better efficiency.

quantileTDigestWeighted

Computes an approximate quantile of a numeric data sequence using the t-digest algorithm, taking into account the weight of each sequence member. The maximum error is 1%. Memory consumption is log(n), where n is the number of values.

Syntax:

quantileTDigestWeighted(level)(expr, weight)

Alias:

  • medianTDigestWeighted

Arguments:

  • level (numeric, optional): Level of quantile. Constant floating-point number from 0 to 1. Default value: 0.5. At level=0.5 the function calculates the median.
  • expr (numeric, Date, or DateTime): Expression over the column values resulting in numeric data types, Date, or DateTime.
  • weight (numeric): Column with weights of sequence elements. Weight is the number of value occurrences.

Returns:

  • Approximate quantile of the specified level.

Return Type:

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example:

SELECT
	quantileTDigestWeighted(0.75)(taco_price, order_quantity)
FROM
	taco_sales;

This query calculates the 75th percentile of taco prices, weighted by the order quantity.

  1. The performance is lower than quantile or quantileTiming.
  2. The result depends on the order of running the query and is nondeterministic.
  3. When using multiple quantile* functions with different levels in a query, the internal states are not combined.
  4. Not recommended for tiny data sets as it can lead to significant error. Consider using quantileTDigest for small datasets.

quantileTiming

Calculates the quantile of a numeric data sequence with determined precision. This function is optimized for distributions like web page loading times or backend response times.

Syntax:

quantileTiming(level)(expr)

Alias:

  • medianTiming

Arguments:

  • level (numeric, optional): Level of quantile. Constant float from 0 to 1. Default: 0.5 (median). Recommended range: [0.01, 0.99].
  • expr (numeric): Expression over column values returning a numeric type.

Returns:

Quantile of the specified level. (Float32)

  • Negative values lead to undefined behavior.
  • Values greater than 30,000 are capped at 30,000.
  • Result is deterministic (not affected by query processing order).
  • For multiple quantile* functions with different levels, use quantiles for better efficiency.
  • Precise if total values ≤ 5670.
  • Precise if total values > 5670 and value < 1024ms.
  • Otherwise, rounded to nearest multiple of 16ms.

Example:

SELECT
    quantileTiming(0.75)(response_time) AS p95_response_time
FROM
(
    SELECT 100 AS response_time UNION ALL
    SELECT 200 AS response_time UNION ALL
    SELECT 300 AS response_time UNION ALL
    SELECT 400 AS response_time UNION ALL
    SELECT 500 AS response_time UNION ALL
    SELECT 600 AS response_time UNION ALL
    SELECT 700 AS response_time UNION ALL
    SELECT 800 AS response_time UNION ALL
    SELECT 900 AS response_time UNION ALL
    SELECT 1000 AS response_time
) AS taco_order_metrics;

Result:

| p95_response_time |
|-------------------|
| 800               |

This example calculates the 75th percentile of taco order response times.

If no values are passed (e.g., when using quantileTimingIf), the function returns NaN.

quantileTimingWeighted

Calculates the weighted quantile of a numeric data sequence with a specified level of precision. This function is optimized for distributions like web page loading times or backend response times.

Syntax:

quantileTimingWeighted(level)(expr, weight)

Alias:

  • medianTimingWeighted

Arguments:

  • level (numeric, optional): Level of quantile. Constant float from 0 to 1. Default: 0.5 (median). Recommended range: [0.01, 0.99].
  • expr (numeric): Expression over column values returning a Float* type number.
    • Negative values produce undefined behavior.
    • Values greater than 30,000 (30 seconds) are capped at 30,000.
  • weight (numeric): Column with weights of sequence elements (number of value occurrences).

Returns:

Quantile of the specified level. (Float32)

  • Results are deterministic (independent of query processing order).
  • For page loading time quantiles, this function is more effective and accurate than quantile.
  • If no values are passed (when using quantileTimingIf), it returns NaN.

Calculation is accurate if:

  • Total number of values ≤ 5670.
  • Total number of values > 5670, but page loading time < 1024ms.

Otherwise, the result is rounded to the nearest multiple of 16ms.

Example:

Calculate the weighted median response time for taco orders:

SELECT
	quantileTimingWeighted(0.5)(response_time, order_count) AS median_response_time
FROM
(
    SELECT 100 AS response_time, 5 AS order_count UNION ALL
    SELECT 200 AS response_time, 3 AS order_count UNION ALL
    SELECT 300 AS response_time, 7 AS order_count UNION ALL
    SELECT 400 AS response_time, 2 AS order_count UNION ALL
    SELECT 500 AS response_time, 6 AS order_count UNION ALL
    SELECT 600 AS response_time, 4 AS order_count UNION ALL
    SELECT 700 AS response_time, 8 AS order_count UNION ALL
    SELECT 800 AS response_time, 1 AS order_count UNION ALL
    SELECT 900 AS response_time, 9 AS order_count UNION ALL
    SELECT 1000 AS response_time, 10 AS order_count
) AS taco_order_stats;

Result:

| median_response_time |
|----------------------|
| 700                  |

This example calculates the weighted median response time for taco orders, where response_time is the time taken to prepare an order, and order_count is the number of times that particular response time occurred.

When using multiple quantile* functions with different levels in a query, internal states are not combined, reducing efficiency. In such cases, consider using the quantiles function instead.

quantilesTimingWeighted

Calculates the weighted quantiles of a numeric data sequence with a specified level of precision. This function is optimized for distributions like web page loading times or backend response times.

Syntax:

quantilesTimingWeighted(level1, level2, ...)(expr, weight)

Arguments:

  • level1, level2, ... (numeric): Levels of quantiles to compute. Optional. Constant floating-point numbers from 0 to 1. Recommended range: [0.01, 0.99].
  • expr (numeric): Expression over column values returning a numeric data type.
  • weight (numeric): Column with weights of sequence elements. Weight represents the number of value occurrences.

Returns:

An array of quantiles corresponding to the specified levels.

  • For values greater than 30,000 (representing times > 30 seconds), the function assumes 30,000.
  • Negative input values lead to undefined behavior.
  • The calculation is accurate for up to 5,670 total values or for times less than 1,024ms when exceeding 5,670 values.
  • For larger datasets, results are rounded to the nearest multiple of 16ms.

Example:

SELECT
	quantilesTimingWeighted(0.5, 0.95, 0.99)
		(response_time, order_count) AS taco_response_quantiles
FROM
(
    SELECT 100 AS response_time, 5 AS order_count UNION ALL
    SELECT 200 AS response_time, 3 AS order_count UNION ALL
    SELECT 300 AS response_time, 7 AS order_count UNION ALL
    SELECT 400 AS response_time, 2 AS order_count UNION ALL
    SELECT 500 AS response_time, 6 AS order_count UNION ALL
    SELECT 600 AS response_time, 4 AS order_count UNION ALL
    SELECT 700 AS response_time, 8 AS order_count UNION ALL
    SELECT 800 AS response_time, 1 AS order_count UNION ALL
    SELECT 900 AS response_time, 9 AS order_count UNION ALL
    SELECT 1000 AS response_time, 10 AS order_count
) AS taco_delivery_stats;

Result:

taco_response_quantiles
[700,1000,1000]

In this example:
- `response_time` represents the delivery time for taco orders in milliseconds.
- `order_count` is the weight, indicating how many orders had that response time.
- The result shows the median (50th percentile), 95th percentile, and 99th percentile of taco delivery response times.

This function is particularly useful for analyzing performance metrics, helping identify typical, above-average, and extreme delivery times while accounting for the frequency of each time value.