Skip to main content

When and how to choose a partition key

When and How to Choose a Partition Key in ClickHouse

In this guide, you’ll learn how to select the optimal partition key for your table in Propel’s Serverless ClickHouse.

💡TL/DR:
If your table is smaller than 10 GB, you don’t need to customize the partition key and can skip this guide. 🥳.

Understanding the partition key

ClickHouse can easily handle tables with hundreds of millions of rows; however, for tables that exceed 10 GB in storage, it’s recommended to define an optional partition key that will organize the table’s rows into a few logical groups, or partitions. ClickHouse can then operate on the smaller partitions individually, instead of the entire table, speeding up data access and manipulation.

The number of partitions should not be too many, since accessing too many partitions in a single query can hurt performance or fail the query entirely. Ideally, INSERT and SELECT queries should access one or a few partitions at a time. For this reason, partition keys should be low cardinality, resulting in dozens up to hundreds of partitions, not thousands. For example, typical partition keys will partition by month, by date, or event type.

graph TD T["Table"] --> PN1["Partition 1"] T --> PN2["Partition 2"] T -.-> PNe["…"] T --> PNn["Partition N"] PN2 --> PT1["Part 1"] PN2 --> PT2["Part 2"] PN2 -.-> PTe["…"] PN2 --> PTn["Part N"] style T fill:#E1D9FF style PN1 fill:#DADCFF style PN2 fill:#DADCFF style PNe fill:#DADCFF style PNn fill:#DADCFF style PT1 fill:#D2DEFF style PT2 fill:#D2DEFF style PTe fill:#D2DEFF style PTn fill:#D2DEFF

Diagram of a ClickHouse table. The table contains partitions, and each partition contains parts.

How to choose the right partition key for your table

In the following section, we will provide you with guidelines and considerations to keep in mind when selecting the most suitable partition key for your table.

First, let's begin with some general guidelines that apply regardless of the data size and table engine:

  • A SELECT query should not touch more than a few dozen partitions.
  • The number of partitions in a table should be in the dozens or hundreds, not thousands.
  • A single INSERT should deliver data to one or a few partitions.

Small tables (smaller than 10 GB)

Partitioning is usually unnecessary for tables smaller than 10 gigabytes. Having no partition is better than having the wrong one.

So, if your table has 5 GB of data, for example, you can leave it empty and stop reading 🥳.

Append-only data in MergeTree tables

For larger MergeTree tables (greater than 10 GB), it's crucial to select an appropriate partition key. As a general guideline, a single partition for MergeTree tables should be no more than 300 GB.

The following recommendations vary depending on the type of data: time series or incremental data.

Time series data

For time series data, you should choose a partition key based on your primary timestamp and data volume. Most of the time monthly partitioning is sufficient, and so Propel recommends this as a default, but if monthly data would exceed 300 GB, you should use weekly partitioning; etc.

Given a timestamp column, you can configure these partitioning schemes as follows:

  • Monthly: toYYYYMM(timestamp)

    Monthly partition key in ClickHouse


  • Weekly: toStartOfWeek(timestamp)

    Weekly partition key in ClickHouse


  • Daily: toDate(timestamp)

    Daily partition key in ClickHouse


  • Hourly: toStartOfHour(timestamp)

    Hourly partition key in ClickHouse


Incremental data

For sequential records with an increasing numeric column, you can divide the numeric column by a large number to assign it a partition. The size of the divisor should depend on how quickly the numeric column increases.

Given an increasing numeric column order_id, setting the partition key to intDiv(order_id, 1000000) will generate the following partitions:

  • Partition 1: order_id values 0–1,000,000
  • Partition 2: order_id values 1,000,000–2,000,000
  • etc.

Incremental data partition key in ClickHouse


Mutable data in ReplacingMergeTree tables

For larger ReplacingMergeTree tables, we recommend partitions to be a bit smaller, up to 50 GB.

For instance, if you have a large "users" dimension table (greater than 10 GB) and queries always filter on a user_id UUID column, the recommended partition key would be xxHash64(user_id) % 10. This will generate up to 10 partitions, with user_ids consistently distributed throughout.

Large scale mutable data partition key in ClickHouse


Aggregations in Summing- or Aggregating-MergeTree tables

For SummingMergeTree and AggregatingMergeTree tables, we recommend partitions to be up to 50 GB; however, because these tables are often small, due to aggregation, you rarely need to partition.

If your table aggregates by time, follow the instructions for time series data above. If your table aggregates by, for example, user_id, follow the instructions for mutable data above.

Frequently asked questions

Can you change the partition key of a table?

No, once a table is created in ClickHouse, the partition key cannot be changed directly. However, if you need to change the partition key, you have a couple of options:

  1. You can create a new table with the desired partition key and move the data from the old table to the new one using a copy job.
  2. Alternatively, you can create a materialized view with a destination Data Pool with a different partition key. This allows you to have a constantly updated new table that mirrors the original data with a different partitioning scheme.

What is a part in ClickHouse?

In ClickHouse, a part is a piece of a table that stores rows. Each part is represented as a single folder with columns. Parts are not the same as partitions; they are physical entities, while partitions are virtual.

Are parts the same as partitions?

No, parts and partitions in ClickHouse are not the same. Parts are physical entities that store rows of a table, with each part represented as a single folder with columns. On the other hand, partitions are virtual entities that don't have a physical representation. However, you can associate certain parts with the same partition based on the partition key.


Further reading