Example section for showcasing API endpoints
In this guide, you’ll learn how to select the optimal partition key for your table in Propel’s Serverless ClickHouse.
If your table is smaller than 10 GB, you don’t need to customize the partition key and can skip this guide. 🥳
ClickHouse can easily handle tables with hundreds of millions of rows; however, for tables that exceed 10 GB in storage, it’s recommended to define an optional partition key that will organize the table’s rows into a few logical groups, or partitions. ClickHouse can then operate on the smaller partitions individually, instead of the entire table, speeding up data access and manipulation.
The number of partitions should not be too many, since accessing too many partitions in a single query can hurt performance or fail the query entirely. Ideally, INSERT and SELECT queries should access one or a few partitions at a time. For this reason, partition keys should be low cardinality, resulting in dozens up to hundreds of partitions, not thousands. For example, typical partition keys will partition by month, by date, or event type.
Diagram of a ClickHouse table. The table contains partitions, and each partition contains parts.
In the following section, we will provide you with guidelines and considerations to keep in mind when selecting the most suitable partition key for your table.
First, let’s begin with some general guidelines that apply regardless of the data size and table engine:
Partitioning is usually unnecessary for tables smaller than 10 gigabytes. Having no partition is better than having the wrong one.
So, if your table has 5 GB of data, for example, you can leave it empty and stop reading 🥳.
For larger MergeTree tables (greater than 10 GB), it’s crucial to select an appropriate partition key. As a general guideline, a single partition for MergeTree tables should be no more than 300 GB.
The following recommendations vary depending on the type of data: time series or incremental data.
For time series data, you should choose a partition key based on your primary timestamp and data volume. Most of the time monthly partitioning is sufficient, and so Propel recommends this as a default, but if monthly data would exceed 300 GB, you should use weekly partitioning; etc.
Given a timestamp
column, you can configure these partitioning schemes as follows:
Monthly: toYYYYMM(timestamp)
Weekly: toStartOfWeek(timestamp)
Daily: toDate(timestamp)
Hourly: toStartOfHour(timestamp)
For sequential records with an increasing numeric column, you can divide the numeric column by a large number to assign it a partition. The size of the divisor should depend on how quickly the numeric column increases.
Given an increasing numeric column order_id
, setting the partition key to intDiv(order_id, 1000000)
will generate the following partitions:
order_id
values 0–1,000,000order_id
values 1,000,000–2,000,000For larger ReplacingMergeTree tables, we recommend partitions to be a bit smaller, up to 50 GB.
For instance, if you have a large “users” dimension table (greater than 10 GB) and queries always filter on a user_id
UUID column, the recommended partition key would be xxHash64(user_id) % 10
. This will generate up to 10 partitions, with user_ids
consistently distributed throughout.
For SummingMergeTree and AggregatingMergeTree tables, we recommend partitions to be up to 50 GB; however, because these tables are often small, due to aggregation, you rarely need to partition.
If your table aggregates by time, follow the instructions for time series data above. If your table aggregates by, for example, user_id
, follow the instructions for mutable data above.
Can you change the partition key of a table?
No, once a table is created in ClickHouse, the partition key cannot be changed directly. However, if you need to change the partition key, you have a couple of options:
What is a part in ClickHouse?
In ClickHouse, a part is a piece of a table that stores rows. Each part is represented as a single folder with columns. Parts are not the same as partitions; they are physical entities, while partitions are virtual.
Are parts the same as partitions?
No, parts and partitions in ClickHouse are not the same. Parts are physical entities that store rows of a table, with each part represented as a single folder with columns. On the other hand, partitions are virtual entities that don’t have a physical representation. However, you can associate certain parts with the same partition based on the partition key.
Example section for showcasing API endpoints
In this guide, you’ll learn how to select the optimal partition key for your table in Propel’s Serverless ClickHouse.
If your table is smaller than 10 GB, you don’t need to customize the partition key and can skip this guide. 🥳
ClickHouse can easily handle tables with hundreds of millions of rows; however, for tables that exceed 10 GB in storage, it’s recommended to define an optional partition key that will organize the table’s rows into a few logical groups, or partitions. ClickHouse can then operate on the smaller partitions individually, instead of the entire table, speeding up data access and manipulation.
The number of partitions should not be too many, since accessing too many partitions in a single query can hurt performance or fail the query entirely. Ideally, INSERT and SELECT queries should access one or a few partitions at a time. For this reason, partition keys should be low cardinality, resulting in dozens up to hundreds of partitions, not thousands. For example, typical partition keys will partition by month, by date, or event type.
Diagram of a ClickHouse table. The table contains partitions, and each partition contains parts.
In the following section, we will provide you with guidelines and considerations to keep in mind when selecting the most suitable partition key for your table.
First, let’s begin with some general guidelines that apply regardless of the data size and table engine:
Partitioning is usually unnecessary for tables smaller than 10 gigabytes. Having no partition is better than having the wrong one.
So, if your table has 5 GB of data, for example, you can leave it empty and stop reading 🥳.
For larger MergeTree tables (greater than 10 GB), it’s crucial to select an appropriate partition key. As a general guideline, a single partition for MergeTree tables should be no more than 300 GB.
The following recommendations vary depending on the type of data: time series or incremental data.
For time series data, you should choose a partition key based on your primary timestamp and data volume. Most of the time monthly partitioning is sufficient, and so Propel recommends this as a default, but if monthly data would exceed 300 GB, you should use weekly partitioning; etc.
Given a timestamp
column, you can configure these partitioning schemes as follows:
Monthly: toYYYYMM(timestamp)
Weekly: toStartOfWeek(timestamp)
Daily: toDate(timestamp)
Hourly: toStartOfHour(timestamp)
For sequential records with an increasing numeric column, you can divide the numeric column by a large number to assign it a partition. The size of the divisor should depend on how quickly the numeric column increases.
Given an increasing numeric column order_id
, setting the partition key to intDiv(order_id, 1000000)
will generate the following partitions:
order_id
values 0–1,000,000order_id
values 1,000,000–2,000,000For larger ReplacingMergeTree tables, we recommend partitions to be a bit smaller, up to 50 GB.
For instance, if you have a large “users” dimension table (greater than 10 GB) and queries always filter on a user_id
UUID column, the recommended partition key would be xxHash64(user_id) % 10
. This will generate up to 10 partitions, with user_ids
consistently distributed throughout.
For SummingMergeTree and AggregatingMergeTree tables, we recommend partitions to be up to 50 GB; however, because these tables are often small, due to aggregation, you rarely need to partition.
If your table aggregates by time, follow the instructions for time series data above. If your table aggregates by, for example, user_id
, follow the instructions for mutable data above.
Can you change the partition key of a table?
No, once a table is created in ClickHouse, the partition key cannot be changed directly. However, if you need to change the partition key, you have a couple of options:
What is a part in ClickHouse?
In ClickHouse, a part is a piece of a table that stores rows. Each part is represented as a single folder with columns. Parts are not the same as partitions; they are physical entities, while partitions are virtual.
Are parts the same as partitions?
No, parts and partitions in ClickHouse are not the same. Parts are physical entities that store rows of a table, with each part represented as a single folder with columns. On the other hand, partitions are virtual entities that don’t have a physical representation. However, you can associate certain parts with the same partition based on the partition key.