Background
This section provides some background on ClickHouse table engines and the role the sorting key plays while organizing data in tables.👉 How to choose the right table engine and sorting key .
Understanding table engines
Table engines determine how ClickHouse tables store, process, read, and update their data. Table engines are organized in families. The main family (and the one that Propel supports) is the MergeTree family. Other, less common table engines are beyond the scope of this guide.Overview of the MergeTree table engines
The MergeTree is ClickHouse’s primary table engine family. MergeTree tables optimize for fast inserts and reads. They achieve this by storing the inserted data in parts and merging those parts in the background. The various MergeTree tables differ in their merging approach, with each having unique merging rules. The table types include:- MergeTree: The most common engine in the MergeTree family. It is designed for append-only, immutable data, as merging consolidates parts but never deletes records.
- ReplacingMergeTree: A variation of the MergeTree engine. When merging data parts, it replaces old rows with new ones that have the same sorting key.
- SummingMergeTree: This engine is designed for use cases where you need to calculate the total for numeric columns. When merging data parts, it sums up columns for rows with the same sorting key.
- AggregatingMergeTree: This engine is ideal for incremental data aggregation. When merging data parts, it aggregates columns for rows with the same sorting key.
- CollapsingMergeTree: This engine is designed for scenarios where you need to account for a record’s state changes. When merging data parts, it deletes (”collapses”) pairs of records with the same sorting key and opposite sign column.
- VersionedCollapsingMergeTree: This engine is an extension of the CollapsingMergeTree. It uses a different collapsing algorithm that allows inserting the data in any order.
Understanding the sorting key
The sorting key in ClickHouse is a set of one or more columns that ClickHouse uses to organize the rows within a table. The sorting key determines the order of the rows in the table, influences how parts are merged, and significantly impacts the query performance. If the rows are sorted well, ClickHouse can efficiently skip over unneeded rows and thus optimize query performance.How to choose the right table engine and sorting key
In this section, we’ll guide you through how to choose the right table engine and sorting key for your table.What type of data do you have?
The first step is to figure out what type of data you have.
- Append-only data: This type of data refers to data that is only added to and never modified or deleted. Once a record is inserted, it remains unchanged. Examples include:
- Events
- Log data
- Transaction records
- Time-series data
- Any immutable data
- Mutable records: These are records that can be updated or changed after they’ve been inserted. Examples include:
- Dimension data
- Entity data like customers, employees, users, products, etc.
- Pre-aggregated data where rows are periodically updated
- Incremental aggregations: This type of data refers to aggregations that are calculated incrementally as data is written. Examples include:
- Real-time game leaderboard
- Real-time e-commerce sales KPI
Append-only data
The recommended table engine for append-only data is the MergeTree**,** which is the most common choice in ClickHouse. It provides fast inserts and great query performance.
Identify primary timestamp
created_at
date. If yes, include it in the sorting key.Identify frequently filtered columns
customer_id
or workspace_id
columns for multi-tenant apps. You don’t need to include all columns you are going to filter on, only the filters that you are always (or most often) going to use. If you have such columns, place them to the left of the timestamp in the sorting key.Order columns by cardinality
Consider hierarchical relationships
continent
, country
, then city name
.Example: Sorting key for an events table in a multi-tenant app
The sorting key for an events table will depend on the specific schema and query access patterns. For example, imagine our application supports multiple customers. Each customer can have multiple workspaces, with events belonging to individual workspaces. Then, the schema for our events table could look like this:created_at
(timestamp)customer_id
(string)workspace_id
(string)event_type
(string)payload
(JSON)
customer_id
, workspace_id
, then created_at
.

The
customer_id
is the lowest granularity so it should go first. The workspace_id
belongs to a customer, so they have a tree-like relationship, so it should go next. Once the data is reduced to a given workspace_id
of a customer_id
, we’d want it ordered by time, adding the created_at
last.
This way, your data will be sorted like the table below:

- You can provide just the
customer_id
or both thecustomer_id
andworkspace_id
in the filters or GROUP BYs and still benefit from the sorting key as it reduces rows read. - If you just provide the
created_at
, you will NOT take advantage of the sorting key, as Propel will have to scan the full table. - With Materialized Views, you can create different versions of the same table with different sorting keys to support different query patterns.
Mutable records
The recommended table engine for mutable records is the ReplacingMergeTree. It provides fast inserts, handles updates, and provides great query performance for medium-sized tables (up to 100 million records). For tables over 100 million records, see the “Handling large updating tables” section below.
Determine uniqueness column
- If it is something like
user_id
, include it in the sorting key. - If it is a combined uniqueness key, like an
order_id
and aproduct_id
, include them all in the sorting key.

Check for version or updated_at column
version
or updated_at
column that increases with each update to the record?If yes, include it in the ReplacingMergeTree version column.ver
, is an optional column that allows you to control which
version of a row Propel considers ‘newest’. If unspecified, Propel considers
the most recently inserted row newest.Propel automatically filters out older versions of rows when querying and periodically
deletes them in the background during merges.
Consider table size
Example: Sorting key for a “Users” entity table
The sorting key for an entity table will depend on how uniqueness is defined. Consider the schema below:user_id
(string)created_at
(string)first_name
(string)last_name
(string)email
(string)
user_id
determines uniqueness. Therefore, your sorting key should be just user_id
.

Putting the
user_id
in the sorting key will enforce uniqueness during merges and allow for efficient user lookups by ID.
This way, your data will be sorted like the table below:

Handling large updating tables
If you have updatable data that is over 100 million records with mutable data that needs to be queried fast, the following questions will help you choose a sorting key and understand the tradeoffs.-
Do you need to filter or aggregate your data by time? Like, “Give me all the users created last month”? Does your data have a
created_at
timestamp column?-
If so, you can add the
created_at
to the sorting key to the left of theuser_id
from the example above. Take into account that:- The
created_at
column cannot be mutable. - The
created_at
column will now be part of the uniqueness definition.
created_at
,user_id
.
- The
-
If so, you can add the
-
Do you need to filter or aggregate your data by any other dimension? For example, “Give me all the users that signed up on mobile with an iOS device,” determined by a
signup_channel
andsignup_os
columns.-
If so, you can add the
signup_channel
andsignup_os
columns to the sorting key to the left of thecreated_at
and/oruser_id
from the examples above. Again, take into account that:- The
signup_channel
andsignup_os
columns cannot be mutable. - The
signup_channel
andsignup_os
columns will now be part of the uniqueness definition.
signup_channel
,signup_os
,created_at
,user_id
. - The
-
If so, you can add the
Incremental aggregations
The recommended table engine for incremental aggregations are the SummingMergeTree and AggregatingMergeTree, depending on the type of aggregations you need to do. For simpleSUM
and COUNT
aggregations, the SummingMergeTree will do. For other types of aggregations, like AVG
, MAX
, MIN
, PERCENTILES
, UNIQUES
, and others, we recommend you use the AggregatingMergeTree.

Identify grouping columns
Choose aggregation type
SUM
or COUNT
aggregations, which would indicate using the SummingMergeTree.Configure summarization (for SummingMergeTree)
Example: Sorting key for a “Visits” aggregated table
The sorting key for an incremental aggregation table will include the dimensions you need to group by. Consider the schema below:domain
(string)path
(string)visits
(Int)
domain
and path
.
Putting the domain
and path
in the sorting key will “roll up” the visits by two columns.

This way, your data will be sorted like the table below:

Frequently asked questions
What is the difference between the sorting key and the primary key?
What is the difference between the sorting key and the primary key?
- The sorting key is used to sort the rows stored within each table part. The sorting key also influences the merge behavior of the MergeTree family of table engines, as described above.
- The primary key is used for indexing. ClickHouse builds a sparse primary index that allows it to quickly skip over irrelevant rows during a query, based on the primary key.
What happens if I don't define a sorting key for my table?
What happens if I don't define a sorting key for my table?
How can I change the sorting key of an existing table?
How can I change the sorting key of an existing table?
Do merges happen on a given schedule? Is it configurable?
Do merges happen on a given schedule? Is it configurable?
Can a table have multiple sorting keys to support different access patterns?
Can a table have multiple sorting keys to support different access patterns?
What happens if I use mutable columns in my sorting key?
What happens if I use mutable columns in my sorting key?