Skip to main content

Deleting data

deleting Data in ClickHouse

Propel’s Serverless ClickHouse supports real-time deletes and batch deletes for deleting data. Read on to learn when and how to use each method.

Real-time deletes

Real-time deletes allow you to delete existing rows in a table, which are immediately excluded when querying.

Use cases

  • Delete rows in a table and immediately exclude them from queries
  • Delete entities (users, accounts, etc.) or dimensions

How it works

Real-time deletes in Propel are supported by ClickHouse’s ReplacingMergeTree table engine. Deletes do not happen transactionally. Instead, they happen asynchronously, where Propel uses a combination of filter-on-read and background jobs to perform the deletes.

When a row is deleted, a new record is written with an internal _propel_is_deleted column set to true. Propel deletes these records in the background, asynchronously. To ensure that deleted data is never returned in queries, Propel automatically performs a filter-on-read when querying ReplacingMergeTree tables. This ensures that no deleted record is returned, even if the background delete operation has not yet occurred.

However, it's important to note that there are two implications:

  1. The row count in ReplacingMergeTree tables may be larger than the logical number of rows. This is because there may be duplicate rows in the table that haven’t been deleted yet. These rows will be de-duplicated at query time.
  2. Filtering deleted records at query time has performance implications on read operations. This can be compensated using a larger Propeller.

Create a Data Pool to handle real-time deletes

In this section, we will guide you through the process of creating a Data Pool that can handle real-time deletes. We will provide step-by-step instructions on how to create it via the Console, API, and Terraform.


To create a Data Pool that can handle real-time deletes in Console, first go to the “Data Pools” section, then click ”Create Data Pool”.

Propel's Data Pools Console page

Select the Data Pool type you need to create.

Create new Data Pool

In the “Table Settings” step, select “Mutable Data”.

ClickHouse table settings configuration

Answer the questions on how to uniquely identify records and your query patterns. For more information, read the “Mutable records” section of the How to select a table engine and sorting key guide.

ClickHouse table settings configuration questions

Next, you will get the suggested table settings. Note that the table engine is ReplacingMergeTree, and the sorting key determines the uniqueness of the row.

Suggested ClickHouse table settings

Click “Next” to finish setting up the Data Pool. Once it is created, you can go to the “Details” tab and verify the settings.

Data Pool details


Batch deletes

Batch deletes are useful for one-off or automated jobs where you need to delete many rows in a table.

Use cases

  • Deletes for GDPR compliance
  • One-off deletes of incorrect data

How it works

Batch deletes in Propel are supported by table mutations in ClickHouse. When you perform a batch delete, Propel issues an ALTER TABLE … DELETE statement to ClickHouse and monitors its progress. The mutation proceeds part-by-part, partition-by-partition, deleting rows in the table.

Propel provides a simple way to data data asynchronously using the Console or the createDeletionJob API.

Create a batch delete job

In this section, we guide you through the process of creating a batch delete job via the Console and API.

You can initiate a delete job in the Console by navigating to the Data Pool from which you need to delete data, clicking on the "Operations" tab, and then clicking “Delete data.

Deleting data

Here, you can specify the filters of the data to delete.

Deleting data in Console

Lastly, you can see the progress of the delete job and when it is completed.

Monitoring a delete job in Console


Remember, deleting data is permanent and cannot be undone, so use this feature cautiously.

Conclusion

Managing deletes in Propel's Serverless ClickHouse requires an understanding of the unique way it handles deletions. Real-time deletes are achieved through the use of the ReplacingMergeTree table engine, but there are implications to consider such as larger record counts and performance impacts on read operations on large tables.

Batch deletes, on the other hand, are more suited for large-scale or automated deletion tasks and can be accomplished via the Console or the createDeletionJob API.

Further reading