Today, we’re delighted to announce a preview of updates and deletes support for customers powering their customer-facing analytics use cases on Propel with Snowflake data.
Why is it so hard to handle updates and deletes in analytical data stores?
Mutable data is generally unwelcome in analytical data stores for good reasons. Analytical data stores are optimized for fast aggregate queries and writes, making updates and deletes inefficient. Even mature analytical databases do not support performant updates out-of-the-box without exotic table types and upsert logic.
At Propel we wanted to support updates and deletes that just work, fulfilling our promise of providing a true mirror of your data in Snowflake.
Append only vs. updates and deletes
Up until this announcement, Propel supported incremental append when synchronizing data from a given Snowflake table to a Propel data pool**.** This mode syncs new records only from the Snowflake table and appends them inside a Propel data pool. This is ideal for immutable event data.
However, many use cases have updating, mutable data where existing data changes over time. Take for example a set of message records for a communications system where each message has a status. A message may move from <span class="code-exp">draft</span> to <span class="code-exp">sent</span> to <span class="code-exp">received</span>. Instead of appending a new row representing every status change event, an update is issued within Snowflake, updating the <span class="code-exp">status</span> column of a given message row. With the announcement of Updates and Deletes, Propel can now mirror this pattern for both updates (changes in column values) and deletes (removal of rows.) Propel uses a Snowflake stream, a form of change data capture, to synchronize inserted, updated, and deleted records from Snowflake to your Data Pool. It uses each record's primary key, consisting of the primary timestamp and a unique ID, to determine whether it should be inserted, deleted, or updated within the Data Pool.
Requirements for Updates and Deletes
Supporting updates and deletes means requiring an additional parameter when setting up a Propel Data Pool: a unique identifier. This field is the key by which you identify each record separately for enabling updates. For example, going back to our messages example, we can specify the primary key; <span class="code-exp">message_id</span> as our unique identifier. Please note that the unique identifier is often your primary key, but does not have to be. Under the hood, Propel uses both this unique ID and the timestamp column to form the primary key used to order data and enable updates and deletes in a performant way.
How to get started
For existing customers, please contact us to get updates support activated on your account. If you don’t have a Propel account yet, click here to get started. Also, check out our Data Chaos podcast, where we deep dive into the entropy that exists in the world of data.