Connect Your Data

In this section, you will find everything you need to connect your data to Propel's Serverless ClickHouse. This overview covers the types of data sources supported, how data is synced from your data source to Propel, and the key concepts you need to be familiar with.

High-level overview

The key concept for understanding how data flows is the Data Pool. Data Pools are ClickHouse tables with an ingestion pipeline from a data source, queryable via the API.

Propel supports various types of data sources, including data warehouses and lakes, event-based data sources, and databases. The diagram below shows an example of how data is collected into Propel Data Pools and then served via the API to your apps.

A high-level overview of how data is connected to Propel.

Supported data sources

Propel supports integration with various types of data sources: event-based sources, data warehouses and data lakes, and databases. These can be integrated either via a native connection or through Amazon S3. We offer step-by-step guides for each supported integration, whether native or via Amazon S3 Parquet.

Event and streaming sources

Data source | Integration
Webhooks | Native
Kafka | Native
AWS Kinesis | Via ELT/ETL platforms

Data warehouses and data lakes

Data source | Integration
Snowflake | Native
Amazon S3 Parquet | Native

Databases

Data source | Integration
ClickHouse | Native

ELT/ETL Platforms

Data source | Integration
Fivetran | Native
Airbyte | Native

Don't see a data source you need, or want access to a preview? Let us know.

Understanding Data Pools

Data Pools are ClickHouse tables with an ingestion pipeline from a data source.

A screenshot of a Data Pool in the Propel Console.

Understanding event-based Data Pools

Event-based data sources like the Webhook Data Pool collect and write events into Data Pools. These Data Pools have a very simple schema:

Column | Type | Description
_propel_received_at | TIMESTAMP | The timestamp when the event was collected, in UTC.
_propel_payload | JSON | The JSON payload of the event.

During the setup of a Webhook Data Pool, you can optionally unpack top-level or nested keys from the incoming JSON event into specific columns. See the Webhook Data Pool for more details.
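The unpacking step described above can be sketched locally. In this illustration, assume an incoming event and a set of user-chosen dot-separated key paths; the column names, paths, and helper function here are hypothetical, not Propel's actual API:

```python
import json
from datetime import datetime, timezone

def unpack_event(event: dict, column_paths: dict) -> dict:
    """Map an incoming JSON event to a row, mirroring the default
    Webhook Data Pool schema plus optional unpacked columns."""
    row = {
        "_propel_received_at": datetime.now(timezone.utc).isoformat(),
        "_propel_payload": json.dumps(event),
    }
    # Unpack top-level or nested keys (dot-separated paths) into columns.
    for column, path in column_paths.items():
        value = event
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        row[column] = value
    return row

event = {"user": {"id": "u_42", "plan": "pro"}, "action": "login"}
row = unpack_event(event, {"user_id": "user.id", "action": "action"})
print(row["user_id"], row["action"])  # u_42 login
```

The full payload is always preserved in `_propel_payload`, so unpacked columns are a convenience for querying, not a replacement for the raw event.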

Understanding data warehouse and data lake-based Data Pools

Data warehouse and data lake-based Data Pools, such as those for Snowflake or Amazon S3 Parquet, synchronize records from the source table at a given interval and write them into the Data Pool. You can create multiple Data Pools, one for each table.

These Data Pools also offer additional properties that let you control their synchronization behavior:

  • Scheduled Syncs: A Data Pool's sync interval determines how often Propel checks for new data to synchronize. For near real-time applications, the interval can be as short as 1 minute, while for applications with more relaxed data freshness requirements, it can be set to once a day or anything in between.
  • Manually triggered Syncs: Syncs can be triggered on-demand when a Data Pool's underlying data source has changed, or in order to re-sync the Data Pool from scratch.
  • Pausing and resuming syncing: Controls whether a Data Pool syncs data or not. When paused, Propel stops synchronizing records to your Data Pool. When resumed, it will start syncing on the configured interval.
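As a rough sketch, a manually triggered sync is just a call to the Management API. The mutation name `resyncDataPool` and its input fields below are illustrative placeholders, not Propel's actual GraphQL schema; consult the API reference for the real names:

```python
import json

# Hypothetical GraphQL mutation; check Propel's Management API
# reference for the real mutation name and arguments.
MUTATION = """
mutation ($dataPool: ID!) {
  resyncDataPool(input: { dataPool: $dataPool }) {
    id
    status
  }
}
"""

def build_sync_request(data_pool_id: str) -> dict:
    """Build the JSON body for a GraphQL POST request."""
    return {"query": MUTATION, "variables": {"dataPool": data_pool_id}}

body = json.dumps(build_sync_request("my-data-pool-id"))
```

The same request shape applies to pausing and resuming syncing, only with a different mutation.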

Key guides

Here are some key guides to help you as you onboard your data to Propel:

Frequently asked questions

How long does it take for my data to be synced into Propel? Is Propel real-time?

Once data gets to Propel via syncs or events, it is available via the API within a couple of seconds.

In what region is the data stored?

The data is stored in the AWS US East 2 region. We are working on expanding our region coverage. If you are interested in using Propel in a different region, please contact us.

How much data can I bring into Propel?

As much as you need. Propel does not have any limits on how much data you bring. You should think of the data in Propel as the data you need to serve to your applications.

How long does Propel keep the data?

You can keep data in Propel for as long as you need. For instance, if your application requires data for only 90 days, you can use the Delete API to remove data after 90 days.

Can you sync only certain columns from a data warehouse into a Data Pool?

Yes. When you create the Data Pool, you can select which columns from the underlying table you want to sync. This is useful if there is PII or any other data that you don't need in Propel.

What happens if the underlying data source is not available? For example, what happens if Snowflake is down?

Even if the underlying data source is down, Propel will continue to serve data via the API. New data will not sync until the data source comes back online.

When does the Data Pool syncing interval start?

The syncing interval starts when your Data Pool goes LIVE or when syncing is resumed.

Management API

Everything that you can do in the Propel Console, you can also achieve via the Management API. This enables you to create and manage Data Pools programmatically.
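As a minimal sketch, every Management API call is an authenticated GraphQL POST. The endpoint URL and query shape below are assumptions for illustration only; check Propel's API reference for the actual URL, field names, and authentication details:

```python
import json
import urllib.request

# Hypothetical endpoint; consult the Management API reference
# for the actual URL and schema.
ENDPOINT = "https://api.example.com/graphql"

def build_graphql_request(
    token: str, query: str, variables: dict
) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GraphQL request."""
    body = json.dumps({"query": query, "variables": variables}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_graphql_request(
    token="MY_APP_TOKEN",
    query="query { dataPools { nodes { id uniqueName } } }",
    variables={},
)
# To send: urllib.request.urlopen(req)
```

Because the API mirrors the Console, the same pattern works for creating Data Pools, triggering syncs, or any other management operation, each with its own query or mutation.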

Data Pool APIs

Queries

Mutations

Note on Data Source APIs

The following Data Source APIs are deprecated and are listed here for reference.

Data Source APIs (Deprecated)

Queries

Mutations