# Connect Your Data
In this section, you will find all the information you need to connect your data to Propel's Serverless ClickHouse. This overview covers the types of data sources supported, how data is synced from your data source to Propel, and the key concepts you need to be familiar with.
## High-level overview
The key concept for understanding how data flows is the Data Pool: a ClickHouse table with an ingestion pipeline from a data source, queryable from the API.
Propel supports various types of data sources, including data warehouses and lakes, event-based data sources, and databases. The diagram below shows an example of how data is collected into Propel Data Pools and then served via the API to your apps.
## Supported data sources
Propel supports integration with various types of data sources: event-based sources, data warehouses and data lakes, and databases. These can be integrated either via a native connection or through Amazon S3 Parquet. We offer step-by-step guides for each supported integration type.
### Event and streaming sources

| Data source | Integration |
| --- | --- |
| Webhooks | Native |
| Kafka | Native |
| AWS Kinesis | Via ELT/ETL Platforms |
### Data warehouses and data lakes

| Data source | Integration |
| --- | --- |
| Snowflake | Native |
| Amazon S3 Parquet | Native |
Databasesโ
Data source | Integration |
---|---|
ClickHouse | Native |
### ELT / ETL Platforms

| Data source | Integration |
| --- | --- |
| Fivetran | Native |
| Airbyte | Native |
Don't see a data source you need, or want access to one in preview? Let us know.
## Understanding Data Pools
Data Pools are ClickHouse tables with an ingestion pipeline from a data source.
### Understanding event-based Data Pools
Event-based data sources like the Webhook Data Pool collect and write events into Data Pools. These Data Pools have a very simple schema:
| Column | Type | Description |
| --- | --- | --- |
| `_propel_received_at` | TIMESTAMP | The timestamp when the event was collected, in UTC. |
| `_propel_payload` | JSON | The JSON payload of the event. |
During the setup of a Webhook Data Pool, you can optionally unpack top-level or nested keys from the incoming JSON event into specific columns. See the Webhook Data Pool for more details.
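For illustration, here is a minimal sketch of posting an event to a Webhook Data Pool using Node.js's built-in `fetch`. The endpoint URL is a placeholder, not a real address; use the URL generated for your Webhook Data Pool in the Propel Console, along with any authentication you configured.

```typescript
// Minimal sketch: posting an event to a Webhook Data Pool.
// The endpoint below is a placeholder; use the URL generated for your
// Webhook Data Pool in the Propel Console.
const WEBHOOK_URL = "https://<your-webhook-data-pool-endpoint>";

async function sendEvent(): Promise<void> {
  // The entire JSON body is stored in the `_propel_payload` column;
  // Propel sets `_propel_received_at` to the collection timestamp (UTC).
  const event = {
    event_type: "page_view",
    user_id: "user_123",
    path: "/pricing",
  };

  const response = await fetch(WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });

  if (!response.ok) {
    throw new Error(`Event rejected: HTTP ${response.status}`);
  }
}

sendEvent().catch(console.error);
```

Any keys you chose to unpack during setup become their own columns; everything else remains queryable through `_propel_payload`.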
### Understanding data warehouse and data lake-based Data Pools
Data warehouse and data lake-based Data Pools, such as those for Snowflake or Amazon S3 Parquet, synchronize records from the source table at a given interval and write them into Data Pools. You can create multiple Data Pools, one for each table.
These Data Pools also offer additional properties that let you control their synchronization behavior:
- Scheduled Syncs: A Data Pool's sync interval determines how often Propel checks for new data to synchronize. For near real-time applications, the interval can be as short as 1 minute, while for applications with more relaxed data freshness requirements, it can be set to once a day or anything in between.
- Manually triggered Syncs: Syncs can be triggered on demand when a Data Pool's underlying data source has changed, or to re-sync the Data Pool from scratch (see the sketch after this list).
- Pausing and resuming syncing: Controls whether a Data Pool syncs data. When paused, Propel stops synchronizing records to your Data Pool. When resumed, it starts syncing again on the configured interval.
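As an illustration of what an on-demand sync could look like programmatically, the sketch below posts a GraphQL mutation to the Management API. The endpoint, the `resyncDataPool` mutation name, and its argument shape are assumptions for illustration only; consult the Management API reference for the exact operation names.

```typescript
// Sketch: triggering an on-demand sync through the Management API.
// ASSUMPTIONS: the endpoint and the mutation name `resyncDataPool` are
// illustrative, not confirmed; check the Management API reference.
const API_URL = "https://api.us-east-2.propeldata.com/graphql";
const TOKEN = process.env.PROPEL_API_TOKEN; // OAuth access token

const mutation = `
  mutation ($id: ID!) {
    resyncDataPool(id: $id) { id }
  }
`;

async function triggerSync(dataPoolId: string): Promise<void> {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${TOKEN}`,
    },
    body: JSON.stringify({ query: mutation, variables: { id: dataPoolId } }),
  });

  const result = await response.json();
  if (result.errors) {
    throw new Error(JSON.stringify(result.errors));
  }
}

triggerSync("<your-data-pool-id>").catch(console.error);
```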
## Key guides
Here are some key guides to help you as you onboard your data to Propel:
- Creating a Data Pool
- How to select a table engine and sorting key
- Working with Propel and dbt
- Building multi-tenant applications
## Frequently asked questions
**How long does it take for my data to be synced into Propel? Is Propel real-time?**
Once data gets to Propel via syncs or events, it is available via the API within a couple of seconds.
**In what region is the data stored?**
The data is stored in the AWS US East 2 region. We are working on expanding our region coverage. If you are interested in using Propel in a different region, please contact us.
**How much data can I bring into Propel?**
As much as you need. Propel does not have any limits on how much data you bring. You should think of the data in Propel as the data you need to serve to your applications.
**How long does Propel keep the data?**
You can keep data in Propel for as long as you need. For instance, if your application requires data for only 90 days, you can use the Delete API to remove data after 90 days.
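As a rough sketch, such a retention policy could run as a scheduled job that calls the Delete API with a timestamp filter. The `createDeletionJob` mutation name, its input shape, and the endpoint below are assumptions for illustration; see the Delete API reference for the exact schema.

```typescript
// Sketch: deleting records older than 90 days via the Delete API.
// ASSUMPTIONS: the mutation name `createDeletionJob`, its input shape, and
// the endpoint are illustrative; see the Delete API reference for specifics.
const API_URL = "https://api.us-east-2.propeldata.com/graphql";
const TOKEN = process.env.PROPEL_API_TOKEN;

// Records with a timestamp before this cutoff will be deleted.
const cutoff = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000).toISOString();

const mutation = `
  mutation ($input: CreateDeletionJobInput!) {
    createDeletionJob(input: $input) { id }
  }
`;

async function deleteOldRecords(dataPoolId: string): Promise<void> {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${TOKEN}`,
    },
    body: JSON.stringify({
      query: mutation,
      variables: {
        input: {
          dataPool: dataPoolId,
          // Hypothetical filter shape: delete rows older than the cutoff.
          filters: [{ column: "timestamp", operator: "LESS_THAN", value: cutoff }],
        },
      },
    }),
  });

  const result = await response.json();
  if (result.errors) throw new Error(JSON.stringify(result.errors));
}

deleteOldRecords("<your-data-pool-id>").catch(console.error);
```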
**Can you sync only certain columns from a data warehouse into a Data Pool?**
Yes. When you create the Data Pool, you can select which columns from the underlying table you want to sync. This is useful if there is PII or any other data that you don't need in Propel.
**What happens if the underlying data source is not available? For example, what happens if Snowflake is down?**
Even if the underlying data source is down, Propel will continue to serve data via the API. New data will not sync until the data source comes back online.
**When does the Data Pool syncing interval start?**
The syncing interval starts when your Data Pool goes LIVE or when syncing is resumed.
## Management API
Everything that you can do in the Propel Console, you can also achieve via the Management API. This enables you to create and manage Data Pools programmatically.
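For example, here is a minimal sketch of listing Data Pools with a GraphQL query over `fetch`. The `dataPools` query and its connection-style response shape are assumptions for illustration; the API reference documents the exact fields.

```typescript
// Sketch: listing Data Pools via the Management API.
// ASSUMPTIONS: the `dataPools` query and its connection-style shape are
// illustrative; consult the API reference for the exact fields.
const API_URL = "https://api.us-east-2.propeldata.com/graphql";
const TOKEN = process.env.PROPEL_API_TOKEN;

const query = `
  query {
    dataPools(first: 10) {
      nodes { id uniqueName }
    }
  }
`;

async function listDataPools(): Promise<void> {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${TOKEN}`,
    },
    body: JSON.stringify({ query }),
  });

  const result = await response.json();
  if (result.errors) throw new Error(JSON.stringify(result.errors));
  console.log(result.data.dataPools.nodes);
}

listDataPools().catch(console.error);
```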
### Data Pool APIs
Queriesโ
Mutationsโ
- Create a Data Pool
- Delete a Data Pool by ID
- Delete a Data Pool by name
- Modify Data Pool
- Disable Data Pool syncing
- Enable Data Pool syncing
- Inspect Data Pool schema
- Reconnect Data Pool
- Retry Data Pool set up by ID
- Retry Data Pool set up by name
- Test Data Pool
- Request delete
### Note on Data Source APIs
The following Data Source APIs are deprecated and are listed here for reference.