Airbyte
The Airbyte destination lets you synchronize data from over 350+ sources to Propel's Serverless ClickHouse infrastructure. It provides an easy way to power your customer-facing analytics and data applications with data from any SaaS application, database, or platform supported by Airbyte.
Consider using Airbyte to sync data to Propel when:
- You need to sync data from a SaaS application or platform like Hubspot, Stripe or Shopify.
- You need to sync data from a database that Propel does not support natively like Oracle, MSSQL, MySQL, or PostgreSQL.
- You need to sync data from streaming sources that Propel does not support natively like AWS Kinesis or Google Pub/Sub.
Get started​
Follow our step-by-step Airbyte setup guide to connect any Airbye source to Propel.
Features​
Data Pools created with the Airbyte destination are Webhook Data Pools that support the following features:
Feature name | Supported | Notes |
---|---|---|
Syncs inserts, updates, and deletes | ✅ | Propel collects inserts, updates, and deletes sent from Airbyte. |
Re-sync | ✅ | You can trigger a resync in Airbyte. |
Configurable sync interval | ✅ | You can configure sync interval in Airbyte. |
Sync Pausing / Resuming | ✅ | You can pause / resume syncs in Airbyte. |
Sync mode: Full refresh - Overwrite | ✅ | Supports the Full refresh - Overwrite Airbyte sync mode. |
Sync mode: Full refresh - Append | ✅ | Supports the Full refresh - Append Airbyte sync mode. |
Sync mode: Incremental - Append | ✅ | Supports the Incremental - Append Airbyte sync mode. |
Sync mode: Incremental - Append + Deduped | ✅ | Supports the Incremental - Append + Deduped Airbyte sync mode. |
Schema changes | ✅ | When Airbyte detects a column was added in your source table, Propel will automatically add it to the payload in the _propel_payload column. It will not add it as a column in your destination Data Pool. |
Delete Job API | ✅ | See Delete Job API. |
API configurable | ✅ | See API reference docs. |
Terraform configurable | ✅ | See Propel Terraform docs. |
How does the Airbyte destination work?​
When syncing a Airbyte source, Propel creates a Webhook Data Pool for every source table. Airbyte treats each source table as a stream and sends to Propel every the insert, update, and delete as individual records.
The Data Pools will have the same columns as the source table plus the following columns added by Airbyte and Propel.
Column | Type | Description |
---|---|---|
_propel_received_at | TIMESTAMP | The timestamp when the record was received by Propel in UTC. |
_propel_payload | JSON | The JSON Payload of the record. |
_airbyte_raw_id | STRING | The unique ID of the record generated by Airbyte. |
_airbyte_extracted_at | TIMESTAMP | The timestamp when the data was extracted from the source by Airbyte in UTC. |
Once connected, Airbyte will perform an initial sync and subsequent syncs based on the sync frequency you have configured. Airbyte will send the data to Propel via the Webhook Data Pool API. Propel will process the data and make it available for querying via the SQL and API.
Default timestamp column​
Propel will use the _airbyte_extracted_at
as the deafult timestamp.
Unique ID column​
Propel will use the _airbyte_raw_id
as the unique ID.
Sort Key​
Propel use the unique ID and timestamp as the Data Pool's sort key. You can change the sort key by creating a Materialized View that orders the data by a different field.
Updates and deletes​
Inserts, updates, and deletes are captured by the Airbyte stream and ingested into the Data Pool. You can handle updates and deletes by creating a Materialized View that processes inserts, updates, and deletes and inserts them into a new Data Pool.
Schema changes​
Any schema changes in the source will be automatically reflected in the _propel_payload
data in the Data Pool. Schema changes that happen after the connection is set up will NOT be reflected in the Data Pool schema.
Data Types​
The following table illustrates how we transform Airbyte data types into Propel-supported types:
Airbyte Type | Airbyte Format | Propel Type |
---|---|---|
(Not specified or Null) | - | String |
String | - | String |
String | Date | Date |
String | DateTime (TimestampWOTZ) | String |
String | DateTime | Timestamp |
String | Time | String |
Boolean | - | Boolean |
Number | - | Double |
Integer | - | Int64 |
Object | - | JSON |
Array | - | JSON |
Transforming data​
Once the data has been ingested into the Webhook Data Pool, you can create Materialized Views to transform the data. This includes transformations such as ordering, filtering, aggregation, and joining with other data. These transformations are defined using SQL and can be updated in real time as new data arrives.
Materialized Views can be used to:
- Separate the messages from a specific topic into their own tables, each with its own schema.
- Handle real-time updates and deletes for mutable data.
- Transform data in real time.
- Enrich data joining with other Data Pools.
Learn more about Transforming your data with Materialized Views.
API reference documentation​
Below is the relevant API documentation for the Webhook Data Pool.