Airbyte to ClickHouse
Ingest data using Airbyte.
Synchronize data from over 350+ sources to Propel’s Serverless ClickHouse.
Get started with Airbyte
Step-by-step instructions to connect any Airbyte source to Propel.
Features
Data Pools created with the Airbyte destination are Webhook Data Pools that support the following features:
Feature name | Supported | Notes |
---|---|---|
Syncs inserts, updates, and deletes | ✅ | Propel collects inserts, updates, and deletes sent from Airbyte. |
Re-sync | ✅ | You can trigger a resync in Airbyte. |
Configurable sync interval | ✅ | You can configure sync interval in Airbyte. |
Sync Pausing / Resuming | ✅ | You can pause / resume syncs in Airbyte. |
Sync mode: Full refresh - Overwrite | ✅ | Supports the Full refresh - Overwrite Airbyte sync mode. |
Sync mode: Full refresh - Append | ✅ | Supports the Full refresh - Append Airbyte sync mode. |
Sync mode: Incremental - Append | ✅ | Supports the Incremental - Append Airbyte sync mode. |
Sync mode: Incremental - Append + Deduped | ✅ | Supports the Incremental - Append + Deduped Airbyte sync mode. |
Schema changes | ✅ | When Airbyte detects a column was added in your source table, Propel will automatically add it to the payload in the _propel_payload column. It will not add it as a column in your destination Data Pool. |
Delete Job API | ✅ | See Delete Job API. |
API configurable | ✅ | See API docs. |
Terraform configurable | ✅ | See Terraform docs. |
How does the Airbyte destination work?
Propel creates a Webhook Data Pool for each Airbyte source table. Airbyte streams inserts, updates, and deletes as individual records to Propel.
Data Pools mirror the source table’s structure, with additional columns from Airbyte and Propel:
Column | Type | Description |
---|---|---|
_propel_received_at | TIMESTAMP | The timestamp when the record was received by Propel in UTC. |
_propel_payload | JSON | The JSON Payload of the record. |
_airbyte_raw_id | STRING | The unique ID of the record generated by Airbyte. |
_airbyte_extracted_at | TIMESTAMP | The timestamp when the data was extracted from the source by Airbyte in UTC. |
After connecting, Airbyte performs initial and scheduled syncs, sending data to Propel’s Webhook Data Pool API. Propel then processes this data, making it available for SQL and the Query APIs.
Key configurations
- Default Timestamp:
_airbyte_extracted_at
- Primary Key:
_airbyte_raw_id
- Sort Key: Combination of primary key and default timestamp
You can modify the sort key using a Materialized View.
Data handling
Airbyte streams capture inserts, updates, and deletes. Process these changes using a Materialized View to create a new, updated Data Pool.
Schema changes
Source schema changes are reflected in the _propel_payload
JSON, but not in the Data Pool schema after initial setup.
Data types
The following table illustrates how we transform Airbyte data types into Propel-supported types:
Airbyte Type | Airbyte Format | Propel Type |
---|---|---|
(Not specified or Null) | - | String |
String | - | String |
String | Date | Date |
String | DateTime (TimestampWOTZ) | String |
String | DateTime | Timestamp |
String | Time | String |
Boolean | - | Boolean |
Number | - | Double |
Integer | - | Int64 |
Object | - | JSON |
Array | - | JSON |
Transforming data
Once your data is in a Webhook Data Pool, you can use Materialized Views to:
- Flatten nested JSON into tabular form
- Flatten JSON array into individual rows
- Combine data from multiple source Data Pools through JOINs
- Calculate new derived columns from existing data
- Perform incremental aggregations
- Sort rows with a different sorting key
- Filter out unnecessary data based on conditions
- De-duplicate rows
Was this page helpful?