This guide covers how to:

  1. Create an Amazon Data Firehose Data Pool
  2. Create the Data Firehose stream
  3. Send test data
  4. Preview data in Propel

Requirements


Step 1: Create an Amazon Data Firehose Data Pool

First, create an Amazon Data Firehose Data Pool in Propel.

This will give you a dedicated HTTP endpoint for Amazon Data Firehose to send data to.

1

Navigate to Data Pools

In the Console, click on “Data Pools” in the left-hand menu. Click on “Create Data Pool” and select “Amazon Data Firehose”.

2

Define the schema

Propel will automatically unpack the Amazon Data Firehose events into individual rows.

This means that there is NO need to unpack the standard requestId, timestamp, and records fields.

The default schema contains two columns:

Column               Type       Description
-------------------  ---------  ----------------------------------------------------
_propel_received_at  TIMESTAMP  The timestamp when the event was collected, in UTC.
_propel_payload      JSON       The value in the records[i].data field of the event.

By providing a sample event, you can add columns to the Data Pool to match the structure of your data.

For this guide, we’ll use the TacoSoft sample event:

{
  "TICKER_SYMBOL": "QXZ",
  "SECTOR": "HEALTHCARE",
  "CHANGE": -0.05,
  "PRICE": 84.51
}

After adding the sample JSON, click on:

  • Extract top-level properties to create columns representing the top-level keys.
  • Extract nested properties to create columns representing the nested JSON keys.

If a required field is missing from an incoming event, Propel will reject the event with an HTTP 400 Bad Request error.
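To make the unpacking concrete, here is a small sketch of what Propel does with each Firehose delivery request: the standard requestId, timestamp, and records envelope is handled for you, and each base64-encoded records[i].data payload becomes one row. The request body below is illustrative (the requestId is made up); only the payload structure matters.

```python
import base64
import json

# Illustrative Firehose HTTP endpoint request body. Each event is carried
# base64-encoded in records[i].data (requestId here is a made-up example).
firehose_request = {
    "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
    "timestamp": 1668872543507,
    "records": [
        {
            "data": base64.b64encode(json.dumps({
                "TICKER_SYMBOL": "QXZ",
                "SECTOR": "HEALTHCARE",
                "CHANGE": -0.05,
                "PRICE": 84.51,
            }).encode()).decode()
        },
    ],
}

# Propel-style unpacking: one row per record, containing the decoded payload
# (what lands in the _propel_payload column before any extracted columns).
rows = [json.loads(base64.b64decode(r["data"])) for r in firehose_request["records"]]
print(rows[0]["TICKER_SYMBOL"])  # QXZ
```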

3

Configure Authentication

You’ll need to enable basic authentication with a user and a secure password.

You will need the user and password to configure the Amazon Data Firehose.

After configuring, click “Next” to proceed.
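Amazon Data Firehose sends these credentials as a standard HTTP Basic access key: the string "Basic " followed by the base64 encoding of user:password. As a sketch, using the hypothetical credentials fizz and buzz (which produce the example access key shown later in this guide):

```python
import base64

# Hypothetical credentials for illustration only; use the user and password
# you configured in the Data Pool's basic authentication settings.
user, password = "fizz", "buzz"

# HTTP Basic scheme: "Basic " + base64("user:password")
access_key = "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode()
print(access_key)  # Basic Zml6ejpidXp6
```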

4

Configure the table settings

Select whether your data is “Append-only” or “Mutable data”.

To learn more, read our guide on Selecting table engine and sorting key.

Answer the questions in the wizard to complete the setup.

Confirm your table settings and click “Continue”.

5

Set a name and description

Enter a name and description for your new Data Pool and click “Next”.

After creating the Data Pool, you’ll be provided with a unique HTTP endpoint and an X-Amz-Firehose-Access-Key key.

You’ll need this endpoint and key to configure your Amazon Data Firehose HTTP endpoint destination.


Step 2: Create the Data Firehose stream

Once you have created the Data Pool, you’ll need to create an Amazon Data Firehose stream with an HTTP endpoint destination that sends events to Propel.

1

Go to the 'Amazon Data Firehose' section of the AWS console

Once you are in the “Amazon Data Firehose” section of the AWS console, click on “Create Firehose stream”.

2

Select a source and destination

Select your source and “HTTP Endpoint” as the destination.

3

Give your stream a name

Give your stream a name for future reference.

4

Configure the destination

Next, we are going to configure the destination. This is where we’ll need the information about the Amazon Data Firehose Data Pool we created in the step above.

  1. Get the Propel Amazon Data Firehose Data Pool HTTP endpoint and X-Amz-Firehose-Access-Key key.

  2. Enter the destination details:

    • “HTTP endpoint name” - Give the endpoint a name. For example: “firehose-to-propel”.

    • “HTTP endpoint URL” - Enter the Amazon Data Firehose Data Pool HTTP endpoint. It should look like: https://webhooks.us-east-2.propeldata.com/v1/WHKXXXXXX/kinesis.

    • “Access key” - Enter the access key you copied above. For example, Basic Zml6ejpidXp6.

    • “Retry duration” - Set to the maximum: “7200” seconds.

    • “Content encoding” - Set to “Not enabled”.

    • “Buffer hints” - Set the buffer size to 1 MiB. Requests larger than 1 MiB will be rejected with a 413 Request Entity Too Large error.

    • “Buffer hints” - Set the buffer interval to your desired value. We recommend 60 seconds.

  3. Create an S3 bucket for failed deliveries.

    If deliveries fail, you don’t want to lose events. Capturing failed events in an S3 bucket is required to create the stream. To configure it, go to “Backup settings” and configure a bucket.
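The console settings above can also be expressed programmatically. Below is a minimal sketch of the equivalent boto3 create_delivery_stream configuration, assuming you already have an IAM role and a backup S3 bucket; the stream name, endpoint URL, access key, and ARNs are placeholders you would substitute with your own values.

```python
def build_stream_config(endpoint_url, access_key, role_arn, backup_bucket_arn):
    """Build the create_delivery_stream arguments matching the console
    settings above (placeholder names; substitute your own values)."""
    return {
        "DeliveryStreamName": "firehose-to-propel",
        "DeliveryStreamType": "DirectPut",
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": {
                "Url": endpoint_url,          # the Propel Data Pool HTTP endpoint
                "Name": "firehose-to-propel",
                "AccessKey": access_key,      # e.g. "Basic Zml6ejpidXp6"
            },
            "RetryOptions": {"DurationInSeconds": 7200},  # maximum retry duration
            "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
            "RequestConfiguration": {"ContentEncoding": "NONE"},  # "Not enabled"
            # Failed deliveries must be captured in S3.
            "S3BackupMode": "FailedDataOnly",
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": backup_bucket_arn,
            },
            "RoleARN": role_arn,
        },
    }

# With AWS credentials configured, you would then run:
# import boto3
# boto3.client("firehose").create_delivery_stream(**build_stream_config(...))
```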

5

Create it!

Click “Create Firehose stream” and you are done!

6

Monitor for delivery failures

If everything is set up correctly, you should not see any delivery failures in the logs.


For further reference, see the Understand HTTP endpoint delivery request and response specifications documentation.

Step 3: Send test data

Once you have the Data Firehose stream created, you can send test data to it, either from the AWS console or with the AWS SDK.
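As a sketch, here is how you could send one test event with boto3’s put_record. The stream name is the placeholder used earlier in this guide, and the actual call is commented out because it requires AWS credentials; the rest just builds the record bytes Firehose will deliver.

```python
import json

# The sample event from earlier in this guide.
event = {"TICKER_SYMBOL": "QXZ", "SECTOR": "HEALTHCARE", "CHANGE": -0.05, "PRICE": 84.51}

# Firehose delivers raw bytes; a trailing newline keeps records separable.
record = {"Data": (json.dumps(event) + "\n").encode()}

# With AWS credentials configured, you would then run:
# import boto3
# boto3.client("firehose").put_record(
#     DeliveryStreamName="firehose-to-propel",  # placeholder stream name
#     Record=record,
# )
```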

Step 4: Preview data in Propel

Once events are in your Amazon Data Firehose stream, they are ingested directly into Propel.

By going to your Data Pool and clicking “Preview Data”, you’ll be able to see the records as they land.