
Introducing the AWS S3 Data Source: Power customer-facing analytics from Parquet files in your S3 bucket

You can now power customer-facing analytics use cases such as insights dashboards, product usage reporting, or analytics APIs with Parquet files in your S3 bucket and Propel.

Today we are thrilled to announce Propel's AWS S3 Data Source. The AWS S3 Data Source enables you to power your customer-facing analytics from Parquet files in your S3 bucket. Whether you have a data lake in S3, are landing Parquet files in S3 as part of your data pipeline or event-driven architecture, or are extracting data using services like Airbyte or Fivetran, you can now define Metrics and query them blazingly fast via Propel's GraphQL API.

What are the supported file formats?

The AWS S3 Data Source supports the Parquet file format. Parquet is a columnar file format that is supported by many data processing frameworks, including Spark, Flink, and Impala, among others. It provides efficient storage and encoding of data, as well as optimized query performance.
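To make "columnar" concrete, here is a toy sketch in plain Python. It is not how Parquet is implemented (Parquet's real encodings are more elaborate), and the sample rows and helper names are made up for illustration. It only shows why storing values column by column helps: repeated values sit next to each other and compress well, and an aggregate only has to read the one column it needs.

```python
from itertools import groupby

# Hypothetical sample rows, as a row-oriented system might store them.
rows = [
    {"customer": "acme", "plan": "pro", "requests": 120},
    {"customer": "acme", "plan": "pro", "requests": 95},
    {"customer": "globex", "plan": "pro", "requests": 40},
    {"customer": "globex", "plan": "free", "requests": 7},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Repeated values in a column are adjacent, so they encode compactly.
encoded_plan = run_length_encode(columns["plan"])

# An aggregate over one field reads a single column, not every row.
total_requests = sum(columns["requests"])
```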

When should you use the AWS S3 Data Source?

The Propel AWS S3 Data Source is an extremely flexible way to integrate a wide range of data architectures with Propel. Consider using it in the following scenarios:

Event-driven analytics

When you have an event-driven architecture and need to power customer-facing analytics from those events. These events typically go through an event bus like Amazon EventBridge. A consumer like Kinesis Firehose then picks them up, performs the necessary transformations, and lands them in S3.

Data lake or lakehouse analytics

When you have your data in a Parquet-based data lake or lakehouse and you need to expose it to your customers via your web or mobile app. These could be homegrown data lakes or data lakehouses like Dremio or AWS Lake Formation.

Streaming analytics

When you have a streaming infrastructure based on Kafka or AWS Kinesis. A consumer of these streams can easily put the data in Parquet format in an S3 bucket to power your user-facing analytics.

Airbyte or Fivetran integrations

When you're pulling data from a database or SaaS app using Fivetran or Airbyte. The source would be the database or SaaS app and the destination would be Parquet files in an S3 bucket.
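In all of these scenarios, the end result is the same: Parquet files written under a path in an S3 bucket. As a rough sketch of what a writer on your side might do, the helper below builds a time-ordered object key for each batch. The function name, the prefix, and the key layout are all assumptions made for illustration; nothing here is a layout Propel requires.

```python
from datetime import datetime, timezone


def parquet_object_key(prefix: str, batch_time: datetime, batch_id: int) -> str:
    """Build a time-ordered S3 object key for one batch of records.

    The layout (timestamp plus zero-padded batch number) is a hypothetical
    convention chosen so that keys sort in the order they were written.
    """
    utc_time = batch_time.astimezone(timezone.utc)
    return (
        f"{prefix.strip('/')}/"
        f"{utc_time:%Y%m%dT%H%M%SZ}-batch-{batch_id:06d}.parquet"
    )


key = parquet_object_key(
    "events/page_views",
    datetime(2023, 1, 5, 14, 30, tzinfo=timezone.utc),
    42,
)
```

A key that sorts by write time makes it easy to see at a glance which files are the newest when you list the bucket.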

How does the S3 Data Source work?

The Propel S3 Data Source integrates with an S3 bucket in your AWS account, so there is no need to move data around. You will have to provide AWS credentials and specify the path of the Parquet files inside a given bucket. Propel will use these credentials to access the Parquet files, cache the data, and make it available for querying via Propel’s GraphQL API.
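Since the credentials only need to read files, it is reasonable to scope them narrowly. The sketch below generates a read-only IAM policy document for a single bucket path. The bucket name, path, and function name are placeholders, and the exact permissions Propel needs are an assumption here; the Propel documentation is the authority on that.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy document granting read-only access to one prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing objects, restricted to the given prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading the objects under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Placeholder bucket and path; substitute your own.
policy = read_only_policy("my-analytics-bucket", "events/page_views")
policy_json = json.dumps(policy, indent=2)
```

You could attach the resulting JSON to a dedicated IAM user or role, so the credentials you hand over cannot write to or delete from the bucket.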

When new Parquet files land in the S3 bucket, Propel automatically detects them and caches the data, making it available via the API within a couple of minutes.
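Conceptually, detecting new files comes down to comparing the current bucket listing with the files already processed. The toy sketch below shows that idea with in-memory lists standing in for S3 listings. It is not Propel's actual mechanism, which this post does not describe, and the function and key names are hypothetical.

```python
def detect_new_keys(seen: set, listing: list) -> list:
    """Return Parquet keys in `listing` not yet in `seen`, and mark them seen."""
    new_keys = sorted(
        {key for key in listing if key.endswith(".parquet") and key not in seen}
    )
    seen.update(new_keys)
    return new_keys


seen = set()

# First pass: one Parquet file and one non-Parquet marker file.
first = detect_new_keys(seen, ["events/a.parquet", "events/_SUCCESS"])

# Second pass: a new file has landed; the old one is not reported again.
second = detect_new_keys(seen, ["events/a.parquet", "events/b.parquet"])
```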

Conclusion

The Propel AWS S3 Data Source can simplify your data architecture by leveraging the data you already have. With Propel, you don't need to worry about caching or aggregating the data for your different use cases.

You can get started with Propel's AWS S3 Data Source by reading the documentation and creating your first S3 Data Source in the Propel Console.

If you don't have a Propel account yet, join our waitlist!

We are onboarding users as fast as we can. We can’t wait to see what you build with Propel!

Follow us on Twitter @propeldatacloud or subscribe to our email newsletter.
