Segment lets you collect data through a single schema and API. That's why it's often used as middleware for tracking customer behavior data, both for product analytics and for marketing analytics. It also offers a free tier.
In this article, you'll learn how to set up Segment for your web application and store the captured data in Snowflake, a popular data warehouse solution.
Many organizations are eager to collect customer behavior data. Although many free tools (like Google Analytics) offer this capability in one way or another, they often lock in their users by making it hard or impossible to get the data out. This restriction seriously limits the use cases for this data.
When it comes to data collection, Segment was one of the first tools to offer a unified tracking API via Connections, allowing users to send and store data in their tool of choice. It's quick to set up, robust, and collects events in real time.
On top of that, Segment offers two more capabilities that put it in the customer data platform (CDP) software category. First, it offers a data catalog and quality checks in a feature called Protocols. Second, it allows you to create customer segments and expose them to an organization's marketing stack, enabling content personalization on websites, emails, and other customer touchpoints.
Just as Segment has banked on the growing need to collect customer data properly, Snowflake has made it convenient to store and process large amounts of data for analytical purposes. It competes with Google BigQuery, Databricks, and Firebolt to offer the fastest data warehouse solution. One of Snowflake's selling points is that it runs on any of the major cloud vendors: you can choose between Google Cloud Platform, Amazon Web Services, and Microsoft Azure.
Modern data warehouses like Snowflake store data in a columnar format, and both storage and computing resources can be scaled out more or less automatically. This matters because tracking behavior on a popular digital product often results in millions of events logged per day, and analytical (OLAP) queries over that volume are exactly the workload these data warehouses are built for.
Furthermore, while data warehouses used to be notoriously hard to load data into, modern data warehouses like Snowflake offer various options for loading data and registering it in a schema, both in batch and stream.
In this section, you'll build a real-time pipeline in which Segment collects data from a web application and stores it in a Snowflake data warehouse.
There are a couple of prerequisites to set up product analytics tracking with Segment and Snowflake:
Once you've fulfilled the above prerequisites, you'll need to configure the source and destination of the data you'd like to analyze.
After that, Segment gives you a compact code snippet. Once you install this snippet on your website, it loads the Segment API, which contains a variety of methods to track events such as page views and clicks. The snippet also contains your unique tracking ID (Segment calls it the write key) in two places in order to identify your data source. To view an example Segment snippet, check out this fiddle.
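For a feel of what those tracking methods send, here is a minimal Python sketch that builds (but does not send) a page call for Segment's HTTP Tracking API, the server-side counterpart of the browser snippet. It is not the snippet Segment generates for you. The write key, anonymous ID, and page properties are placeholders, and the helper name `build_page_request` is made up for illustration.

```python
import base64
import json
import urllib.request

# Endpoint of Segment's HTTP Tracking API for page calls.
SEGMENT_PAGE_ENDPOINT = "https://api.segment.io/v1/page"


def build_page_request(write_key, anonymous_id, name, properties):
    """Build (without sending) a POST request that describes one page view."""
    payload = {
        "anonymousId": anonymous_id,
        "name": name,
        "properties": properties,
    }
    # The API uses HTTP Basic auth: the write key is the username,
    # and the password is left empty.
    token = base64.b64encode(f"{write_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SEGMENT_PAGE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders (assumptions for this sketch).
request = build_page_request(
    write_key="YOUR_WRITE_KEY",
    anonymous_id="example-anonymous-id",
    name="Home",
    properties={"url": "https://yourwebsite.com/", "title": "Home"},
)

print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send the event, pass the request to urllib.request.urlopen().
```

The browser snippet does the equivalent work for you on every page load, which is why a single pasted snippet is enough to start seeing page views.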
When you implement client-side tracking, there are three main ways to implement the tracking script. You can:
In the example screenshot below, the Head & Footer Code WordPress plugin was used:
To ingest data into Snowflake with Segment, you'll need the following:
While all these objects can be created via Snowflake's GUI, you can simply run the following queries in a Snowflake worksheet (don't forget to set a safe password):
Next, set up a destination via the Segment GUI by going to Connections and clicking Add Destination. After selecting Snowflake, you'll be asked to provide some credentials:
Note that the Account field in the credentials section does not take the account ID shown in the admin settings of your Snowflake organization. Instead, use the first part of the hostname you see in your browser when you open Snowflake's classic console:
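If you'd rather derive that value than read it off the address bar, the following Python sketch extracts it from a console URL. It assumes the hostname ends in `.snowflakecomputing.com` and that everything in front of that suffix (including a region part, if your URL has one) belongs in the Account field. The example URL and the function name `account_from_console_url` are hypothetical.

```python
from urllib.parse import urlparse

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def account_from_console_url(url):
    """Return the part of the hostname that precedes the Snowflake domain."""
    # Accept bare hostnames as well as full URLs.
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(SNOWFLAKE_SUFFIX):
        raise ValueError(f"not a Snowflake hostname: {host!r}")
    account = host[: -len(SNOWFLAKE_SUFFIX)]
    if not account:
        raise ValueError("hostname has no account part")
    return account


# Hypothetical console URL, used only for illustration.
print(account_from_console_url(
    "https://xy12345.eu-central-1.snowflakecomputing.com/console"
))
```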
Now that you've configured everything, it's time to test that it's all working correctly.
First, you should test your pipeline. Go to the Debugger section of the source you've just created, and generate a few page hits on the website where you've implemented the tracking snippet. If everything goes well, the hits will be registered in real time:
Once you've confirmed that your pipeline works, you can query the destination table(s) in Snowflake. If you're doing this with a user other than the one you authenticated Segment with, you need to grant that user the same USAGE privileges on the database and the warehouse.
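As a rough sketch of what those grants look like, the Python snippet below prints the two `GRANT USAGE` statements for a role that your other user holds. The role, database, and warehouse names are hypothetical, so substitute the ones from your own setup. Depending on how your account is configured, the role may need further privileges on the schema and its tables before a `SELECT` succeeds.

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore, followed by
# letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def usage_grants(role, database, warehouse):
    """Return GRANT statements giving `role` USAGE on a database and a warehouse."""
    for name in (role, database, warehouse):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain Snowflake identifier: {name!r}")
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
    ]


# Hypothetical names; paste the printed statements into a Snowflake worksheet.
for statement in usage_grants("ANALYST_ROLE", "SEGMENT_EVENTS", "SEGMENT_WH"):
    print(statement)
```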
The query below prints the page views you triggered to test your data pipeline:
SELECT * FROM <YOURWEBSITE_COM>.PAGES
Depending on the kinds of events you log, Segment will create new tables in the schema, which can be queried in an analogous manner.
Tracking your users' behavior via Segment and Snowflake is easy to set up, but has some caveats.
Segment doesn't integrate with your existing data sources. It's not a data integration tool (like Fivetran) that lifts data from an existing data store or software-as-a-service (SaaS) tool and moves it to Snowflake. Instead, Segment is plugged directly into your digital product, collecting real-time data that gets synced to your Snowflake data warehouse.
Although Segment collects real-time data from your digital product, it doesn't stream it to Snowflake. The data gets stored in Segment and is synced to all configured destinations at regular intervals. When you have a Business plan with Segment, you can configure a selection of data sources and the synchronization time. Even so, the data in Snowflake is only fully up to date right after a sync, and it falls behind again until the next one runs.
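To make that lag concrete, here is a small Python sketch that computes how far the warehouse trails behind at a given moment. The 12-hour interval and the timestamps are made-up values, not Segment defaults, and the sketch simplifies by treating each sync as instantaneous and perfectly on schedule.

```python
from datetime import datetime, timedelta, timezone


def last_sync_before(now, first_sync, interval):
    """Return the most recent scheduled sync time at or before `now`."""
    if now < first_sync:
        raise ValueError("no sync has run yet")
    # Dividing two timedeltas with // yields the number of whole intervals.
    completed_intervals = (now - first_sync) // interval
    return first_sync + completed_intervals * interval


# Assumed example values.
first_sync = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
interval = timedelta(hours=12)
now = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)

last_sync = last_sync_before(now, first_sync, interval)
lag = now - last_sync

print(last_sync.isoformat())  # 2024-01-02T00:00:00+00:00
print(lag)                    # 9:30:00
```

In this example, events collected during the last nine and a half hours are already in Segment but not yet queryable in Snowflake.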