A Snowflake warehouse (or “virtual warehouse”) must be running before you can query or manipulate the data you want to analyze. Snowflake separates compute from storage, and virtual warehouses are how you handle the “compute”: the computationally intensive tasks of querying, loading, or manipulating data.
Warehouses come in a range of sizes, from X-Small (XS) to 6X-Large (6XL), and larger warehouses cost more than smaller ones. In Snowflake lingo, a warehouse’s size determines the processing power (or “compute resources”) available to perform the data operations you request through Snowflake’s SQL query engine.
“Resizing a warehouse to a larger size is useful when the operations being performed by the warehouse will benefit from more compute resources, including:
◦ Improving the performance of large, complex queries against large data sets.
◦ Improving performance while loading and unloading significant amounts of data.” –Snowflake Docs
Typically, if your Snowflake queries are taking too long, you can speed them up by making your warehouse larger. You can also submit multiple requests to the same warehouse at the same time, in which case Snowflake will try to run them in parallel (concurrently).
However, a warehouse that has already been tasked with a complex query may not have enough compute resources available to process the next query simultaneously. In that case, the second query will be queued until the warehouse’s processing power becomes available.
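To make the queuing behavior concrete, here is a deliberately simplified sketch. It assumes a warehouse can be modeled as a fixed number of execution "slots", which is a hypothetical simplification for illustration only; Snowflake's real scheduler decides based on the resources each query needs, not on a fixed slot count. The query names and runtimes are made up.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    submitted_at: float  # seconds since the start of the simulation
    runtime: float       # seconds the query needs once it starts running


def simulate(queries, slots):
    """Return {query name: (start, finish)} for a toy warehouse.

    `slots` is a hypothetical stand-in for available compute: a query
    that arrives while every slot is busy waits in the queue.
    """
    free_at = [0.0] * slots  # time at which each slot next becomes free
    schedule = {}
    for q in sorted(queries, key=lambda q: q.submitted_at):
        slot = min(range(slots), key=lambda s: free_at[s])  # earliest-free slot
        start = max(q.submitted_at, free_at[slot])          # wait if still busy
        finish = start + q.runtime
        free_at[slot] = finish
        schedule[q.name] = (start, finish)
    return schedule


queries = [Query("big_report", 0, 60), Query("dashboard", 5, 2)]
busy = simulate(queries, slots=1)   # the complex query occupies the warehouse
roomy = simulate(queries, slots=2)  # enough headroom to run both at once

for label, result in (("1 slot", busy), ("2 slots", roomy)):
    start, _ = result["dashboard"]
    print(f"{label}: dashboard waited {start - 5:.0f}s in the queue")
```

In the one-slot case the short dashboard query waits behind the long report, which is the situation described above; with spare capacity it starts as soon as it is submitted.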
To handle concurrency in Snowflake, you have two choices: make your warehouse larger, or set up multiple warehouses running in parallel. How many warehouses your Snowflake account needs depends on your team’s requirements for query processing and concurrency.
There seems to be no limit, other than your finances, to how many virtual warehouses can be created in a single Snowflake account. When you start creating multiple warehouses in Snowflake, it’s important to keep track of how many you’ve created at which sizes, because you pay whenever any warehouse is running. Managing multiple warehouses can become expensive quickly, so maybe it is no surprise that “the average Snowflake customer pays Snowflake $165k a year,” according to Michael Malis of FreshPaint.
Running multiple warehouses has a few advantages over a single warehouse, though there’s a trade-off between concurrency and processing speed. A single, larger warehouse favors the speed of individual queries, which may then end up running one after another, while multiple warehouses are better for concurrency.
“With isolated workloads, data scientists may run monster jobs concurrently, [...] in parallel with business analysts or executive dashboards, [...] with no queuing or workload bumping.” –Michael Nixon on Snowflake’s company blog
You’ll pay the same number of Snowflake credits per hour whether you’re running two Small warehouses or one Medium warehouse. In both cases, you’ll need to pay 4 credits per hour, with the advantage of two warehouses being that your queries won’t interfere with each other.
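The arithmetic behind that comparison can be sketched as follows. This assumes the credit rate doubles with each size step starting from 1 credit per hour at X-Small, which matches the 4-credits figure above for a Medium; treat the table as an assumption and check your own account's pricing, since other warehouse types may bill differently.

```python
# Assumed rate table: credits per hour double with each size step.
SIZES = [
    "X-Small", "Small", "Medium", "Large", "X-Large",
    "2X-Large", "3X-Large", "4X-Large", "5X-Large", "6X-Large",
]
CREDITS_PER_HOUR = {size: 2 ** step for step, size in enumerate(SIZES)}


def credits_per_hour(warehouse_sizes):
    """Total credits per hour for a set of running warehouses."""
    return sum(CREDITS_PER_HOUR[size] for size in warehouse_sizes)


two_smalls = credits_per_hour(["Small", "Small"])
one_medium = credits_per_hour(["Medium"])
print(f"Two Small warehouses: {two_smalls} credits/hour")
print(f"One Medium warehouse: {one_medium} credits/hour")
```

The totals are equal only while both Small warehouses are running; a suspended warehouse does not consume credits, so the real bill depends on how long each one stays up.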
While a Medium warehouse can theoretically handle some query concurrency, like any other Snowflake warehouse, your second query may end up queued until the first query finishes. To make sure the queries don’t interfere with each other, you’ll need to set up multiple warehouses.
With the Snowflake data analytics platform, virtual warehouses come in many sizes. The downside is that you have to size each warehouse manually, including resizing it as your requirements for processing speed and concurrency change. Your account can hold as many warehouses as you would like, but a warehouse has to be running to do anything.
If you need multiple users and queries running simultaneously, you’ll want to set up multiple warehouses manually, giving each one a size appropriate to its need for compute resources. Or, you can use multi-cluster warehouses to automatically scale the number of clusters up and down as needed, assuming your organization has the Enterprise edition of Snowflake (or higher).
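As a rough sketch of the two approaches, the helper below builds the `CREATE WAREHOUSE` statements you might run for each one. The function name, the warehouse names (`ETL_WH`, `DASHBOARD_WH`), and the chosen sizes and cluster counts are hypothetical; the statement options (`WAREHOUSE_SIZE`, `AUTO_SUSPEND`, `AUTO_RESUME`, `MIN_CLUSTER_COUNT`, `MAX_CLUSTER_COUNT`) are standard Snowflake warehouse parameters. The code only produces SQL text and does not connect to Snowflake.

```python
def create_warehouse_sql(name, size, min_clusters=1, max_clusters=1,
                         auto_suspend_seconds=60):
    """Build a CREATE WAREHOUSE statement as a string (illustrative helper)."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    lines = [
        f"CREATE WAREHOUSE IF NOT EXISTS {name}",
        f"  WAREHOUSE_SIZE = '{size}'",
        f"  AUTO_SUSPEND = {auto_suspend_seconds}",  # suspend when idle to save credits
        "  AUTO_RESUME = TRUE",
    ]
    if max_clusters > 1:
        # Multi-cluster settings require the Enterprise edition or higher.
        lines.append(f"  MIN_CLUSTER_COUNT = {min_clusters}")
        lines.append(f"  MAX_CLUSTER_COUNT = {max_clusters}")
    return "\n".join(lines) + ";"


# Option 1: separate, manually sized warehouses per workload.
etl_sql = create_warehouse_sql("ETL_WH", "LARGE")

# Option 2: one multi-cluster warehouse that adds clusters under load.
dashboard_sql = create_warehouse_sql("DASHBOARD_WH", "SMALL",
                                     min_clusters=1, max_clusters=3)

print(etl_sql)
print(dashboard_sql)
```

Keeping the idle timeout short in both cases is a design choice rather than a requirement: since you pay whenever a warehouse is running, suspending quickly limits the cost of having several of them.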
For something like in-product analytics, multi-cluster warehouses are a perfect feature. They allow you to scale resources to improve the concurrent performance for multiple users and queries. On the other hand, they’re not as good at improving the performance of slow-running queries or data loading when compared to just resizing a single warehouse to be larger.
Another drawback of multi-cluster warehouses is that you don’t have direct access to tweak warehouse size while in the efficient “auto-scale” mode, and the “maximized” mode could be wasteful if you only need multiple clusters some of the time. They can also become expensive, particularly if you configure several multi-cluster warehouses in the same account.
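The cost difference between the two modes can be illustrated with a small calculation. In Snowflake, a multi-cluster warehouse runs in maximized mode when its minimum and maximum cluster counts are equal, and in auto-scale mode when they differ. The function names below are hypothetical, and the sketch assumes each running cluster bills at the per-hour rate of the warehouse's size (4 credits per hour is used for a Medium, as in the comparison earlier).

```python
def scaling_mode(min_clusters, max_clusters):
    """Classify a warehouse by its cluster-count settings."""
    if max_clusters == 1:
        return "single-cluster"
    if min_clusters == max_clusters:
        return "maximized"   # every cluster runs whenever the warehouse runs
    return "auto-scale"      # clusters start and stop with the load


def hourly_credit_range(credits_per_cluster, min_clusters, max_clusters):
    """Lowest and highest credits per hour while the warehouse is running."""
    return (credits_per_cluster * min_clusters,
            credits_per_cluster * max_clusters)


MEDIUM_RATE = 4  # assumed credits per hour for one Medium cluster

for lo, hi in [(1, 4), (4, 4)]:
    low, high = hourly_credit_range(MEDIUM_RATE, lo, hi)
    print(f"{scaling_mode(lo, hi)}: {low} to {high} credits/hour")
```

With the same maximum of four clusters, the auto-scale configuration can drop to a single cluster's rate during quiet periods, while the maximized configuration stays at the full rate for as long as it runs.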
Once you’re starting to hit more than the maximum number of concurrent compute clusters (10) in a multi-cluster warehouse, your analytics set-up is going to start getting awfully complicated. If that’s the case, and you’re exploring your options to set up many different multi-cluster warehouses in Snowflake to build in-product analytics, you should check out Propel Data.
At Propel, our analytics backend system gives you easy-to-use GraphQL APIs in a performant, cost-effective manner, without needing to manually set up multi-cluster Snowflake warehouses. For building in-product analytics when you’re using Snowflake as a data warehouse, Propel is the best choice, because we help you connect your data app to Snowflake via GraphQL API.
The Snowflake data platform is often referred to as a data warehouse or data lake, and its architecture separates storage (data) from compute (processing power).
Databases and schemas ("namespaces") are used to organize data in Snowflake storage, which uses a columnar format internally for analytics.