Snowflake Concepts

What Is the Difference Between a Database and a Warehouse in Snowflake?

In Snowflake data platform, databases are virtual hard drives where you store data, while warehouses are the virtual compute resources you use to run analytical queries. Here’s what to know.

Cover photo by Aaron Burden on Unsplash

Databases are how you pay for storage while warehouses are how you pay for compute in Snowflake data platform, illustrated by a snow globe filled with realistic Snowflakes.

Snowflake is a data analytics platform that offers advantages over traditional databases for a variety of interesting data engineering and analytics applications.

Snowflake uses databases to hold large amounts of data (“storage”) separate from the “virtual warehouses” (or just “warehouses”), which process and manipulate that data (“compute”).

While data warehouses are central repositories of data used for reporting and data analysis, Snowflake uses the term “warehouse” specifically to mean a virtual computational cluster that allows you to manipulate and process data for analytical queries.

“A virtual warehouse is a cluster of compute resources. A warehouse is needed to execute certain types of SQL statements because it provides resources such as CPU, memory, and local storage.” –Snowflake Docs

Meanwhile, databases in Snowflake have a more traditional definition. Snowflake’s relational databases are built using SQL (Structured Query Language), and they’re where you store the data you’ll be using in the platform.

“All data in Snowflake is maintained in databases. Each database consists of one or more schemas, which are logical groupings of database objects, such as tables and views.” –Snowflake Docs

Each schema belongs to a single Snowflake database, and each database belongs to only one Snowflake account. Together, a database and schema are called a “namespace” in Snowflake. Snowflake does not limit the number of databases, the number of schemas (within a database), or the number of objects (within a schema) that you are able to create in a single account.

What’s the Difference Between Warehouses and Databases in Snowflake?

Snowflake’s warehouses are the compute engines that allow you to easily scale up and down the size of the computing cluster running your analytics queries. The warehouse is what runs the analytics in Snowflake, while the database itself is just a static repository holding the data.

Snowflake has a unique approach to databases compared to other data warehousing solutions. Snowflake’s advantage comes from the use of micro-partitioning, which are small partitions of 50 to 500MB that are created automatically and enable faster queries than static partitions.

The automatic creation of clusters in Snowflake is called automatic clustering, a process that consumes Snowflake credits and thus costs money. However, automatic clustering does not require you to set up a virtual warehouse manually; it happens automatically behind the scenes. Nevertheless, you’ll have to pay for those warehouse resources in addition to the storage costs.

Snowflake’s storage databases include Continuous Data Protection (CDP) features known as Time Travel and Fail-safe. Snowflake Time Travel allows querying, cloning, and restoring historical data from Snowflake tables, schemas, and databases for 1 day (Snowflake Standard Edition) or for up to 90 days (Snowflake Enterprise Edition). Snowflake Fail-safe is a disaster recovery system for historical data that allows recovery for 7 days after Time-Travel expires. Neither of these CDP features requires a warehouse to be running, but both incur storage costs.

In conclusion, Snowflake is a data warehouse that separates storage from compute resources. The data held at Snowflake is held inside namespaces, which are composed of databases and schemas, while virtual warehouses handle the analytical queries. You use Snowflake credits to pay for both storage in databases as well as processing time (when warehouses are running).

If you enjoyed this article on Snowflake, please sign up for Propel’s email newsletter, follow us on our Twitter account @propeldatacloud, or join the customer waiting list to stay in touch. We’re onboarding Snowflake customers as fast as we can!

Related Content

Snowflake is considered a data warehouse because it’s cloud-based platform is central repository of data that separates storage of the data from the compute resources needed to process that data for analytical queries, as illustrated by this image of a single snowflake on lint-covered fabric.

Snowflake Concepts

Is Snowflake a Data Warehouse for Analytics and Insights?

Snowflake data platform is referred to as a data warehouse or data lake because it separates storage (data) from compute (processing power).

Multi-cluster virtual warehouses in Snowflake are analogous to the server farm pictured, since they allocate additional compute resources compared to a single warehouse or virtual machine.

Snowflake Concepts

What Is a Multi-Cluster Virtual Warehouse in Snowflake Data Platform?

Multi-cluster virtual warehouses auto-scale compute resources based on the demands on the data warehouse. Here’s how they work in Snowflake.