Graphic by Propel
In software development, engineers typically build applications in a separate system from the one where the end product will be running. Such a system can be a simple stand-alone machine or a complex multi-cloud setup with hundreds of nodes, load balancers, API Gateways, etc. However, the idea remains the same: you develop and test your application in an environment different from the one your end users will use. So you have one environment for development, one for testing, one for production, and so on. Each environment needs to have sufficient resources to run the application, including databases, network bandwidth, storage capacity, CPU power, and the like.
The production environment is where the latest stable version of the application runs. Software engineers primarily interact with the development environment to develop new features or patches and validate any new code before releasing it for testing. Therefore, the software version in development will be more recent than the production version. The development environment also won't have the same traffic as the production. Both environments could be running in physically separate networks or at least using different resources within the same network.
Keeping this distinction between development and production environments during the software development lifecycle is vital for a few reasons:
First, it helps in preventing downtime. A production environment runs a stable, tested version of your software. Directly making changes to this version is a surefire way to break something and cause an outage. A development environment reduces this risk and validates modifications or enhancements.
Second, a development environment can help catch security vulnerabilities and bugs in the code before deploying it in production. Similarly, developers and QA engineers won't have full access to the production environment, ensuring best security practices.
A development environment also allows developers to identify and test new dependencies for their code. Such dependencies can be infrastructure or software related. Developers can test the dependencies before implementing them in production.
Modern software development teams typically follow the agile methodology, where the team makes incremental enhancements or changes to the code, tests them, and ships them to production. Agile development also sees stakeholders or customers taking part in initial testing so developers can take on any feedback in the early stages. A development environment allows you to showcase any changes to such external parties.
Now that you know why you need different environments for software development, the following sections explain the essential criteria for ensuring this separation and how you can implement such separation using the Snowflake data warehouse.
You must fulfill a few requirements to separate your application environments.
With isolation, you run production and nonproduction workloads separately, so one environment doesn't affect the other.
Using separate networks is one way you can implement isolation. For example, you can use different Virtual Private Clouds (VPCs) within the same cloud account or different accounts from the same cloud provider, preventing network traffic in each environment from reaching the other.
Another method is using separate fleets of servers, load balancers, API Gateways, web application firewalls (WAFs), or databases for separate environments. These resources can reside within the same network or use the same managed resources with separate tenancies.
Businesses may sometimes host development and production environments on the same resource. For example, a physical server could run a set of virtual machines for both development and production purposes. Similarly, a web server can have different VirtualHosts to serve development and production traffic. Although having both development and production environments running on the same server or application is not ideal, companies often use such separation for noncritical workloads to save costs.
Environment separation may also require a separate authentication and authorization layer, where each environment will have its user accounts, groups, roles, and permissions.
Security is the primary reason for separate access control. Typically, development environments are less secure than production environments as developers and testers have elevated server, database, and file-share privileges and sometimes know service account passwords. The firewalls in the development environment may also allow a wide IP range. However, even if a cyberattack breaches the development environment's security, the attackers won't be able to compromise the production data because developers usually do not have wide-ranging access to production servers. Separate service accounts could be running applications in the production environment, and firewall rules may be far stricter.
A development environment must host a different data set from its production equivalent.
Sometimes, production data could be sensitive. Examples include personally identifiable information (PII), medical data, financial information, or user credentials. Using the same data in a development environment unnecessarily exposes it to exploitation and misuse. When using a copy of production data, all sensitive information must be redacted, encrypted, scrambled, or replaced with dummy versions.
Using the latest version of production data also requires backing it up and restoring it to development on a regular basis. Although you can automate this process, it may be unnecessary because functional testing doesn't always need the latest data. Replicating data from production to development may also overwrite any data changes that were made during testing.
Another problem that could occur when duplicating data from development to production is that program code that depends on auto-incrementing fields may fail when restoring a new data set from production. A blanket restore could overwrite configuration parameters and replace them with production values. Such configuration updates might have serious consequences if parameters are changed from their original values. For example, developers and testers may inadvertently update or delete data without knowing that it's impacting the production system.
Development teams often use scripts to generate dummy data or load from sample data files to implement data separation. As an alternative to replicating the latest data from production to development, they could keep an older copy of production data and restore it when necessary.
If you need to use the latest version of production data in development, consider restoring a subset of it wherever possible. A partial restore will ensure you have a small dataset to roll back if something goes wrong.
You need to be aware of two things whenever you are restoring a copy of production into development. First, you must ensure the development environment uses the original configuration values. Such configurations can include ODBC strings, IP addresses, service accounts, usernames, or databases. Second, you must check that no sensitive data is present in the restored copy, such as information related to medical history or data related to credit card details or account transactions, which only a certified DBA should have access to in case of a data defect.
Snowflake is an SQL-based cloud data platform that is best known as a feature-rich data warehouse. Organizations are increasingly deploying their analytics workloads on Snowflake due to its unlimited scaling capabilities, flexible pricing, support for major cloud providers, and almost zero administration overhead.
The next sections outline two methods you can use to separate development and production environments in Snowflake; specifically, you can achieve this using either of the following:
- One Database or Schema per Environment or
- One Snowflake Account per Environment
With this approach, you will have a single Snowflake account and create one database for each environment, as shown below:
You can also have a single database and create a separate schema for each environment:
However, creating a separate schema for each environment is not recommended for a couple of reasons. Firstly, code that is meant to run on a development schema may sometimes accidentally be run on the production schema, causing serious data issues. Secondly, using the same database for both production and development purposes will mean fine-tuning security for each schema, which can be a tedious and error-prone process. Lastly, you might want to leverage the database and schema constructs in both development and production environments, using a database or a schema as an environment would prevent you from using them to separate marketing and finance data for example.
Using separate databases (or schemas) for each environment has some advantages:
The point in time is useful when you want to recover data from a saved backup to your staging/ or production database from a specific point in time. This is particularly helpful if a developer makes a mistake, for example overriding the staging database with production copy while losing the last configuration updates that were meant to roll out on the next release.
Using separate databases (or schemas) for each environment also has some disadvantages:
You can also implement separate development and production environments in Snowflake by creating a separate account for each environment. You use the Snowflake Organization feature to set up the accounts under the same organization.
There are many advantages to having one Snowflake account per environment, such as:
There are some disadvantages to having one Snowflake account per environment, but these mainly affect the operation and the financial side and not the quality or security of your development cycle. These disadvantages include:
When choosing how to separate your development and production environments in Snowflake, it's important to consider the organization administrator role (ORGADMIN). In Snowflake, an organization is the object that links the accounts owned by your business entity, and the ORGADMIN is responsible for managing all operations on the organization level.
The ORGADMIN user will have the following benefits:
The ORGADMIN role is usually enabled for a single account in an organization if a user creates an account using the self-service portal. Users with an ACCOUNTADMIN role can assign the ORGADMIN role to a user.
Example of Assigning ORGADMIN role:
![Assigning ORGADMIN Role](https://i.imgur.com/Lllysdw.png)
The ORGADMIN role can only be enabled for one account; if you want to enable it for multiple accounts, you will need to contact Snowflake support. However, this practice is not recommended by Snowflake.
Now you know why you need separate application environments and some of the criteria to ensure such separation. You also learned how Snowflake, a robust cloud data warehouse, can implement environment separation for analytics workloads using either one database (or schema) or one Snowflake account per environment.
There are some advantages and disadvantages whether you choose to have separate Snowflake accounts per environment or one account for all environments. The best option for you depends on the use case and the budget you are working with. If you have a limited budget and often require repetitive cloning operations from different environments, then using the same Snowflake account for all your environments is the best fit. If you don't have a budget constraint and you want to limit production access, for example, due to data compliance and policies, then having separate Snowflake accounts is more suitable.
Propel Data is a serverless analytics API platform to power customer-product analytics for web and mobile applications. Propel's Analytics API platform integrates seamlessly with Snowflake so you can use the data you already have in your customer-facing applications.
To learn more, join Propel Data's waitlist.