In this post, we’ll take a deep dive into the ClickHouse Operator: what it does, when to use it, and how it works, illustrated with detailed examples.
What is the ClickHouse® Operator?
In the Kubernetes ecosystem, an operator is a software extension that encodes domain-specific operational knowledge in the Kubernetes API, usually as custom resources plus a controller that reconciles them. This lets Kubernetes automate complex tasks natively.
The ClickHouse Operator, maintained by Altinity, is a Kubernetes operator built to manage ClickHouse, an open-source column-oriented database management system known for fast, real-time analytical queries. Running ClickHouse on Kubernetes through the operator adds scalability, resilience, and orchestration capabilities, which makes it a strong choice for large-scale analytical workloads.
Why run ClickHouse® with Kubernetes?
Combining the power of ClickHouse with the robustness of Kubernetes yields several significant advantages.
- Scalability: Kubernetes can add or remove resources as workloads change, which complements ClickHouse's horizontal scalability through shards and replicas. Together they let your ClickHouse deployment grow with varying data loads.
- High Availability: Kubernetes restarts failed containers and reschedules pods away from failed nodes. Combined with ClickHouse replicas, this keeps your databases available and minimizes downtime.
- Efficient Resource Management: Kubernetes schedules pods according to their CPU and memory requests and limits, so your ClickHouse deployment makes good use of the available hardware.
- Orchestration: Kubernetes takes care of the orchestration of your ClickHouse deployment, automating tasks such as rollouts, rollbacks, and service discovery.
Features of the ClickHouse® Operator
The ClickHouse Operator is equipped with features to simplify the management of ClickHouse in a Kubernetes environment. These include:
- Automated cluster provisioning: The ClickHouse Operator can automatically provision new ClickHouse clusters with a single command. This means you can set up complex distributed databases with minimal manual intervention.
- Scaling: With the ClickHouse Operator, you can easily scale your ClickHouse clusters up or down. This is crucial for maintaining performance during peak load times and conserving resources during low-traffic periods.
- Monitoring: The ClickHouse Operator exports ClickHouse metrics in Prometheus format, so you can track your clusters' performance with tools such as Prometheus and Grafana. This allows for proactive problem detection and resolution.
- Updates and Upgrades: The ClickHouse Operator rolls out configuration changes and ClickHouse version upgrades to your clusters. You change the settings or the server image in the manifest, and the operator applies the change as a rolling update.
When to use the ClickHouse® Operator?
The ClickHouse Operator is useful for running ClickHouse in a Kubernetes environment. It is particularly beneficial when you manage multiple ClickHouse clusters or regularly scale your clusters with traffic patterns. And if you need a high level of automation for cluster management tasks such as updates and monitoring, the ClickHouse Operator is a strong fit.
If you choose not to use the ClickHouse Operator, managing ClickHouse on Kubernetes requires manual intervention. You must create and manage the Kubernetes resources yourself, such as StatefulSets, Services, and PersistentVolumeClaims. Scaling, monitoring, and upgrading your ClickHouse clusters must also be handled manually or with custom scripts, and you cannot provision new ClickHouse clusters automatically. So while running ClickHouse on Kubernetes without the operator is possible, it requires significantly more effort and expertise in Kubernetes administration.
How to install the ClickHouse® Operator
Before you can use the ClickHouse Operator, you need to install it in your Kubernetes environment. This section walks through two approaches: plain kubectl and Helm.
Prerequisites
Before you begin, ensure that you have the following:
- A Kubernetes cluster compatible with the ClickHouse Operator version you intend to install. Operator versions before 0.16.0 are compatible with Kubernetes 1.16 up to, but not including, 1.22. Versions 0.16.0 and later are compatible with Kubernetes 1.16 and later.
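- <span class="code-exp">kubectl</span>, configured to access that cluster.
- <span class="code-exp">curl</span>.
- Helm, if you plan to install the operator from the Helm chart.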
Installation via kubectl
The operator installation process is straightforward and involves deploying the ClickHouse Operator using its manifest directly from the GitHub repo:
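A minimal sketch of that command, assuming the all-in-one install bundle that the Altinity <span class="code-exp">clickhouse-operator</span> repository publishes under <span class="code-exp">deploy/operator/</span>. The URL tracks the <span class="code-exp">master</span> branch; swap in a release tag if you want to pin a specific operator version.

```shell
# All-in-one bundle: CRDs, RBAC objects, config maps, and the operator Deployment.
OPERATOR_BUNDLE_URL="https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml"

# Create (or update) every object defined in the bundle.
kubectl apply -f "${OPERATOR_BUNDLE_URL}"
```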
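kubectl prints one line for each object it creates, such as the <span class="code-exp">clickhouseinstallations.clickhouse.altinity.com</span> CRD, the <span class="code-exp">clickhouse-operator</span> service account, and the <span class="code-exp">clickhouse-operator</span> deployment.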
Verify the operator is running and is deployed in the <span class="code-exp">kube-system</span> namespace:
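A quick check, assuming the default bundle install into <span class="code-exp">kube-system</span>, which creates a Deployment named <span class="code-exp">clickhouse-operator</span>:

```shell
OPERATOR_NAMESPACE=kube-system

# Deployment status: READY should show 1/1 once the operator is up.
kubectl get deployment clickhouse-operator -n "${OPERATOR_NAMESPACE}"

# The operator pod itself; its STATUS column should read Running.
kubectl get pods -n "${OPERATOR_NAMESPACE}" | grep clickhouse-operator
```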
The <span class="code-exp">clickhouse-operator</span> pod should be listed with the status <span class="code-exp">Running</span>.
Installation via Helm
Starting with version 0.20.1, an official ClickHouse® Operator Helm chart is also available.
For installation:
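A sketch using the chart repository URL and chart name given in the Altinity README at the time of writing. The release name <span class="code-exp">clickhouse-operator</span> is our own choice, and without <span class="code-exp">--namespace</span> Helm installs into your current namespace.

```shell
# Register the chart repository under the local alias "clickhouse-operator".
helm repo add clickhouse-operator https://docs.altinity.com/clickhouse-operator/

# Install the chart as a release named "clickhouse-operator".
RELEASE=clickhouse-operator
CHART=clickhouse-operator/altinity-clickhouse-operator
helm install "${RELEASE}" "${CHART}"
```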
For upgrade:
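Under the same assumptions about the repository alias and chart name, an upgrade refreshes the local chart index and then upgrades the existing release in place:

```shell
RELEASE=clickhouse-operator
CHART=clickhouse-operator/altinity-clickhouse-operator

# Fetch the latest chart versions from all configured repositories.
helm repo update

# Upgrade the release to the newest chart version found.
helm upgrade "${RELEASE}" "${CHART}"
```

Note that Helm does not upgrade CRDs placed in a chart's <span class="code-exp">crds/</span> directory, so if the chart ships its CRDs there, check its release notes for CRD changes.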
For more details, see the official Helm chart for ClickHouse Operator.
Resources description
As part of the installation, several resources are created, including:
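- Custom Resource Definitions (CRDs): These extend the Kubernetes API with new kinds, most importantly ClickHouseInstallation, alongside ClickHouseInstallationTemplate and ClickHouseOperatorConfiguration. They allow you to manage resources of these kinds with standard Kubernetes tooling.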
- Service Account: The <span class="code-exp">clickhouse-operator</span> service account gives the ClickHouse Operator an identity for calling the Kubernetes API.
- Cluster Role Binding: This grants the permissions defined in a cluster role to a subject. In this case, the <span class="code-exp">cluster-admin</span> role is granted to the <span class="code-exp">clickhouse-operator</span> service account, giving it permissions across the whole cluster. Check the bundle you deploy, since recent versions ship a dedicated, narrower cluster role instead.
- Deployment: This deploys the ClickHouse Operator itself.
Now that you've installed the ClickHouse Operator, you can begin to use it to manage your ClickHouse instances in Kubernetes.
How does the ClickHouse® Operator work? – Deploy a cluster
Once the operator is installed, you can create a new ClickHouse® cluster by applying a YAML file describing your cluster's desired state.
Below is an example of a simple ClickHouse® cluster definition:
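A minimal definition consistent with the description that follows, written against the <span class="code-exp">clickhouse.altinity.com/v1</span> ClickHouseInstallation (CHI) schema. The heredoc simply saves it as <span class="code-exp">my-cluster.yaml</span>; pasting it into an editor works just as well.

```shell
# Minimal CHI manifest: one cluster named "my-cluster" with one shard and one replica.
cat > my-cluster.yaml <<'EOF'
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "my-cluster"
spec:
  configuration:
    clusters:
      - name: "my-cluster"
        layout:
          shardsCount: 1
          replicasCount: 1
EOF
```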
In this example, we're creating a new ClickHouse cluster named <span class="code-exp">my-cluster</span> with a single shard and a single replica. The operator will create the necessary resources, such as pods and services, to bring up the ClickHouse cluster. To create the cluster, we need to apply the manifest using <span class="code-exp">kubectl</span>. Copy the cluster definition above and save it locally to <span class="code-exp">my-cluster.yaml</span>. It’s common practice to run your components in a dedicated namespace. For this example, we’ll create a namespace called <span class="code-exp">clickhouse-test</span>. To create the namespace, run the following command.
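Creating the namespace is a single standard kubectl call:

```shell
NAMESPACE=clickhouse-test

# Create the namespace that will hold the test cluster.
kubectl create namespace "${NAMESPACE}"
```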
kubectl confirms with <span class="code-exp">namespace/clickhouse-test created</span>.
Now, let’s deploy our cluster using the YAML file we created.
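Apply the manifest into that namespace, assuming <span class="code-exp">my-cluster.yaml</span> is in the current directory. The operator picks up the new ClickHouseInstallation and creates the Kubernetes objects for it:

```shell
NAMESPACE=clickhouse-test

# Submit the CHI; the operator reconciles it into a StatefulSet, pod, and services.
kubectl apply -f my-cluster.yaml -n "${NAMESPACE}"
```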
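kubectl confirms with <span class="code-exp">clickhouseinstallation.clickhouse.altinity.com/my-cluster created</span>.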
Check that the cluster has been created and is running:
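Two checks: <span class="code-exp">chi</span> is the short name the operator's CRD registers for ClickHouseInstallation, and the pod list shows the ClickHouse server itself.

```shell
NAMESPACE=clickhouse-test

# The CHI resource, including the status reported by the operator.
kubectl get chi -n "${NAMESPACE}"

# Pods created for the installation.
kubectl get pods -n "${NAMESPACE}"
```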
You should see one pod, <span class="code-exp">chi-my-cluster-my-cluster-0-0-0</span>, in the “Running” state.
Check the services created by the operator:
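The services live in the same namespace as the cluster:

```shell
NAMESPACE=clickhouse-test

# List every Service in the namespace, with its type and external address.
kubectl get services -n "${NAMESPACE}"
```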
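You should see two services: <span class="code-exp">clickhouse-my-cluster</span>, a LoadBalancer service for the whole installation that receives an external address on a cloud provider such as AWS, and <span class="code-exp">chi-my-cluster-my-cluster-0-0</span>, a headless service for the single ClickHouse host.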
To interact with the cluster internally, you can execute the <span class="code-exp">clickhouse-client</span> command on the pod:
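For the single-pod cluster above, using the pod name given in the next paragraph:

```shell
NAMESPACE=clickhouse-test
POD=chi-my-cluster-my-cluster-0-0-0

# -it attaches a terminal so clickhouse-client starts in interactive mode.
kubectl exec -it "${POD}" -n "${NAMESPACE}" -- clickhouse-client
```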
This command opens a ClickHouse client session connected to the ClickHouse server running in the <span class="code-exp">chi-my-cluster-my-cluster-0-0-0</span> pod.
You can also access the cluster from outside Kubernetes via the <span class="code-exp">EXTERNAL-IP</span> of the LoadBalancer service reported above (<redacted>.us-east-2.elb.amazonaws.com).
You can run SQL queries directly from this prompt. For example, to get the version of ClickHouse you are running, you could use:
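At the interactive prompt, the query is simply <span class="code-exp">SELECT version();</span>. The sketch below runs the same query non-interactively with <span class="code-exp">clickhouse-client --query</span>, which is convenient in scripts:

```shell
NAMESPACE=clickhouse-test
POD=chi-my-cluster-my-cluster-0-0-0
QUERY="SELECT version()"

# No -it needed: --query runs one statement, prints the result, and exits.
kubectl exec "${POD}" -n "${NAMESPACE}" -- clickhouse-client --query "${QUERY}"
```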
Advanced ClickHouse Operator examples
The ClickHouse Operator allows for a high degree of customization and more complex configurations. Here's an example of a ClickHouse cluster with an encrypted, resizable AWS GP3 volume, one shard, and three replicas:
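A sketch of what such a setup could look like, under several assumptions: the AWS EBS CSI driver is installed (provisioner <span class="code-exp">ebs.csi.aws.com</span>), and the StorageClass name <span class="code-exp">gp3-enc</span> and the 10Gi size are hypothetical. <span class="code-exp">allowVolumeExpansion</span> is what makes the volume resizable. Replicated tables also need ZooKeeper or ClickHouse Keeper configured under <span class="code-exp">spec.configuration.zookeeper</span>, which is omitted here for brevity.

```shell
# StorageClass for encrypted, expandable gp3 volumes, plus a 1-shard, 3-replica CHI using it.
cat > pv-resize-enc.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-enc                  # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
allowVolumeExpansion: true       # lets the PVCs be grown later
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "pv-resize-enc"
spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: data-volume
  configuration:
    clusters:
      - name: "pv-resize-enc"
        layout:
          shardsCount: 1
          replicasCount: 3
  templates:
    volumeClaimTemplates:
      - name: data-volume
        spec:
          storageClassName: gp3-enc
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi      # hypothetical size
EOF
```

Apply both objects with <span class="code-exp">kubectl apply -f pv-resize-enc.yaml -n clickhouse-test</span>; the StorageClass is cluster-scoped, so the namespace only applies to the CHI.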
In this example, we're creating a new ClickHouse cluster named <span class="code-exp">pv-resize-enc</span> with an encrypted AWS GP3 EBS volume, one shard, and three replicas. With a single shard, every replica holds a full copy of the data, so the replicas provide redundancy and failover. For larger databases, you would add shards to distribute data across nodes and improve query performance, keeping replicas within each shard for redundancy.
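You can interact with the cluster in the same way as before, using the <span class="code-exp">clickhouse-client</span> command. Remember to specify the correct pod name: the operator names each pod <span class="code-exp">chi-{installation}-{cluster}-{shard}-{replica}-0</span>, with shard and replica indexes starting at 0, so the pod names depend on the number of shards and replicas in your cluster.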
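A small helper that lists the expected pod names, assuming the naming pattern above (the trailing <span class="code-exp">-0</span> is the StatefulSet ordinal, since each host gets its own StatefulSet). The function name <span class="code-exp">list_pods</span> is hypothetical.

```shell
# Print one pod name per ClickHouse host: chi-<installation>-<cluster>-<shard>-<replica>-0
list_pods() {
  chi=$1 cluster=$2 shards=$3 replicas=$4
  shard=0
  while [ "$shard" -lt "$shards" ]; do
    replica=0
    while [ "$replica" -lt "$replicas" ]; do
      echo "chi-${chi}-${cluster}-${shard}-${replica}-0"
      replica=$((replica + 1))
    done
    shard=$((shard + 1))
  done
}

# The pv-resize-enc example: one shard, three replicas.
PODS=$(list_pods pv-resize-enc pv-resize-enc 1 3)
echo "$PODS"
```

This prints <span class="code-exp">chi-pv-resize-enc-pv-resize-enc-0-0-0</span> through <span class="code-exp">chi-pv-resize-enc-pv-resize-enc-0-2-0</span>; for example, <span class="code-exp">kubectl exec -it chi-pv-resize-enc-pv-resize-enc-0-1-0 -n clickhouse-test -- clickhouse-client</span> connects to the second replica, assuming the cluster was deployed into <span class="code-exp">clickhouse-test</span>.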
This advanced example shows how the ClickHouse Operator handles more complex ClickHouse configurations declaratively, from storage classes to replica layout. See the ClickHouse Operator GitHub repo for examples of additional configurations.
ClickHouse® Operator configuration
The ClickHouse Operator has several settings and parameters that control its behavior and the configuration of ClickHouse clusters:
- Operator Settings: These settings dictate the behavior of the operator itself. They are initialized from three sources, in order: <span class="code-exp">/etc/clickhouse-operator/config.yaml</span>, the <span class="code-exp">etc-clickhouse-operator-files</span> configmap, and the <span class="code-exp">ClickHouseOperatorConfiguration</span> resource. Each later source is merged over the earlier ones, and changes to ClickHouseOperatorConfiguration resources are monitored and applied immediately.
- ClickHouse Common Configuration Files: These are ready-to-use XML files with sections of ClickHouse configuration. They typically contain general ClickHouse configuration sections, such as network listen endpoints, logger options, etc. They are exposed via config maps.
- ClickHouse User Configuration Files: These are ready-to-use XML files with sections of ClickHouse configuration. They typically contain ClickHouse configuration sections with user account specifications. They are exposed via config maps as well.
- ClickHouseOperatorConfiguration Resource: This is a Kubernetes custom resource that provides the ClickHouse Operator with its configuration.
- ClickHouseInstallationTemplates: The operator provides functionality to specify parts of the ClickHouseInstallation manifest as a set of templates, which are used in all ClickHouseInstallations.
Some specific settings include:
- Watch Namespaces: This setting allows you to specify the namespaces where the ClickHouse Operator watches for events. Multiple operators running concurrently should watch different namespaces.
- Additional Configuration Files: These settings allow you to specify the paths to folders containing various configuration files for ClickHouse.
- Cluster Create/Update/Delete Objects: These settings control the operator's behavior when creating, updating, or deleting Kubernetes objects, such as StatefulSets for ClickHouse clusters.
- ClickHouse Settings: These settings allow you to specify default values for ClickHouse user configurations, such as user profile, quota, network IP, and password.
- Operator's access to ClickHouse instances: These settings allow you to specify the ClickHouse credentials (username, password, and port) to be used by the operator to connect to ClickHouse instances for metrics requests, schema maintenance, and DROP DNS CACHE.
All these settings and parameters make the ClickHouse Operator a highly configurable tool for managing ClickHouse in a Kubernetes environment.
1. Operator Settings: For example, you might have the following settings in your <span class="code-exp">/etc/clickhouse-operator/config.yaml</span>:
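A sketch of such a file, assuming the nested key layout used by recent operator releases (older releases used flat keys such as <span class="code-exp">watchNamespaces</span>); compare it with the config.yaml shipped with your operator version before relying on any key. The heredoc writes a local excerpt for editing.

```shell
# Excerpt of the operator's config.yaml (nested layout of recent releases).
cat > config.yaml <<'EOF'
watch:
  # Namespaces to watch for ClickHouseInstallations; an empty list means all namespaces.
  namespaces: []
clickhouse:
  configuration:
    file:
      path:
        common: config.d   # common ClickHouse config files
        host: conf.d       # per-host config files
        user: users.d      # user config files
  access:
    username: clickhouse_operator
    port: 8123
reconcile:
  statefulSet:
    create:
      onFailure: ignore    # what to do if a StatefulSet cannot be created
    update:
      timeout: 300         # seconds to wait for an updated StatefulSet
      pollInterval: 5
      onFailure: rollback  # what to do if an update fails
EOF
```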
2. ClickHouse Common Configuration Files: An example configuration file might look like this:
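A sketch of a common configuration file, assuming the <span class="code-exp">config.d</span> folder used for common files; the file name is hypothetical. Recent ClickHouse versions use clickhouse as the XML root element, while older files use yandex.

```shell
# Hypothetical common config file: network listen endpoints and logger options.
mkdir -p config.d
cat > config.d/01-listen-and-logger.xml <<'EOF'
<clickhouse>
    <!-- Accept connections on all IPv4 interfaces so other pods can reach the server. -->
    <listen_host>0.0.0.0</listen_host>
    <!-- Do not fail at startup if an address family is unavailable. -->
    <listen_try>1</listen_try>
    <logger>
        <level>information</level>
        <!-- Log to the console so kubectl logs can show it. -->
        <console>1</console>
    </logger>
</clickhouse>
EOF
```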
3. ClickHouse User Configuration Files: An example configuration file might look like this:
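A sketch of a user configuration file, assuming the <span class="code-exp">users.d</span> folder; the profile name, user name, subnet, and fallback password are hypothetical. The password hash is computed with <span class="code-exp">sha256sum</span> from GNU coreutils (on macOS, <span class="code-exp">shasum -a 256</span> is the usual substitute).

```shell
# Hypothetical users.d file: a read-only profile and a user restricted to one subnet.
mkdir -p users.d
REPORTER_PASSWORD=${REPORTER_PASSWORD:-change-me}
REPORTER_HASH=$(printf '%s' "$REPORTER_PASSWORD" | sha256sum | cut -d' ' -f1)

# Unquoted EOF so that ${REPORTER_HASH} is expanded into the file.
cat > users.d/reporter.xml <<EOF
<clickhouse>
    <profiles>
        <readonly_profile>
            <readonly>1</readonly>   <!-- allow only read queries -->
        </readonly_profile>
    </profiles>
    <users>
        <reporter>
            <password_sha256_hex>${REPORTER_HASH}</password_sha256_hex>
            <networks>
                <ip>10.0.0.0/8</ip>   <!-- hypothetical client subnet -->
            </networks>
            <profile>readonly_profile</profile>
            <quota>default</quota>
        </reporter>
    </users>
</clickhouse>
EOF
```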
4. ClickHouseOperatorConfiguration Resource: An example resource might look like this:
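A sketch of such a resource, assuming its <span class="code-exp">spec</span> mirrors the nested config.yaml layout; the resource name and namespaces are hypothetical.

```shell
# Hypothetical ClickHouseOperatorConfiguration: restrict the operator to two namespaces.
cat > chop-config.yaml <<'EOF'
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseOperatorConfiguration"
metadata:
  name: "chop-config-01"
spec:
  # Same structure as config.yaml; keys set here are merged over the earlier sources.
  watch:
    namespaces:
      - dev
      - test
EOF
```

The resource is typically created in the operator's own namespace, for example <span class="code-exp">kubectl apply -f chop-config.yaml -n kube-system</span> for the bundle install.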
5. ClickHouseInstallationTemplates: An example template might look like this:
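A sketch of a template that supplies a default data volume, using the same <span class="code-exp">templates</span> and <span class="code-exp">defaults</span> sections as a ClickHouseInstallation; the template name, volume name, and size are hypothetical.

```shell
# Hypothetical ClickHouseInstallationTemplate: a default data volume for installations using it.
cat > chit-default-volume.yaml <<'EOF'
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallationTemplate"
metadata:
  name: "default-volume-template"
spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: default-volume
  templates:
    volumeClaimTemplates:
      - name: default-volume
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
EOF
```

A ClickHouseInstallation can also reference the template explicitly by name under <span class="code-exp">spec.useTemplates</span>.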
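Specific settings, shown with the nested key paths used by recent operator releases (older releases used flat keys such as <span class="code-exp">watchNamespaces</span>):
- Watch Namespaces: For example, <span class="code-exp">watch.namespaces: [dev, test]</span> limits the operator to the <span class="code-exp">dev</span> and <span class="code-exp">test</span> namespaces.
- Additional Configuration Files: For example, <span class="code-exp">clickhouse.configuration.file.path.common: config.d</span>, together with <span class="code-exp">host: conf.d</span> and <span class="code-exp">user: users.d</span> under the same path, points the operator at the folders for common, host-specific, and user configuration files.
- Cluster Create/Update/Delete Objects: For example, <span class="code-exp">reconcile.statefulSet.update.onFailure: rollback</span> rolls a StatefulSet back if its update fails, and <span class="code-exp">reconcile.statefulSet.update.timeout: 300</span> sets how many seconds the operator waits for the update.
- ClickHouse Settings: For example, <span class="code-exp">clickhouse.configuration.user.default.profile: default</span> and <span class="code-exp">clickhouse.configuration.user.default.quota: default</span> set the profile and quota applied to users that do not specify their own, while <span class="code-exp">networksIP</span> and <span class="code-exp">password</span> under the same key set their default networks and password.
- Operator's access to ClickHouse instances: For example, <span class="code-exp">clickhouse.access.username: clickhouse_operator</span> and <span class="code-exp">clickhouse.access.port: 8123</span>, with the password preferably read from a secret referenced by <span class="code-exp">clickhouse.access.secret</span>.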
Security hardening for the ClickHouse® Operator
The ClickHouse Operator's security model provides a solid foundation for maintaining a secure environment for your ClickHouse instances on Kubernetes. However, it's essential to understand the model and apply the necessary hardening measures to ensure maximum protection against potential threats.
Every ClickHouse installation deployed by the operator comes with two users, <span class="code-exp">default</span> and <span class="code-exp">clickhouse_operator</span>, both protected by network restriction rules that block unauthorized access.
Securing the 'default' user
The <span class="code-exp">default</span> user connects to the ClickHouse instance from the pod where it's running and is used for distributed queries. By default, this user has no password. To secure the <span class="code-exp">default</span> user, the operator applies network security rules that restrict connections to the pods running the ClickHouse cluster.
Securing the 'clickhouse_operator' user
The <span class="code-exp">clickhouse_operator</span> user is used by the operator itself to perform DML operations when adding or removing ClickHouse replicas and shards and to collect monitoring data. The user and password values are stored in a secret. It is recommended not to include these credentials directly in the operator configuration without a secret.
To change the <span class="code-exp">clickhouse_operator</span> user password, you can modify the <span class="code-exp">etc-clickhouse-operator-files</span> config map or create a ClickHouseOperatorConfiguration object. The operator restricts access to this user using an IP mask.
Securing additional ClickHouse users
For additional ClickHouse users, whether created with the SQL <span class="code-exp">CREATE USER</span> statement or in a dedicated section of <span class="code-exp">ClickHouseInstallation</span>, you should ensure that passwords are not exposed. User passwords can be specified in plain text or as a SHA256 hash in hex format.
However, specifying passwords in plain text is not recommended, even though the operator hashes them when deploying to ClickHouse. It is better to provide hashes explicitly as follows:
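A sketch using the <span class="code-exp">user/setting</span> key syntax of the CHI <span class="code-exp">users</span> section; the user name <span class="code-exp">analyst</span> and the fallback password are hypothetical, and the hash is computed locally with <span class="code-exp">sha256sum</span> so the plain-text password never lands in the manifest.

```shell
# Compute the SHA-256 hex digest of the (hypothetical) analyst password.
ANALYST_PASSWORD=${ANALYST_PASSWORD:-change-me}
HASH=$(printf '%s' "$ANALYST_PASSWORD" | sha256sum | cut -d' ' -f1)

# Unquoted EOF so that ${HASH} is expanded into the manifest.
cat > users-hashed.yaml <<EOF
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "my-cluster"
spec:
  configuration:
    users:
      analyst/password_sha256_hex: "${HASH}"
    clusters:
      - name: "my-cluster"
        layout:
          shardsCount: 1
          replicasCount: 1
EOF
```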
For enhanced security, the operator supports reading passwords and password hashes from a secret as follows:
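A sketch assuming the <span class="code-exp">k8s_secret_password_sha256_hex</span> user setting described in the operator's security hardening guide, whose value is <span class="code-exp">secret-name/key</span>; verify the syntax for your operator version. The secret name, key, user name, and fallback password are hypothetical.

```shell
# Hash the (hypothetical) password locally; only the hash is stored in the secret.
ANALYST_PASSWORD=${ANALYST_PASSWORD:-change-me}
HASH=$(printf '%s' "$ANALYST_PASSWORD" | sha256sum | cut -d' ' -f1)

# Unquoted EOF so that ${HASH} is expanded into the Secret.
cat > users-from-secret.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-credentials
type: Opaque
stringData:
  analyst_password_sha256_hex: "${HASH}"
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "my-cluster"
spec:
  configuration:
    users:
      # <secret name>/<key>; the operator looks the value up in the secret.
      analyst/k8s_secret_password_sha256_hex: clickhouse-credentials/analyst_password_sha256_hex
    clusters:
      - name: "my-cluster"
        layout:
          shardsCount: 1
          replicasCount: 1
EOF
```

Apply it into the cluster's namespace (for example <span class="code-exp">kubectl apply -f users-from-secret.yaml -n clickhouse-test</span>) so the secret and the ClickHouseInstallation live side by side.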
Adhering to these guidelines ensures that your ClickHouse Operator and the ClickHouse instances it manages are secure and protected against unauthorized access.
Please refer to the official ClickHouse operator hardening guide for detailed instructions and examples.
Build faster with Propel: A Serverless ClickHouse® for developers
At Propel, we offer a fully managed ClickHouse® service that allows you to focus more on drawing insights from your data and less on infrastructure management. Propel provides data-serving APIs, React components, and built-in multi-tenant access controls, making it easier and faster for you to build data-intensive applications.
You can connect your own ClickHouse® with Propel, whether it's self-hosted or on the ClickHouse Cloud, or take advantage of our fully managed serverless cloud.
Conclusion
The ClickHouse® Operator is a powerful tool for managing ClickHouse® in a Kubernetes environment. It offers a suite of features like automated cluster provisioning, scaling, monitoring, and rolling upgrades. Whether you're running a single ClickHouse instance or managing a fleet of ClickHouse clusters, the ClickHouse Operator can simplify your operations and make your life easier.
For more information on how to use the ClickHouse Operator, be sure to check out the official ClickHouse Operator documentation.
Further reading
For more insights on how to use ClickHouse for your data operations, check out our other posts. We cover a wide range of topics, from advanced querying techniques to performance tuning for large datasets. Whether you're a beginner or an experienced data professional, there's always something new to learn!