# What is Claudie (https://docs.claudie.io/latest/\#what-is-claudie)

Claudie is a platform for managing multi-cloud and hybrid-cloud Kubernetes clusters. These Kubernetes clusters can mix and match nodepools from various cloud providers, e.g. a single cluster can have a nodepool in AWS, another in GCP and another one on-premises. This is our opinionated way to build multi-cloud and hybrid-cloud Kubernetes infrastructure. On top of that, Claudie supports the Cluster Autoscaler on the managed clusters.

## Vision (https://docs.claudie.io/latest/\#vision)

The purpose of Claudie is to become the final Kubernetes engine you'll ever need. It aims to build clusters that leverage features and costs across multiple cloud vendors and on-prem datacenters. A Kubernetes that you won't ever need to migrate away from.

## Use cases (https://docs.claudie.io/latest/\#use-cases)

Claudie has been built as an answer to the following Kubernetes challenges:

- Cost savings
- Data locality & compliance (e.g. GDPR)
- Managed Kubernetes for providers that do not offer it
- Cloud bursting
- Service interconnect

You can read more [here](https://docs.claudie.io/latest/use-cases/use-cases/).

## Features (https://docs.claudie.io/latest/\#features)

Claudie covers you with the following features and functionalities:

- Manage multi-cloud and hybrid-cloud Kubernetes clusters
- Management via IaC
- Fast scale-up/scale-down of your infrastructure
- Loadbalancing
- Persistent storage volumes

See more in the How Claudie works section.

## What to do next (https://docs.claudie.io/latest/\#what-to-do-next)

In case you are not sure where to go next, you can simply start with our [Getting Started Guide](https://docs.claudie.io/latest/getting-started/get-started-using-claudie/) or read our documentation [sitemap](https://docs.claudie.io/latest/sitemap/sitemap/).

If you need help or want to chat with us, feel free to join our [Slack channel](https://join.slack.com/t/claudieio-workspace/shared_invite/zt-365u5692o-hwb2IEwPHDe6U~bpXkIAdg).

# Sitemap (https://docs.claudie.io/latest/sitemap/sitemap/\#sitemap)

This section contains brief descriptions of the main parts of Claudie's documentation.

## Getting Started (https://docs.claudie.io/latest/sitemap/sitemap/\#getting-started)

The "Getting Started" section is where you'll learn how to begin using Claudie. We'll guide you through the initial steps and show you how to set things up, so you can start using the software right away. You'll also find helpful information on how to customize Claudie to suit your needs, including specifications for the settings you can adjust, and examples of how to use configuration files to get started. By following the steps in this section, you'll have everything you need to start using Claudie with confidence!

## Input manifest (https://docs.claudie.io/latest/sitemap/sitemap/\#input-manifest)

This section contains example YAML files for the InputManifest CRD that tell Claudie what the infrastructure should look like. Besides these files, you can also find an API reference for the InputManifest CRD there.

## How Claudie works (https://docs.claudie.io/latest/sitemap/sitemap/\#how-claudie-works)

In this section, we'll show you how Claudie works and guide you through our workflow.
We'll explain how we store and manage data, balance the workload across different parts of the system, and automatically adjust resources to handle changes in demand. By following our explanations, you'll gain a better understanding of how Claudie operates and be better equipped to use it effectively.

## Claudie Use Cases (https://docs.claudie.io/latest/sitemap/sitemap/\#claudie-use-cases)

The "Claudie Use Cases" section includes examples of different ways you can use Claudie to solve various problems. We've included these examples to help you understand the full range of capabilities Claudie offers and to show you how it can be applied in different scenarios. By exploring these use cases, you'll get a better sense of how Claudie can be a valuable tool for your work.

## FAQ (https://docs.claudie.io/latest/sitemap/sitemap/\#faq)

You may find helpful answers in our FAQ section.

## Roadmap for Claudie (https://docs.claudie.io/latest/sitemap/sitemap/\#roadmap-for-claudie)

In this section, you'll find a roadmap for Claudie that outlines the features we've already added and those we plan to add in the future. By checking out the roadmap, you'll be able to stay informed about the latest updates and see how Claudie is evolving to meet the needs of its users.

## Contributing (https://docs.claudie.io/latest/sitemap/sitemap/\#contributing)

In this section, we've gathered all the information you'll need if you want to help contribute to the Claudie project or release a new version of the software. By checking out this section, you'll get a better sense of what's involved in contributing and how you can be part of making Claudie even better.

## Changelog (https://docs.claudie.io/latest/sitemap/sitemap/\#changelog)

The "Changelog" section is where you can find information about all the changes, updates, and issues related to each version of Claudie.

## Latency limitations (https://docs.claudie.io/latest/sitemap/sitemap/\#latency-limitations)

In this section, we describe the latency limitations which you should take into account when designing your infrastructure.

## Troubleshooting (https://docs.claudie.io/latest/sitemap/sitemap/\#troubleshooting)

In case you run into issues, we recommend following some of the troubleshooting guides in this section.

## Creating Claudie Backup (https://docs.claudie.io/latest/sitemap/sitemap/\#creating-claudie-backup)

This section describes the steps to back up Claudie and its dependencies.

## Claudie Hardening (https://docs.claudie.io/latest/sitemap/sitemap/\#claudie-hardening)

This section describes how to further configure the default Claudie deployment. It is highly recommended that you read this section.

## Prometheus Monitoring (https://docs.claudie.io/latest/sitemap/sitemap/\#prometheus-monitoring)

In this section we walk you through the setup of Claudie's Prometheus metrics to gain visibility into the various metrics that Claudie exposes.

## Updating Claudie (https://docs.claudie.io/latest/sitemap/sitemap/\#updating-claudie)

This section describes how to execute updates in Claudie, such as OS or Kubernetes version updates.

## Deploying Node-Local-DNS (https://docs.claudie.io/latest/sitemap/sitemap/\#deploying-node-local-dns)

Claudie doesn't deploy Node-Local-DNS by default, so you have to install it independently. This section provides a step-by-step guide on how to do it.

## Command Cheat Sheet (https://docs.claudie.io/latest/sitemap/sitemap/\#command-cheat-sheet)

The "Command Cheat Sheet" section contains useful `kubectl` commands for interacting with Claudie.
## Version matrix (https://docs.claudie.io/latest/sitemap/sitemap/\#version-matrix)

In this section, you can find the supported Kubernetes and OS versions for the latest Claudie versions.

# Getting started

## Get started using Claudie(https://docs.claudie.io/latest/getting-started/get-started-using-claudie/\#get-started-using-claudie)

### Prerequisites(https://docs.claudie.io/latest/getting-started/get-started-using-claudie/\#prerequisites)

Before you begin, please make sure you have the following prerequisites installed and set up:

1. Claudie needs to be installed on an existing Kubernetes cluster, referred to as the _Management Cluster_, which it uses to manage the clusters it provisions. For testing, you can use ephemeral clusters like Minikube or Kind. However, for production environments, we recommend using a more resilient solution since Claudie maintains the state of the infrastructure it creates.
2. Claudie requires the installation of cert-manager in your Management Cluster. To install cert-manager, use the following command:

```
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml
```

### Supported providers(https://docs.claudie.io/latest/getting-started/get-started-using-claudie/\#supported-providers)

| Supported Provider | Node Pools | DNS | DNS healthchecks |
| --- | --- | --- | --- |
| [AWS](https://docs.claudie.io/latest/input-manifest/providers/aws/) | ✔ | ✔ | ✔ |
| [Azure](https://docs.claudie.io/latest/input-manifest/providers/azure/) | ✔ | ✔ | ✔ |
| [GCP](https://docs.claudie.io/latest/input-manifest/providers/gcp/) | ✔ | ✔ | ✔ |
| [OCI](https://docs.claudie.io/latest/input-manifest/providers/oci/) | ✔ | ✔ | ✔ |
| [Hetzner](https://docs.claudie.io/latest/input-manifest/providers/hetzner/) | ✔ | ✔ | N/A |
| [Cloudflare](https://docs.claudie.io/latest/input-manifest/providers/cloudflare/) | N/A | ✔ | ✔ |

To add support for other cloud providers, open an issue or propose a PR.

### Install Claudie(https://docs.claudie.io/latest/getting-started/get-started-using-claudie/\#install-claudie)
1. Deploy Claudie to the Management Cluster:

```
kubectl apply -f https://github.com/berops/claudie/releases/latest/download/claudie.yaml
```

To further harden Claudie, you may want to deploy our pre-defined network policies:

```
# for clusters using cilium as their CNI
kubectl apply -f https://github.com/berops/claudie/releases/latest/download/network-policy-cilium.yaml
```

```
# other
kubectl apply -f https://github.com/berops/claudie/releases/latest/download/network-policy.yaml
```

### Deploy your cluster(https://docs.claudie.io/latest/getting-started/get-started-using-claudie/\#deploy-your-cluster)

1. Create a Kubernetes Secret resource for your provider configuration.

```
kubectl create secret generic example-aws-secret-1 \
  --namespace=mynamespace \
  --from-literal=accesskey='myAwsAccessKey' \
  --from-literal=secretkey='myAwsSecretKey'
```

Check the [supported providers](https://docs.claudie.io/latest/getting-started/get-started-using-claudie/#supported-providers) for input manifest examples. For an input manifest spanning all supported hyperscalers, check out [this example](https://docs.claudie.io/latest/input-manifest/example/).

2. Deploy the InputManifest resource, which Claudie uses to create infrastructure, and include the created secret in `.spec.providers` as follows:

```
kubectl apply -f - <<EOF
# your InputManifest definition, referencing the created secret in .spec.providers
EOF
```

Once the cluster is provisioned, Claudie stores its kubeconfig in a secret whose name ends with `-kubeconfig` in the namespace where it is deployed:

1. Recover the kubeconfig of your cluster by running:

```
kubectl get secrets -n claudie -l claudie.io/output=kubeconfig -o jsonpath='{.items[0].data.kubeconfig}' | base64 -d > your_kubeconfig.yaml
```

2. Use your new kubeconfig:

```
kubectl get pods -A --kubeconfig=your_kubeconfig.yaml
```

### Cleanup(https://docs.claudie.io/latest/getting-started/get-started-using-claudie/\#cleanup)

1. To remove your cluster and its associated infrastructure, delete the cluster definition block from the InputManifest and re-apply it:

```
kubectl apply -f - <<EOF
# the InputManifest with the cluster definition block removed
EOF
```

The following steps continue the [detailed guide](https://docs.claudie.io/latest/getting-started/detailed-guide/) example, in which the provisioned cluster is named `my-super-cluster`. Recover its kubeconfig by running:

```
kubectl get secrets -n claudie -l claudie.io/output=kubeconfig -o jsonpath='{.items[0].data.kubeconfig}' | base64 -d > my-super-cluster-kubeconfig.yaml
```

If you want to connect to your dynamic k8s nodes via SSH, you can recover the private SSH key:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_nodepools | map_values(.nodepool_private_key)'
```

To recover the public IPs of your dynamic k8s nodes to connect to via SSH:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r .dynamic_nodepools.node_ips
```

In case you want to connect to your dynamic load balancer nodes via SSH, you can recover the private SSH key:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_load_balancer_nodepools | .[]'
```

To recover the public IP addresses of your dynamic load balancer nodes to connect to via SSH:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r '.dynamic_load_balancer_nodepools[] | .node_ips'
```

Each secret created by Claudie has the following labels:

| Key | Value |
| --- | --- |
| `claudie.io/project` | Name of the project. |
| `claudie.io/cluster` | Name of the cluster. |
| `claudie.io/cluster-id` | ID of the cluster. |
| `claudie.io/output` | Output type, either `kubeconfig` or `metadata`. |

11. Use your new kubeconfig to see what's in your new cluster:

```
kubectl get pods -A --kubeconfig=my-super-cluster-kubeconfig.yaml
```

12. Let's add a bursting autoscaling node pool in Hetzner Cloud.
In order to use other hyperscalers, we'll need to add a new provider with the appropriate credentials. First we create a provider secret for Hetzner Cloud, then we open the `inputmanifest-bursting.yaml` input manifest again and append the new Hetzner node pool configuration.

```
# Hetzner provider requires the secrets to have field: credentials
kubectl create secret generic hetzner-secret-1 --namespace=mynamespace --from-literal=credentials='kslISA878a6etYAfXYcg5iYyrFGNlCxcICo060HVEygjFs21nske76ksjKko21lp'
```

**Claudie autoscaling:** The autoscaler in Claudie is deployed in the Claudie management cluster and provisions additional resources remotely when they are needed. For more information, check out how [Claudie autoscaling](https://docs.claudie.io/latest/autoscaling/autoscaling.md) works.

```
# inputmanifest-bursting.yaml
apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: cloud-bursting
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  providers:
    - name: hetzner-1 # add under nodePools.dynamic section
      providerType: hetzner
      secretRef:
        name: hetzner-secret-1
        namespace: mynamespace
  nodePools:
    dynamic:
      ...
      - name: hetzner-worker # add under nodePools.dynamic section
        providerSpec:
          name: hetzner-1 # use your new hetzner provider hetzner-1 to create these nodes
          region: hel1
          zone: hel1-dc2
        serverType: cpx52
        image: ubuntu-22.04
        autoscaler: # this node pool uses a claudie autoscaler instead of static count of nodes
          min: 1
          max: 10
  kubernetes:
    clusters:
      - name: my-super-cluster
        version: v1.31.0
        network: 192.168.2.0/24
        pools:
          control:
            - aws-control
          compute:
            - aws-worker
            - hetzner-worker # add it to the compute list here
  ...
```

13. Update the CRD with the new InputManifest to incorporate the desired changes.

**Deleting existing secrets!** Deleting or replacing existing input manifest secrets triggers cluster deletion! To add new components to your existing clusters, generate a new secret value and apply it using the following command.

```
kubectl apply -f ./inputmanifest-bursting.yaml
```

14. You can also pass through additional ports from the load balancers to the control plane and/or worker node pools by adding additional roles under `roles`.

```
# inputmanifest-bursting.yaml
apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: cloud-bursting
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  ...
  loadBalancers:
    roles:
      - name: apiserver
        protocol: tcp
        port: 6443
        targetPort: 6443
        targetPools: # only load balances port 6443 for the aws-control nodepool
          - aws-control
      - name: https
        protocol: tcp
        port: 443
        targetPort: 443
        targetPools: # only load balances port 443 for the aws-worker nodepool
          - aws-worker # possible to add other nodepools, hetzner-worker, for example
    clusters:
      - name: loadbalance-me
        roles:
          - apiserver
          - https # define it here
        dns:
          dnsZone: domain.com
          provider: aws-dns
          hostname: supercluster
        targetedK8s: my-super-cluster
        pools:
          - aws-lb
```

**Load balancing:** Please refer to our [documentation](https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/) to see how our load balancing works.

15. Update the InputManifest again with the new configuration.

```
kubectl apply -f ./inputmanifest-bursting.yaml
```

16. To delete the cluster, simply delete the InputManifest and wait for Claudie to destroy it.

```
kubectl delete -f ./inputmanifest-bursting.yaml
```

**Removing clusters:** Deleting Claudie or the management cluster does not remove the Claudie-managed clusters. Delete the InputManifest first to initiate Claudie's deletion process.
17. After the Claudie operator finishes the deletion workflow, delete the kind cluster:

```
kind delete cluster
```

## General tips(https://docs.claudie.io/latest/getting-started/detailed-guide/\#general-tips)

### Control plane considerations(https://docs.claudie.io/latest/getting-started/detailed-guide/\#control-plane-considerations)

- **Single Control Plane Node:** A node pool with one machine manages your cluster.
- **Multiple Control Plane Nodes:** A control plane node pool that has more than one node.
- **Load Balancer Requirement:** A load balancer is optional for a high-availability setup; however, we recommend it. Include an additional node pool for the load balancers.
- **DNS Requirement:** If you want to use load balancing, you will need a registered domain name and a hosted zone. Claudie creates a failover DNS record for the load balancer machines.
- **Supported DNS providers:** If your DNS provider is not supported, delegate a subdomain to a supported DNS provider; refer to the supported DNS providers.
- **Egress Traffic:** Hyperscalers charge for outbound data and multi-region infrastructure. To avoid egress traffic, deploy control plane node pools in the same region at one hyperscaler. If availability is more important than egress traffic costs, you can have multiple control plane node pools spanning different hyperscalers.

### Egress traffic(https://docs.claudie.io/latest/getting-started/detailed-guide/\#egress-traffic)

Hyperscalers charge for outbound data and multi-region infrastructure.

- **Control plane:** To avoid egress traffic, deploy control plane node pools in the same region at one hyperscaler. If availability is more important than egress traffic costs, you can have multiple control plane node pools spanning different hyperscalers.
- **Workloads:** Egress costs associated with workloads are more complicated, as they depend on each use case. What we recommend is to try to use localised workloads where possible.

**Example:** Consider a scenario where you have a workload that involves processing extensive datasets from GCP storage using Claudie-managed AWS GPU instances. To minimize egress network traffic costs, it is recommended to host the datasets in an S3 bucket, limiting egress traffic from GCP and keeping the workload localised.

### On your own path(https://docs.claudie.io/latest/getting-started/detailed-guide/\#on-your-own-path)

Once you've gained a comprehensive understanding of how Claudie operates through this guide, you can deploy it to a reliable management cluster; this could be a cluster that you already have. Tailor your input manifest file to suit your specific requirements, and explore a detailed example showcasing providers, load balancing, and DNS records across various hyperscalers by visiting this [comprehensive example](https://docs.claudie.io/latest/input-manifest/example/).

## Claudie customization(https://docs.claudie.io/latest/getting-started/detailed-guide/\#claudie-customization)

All of the customisable settings can be found in the `claudie/.env` file.

| Variable | Default | Type | Description |
| --- | --- | --- | --- |
| `GOLANG_LOG` | `info` | string | Log level for all services. Can be either `info` or `debug`. |
| `HTTP_PROXY_MODE` | `default` | string | `default`, `on` or `off`. `default` utilizes the HTTP proxy only when there's at least one node in the K8s cluster from the Hetzner cloud provider. `on` uses the HTTP proxy even when the K8s cluster doesn't have any nodes from Hetzner. `off` turns off the usage of the HTTP proxy. If the value isn't set or differs from `on` or `off`, it always works as `default`. |
| `HTTP_PROXY_URL` | `http://proxy.claudie.io:8880` | string | HTTP proxy URL used in the kubeone [proxy configuration](https://docs.kubermatic.com/kubeone/latest/guides/proxy/) to build the K8s cluster. |
| `DATABASE_HOSTNAME` | `mongodb` | string | Database hostname used for Claudie configs. |
| `MANAGER_HOSTNAME` | `manager` | string | Manager service hostname. |
| `TERRAFORMER_HOSTNAME` | `terraformer` | string | Terraformer service hostname. |
| `ANSIBLER_HOSTNAME` | `ansibler` | string | Ansibler service hostname. |
| `KUBE_ELEVEN_HOSTNAME` | `kube-eleven` | string | Kube-eleven service hostname. |
| `KUBER_HOSTNAME` | `kuber` | string | Kuber service hostname. |
| `MINIO_HOSTNAME` | `minio` | string | MinIO hostname used for state files. |
| `AWS_REGION` | `local` | string | Region for MinIO. |
| `DATABASE_PORT` | 27017 | int | Port of the database service. |
| `TERRAFORMER_PORT` | 50052 | int | Port of the Terraformer service. |
| `ANSIBLER_PORT` | 50053 | int | Port of the Ansibler service. |
| `KUBE_ELEVEN_PORT` | 50054 | int | Port of the Kube-eleven service. |
| `MANAGER_PORT` | 50055 | int | Port of the Manager service. |
| `KUBER_PORT` | 50057 | int | Port of the Kuber service. |
| `MINIO_PORT` | 9000 | int | Port of the MinIO service. |

# AWS(https://docs.claudie.io/latest/input-manifest/providers/aws/\#aws)

The AWS cloud provider requires you to input the credentials as an `accesskey` and a `secretkey`.

## Compute and DNS example(https://docs.claudie.io/latest/input-manifest/providers/aws/\#compute-and-dns-example)

```
apiVersion: v1
kind: Secret
metadata:
  name: aws-secret
data:
  accesskey: U0xEVVRLU0hGRE1TSktESUFMQVNTRA==
  secretkey: aXVoYk9JSk4rb2luL29saWtEU2Fkc25vaVNWU0RzYWNvaW5PVVNIRA==
type: Opaque
```

## Create AWS credentials(https://docs.claudie.io/latest/input-manifest/providers/aws/\#create-aws-credentials)

### Prerequisites(https://docs.claudie.io/latest/input-manifest/providers/aws/\#prerequisites)

1. Install the AWS CLI tools by following [this guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
2. Set up the AWS CLI on your machine by following [this guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html).
3. Ensure that the regions you're planning to use are enabled in your AWS account. You can check the available regions using [this guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions), and you can enable them using [this guide](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-regions.html). Otherwise, you may encounter a misleading error suggesting your STS token is invalid.

### Creating AWS credentials for Claudie(https://docs.claudie.io/latest/input-manifest/providers/aws/\#creating-aws-credentials-for-claudie)

1. Create a user using the AWS CLI:

```
aws iam create-user --user-name claudie
```
2. Create a policy document with the compute and DNS permissions required by Claudie:

```
cat > policy.json <<EOF
# the policy JSON with the compute and DNS permissions required by Claudie
EOF
```

# Azure(https://docs.claudie.io/latest/input-manifest/providers/azure/\#azure)

Assign the required roles to the `claudie-sp` service principal, scoped to your subscription:

```
{
az role assignment create --assignee claudie-sp --role "Network Contributor" --scope /subscriptions/<subscription-id>
az role assignment create --assignee claudie-sp --role "Resource Group Management" --scope /subscriptions/<subscription-id>
}
```

**Use a built-in role as an alternative to the custom role:** If you're not using the custom **Resource Group Management** role, assign the built-in role **Kubernetes Agent Subscription Level Operator**.

## DNS requirements(https://docs.claudie.io/latest/input-manifest/providers/azure/\#dns-requirements)

If you wish to use Azure as your DNS provider, where Claudie creates DNS records pointing to Claudie-managed clusters, you will need to create a **public DNS zone** by following [this guide](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal#prerequisites).

**Azure is not my domain registrar:** If you haven't acquired a domain via Azure and wish to utilize Azure for hosting your zone, you can refer to [this guide](https://learn.microsoft.com/en-us/azure/dns/dns-delegate-domain-azure-dns#retrieve-name-servers) on Azure nameservers. However, if you prefer not to use the entire domain, an alternative option is to delegate a subdomain to Azure.

## Input manifest examples(https://docs.claudie.io/latest/input-manifest/providers/azure/\#input-manifest-examples)

### Single provider, multi region cluster example(https://docs.claudie.io/latest/input-manifest/providers/azure/\#single-provider-multi-region-cluster-example)

#### Create a secret for Azure provider(https://docs.claudie.io/latest/input-manifest/providers/azure/\#create-a-secret-for-azure-provider)

The secret for an Azure provider must include the following mandatory fields: `clientsecret`, `subscriptionid`, `tenantid`, and `clientid`.

```
kubectl create secret generic azure-secret-1 --namespace=mynamespace --from-literal=clientsecret='Abcd~EFg~H6Ijkls~ABC15sEFGK54s78X~Olk9' --from-literal=subscriptionid='6a4dfsg7-sd4v-f4ad-dsva-ad4v616fd512' --from-literal=tenantid='54cdafa5-sdvs-45ds-546s-df651sfdt614' --from-literal=clientid='0255sc23-76we-87g6-964f-abc1def2gh3l'
```

```
apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: azure-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: azure-1 providerType: azure secretRef: name: azure-secret-1 namespace: mynamespace nodePools: dynamic: - name: control-az providerSpec: # Name of the provider instance. name: azure-1 # Location of the nodepool. region: North Europe # Zone of the nodepool. zone: "1" count: 2 # VM size name. serverType: Standard_B2s # URN of the image. image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 - name: compute-1-az providerSpec: # Name of the provider instance. name: azure-1 # Location of the nodepool. region: Germany West Central # Zone of the nodepool. zone: "1" count: 2 # VM size name. serverType: Standard_B2s # URN of the image. image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 storageDiskSize: 50 - name: compute-2-az providerSpec: # Name of the provider instance. name: azure-1 # Location of the nodepool. region: North Europe # Zone of the nodepool. zone: "1" count: 2 # VM size name. serverType: Standard_B2s # URN of the image.
image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 storageDiskSize: 50 kubernetes: clusters: - name: azure-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-az compute: - compute-2-az - compute-1-az ``` ### Multi provider, multi region clusters example(https://docs.claudie.io/latest/input-manifest/providers/azure/\#multi-provider-multi-region-clusters-example) ``` kubectl create secret generic azure-secret-1 --namespace=mynamespace --from-literal=clientsecret='Abcd~EFg~H6Ijkls~ABC15sEFGK54s78X~Olk9' --from-literal=subscriptionid='6a4dfsg7-sd4v-f4ad-dsva-ad4v616fd512' --from-literal=tenantid='54cdafa5-sdvs-45ds-546s-df651sfdt614' --from-literal=clientid='0255sc23-76we-87g6-964f-abc1def2gh3l' kubectl create secret generic azure-secret-2 --namespace=mynamespace --from-literal=clientsecret='Efgh~ijkL~on43noi~NiuscviBUIds78X~UkL7' --from-literal=subscriptionid='0965bd5b-usa3-as3c-ads1-csdaba6fd512' --from-literal=tenantid='55safa5d-dsfg-546s-45ds-d51251sfdaba' --from-literal=clientid='076wsc23-sdv2-09cA-8sd9-oigv23npn1p2' ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: azure-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: azure-1 providerType: azure secretRef: name: azure-secret-1 namespace: mynamespace - name: azure-2 providerType: azure secretRef: name: azure-secret-2 namespace: mynamespace nodePools: dynamic: - name: control-az-1 providerSpec: # Name of the provider instance. name: azure-1 # Location of the nodepool. region: North Europe # Zone of the nodepool. zone: "1" count: 1 # VM size name. serverType: Standard_B2s # URN of the image. image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 - name: control-az-2 providerSpec: # Name of the provider instance. name: azure-2 # Location of the nodepool. region: Germany West Central # Zone of the nodepool. zone: "2" count: 2 # VM size name. serverType: Standard_B2s # URN of the image. image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 - name: compute-az-1 providerSpec: # Name of the provider instance. name: azure-1 # Location of the nodepool. region: Germany West Central # Zone of the nodepool. zone: "2" count: 2 # VM size name. serverType: Standard_B2s # URN of the image. image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 storageDiskSize: 50 - name: compute-az-2 providerSpec: # Name of the provider instance. name: azure-2 # Location of the nodepool. region: North Europe # Zone of the nodepool. zone: "1" count: 2 # VM size name. serverType: Standard_B2s # URN of the image. image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 storageDiskSize: 50 kubernetes: clusters: - name: azure-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-az-1 - control-az-2 compute: - compute-az-1 - compute-az-2 ``` [Skip to content](https://docs.claudie.io/latest/input-manifest/providers/cloudflare/#cloudflare) # Cloudflare(https://docs.claudie.io/latest/input-manifest/providers/cloudflare/\#cloudflare) Cloudflare provider requires `apitoken` token and `accountid` id field in string format. Cloudflare DNS Load Balancing Claudie creates A DNS records with loadbalancing and healtcheck functionality. To enable this feature, you must have the **Load Balancing** add-on enabled in your [Cloudflare plan](https://www.cloudflare.com/plans/). Without this add-on, Claudie will still create the DNS A records, but they won't be monitored for availability. 
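Before handing the token to Claudie, you can sanity-check that it is valid and active. A minimal sketch using Cloudflare's token verification endpoint; the `CLOUDFLARE_API_TOKEN` environment variable is just an illustrative name, not something Claudie reads:

```
# Verify the API token against Cloudflare's token verification endpoint.
curl -s -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  https://api.cloudflare.com/client/v4/user/tokens/verify
```

A successful response should report the token status as `active`.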
## DNS example(https://docs.claudie.io/latest/input-manifest/providers/cloudflare/\#dns-example)

```
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-secret
data:
  apitoken: a3NsSVNBODc4YTZldFlBZlhZY2c1aVl5ckZHTmxDeGM=
  accountid: ODU1NGEyM3J0NnU4NmRjNGFzZDE1ODc2NHcyNGIyNTQK
type: Opaque
```

## Create Cloudflare credentials(https://docs.claudie.io/latest/input-manifest/providers/cloudflare/\#create-cloudflare-credentials)

You can create a Cloudflare API token by following [this guide](https://developers.cloudflare.com/fundamentals/api/get-started/create-token/). The required permissions for the zone you want to use are:

```
Zone:Read
DNS:Read
DNS:Edit
```

If Claudie will be creating load-balanced DNS records, the following additional permissions are required:

```
Load Balancing:Monitors And Pools:Edit
Billing:Read
```

The `Billing:Read` permission is necessary to verify that the Load Balancing feature is enabled and active in your Cloudflare account.

## DNS setup(https://docs.claudie.io/latest/input-manifest/providers/cloudflare/\#dns-setup)

If you wish to use Cloudflare as your DNS provider, where Claudie creates DNS records pointing to Claudie-managed clusters, you will need to create a **public DNS zone** by following [this guide](https://developers.cloudflare.com/dns/zone-setups/).

**Cloudflare is not my domain registrar:** If you haven't acquired a domain via Cloudflare and wish to utilize Cloudflare for hosting your zone, you can refer to [this guide](https://developers.cloudflare.com/dns/zone-setups/full-setup/setup/#update-your-nameservers) on Cloudflare nameservers. However, if you prefer not to use the entire domain, an alternative option is to delegate a subdomain to Cloudflare.

## Input manifest examples(https://docs.claudie.io/latest/input-manifest/providers/cloudflare/\#input-manifest-examples)

### Load balancing example(https://docs.claudie.io/latest/input-manifest/providers/cloudflare/\#load-balancing-example)

**Showcase example:** To make this example functional, you need to specify the control plane and compute node pools. This showcase will produce an error if used as is.

### Create a secret for Cloudflare and AWS providers(https://docs.claudie.io/latest/input-manifest/providers/cloudflare/\#create-a-secret-for-cloudflare-and-aws-providers)

The secret for a Cloudflare provider must include the following mandatory fields: `apitoken` and `accountid`.

```
kubectl create secret generic cloudflare-secret-1 --namespace=mynamespace --from-literal=apitoken='kslISA878a6etYAfXYcg5iYyrFGNlCxc' --from-literal=accountid='8554a23rt6u86dc4asd158764w24b254'
```

The secret for an AWS provider must include the following mandatory fields: `accesskey` and `secretkey`.
``` kubectl create secret generic aws-secret-1 --namespace=mynamespace --from-literal=accesskey='SLDUTKSHFDMSJKDIALASSD' --from-literal=secretkey='iuhbOIJN+oin/olikDSadsnoiSVSDsacoinOUSHD' ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: cloudflare-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: cloudflare-1 providerType: cloudflare secretRef: name: cloudflare-secret-1 namespace: mynamespace - name: aws-1 providerType: aws secretRef: name: aws-secret-1 namespace: mynamespace nodePools: dynamic: - name: loadbalancer providerSpec: name: aws-1 region: eu-central-1 zone: eu-central-1c count: 2 serverType: t3.medium image: ami-0965bd5ba4d59211c kubernetes: clusters: - name: cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: [] compute: [] loadBalancers: roles: - name: apiserver protocol: tcp port: 6443 targetPort: 6443 targetPools: [] clusters: - name: apiserver-lb-prod roles: - apiserver dns: dnsZone: dns-zone provider: cloudflare-1 hostname: my.fancy.url targetedK8s: prod-cluster pools: - loadbalancer ``` [Skip to content](https://docs.claudie.io/latest/input-manifest/providers/gcp/#gcp) # GCP(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#gcp) GCP provider requires you to input multiline `credentials` as well as specific GCP project ID `gcpproject` where to provision resources. ## Compute and DNS example(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#compute-and-dns-example) ``` apiVersion: v1 kind: Secret metadata: name: gcp-secret data: credentials: >- ewogICAgICAgICAidHlwZSI6InNlcnZpY2VfYWNjb3VudCIsCiAgICAgICAgICJwcm9qZWN0X2lkIjoicHJvamVjdC1jbGF1ZGllIiwKICAgICAgICAgInByaXZhdGVfa2V5X2lkIjoiYnNrZGxvODc1czkwODczOTQ3NjNlYjg0ZTQwNzkwM2xza2RpbXA0MzkiLAogICAgICAgICAicHJpdmF0ZV9rZXkiOiItLS0tLUJFR0lOIFBSSVZBVEUgS0VZLS0tLS1cblNLTE9vc0tKVVNEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl
3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkXG5NSUlFdlFJQkFEQU5CZ2txaGtpXG4tLS0tLUVORCBQUklWQVRFIEtFWS0tLS0tXG4iLAogICAgICAgICAiY2xpZW50X2VtYWlsIjoiY2xhdWRpZUBwcm9qZWN0LWNsYXVkaWUtMTIzNDU2LmlhbS5nc2VydmljZWFjY291bnQuY29tIiwKICAgICAgICAgImNsaWVudF9pZCI6IjEwOTg3NjU0MzIxMTIzNDU2Nzg5MCIsCiAgICAgICAgICJhdXRoX3VyaSI6Imh0dHBzOi8vYWNjb3VudHMuZ29vZ2xlLmNvbS9vL29hdXRoMi9hdXRoIiwKICAgICAgICAgInRva2VuX3VyaSI6Imh0dHBzOi8vb2F1dGgyLmdvb2dsZWFwaXMuY29tL3Rva2VuIiwKICAgICAgICAgImF1dGhfcHJvdmlkZXJfeDUwOV9jZXJ0X3VybCI6Imh0dHBzOi8vd3d3Lmdvb2dsZWFwaXMuY29tL29hdXRoMi92MS9jZXJ0cyIsCiAgICAgICAgICJjbGllbnRfeDUwOV9jZXJ0X3VybCI6Imh0dHBzOi8vd3d3Lmdvb2dsZWFwaXMuY29tL3JvYm90L3YxL21ldGFkYXRhL3g1MDkvY2xhdWRpZSU0MGNsYXVkaWUtcHJvamVjdC0xMjM0NTYuaWFtLmdzZXJ2aWNlYWNjb3VudC5jb20iCiAgICAgIH0= gcpproject: cHJvamVjdC1jbGF1ZGll # base64 created from GCP project ID type: Opaque ``` ## Create GCP credentials(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#create-gcp-credentials) ### Prerequisites(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#prerequisites) 1. Install gcoud CLI on your machine by following [this guide](https://cloud.google.com/sdk/docs/install). 2. Initialize gcloud CLI by following [this guide](https://cloud.google.com/sdk/docs/initializing). 3. Authorize cloud CLI by following [this guide](https://cloud.google.com/sdk/docs/authorizing) ### Creating GCP credentials for Claudie(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#creating-gcp-credentials-for-claudie) 1. Create a GCP project: ``` gcloud projects create claudie-project ``` 2. Set the current project to claudie-project: ``` gcloud config set project claudie-project ``` 3. Attach billing account to your project: ``` gcloud alpha billing accounts projects link claudie-project (--account-id=ACCOUNT_ID | --billing-account=ACCOUNT_ID) ``` 4. Enable Compute Engine API and Cloud DNS API: ``` { gcloud services enable compute.googleapis.com gcloud services enable dns.googleapis.com } ``` 5. Create a service account: ``` gcloud iam service-accounts create claudie-sa ``` 6. Attach roles to the servcie account: ``` { gcloud projects add-iam-policy-binding claudie-project --member=serviceAccount:claudie-sa@claudie-project.iam.gserviceaccount.com --role=roles/compute.admin gcloud projects add-iam-policy-binding claudie-project --member=serviceAccount:claudie-sa@claudie-project.iam.gserviceaccount.com --role=roles/dns.admin } ``` 7. Recover service account keys for claudie-sa: ``` gcloud iam service-accounts keys create claudie-credentials.json --iam-account=claudie-sa@claudie-project.iam.gserviceaccount.com ``` ## DNS setup(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#dns-setup) If you wish to use GCP as your DNS provider where Claudie creates DNS records pointing to Claudie managed clusters, you will need to create a **public DNS zone** by following [this guide](https://cloud.google.com/dns/docs/zones). 
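As an illustration, such a public zone can be created with the gcloud CLI; the zone name and domain below are placeholders rather than values used elsewhere in this guide:

```
# Create a public Cloud DNS zone for the domain Claudie should manage (placeholder values).
gcloud dns managed-zones create claudie-zone \
  --dns-name="example.com." \
  --description="Public DNS zone for Claudie-managed clusters"
```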
GCP is not my domain registrar If you haven't acquired a domain via GCP and wish to utilize GCP for hosting your zone, you can refer to [this guide](https://cloud.google.com/dns/docs/update-name-servers) on GCP nameservers. However, if you prefer not to use the entire domain, an alternative option is to delegate a subdomain to GCP. ## Input manifest examples(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#input-manifest-examples) ### Single provider, multi region cluster example(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#single-provider-multi-region-cluster-example) ### Create a secret for Cloudflare and GCP providers(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#create-a-secret-for-cloudflare-and-gcp-providers) The secret for an GCP provider must include the following mandatory fields: `gcpproject` and `credentials`. ``` # The ./claudie-credentials.json file is the file created in #Creating GCP credentials for Claudie step 7. kubectl create secret generic gcp-secret-1 --namespace=mynamespace --from-literal=gcpproject='project-claudie' --from-file=credentials=./claudie-credentials.json ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: gcp-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: gcp-1 providerType: gcp secretRef: name: gcp-secret-1 namespace: mynamespace nodePools: dynamic: - name: control-gcp providerSpec: # Name of the provider instance. name: gcp-1 # Region of the nodepool. region: europe-west1 # Zone of the nodepool. zone: europe-west1-c count: 1 # Machine type name. serverType: e2-medium # OS image name. image: ubuntu-2404-noble-amd64-v20250313 - name: compute-1-gcp providerSpec: # Name of the provider instance. name: gcp-1 # Region of the nodepool. region: europe-west3 # Zone of the nodepool. zone: europe-west3-a count: 2 # Machine type name. serverType: e2-medium # OS image name. image: ubuntu-2404-noble-amd64-v20250313 storageDiskSize: 50 - name: compute-2-gcp providerSpec: # Name of the provider instance. name: gcp-1 # Region of the nodepool. region: europe-west2 # Zone of the nodepool. zone: europe-west2-a count: 2 # Machine type name. serverType: e2-medium # OS image name. image: ubuntu-2404-noble-amd64-v20250313 storageDiskSize: 50 kubernetes: clusters: - name: gcp-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-gcp compute: - compute-1-gcp - compute-2-gcp ``` ### Multi provider, multi region clusters example(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#multi-provider-multi-region-clusters-example) ### Create a secret for Cloudflare and GCP providers(https://docs.claudie.io/latest/input-manifest/providers/gcp/\#create-a-secret-for-cloudflare-and-gcp-providers_1) The secret for an GCP provider must include the following mandatory fields: `gcpproject` and `credentials`. ``` # The ./claudie-credentials.json file is the file created in #Creating GCP credentials for Claudie step 7. 
kubectl create secret generic gcp-secret-1 --namespace=mynamespace --from-literal=gcpproject='project-claudie' --from-file=credentials=./claudie-credentials.json kubectl create secret generic gcp-secret-2 --namespace=mynamespace --from-literal=gcpproject='project-claudie' --from-file=credentials=./claudie-credentials-2.json ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: gcp-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: gcp-1 providerType: gcp secretRef: name: gcp-secret-1 namespace: mynamespace - name: gcp-2 providerType: gcp secretRef: name: gcp-secret-2 namespace: mynamespace nodePools: dynamic: - name: control-gcp-1 providerSpec: # Name of the provider instance. name: gcp-1 # Region of the nodepool. region: europe-west1 # Zone of the nodepool. zone: europe-west1-c count: 1 # Machine type name. serverType: e2-medium # OS image name. image: ubuntu-2404-noble-amd64-v20250313 - name: control-gcp-2 providerSpec: # Name of the provider instance. name: gcp-2 # Region of the nodepool. region: europe-west1 # Zone of the nodepool. zone: europe-west1-a count: 2 # Machine type name. serverType: e2-medium # OS image name. image: ubuntu-2404-noble-amd64-v20250313 - name: compute-gcp-1 providerSpec: # Name of the provider instance. name: gcp-1 # Region of the nodepool. region: europe-west3 # Zone of the nodepool. zone: europe-west3-a count: 2 # Machine type name. serverType: e2-medium # OS image name. image: ubuntu-2404-noble-amd64-v20250313 storageDiskSize: 50 - name: compute-gcp-2 providerSpec: # Name of the provider instance. name: gcp-2 # Region of the nodepool. region: europe-west1 # Zone of the nodepool. zone: europe-west1-c count: 2 # Machine type name. serverType: e2-medium # OS image name. image: ubuntu-2404-noble-amd64-v20250313 storageDiskSize: 50 kubernetes: clusters: - name: gcp-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-gcp-1 - control-gcp-2 compute: - compute-gcp-1 - compute-gcp-2 ``` [Skip to content](https://docs.claudie.io/latest/input-manifest/providers/hetzner/#hetzner) # Hetzner(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#hetzner) Hetzner provider requires `credentials` token field in string format, and Hetzner DNS provider requires `apitoken` field in string format. ## Compute example(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#compute-example) ``` apiVersion: v1 kind: Secret metadata: name: hetzner-secret data: credentials: a3NsSVNBODc4YTZldFlBZlhZY2c1aVl5ckZHTmxDeGNJQ28wNjBIVkV5Z2pGczIxbnNrZTc2a3NqS2tvMjFscA== type: Opaque ``` ## DNS example(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#dns-example) ``` apiVersion: v1 kind: Secret metadata: name: hetznerdns-secret data: apitoken: a1V0UmcxcGdqQ1JhYXBQbWQ3cEFJalZnaHVyWG8xY24= type: Opaque ``` No Load-Balanced DNS Support on Hetzner Hetzner does not support load-balanced DNS records with health checks. In the event of a virtual machine failure, the corresponding DNS A record will remain active and will not be automatically removed from the DNS database. ## Create Hetzner API credentials(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#create-hetzner-api-credentials) You can create Hetzner API credentials by following [this guide](https://docs.hetzner.com/cloud/api/getting-started/generating-api-token/). 
The required permissions for the zone you want to use are: ``` Read & Write ``` ## Create Hetzner DNS credentials(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#create-hetzner-dns-credentials) You can create Hetzner DNS credentials by following [this guide](https://docs.hetzner.com/dns-console/dns/general/api-access-token/). DNS provider specification The provider for DNS is different from the one for the Cloud. ## DNS setup(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#dns-setup) If you wish to use Hetzner as your DNS provider where Claudie creates DNS records pointing to Claudie managed clusters, you will need to create a **public DNS zone** by following [this guide](https://docs.hetzner.com/dns-console/dns/general/getting-started-dns/). Hetzner is not my domain registrar If you haven't acquired a domain via Hetzner and wish to utilize Hetzner for hosting your zone, you can refer to [this guide](https://docs.hetzner.com/dns-console/dns/general/dns-overview#the-hetzner-online-name-servers-are) on Hetzner nameservers. However, if you prefer not to use the entire domain, an alternative option is to delegate a subdomain to Hetzner. ## Input manifest examples(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#input-manifest-examples) ### Single provider, multi region cluster example(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#single-provider-multi-region-cluster-example) #### Create a secret for Hetzner provider(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#create-a-secret-for-hetzner-provider) The secret for an Hetzner provider must include the following mandatory fields: `credentials`. ``` kubectl create secret generic hetzner-secret-1 --namespace=mynamespace --from-literal=credentials='kslISA878a6etYAfXYcg5iYyrFGNlCxcICo060HVEygjFs21nske76ksjKko21lp' ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: hetzner-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: hetzner-1 providerType: hetzner secretRef: name: hetzner-secret-1 namespace: mynamespace nodePools: dynamic: - name: control-htz providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: hel1 # Datacenter of the nodepool. zone: hel1-dc2 count: 1 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-24.04 - name: compute-1-htz providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: fsn1 # Datacenter of the nodepool. zone: fsn1-dc14 count: 2 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-24.04 storageDiskSize: 50 - name: compute-2-htz providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: nbg1 # Datacenter of the nodepool. zone: nbg1-dc3 count: 2 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-24.04 storageDiskSize: 50 kubernetes: clusters: - name: hetzner-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-htz compute: - compute-1-htz - compute-2-htz ``` ### Multi provider, multi region clusters example(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#multi-provider-multi-region-clusters-example) #### Create a secret for Hetzner provider(https://docs.claudie.io/latest/input-manifest/providers/hetzner/\#create-a-secret-for-hetzner-provider_1) The secret for an Hetzner provider must include the following mandatory fields: `credentials`. 
``` kubectl create secret generic hetzner-secret-1 --namespace=mynamespace --from-literal=credentials='kslISA878a6etYAfXYcg5iYyrFGNlCxcICo060HVEygjFs21nske76ksjKko21lp' kubectl create secret generic hetzner-secret-2 --namespace=mynamespace --from-literal=credentials='kslIIOUYBiuui7iGBYIUiuybpiUB87bgPyuCo060HVEygjFs21nske76ksjKko21l' ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: hetzner-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: hetzner-1 providerType: hetzner secretRef: name: hetzner-secret-1 namespace: mynamespace - name: hetzner-2 providerType: hetzner secretRef: name: hetzner-secret-2 namespace: mynamespace nodePools: dynamic: - name: control-htz-1 providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: hel1 # Datacenter of the nodepool. zone: hel1-dc2 count: 1 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-24.04 - name: control-htz-2 providerSpec: # Name of the provider instance. name: hetzner-2 # Region of the nodepool. region: fsn1 # Datacenter of the nodepool. zone: fsn1-dc14 count: 2 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-24.04 - name: compute-htz-1 providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: fsn1 # Datacenter of the nodepool. zone: fsn1-dc14 count: 2 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-24.04 storageDiskSize: 50 - name: compute-htz-2 providerSpec: # Name of the provider instance. name: hetzner-2 # Region of the nodepool. region: nbg1 # Datacenter of the nodepool. zone: nbg1-dc3 count: 2 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-24.04 storageDiskSize: 50 kubernetes: clusters: - name: hetzner-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-htz-1 - control-htz-2 compute: - compute-htz-1 - compute-htz-2 ``` [Skip to content](https://docs.claudie.io/latest/input-manifest/providers/oci/#oci) # OCI(https://docs.claudie.io/latest/input-manifest/providers/oci/\#oci) OCI provider requires you to input `privatekey`, `keyfingerprint`, `tenancyocid`, `userocid`, and `compartmentocid`. 
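The `keyfingerprint` value is the fingerprint of the API signing key uploaded for the user. If you need to derive it from the key file, a minimal sketch (the key path is a placeholder):

```
# Compute the fingerprint of an OCI API signing key (key path is a placeholder).
openssl rsa -pubout -outform DER -in ./claudie_api_key.pem | openssl md5 -c
```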
## Compute and DNS example(https://docs.claudie.io/latest/input-manifest/providers/oci/\#compute-and-dns-example) ``` apiVersion: v1 kind: Secret metadata: name: oci-secret data: compartmentocid: b2NpZDIuY29tcGFydG1lbnQub2MyLi5hYWFhYWFhYWEycnNmdmx2eGMzNG8wNjBrZmR5Z3NkczIxbnNrZTc2a3Nqa2tvMjFscHNkZnNm keyfingerprint: YWI6Y2Q6M2Y6MzQ6MzM6MjI6MzI6MzQ6NTQ6NTQ6NDU6NzY6NzY6Nzg6OTg6YWE= privatekey: >- LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyL2Fza0pTTG9zYWQKICAgICAgICBNSUlFdlFJQkFEQU5CZ2txaGtpRzl3MEJBUUVGQUFTQ0JLY3dnZ1NqQWdFQUFvSUJBUUNqMi9hc2tKU0xvc2FkCiAgICAgICAgTUlJRXZRSUJBREFOQmdrcWhraUc5dzBCQVFFRkFBU0NCS2N3Z2dTakFnRUFBb0lCQVFDajIvYXNrSlNMb3NhZAogICAgICAgIE1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2oyLz09CiAgICAgICAgLS0tLS1FTkQgUlNBIFBSSVZBVEUgS0VZLS0tLS0= tenancyocid: b2NpZDIudGVuYW5jeS5vYzIuLmFhYWFhYWFheXJzZnZsdnhjMzRvMDYwa2ZkeWdzZHMyMW5za2U3NmtzamtrbzIxbHBzZGZzZnNnYnJ0Z2hz userocid: b2NpZDIudXNlci5vYzIuLmFhYWFhYWFhYWFueXJzZnZsdnhjMzRvMDYwa2ZkeWdzZHMyMW5za2U3NmtzamtrbzIxbHBzZGZzZg== type: Opaque ``` ## Create OCI credentials(https://docs.claudie.io/latest/input-manifest/providers/oci/\#create-oci-credentials) ### Prerequisites(https://docs.claudie.io/latest/input-manifest/providers/oci/\#prerequisites) 1. Install OCI CLI by following [this guide](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm). 2. 
Configure OCI CLI by following [this guide](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm). ### Creating OCI credentials for Claudie(https://docs.claudie.io/latest/input-manifest/providers/oci/\#creating-oci-credentials-for-claudie) 01. Export your tenant id: ``` export tenancy_ocid="ocid" ``` Find your tenant id You can find it under `Identity & Security` tab and `Compartments` option. 02. Create OCI compartment where Claudie deploys its resources: ``` { oci iam compartment create --name claudie-compartment --description claudie-compartment --compartment-id $tenancy_ocid } ``` 03. Create the claudie user: ``` oci iam user create --name claudie-user --compartment-id $tenancy_ocid --description claudie-user --email ``` 04. Create a group that will hold permissions for the user: ``` oci iam group create --name claudie-group --compartment-id $tenancy_ocid --description claudie-group ``` 05. Generate policy file with necessary permissions: ``` { cat > policy.txt < to manage instance-family in compartment " "Allow group to manage volume-family in compartment " "Allow group to manage virtual-network-family in tenancy" "Allow group to manage dns-zones in compartment ", "Allow group to manage dns-records in compartment ", "Allow group to manage dns-steering-policies in compartment ", "Allow group to manage dns-steering-policy-attachments in compartment ", "Allow group to manage health-check-monitor in compartment " ``` ## Input manifest examples(https://docs.claudie.io/latest/input-manifest/providers/oci/\#input-manifest-examples) ### Single provider, multi region cluster example(https://docs.claudie.io/latest/input-manifest/providers/oci/\#single-provider-multi-region-cluster-example) #### Create a secret for OCI provider(https://docs.claudie.io/latest/input-manifest/providers/oci/\#create-a-secret-for-oci-provider) The secret for an OCI provider must include the following mandatory fields: `compartmentocid`, `userocid`, `tenancyocid`, `keyfingerprint` and `privatekey`. ``` # Refer to values exported in "Creating OCI credentials for Claudie" section kubectl create secret generic oci-secret-1 --namespace=mynamespace --from-literal=compartmentocid=$compartment_ocid --from-literal=userocid=$user_ocid --from-literal=tenancyocid=$tenancy_ocid --from-literal=keyfingerprint=$fingerprint --from-file=privatekey=./claudie-user_public.pem ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: oci-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: oci-1 providerType: oci secretRef: name: oci-secret-1 namespace: mynamespace nodePools: dynamic: - name: control-oci providerSpec: # Name of the provider instance. name: oci-1 # Region of the nodepool. region: eu-milan-1 # Availability domain of the nodepool. zone: hsVQ:EU-MILAN-1-AD-1 count: 1 # VM shape name. serverType: VM.Standard2.2 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. # https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-milan-1.aaaaaaaa2ixn6kthb7vn6mom6bv7fts4omou5sowilrqfub2e7ouweiirkbq - name: compute-1-oci providerSpec: # Name of the provider instance. name: oci-1 # Region of the nodepool. region: eu-frankfurt-1 # Availability domain of the nodepool. zone: hsVQ:EU-FRANKFURT-1-AD-1 count: 2 # VM shape name. serverType: VM.Standard2.1 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. 
# https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7hxwyz4qiasffo7n7s4ep5lywpzwgkc2am65frqrqinoyitmxxla storageDiskSize: 50 - name: compute-2-oci providerSpec: # Name of the provider instance. name: oci-1 # Region of the nodepool. region: eu-frankfurt-1 # Availability domain of the nodepool. zone: hsVQ:EU-FRANKFURT-1-AD-2 count: 2 # VM shape name. serverType: VM.Standard2.1 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. # https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7hxwyz4qiasffo7n7s4ep5lywpzwgkc2am65frqrqinoyitmxxla storageDiskSize: 50 kubernetes: clusters: - name: oci-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-oci compute: - compute-1-oci - compute-2-oci ``` ### Multi provider, multi region clusters example(https://docs.claudie.io/latest/input-manifest/providers/oci/\#multi-provider-multi-region-clusters-example) #### Create a secret for OCI provider(https://docs.claudie.io/latest/input-manifest/providers/oci/\#create-a-secret-for-oci-provider_1) The secret for an OCI provider must include the following mandatory fields: `compartmentocid`, `userocid`, `tenancyocid`, `keyfingerprint` and `privatekey`. ``` # Refer to values exported in "Creating OCI credentials for Claudie" section kubectl create secret generic oci-secret-1 --namespace=mynamespace --from-literal=compartmentocid=$compartment_ocid --from-literal=userocid=$user_ocid --from-literal=tenancyocid=$tenancy_ocid --from-literal=keyfingerprint=$fingerprint --from-file=privatekey=./claudie-user_public.pem kubectl create secret generic oci-secret-2 --namespace=mynamespace --from-literal=compartmentocid=$compartment_ocid2 --from-literal=userocid=$user_ocid2 --from-literal=tenancyocid=$tenancy_ocid2 --from-literal=keyfingerprint=$fingerprint2 --from-file=privatekey=./claudie-user_public2.pem ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: oci-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: oci-1 providerType: oci secretRef: name: oci-secret-1 namespace: mynamespace - name: oci-2 providerType: oci secretRef: name: oci-secret-2 namespace: mynamespace nodePools: dynamic: - name: control-oci-1 providerSpec: # Name of the provider instance. name: oci-1 # Region of the nodepool. region: eu-milan-1 # Availability domain of the nodepool. zone: hsVQ:EU-MILAN-1-AD-1 count: 1 # VM shape name. serverType: VM.Standard2.2 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. # https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-milan-1.aaaaaaaa2ixn6kthb7vn6mom6bv7fts4omou5sowilrqfub2e7ouweiirkbq - name: control-oci-2 providerSpec: # Name of the provider instance. name: oci-2 # Region of the nodepool. region: eu-frankfurt-1 # Availability domain of the nodepool. zone: hsVQ:EU-FRANKFURT-1-AD-3 count: 2 # VM shape name. serverType: VM.Standard2.1 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. # https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7hxwyz4qiasffo7n7s4ep5lywpzwgkc2am65frqrqinoyitmxxla - name: compute-oci-1 providerSpec: # Name of the provider instance. name: oci-1 # Region of the nodepool. 
region: eu-frankfurt-1 # Availability domain of the nodepool. zone: hsVQ:EU-FRANKFURT-1-AD-1 count: 2 # VM shape name. serverType: VM.Standard2.1 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. # https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7hxwyz4qiasffo7n7s4ep5lywpzwgkc2am65frqrqinoyitmxxla storageDiskSize: 50 - name: compute-oci-2 providerSpec: # Name of the provider instance. name: oci-2 # Region of the nodepool. region: eu-milan-1 # Availability domain of the nodepool. zone: hsVQ:EU-MILAN-1-AD-1 count: 2 # VM shape name. serverType: VM.Standard2.1 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. # https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-milan-1.aaaaaaaa2ixn6kthb7vn6mom6bv7fts4omou5sowilrqfub2e7ouweiirkbq storageDiskSize: 50 kubernetes: clusters: - name: oci-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-oci-1 - control-oci-2 compute: - compute-oci-1 - compute-oci-2 ``` ### Flex instances example(https://docs.claudie.io/latest/input-manifest/providers/oci/\#flex-instances-example) #### Create a secret for OCI provider(https://docs.claudie.io/latest/input-manifest/providers/oci/\#create-a-secret-for-oci-provider_2) The secret for an OCI provider must include the following mandatory fields: `compartmentocid`, `userocid`, `tenancyocid`, `keyfingerprint` and `privatekey`. ``` # Refer to values exported in "Creating OCI credentials for Claudie" section kubectl create secret generic oci-secret-1 --namespace=mynamespace --from-literal=compartmentocid=$compartment_ocid --from-literal=userocid=$user_ocid --from-literal=tenancyocid=$tenancy_ocid --from-literal=keyfingerprint=$fingerprint --from-file=privatekey=./claudie-user_public.pem ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: oci-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: oci-1 providerType: oci secretRef: name: oci-secret-1 namespace: mynamespace nodePools: dynamic: - name: oci providerSpec: # Name of the provider instance. name: oci-1 # Region of the nodepool. region: eu-frankfurt-1 # Availability domain of the nodepool. zone: hsVQ:EU-FRANKFURT-1-AD-1 count: 2 # VM shape name. serverType: VM.Standard.E4.Flex # further describes the selected server type. machineSpec: # use 2 ocpus. cpuCount: 2 # use 8 gb of memory. memory: 8 # OCID of the image ubuntu 24.04. # Make sure to update it according to the region. # https://docs.oracle.com/en-us/iaas/images/ubuntu-2404/canonical-ubuntu-24-04-2024-08-28-0.htm image: ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7hxwyz4qiasffo7n7s4ep5lywpzwgkc2am65frqrqinoyitmxxla storageDiskSize: 50 kubernetes: clusters: - name: oci-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - oci compute: - oci ``` [Skip to content](https://docs.claudie.io/latest/input-manifest/providers/on-prem/#on-premise-nodes) # On premise nodes(https://docs.claudie.io/latest/input-manifest/providers/on-prem/\#on-premise-nodes) Claudie is designed to leverage your existing infrastructure and utilise it for building Kubernetes clusters together with supported cloud providers. However, Claudie operates under a few assumptions: 1. Accessibility of Machines: Claudie requires access to the machines specified by the provided endpoint. 
It needs the ability to connect to these machines in order to perform necessary operations. 2. Connectivity between Static Nodes: Static nodes within the infrastructure should be able to communicate with each other using the specified endpoints. This connectivity is important for proper functioning of the Kubernetes cluster. 3. SSH Access with Root Privileges: Claudie relies on SSH access to the nodes using the SSH key provided in the input manifest. The SSH key should grant root privileges to enable Claudie to perform required operations on the nodes. 4. Meeting the Kubernetes nodes requirements: Learn [more](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#before-you-begin). By ensuring that these assumptions are met, Claudie can effectively utilise your infrastructure and build Kubernetes clusters while collaborating with the supported cloud providers. ## Private key example secret(https://docs.claudie.io/latest/input-manifest/providers/on-prem/\#private-key-example-secret) ``` apiVersion: v1 kind: Secret metadata: name: static-node-key data: privatekey: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcFFJQkFBS0NBUUVBbzJEOGNYb0Uxb3VDblBYcXFpVW5qbHh0c1A4YXlKQW4zeFhYdmxLOTMwcDZBUzZMCncvVW03THFnbUhpOW9GL3pWVnB0TDhZNmE2NWUvWjk0dE9SQ0lHY0VJendpQXF3M3M4NGVNcnoyQXlrSWhsWE0KVEpSS3J3SHJrbDRtVlBvdE9paDFtZkVTenFMZ25TMWdmQWZxSUVNVFdOZlRkQmhtUXpBNVJFT2NpQ1Q1dFRnMApraDI1SmVHeU9qR3pzaFhkKzdaVi9PUXVQUk5Mb2lrQzFDVFdtM0FSVFFDeUpZaXR5bURVeEgwa09wa2VyODVoCmpFRTRkUnUxVzQ2WDZkdEUrSlBZNkNKRlR2c1VUcGlqT3QzQmNTSTYyY2ZyYmFRYXhvQXk2bEJLVlB1cm1xYm0Kb09JNHVRUWJWRGt5Q3V4MzcwSTFjTUVzWkszYVNBa0ZZSUlMRndJREFRQUJBb0lCQUVLUzFhc2p6bTdpSUZIMwpQeTBmd0xPWTVEVzRiZUNHSlVrWkxIVm9YK2hwLzdjVmtXeERMQjVRbWZvblVSWFZvMkVIWFBDWHROeUdERDBLCnkzUGlnek9TNXJPNDRCNzRzQ1g3ZW9Dd1VRck9vS09rdUlBSCtUckE3STRUQVVtbE8rS3o4OS9MeFI4Z2JhaCsKZ2c5b1pqWEpQMHYzZmptVGE3QTdLVXF3eGtzUEpORFhyN0J2MkhGc3ZueHROTkhWV3JBcjA3NUpSU2U3akJIRgpyQnpIRGFOUUhjYWwybTJWbDAvbGM4SVgyOEIwSXBYOEM5ajNqVGUwRS9XOVYyaURvM0ZvbmZzVU1BSm9KeW1nCkRzRXFxb25Cc0ZFeE9iY1BUNlh4SHRLVHVXMkRDRHF3c20xTVM2L0xUZzRtMFZ0alBRbGE5cnd0Z1lQcEtVSWYKbkRya3ZBRUNnWUVBOC9EUTRtNWF4UE0xL2d4UmVFNVZJSEMzRjVNK0s0S0dsdUNTVUNHcmtlNnpyVmhOZXllMwplbWpUV21lUmQ4L0szYzVxeGhJeGkvWE8vc0ZvREthSjdHaVl4L2RiOEl6dlJZYkw2ZHJiOVh0aHVObmhJWTlkCmJPd0VhbWxXZGxZbzlhUTBoYTFpSHpoUHVhMjN0TUNiM2xpZzE3MVZuUURhTXlhS3plaVMxUmNDZ1lFQXEzU2YKVEozcDRucmh4VjJiMEJKUStEdjkrRHNiZFBCY0pPbHpYVVVodHB6d3JyT3VKdzRUUXFXeG1pZTlhK1lpSzd0cAplY2YyOEltdHY0dy9aazg1TUdmQm9hTkpwdUNmNWxoMElseDB3ZXROQXlmb3dTNHZ3dUlZNG1zVFlvcE1WV20yClV5QzlqQ1M4Q0Y2Y1FrUVdjaVVlc2dVWHFocE50bXNLTG9LWU9nRUNnWUVBNWVwZVpsd09qenlQOGY4WU5tVFcKRlBwSGh4L1BZK0RsQzRWa1FjUktXZ1A2TTNKYnJLelZZTGsySXlva1VDRjRHakI0TUhGclkzZnRmZTA2TFZvMQorcXptK3Vub0xNUVlySllNMFQvbk91cnNRdmFRR3pwdG1zQ2t0TXJOcEVFMjM3YkJqaERKdjVVcWgxMzFISmJCCkVnTEVyaklVWkNNdWhURlplQk14ZVVjQ2dZRUFqZkZPc0M5TG9hUDVwVnVKMHdoVzRDdEtabWNJcEJjWk1iWFQKUERRdlpPOG9rbmxPaENheTYwb2hibTNYODZ2aVBqSTVjQWlMOXpjRUVNQWEvS2c1d0VrbGxKdUtMZzFvVTFxSApTcXNnUGlwKzUwM3k4M3M1THkzZlRCTTVTU3NWWnVETmdLUnFSOHRobjh3enNPaU5iSkl1aDFLUDlOTXg0d05hCnVvYURZQUVDZ1lFQW5xNzJJUEU1MlFwekpjSDU5RmRpbS8zOU1KYU1HZlhZZkJBNXJoenZnMmc5TW9URXpWKysKSVZ2SDFTSjdNTTB1SVBCa1FpbC91V083bU9DR2hHVHV3TGt3Uy9JU1FjTmRhSHlTRDNiZzdndzc5aG1UTVhiMgozVFpCTjdtb3FWM0VhRUhWVU1nT1N3dHUySTlQN1RJNGJJV0RQUWxuWE53Q0tCWWNKanRraWNRPQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo= type: Opaque ``` ## Input manifest example(https://docs.claudie.io/latest/input-manifest/providers/on-prem/\#input-manifest-example) ### Private cluster 
example(https://docs.claudie.io/latest/input-manifest/providers/on-prem/\#private-cluster-example) ``` kubectl create secret generic static-node-key --namespace=mynamespace --from-file=privatekey=private.pem ``` ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: private-cluster-example labels: app.kubernetes.io/part-of: claudie spec: nodePools: static: - name: control nodes: - endpoint: "192.168.10.1" secretRef: name: static-node-key namespace: mynamespace - name: compute nodes: - endpoint: "192.168.10.2" secretRef: name: static-node-key namespace: mynamespace - endpoint: "192.168.10.3" secretRef: name: static-node-key namespace: mynamespace kubernetes: clusters: - name: private-cluster version: 1.27.0 network: 192.168.2.0/24 pools: control: - control compute: - compute ``` ### Hybrid cloud example(https://docs.claudie.io/latest/input-manifest/providers/on-prem/\#hybrid-cloud-example) ### Create secret for private key(https://docs.claudie.io/latest/input-manifest/providers/on-prem/\#create-secret-for-private-key) ``` kubectl create secret generic static-node-key --namespace=mynamespace --from-file=privatekey=private.pem ``` > To see how to configure Hetzner or any other credentials for hybrid cloud, refer to their docs. ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: hybrid-cloud-example labels: app.kubernetes.io/part-of: claudie spec: providers: - name: hetzner-1 providerType: hetzner secretRef: name: hetzner-secret-1 namespace: mynamespace nodePools: dynamic: - name: control-htz providerSpec: name: hetzner-1 region: fsn1 zone: fsn1-dc14 count: 3 serverType: cpx22 image: ubuntu-24.04 static: - name: datacenter-1 nodes: - endpoint: "192.168.10.1" secretRef: name: static-node-key namespace: mynamespace - endpoint: "192.168.10.2" secretRef: name: static-node-key namespace: mynamespace - endpoint: "192.168.10.3" secretRef: name: static-node-key namespace: mynamespace kubernetes: clusters: - name: hybrid-cluster version: 1.27.0 network: 192.168.2.0/24 pools: control: - control-htz compute: - datacenter-1 ``` # Example yaml file example.yaml ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: example-manifest labels: app.kubernetes.io/part-of: claudie spec: # Providers field is used for defining the providers. # It references a secret resource in the Kubernetes cluster. # Each provider has its own mandatory fields that are defined in the secret resource. # Every supported provider has an example in this input manifest. # providers: # - name: # providerType: # Type of the provider secret [aws|azure|gcp|oci|hetzner|hetznerdns|cloudflare]. # templates: # external templates used to build the infrastructure by that given provider. If omitted, default templates will be used. # repository: # publicly available git repository where the templates can be acquired # tag: # optional tag. If set, it is used to check out a specific commit of the git repository. # path: # path where the templates for the specific provider can be found. # secretRef: # Secret reference specification. # name: # Name of the secret resource. # namespace: # Namespace of the secret resource. providers: # Hetzner DNS provider. - name: hetznerdns-1 providerType: hetznerdns templates: repository: "https://github.com/berops/claudie-config" path: "templates/terraformer/hetznerdns" secretRef: name: hetznerdns-secret-1 namespace: example-namespace # Cloudflare DNS provider. - name: cloudflare-1 providerType: cloudflare # templates: ...
using default templates secretRef: name: cloudflare-secret-1 namespace: example-namespace # Hetzner Cloud provider. - name: hetzner-1 providerType: hetzner secretRef: name: hetzner-secret-1 namespace: example-namespace # GCP cloud provider. - name: gcp-1 providerType: gcp secretRef: name: gcp-secret-1 namespace: example-namespace # OCI cloud provider. - name: oci-1 providerType: oci secretRef: name: oci-secret-1 namespace: example-namespace # AWS cloud provider. - name: aws-1 providerType: aws secretRef: name: aws-secret-1 namespace: example-namespace # Azure cloud provider. - name: azure-1 providerType: azure secretRef: name: azure-secret-1 namespace: example-namespace # Nodepools field is used for defining the nodepool specification. # You can think of them as a blueprints, not actual nodepools that will be created. nodePools: # Dynamic nodepools are created by Claudie, in one of the cloud providers specified. # Definition specification: # dynamic: # - name: # Name of the nodepool, which is used as a reference to it. Needs to be unique. # providerSpec: # Provider specification for this nodepool. # name: # Name of the provider instance, referencing one of the providers define above. # region: # Region of the nodepool. # zone: # Zone of the nodepool. # count: # Static number of nodes in this nodepool. # serverType: # Machine type of the nodes in this nodepool. # image: # OS image of the nodes in the nodepool. # storageDiskSize: # Disk size of the storage disk for compute nodepool. (optional) # autoscaler: # Autoscaler configuration. Mutually exclusive with Count. # min: # Minimum number of nodes in nodepool. # max: # Maximum number of nodes in nodepool. # labels: # Map of custom user defined labels for this nodepool. This field is optional and is ignored if used in Loadbalancer cluster. (optional) # annotations: # Map of user defined annotations, which will be applied on every node in the node pool. (optional) # taints: # Array of custom user defined taints for this nodepool. This field is optional and is ignored if used in Loadbalancer cluster. (optional) # - key: # The taint key to be applied to a node. # value: # The taint value corresponding to the taint key. # effect: # The effect of the taint on pods that do not tolerate the taint. 
# # Example definitions for each provider dynamic: - name: control-htz providerSpec: name: hetzner-1 region: hel1 zone: hel1-dc2 count: 3 serverType: cpx22 image: ubuntu-24.04 labels: country: finland city: helsinki annotations: node.longhorn.io/default-node-tags: '["finland"]' taints: - key: country value: finland effect: NoSchedule - name: compute-htz providerSpec: name: hetzner-1 region: hel1 zone: hel1-dc2 count: 2 serverType: cpx22 image: ubuntu-24.04 storageDiskSize: 50 labels: country: finland city: helsinki annotations: node.longhorn.io/default-node-tags: '["finland"]' - name: htz-autoscaled providerSpec: name: hetzner-1 region: hel1 zone: hel1-dc2 serverType: cpx22 image: ubuntu-24.04 storageDiskSize: 50 autoscaler: min: 1 max: 5 labels: country: finland city: helsinki annotations: node.longhorn.io/default-node-tags: '["finland"]' - name: control-gcp providerSpec: name: gcp-1 region: europe-west1 zone: europe-west1-c count: 3 serverType: e2-medium image: ubuntu-minimal-2404-noble-amd64-v20241116 labels: country: germany city: frankfurt annotations: node.longhorn.io/default-node-tags: '["germany"]' - name: compute-gcp providerSpec: name: gcp-1 region: europe-west1 zone: europe-west1-c count: 2 serverType: e2-small image: ubuntu-minimal-2404-noble-amd64-v20241116 storageDiskSize: 50 labels: country: germany city: frankfurt taints: - key: city value: frankfurt effect: NoExecute annotations: node.longhorn.io/default-node-tags: '["germany"]' - name: control-oci providerSpec: name: oci-1 region: eu-milan-1 zone: hsVQ:EU-MILAN-1-AD-1 count: 3 serverType: VM.Standard2.1 image: ocid1.image.oc1.eu-milan-1.aaaaaaaa2ixn6kthb7vn6mom6bv7fts4omou5sowilrqfub2e7ouweiirkbq - name: compute-oci providerSpec: name: oci-1 region: eu-milan-1 zone: hsVQ:EU-MILAN-1-AD-1 count: 2 serverType: VM.Standard2.1 image: ocid1.image.oc1.eu-milan-1.aaaaaaaa2ixn6kthb7vn6mom6bv7fts4omou5sowilrqfub2e7ouweiirkbq storageDiskSize: 50 - name: control-aws providerSpec: name: aws-1 region: eu-central-1 zone: eu-central-1c count: 2 serverType: t3.medium image: ami-07eef52105e8a2059 - name: compute-aws providerSpec: name: aws-1 region: eu-central-1 zone: eu-central-1c count: 2 serverType: t3.medium image: ami-07eef52105e8a2059 storageDiskSize: 50 - name: control-azure providerSpec: name: azure-1 region: North Europe zone: "1" count: 2 serverType: Standard_B2s image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 - name: compute-azure providerSpec: name: azure-1 region: North Europe zone: "1" count: 2 serverType: Standard_B2s image: Canonical:ubuntu-24_04-lts:server:24.04.202502210 storageDiskSize: 50 - name: loadbalancer-1 providerSpec: name: gcp-1 region: europe-west1 zone: europe-west1-c count: 2 serverType: e2-small image: ubuntu-minimal-2404-noble-amd64-v20241116 - name: loadbalancer-2 providerSpec: name: hetzner-1 region: hel1 zone: hel1-dc2 count: 2 serverType: cpx22 image: ubuntu-24.04 # Static nodepools are created by the user beforehand. # In case you want to use them in the Kubernetes cluster, make sure they meet the requirements. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#before-you-begin # Definition specification: # static: # - name: # Name of the nodepool, which is used as a reference to it. Needs to be unique. # nodes: # List of nodes which will be accessed under this nodepool. # - endpoint: # IP under which Claudie will access this node. Can be private as long as Claudie is able to access it.
# username: # Username of a user with root privileges (optional). If not specified user with name "root" will be used # secretRef: # Secret reference specification, holding private key which will be used to SSH into the node (as root or as a user specificed in the username attribute). # name: # Name of the secret resource. # namespace: # Namespace of the secret resource. # labels: # Map of custom user defined labels for this nodepool. This field is optional and is ignored if used in Loadbalancer cluster. (optional) # annotations: # Map of user defined annotations, which will be applied on every node in the node pool. (optional) # taints: # Array of custom user defined taints for this nodepool. This field is optional and is ignored if used in Loadbalancer cluster. (optional) # - key: # The taint key to be applied to a node. # value: # The taint value corresponding to the taint key. # effect: # The effect of the taint on pods that do not tolerate the taint. # # Example definitions static: - name: datacenter-1 nodes: - endpoint: "192.168.10.1" secretRef: name: datacenter-1-key namespace: example-namespace - endpoint: "192.168.10.2" secretRef: name: datacenter-1-key namespace: example-namespace - endpoint: "192.168.10.3" username: admin secretRef: name: datacenter-1-key namespace: example-namespace labels: datacenter: datacenter-1 annotations: node.longhorn.io/default-node-tags: '["datacenter-1"]' taints: - key: datacenter effect: NoExecute # Kubernetes field is used to define the kubernetes clusters. # Definition specification: # # clusters: # - name: # Name of the cluster. The name will be appended to the created node name. # version: # Kubernetes version in semver scheme, must be supported by KubeOne. # network: # Private network IP range. # pools: # Nodepool names which cluster will be composed of. User can reuse same nodepool specification on multiple clusters. # control: # List of nodepool names, which will be used as control nodes. # compute: # List of nodepool names, which will be used as compute nodes. # # Example definitions: kubernetes: clusters: - name: dev-cluster version: 1.27.0 network: 192.168.2.0/24 pools: control: - control-htz - control-gcp compute: - compute-htz - compute-gcp - compute-azure - htz-autoscaled installationProxy: # learn [more](https://docs.claudie.io/latest/http-proxy) mode: "on" # can be on, off or default endpoint: http://proxy.claudie.io:8880 # you can use your own HTTP proxy. If not specified http://proxy.claudie.io:8880 is the default value. - name: prod-cluster version: 1.27.0 network: 192.168.2.0/24 pools: control: - control-htz - control-gcp - control-oci - control-aws - control-azure compute: - compute-htz - compute-gcp - compute-oci - compute-aws - compute-azure installationProxy: # learn [more](https://docs.claudie.io/latest/http-proxy) mode: "off" # can be on, off or default - name: hybrid-cluster version: 1.27.0 network: 192.168.2.0/24 pools: control: - datacenter-1 compute: - compute-htz - compute-gcp - compute-azure installationProxy: # learn [more](https://docs.claudie.io/latest/http-proxy) mode: "on" # can be on, off or default endpoint: http://proxy.claudie.io:8880 # you can use your own HTTP proxy. If not specified http://proxy.claudie.io:8880 is the default value. # Loadbalancers field defines loadbalancers used for the kubernetes clusters and roles for the loadbalancers. # Definition specification for role: # # roles: # - name: # Name of the role, used as a reference later. Must be unique. # protocol: # Protocol, this role will use. 
# port: # Port, where traffic will be coming. # targetPort: # Port, where loadbalancer will forward traffic to. # targetPools: # Targeted nodes on kubernetes cluster. Specify a nodepool that is used in the targeted K8s cluster. # settings: # Optional settings that further configures the role. # proxyProtocol: # Turns on the proxy protocol, can be true, false. Default is true. # stickySessions: # Turn on sticky sessions that will hash the source ip to always choose the same node to which the traffic will be forwarded to. Can be true, false. Default is false. # # Definition specification for loadbalancer: # # clusters: # - name: # Loadbalancer cluster name # roles: # List of role names this loadbalancer will fulfil. # dns: # DNS specification, where DNS records will be created. # dnsZone: # DNS zone name in your provider. # provider: # Provider name for the DNS. # hostname: # Hostname for the DNS record. Keep in mind the zone will be included automatically. If left empty the Claudie will create random hash as a hostname. # alternativeNames: # Alternative hostnames for which A records will be created in addition to the specified hostname. # - other # # targetedK8s: # Name of the targeted kubernetes cluster # pools: # List of nodepool names used for loadbalancer # # Example definitions: loadBalancers: roles: - name: apiserver protocol: tcp port: 6443 targetPort: 6443 targetPools: - control-htz # make sure that this nodepools is acutally used by the targeted `dev-cluster` cluster. - name: https protocol: tcp port: 443 targetPort: 30143 # make sure there is a NodePort service. targetPools: - compute-htz # make sure that this nodepools is acutally used by the targeted `dev-cluster` cluster. settings: proxyProtocol: true clusters: - name: apiserver-lb-dev roles: - apiserver - https dns: dnsZone: dns-zone provider: hetznerdns-1 targetedK8s: dev-cluster pools: - loadbalancer-1 - name: apiserver-lb-prod roles: - apiserver dns: dnsZone: dns-zone provider: cloudflare-1 hostname: my.fancy.url alternativeNames: - app1 - app2 targetedK8s: prod-cluster pools: - loadbalancer-2 ``` # GPUs example We will follow the guide from [Nvidia](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#operator-install-guide) to deploy the `gpu-operator` into a Claudie-built Kubernetes cluster. Make sure you fulfill the necessary listed requirements in prerequisites before continuing, if you decide to use a different cloud provider. In this example we will be using [AWS](https://docs.claudie.io/latest/input-manifest/providers/aws/) as our provider, with the following config: ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: aws-gpu-example labels: app.kubernetes.io/part-of: claudie spec: providers: - name: aws-1 providerType: aws secretRef: name: aws-secret namespace: secrets nodePools: dynamic: - name: control-aws providerSpec: name: aws-1 region: eu-central-1 zone: eu-central-1a count: 1 serverType: t3.medium # AMI ID of the image Ubuntu 24.04. # Make sure to update it according to the region. image: ami-07eef52105e8a2059 - name: gpu-aws providerSpec: name: aws-1 region: eu-central-1 zone: eu-central-1a count: 2 serverType: g4dn.xlarge # AMI ID of the image Ubuntu 24.04. # Make sure to update it according to the region. 
image: ami-07eef52105e8a2059 storageDiskSize: 50 kubernetes: clusters: - name: gpu-example version: v1.31.0 network: 172.16.2.0/24 pools: control: - control-aws compute: - gpu-aws ``` After the `InputManifest` was successfully build by claudie, we deploy the `gpu-operator` to the `gpu-examepl` kubernetes cluster. 1. Create a namespace for the gpu-operator. ``` kubectl create ns gpu-operator ``` ``` kubectl label --overwrite ns gpu-operator pod-security.kubernetes.io/enforce=privileged ``` 1. Add Nvidia Helm repository. ``` helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \ && helm repo update ``` 1. Install the operator. ``` helm install --wait --generate-name \ -n gpu-operator --create-namespace \ nvidia/gpu-operator ``` 1. Wait for the pods in the `gpu-operator` namespace to be ready. ``` NAME READY STATUS RESTARTS AGE gpu-feature-discovery-4lrbz 1/1 Running 0 10m gpu-feature-discovery-5x88d 1/1 Running 0 10m gpu-operator-1708080094-node-feature-discovery-gc-84ff8f47tn7cd 1/1 Running 0 10m gpu-operator-1708080094-node-feature-discovery-master-757c27tm6 1/1 Running 0 10m gpu-operator-1708080094-node-feature-discovery-worker-495z2 1/1 Running 0 10m gpu-operator-1708080094-node-feature-discovery-worker-n8fl6 1/1 Running 0 10m gpu-operator-1708080094-node-feature-discovery-worker-znsk4 1/1 Running 0 10m gpu-operator-6dfb9bd487-2gxzr 1/1 Running 0 10m nvidia-container-toolkit-daemonset-jnqwn 1/1 Running 0 10m nvidia-container-toolkit-daemonset-x9t56 1/1 Running 0 10m nvidia-cuda-validator-l4w85 0/1 Completed 0 10m nvidia-cuda-validator-lqxhq 0/1 Completed 0 10m nvidia-dcgm-exporter-l9nzt 1/1 Running 0 10m nvidia-dcgm-exporter-q7c2x 1/1 Running 0 10m nvidia-device-plugin-daemonset-dbjjl 1/1 Running 0 10m nvidia-device-plugin-daemonset-x5kfs 1/1 Running 0 10m nvidia-driver-daemonset-dcq4g 1/1 Running 0 10m nvidia-driver-daemonset-sjjlb 1/1 Running 0 10m nvidia-operator-validator-jbc7r 1/1 Running 0 10m nvidia-operator-validator-q59mc 1/1 Running 0 10m ``` When all pods are ready you should be able to verify if the GPUs can be used ``` kubectl get nodes -o json | jq -r '.items[] | {name:.metadata.name, gpus:.status.capacity."nvidia.com/gpu"}' ``` 1. Deploy an example manifest that uses one of the available GPUs from the worker nodes. ``` apiVersion: v1 kind: Pod metadata: name: cuda-vectoradd spec: restartPolicy: OnFailure containers: - name: cuda-vectoradd image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04" resources: limits: nvidia.com/gpu: 1 ``` From the logs of the pods you should be able to see ``` kubectl logs cuda-vectoradd [Vector addition of 50000 elements] Copy input data from the host memory to the CUDA device CUDA kernel launch with 196 blocks of 256 threads Copy output data from the CUDA device to the host memory Test PASSED Done ``` [Skip to content](https://docs.claudie.io/latest/input-manifest/external-templates/#rolling-update) # External Templates Claudie allows to plug in your own templates for spawning the infrastructure. Specifying which templates are to be used is done at the provider level in the Input Manifest, for example: ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: hetzner-example labels: app.kubernetes.io/part-of: claudie spec: providers: - name: hetzner-1 providerType: hetzner templates: repository: "https://github.com/berops/claudie-config" tag: "v0.9.8" # optional path: "templates/terraformer/hetzner" secretRef: name: hetzner-secret namespace: secrets ... 
``` - If no templates are specified, Claudie always defaults to the latest commit on the master/main branch of the berops templates repository for the respective cloud provider (i.e. `https://github.com/berops/claudie-config`). - If templates are specified but no tag is present, Claudie defaults to the latest commit on the master/main branch of the specified repository. The template **repository** needs to follow a certain convention to work properly. For example, consider an external template repository accessible via a public git repository at: ``` https://github.com/berops/claudie-config ``` The repository can either contain only the necessary template files, or they can be stored in a subtree. To handle this, you need to pass a **path** within the public git repository, such as ``` templates/terraformer/gcp ``` This denotes that the necessary templates for Google Cloud Platform can be found in the subtree at: ``` claudie-config/templates/terraformer/gcp ``` To deal only with the necessary template files, a sparse checkout is used when downloading the external repository, so that a local mirror is present which is then used to generate the Terraform files. When the template files from the subtree in the example above, `claudie-config/templates/terraformer/gcp`, are used for generation, the directory is traversed and the following rules apply: - If a subdirectory named "provider" is present, all files within this directory are considered related to Providers for interacting with the API of the respective cloud providers, SaaS providers etc. When using the templates for generation, the struct [templates.Provider](https://github.com/berops/claudie/blob/5dc0e7c8f5503a6f2c202a982f5c4aa11bed0346/services/terraformer/server/domain/utils/templates/structures.go#L54) will be passed for each file individually. - If a subdirectory named "networking" is present, all files within this directory are considered related to spawning a common networking infrastructure for all nodepools from a single provider. The files in this subdirectory will use the providers generated in the previous step. When using the templates, the struct [templates.Networking](https://github.com/berops/claudie/blob/5dc0e7c8f5503a6f2c202a982f5c4aa11bed0346/services/terraformer/server/domain/utils/templates/structures.go#L92) will be passed for each file individually. - If a subdirectory named "nodepool" is present, all files within this directory are considered related to spawning the VM instances, along with attached disks and related resources, for a single node coming from a specific nodepool. When using the templates, the struct [templates.Nodepools](https://github.com/berops/claudie/blob/5dc0e7c8f5503a6f2c202a982f5c4aa11bed0346/services/terraformer/server/domain/utils/templates/structures.go#L138) will be passed for each file individually. - If a subdirectory named "dns" is present, all files within this directory are considered related to DNS. Thus, the [templates.DNS](https://github.com/berops/claudie/blob/5dc0e7c8f5503a6f2c202a982f5c4aa11bed0346/services/terraformer/server/domain/utils/templates/structures.go#L151) struct will be passed for each file when generating the templates. Note: this subdirectory should contain its own file that generates the Provider needed for interacting with the API of the respective cloud providers (the ones generated from the "provider" subdirectory will not be used in this case).
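To illustrate the sparse-checkout behaviour described above, here is a minimal shell sketch (not Claudie's internal implementation; the repository, path and tag are taken from the examples in this section) that mirrors only the referenced template subtree locally:

```
# Clone without checking out files, then restrict the checkout to the template subtree.
git clone --filter=blob:none --no-checkout https://github.com/berops/claudie-config
cd claudie-config
git sparse-checkout set templates/terraformer/gcp
# Check out the tag referenced in the provider spec, or the default branch if no tag is set.
git checkout v0.9.8
```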
The complete structure of a subtree for a single provider for external templates located at claudie-config/templates/terraformer/gcp can look as follows: ``` └── terraformer |── gcp │ ├── dns │ └── dns.tpl │ ├── networking │ └── networking.tpl │ ├── nodepool │ ├── node.tpl │ └── node_networking.tpl │ └── provider │ └── provider.tpl ... ``` Examples of external templates can be found on: https://github.com/berops/claudie-config ## Rolling update(https://docs.claudie.io/latest/input-manifest/external-templates/\#rolling-update) To handle more specific scenarios where the default templates provided by claudie do not fit the use case, we allow these external templates to be changed/adapted by the user. By providing this ability to specify the templates to be used when building the InputManifest infrastructure, there is one common scenario that should be handled by claudie, which is rolling updates. Rolling updates of nodepools are performed when a change to a provider's external templates is registered. Claudie checks that the external repository of the new templates exists and uses them to perform a rolling update of the infrastructure already built. In the below example, when the templates of provider Hetzner-1 are changed the rolling update of all the nodepools which reference that provider will start by doing an update on a single nodepool at a time. ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: hetzner-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: hetzner-1 providerType: hetzner templates: - repository: "https://github.com/berops/claudie-config" - path: "templates/terraformer/hetzner" + repository: "https://github.com/YouRepository/claudie-config" + path: "templates/terraformer/hetzner" secretRef: name: hetzner-secret-1 namespace: mynamespace nodePools: dynamic: - name: control-htz providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: hel1 # Datacenter of the nodepool. zone: hel1-dc2 count: 1 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-22.04 - name: compute-1-htz providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: fsn1 # Datacenter of the nodepool. zone: fsn1-dc14 count: 2 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-22.04 storageDiskSize: 50 - name: compute-2-htz providerSpec: # Name of the provider instance. name: hetzner-1 # Region of the nodepool. region: nbg1 # Datacenter of the nodepool. zone: nbg1-dc3 count: 2 # Machine type name. serverType: cpx22 # OS image name. image: ubuntu-22.04 storageDiskSize: 50 kubernetes: clusters: - name: hetzner-cluster version: v1.31.0 network: 192.168.2.0/24 pools: control: - control-htz compute: - compute-1-htz - compute-2-htz ``` The rolling update is also triggered if only the tag of the template is changed. 
``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: hetzner-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: hetzner-1 providerType: hetzner templates: - repository: "https://github.com/berops/claudie-config" - path: "templates/terraformer/hetzner" + repository: "https://github.com/berops/claudie-config" + tag: v0.9.8 + path: "templates/terraformer/hetzner" secretRef: name: hetzner-secret-1 namespace: mynamespace ``` [Skip to content](https://docs.claudie.io/latest/input-manifest/api-reference/#inputmanifest-api-reference) # InputManifest API reference(https://docs.claudie.io/latest/input-manifest/api-reference/\#inputmanifest-api-reference) InputManifest is a definition of the user's infrastructure. It contains cloud provider specification, nodepool specification, Kubernetes and loadbalancer clusters. ## Status(https://docs.claudie.io/latest/input-manifest/api-reference/\#status) Most recently observed status of the InputManifest ## Spec(https://docs.claudie.io/latest/input-manifest/api-reference/\#spec) Specification of the desired behavior of the InputManifest - `providers` [Providers](https://docs.claudie.io/latest/input-manifest/api-reference/#providers) Providers is a list of defined cloud provider configuration that will be used in infrastructure provisioning. - `nodepools` [Nodepools](https://docs.claudie.io/latest/input-manifest/api-reference/#nodepools) Describes nodepools used for either kubernetes clusters or loadbalancer cluster defined in this manifest. - `kubernetes` [Kubernetes](https://docs.claudie.io/latest/input-manifest/api-reference/#kubernetes) List of Kubernetes cluster this manifest will manage. - `loadBalancers` [Loadbalancer](https://docs.claudie.io/latest/input-manifest/api-reference/#loadbalancer) List of loadbalancer clusters the Kubernetes clusters may use. ## Providers(https://docs.claudie.io/latest/input-manifest/api-reference/\#providers) Contains configurations for supported cloud providers. At least one provider needs to be defined. - `name` The name of the provider specification. The name is limited to 15 characters. It has to be unique across all providers. - `providerType` Type of a provider. The providerType defines mandatory fields that has to be included for a specific provider. A list of available providers can be found at [providers section](https://docs.claudie.io/latest/input-manifest/api-reference/providers). Allowed values are: | Value | Description | | --- | --- | | `aws` | [AWS](https://docs.claudie.io/latest/input-manifest/api-reference/#aws) provider type | | `azure` | [Azure](https://docs.claudie.io/latest/input-manifest/api-reference/#azure) provider type | | `cloudflare` | [Cloudflare](https://docs.claudie.io/latest/input-manifest/api-reference/#cloudflare) provider type | | `gcp` | [GCP](https://docs.claudie.io/latest/input-manifest/api-reference/#gcp) provider type | | `hetzner` | [Hetzner](https://docs.claudie.io/latest/input-manifest/api-reference/#hetzner) provider type | | `hetznerdns` | [Hetzner](https://docs.claudie.io/latest/input-manifest/api-reference/#hetznerdns) DNS provider type | | `oci` | [OCI](https://docs.claudie.io/latest/input-manifest/api-reference/#oci) provider type | - `secretRef` [SecretRef](https://docs.claudie.io/latest/input-manifest/api-reference/#secretref) Represents a Secret Reference. It has enough information to retrieve secret in any namespace. 
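Before applying an InputManifest, it can be useful to verify that the secret referenced by `secretRef` actually exists in the given namespace; a minimal check (the secret name and namespace are illustrative, matching the examples used elsewhere in this documentation):

```
# The provider's secretRef must resolve to an existing secret holding the provider's mandatory fields.
kubectl get secret hetzner-secret-1 --namespace=mynamespace
```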
Support for more cloud providers is in the [roadmap](https://github.com/berops/claudie/blob/master/docs/roadmap/roadmap.md). For static nodepools a provider is not needed, refer to the [static section](https://docs.claudie.io/latest/input-manifest/api-reference/#static) for more detailed information. ## SecretRef(https://docs.claudie.io/latest/input-manifest/api-reference/\#secretref) SecretReference represents a Kubernetes Secret Reference. It has enough information to retrieve secret in any namespace. - `name` Name of the secret, which holds data for the particular cloud provider instance. - `namespace` Namespace of the secret which holds data for the particular cloud provider instance. ### Cloudflare(https://docs.claudie.io/latest/input-manifest/api-reference/\#cloudflare) The fields that need to be included in a Kubernetes Secret resource to utilize the Cloudflare provider. To find out how to configure Cloudflare follow the instructions [here](https://docs.claudie.io/latest/input-manifest/providers/cloudflare/) - `apitoken` Credentials for the provider (API token). - `templates` - `repository`: specifies the location from where the external template are to be acquired. Must be a publicly available git repository. - `tag`: Optional. If set when the git repository is downloaded, the commit hash from the tag version is used. - `path`: specifies the path for a specific provider within the `repository` where the source template files are located. ### HetznerDNS(https://docs.claudie.io/latest/input-manifest/api-reference/\#hetznerdns) The fields that need to be included in a Kubernetes Secret resource to utilize the HetznerDNS provider. To find out how to configure HetznerDNS follow the instructions [here](https://docs.claudie.io/latest/input-manifest/providers/hetzner/) - `apitoken` Credentials for the provider (API token). - `templates` - `repository`: specifies the location from where the external template are to be acquired. Must be a publicly available git repository. - `tag`: Optional. If set when the git repository is downloaded, the commit hash from the tag version is used. - `path`: specifies the path for a specific provider within the `repository` where the source template files are located. ### GCP(https://docs.claudie.io/latest/input-manifest/api-reference/\#gcp) The fields that need to be included in a Kubernetes Secret resource to utilize the GCP provider. To find out how to configure GCP provider and service account, follow the instructions [here](https://docs.claudie.io/latest/input-manifest/providers/gcp/). - `credentials` Credentials for the provider. Stringified JSON service account key. - `gcpproject` Project id of an already existing GCP project where the infrastructure is to be created. - `templates` - `repository`: specifies the location from where the external template are to be acquired. Must be a publicly available git repository. - `tag`: Optional. If set when the git repository is downloaded, the commit hash from the tag version is used. - `path`: specifies the path for a specific provider within the `repository` where the source template files are located. ### Hetzner(https://docs.claudie.io/latest/input-manifest/api-reference/\#hetzner) The fields that need to be included in a Kubernetes Secret resource to utilize the Hetzner provider. To find out how to configure Hetzner provider and service account, follow the instructions [here](https://docs.claudie.io/latest/input-manifest/providers/hetzner/). - `credentials` Credentials for the provider (API token). 
- `templates` - `repository`: specifies the location from where the external template are to be acquired. Must be a publicly available git repository. - `tag`: Optional. If set when the git repository is downloaded, the commit hash from the tag version is used. - `path`: specifies the path for a specific provider within the `repository` where the source template files are located. ### OCI(https://docs.claudie.io/latest/input-manifest/api-reference/\#oci) The fields that need to be included in a Kubernetes Secret resource to utilize the OCI provider. To find out how to configure OCI provider and service account, follow the instructions [here](https://docs.claudie.io/latest/input-manifest/providers/oci/). - `privatekey` [Private key](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm#two) used to authenticate to the OCI. - `keyfingerprint` Fingerprint of the user-supplied private key. - `tenancyocid` OCID of the tenancy where `privateKey` is added as an API key - `userocid` OCID of the user in the supplied tenancy - `compartmentocid` OCID of the [compartment](https://docs.oracle.com/en/cloud/paas/integration-cloud/oracle-integration-oci/creating-oci-compartment.html) where VMs/VCNs/... will be created - `templates` - `repository`: specifies the location from where the external template are to be acquired. Must be a publicly available git repository. - `tag`: Optional. If set when the git repository is downloaded, the commit hash from the tag version is used. - `path`: specifies the path for a specific provider within the `repository` where the source template files are located. ### AWS(https://docs.claudie.io/latest/input-manifest/api-reference/\#aws) The fields that need to be included in a Kubernetes Secret resource to utilize the AWS provider. To find out how to configure AWS provider and service account, follow the instructions [here](https://docs.claudie.io/latest/input-manifest/providers/aws/). - `accesskey` Access key ID for your AWS account. - `secretkey` Secret key for the Access key specified above. - `templates` - `repository`: specifies the location from where the external template are to be acquired. Must be a publicly available git repository. - `tag`: Optional. If set when the git repository is downloaded, the commit hash from the tag version is used. - `path`: specifies the path for a specific provider within the `repository` where the source template files are located. ### Azure(https://docs.claudie.io/latest/input-manifest/api-reference/\#azure) The fields that need to be included in a Kubernetes Secret resource to utilize the Azure provider. To find out how to configure Azure provider and service account, follow the instructions [here](https://docs.claudie.io/latest/input-manifest/providers/azure/). - `subscriptionid` Subscription ID of your subscription in Azure. - `tenantid` Tenant ID of your tenancy in Azure. - `clientid` Client ID of your client. The Claudie is design to use a service principal with appropriate permissions. - `clientsecret` Client secret generated for your client. - `templates` - `repository`: specifies the location from where the external template are to be acquired. Must be a publicly available git repository. - `tag`: Optional. If set when the git repository is downloaded, the commit hash from the tag version is used. - `path`: specifies the path for a specific provider within the `repository` where the source template files are located. 
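As with the other providers, these Azure fields are supplied to Claudie through a Kubernetes Secret. A minimal sketch of creating such a secret (the secret name, namespace and environment variables are illustrative):

```
# Field names follow the Azure section above; values come from your service principal.
kubectl create secret generic azure-secret-1 --namespace=mynamespace \
  --from-literal=subscriptionid=$AZURE_SUBSCRIPTION_ID \
  --from-literal=tenantid=$AZURE_TENANT_ID \
  --from-literal=clientid=$AZURE_CLIENT_ID \
  --from-literal=clientsecret=$AZURE_CLIENT_SECRET
```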
## Nodepools(https://docs.claudie.io/latest/input-manifest/api-reference/\#nodepools) Collection of static and dynamic nodepool specification, to be referenced in the `kubernetes` or `loadBalancer` clusters. - `dynamic` [Dynamic](https://docs.claudie.io/latest/input-manifest/api-reference/#dynamic) List of dynamically to-be-created nodepools of not yet existing machines, used for Kubernetes or loadbalancer clusters. These are only blueprints, and will only be created per reference in `kubernetes` or `loadBalancer` clusters. E.g. if the nodepool isn't used, it won't even be created. Or if the same nodepool is used in two different clusters, it will be created twice. In OOP analogy, a dynamic nodepool would be a class that would get instantiated `N >= 0` times depending on which clusters reference it. - `static` [Static](https://docs.claudie.io/latest/input-manifest/api-reference/#static) List of static nodepools of already existing machines, not provisioned by Claudie, used for Kubernetes (see [requirements](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#before-you-begin)) or loadbalancer clusters. These can be baremetal servers or VMs with IPs assigned. Claudie is able to join them into existing clusters, or provision clusters solely on the static nodepools. Typically we'll find these being used in on-premises scenarios, or hybrid-cloud clusters. ## Dynamic(https://docs.claudie.io/latest/input-manifest/api-reference/\#dynamic) Dynamic nodepools are defined for cloud provider machines that Claudie is expected to provision. - `name` Name of the nodepool. The name is limited by 14 characters. Each nodepool will have a random hash appended to the name, so the whole name will be of format `-`. - `provideSpec` [Provider spec](https://docs.claudie.io/latest/input-manifest/api-reference/#provider-spec) Collection of provider data to be used while creating the nodepool. - `count` Number of the nodes in the nodepool. Maximum value of 255. Mutually exclusive with `autoscaler`. - `serverType` Type of the machines in the nodepool. Currently, only AMD64 machines are supported. - `machineSpec` Further describes the selected server type, if available by the cloud provider. - `cpuCount`: specifies the number of cpu to be used by the `serverType` - `memory`: specifies the memory in GB to be used by the `serverType` - `image` OS image of the machine. Currently, only Ubuntu 22.04 AMD64 images are supported. - `storageDiskSize` The size of the storage disk on the nodes in the node pool is specified in `GB`. The OS disk is created automatically with a predefined size of `100GB` for Kubernetes nodes and `50GB` for LoadBalancer nodes. This field is optional; however, if a compute node pool does not define it, the default value will be used for the creation of the storage disk. Control node pools and LoadBalancer node pools ignore this field. The default value for this field is `50`, with a minimum value also set to `50`. This value is only applicable to compute nodes. If the disk size is set to `0`, no storage disk will be created for any nodes in the particular node pool. - `autoscaler` [Autoscaler Configuration](https://docs.claudie.io/latest/input-manifest/api-reference/#autoscaler-configuration) Autoscaler configuration for this nodepool. Mutually exclusive with `count`. - `labels` Map of user defined labels, which will be applied on every node in the node pool. This field is optional. 
To see the default labels Claudie applies on each node, refer to [this section](https://docs.claudie.io/latest/input-manifest/api-reference/#default-labels). - `annotations` Map of user defined annotations, which will be applied on every node in the node pool. This field is optional. You can use Kubernetes annotations to attach arbitrary non-identifying metadata. Clients such as tools and libraries can retrieve this metadata. - `taints` [v1.Taint](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#taint-v1-core) Array of user defined taints, which will be applied on every node in the node pool. This field is optional. To see the default taints Claudie applies on each node, refer to [this section](https://docs.claudie.io/latest/input-manifest/api-reference/#default-taints). ## Provider Spec(https://docs.claudie.io/latest/input-manifest/api-reference/\#provider-spec) Provider spec is an additional specification built on top of the data from any of the provider instance. Here are provider configuration examples for each individual provider: [aws](https://docs.claudie.io/latest/input-manifest/providers/aws/), [azure](https://docs.claudie.io/latest/input-manifest/providers/azure/), [gcp](https://docs.claudie.io/latest/input-manifest/providers/gcp/), [cloudflare](https://docs.claudie.io/latest/input-manifest/providers/cloudflare/), [hetzner](https://docs.claudie.io/latest/input-manifest/providers/hetzner/) and [oci](https://docs.claudie.io/latest/input-manifest/providers/oci/). - `name` Name of the provider instance specified in [providers](https://docs.claudie.io/latest/input-manifest/api-reference/#providers) - `region` Region of the nodepool. - `zone` Zone of the nodepool. ## Autoscaler Configuration(https://docs.claudie.io/latest/input-manifest/api-reference/\#autoscaler-configuration) Autoscaler configuration on per nodepool basis. Defines the number of nodes, autoscaler will scale up or down specific nodepool. - `min` Minimum number of nodes in nodepool. - `max` Maximum number of nodes in nodepool. ## Static(https://docs.claudie.io/latest/input-manifest/api-reference/\#static) Static nodepools are defined for static machines which Claudie will not manage. Used for on premise nodes. In case you want to use your static nodes in the Kubernetes cluster, make sure they meet the [requirements](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#before-you-begin). - `name` Name of the static nodepool. The name is limited by 14 characters. - `nodes` [Static Node](https://docs.claudie.io/latest/input-manifest/api-reference/#static-node) List of static nodes for a particular static nodepool. - `labels` Map of user defined labels, which will be applied on every node in the node pool. This field is optional. To see the default labels Claudie applies on each node, refer to [this section](https://docs.claudie.io/latest/input-manifest/api-reference/#default-labels). - `annotations` Map of user defined annotations, which will be applied on every node in the node pool. This field is optional. You can use Kubernetes annotations to attach arbitrary non-identifying metadata. Clients such as tools and libraries can retrieve this metadata. - `taints` [v1.Taint](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#taint-v1-core) Array of user defined taints, which will be applied on every node in the node pool. This field is optional. 
To see the default taints Claudie applies on each node, refer to [this section](https://docs.claudie.io/latest/input-manifest/api-reference/#default-taints). ## Static node(https://docs.claudie.io/latest/input-manifest/api-reference/\#static-node) Static node defines single static node from a static nodepool. - `endpoint` Endpoint under which Claudie will access this node. - `username` Name of a user with root privileges, will be used to SSH into this node and install dependencies. This attribute is optional. In case it isn't specified a `root` username is used. - `secretRef` [SecretRef](https://docs.claudie.io/latest/input-manifest/api-reference/#secretref) Secret from which private key will be taken used to SSH into the machine (as root or as a user specificed in the username attribute). The field in the secret must be `privatekey`, i.e. ``` apiVersion: v1 type: Opaque kind: Secret name: private-key-node-1 namespace: claudie-secrets data: privatekey: ``` ## Kubernetes(https://docs.claudie.io/latest/input-manifest/api-reference/\#kubernetes) Defines Kubernetes clusters. - `clusters` [Cluster-k8s](https://docs.claudie.io/latest/input-manifest/api-reference/#cluster-k8s) List of Kubernetes clusters Claudie will create. ## Cluster-k8s(https://docs.claudie.io/latest/input-manifest/api-reference/\#cluster-k8s) Collection of data used to define a Kubernetes cluster. - `name` Name of the Kubernetes cluster. The name is limited by 28 characters. Each cluster will have a random hash appended to the name, so the whole name will be of format `-`. - `version` Kubernetes version of the cluster. Version should be defined in format `vX.Y`. In terms of supported versions of Kubernetes, Claudie follows `kubeone` releases and their supported versions. The current `kubeone` version used in Claudie is `1.12.0`. To see the list of supported versions, please refer to `kubeone` [documentation](https://docs.kubermatic.com/kubeone/v1.12/architecture/compatibility/supported-versions/). - `network` Network range for the VPN of the cluster. The value should be defined in format `A.B.C.D/mask`. - `pools` List of nodepool names this cluster will use. Remember that nodepools defined in [nodepools](https://docs.claudie.io/latest/input-manifest/api-reference/#nodepools) are only "blueprints". The actual nodepool will be created once referenced here. - `installationProxy` Installation proxy settings used by this cluster. You can learn more about the setting [here](https://docs.claudie.io/latest/http-proxy). ## LoadBalancer(https://docs.claudie.io/latest/input-manifest/api-reference/\#loadbalancer) Defines loadbalancer clusters. - `roles` [Role](https://docs.claudie.io/latest/input-manifest/api-reference/#role) List of roles loadbalancers use to forward the traffic. Single role can be used in multiple loadbalancer clusters. - `clusters` [Cluster-lb](https://docs.claudie.io/latest/input-manifest/api-reference/#cluster-lb) List of loadbalancer clusters used in the Kubernetes clusters defined under [clusters](https://docs.claudie.io/latest/input-manifest/api-reference/#cluster-k8s). ## Role(https://docs.claudie.io/latest/input-manifest/api-reference/\#role) Role defines a concrete loadbalancer configuration. Single loadbalancer can have multiple roles. - `name` Name of the role. Used as a reference in [clusters](https://docs.claudie.io/latest/input-manifest/api-reference/#cluster-lb). - `protocol` Protocol of the rule. 
Allowed values are: | Value | Description | | --- | --- | | `tcp` | Role will use TCP protocol | | `udp` | Role will use UDP protocol | - `port` Port of the incoming traffic on the loadbalancer. - `targetPort` Port where loadbalancer forwards the traffic. - `targetPools` Defines from which nodepools, nodes will be targeted by the Load Balancer - `settings` Optional settings that can be configured for a role - `proxyProtocol`: Default value: `true` Specifies whether to enable the proxy protocol. The Proxy protocol forwards connection information from the client, such as the IP address, to the target pools. The application to which the traffic is forwarded must support the proxy protocol. - `stickySessions`: Default value: `false` Specifies whether incoming traffic should be sent to the same node each time, rather than load balancing between available nodes. A hash of the IP is used to determine which node the traffic is routed to. ## Cluster-lb(https://docs.claudie.io/latest/input-manifest/api-reference/\#cluster-lb) Collection of data used to define a loadbalancer cluster. - `name` Name of the loadbalancer. The name is limited by 28 characters. - `roles` List of roles the loadbalancer uses. - `dns` [DNS](https://docs.claudie.io/latest/input-manifest/api-reference/#dns) Specification of the loadbalancer's DNS record. - `targetedK8s` Name of the Kubernetes cluster targetted by this loadbalancer. - `pools` List of nodepool names this loadbalancer will use. Remember, that nodepools defined in [nodepools](https://docs.claudie.io/latest/input-manifest/api-reference/#nodepools) are only "blueprints". The actual nodepool will be created once referenced here. ## DNS(https://docs.claudie.io/latest/input-manifest/api-reference/\#dns) Collection of data Claudie uses to create a DNS record for the loadbalancer. - `dnsZone` DNS zone inside which the records will be created. GCP/AWS/OCI/Azure/Cloudflare/Hetzner DNS zone is accepted. The record created in this zone must be accessible to the public. Therefore, a public DNS zone is required. - `provider` Name of [provider](https://docs.claudie.io/latest/input-manifest/api-reference/#providers) to be used for creating an A record entry in defined DNS zone. - `hostname` Custom hostname for your A record. If left empty, the hostname will be a random hash. - `alternativeNames` Additional hostnames for which A records will be created ### Default labels(https://docs.claudie.io/latest/input-manifest/api-reference/\#default-labels) By default, Claudie applies following labels on every node in the cluster, together with those defined by the user. | Key | Value | | --- | --- | | `claudie.io/nodepool` | Name of the node pool. | | `claudie.io/provider` | Cloud provider name. | | `claudie.io/provider-instance` | User defined provider name. | | `claudie.io/node-type` | Type of the node. Either `control` or `compute`. | | `topology.kubernetes.io/region` | Region where the node resides. | | `topology.kubernetes.io/zone` | Zone of the region where node resides. | | `kubernetes.io/os` | Os family of the node. | | `kubernetes.io/arch` | Architecture type of the CPU. | | `v1.kubeone.io/operating-system` | Os type of the node. | ### Default taints(https://docs.claudie.io/latest/input-manifest/api-reference/\#default-taints) By default, Claudie applies only `node-role.kubernetes.io/control-plane` taint for control plane nodes, with effect `NoSchedule`, together with those defined by the user. 
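Putting the nodepool fields described above together, a dynamic nodepool that combines user-defined labels, annotations, taints and an autoscaler block (used instead of a fixed `count`) might look like the following sketch. The provider instance name and all values are illustrative only:

```
nodePools:
  dynamic:
    - name: worker-pool
      providerSpec:
        # name of a provider instance defined under providers
        name: hetzner-1
        region: fsn1
        zone: fsn1-dc14
      serverType: cpx22
      image: ubuntu-22.04
      storageDiskSize: 100
      # scale between 1 and 5 nodes instead of using a fixed count
      autoscaler:
        min: 1
        max: 5
      labels:
        environment: staging
      annotations:
        team: backend
      taints:
        - key: dedicated
          value: staging
          effect: NoSchedule
```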
[Skip to content](https://docs.claudie.io/latest/input-manifest/claudie-custom-ns/#deploying-claudie-in-a-custom-namespace) # Deploying Claudie in a custom namespace(https://docs.claudie.io/latest/input-manifest/claudie-custom-ns/\#deploying-claudie-in-a-custom-namespace) By default, when following the [Getting Started](https://docs.claudie.io/latest/getting-started/get-started-using-claudie/#install-claudie) guide, Claudie is deployed in the `claudie` namespace. However, you may want to deploy it into a custom namespace for reasons such as organizational structure, environment isolation or others. ## Modifiyng claudie.yaml bundle(https://docs.claudie.io/latest/input-manifest/claudie-custom-ns/\#modifiyng-claudieyaml-bundle) 1. Download the latest claudie.yaml ``` wget https://github.com/berops/claudie/releases/latest/download/claudie.yaml ``` 2. Before applying the manifest, make the following changes: 2.1. Replace every occurrence of `namespace: claudie` with your desired namespace (e.g., new-namespace). Using linux terminal you can use sed utility: ``` sed -i 's/namespace: claudie/namespace: new-namespace/' claudie.yaml ``` 2.2. For DNS Names within Certificate resource, `kind: Certificate`, ensure the dnsNames reflect the new namespace: ``` spec: dnsNames: - claudie-operator.new-namespace - claudie-operator.new-namespace.svc - claudie-operator.new-namespace.svc.cluster - claudie-operator.new-namespace.svc.cluster.local ``` Using linux terminal you can use sed utility: ``` sed -i 's/\(claudie-operator\)\.claudie/\1.new-namespace/g' claudie.yaml ``` 2.3. Replace annotations `cert-manager.io/inject-ca-from: claudie/claudie-webhook-certificate` and name `name: claudie-webhook` in ValidatingWebhookConfiguration resource, `kind: ValidatingWebhookConfiguration`, so that is contains name of your new namespace ``` annotations: cert-manager.io/inject-ca-from: new-namespace/claudie-webhook-certificate ... name: claudie-webhook-new-namespace ``` Using linux terminal you can use sed utility: ``` sed -i 's/cert-manager\.io\/inject-ca-from: claudie\//cert-manager.io\/inject-ca-from: new-namespace\//g' claudie.yaml sed -i 's/claudie-webhook$/claudie-webhook-new-namespace/g' claudie.yaml ``` 2.4. To restrict the namespaces monitored by the Claudie operator (as defined in `claudie.yaml`), add the `CLAUDIE_NAMESPACES` environment variable to the claudie-operator deployment. ``` env: - name: CLAUDIE_NAMESPACES value: "new-namespace" ``` Updating CLAUDIE\_NAMESPACES variable If there already exists a Claudie cluster, make sure to also update the deployment of the existing Claudie operator to reflect the correct namespace. If the `CLAUDIE_NAMESPACES` environment variable is not set in the operator, multiple Claudie instances may pick up the same InputManifests, which can lead to the cluster being unintentionally rebuilt. This can result in unexpected behavior and potentially break your Kubernetes cluster. 2.5. To ensure the `ClusterRoleBinding` is correctly applied to the specified `ServiceAccount`, make sure the `ClusterRoleBinding` has a unique name. Modify the name of the `ClusterRoleBinding` resource in the `claudie.yaml`. Using linux terminal you can use sed utility: ``` sed -i 's/claudie-operator-role-binding/claudie-operator-role-binding-new-namespace/g' claudie.yaml ``` 2.6. Once you’ve updated claudie.yaml, create your custom namespace and apply the manifest. 
Make sure Cert Manager is already deployed in your cluster

```
kubectl create namespace new-namespace
kubectl apply -f claudie.yaml
```

# Claudie(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#claudie) ## A single platform for multiple clouds(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#a-single-platform-for-multiple-clouds) [![claudie schema](https://docs.claudie.io/latest/claudie-workflow/claudie-diagram.png)](https://docs.claudie.io/latest/claudie-workflow/claudie-diagram.png) ### Microservices(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#microservices) - [Manager](https://github.com/berops/claudie/tree/master/services/manager) - [Builder](https://github.com/berops/claudie/tree/master/services/builder) - [Terraformer](https://github.com/berops/claudie/tree/master/services/terraformer) - [Ansibler](https://github.com/berops/claudie/tree/master/services/ansibler) - [Kube-eleven](https://github.com/berops/claudie/tree/master/services/kube-eleven) - [Kuber](https://github.com/berops/claudie/tree/master/services/kuber) - [Claudie-operator](https://github.com/berops/claudie/tree/master/services/claudie-operator) ### Data stores(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#data-stores) - [MongoDB](https://github.com/berops/claudie/tree/master/manifests/claudie/mongo) - [Minio](https://github.com/berops/claudie/tree/master/manifests/claudie/minio) ### Tools used(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#tools-used) - [Terraform](https://github.com/hashicorp/terraform) - [Ansible](https://github.com/ansible/ansible) - [KubeOne](https://github.com/kubermatic/kubeone) - [Longhorn](https://github.com/longhorn/longhorn) - [Nginx](https://www.nginx.com/) - [Calico](https://github.com/projectcalico/calico) - [gRPC](https://grpc.io/) ## Manager(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#manager) Manager is the brain and main entry point for Claudie. To build clusters, users/services submit their configs to the manager service. The manager creates the desired state and schedules a number of jobs to be executed in order to achieve the desired state based on the current state. The jobs are then picked up by the builder service. For the API see the [GRPC definitions](https://github.com/berops/claudie/blob/master/proto/manager.proto). ### Flow(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#flow) Each newly created manifest starts in the Pending state. Pending manifests are periodically checked and, based on the specification provided in the applied configs, the desired state for each cluster, along with the tasks to be performed to achieve the desired state, are created, after which the manifest is moved to the Scheduled state. Tasks from Scheduled manifests are picked up by builder services, gradually building the desired state. From this state, the manifest can end up in the Done or Error state. Any changes to the input manifest while it is in the Scheduled state will be reflected after it is moved to the Done state, after which the cycle repeats. Each cluster has a current state and a desired state based on which tasks are created. The desired state is created only once, when changes to the configuration are detected. Several tasks can be created that will gradually converge the current state to the desired state.
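Although these states are internal to the manager, you can observe a manifest moving through them from the outside via the `InputManifest` status field (the same field used later in the troubleshooting guide and command cheat sheet); `manifest-name` below is a placeholder:

```
kubectl get inputmanifests.claudie.io manifest-name -o jsonpath='{.status}' | jq .
```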
Each time a task is picked up by the builder service, the relevant state from the current state is transferred to the task, so that each task has up-to-date information about the current infrastructure, and it's up to the builder service to build/modify/delete the missing pieces in the picked-up task. Once a task is done building, either in error or successfully, the current state should be updated by the builder service so that the manager has accurate information about the current state of the infrastructure. When the manager receives a request for the update of the current state, it transfers the relevant information to the desired state that was created at the beginning, before the tasks were scheduled. This is the only point where the desired state is updated, and we only transfer information from the current state (such as newly built nodes, IPs, etc.). After all tasks have finished successfully, the current and desired state should match. #### Rolling updates(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#rolling-updates) Unless otherwise specified, the default is to use the external templates located at https://github.com/berops/claudie-config to build the infrastructure for the dynamic nodepools. The templates provide reasonable defaults that anyone can use to build multi-provider clusters. As we understand that someone may need more specific scenarios, we allow these external templates to be overridden by the user, see https://docs.claudie.io/latest/input-manifest/external-templates/ for more information. Since users can specify the templates to be used when building the infrastructure of the InputManifest, there is one common scenario that we decided should be handled by the manager service: rolling updates. Rolling updates of nodepools are performed when a change to a provider's external templates is registered. The manager then checks that the external repository of the new templates exists and uses them to perform a rolling update of the already built infrastructure. The rolling update is performed in the following steps: [![rolling update](https://docs.claudie.io/latest/claudie-workflow/rolling_update.png)](https://docs.claudie.io/latest/claudie-workflow/rolling_update.png) If a failure occurs during the rolling update of a single Nodepool, the state is rolled back to the last possible working state. Rolling updates have a retry strategy that results in endless processing of rolling updates until they succeed. If the rollback to the last working state fails, it will also be retried indefinitely, in which case it is up to the Claudie user to repair the cluster so that the rolling update can continue. The individual states of the Input Manifest and how they are processed within the manager are further visually described in the following sections.
### Pending State(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#pending-state) [![pending state](https://docs.claudie.io/latest/claudie-workflow/pending_state.png)](https://docs.claudie.io/latest/claudie-workflow/pending_state.png) ### Scheduled State(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#scheduled-state) [![scheduled state](https://docs.claudie.io/latest/claudie-workflow/scheduled_state.png)](https://docs.claudie.io/latest/claudie-workflow/scheduled_state.png) ### Done/Error State(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#doneerror-state) [![done/error state](https://docs.claudie.io/latest/claudie-workflow/done_error_state.png)](https://docs.claudie.io/latest/claudie-workflow/done_error_state.png) ## Builder(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#builder) Processes tasks scheduled by the manager, gradually building the desired state of the infrastructure. It communicates with the `terraformer`, `ansibler`, `kube-eleven` and `kuber` services in order to manage the infrastructure. ### Flow(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#flow_1) - Periodically polls the Manager for available tasks to be worked on. - Communicates with Terraformer, Ansibler, Kube-eleven and Kuber - After a task is completed, either successfully or not, the current state is updated, along with the status if it errored. ## Terraformer(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#terraformer) Terraformer creates or destroys infrastructure via Terraform calls. For the API see the [GRPC definitions](https://github.com/berops/claudie/blob/master/proto/terraformer.proto). ## Ansibler(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#ansibler) Ansibler uses Ansible to: - set up the Wireguard VPN between the infrastructure spawned in the Terraformer service. - set up the nginx load balancer for the infrastructure - install dependencies required by nodes in a Kubernetes cluster For the API see the [GRPC definitions](https://github.com/berops/claudie/blob/master/proto/ansibler.proto). ## Kube-eleven(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#kube-eleven) Kube-eleven uses [KubeOne](https://github.com/kubermatic/kubeone) to spin up Kubernetes clusters out of the spawned and pre-configured infrastructure. For the API see the [GRPC definitions](https://github.com/berops/claudie/blob/master/proto/kubeEleven.proto). ## Kuber(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#kuber) Kuber manipulates the cluster resources using `kubectl`. For the API see the [GRPC definitions](https://github.com/berops/claudie/blob/master/proto/kuber.proto). ## Claudie-operator(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#claudie-operator) Claudie-operator is a layer between the user and Claudie. It is an `InputManifest` Custom Resource Definition controller that communicates with the `manager` service to propagate changes made to the config by the user.
### Flow(https://docs.claudie.io/latest/claudie-workflow/claudie-workflow/\#flow_2) - User applies a new InputManifest crd holding a configuration of the desired clusters - Claudie-operator detects it and processes the created/modified input manifest - Upon deletion of user-created InputManifest, Claudie-operator initiates a deletion process of the manifest [Skip to content](https://docs.claudie.io/latest/storage/storage-solution/#claudie-storage-solution) # Claudie storage solution(https://docs.claudie.io/latest/storage/storage-solution/\#claudie-storage-solution) ## Concept(https://docs.claudie.io/latest/storage/storage-solution/\#concept) Running stateful workloads is a complex task, even more so when considering the multi-cloud environment. Claudie therefore needs to be able to accommodate stateful workloads, regardless of the underlying infrastructure providers. Claudie orchestrates storage on the kubernetes cluster nodes by creating one "storage cluster" across multiple providers. This "storage cluster" has a series of `zones`, one for each cloud provider instance. Each `zone` then stores its own persistent volume data. This concept is translated into longhorn implementation, where each `zone` is represented by a Storage Class which is backed up by the nodes defined under the same cloud provider instance. Furthermore, each node uses separate disk to the one, where OS is installed, to assure clear data separation. The size of the storage disk can be configured in `storageDiskSize` field of the nodepool specification. ## Longhorn(https://docs.claudie.io/latest/storage/storage-solution/\#longhorn) A Claudie-created cluster comes with the `longhorn` deployment preinstalled and ready to be used. By default, only **worker** nodes are used to store data. Longhorn installed in the cluster is set up in a way that it provides one default `StorageClass` called `longhorn`, which, if used, creates a volume that is then replicated across random nodes in the cluster. Besides the default storage class, Claudie can also create custom storage classes, which force persistent volumes to be created on specific nodes based on the provider instance they have. In other words, you can use a specific provider instance to provision nodes for your storage needs, while using another provider instance for computing tasks. ## Example(https://docs.claudie.io/latest/storage/storage-solution/\#example) To follow along, have a look at the example of `InputManifest` below. 
storage-classes-example.yaml ``` apiVersion: claudie.io/v1beta1 kind: InputManifest metadata: name: storageclass-example-manifest labels: app.kubernetes.io/part-of: claudie spec: providers: - name: storage-provider providerType: hetzner secretRef: name: storage-provider-secrets namespace: claudie-secrets - name: compute-provider providerType: hetzner secretRef: name: storage-provider-secrets namespace: claudie-secrets - name: dns-provider providerType: cloudflare secretRef: name: dns-provider-secret namespace: claudie-secrets nodePools: dynamic: - name: control providerSpec: name: compute-provider region: hel1 zone: hel1-dc2 count: 3 serverType: cpx22 image: ubuntu-22.04 - name: datastore providerSpec: name: storage-provider region: hel1 zone: hel1-dc2 count: 5 serverType: cpx22 image: ubuntu-22.04 storageDiskSize: 800 taints: - key: node-type value: datastore effect: NoSchedule - name: compute providerSpec: name: compute-provider region: hel1 zone: hel1-dc2 count: 10 serverType: cpx42 image: ubuntu-22.04 taints: - key: node-type value: compute effect: NoSchedule - name: loadbalancer providerSpec: name: compute-provider region: hel1 zone: hel1-dc2 count: 1 serverType: cpx22 image: ubuntu-22.04 kubernetes: clusters: - name: my-awesome-claudie-cluster version: 1.27.0 network: 192.168.2.0/24 pools: control: - control compute: - datastore - compute loadBalancers: roles: - name: apiserver protocol: tcp port: 6443 targetPort: 6443 targetPools: - control clusters: - name: apiserver-lb roles: - apiserver dns: dnsZone: dns-zone provider: dns-provider targetedK8s: my-awesome-claudie-cluster pools: - loadbalancer ``` When Claudie applies this input manifest, the following storage classes are installed: - `longhorn` \- the default storage class, which stores data on random nodes - `longhorn-storage-provider-zone` \- storage class, which stores data only on nodes of the `storage-provider` provider instance. - `longhorn-compute-provider-zone` \- storage class, which stores data only on nodes of the `compute-provider` provider instance. Now all you have to do is specify correct storage class when defining your PVCs. In case you are interested in using different cloud provider for `datastore-nodepool` or `compute-nodepool` of this `InputManifest` example, see the [list of supported providers instance](https://docs.claudie.io/latest/getting-started/detailed-guide/#supported-providers) For more information on how Longhorn works you can check out [Longhorn's official documentation](https://longhorn.io/docs/1.4.0/what-is-longhorn/). [Skip to content](https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/#claudie-load-balancing-solution) # Claudie load balancing solution(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#claudie-load-balancing-solution) ## Loadbalancer(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#loadbalancer) To create a highly available kubernetes cluster, Claudie has the option to create load balancers that utilize [envoy](https://www.envoyproxy.io/docs/envoy/latest/) to load balance the traffic among the cluster nodes. The DNS load balancing functionality, including health checks, is provided by supported cloud providers such as AWS, Azure, Google Cloud, Cloudflare, and OCI. Health checks monitor TCP port 65534. If a node fails to respond on this port, its corresponding DNS record is temporarily removed. Once the endpoint becomes healthy again, the DNS record is automatically restored. 
## Concept(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#concept) - The load balancer machines will join the Wireguard private network of the Claudie clusters relevant to them. - This is necessary so that the LB machines can send traffic to the cluster machines over the `wireguard VPN`. - DNS A records will be created and managed by Claudie on 1 or more cloud providers. - There will be a DNS A record for the public IP of each LB machine that is currently passing the health checks. - The LB machines will deploy a docker container running [envoy](https://www.envoyproxy.io/docs/envoy/latest/) for each role the loadbalancer uses, to carry out the actual load balancing. - Therefore, there are actually 2 layers of load balancing. 1. DNS-based load balancing to determine the LB machine to be used. 2. Software load balancing on the chosen LB machine. - Claudie will dynamically manage the LB configuration, e.g. if some cluster node is removed, the LB configuration changes, or the DNS configuration changes (hostname change). - The load balancing happens on the L4 layer (TCP/UDP) and is partially configurable through the Claudie input manifest. ## Example diagram(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#example-diagram) [![lb-architecture](https://docs.claudie.io/latest/loadbalancing/lb-architecture.png)](https://docs.claudie.io/latest/loadbalancing/lb-architecture.png) ## Definitions(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#definitions) ### Role(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#role) Claudie uses the concept of roles while configuring the load balancers from the input manifest. Each role represents a loadbalancer configuration for a particular use. Roles are then assigned to the load balancer cluster. A single load balancer cluster can have multiple roles assigned. ### Targeted kubernetes cluster(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#targeted-kubernetes-cluster) A load balancer gets assigned to a kubernetes cluster with the field `targetedK8s`. This field uses the `name` of the kubernetes cluster as a value. Currently, a single load balancer can only be assigned to a single kubernetes cluster. **Among multiple load balancers targeting the same kubernetes cluster only one of them can have the API server role (i.e. the role with target port 6443) attached to it.** ### DNS(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#dns) Claudie creates and manages the DNS for the load balancer. If the user adds a load balancer into their infrastructure via Claudie, Claudie creates a DNS A record with the public IP of the load balancer machines behind it. When the load balancer configuration changes in any way (a node is added or removed, the hostname or the target changes), the DNS record is reconfigured by Claudie on the fly. This rids the user of the need to manage DNS. ### Nodepools(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#nodepools) Loadbalancers are built from user-defined nodepools in the `pools` field, similar to how kubernetes clusters are defined. These nodepools allow the user to change/scale the load balancers according to their needs without any fuss. See the nodepool definition for more information.
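As a quick sketch of how a role and its optional `settings` fit together, the snippet below defines a single TCP role with the proxy protocol disabled and sticky sessions enabled; the role name, ports and target pool are illustrative, and the next section links to a full reference example:

```
loadBalancers:
  roles:
    - name: https
      protocol: tcp
      port: 443
      targetPort: 443
      targetPools:
        - compute
      settings:
        # disable if the target application does not understand the proxy protocol
        proxyProtocol: false
        # route a given client (by IP hash) to the same node every time
        stickySessions: true
```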
## An example of load balancer definition(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#an-example-of-load-balancer-definition) See an example load balancer definition in our reference [example input manifest](https://docs.claudie.io/latest/input-manifest/example/). ## Notes(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#notes) ### Cluster ingress controller(https://docs.claudie.io/latest/loadbalancing/loadbalancing-solution/\#cluster-ingress-controller) You still need to deploy your own ingress controller to use the load balancer. It needs to be set up to use `nodeport` with the ports configured under `roles` in the load balancer definition. [Skip to content](https://docs.claudie.io/latest/autoscaling/autoscaling/#autoscaling-in-claudie) # Autoscaling in Claudie(https://docs.claudie.io/latest/autoscaling/autoscaling/\#autoscaling-in-claudie) Claudie supports autoscaling by installing [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) for Claudie-made clusters, with a custom implementation of `external gRPC cloud provider`, in Claudie context called `autoscaler-adapter`. This, together with Cluster Autoscaler is automatically managed by Claudie, for any clusters, which have at least one node pool defined with `autoscaler` field. Whats more, you can change the node pool specification freely from autoscaler configuration to static count or vice versa. Claudie will seamlessly configure Cluster Autoscaler, or even remove it when it is no longer needed. ## What triggers a scale up(https://docs.claudie.io/latest/autoscaling/autoscaling/\#what-triggers-a-scale-up) The scale up is triggered if there are pods in the cluster, which are unschedulable and - could be scheduled, if any of the node pools with autoscaling enabled would accommodate them if they would grow in size - the node pools, which could accommodate them, are not yet at maximum size However, if pods' resource requests are larger than any new node would offer, the scale up will not be triggered. The cluster is scanned every 10 seconds for these pods, to assure quick response to the cluster needs. For more information, please have a look at [official Cluster Autoscaler documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-up-work). ## What triggers a scale down(https://docs.claudie.io/latest/autoscaling/autoscaling/\#what-triggers-a-scale-down) The scale down is triggered, if all following conditions are met - the sum of CPU and memory requests of all pods running on node considered for scale down is below 50% (Claudie by default excludes DaemonSet pods and Mirror pods) - all pods running on the node (except those that run on all nodes by default, like manifest-run pods or pods created by DaemonSets) considered for scale down, can be scheduled to other nodes - the node considered for scale down does not have [scale-down disabled annotation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node) For more information, please have a look at [official Cluster Autoscaler documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-down-work). ## Architecture(https://docs.claudie.io/latest/autoscaling/autoscaling/\#architecture) As stated earlier, Claudie deploys Cluster Autoscaler and Autoscaler Adapter for every Claudie-made cluster which enables it. 
These components are deployed within the same cluster as Claudie. [![autoscaling-architecture](https://docs.claudie.io/latest/autoscaling/autoscaling.png)](https://docs.claudie.io/latest/autoscaling/autoscaling.png) ## Considerations(https://docs.claudie.io/latest/autoscaling/autoscaling/\#considerations) As Claudie just extends Cluster Autoscaler, it is important that you follow their [best practices](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-key-best-practices-for-running-cluster-autoscaler). Furthermore, as number of nodes in autoscaled node pools can be volatile, you should carefully plan out how you will use the storage on such node pools. Longhorn support of Cluster Autoscaler is still in experimental phase ( [longhorn documentation](https://longhorn.io/docs/1.4.0/high-availability/k8s-cluster-autoscaler/)). [Skip to content](https://docs.claudie.io/latest/use-cases/use-cases/#use-cases-and-customers) # Use-cases and customers(https://docs.claudie.io/latest/use-cases/use-cases/\#use-cases-and-customers) We foresee the following use-cases of the Claudie platform ## 1. Cloud-bursting(https://docs.claudie.io/latest/use-cases/use-cases/\#1-cloud-bursting) A company uses advanced cloud features in one of the hyper-scale providers (e.g. serverless Lambda and API Gateway functionality in AWS). They run a machine-learning application that they need to train for a pattern on a dataset. The learning phase requires significant compute resources. Claudie allows to extend the cluster in AWS (needed in order to access the AWS functionality) to Hetzner for saving the infrastructure costs of the machine-learning case. Typical client profiles: - startups - in need of significant computing power already in their early stages (e.g. AI/ML workloads) ## 2. Cost-saving(https://docs.claudie.io/latest/use-cases/use-cases/\#2-cost-saving) A company would like to utilize their on-premise or leased resources that they already invested into, but would like to: 1. extend the capacity 2. access managed features of a hyper-scale provider (AWS, GCP, ...) 3. get the workload physically closer to a client (e. g. to South America) Typical client profile: - medium-size business - possibly already familiar with containerized workload ## 3. Smart-layer-as-a-Service on top of simple cloud-providers(https://docs.claudie.io/latest/use-cases/use-cases/\#3-smart-layer-as-a-service-on-top-of-simple-cloud-providers) An existing customer of medium-size provider (e.g. Exoscale) would like to utilize features that are typical for hyper-scale providers. Their current provider does neither offer nor plan to offer such an advanced functionality. Typical client profile: - established business - need to access advanced managed features to innovate faster ## 4. Service interconnect(https://docs.claudie.io/latest/use-cases/use-cases/\#4-service-interconnect) A company would like to access on-premise-hosted services and cloud-managed services from within the same cluster. For on-premise services the on-premise cluster node would egress the traffic. The cloud-hosted cluster nodes would deal with the egress traffic to the cloud-managed services. 
Typical client profile: - medium-size/established business - already has on-premises workloads - needs to take advantage of managed cloud infra (for cost, agility, or capacity reasons) # Frequently Asked Questions(https://docs.claudie.io/latest/faq/FAQ/\#frequently-asked-question) We have prepared some of our most frequently asked questions to help you out! ### Does Claudie make sense as a pure K8s orchestration on a single cloud-provider IaaS?(https://docs.claudie.io/latest/faq/FAQ/\#does-claudie-make-sense-as-a-pure-k8s-orchestration-on-a-single-cloud-provider-iaas) Since Claudie specializes in multicloud, you will likely face some drawbacks, such as the need for a public IPv4 address for each node. Otherwise, it works well in single-provider mode. Using Claudie will also give you some advantages, such as scaling to multi-cloud as your needs change, or the autoscaler that Claudie provides. ### Which scenarios make sense for using Claudie and which don't?(https://docs.claudie.io/latest/faq/FAQ/\#which-scenarios-make-sense-for-using-claudie-and-which-dont) Claudie aims to address the following scenarios, described in more detail on the [use-cases](https://docs.claudie.io/latest/use-cases/use-cases/) page: - Cost savings - Data locality - Compliance (e.g. GDPR) - Managed Kubernetes for cloud providers that do not offer it - Cloud bursting - Service interconnect Using Claudie doesn't make sense when you rely on specific features of a cloud provider, which necessarily ties you to that cloud provider. ### Is there any networking performance impact due to the introduction of the VPN layer?(https://docs.claudie.io/latest/faq/FAQ/\#is-there-any-networking-performance-impact-due-to-the-introduction-of-the-vpn-layer) We compared the use of the VPN layer with other solutions and concluded that the impact on performance is negligible. If you are interested in the benchmarks we performed, we summarized the results in [our blog post](https://www.berops.com/traffic-encryption-performance-in-kubernetes-clusters/). ### What is the performance impact of a geographically distributed control plane in Claudie?(https://docs.claudie.io/latest/faq/FAQ/\#what-is-the-performance-impact-of-a-geographically-distributed-control-plane-in-claudie) We have performed several tests, and problems start to appear when the control nodes are geographically about 600 km apart. This is not an answer that fits all scenarios, though, and should only be taken as a reference point. If you are interested in the tests we have run and a more detailed answer, you can read more in [our blog post](https://www.berops.com/evaluating-etcds-performance-in-multi-cloud/). ### Does the cloud provider traffic egress bill represent a significant part of the overall running costs?(https://docs.claudie.io/latest/faq/FAQ/\#does-the-cloud-provider-traffic-egress-bill-represent-a-significant-part-on-the-overall-running-costs) Costs are individual and depend on the pricing of the selected cloud provider and the type of workload running on the cluster based on the user's needs. Networking expenses can exceed 50% of your provider bill, therefore we recommend making your workload geography- and provider-aware (e.g. using taints and affinities), as sketched in the example below.
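A minimal sketch of that recommendation, assuming the default labels Claudie applies (see the API reference) and the illustrative `compute-provider` instance and `node-type` taint used in the storage example earlier; the Deployment is pinned to nodes of a single provider instance and tolerates that nodepool's taint:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: geo-aware-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: geo-aware-app
  template:
    metadata:
      labels:
        app: geo-aware-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: claudie.io/provider-instance
                    operator: In
                    values:
                      # keep the workload on nodes of a single provider instance
                      - compute-provider
      tolerations:
        - key: node-type
          value: compute
          effect: NoSchedule
      containers:
        - name: app
          image: nginx:stable
```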
### Should I be worried about giving Claudie provider credentials, including ssh keys?(https://docs.claudie.io/latest/faq/FAQ/\#should-i-be-worried-about-giving-claudie-provider-credentials-including-ssh-keys) Provider credentials are created as secrets in the Management Cluster for Claudie which you then reference when creating the input manifest, that is passed to Claudie. Claudie only uses the credentials to create a connection to nodes in the case of static nodepools or to provision the required infrastructure in the case of dynamic nodepools. The credentials are as secure as your secret management allows. We are transparent and all of our code is open-sourced, if in doubt you can always check for yourself. ### Does each node need a public IP address?(https://docs.claudie.io/latest/faq/FAQ/\#does-each-node-need-a-public-ip-address) For dynamic nodepools, nodes created by Claudie in specified cloud providers, each node needs a public IP, for static nodepools no public IP is needed. ### Is a GUI/CLI/ClusterAPI provider/Terraform provider planned?(https://docs.claudie.io/latest/faq/FAQ/\#is-a-guicliclusterapi-providerterraform-provider-planned) A GUI is not actively considered at this point in time. Other possibilities are openly discussed in [this github issue](https://github.com/berops/claudie/issues/33). ### What is the roadmap for adding support for new cloud IaaS providers?(https://docs.claudie.io/latest/faq/FAQ/\#what-is-the-roadmap-for-adding-support-for-new-cloud-iaas-providers) Adding support for a new cloud provider is an easy task. Let us know your needs. [Skip to content](https://docs.claudie.io/latest/contributing/contributing/#contributing) # Contributing(https://docs.claudie.io/latest/contributing/contributing/\#contributing) ## Bug reports(https://docs.claudie.io/latest/contributing/contributing/\#bug-reports) When you encounter a bug, please create a new [issue](https://github.com/berops/claudie/issues/new/choose) and use our bug template. Before you submit, please check: - ...that the issue you want to open is not a duplicate - ...that you submitted the logs/screenshots of any errors and a concise way to reproduce the issue - ...the input manifest you used be careful not to include your cloud credentials [Skip to content](https://docs.claudie.io/latest/latency-limitations/latency-limitations/#latency-imposed-limitations) # Latency-imposed limitations(https://docs.claudie.io/latest/latency-limitations/latency-limitations/\#latency-imposed-limitations) The general rule of thumb is that every 100 km of distance adds roughly ~1ms of latency. Therefore in the following subsections, we describe what problems might and will most probably arise when working with high latency using etcd and Longhorn. ## etcd limitations(https://docs.claudie.io/latest/latency-limitations/latency-limitations/\#etcd-limitations) A distance between etcd nodes in the multi-cloud environment of more than 600 km can be detrimental to cluster health. In a scenario like this, an average deployment time can double compared to a scenario with etcd nodes in different availability zones within the same cloud provider. Besides this, the total number of the etcd Slow Applies increases rapidly, and a Round-trip time varies from ~0.05s to ~0.2s, whereas in a single-cloud scenario with etcd nodes in a different AZs the range is from ~0.003s to ~0.025s. In multi-cloud clusters, a request to a KubeAPI lasts generally from ~0.025s to ~0.25s. 
On the other hand, in a one-cloud scenario, they last from ~0.005s to ~0.025s. You can read more about this topic [here](https://www.berops.com/blog/evaluating-etcds-performance-in-multi-cloud), and for distances above 600 km, we recommend customizing further the etcd deployment ( [see](https://etcd.io/docs/v3.5/op-guide/configuration/)). ## Longhorn limitations(https://docs.claudie.io/latest/latency-limitations/latency-limitations/\#longhorn-limitations) There are basically these three problems when dealing with a high latency in Longhorn: - Kubelet fails to mount the RWO or RWX volume to a workload pod in case the latency between the node hosting the pod and the nodes with the replicas is greater than ~100ms. - Some replicas of a volume might not catch up if the latency between nodes that host replicas is greater than ~100ms. - In case of RWX volumes, Longhorn spawns a `share-manager` pod that hosts the NFS server to facilitate the data export to the workload pods. If the latency between the node with a `share-manager` pod and the node with a workload pod is greater than ~100ms, kubelet fails to mount the volume to the workload pod. Generally, a single volume with 3 replicas can tolerate a maximum network latency of around 100ms. In the case of a multiple-volume scenario, the maximum network latency can be no more than 20ms. The network latency has a significant impact on IO performance and total network bandwidth. See more about CPU and network requirements [here](https://github.com/longhorn/longhorn/issues/1691#issuecomment-729633995) ### How to avoid high latency problems(https://docs.claudie.io/latest/latency-limitations/latency-limitations/\#how-to-avoid-high-latency-problems) When dealing with RWO volumes you can avoid mount failures caused by high latency by setting Longhorn to only use storage on specific nodes (follow this [tutorial](https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/)) and using [nodeAffinity](https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/) or [nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) to schedule your workload pods only to the nodes that have replicas of the volumes or are close to them. ### How to mitigate high latency problems with RWX volumes(https://docs.claudie.io/latest/latency-limitations/latency-limitations/\#how-to-mitigate-high-latency-problems-with-rwx-volumes) To mitigate high latency issues with RWX volumes you can maximize these Longhorn settings: - [Engine Replica Timeout](https://longhorn.io/docs/1.6.0/references/settings/#engine-to-replica-timeout) \- max 30s - [Replica File Sync HTTP Timeout](https://longhorn.io/docs/1.6.0/references/settings/#timeout-of-http-client-to-replica-file-sync-server) \- max 120s - [Guaranteed Instance Manager CPU](https://longhorn.io/docs/1.6.0/references/settings/#guaranteed-instance-manager-cpu) \- max 40% Thanks to maximizing these settings you should successfully mount a RWX volume for which a latency between a node with a `share-manager` pod and a node with a workload pod + replica is ~200ms. However, it will take from 7 to 10 minutes. Also, there are some resource requirements on the nodes and limitations on the maximum size of the RWX volumes. For example, you will not succeed in mounting even a 1Gi RWX volume for which a latency between a node with a `share-manager` pod and a node with a workload pod + replica is ~200ms, if the nodes have only 2 shared vCPUs and 4GB RAM. 
This applies even when there are no other workloads in the cluster. Your nodes need at least 2vCPU and 8GB RAM. Generally, the more CPU you assign to the Longhorn manager the more you can mitigate the issue with high latency and RWX volumes. Keep in mind, that using machines with higher resources and maximizing these Longhorn settings doesn't necessarily guarantee successful mount of the RWX volumes. It also depends on the size of these volumes. For example, even after maximizing these settings and using nodes with 2vCPU and 8GB RAM with ~200ms latency between them, you will fail to mount a 10Gi volume to the workload pod in case you try to mount multiple volumes at once. In case you do it one by one, you should be good. To conclude, maximizing these Longhorn settings can help to mitigate the high latency issue when mounting RWX volumes, but it is resource-hungry and it also depends on the size of the RWX volume + the total number of the RWX volumes that are attaching at once. [Skip to content](https://docs.claudie.io/latest/troubleshooting/troubleshooting/#troubleshooting-guide) # Troubleshooting guide(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#troubleshooting-guide) In progress As we continue expanding our troubleshooting guide, we understand that issues may arise during your usage of Claudie. Although the guide is not yet complete, we encourage you to create a [GitHub issue](https://github.com/berops/claudie/issues) if you encounter any problems. Your feedback and reports are highly valuable to us in improving our platform and addressing any issues you may face. ## Claudie cluster not starting(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#claudie-cluster-not-starting) Claudie relies on all services to be interconnected. If any of these services fail to create due to node unavailability or resource constraints, Claudie will be unable to provision your cluster. 1. Check if all Claudie services are running: ``` kubectl get pods -n claudie ``` ``` NAME READY STATUS RESTARTS AGE ansibler-5c6c776b75-82c2q 1/1 Running 0 8m10s builder-59f9d44596-n2qzm 1/1 Running 0 8m10s manager-5d76c89b4d-tb6h4 1/1 Running 1 (6m37s ago) 8m10s claudie-operator-5755b7bc69-5l84h 1/1 Running 0 8m10s kube-eleven-64468cd5bd-qp4d4 1/1 Running 0 8m10s kuber-698c4564c-dhsvg 1/1 Running 0 8m10s make-bucket-job-fb5sp 0/1 Completed 0 8m10s minio-0 1/1 Running 0 8m10s minio-1 1/1 Running 0 8m10s minio-2 1/1 Running 0 8m10s minio-3 1/1 Running 0 8m10s mongodb-67bf769957-9ct5z 1/1 Running 0 8m10s terraformer-fd664b7ff-dd2h7 1/1 Running 0 8m9s ``` 2. Check the `InputManifest` resource status to find out what is the actual cluster state. ``` kubectl get inputmanifests.claudie.io resourceName -o jsonpath={.status} ``` ``` { "clusters": { "one-of-my-cluster": { "message": " installing VPN", "phase": "ANSIBLER", "state": "IN_PROGRESS" } }, "state": "IN_PROGRESS" } ``` 3. Examine claudie-operator service logs. The claudie-operator service logs will provide insights into any issues during cluster bootstrap and identify the problematic service. If cluster creation fails despite all Claudie pods being scheduled, it may suggest lack of permissions for Claudie providers' credentials. In this case, operator logs will point to Terrafomer service, and Terraformer service logs will provide detailed error output. 
``` kubectl -n claudie logs -l app.kubernetes.io/name=claudie-operator ``` ``` 6:04AM INF Using log with the level "info" module=claudie-operator 6:04AM INF Claudie-operator is ready to process input manifests module=claudie-operator 6:04AM INF Claudie-operator is ready to watch input manifest statuses module=claudie-operator ``` Debug log level Using debug log level will help here with identifying the issue closely. [This guide](https://docs.claudie.io/v0.4.0/getting-started/detailed-guide/#claudie-deployment) shows how you can set it up during step 5. Claudie benefit The great thing about Claudie is that it utilizes open source tools to set up and configure infrastructure based on your preferences. As a result, the majority of errors can be easily found and resolved through online resources. ### Terraformer service not starting(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#terraformer-service-not-starting) Terraformer relies on MinIO datastore to be configured via jobs `make-bucket-job`. If the job fails to configure the datastore, or the datastore itself fails to start, Terraformer will also fail to start. ### Datastore initialization job(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#datastore-initialization-jobs) The `make-bucket-job` creates a bucket in the MinIO datastore. If this job encounter scheduling problems or experience slow autoscaling, it may fail to complete within the designated time frame. To handle this, we have set the `backoffLimit` to fail after approximately 42 minutes. If you encounter any issues with this job or believe the `backoffLimit` should be adjusted, please [create an issue](https://github.com/berops/claudie/issues). ## Networking issues(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#networking-issues) ### Wireguard MTU(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#wireguard-mtu) We use Wireguard for secure node-to-node connectivity. However, it requires setting the MTU value to match that of Wireguard. While the host system interface MTU value is adjusted accordingly, networking issues may arise for services hosted on Claudie managed Kubernetes clusters. For example, we observed that the GitHub actions runner docker container had to be configured with an MTU value of `1380` to avoid network errors during `docker build` process. ### Hetzner and OCI node pools(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#hetzner-and-oci-node-pools) We're experiencing networking issues caused by the blacklisting of public IPs owned by Hetzner and OCI. This problem affects the Ansibler and Kube-eleven services, which fail when attempting to add GPG keys to access the Google repository for package downloads. Unfortunately, there's no straightforward solution to bypass this issue. The recommended approach is to allow the services to fail, remove failed cluster and attempt provisioning a new cluster with newly allocated IP addresses that are not blocked by Google. ## Resolving issues with Terraform state lock(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#resolving-issues-with-terraform-state-lock) ~During normal operation, the content of this section should not be required. If you ended up here, it means there was likely a bug somewhere in Claudie. Please [open a bug report](https://github.com/berops/claudie/issues/new/choose) in that case and use the content of this section to troubleshoot your way out of it. 
First of all you have to get into the directory in the `terraformer` pod, where all terraform files are located. In order to do that, follow these steps: - `kubectl exec -it -n claudie -- bash` - `cd ./services/terraformer/server/clusters/` ### Locked state(https://docs.claudie.io/latest/troubleshooting/troubleshooting/\#locked-state) Once you are in the directory with all TF files, run the following command: ``` tofu force-unlock ``` The `lock-id` is generally shown in the error message. [Skip to content](https://docs.claudie.io/latest/creating-claudie-backup/creating-claudie-backup/#creating-claudie-backup) # Creating Claudie Backup(https://docs.claudie.io/latest/creating-claudie-backup/creating-claudie-backup/\#creating-claudie-backup) In this section we'll explain where the state of Claudie is and backing up the necessary components and restoring them on a completely new cluster. ## Claudie state(https://docs.claudie.io/latest/creating-claudie-backup/creating-claudie-backup/\#claudie-state) Claudie stores its state in 2 different places. - Input Manifests are stored in **Mongo**. - Terraform/OpenTofu state files are stored in **MinIO**. This same **MinIO** instance is utilized for the locking mechanism, leveraging [S3 native state locking](https://opentofu.org/blog/opentofu-1-10-0/) in OpenTofu. These are the only services that will have a PVC attached to it, the other are stateless. ## Backing up Claudie(https://docs.claudie.io/latest/creating-claudie-backup/creating-claudie-backup/\#backing-up-claudie) ### Using Velero(https://docs.claudie.io/latest/creating-claudie-backup/creating-claudie-backup/\#using-velero) This is the primary backup and restore method. Velero does not support HostPath volumes. If the PVCs in your management cluster are attached to such volumes (e.g. when running on Kind or MiniKube), the backup will not work. In this case, use the below backup method. All resources that are deployed or created by Claudie can be identified with the following label: ``` app.kubernetes.io/part-of: claudie ``` If you want to include your deployed Input Manifests to be part of the backup you'll have to add the same label to them. We'll walk through the following scenario step-by-step to back up claudie and then restore it. Claudie is already deployed on an existing Management Cluster and at least 1 Input Manifest has been applied. The state is backed up and the Management Cluster is replaced by a new one on which we restore the state. To back up the resources we'll be using Velero version v1.11.0. The following steps will all be executed with the existing Management Cluster in context. 1. To create a backup, Velero needs to store the state to external storage. The list of supported providers for the external storage can be found in the [link](https://velero.io/docs/v1.11/supported-providers/). In this guide we'll be using AWS S3 object storage for our backup. 2. Prepare the S3 bucket by following the first two steps in this [setup guide](https://github.com/vmware-tanzu/velero-plugin-for-aws#setup), excluding the installation step, as this will be different for our use-case. If you do not have the `aws` CLI locally installed, follow the [user guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) to set it up. 1. Execute the following command to install Velero on the Management Cluster. 
``` velero install \ --provider aws \ --plugins velero/velero-plugin-for-aws:v1.6.0 \ --bucket $BUCKET \ --secret-file ./credentials-velero \ --backup-location-config region=$REGION \ --snapshot-location-config region=$REGION \ --use-node-agent \ --default-volumes-to-fs-backup ``` Following the instructions in step 2, you should have a `credentials-velero` file with the access and secret keys for the aws setup. The env variables `$BUCKET` and `$REGION` should be set to the name and region for the bucket created in AWS S3. By default Velero will use your default config `$HOME/.kube/config`, if this is not the config that points to your Management Cluster, you can override it with the `--kubeconfig` argument. 1. Backup claudie by executing ``` velero backup create claudie-backup --selector app.kubernetes.io/part-of=claudie ``` To track the progress of the backup execute ``` velero backup describe claudie-backup --details ``` From this point the new Management Cluster for Claudie is in context. We expect that your default `kubeconfig` points to the new Management Cluster, if it does not, you can override it in the following commands using `--kubeconfig ./path-to-config`. 1. Repeat the step to install Velero, but now on the new Management Cluster. 2. Install cert manager to the new Management Cluster by executing: ``` kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml ``` 3. To restore the state that was stored in the S3 bucket execute ``` velero restore create --from-backup claudie-backup ``` Once all resources are restored, you should be able to deploy new input manifests and also modify existing infrastructure without any problems. ### Manual backup(https://docs.claudie.io/latest/creating-claudie-backup/creating-claudie-backup/\#manual-backup) Claudie is already deployed on an existing Management Cluster and at least 1 Input Manifest has been applied. Create a directory where the backup of the state will be stored. ``` mkdir claudie-backup ``` Put your Claudie inputmanifests into the created folder, e.g. `kubectl get InputManifest -A -oyaml > ./claudie-backup/all.yaml` We will now back up the state of the respective input manifests from MongoDB and MinIO. ``` kubectl get pods -n claudie NAME READY STATUS RESTARTS AGE ansibler-6f4557cf74-b4dts 1/1 Running 0 18m builder-5d68987c86-qdfd5 1/1 Running 0 18m claudie-operator-6d9ddc7f8b-hv84c 1/1 Running 0 18m manager-5d75bfffc6-d9qfm 1/1 Running 0 18m kube-eleven-556cfdfd98-jq6hl 1/1 Running 0 18m kuber-7f8cd4cd89-6ds2w 1/1 Running 0 18m make-bucket-job-9mjft 0/1 Completed 0 18m minio-0 1/1 Running 0 18m minio-1 1/1 Running 0 18m minio-2 1/1 Running 0 18m minio-3 1/1 Running 0 18m mongodb-6ccb5f5dff-ptdw2 1/1 Running 0 18m terraformer-66c6f67d98-pwr9t 1/1 Running 0 18m ``` To backup state from MongoDB execute the following command ``` kubectl exec -n claudie mongodb- -- sh -c 'mongoexport --uri=mongodb://$MONGO_INITDB_ROOT_USERNAME:$MONGO_INITDB_ROOT_PASSWORD@localhost:27017/claudie -c inputManifests --authenticationDatabase admin' > claudie-backup/inputManifests ``` Next we need to backup the state from MinIO. Port-forward the MinIO service so that it is accessible from localhost. ``` kubectl port-forward -n claudie svc/minio 9000:9000 ``` Setup an alias for the [mc](https://min.io/docs/minio/linux/reference/minio-mc.html) command line tool. ``` mc alias set claudie-minio http://127.0.0.1:9000 ``` Provide the access and secret key for minio. 
The default can be found in the github repository in the `manifests/claudie/minio/secrets` folder. If you have not changed them, we strongly encourage you to do so! Download the state into the backup folder ``` mc mirror claudie-minio/claudie-tf-state-files ./claudie-backup ``` You now have everything you need to restore your input manifests to a new management cluster. These files will contain your credentials, DO NOT STORE THEM OUT IN THE PUBLIC! To restore the state on your new management cluster you can follow these commands. We expect that your default `kubeconfig` points to the new Management Cluster, if it does not, you can override it in the following commands using `--kubeconfig ./path-to-config`. Copy the collection into the MongoDB pod. ``` kubectl cp ./claudie-backup/inputManifests mongodb-:/tmp/inputManifests -n claudie ``` Import the state to MongoDB. ``` kubectl exec -n claudie mongodb- -- sh -c 'mongoimport --uri=mongodb://$MONGO_INITDB_ROOT_USERNAME:$MONGO_INITDB_ROOT_PASSWORD@localhost:27017/claudie -c inputManifests --authenticationDatabase admin --file /tmp/inputManifests' ``` Don't forget to delete the `/tmp/inputManifests` file Port-forward the MinIO service and import the backed up state. ``` mc cp --recursive ./claudie-backup/ claudie-minio/claudie-tf-state-files ``` You can now apply your Claudie inputmanifests which will be immediately in the `DONE` stage. You can verify this with ``` kubectl get inputmanifests -A ``` Now you can make any new changes to your inputmanifests on the new management cluster and the state will be re-used. The secrets for the clusters, namely kubeconfig and cluster-metadata, are re-created after the workflow with the changes has finished. Alternatively you may also use any GUI clients for MongoDB and Minio for more straightforward backup of the state. All you need to backup is the bucket `claudie-tf-state-files` in MinIO and the collection `inputManifests` from MongoDB Once all data is restored, you should be able to deploy new input manifests and also modify existing infrastructure without any problems. [Skip to content](https://docs.claudie.io/latest/hardening/hardening/#claudie-hardening) # Claudie Hardening(https://docs.claudie.io/latest/hardening/hardening/\#claudie-hardening) In this section we'll describe how to further configure security hardening of the default deployment for claudie. ## Passwords(https://docs.claudie.io/latest/hardening/hardening/\#passwords) When deploying the default manifests claudie uses simple passwords for MongoDB, and MinIO. You can find the passwords at these paths: ``` manifests/claudie/mongo/secrets manifests/claudie/minio/secrets ``` It is highly recommended that you change these passwords to more secure ones. ## Network Policies(https://docs.claudie.io/latest/hardening/hardening/\#network-policies) The default deployment of claudie comes without any network policies, as based on the CNI on the Management cluster the network policies may not be fully supported. We have a set of network policies pre-defined that can be found in: ``` manifests/network-policies ``` Currently, we have a cilium specific network policy that's using `CiliumNetworkPolicy` and another that uses `NetworkPolicy` which should be supported by most network plugins. 
## Network Policies(https://docs.claudie.io/latest/hardening/hardening/\#network-policies)

The default deployment of Claudie comes without any network policies, as, depending on the CNI on the Management Cluster, network policies may not be fully supported. We have a set of pre-defined network policies that can be found in:

```
manifests/network-policies
```

Currently, we have a Cilium-specific network policy that uses `CiliumNetworkPolicy`, and another that uses `NetworkPolicy`, which should be supported by most network plugins.

To install the network policies, simply execute one of the following commands:

```
# for clusters using cilium as their CNI
kubectl apply -f https://github.com/berops/claudie/releases/latest/download/network-policy-cilium.yaml
```

```
# other
kubectl apply -f https://github.com/berops/claudie/releases/latest/download/network-policy.yaml
```

# Prometheus Monitoring(https://docs.claudie.io/latest/monitoring/grafana/\#prometheus-monitoring)

In our environment, we rely on Claudie to export Prometheus metrics, providing valuable insights into the state of our infrastructure and applications. To utilize Claudie's monitoring capabilities, it's essential to have Prometheus installed. With this setup, you can gain visibility into various metrics such as:

- Number of managed K8s clusters created by Claudie
- Number of managed LoadBalancer clusters created by Claudie
- Currently added/deleted nodes to/from K8s/LB clusters
- Information about gRPC requests
- and much more

You can find the [Claudie dashboard](https://grafana.com/grafana/dashboards/20064-claudie-dashboard/) here.

## Configure scraping metrics(https://docs.claudie.io/latest/monitoring/grafana/\#configure-scraping-metrics)

We recommend using the [Prometheus Operator](https://github.com/prometheus-operator/kube-prometheus) for managing Prometheus deployments efficiently.

1. Create `RBAC` that allows Prometheus to scrape metrics from Claudie's pods:

    ```
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: claudie-pod-reader
      namespace: claudie
    rules:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: claudie-pod-reader-binding
      namespace: claudie
    subjects:
      # this SA is created by https://github.com/prometheus-operator/kube-prometheus
      # in your case you might need to bind this Role to a different SA
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: monitoring
    roleRef:
      kind: Role
      name: claudie-pod-reader
      apiGroup: rbac.authorization.k8s.io
    ```

2. Create a Prometheus PodMonitor to scrape metrics from Claudie's pods:

    ```
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: claudie-metrics
      namespace: monitoring
      labels:
        name: claudie-metrics
    spec:
      namespaceSelector:
        matchNames:
          - claudie
      selector:
        matchLabels:
          app.kubernetes.io/part-of: claudie
      podMetricsEndpoints:
        - port: metrics
    ```

3. Import [our dashboard](https://grafana.com/grafana/dashboards/20064-claudie-dashboard/) into your Grafana instance:
    - Navigate to the Grafana UI.
    - Go to the Dashboards section.
    - Click on "Import" and provide the dashboard ID or upload the JSON file.
    - Configure the data source to point to your Prometheus instance.
    - Save the dashboard, and you're ready to visualize Claudie's metrics in Grafana.

That's it! Now you have set up RBAC for Prometheus, configured a PodMonitor to scrape metrics from Claudie's pods, and imported a Grafana dashboard to visualize the metrics.
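Once the PodMonitor is in place, you can quickly verify that Prometheus actually discovers Claudie's pods as scrape targets. A minimal sketch, assuming the kube-prometheus defaults of a `prometheus-k8s` service in the `monitoring` namespace:

```
# port-forward the Prometheus instance deployed by kube-prometheus
kubectl port-forward -n monitoring svc/prometheus-k8s 9090:9090

# in another terminal, list the active targets discovered in the claudie namespace
curl -s http://127.0.0.1:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.labels.namespace=="claudie") | {pod: .labels.pod, health: .health}'
```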
# Updating Claudie(https://docs.claudie.io/latest/update/update/\#updating-claudie)

In this section we'll describe how you can update resources that Claudie creates based on changes in the manifest.

## Updating Kubernetes Version(https://docs.claudie.io/latest/update/update/\#updating-kubernetes-version)

Updating the Kubernetes version is as easy as incrementing the version in the Input Manifest of the already built cluster.

```
# old version
...
kubernetes:
  clusters:
    - name: claudie-cluster
      version: v1.30.0
      network: 192.168.2.0/24
      pools:
        ...
```

```
# new version
...
kubernetes:
  clusters:
    - name: claudie-cluster
      version: v1.31.0
      network: 192.168.2.0/24
      pools:
        ...
```

When re-applied, this will trigger a new workflow for the cluster that will result in the updated Kubernetes version.

Downgrading a version is not supported once you've upgraded a cluster to a newer version.

## Updating Dynamic Nodepool(https://docs.claudie.io/latest/update/update/\#updating-dynamic-nodepool)

Nodepools specified in the InputManifest are immutable. Once created, they cannot be updated/changed. This decision was made to force the user to perform a rolling update by first deleting the nodepool and replacing it with a new one with the new desired state. A couple of examples are listed below.

## Updating the OS image(https://docs.claudie.io/latest/update/update/\#updating-the-os-image)

```
# old version
...
- name: hetzner
  providerSpec:
    name: hetzner-1
    region: fsn1
    zone: fsn1-dc14
  count: 1
  serverType: cpx22
  image: ubuntu-22.04
...
```

```
# new version
...
- name: hetzner-1 # NOTE the different name.
  providerSpec:
    name: hetzner-1
    region: fsn1
    zone: fsn1-dc14
  count: 1
  serverType: cpx22
  image: ubuntu-24.04
...
```

When re-applied, this will trigger a new workflow for the cluster: the new nodepool is added first and the old nodepool is then deleted.

## Changing the Server Type of a Dynamic Nodepool(https://docs.claudie.io/latest/update/update/\#changing-the-server-type-of-a-dynamic-nodepool)

The same concept applies to changing the server type of a dynamic nodepool.

```
# old version
...
- name: hetzner
  providerSpec:
    name: hetzner-1
    region: fsn1
    zone: fsn1-dc14
  count: 1
  serverType: cpx22
  image: ubuntu-22.04
...
```

```
# new version
...
- name: hetzner-1 # NOTE the different name.
  providerSpec:
    name: hetzner-1
    region: fsn1
    zone: fsn1-dc14
  count: 1
  serverType: cpx32 # NOTE the different server type.
  image: ubuntu-22.04
...
```

When re-applied, this will trigger a new workflow for the cluster that will result in the updated server type of the nodepool.

# Command Cheat Sheet(https://docs.claudie.io/latest/commands/commands/\#command-cheat-sheet)

In this section, we'll describe `kubectl` commands to interact with Claudie.

## Monitoring the cluster state(https://docs.claudie.io/latest/commands/commands/\#monitoring-the-cluster-state)

Watch the cluster state in the `InputManifest` that is being provisioned.

```
watch -n 2 'kubectl get inputmanifests.claudie.io manifest-name -ojsonpath='{.status}' | jq .'

{
  "clusters": {
    "my-super-cluster": {
      "phase": "NONE",
      "state": "DONE"
    }
  },
  "state": "DONE"
}
```

## Viewing the cluster metadata(https://docs.claudie.io/latest/commands/commands/\#viewing-the-cluster-metadata)

Each secret created by Claudie has the following labels:

| Key | Value |
| --- | --- |
| `claudie.io/project` | Name of the project. |
| `claudie.io/cluster` | Name of the cluster. |
| `claudie.io/cluster-id` | ID of the cluster. |
| `claudie.io/output` | Output type, either `kubeconfig` or `metadata`. |
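These labels make it easy to select every secret Claudie produced for a single cluster. A minimal sketch; `my-super-cluster` is the example cluster name used above:

```
# list both the kubeconfig and metadata secrets of one cluster, together with their labels
kubectl get secrets -n claudie -l claudie.io/cluster=my-super-cluster --show-labels
```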
Claudie creates a kubeconfig secret in the `claudie` namespace:

```
kubectl get secrets -n claudie -l claudie.io/output=kubeconfig
```

```
NAME                                  TYPE     DATA   AGE
my-super-cluster-6ktx6rb-kubeconfig   Opaque   1      134m
```

You can **recover the kubeconfig** for your cluster with the following command:

```
kubectl get secrets -n claudie -l claudie.io/output=kubeconfig,claudie.io/cluster=$YOUR-CLUSTER-NAME -o jsonpath='{.items[0].data.kubeconfig}' | base64 -d > my-super-cluster-kubeconfig.yaml
```

If you want to connect to your **dynamic k8s nodes** via SSH, you can **recover the private SSH key** for each nodepool:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_nodepools | map_values(.nodepool_private_key)'
```

To **recover the public IPs** of your **dynamic k8s nodes** to connect to via SSH:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_nodepools | map_values(.node_ips)'
```

You can display all **dynamic load balancer nodes** metadata with:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r .dynamic_load_balancer_nodepools
```

In case you want to connect to your **dynamic load balancer nodes** via SSH, you can **recover the private SSH key**:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_load_balancer_nodepools | .[]'
```

To **recover the public IPs** of your **dynamic load balancer nodes** to connect to via SSH:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_load_balancer_nodepools | .[] | map_values(.node_ips)'
```

You can display all **static load balancer nodes** metadata with:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r .static_load_balancer_nodepools
```

To display the **public IPs** and **private SSH keys** of your **static load balancer** nodes:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r '.static_load_balancer_nodepools | .[] | map_values(.node_info)'
```

To connect to one of your **static load balancer** nodes via SSH, you can **recover the private SSH key**:

```
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r '.static_load_balancer_nodepools | .[]'
```
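Putting the key and IP together, here is a minimal sketch of opening an SSH session to a dynamic node; the nodepool name `my-nodepool`, the output file name, and the `root` login are assumptions based on the metadata layout suggested by the commands above, so adjust them to your cluster:

```
# extract the private key of a single nodepool into a file
# ("my-nodepool" is a placeholder for one of the nodepool names shown by the commands above)
kubectl get secrets -n claudie -l claudie.io/output=metadata,claudie.io/cluster=$YOUR-CLUSTER-NAME \
  -ojsonpath='{.items[0].data.metadata}' | base64 -d \
  | jq -r '.dynamic_nodepools["my-nodepool"].nodepool_private_key' > my-nodepool.pem
chmod 600 my-nodepool.pem

# connect using one of the public IPs recovered above (the root user is an assumption)
ssh -i my-nodepool.pem root@<node-public-ip>
```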
# Version matrix(https://docs.claudie.io/latest/version-matrix/version-matrix/\#version-matrix)

In the following table, you can find the supported Kubernetes and OS versions for the latest Claudie versions.

| Claudie Version | Kubernetes versions | OS versions |
| --- | --- | --- |
| v0.6.x | 1.24.x, 1.25.x, 1.26.x | Ubuntu 22.04 |
| v0.7.0 | 1.24.x, 1.25.x, 1.26.x | Ubuntu 22.04 |
| v0.7.1-x | 1.25.x, 1.26.x, 1.27.x | Ubuntu 22.04 |
| v0.8.0 | 1.25.x, 1.26.x, 1.27.x | Ubuntu 22.04 |
| v0.8.1 | 1.27.x, 1.28.x, 1.29.x | Ubuntu 22.04 |
| v0.9.0 | 1.27.x, 1.28.x, 1.29.x, 1.30.x | Ubuntu 22.04 (Ubuntu 24.04 on Hetzner and Azure) |
| v0.9.1 | 1.29.x, 1.30.x, 1.31.x | Ubuntu 22.04 (Ubuntu 24.04 on Hetzner and Azure) |

# Usage of HTTP proxy(https://docs.claudie.io/latest/http-proxy/http-proxy/\#usage-of-http-proxy)

In this section, we'll describe the default HTTP proxy setup and its further customization.

## Default setup(https://docs.claudie.io/latest/http-proxy/http-proxy/\#default-setup)

By default, the installation proxy mode is set to `default`, thus Claudie utilizes the HTTP proxy when building a K8s cluster with at least one node from the Hetzner cloud provider. This means that if you have a cluster with one master node in Azure and one worker node in AWS, Claudie won't use the HTTP proxy to build the K8s cluster. However, if you add another worker node from Hetzner, the whole process of building the K8s cluster will utilize the HTTP proxy.

This approach was implemented to address the following issues:

- [https://github.com/berops/claudie/issues/783](https://github.com/berops/claudie/issues/783)
- [https://github.com/berops/claudie/issues/1272](https://github.com/berops/claudie/issues/1272)

## Further customization(https://docs.claudie.io/latest/http-proxy/http-proxy/\#further-customization)

In case you don't want to utilize the HTTP proxy at all (even when there are nodes in the K8s cluster from the Hetzner cloud provider), you can turn off the installation proxy by setting the proxy mode to `off` in the InputManifest (see the example below).

```
kubernetes:
  clusters:
    - name: proxy-example
      version: "1.30.0"
      network: 192.168.2.0/24
      installationProxy:
        mode: "off"
```

On the other hand, if you wish to use the HTTP proxy whenever building a K8s cluster (even when there aren't any nodes in the K8s cluster from the Hetzner cloud provider), you can set the proxy mode to `on` in the InputManifest (again, see the example below).

```
kubernetes:
  clusters:
    - name: proxy-example
      version: "1.30.0"
      network: 192.168.2.0/24
      installationProxy:
        mode: "on"
```

If you want to utilize your own HTTP proxy, you can set its URL in `endpoint` (see the example below).

```
kubernetes:
  clusters:
    - name: proxy-example
      version: "1.30.0"
      network: 192.168.2.0/24
      installationProxy:
        mode: "on"
        endpoint: http://<domain-or-ip>:<port>
```

By default, the `endpoint` value is set to `http://proxy.claudie.io:8880`. In case your HTTP proxy runs on `myproxy.com` and is exposed on port `3128`, the `endpoint` has to be set to `http://myproxy.com:3128`. This means you always have to specify the whole URL with the protocol (HTTP or HTTPS), domain name, and port.

The Claudie proxy strictly limits the endpoints it allows. By default, it only allows endpoints for commonly used package and container registries, in order to prevent HTTP 403 errors when setting up a cluster with nodes that may have been assigned misused IP addresses. This may not suit your needs if you use private repositories for your deployments, however.
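If you point Claudie at your own proxy, it is worth a quick check from your workstation that the endpoint is reachable and tunnels HTTPS traffic. A minimal sketch; `http://myproxy.com:3128` is the example endpoint from above and the target URL is arbitrary:

```
# expect an HTTP status code (e.g. 200 or 301) if the proxy tunnels HTTPS correctly
curl -sS -o /dev/null -w '%{http_code}\n' -x http://myproxy.com:3128 -I https://github.com
```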
There is an additional field called `noProxy` that allows you to specify endpoints that should not be routed through the proxy. The most common scenario is downloading container images from private registries. The example below bypasses the proxy for any endpoint ending with `.suse.com`.

```
kubernetes:
  clusters:
    - name: proxy-example
      version: "1.30.0"
      network: 192.168.2.0/24
      installationProxy:
        mode: "on"
        noProxy: ".suse.com"
```

We gradually expand the default `noProxy` list. If you think there is a repository or container registry that should be added, you can always let us know.
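After editing the InputManifest, re-apply it and watch the workflow as described in the Command Cheat Sheet above. A minimal sketch; the file name and `<inputmanifest-name>` are placeholders for your own resource:

```
# re-apply the edited InputManifest (file name is a placeholder)
kubectl apply -f proxy-example.yaml

# watch the cluster state until the workflow reports DONE
# (<inputmanifest-name> is the metadata.name of your InputManifest resource)
watch -n 2 "kubectl get inputmanifests.claudie.io <inputmanifest-name> -ojsonpath='{.status}' | jq ."
```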