Detailed guide

This guide gives an overview of Claudie's features, installation, customization options, and role in provisioning and managing clusters. We'll start by setting up a management cluster where Claudie will be installed, which lets you monitor and control clusters across multiple hyperscalers.

Documentation Conventions

Throughout this documentation, placeholders that require your own values are formatted as <placeholder>. Replace these with your actual values when using the commands or configurations.

Placeholder	Description
`<your-namespace>`	The Kubernetes namespace where you deploy resources
`<your-access-key>`	Your cloud provider access key or API token
`<your-secret-key>`	Your cloud provider secret key
`<your-domain>`	Your registered domain name for DNS configuration
`<your-cluster-name>`	The name you assign to your Kubernetes cluster

Tip!

Claudie offers extensive customization options for your Kubernetes cluster across multiple hyperscalers. This guide assumes you have AWS and Hetzner accounts, but you can deploy across any of the supported providers. If you wish to use different providers, we recommend following this guide anyway and creating your own input manifest based on the provided example. Refer to the supported providers table for the input manifest configuration of each provider.

Supported providers

Supported Provider	Node Pools	DNS	DNS healthchecks	GPU	Spot
AWS
Azure
GCP
OCI
Exoscale			N/A		N/A
Hetzner			N/A		N/A
CloudRift		N/A	N/A		N/A
Verda		N/A	N/A
Cloudflare	N/A			N/A	N/A
OVHcloud			N/A		N/A
Openstack		N/A	N/A		N/A
On-Premises / Static nodes		N/A	N/A		N/A

Note: N/A indicates that the given feature is not applicable for the provider.

For adding support for other cloud providers or on-premises environments, open an issue or propose a PR.

Prerequisites

Install Kind by following the Kind documentation.
Install the kubectl tool to communicate with your management cluster by following the Kubernetes documentation.
Install Kustomize by following the Kustomize documentation.
Install Docker by following the Docker documentation.

Claudie deployment

Create a Kind cluster where you will deploy Claudie, also referred to as the Management Cluster.
```
kind create cluster --name=claudie
```
Management cluster consideration.

We recommend using a non-ephemeral management cluster! Deleting the management cluster prevents autoscaling of Claudie node pools and causes loss of state! We recommend using a managed Kubernetes offering to ensure management cluster resiliency. A Kind cluster is sufficient for this guide.

Check that the current Kubernetes context is correct. The context should be kind-claudie.
```
kubectl config current-context
```

If context is not kind-claudie, switch to it:

kubectl config use-context kind-claudie
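The check and the switch above can be combined into one step. The following is a minimal sketch; the helper name `ensure_context` is hypothetical, and it assumes the default context name `kind-claudie` that Kind creates for a cluster named `claudie`.

```shell
# Switch kubectl to the expected context only when it is not already selected.
# The first argument is the expected context (defaults to kind-claudie).
ensure_context() {
  expected="${1:-kind-claudie}"
  current="$(kubectl config current-context)"
  if [ "$current" != "$expected" ]; then
    kubectl config use-context "$expected"
  fi
}

# Usage: ensure_context kind-claudie
```

Running the helper when the context is already correct is a no-op, so it is safe to place at the top of a setup script.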

Claudie requires cert-manager. Deploy it with the following command:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.3/cert-manager.yaml

Download the latest Claudie release:
```
wget https://github.com/berops/claudie/releases/latest/download/claudie.yaml
```
Tip!

For the initial attempt, we highly recommend enabling debug logs, especially when creating a large cluster with DNS. This helps identify and resolve permission issues that may occur across different hyperscalers. Locate the ConfigMap with the GOLANG_LOG variable in the claudie.yaml file and change GOLANG_LOG: info to GOLANG_LOG: debug. For more customization, refer to the table in the Claudie customization section.
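If you prefer to script the log-level change rather than edit the file by hand, a sketch along these lines may help. It assumes the value appears in the manifest exactly as `GOLANG_LOG: info`; the helper name `enable_debug_logs` is hypothetical.

```shell
# Rewrite "GOLANG_LOG: info" to "GOLANG_LOG: debug" in a manifest file.
# The first argument is the manifest path (defaults to claudie.yaml).
enable_debug_logs() {
  manifest="${1:-claudie.yaml}"
  tmp="$(mktemp)" || return 1
  # Write the edited copy to a temporary file first, then copy it back,
  # so the original file keeps its permissions and is never half-written by sed.
  if sed 's/GOLANG_LOG: info/GOLANG_LOG: debug/' "$manifest" > "$tmp"; then
    cat "$tmp" > "$manifest"
  fi
  rm -f "$tmp"
}

# Usage: enable_debug_logs claudie.yaml
```

The substitution is a plain text match, so review the result with `grep GOLANG_LOG claudie.yaml` before applying the manifest.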

Deploy Claudie by applying the downloaded manifest:

kubectl apply -f claudie.yaml

Claudie Hardening

By default, network policies are not included in claudie.yaml. They are provided as standalone manifests to be deployed separately, because the management cluster where Claudie is deployed may use a different CNI plugin. You can deploy our predefined network policies to further harden Claudie:

# for clusters using cilium as their CNI
kubectl apply -f https://github.com/berops/claudie/releases/latest/download/network-policy-cilium.yaml

# other
kubectl apply -f https://github.com/berops/claudie/releases/latest/download/network-policy.yaml

Claudie is deployed into the claudie namespace. Check that all pods are running:

kubectl get pods -n claudie

NAME                                READY   STATUS              RESTARTS       AGE
ansibler-6bf78cccf4-pnxrk           1/1     Running             0              3m20s
claudie-operator-64c9554c66-rvtr5   1/1     Running             0              3m19s
kube-eleven-7bd47945c5-kbpd6        1/1     Running             0              3m19s
kuber-64554ffffc-fkdj6              1/1     Running             0              3m19s
make-bucket-job-4mxw7               0/1     Completed           0              3m19s
manager-7696cb7f9-jfbwq             1/1     Running             0              3m19s
minio-0                             1/1     Running             0              3m19s
minio-1                             1/1     Running             0              3m19s
minio-2                             1/1     Running             0              3m19s
minio-3                             1/1     Running             0              3m19s
mongodb-85487bf568-qjw2k            1/1     Running             0              3m19s
nack-644748c7b7-p6z62               1/1     Running             0              3m19s
nats-0                              2/2     Running             0              3m19s
nats-1                              2/2     Running             0              3m19s
nats-2                              2/2     Running             0              3m19s
terraformer-5868fb7695-w49sw        1/1     Running             0              3m19s
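Instead of re-running `kubectl get pods` until everything is up, a script can block until the Deployments report ready. This is a sketch, not part of Claudie itself: the helper name `wait_for_claudie` is hypothetical, and it only covers Deployments, so the StatefulSet-style pods in the listing (`minio-*`, `nats-*`) and the completed job are not checked.

```shell
# Block until every Deployment in the claudie namespace reports Available,
# or until the timeout (first argument, defaults to 300s) expires.
wait_for_claudie() {
  kubectl wait --for=condition=Available deployment --all -n claudie --timeout="${1:-300s}"
}

# Usage: wait_for_claudie 300s
```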

Changing the namespace

By default, Claudie monitors all namespaces and watches for InputManifests and provider Secrets across the cluster. To limit the namespaces it watches, overwrite the CLAUDIE_NAMESPACES environment variable in the claudie-operator deployment. Example:

env:
  - name: CLAUDIE_NAMESPACES
    value: "claudie,different-namespace"
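The same variable can also be set on a running deployment with `kubectl set env`. The sketch below assumes the operator runs as the `claudie-operator` deployment in the `claudie` namespace, as described in this guide; the helper name `set_watched_namespaces` is hypothetical.

```shell
# Set CLAUDIE_NAMESPACES on the claudie-operator deployment.
# Each argument is one namespace; they are joined with commas.
set_watched_namespaces() {
  # "$*" joins the arguments using the first character of IFS,
  # so "claudie" "different-namespace" becomes "claudie,different-namespace".
  joined="$(IFS=,; echo "$*")"
  kubectl set env deployment/claudie-operator -n claudie "CLAUDIE_NAMESPACES=$joined"
}

# Usage: set_watched_namespaces claudie different-namespace
```

A change made this way is overwritten the next time claudie.yaml is applied, so for a permanent setting edit the manifest as shown in the example.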

Troubleshoot!

If you experience problems, refer to our troubleshooting guide.

Let's create an AWS high-availability cluster, which we'll expand later with Hetzner bursting capacity. Start by creating the provider secrets for the infrastructure; we will then reference them in inputmanifest-bursting.yaml.

# AWS provider requires the secrets to have fields: accesskey and secretkey
kubectl create secret generic aws-secret-1 --namespace=<your-namespace> --from-literal=accesskey='<your-access-key>' --from-literal=secretkey='<your-secret-key>'
kubectl create secret generic aws-secret-dns --namespace=<your-namespace> --from-literal=accesskey='<your-access-key>' --from-literal=secretkey='<your-secret-key>'

# inputmanifest-bursting.yaml

apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: cloud-bursting
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  providers:
    - name: aws-1
      providerType: aws
      secretRef:
        name: aws-secret-1
        namespace: <your-namespace>
    - name: aws-dns
      providerType: aws
      secretRef:
        name: aws-secret-dns
        namespace: <your-namespace>    
  nodePools:
    dynamic:
      - name: aws-control
        providerSpec:
            name: aws-1
            region: eu-central-1
            zone: eu-central-1a
        count: 3
        serverType: t3.medium
        image: ami-0965bd5ba4d59211c
      - name: aws-worker
        providerSpec:
            name: aws-1
            region: eu-north-1
            zone: eu-north-1a
        count: 3
        serverType: t3.medium
        image: ami-03df6dea56f8aa618
        storageDiskSize: 200
      - name: aws-lb
        providerSpec:
            name: aws-1
            region: eu-central-2
            zone: eu-central-2a
        count: 2
        serverType: t3.small
        image: ami-0e4d1886bf4bb88d5
  kubernetes:
    clusters:
      - name: my-super-cluster
        version: v1.34.0
        network: 192.168.2.0/24
        pools:
            control:
            - aws-control
            compute:
            - aws-worker
  loadBalancers:
    roles:
      - name: apiserver
        protocol: tcp
        port: 6443
        targetPort: 6443
        targetPools:
            - aws-control
    clusters:
      - name: loadbalance-me
        roles:
            - apiserver
        dns:
            dnsZone: <your-domain> # hosted zone domain name where claudie creates dns records for this cluster
            provider: aws-dns
            hostname: supercluster # the sub domain of the new cluster
        targetedK8s: my-super-cluster
        pools:
            - aws-lb

Tip!

In this example, two AWS providers are used — one with access to compute resources and the other with access to DNS. However, it is possible to use a single AWS provider with permissions for both services.

Apply the InputManifest custom resource with your cluster configuration file:
```
kubectl apply -f ./inputmanifest-bursting.yaml
```
Tip!

InputManifests serve as a single source of truth for both Claudie and the user. Creating infrastructure through input manifests is therefore a form of infrastructure as code and can be easily integrated into a GitOps workflow.

Errors in input manifest

The validation webhook rejects the InputManifest at this stage if it finds errors in the manifest. Refer to our API guide for details.

View logs from claudie-operator service to see the InputManifest reconcile process:
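A minimal sketch of one way to read those logs, assuming the operator runs as the `claudie-operator` deployment in the `claudie` namespace (the names used elsewhere in this guide). The helper name `operator_logs` is hypothetical.

```shell
# Print the most recent log lines of the claudie-operator deployment.
# The first argument is the number of lines to show (defaults to 100).
operator_logs() {
  kubectl logs -n claudie deployment/claudie-operator --tail="${1:-100}"
}

# Usage: operator_logs 200
```

To stream the logs while the manifest is being reconciled, the same `kubectl logs` command accepts `-f`.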

View the InputManifest state with kubectl:

kubectl get inputmanifests.claudie.io cloud-bursting -o jsonpath='{.status}' | jq .

Here’s an example of .status fields in the InputManifest resource type:

  {
    "clusters": {
      "my-super-cluster": {
        "message": "creating cluster\n- Creating infrastructure for the new cluster\n  - Building desired state infrastructure",
        "phase": "Terraformer",
        "previous": [],
        "state": "IN_PROGRESS"
      }
    },
    "state": "IN_PROGRESS"
  }

Claudie architecture

Claudie uses multiple services for cluster provisioning. Refer to our workflow documentation to see how it works under the hood.

Provisioning times may vary!

Please note that cluster creation time may vary due to provisioning capacity and machine provisioning times of selected hyperscalers.

Once provisioning finishes, the InputManifest state reflects that the cluster is provisioned. The state WATCHING_FOR_CHANGES indicates that the changes were built and that the InputManifest sits idle until new changes are detected.

  {
    "clusters": {
      "my-super-cluster": {
        "phase": "None",
        "previous": [
          {
            "stage": "ANSIBLER",
            "status": "DONE",
            "taskDescription": "creating cluster\n- Configuring cluster infrastructure\n  - Installing pre-requisites on all of the nodes of the cluster\n  - Installing Tee override for newly added nodes\n  - Setting up VPN across the nodes of the kuberentes and loadbalancer clusters\n  - Reconciling Envoy service across the loadbalancer nodes",
            "timestamp": "2026-03-23T10:19:22Z"
          },
          {
            "stage": "KUBE_ELEVEN",
            "status": "DONE",
            "taskDescription": "creating cluster\n- Reconciling kubernetes cluster\n  - Creating kubernetes cluster from the set up infrastructure",
            "timestamp": "2026-03-23T10:23:01Z"
          },
          {
            "stage": "KUBER",
            "status": "DONE",
            "taskDescription": "creating cluster\n- Configuring cluster\n  - Deploying kubelet csr-approver\n  - Patching nodes\n  - Deploying longhorn for storage\n  - Reconciling longhorn claudie storage classes\n  - Storing scrape config for loadbalancers",
            "timestamp": "2026-03-23T10:23:18Z"
          }
        ],
        "state": "DONE"
      }
    },
    "state": "WATCHING_FOR_CHANGES"
  }
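For scripted setups, the status can be polled until it reaches `WATCHING_FOR_CHANGES`. The following is a sketch under stated assumptions: the helper name `wait_for_manifest` and its default retry values are hypothetical, and it reads only the top-level `.status.state` field shown in the examples above, without handling any other states specially.

```shell
# Poll an InputManifest until its top-level state is WATCHING_FOR_CHANGES.
# Arguments: manifest name, max attempts (default 120), seconds between attempts (default 30).
# Returns 0 once the state is reached, 1 if the attempts run out.
wait_for_manifest() {
  name="$1"
  attempts="${2:-120}"
  interval="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(kubectl get inputmanifests.claudie.io "$name" -o jsonpath='{.status.state}')"
    echo "attempt $((i + 1)): state=${state:-unknown}"
    if [ "$state" = "WATCHING_FOR_CHANGES" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage: wait_for_manifest cloud-bursting
```

If the InputManifest lives in a namespace other than the current one, add the matching `-n` flag to the `kubectl get` call.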

Claudie creates a kubeconfig secret in the claudie namespace:

kubectl get secrets -n claudie -l claudie.io/output=kubeconfig

NAME                                  TYPE     DATA   AGE
my-super-cluster-6ktx6rb-kubeconfig   Opaque   1      134m

You can recover the kubeconfig for your cluster with the following command:

kubectl get secrets -n claudie -l claudie.io/output=kubeconfig -o jsonpath='{.items[0].data.kubeconfig}' | base64 -d > my-super-cluster-kubeconfig.yaml

If you want to connect to your dynamic k8s nodes via SSH, you can recover the private SSH key:

kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_nodepools | map_values(.nodepool_private_key)'

To recover the public IP addresses of your dynamic k8s nodes to connect to via SSH:

kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r .dynamic_nodepools

If you want to connect to your dynamic load balancer nodes via SSH, you can recover the private SSH key:

kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq '.dynamic_load_balancer_nodepools | .[]'

To recover the public IP addresses of your dynamic load balancer nodes to connect to via SSH:

kubectl get secrets -n claudie -l claudie.io/output=metadata -ojsonpath='{.items[0].data.metadata}' | base64 -d | jq -r '.dynamic_load_balancer_nodepools[]'

Each secret created by Claudie has the following labels:

Key	Value
`claudie.io/project`	Name of the project.
`claudie.io/cluster`	Name of the cluster.
`claudie.io/cluster-id`	ID of the cluster.
`claudie.io/output`	Output type, either `kubeconfig` or `metadata`.
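When the management cluster holds secrets for several clusters, `.items[0]` may not be the one you want. The labels above can narrow the selection. This is a sketch, assuming the `claudie.io/cluster` label value matches the cluster name from your input manifest; the helper name `fetch_kubeconfig` is hypothetical.

```shell
# Write the kubeconfig of one Claudie-managed cluster to a file.
# Arguments: cluster name, output path (defaults to <cluster>-kubeconfig.yaml).
fetch_kubeconfig() {
  cluster="$1"
  out="${2:-$cluster-kubeconfig.yaml}"
  # Select the secret by output type and cluster name, then decode its payload.
  kubectl get secrets -n claudie \
    -l "claudie.io/output=kubeconfig,claudie.io/cluster=$cluster" \
    -o jsonpath='{.items[0].data.kubeconfig}' | base64 -d > "$out"
}

# Usage: fetch_kubeconfig my-super-cluster
```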

Use your new kubeconfig to see what’s in your new cluster:

kubectl get pods -A --kubeconfig=my-super-cluster-kubeconfig.yaml

Let's add a bursting autoscaling node pool in Hetzner Cloud. To use another hyperscaler, we need to add a new provider with the appropriate credentials. First, create a provider secret for Hetzner Cloud, then open the inputmanifest-bursting.yaml input manifest again and append the new Hetzner node pool configuration.

# Hetzner provider requires the secrets to have field: credentials
kubectl create secret generic hetzner-secret-1 --namespace=<your-namespace> --from-literal=credentials='<your-access-key>'

Claudie autoscaling

The Claudie autoscaler is deployed in the Claudie management cluster and provisions additional resources remotely when they are needed. For more information, check out how Claudie autoscaling works.

# inputmanifest-bursting.yaml

apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: cloud-bursting
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  providers:
    - name: hetzner-1         # newly added provider for cloud bursting.
      providerType: hetzner
      secretRef:
        name: hetzner-secret-1
        namespace: <your-namespace>        
  nodePools:
    dynamic:
      - name: hetzner-worker  # add under nodePools.dynamic section
        providerSpec:
            name: hetzner-1   # use your new hetzner provider hetzner-1 to create these nodes
            region: hel1
            zone: hel1-dc2
        serverType: cpx52
        image: ubuntu-22.04
        autoscaler:           # this node pool uses a claudie autoscaler instead of static count of nodes
            min: 1
            max: 10
  kubernetes:
    clusters:
      - name: my-super-cluster
        version: v1.34.0
        network: 192.168.2.0/24
        pools:
            control:
            - aws-control
            compute:
            - aws-worker
            - hetzner-worker  # add it to the compute list here
...

Re-apply the updated InputManifest to incorporate the desired changes.

Deleting existing secrets!

Deleting or replacing existing input manifest secrets triggers cluster deletion! To make changes to your existing clusters, generate a new secret value and re-apply it using the following command.
```
kubectl apply -f ./inputmanifest-bursting.yaml
```

You can also pass through additional ports from load balancers to control plane and/or worker node pools by adding further roles under loadBalancers.roles.

# inputmanifest-bursting.yaml

apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: cloud-bursting
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  ...
  loadBalancers:
    roles:
      - name: apiserver
        protocol: tcp
        port: 6443
        targetPort: 6443
        targetPools: # only loadbalances for port 6443 for the aws-control nodepool
            - aws-control
      - name: https
        protocol: tcp
        port: 443
        targetPort: 443
        targetPools: # only loadbalances for port 443 for the aws-worker nodepool
            - aws-worker
            # possible to add other nodepools, hetzner-worker, for example
    clusters:
      - name: loadbalance-me
        roles:
            - apiserver
            - https # define it here
        dns:
            dnsZone: <your-domain>
            provider: aws-dns
            hostname: supercluster
        targetedK8s: my-super-cluster
        pools:
            - aws-lb

Load balancing

For details on how load balancing works, refer to our load balancing documentation.

Update the InputManifest again with the new configuration.
```
kubectl apply -f ./inputmanifest-bursting.yaml
```
To delete the cluster, delete the InputManifest and wait for Claudie to destroy it.
```
kubectl delete -f ./inputmanifest-bursting.yaml
```
Removing clusters

Deleting Claudie or the management cluster does not remove the Claudie-managed clusters. Delete the InputManifest first to initiate Claudie's deletion process.

After claudie-operator finishes the deletion workflow, delete the Kind cluster:
```
kind delete cluster --name=claudie
```

General tips

Control plane considerations

Single Control Plane Node: A node pool with one machine manages your cluster.
Multiple Control Plane Nodes: A control plane node pool with more than one node.
- Load Balancer Requirement: A load balancer is optional for a high-availability setup, but we recommend it. Include an additional node pool for load balancers.
- DNS Requirement: If you want to use load balancing, you need a registered domain name and a hosted zone. Claudie creates a failover DNS record for the load balancer machines.
  - Supported DNS providers: If your DNS provider is not supported, delegate a subdomain to a supported DNS provider. Refer to the supported DNS providers.
- Egress Traffic: Hyperscalers charge for outbound data and multi-region infrastructure. To avoid egress traffic, deploy control plane node pools in the same region of a single hyperscaler. If availability is more important than egress traffic costs, you can spread control plane node pools across different hyperscalers.

Egress traffic

Hyperscalers charge for outbound data and multi-region infrastructure.

Control plane: To avoid egress traffic, deploy control plane node pools in the same region of a single hyperscaler. If availability is more important than egress traffic costs, you can spread control plane node pools across different hyperscalers.
Workloads: Egress costs associated with workloads are more complicated, as they depend on each use case. We recommend using localised workloads where possible.

Example

Consider a scenario where a workload processes extensive datasets from GCP storage using Claudie-managed AWS GPU instances. To minimize egress network traffic costs, we recommend hosting the datasets in an S3 bucket, which limits egress traffic from GCP and keeps the workload localised.

On your own path

Once you understand how Claudie operates through this guide, you can deploy it to a reliable management cluster, which could be a cluster you already have. Tailor your input manifest to your specific requirements, and explore the comprehensive example that showcases providers, load balancing, and DNS records across various hyperscalers.

Claudie customization

All of the customisable settings can be found in the claudie/.env file.

Variable	Default	Type	Description
`GOLANG_LOG`	`info`	string	Log level for all services. Can be either `info` or `debug`.
`DATABASE_HOSTNAME`	`mongodb`	string	Database hostname used for Claudie configs.
`MANAGER_HOSTNAME`	`manager`	string	Manager service hostname.
`TERRAFORMER_HOSTNAME`	`terraformer`	string	Terraformer service hostname.
`ANSIBLER_HOSTNAME`	`ansibler`	string	Ansibler service hostname.
`KUBE_ELEVEN_HOSTNAME`	`kube-eleven`	string	Kube-eleven service hostname.
`KUBER_HOSTNAME`	`kuber`	string	Kuber service hostname.
`MINIO_HOSTNAME`	`minio`	string	MinIO hostname used for state files.
`AWS_REGION`	`local`	string	Region for MinIO.
`DATABASE_PORT`	27017	int	Port of the database service.
`TERRAFORMER_PORT`	50052	int	Port of the Terraformer service.
`ANSIBLER_PORT`	50053	int	Port of the Ansibler service.
`KUBE_ELEVEN_PORT`	50054	int	Port of the Kube-eleven service.
`MANAGER_PORT`	50055	int	Port of the Manager service.
`KUBER_PORT`	50057	int	Port of the Kuber service.
`MINIO_PORT`	9000	int	Port of the MinIO service.