Setting up Talos in HA Mode (Day 32)

Setting up a high-availability Kubernetes cluster with Talos

I decided to migrate from kubeadm and Ansible playbooks to Talos, mostly out of curiosity, but also because it looks like an easier way to manage clusters and handle upgrades.

Why Talos?

What makes Talos interesting:

  • Immutable infrastructure (no SSH, no shell)
  • API-driven configuration
  • Designed from the ground up for Kubernetes
  • Also, I did say I would try it after this year's KubeCon EU, so...

Setting Up HA Control Plane

I didn't want to set up an external HAProxy load balancer (though I plan to use OPNsense for that eventually, a bit different from my existing clusters), so I defaulted to Talos's built-in VIP support.

Here's how I approached it:

First, create a controlplane patch file for configuration overrides:

machine:
  network:
    interfaces:
      - interface: enp6s18 # Use talosctl -n <IP> get links --insecure
        dhcp: true
        vip:
          ip: 10.30.30.135
cluster:
  apiServer:
    certSANs:
      - 10.30.30.135
      - 10.30.30.131
      - 10.30.30.132
      - 10.30.30.133
    admissionControl:
      - name: PodSecurity
        configuration:
          defaults:
            audit: privileged
            audit-version: latest
            enforce: privileged
            enforce-version: latest
            warn: privileged
            warn-version: latest
  network:
    cni:
      name: none
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/16
  proxy:
    disabled: true

The patch disables the default CNI and kube-proxy, since I plan to use Cilium as a replacement for both later.
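
As the comment in the patch notes, the interface name comes from the node itself; while a node is still booted in maintenance mode you can list its links with:

talosctl -n 10.30.30.131 get links --insecure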

Configuration Generation

Generate configs for your HA setup with the VIP:

# Use the VIP as the cluster endpoint
talosctl gen config daedalus https://10.30.30.135:6443 \
  --output-dir _out \
  --with-cluster-discovery \
  --config-patch-control-plane @controlplane.yaml \
  --config-patch-worker @worker.yaml # if you have worker patches, apply them too
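
This drops three files into _out, which the rest of the steps rely on:

ls _out
# controlplane.yaml  talosconfig  worker.yaml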

Applying Configurations

Apply to control plane nodes:

talosctl apply-config --insecure --nodes 10.30.30.131 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes 10.30.30.132 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes 10.30.30.133 --file _out/controlplane.yaml

Apply to worker nodes:

talosctl apply-config --insecure --nodes 10.30.30.134 --file _out/worker.yaml

After applying the config, the nodes install Talos and reboot. Wait for them to come back up, then run the bootstrap against one of the control plane nodes.

Bootstrapping

After Talos installs and the nodes reboot, run:

export TALOSCONFIG=$(pwd)/_out/talosconfig
talosctl config endpoint 10.30.30.131 10.30.30.132 10.30.30.133
talosctl config node 10.30.30.131
talosctl bootstrap
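
Before moving on, it's worth confirming that etcd has formed a quorum; all three control plane nodes should show up in the member list:

talosctl etcd members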

Health Check and Kubeconfig

Check cluster health:

talosctl health

This command might stall at "waiting for all k8s nodes to report ready" if you set the CNI to none in your config.

As long as the kubelet, apiserver, controller-manager, and scheduler are ready, you can proceed to install a CNI plugin. I went with Cilium, as always.
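
For reference, here's a minimal sketch of that install, loosely based on the Talos docs' Cilium guide; chart values vary by Cilium version (older charts spell the kube-proxy replacement as kubeProxyReplacement=strict):

helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=10.30.30.135 \
  --set k8sServicePort=6443

Pointing k8sServiceHost at the VIP works because Talos manages the VIP via etcd elections rather than through the CNI; newer Talos releases also expose the KubePrism endpoint on localhost:7445, which the Talos docs now recommend instead.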

Generate kubeconfig:

talosctl kubeconfig --nodes 10.30.30.131 --endpoints 10.30.30.135 -f
talosctl config endpoint 10.30.30.135
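
To double-check that one of the control plane nodes is actually holding the VIP, you can grep Talos's address resources (querying all three, since the VIP floats between nodes):

talosctl -n 10.30.30.131,10.30.30.132,10.30.30.133 get addresses | grep 10.30.30.135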

Automating

I created an Ansible playbook to automate this entire process, but I just found out there's a Terraform provider for Talos, so I may switch to that instead.

UPDATE: While switching, I ended up with Makefiles instead; the amount of cluster recreation I was doing needed something to just run all the Terragrunt, Helmfile, etc. commands in one go.
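
Something along these lines (a hypothetical sketch; the target names and commands are illustrative, not my exact Makefile):

# Illustrative targets; adjust commands and paths to your own repo layout
.PHONY: infra apps recreate
infra:
	terragrunt run-all apply
apps:
	helmfile apply
recreate: infra apps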

First Impressions

The biggest challenge was understanding the bootstrapping process and how the VIP gets managed, but once everything was configured, I pointed my deployed Argo instance at the new cluster and had my deployments up and running.