<![CDATA[Installing CrowdSec on OPNsense (Day 27)]]>https://mrdvince.me/installing-crowdsec-on-opnsense-day-27-2/67c740d9fbf3350001fdb1a8Sun, 02 Mar 2025 18:45:00 GMT

CrowdSec is a security tool that detects and blocks malicious IPs using a collaborative approach to share threat intelligence across users.

I initially planned to run CrowdSec just on Traefik, but having it at the firewall level provides more protection for all devices on the network.

Installation

CrowdSec has a convenient plugin for OPNsense that makes installation straightforward:

  1. Navigate to System → Firmware → Plugins
  2. Search for and install os-crowdsec

Configuration

Once installed, you'll find CrowdSec under the Services tab:

  1. Go to Services → CrowdSec → Settings
  2. Enable the following options:
    1. Enable Log Processor (IDS) - This is the detection component
    2. Enable LAPI - Unless you're connecting to LAPI on another machine
    3. Enable Remediation Component (IPS) - This actively blocks detected threats
    4. Enable log for rules - Optional, but useful for troubleshooting

Rules

By default, CrowdSec creates floating rules to block incoming connections from malicious IP addresses.

However, we can use the automatically created crowdsec_blacklists and crowdsec6_blacklists aliases to create custom floating rules that block all outgoing connections to malicious IPs.

This is useful in case a device on the network is already compromised and tries to connect back out to a blocklisted IP.

Testing the Setup

To verify that CrowdSec is working properly, you can temporarily ban an IP address:

cscli decisions add -t ban -d 1m -i <IP address>

This will ban the specified IP for one minute.

If you use your own IP, expect your connection to freeze, confirming that the ban is working.

To view active decisions (bans):

cscli decisions list

Todo

CrowdSec also exposes a Prometheus endpoint for metrics collection, so I'll look into integrating it with Grafana for visualization.
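
If I go that route, a minimal Prometheus scrape job would look roughly like the sketch below (assuming CrowdSec's default metrics port of 6060, which also needs to be configured to listen on an address Prometheus can reach):

# prometheus.yml (sketch)
scrape_configs:
  - job_name: crowdsec
    static_configs:
      - targets: ["<opnsense ip>:6060"]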

]]>
<![CDATA[Secure Secret Management with SOPS, Age, and Bitwarden (Day 26)]]>https://mrdvince.me/secure-secret-management-with-sops-age-and-bitwarden-day-26/67c1cd75fbf3350001fdb140Fri, 28 Feb 2025 21:18:00 GMT

When working with infrastructure as code and Kubernetes, you inevitably face the challenge of managing secrets securely.

API tokens and other sensitive information shouldn't be stored in plain text in your Git repositories, but they still need to be accessible for deployments.

Enter SOPS and Age

SOPS (Secrets OPerationS) is a powerful tool that supports multiple encryption providers including AWS KMS, GCP KMS, Azure Key Vault, age, and PGP.

But for those of us without cloud provider resources, age offers a lightweight, modern alternative for encryption.

Key Management with Bitwarden

The first question with age is: where do you store your keys securely, especially when you might need to access them across multiple machines? And what happens if you reset or lose your machine and lose the keys with it?

I went looking for options and found Bitwarden Secrets Manager. It offers an elegant way to securely store cryptographic keys and access them, and if you're already using Bitwarden for password management, why not try it.

Setting Up the Infrastructure

Install the Required Tools

First, install age and SOPS:

# Install age and SOPS (commands will vary by OS)
# on macOS you can use brew:
brew install age sops

Generate Your Age Key

age-keygen -o key.txt

The generated file contains two important pieces:

  • A public key (starts with age1...)
  • A secret key (starts with AGE-SECRET-KEY-...)

Store Keys in Bitwarden Secrets Manager

Follow the Bitwarden Secrets Manager guide to set up your account and store both keys.

Use the Bitwarden CLI

Install the Bitwarden Secrets CLI (bws) and set up your access token:

export BWS_ACCESS_TOKEN=<your token>

Load keys as env variables

Add this function to your shell profile (e.g., .zshrc) to easily load keys when needed:

load_age_secrets() {
    # each <secret id> is the Bitwarden secret's UUID (a different one per key)
    export SOPS_AGE_KEY=$(bws secret get <secret id> | jq -r '.value')
    export AGE_PUBLIC_KEY=$(bws secret get <secret id> | jq -r '.value')
    # cache the exports so new shells can reuse them without calling bws again
    echo "export SOPS_AGE_KEY='$SOPS_AGE_KEY'" > /tmp/.secrets_exports
    echo "export AGE_PUBLIC_KEY='$AGE_PUBLIC_KEY'" >> /tmp/.secrets_exports
    echo "Secrets loaded"
}

# load the cached keys as env variables if the cache file exists
[[ -f /tmp/.secrets_exports ]] && source /tmp/.secrets_exports

The <secret id> is the secret's UUID, which you can find by running bws secret list.

Terragrunt and Proxmox

An example of using encrypted secrets with Terragrunt for infra provisioning (in this case, for Proxmox):

  1. Create a YAML file with your secrets
# auth.yaml
proxmox_token: "your_super_secret_token_that_no_one_should_know"
proxmox_user_id: "your_user_id_which_we_also_encrypted_because_why_not"

Using YAML because we will use Terragrunt's yamldecode function to parse the decrypted secrets later.

  2. Encrypt it
sops --encrypt --age $AGE_PUBLIC_KEY --in-place auth.yaml

--in-place overwrites the file with a new encrypted file.

  3. Reference it in your Terragrunt file
locals {
  secret_vars = yamldecode(sops_decrypt_file(find_in_parent_folders("auth.yaml")))

  # ...

  pm_api_token_secret = local.secret_vars.proxmox_token
  pm_api_token_id     = local.secret_vars.proxmox_user_id
}
  4. Use the variables in your provider block
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "proxmox" {
  pm_api_url          = "${local.pm_api_url}"
  pm_api_token_id     = "${local.pm_api_token_id}"
  pm_api_token_secret = "${local.pm_api_token_secret}"
  pm_tls_insecure     = false
  pm_parallel         = 10
}
EOF
}

Terragrunt commands should work as usual, e.g. when running terragrunt apply the secrets get decrypted and used to authenticate with the Proxmox API.

Kubernetes Secrets

You can also encrypt Kubernetes secrets:

  1. Create a secret file
# secret.yaml
apiVersion: v1
data:
  key1: c3VwZXJzZWNyZXQ=
  key2: dG9wc2VjcmV0
kind: Secret
metadata:
  name: my-secret
  2. Encrypt it, specifying only certain fields to encrypt
sops --encrypt --age $AGE_PUBLIC_KEY --encrypted-regex '^(data|stringData)$' secret.yaml

To apply it, you can pipe the decrypted output to kubectl, e.g.:

sops --decrypt --encrypted-regex '^(data|stringData)$' secret.yaml | k apply -f -
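
As an aside, SOPS can also pick these settings up from a .sops.yaml creation rules file at the repo root, so you don't have to pass the key and regex on every invocation. A minimal sketch (the path regex and key are placeholders for your own):

# .sops.yaml
creation_rules:
  - path_regex: .*secret.*\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: <your age public key>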

Benefits of This Approach

  • Security: Secrets never appear in plain text in your repositories
  • Convenience: Access to secrets across multiple machines or if something happens to the data locally
  • Auditability: Changes to encrypted files are tracked in git

So this setup gives you a nice foundation for secret management.

]]>
<![CDATA[ZFS RAIDZ VDEV Extension (Day 25)]]>https://mrdvince.me/zfs-raidz-vdev-extension-day-25-2/67bf6075fbf3350001fdb107Wed, 26 Feb 2025 19:31:14 GMT

Got a new drive to add to my storage pool, and TrueNAS Scale now supports RAIDZ VDEV extension, a relatively new feature introduced in TrueNAS 24.10 (Electric Eel).

Steps:

  1. Navigate to the storage pool
  2. Select "Manage Devices"
  3. Choose the VDEV you want to extend
  4. Click "Extend"
  5. Select the new drive
  6. Confirm and wait

See the docs here. The plus side of this process is that your NAS remains fully functional during the extension.

You can continue using all services while the VDEV rebuilds in the background.
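
If you prefer to watch the progress from the shell instead of the GUI, zpool status on the pool should show the expansion state on the affected vdev (the pool name is a placeholder):

zpool status <pool name>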

Important Notes on Capacity

There's an interesting caveat with the extension process:

The expanded vdev uses the pre-expanded parity ratio, which reduces the total vdev capacity. To reset the vdev parity ratio and fully use the new capacity, manually rewrite all data in the vdev. This process takes time and is irreversible.

In practical terms, this means you won't immediately get the full theoretical capacity increase. The system recovers this "lost headroom" over time as data naturally gets modified or deleted. From the TrueNAS docs:

Extended VDEVs recover lost headroom as existing data is read and rewritten to the new parity ratio. This can occur naturally over the lifetime of the pool as you modify or delete data. To manually recover capacity, simply replicate and rewrite the data to the extended pool.

However, scripts like this exist for rebalancing ZFS data, e.g. the linked one rewrites the data in place.

For those wanting to calculate potential capacity gains, TrueNAS provides a handy Extension Calculator.

Performance Impact

The extension process is fairly time-consuming - my current extension has been running for over 5 hours.

However, the entire system remains usable during this time, with all services continuing to function (that said, there's not much load on the NAS; I can't say the same if it were under heavy use).

]]>
<![CDATA[Sharing Services Across Tailnets (Day 24)]]>https://mrdvince.me/sharing-services-across-tailnets-day-24/67b82db17d1a4f0001ddb72aThu, 20 Feb 2025 04:39:00 GMT

So I needed to share some services with friends outside my tailnet, but:

  • Didn't want to add users directly to my tailnet
  • Preferred not to add public records to Cloudflare (keep using my self-hosted DNS)

The Setup

The current infrastructure setup includes:

  • Traefik as the root reverse proxy in Kubernetes
  • A LoadBalancer service (traefik's) using a private, non-tailscale-routable IP
  • Self-hosted DNS server

The Solution

After doing some research and trials, the approach I went with was to expose the Traefik LoadBalancer service directly to Tailscale using their Kubernetes operator.

Tailscale has a blog post on how to do this, which I recommend checking out.

1. Installing the Operator

Important: Create an OAuth client in the Tailscale console with Devices Core and Auth Keys write scopes first (see the full post here).

Install the Tailscale operator using Helm:

  1. Add https://pkgs.tailscale.com/helmcharts to your local Helm repositories:
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
  2. Update your local Helm cache:
helm repo update
  3. Install the operator, passing the OAuth client credentials:
helm upgrade \
  --install \
  tailscale-operator \
  tailscale/tailscale-operator \
  --namespace=tailscale \
  --create-namespace \
  --set-string oauth.clientId="<client_id>" \
  --set-string oauth.clientSecret="<client_secret>" \
  --wait
This can also be made part of a helmfile template or an Argo CD deployment.

2. Exposing the Service

With the operator running, exposing a service is as simple as adding an annotation:

annotations:
  tailscale.com/expose: "true"
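
For an already-deployed service, a quick way to try this out is annotating it directly; a sketch assuming Traefik's LoadBalancer service is called traefik in the traefik namespace (adjust for your setup):

kubectl -n traefik annotate service traefik tailscale.com/expose=true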

3. Sharing Access

From the Tailscale console:

  1. Share both the Traefik node and DNS server
  2. Users can then set their Tailscale DNS to use the shared DNS server
  3. For more granular control, users can use Tailscale's split DNS to route only specific domains

4. ACL Configuration

One useful thing is to set up ACLs to restrict what autogroup:shared and specific tags (the operator is a tagged device) can access.

This ensures users only have access to the services you explicitly want to share.
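
As a rough sketch of what that could look like in the tailnet policy file, assuming the operator-created proxy devices carry a tag like tag:k8s (swap in your own tags and ports, this is illustrative only):

"acls": [
  { "action": "accept", "src": ["autogroup:shared"], "dst": ["tag:k8s:443"] }
]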

So

The benefits are essentially:

  • Users don't need direct access to your tailnet
  • DNS resolution works seamlessly (I deployed a new separate DNS server with a limited amount of records just for tailnet shares)
  • No need for public DNS records
  • Fine-grained access control through Tailscale ACLs
  • Services remain secure behind Tailscale's encryption
]]>
<![CDATA[Navidrome on TrueNAS Scale (Day 23)]]>https://mrdvince.me/navidrome-on-truenas-scale-day-23/67b15f497d1a4f0001ddb6e0Sun, 16 Feb 2025 07:10:44 GMT

I found some old hard drives from my campus days (surprisingly still working) with a bunch of songs. Rather than letting these sit idle, figured it was time to make this collection accessible on the go.

So I went looking for something I could use and found Navidrome.

Setting Up TrueNAS Datasets

First, create two datasets through the TrueNAS GUI (I prefer this over regular folders for better permission control):

navidrome
├── data
└── music

Installing Navidrome

TrueNAS Scale's Electric Eel release moved to Docker for apps (instead of Kubernetes).

Install:

  1. Create the necessary datasets
  2. Install Navidrome from the apps catalog
  3. Configure:
    1. Set environment variables if needed
    2. Point to your data and music folders
    3. Set user/group IDs
    4. Configure resource limits

Adding Traefik Routing

This part assumes you already have Traefik set up as a reverse proxy with cert-manager and HTTPS redirect middleware configured.

If you're starting fresh, you'll want to get those pieces in place first.

Once Navidrome is running, we need to make it accessible through a reverse proxy. This requires two pieces:

  • an external service
  • an ingress route

An external service definition:

apiVersion: v1
kind: Service
metadata:
  name: navidrome
  namespace: routes
spec:
  ports:
    - port: <port>
      targetPort: <port>
  type: ExternalName
  externalName: <Ip>

HTTP redirect route:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: navidrome-redirect
  namespace: routes
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`<host>`)
      kind: Rule
      middlewares:
        - name: https-redirect
      services:
        - name: noop@internal
          kind: TraefikService

And the actual route over HTTPS:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: navidrome
  namespace: routes
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`<host>`)
      kind: Rule
      services:
        - name: navidrome
          port: <port>
          scheme: http
  tls:
    secretName: <ssl cert secret>

Beyond the Web UI

While Navidrome's web interface is solid, one of its strengths is Subsonic API compatibility.

This means you can use various Subsonic-compatible apps as front-ends for your music collection.

I chose to use Symfonium as my client of choice and it's been impressive.

Went for a photo walk, and with gapless playback and a smart queue that keeps playing similar tracks (a bit like Spotify's song radio), it just works; I forgot it was a self-hosted thing. Also, thanks to Tailscale, I can stream my music anywhere without noticing any difference from other services.

Now, this isn't a replacement, and I will still most definitely keep my Spotify playlists.
]]>
<![CDATA[K3s upgrade to 1.31.5 (Day 22)]]>https://mrdvince.me/k3s-upgrade-to-1-31-5-2/67b159b17d1a4f0001ddb6bcThu, 13 Feb 2025 03:21:00 GMT

Upgrading K3s is remarkably straightforward. You just use the same install command you used when first creating your cluster. For me that's:

NB: This is a single-node k3s cluster.

curl -sfL https://get.k3s.io | sudo INSTALL_K3S_EXEC='--disable=traefik --disable-kube-proxy --disable-network-policy --flannel-backend=none --write-kubeconfig-mode=644 --etcd-expose-metrics true' sh -

About Those Flags

Looking at the install command, you might notice several flags:

  • --disable=traefik: Disabled because I'm running my own managed version of Traefik
  • --disable-kube-proxy,--flannel-backend=none: Both disabled as Cilium handles these functions (CNI and service networking)
  • --write-kubeconfig-mode=644: Sets readable permissions on the kubeconfig file right from the start
  • --etcd-expose-metrics true: Exposes etcd metrics.

In my case, the output showed:

[INFO]  Finding release for channel stable
[INFO]  Using v1.31.5+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.31.5+k3s1/sha256sum-amd64.txt
[INFO]  Skipping binary downloaded, installed k3s matches hash
[INFO]  Skipping installation of SELinux RPM
...
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
...

Verifying the Upgrade

After the upgrade, your workloads should continue running without interruption. You can verify the new version with:

k3s --version
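
Or check what version the node reports via kubectl:

kubectl get nodes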

That's really all there is to it - K3s keeps things refreshingly simple.

]]>
<![CDATA[Traefik Timeouts and Immich (Day 21)]]>https://mrdvince.me/traefik-timeouts-and-immich/67aba2a57d1a4f0001ddb673Tue, 11 Feb 2025 20:25:45 GMT

I have been setting up Immich, that is, destroying and recreating it until I got my photo organization and volumes just right.

The next step was to do a mass import after testing. The CLI tool makes mass imports pretty straightforward, but I kept having an issue with certain files failing to import, especially large files (around 2GB).

I would generally just get a very generic "upload failed" error.

Everything was working fine for regular photos, and my phone syncing without issues.

Did some digging and found that the issue was down to Traefik timeouts, see the Traefik page here.

The Fix

So I adjusted the timeouts in the config i.e.:

ports:
  web:
    redirections:
..............
  websecure:
    tls:
      enabled: true
    transport:
      respondingTimeouts:
        readTimeout: 20m
        writeTimeout: 20m    # 20 minutes - adjust based on your needs

And while at it updated traefik to v3.3.3 (helm chart 34.3.0) and changed from redirectTo to redirections (diff):

ports:
  web:
-    redirectTo:
-      port: websecure
+    redirections:
+      entryPoint:
+        to: websecure
+        scheme: https
+        permanent: true

Large file uploads now work peachy.

PPS: Planning to move the days of homelab to a dedicated page and reduce the amount of logs on the homepage
]]>
<![CDATA[Bufferbloat (Day 20)]]>https://mrdvince.me/bufferbloat-day-20/67a75dc57d1a4f0001ddb648Sat, 08 Feb 2025 20:27:11 GMT

While browsing and editing large RAW photos over SMB, I noticed some fairly high latency and got to asking whether it could be reduced.

Some research later, I found something called bufferbloat.

What's Bufferbloat?

I found this analogy of bufferbloat from Waveform.com that's worth sharing here:

Think of your internet connection like a sink with a narrow drain (your bandwidth limit). When someone downloads a large file, it's like dumping a bucket of water into the sink. Now if you try to do something time-sensitive - like gaming or a video call - those packets are like drops of oil trying to get through a sink full of water. They have to wait for all that "water" to drain first, causing lag and delays. That's bufferbloat.

Check out Waveform's bufferbloat test tool, and a more detailed ELI5 explanation.

In OPNsense, we can address this using traffic shaping - setting up pipes and queues with FlowQueue-CoDel, which ensures that packets from small flows are sent in a timely fashion, while large flows share the bottleneck's capacity.

Preliminaries

I initially found guides for pfSense, but OPNsense has its own really nice guide on how to address bufferbloat here

Before messing around with creating pipes, queues, and rules, it's advisable to run some tests to establish a baseline.

Before Optimization

My initial bufferbloat grade was a B, with some concerning latency spikes:

  • Download speed: 712.9 Mbps
  • Upload speed: 654.1 Mbps
  • Latency under load: +28ms download, +48ms upload

as seen in the screenshot below:

[Screenshot: bufferbloat test results before optimization]

After Optimization

After setting up and tuning the traffic shaping rules:

  • Download speed: 588.8 Mbps
  • Upload speed: 516.9 Mbps
  • Latency under load: +12ms download, +14ms upload
  • Bufferbloat grade improved to A

as seen in the screenshot below:

[Screenshot: bufferbloat test results after optimization]

So I traded some raw speed for consistency.

Internal Network Optimization

After seeing improvements on the WAN side, I got more specific with my internal network.

I set it up for:

  • SMB traffic (port 445) where I needed lower latency for raw files
  • Specific devices on certain VLANs that needed more controlled latency

I really just needed the lower latency when editing RAW files mounted over SMB, and it did help, not a drastic difference but something noticeable, so I disabled the WAN-side optimizations.

]]>
<![CDATA[Machine and Container updates (Day 19)]]>https://mrdvince.me/machine-and-container-updates-day-20-2/67a309d1d29f65000181a831Wed, 05 Feb 2025 20:50:30 GMT

Today was all about updates.

What started as just routine maintenance turned into a reminder of why we keep backups (and backups of backups).

The Update Plan

  • Update Proxmox nodes
  • Update VMs
  • Update OPNsense to version 25

Things Go Sideways

After the updates and reboots, OPNsense decided to forget about its VLANs and misconfigure WAN and LAN interfaces.

This cascaded into:

  • Everything losing connectivity
  • DNS becoming unreachable
  • General network chaos

Recovery Process

  1. Direct connection to Proxmox node (thank goodness for out-of-band management)
  2. Tried the built-in backup list - no luck
  3. Remembered the lesson from the last time I had to do a reinstall at 1:20 am: keep config backups locally
  4. Reset OPNsense, restored from local backup
  5. Fixed an interface mismatch
  6. Network starts coming back to life

DNS

Everything seemed fixed until I noticed I still had no internet.

OPNsense looked good, but the DNS server was unreachable despite appearing online and healthy. So basically "everything's fine but nothing works."

After some troubleshooting, replacing the VM's NIC, and re-assigning it the same static IP in OPNsense, the node was reachable again and my DNS was working.

Most services recovered quickly once DNS and OPNsense were back, though TrueNAS took its time and couldn't update catalogs, so I added Quad9 as a fallback for next time (because there probably will be one).

So

  1. Keep multiple backups in different locations (the built-in backups aren't always enough)
  2. Added Quad9 to some nodes like TrueNAS as a fallback DNS for future resilience
  3. When debugging network issues, don't trust what "looks fine" - verify connectivity layer by layer
  4. Updates, while necessary, can turn out to be, well....

As for the Kubernetes clusters (both k3s and the HA one), everything just came back online like nothing had happened, without needing to touch a single node (including the two HAProxy nodes).

At least now I have a fallback plan for DNS issues, and another validation of why Kubernetes is great for self-healing infrastructure.
]]>
<![CDATA[Homepage Dashboard (Day 18)]]>https://mrdvince.me/homepage-dashboard-day-18/679fa627d29f65000181a7ecSun, 02 Feb 2025 18:37:18 GMT

Spent today setting up Homepage to organize access to all the URLs I currently have.

Current dashboard
[Screenshot: the current Homepage dashboard]

Basic Configuration

The base setup is defined in the settings.yaml file and looks like this:

title: <Your title name>
theme: dark
color: slate
background:
  image: <image>
  blur: sm
  saturate: 50
  brightness: 50
  opacity: 50
  ....
<rest of config including layouts>

Service Organization

Services are grouped logically, making it easy to find what you need.

For icons, you can use the Dashboard Icons set by file name (like proxmox.svg below), Material Design Icons, or Simple Icons.

When using Material Design Icons or Simple Icons, prefix the icon name with mdi- or si-, e.g. mdi-<icon-name> or si-<icon-name>
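
For example, a hypothetical entry using a Material Design icon:

- Monitoring:
      - Grafana:
            icon: mdi-chart-line
            href: <url>
            description: Dashboards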

An example definition in the services.yaml

- Hypervisor:
      - Avalon:
            icon: proxmox.svg
            href: <proxmox url>
            description: Main Compute Node
      - Aegis:
            icon: proxmox.svg
            href: <proxmox url>
            description: Network Node

- Network:
      - Vale:
            icon: opnsense.svg
            href: <url>
            description: OPNsense
      - DNS:
            icon: adguard.svg
            href: <url>
            description: DNS Management

The page auto-refreshes as you edit the config, so you see the changes in real time.

Widgets and Extras

You can also add widgets. e.g showing the time:

- datetime:
    text_size: xl
    format:
      timeStyle: short

And bookmarks for quick access:

- Blogs:
    - Local Blog:
        - abbr: Blog
          href: <your blog url>

I still need to:

  • Connect and collect metrics
  • And add more detailed service information

For now, it's a functional start that makes navigating between services easier, and the cool thing is you can set it as the default browser landing page.

]]>
<![CDATA[Minio (S3-Compatible Storage) and Terragrunt (Day 17)]]>https://mrdvince.me/minio-s3-compatible-storage-and-terragrunt-day-17/679a7c77d29f65000181a78fWed, 29 Jan 2025 20:28:00 GMT

With Minio running on our new storage, we've now got S3-compatible storage right at hand. First use case: moving VM state off the local filesystem.

Setup and Configuration

Access Keys

First, head to Minio and create an access key - you'll need both the Access Key and Secret Key for the next steps.

Download the file or copy the created keys (once you close the popup you can no longer copy the keys).

[Screenshot: Minio access key creation dialog]

AWS CLI Configuration

After installing the AWS CLI (see the install guide here), we need to configure it:

aws configure
AWS Access Key ID [None]: <Key ID copied from minio>
AWS Secret Access Key [None]: <Secret key copied from minio>
Default region name [None]: eu-west-1
Default output format [None]:

# Set signature version
aws configure set default.s3.signature_version s3v4

Add your Minio endpoint to the config (if you are using a reverse proxy, make sure the URL points to port 9000, not the GUI port):

# ~/.aws/config
[default]
region = eu-west-1
endpoint_url=https://minio-s3.<your domain name> <- Add this line
s3 =
    signature_version = s3v4

Quick Test

Let's make sure everything works:

# Create a bucket
aws s3 mb s3://test-minio-bucket
make_bucket: test-minio-bucket

# List buckets
aws s3 ls
2025-01-29 20:33:06 test-minio-bucket

# Remove the test bucket
aws s3 rb s3://test-minio-bucket
remove_bucket: test-minio-bucket

Terragrunt Integration

To use this with Terragrunt, you'll need to add some config to the remote_state block. (See example repo here).

Since we're using Minio rather than actual S3, we need to disable several S3-specific features, otherwise Terragrunt will try to make S3-specific modifications to the bucket settings:

# remote_state block
remote_state {
  backend = "s3"
  config = {
    bucket                             = "terragrunt-state"
    key                                = "${path_relative_to_include()}/baldr.tfstate"
    region                             = local.aws_region
    endpoint                           = local.environment_vars.locals.endpoint_url
    # Skip various S3 features we don't need
    skip_bucket_ssencryption           = true
    skip_bucket_public_access_blocking = true
    skip_bucket_enforced_tls           = true
    skip_bucket_root_access            = true
    skip_credentials_validation        = true
    force_path_style                   = true
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}

Note: The endpoint needs to be set in the Terragrunt config even if it's in the AWS config file - I found Terragrunt doesn't pick it up from there.

]]>
<![CDATA[Storage (RAIDZ1) continued....(Day 16)]]>https://mrdvince.me/storage-continued-day-16/6797f002d29f65000181a75eMon, 27 Jan 2025 21:04:45 GMT

This is a continuation of this post; the final piece arrived today - another 12TB drive to complete the storage setup.

Storage Configuration

Created a new pool using RAIDZ1 with a 256GB SSD as ZFS L2ARC read-cache. Unlike the previous mirror setup from my testing, RAIDZ1 gives me more usable space while still protecting against a single drive failure.

Dataset Organization

Made some datasets:

  • backups: For system and VM backups
  • k8s: Kubernetes persistent storage
  • iscsi: Testing ground for VM storage over iSCSI

I will create more datasets as the need arises.

Quality of Life Improvements

Swapped out the built-in fan in the drive enclosure with a Noctua - because if you're running 24/7 storage, noise matters. The difference is noticeable (Noctua fans are awesome).

Next Steps

Starting the Minio setup - finally getting back to what started this whole thing.

]]>
<![CDATA[Storage (TrueNas, and USB controllers) (Day 13-15)]]>https://mrdvince.me/storage-truenas-and-usb-controllers-day-13-15/67954cb8d29f65000181a657Sun, 26 Jan 2025 22:27:00 GMT

This started as a simple "let's set up a container registry".

Instead, it turned into a deep dive into USB architecture, storage performance, and a lesson in why not all USB controllers are created equal, one that lasted a whole weekend.

The Registry

I needed to set up a container registry, which meant thinking about storage. The options were:

  • Use storage on the K8s VMs and let Longhorn handle replication
  • Go with external storage

The registry actually supports using S3 as a backend for image storage. But, well, cloud providers aren't in the budget, so the next best thing is Minio. I had recently got my hands on two 12TB drives (waiting on a third for RAIDZ1), so this seemed like a perfect use case.

But there was a catch - I'm running mini PCs. No fancy drive bays or PCIe expansion slots, just USB. And this marks the beginning of a very long rabbit hole learning about USB and PCIe devices and things I had no idea about.

UASP

The first bit of good news was that my external enclosure supports USB 3.2 Gen 2 and UASP (USB Attached SCSI Protocol). If you are like me, you had no idea what UASP was:

  • It's basically a protocol that gets you closer to native SCSI/SATA performance.
  • And supports parallel commands and better command queuing, making it behave more like direct disk access.

You can see this with lsusb -t:

/:  Bus 10.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 10000M
    | Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
        | Port 1: Dev 3, If 0, Class=Mass Storage, Driver=uas, 10000M
        | Port 2: Dev 4, If 0, Class=Mass Storage, Driver=uas, 10000M
        | Port 4: Dev 5, If 0, Class=Mass Storage, Driver=uas, 10000M

See that Driver=uas?

USB Controllers

Digging deeper into the USB setup with lspci -k | grep -i usb we see:

06:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #3
06:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #4
07:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #8
07:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #5
07:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #6

Okay so I have USB 4 ports, cool, but little did I know these weren't all created equal...

TrueNAS

Getting TrueNAS set up was straightforward enough - grab ISO, upload to Proxmox, create VM (using q35, UEFI, disabled memory ballooning), install.

Now the "fun" started when trying to get the drives to the VM.

Attempt 1: The PCIe passthrough

First attempt: pass through the USB controller as a PCIe device.

So looking at our lsusb -t output we see we have:

  • Root hub (Bus 10)
  • A physical USB hub connected to it
  • Three storage devices through that hub

To find which PCI device controls a USB bus we do a readlink /sys/bus/usb/devices/usb10 which shows:

../../../devices/pci0000:00/0000:00:08.3/0000:07:00.4/usb10

This path shows:

  • USB Bus 10
  • Controlled by PCI device 07:00.4

IOMMU Groups

IOMMU groups show which devices can be passed through independently.

So checking IOMMU groups with find /sys/kernel/iommu_groups/ -type l | grep "0000:07:00":

/sys/kernel/iommu_groups/26/devices/0000:07:00.0
/sys/kernel/iommu_groups/27/devices/0000:07:00.3
/sys/kernel/iommu_groups/28/devices/0000:07:00.4

So if I understand this correctly it means each device in its own group can be passed through independently.

I passed through the device in group 28 with "all functions" enabled. The drives disappeared from Proxmox (expected), but so did a bunch of other stuff (kind of makes sense given the PCI thing has other things attached to it not just the USB controller).

However the VM wouldn't even boot (interesting):

error writing '1' to '/sys/bus/pci/devices/0000:07:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:07:00.0', but trying to continue as not all devices need a reset
error writing '1' to '/sys/bus/pci/devices/0000:07:00.3/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:07:00.3', but trying to continue as not all devices need a reset
error writing '1' to '/sys/bus/pci/devices/0000:07:00.4/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:07:00.4', but trying to continue as not all devices need a reset
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1

looking at what "all functions" is about it looks like it basically says "Pass everything about this device through"

So I unchecked "all functions" (which gave the host some of the previous "disappeared" devices) and the VM started.

Created a pool, everything seemed fine until:

Critical
Pool test state is SUSPENDED: One or more devices are faulted in response to IO failures

Well, that's not good, this usually suggests disk or hardware issues.

TrueNAS UI also shows the pool is in an unhealthy state

[Screenshot: TrueNAS UI showing the pool in an unhealthy state]

I go to the console and check the status with zpool status test:

[Screenshot: zpool status output showing the suspended pool]

and check the errors in dmesg with dmesg | grep -i error:

[Screenshot: dmesg output full of I/O errors]

so:

  • I've got a lot of I/O errors on the device sdc, both reads and writes
  • The pool is in a SUSPENDED state with 9366 data errors
  • The mirror seems to still be ONLINE but there are serious I/O issues

I try smartctl and lsblk to see what's up with sdc, and the drive has disappeared (this also happened to the other drives later).

[Screenshot: lsblk output with sdc missing]

No sdc and the errors are full of references to sdc. This suggests the drive was present (as sdc) when the errors started occurring, but has since completely disappeared from the system - which could mean:

  • The drive physically disconnected (Which I didn't do)
  • The USB/SATA connection failed
  • The drive completely failed and is no longer being recognized (This would suck)

I do an ls -l /dev/disk/by-id/ and confirm the drives are still there, just under different names.

usb-ASMT_<serial>-0:0 -> sde
usb-ASMT_<serial>-0:0 -> sdf
usb-ASMT_<serial>-0:0 -> sdg

So remember when I said not all USB ports and USB controllers are made the same?

After hours of debugging, it turns out that under heavy load the USB controller on the port I was using would "crash" and the drives would "shift around", getting remounted with different paths. (There's more to this.)

Not exactly ideal for a storage system.

So I went looking for ways to have TrueNAS use the drives' WWNs instead of the dev path but could not find anything that helped.

Attempt 2: Direct Disk Passthrough

Time for Plan B. Instead of passing through the controller, pass through individual drives.

Install lshw and do an lshw -class disk -class storage to see the drives and their serial numbers.

Do an ls -l /dev/disk/by-id and copy the wwn path or the ata path of the drives.

1. Get drive info:

➜  ~ ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Jan 26 20:16 ata-<name-with-serial-number> -> ../../sde
lrwxrwxrwx 1 root root  9 Jan 26 20:16 ata-<name-with-serial-number> -> ../../sdd
lrwxrwxrwx 1 root root  9 Jan 26 20:16 ata-<name-with-serial-number> -> ../../sdf

2. Set up SCSI drives:

Then, using the WWN path or the ATA path, attach the drives as SCSI disks with the following commands:

qm set <vm-id> -scsi1 /dev/disk/by-id/ata-<name-with-serial-number>
qm set <vm-id> -scsi2 /dev/disk/by-id/ata-<name-with-serial-number>
qm set <vm-id> -scsi3 /dev/disk/by-id/ata-<name-with-serial-number>

3. Add serial numbers to config:

➜  ~ vim /etc/pve/qemu-server/<vm-id>.conf
... other config
scsi1: /dev/disk/by-id/ata-<name-with-serial-number>,size=11176G,serial=<serial-number>
scsi2: /dev/disk/by-id/ata-<name-with-serial-number>,size=11176G,serial=<serial-number>
scsi3: /dev/disk/by-id/ata-<name-with-serial-number>,size=250059096K,serial=<serial-number>
... other config

On the Proxmox GUI, you should see the drives attached and the serial numbers set under the Hardware tab.

And then I found that moving the USB cable to the USB-C port fixed the flakiness seen on the other (albeit still USB4) ports.

Performance Testing

To make sure this works, I did some stress tests with fio, and well, the results speak for themselves:

Cache performance

Reading a 10G file in 3.89 seconds (2632MiB/s throughput):

fio --name=test --rw=read --bs=1m --size=10g --filename=./testfile
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][75.0%][r=2639MiB/s][r=2638 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=7607: Sun Jan 26 05:46:15 2025
  read: IOPS=2632, BW=2632MiB/s (2760MB/s)(10.0GiB/3890msec)
    clat (usec): min=352, max=1613, avg=379.02, stdev=30.05
     lat (usec): min=352, max=1613, avg=379.07, stdev=30.06
    clat percentiles (usec):
     |  1.00th=[  359],  5.00th=[  363], 10.00th=[  367], 20.00th=[  367],
     | 30.00th=[  371], 40.00th=[  371], 50.00th=[  375], 60.00th=[  379],
     | 70.00th=[  383], 80.00th=[  388], 90.00th=[  396], 95.00th=[  404],
     | 99.00th=[  441], 99.50th=[  465], 99.90th=[  562], 99.95th=[ 1090],
     | 99.99th=[ 1467]
   bw (  MiB/s): min= 2604, max= 2650, per=100.00%, avg=2633.71, stdev=16.18, samples=7
   iops        : min= 2604, max= 2650, avg=2633.71, stdev=16.18, samples=7
  lat (usec)   : 500=99.79%, 750=0.12%, 1000=0.03%
  lat (msec)   : 2=0.07%
  cpu          : usr=0.28%, sys=99.49%, ctx=86, majf=0, minf=266
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=2632MiB/s (2760MB/s), 2632MiB/s-2632MiB/s (2760MB/s-2760MB/s), io=10.0GiB (10.7GB), run=3890-3890msec

Bigger file and Direct I/O

A bigger file, adding --direct=1 to the command to bypass the RAM cache.

Reading a 50G file in 129 seconds (396MiB/s throughput):

fio --name=test --rw=read --bs=1m --size=50g --filename=./bigtest --direct=1
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
test: Laying out IO file (1 file / 51200MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=267MiB/s][r=267 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=9526: Sun Jan 26 05:54:51 2025
  read: IOPS=396, BW=396MiB/s (415MB/s)(50.0GiB/129239msec)
    clat (usec): min=133, max=180737, avg=2522.21, stdev=3642.09
     lat (usec): min=133, max=180738, avg=2522.37, stdev=3642.10
    clat percentiles (usec):
     |  1.00th=[   149],  5.00th=[   194], 10.00th=[   212], 20.00th=[   260],
     | 30.00th=[   799], 40.00th=[  1500], 50.00th=[  1876], 60.00th=[  2245],
     | 70.00th=[  2737], 80.00th=[  3458], 90.00th=[  5997], 95.00th=[  8029],
     | 99.00th=[ 12387], 99.50th=[ 16712], 99.90th=[ 34341], 99.95th=[ 45876],
     | 99.99th=[116917]
   bw (  KiB/s): min=51200, max=2932736, per=100.00%, avg=406019.97, stdev=193023.81, samples=258
   iops        : min=   50, max= 2864, avg=396.50, stdev=188.50, samples=258
  lat (usec)   : 250=17.75%, 500=11.38%, 750=0.74%, 1000=1.46%
  lat (msec)   : 2=21.94%, 4=30.99%, 10=13.34%, 20=2.05%, 50=0.31%
  lat (msec)   : 100=0.02%, 250=0.02%
  cpu          : usr=0.14%, sys=10.08%, ctx=37023, majf=0, minf=269
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=51200,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=396MiB/s (415MB/s), 396MiB/s-396MiB/s (415MB/s-415MB/s), io=50.0GiB (53.7GB), run=129239-129239msec

Okay, I could live with those numbers as long as they are stable and consistent.

Sustained Operations

So I do 5 runs of reading an 800G file (which does include a write during initial file creation) and writing a 900G file, with a mix of both reading and writing at the same time.

The idea is to see if something breaks, so I'm also monitoring the logs and drive temps.

I will probably never experience this kind of load in one go outside of a resilver, so if this is stable I'm good with that.

I leave these running and go have a drink with some friends, it's the weekend after all.

Reading an 800G file:

fio --name=read --rw=read --bs=1m --size=800g --filename=./bigtest
read: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=310MiB/s][r=310 IOPS][eta 00m:01s]
read: (groupid=0, jobs=1): err= 0: pid=10516: Sun Jan 26 21:32:56 2025
  read: IOPS=255, BW=256MiB/s (268MB/s)(800GiB/3205833msec)
    clat (usec): min=356, max=303293, avg=3911.61, stdev=6703.53
     lat (usec): min=356, max=303293, avg=3911.78, stdev=6703.52
    clat percentiles (usec):
     |  1.00th=[   416],  5.00th=[   429], 10.00th=[   437], 20.00th=[   457],
     | 30.00th=[   506], 40.00th=[   603], 50.00th=[   676], 60.00th=[  3982],
     | 70.00th=[  4555], 80.00th=[  5276], 90.00th=[ 11731], 95.00th=[ 12387],
     | 99.00th=[ 20579], 99.50th=[ 24773], 99.90th=[ 77071], 99.95th=[125305],
     | 99.99th=[198181]
   bw (  KiB/s): min=43008, max=555008, per=100.00%, avg=261727.05, stdev=73610.29, samples=6411
   iops        : min=   42, max=  542, avg=255.54, stdev=71.87, samples=6411
  lat (usec)   : 500=29.19%, 750=21.95%, 1000=0.45%
  lat (msec)   : 2=1.53%, 4=7.32%, 10=24.63%, 20=13.79%, 50=0.95%
  lat (msec)   : 100=0.12%, 250=0.07%, 500=0.01%
  cpu          : usr=0.10%, sys=13.54%, ctx=402475, majf=0, minf=269
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=819200,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=256MiB/s (268MB/s), 256MiB/s-256MiB/s (268MB/s-268MB/s), io=800GiB (859GB), run=3205833-3205833msec

and writing a 900G file:

fio --name=write --rw=write --bs=1m --size=900g --filename=./big2testfile
write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=216MiB/s][w=216 IOPS][eta 00m:00s]
write: (groupid=0, jobs=1): err= 0: pid=24687: Sun Jan 26 23:32:36 2025
  write: IOPS=179, BW=180MiB/s (188MB/s)(900GiB/5127844msec); 0 zone resets
    clat (usec): min=118, max=400029, avg=5542.57, stdev=5208.15
     lat (usec): min=120, max=400047, avg=5561.90, stdev=5208.03
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    5], 20.00th=[    5],
     | 30.00th=[    5], 40.00th=[    5], 50.00th=[    5], 60.00th=[    5],
     | 70.00th=[    6], 80.00th=[    6], 90.00th=[    7], 95.00th=[    8],
     | 99.00th=[   22], 99.50th=[   35], 99.90th=[   87], 99.95th=[  101],
     | 99.99th=[  144]
   bw (  KiB/s): min= 6144, max=2981888, per=100.00%, avg=184084.36, stdev=65812.35, samples=10254
   iops        : min=    6, max= 2912, avg=179.74, stdev=64.27, samples=10254
  lat (usec)   : 250=0.10%, 500=0.02%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=1.37%, 10=95.23%, 20=2.08%, 50=0.88%
  lat (msec)   : 100=0.20%, 250=0.07%, 500=0.01%
  cpu          : usr=0.45%, sys=3.40%, ctx=931630, majf=0, minf=16
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,921600,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=180MiB/s (188MB/s), 180MiB/s-180MiB/s (188MB/s-188MB/s), io=900GiB (966GB), run=5127844-5127844msec

From the TrueNAS GUI

[Screenshot: TrueNAS dashboard during the sustained read/write test]

What I Learned

In general, I learned a whole lot of stuff about USB, things I had no idea about before.
  1. Not all USB controllers handle sustained operations equally - the USB-C controller in my setup proved more stable
  2. It's worth checking for UASP when buying enclosures
  3. When passing through USB storage to VMs, direct disk passthrough is maybe more reliable than attempting to pass through the whole controller.
  4. Monitor temperatures during stress testing (I had journalctl and temp monitoring running throughout), with something like for drive in sda sdb sdc sdd; do echo "=== /dev/$drive ==="; smartctl -A /dev/$drive | grep -i temp; done for the temps.
]]>
<![CDATA[ArgoCD & App of Apps (Day 11 - 12)]]>https://mrdvince.me/argocd-app-of-apps-day-11-12/678fdc4bea0ee00001ec0231Thu, 23 Jan 2025 21:53:19 GMT

It's hard to see Argo CD mentioned and GitOps not mentioned (though tbf that's the point of Argo).

GitOps is a way to manage your Kubernetes clusters where your desired state lives in Git, and tools like Argo CD continuously sync this state to your cluster.

Think of it like "infrastructure as code" but for Kubernetes resources.

Why GitOps?

Well for starters, given how often I kept rebuilding everything from scratch, being able to just point Argo CD at my repo and have it apply everything was 👌.

Anyway, why GitOps?

GitOps helps:

  • Keep track of all your changes (it's all on git)
  • Making cluster recovery straightforward (just point Argo CD at your repo)
  • Automating deployments (push to git, Argo handles the rest)

Initial Cluster Setup

Before jumping into it, you need your cluster in a "usable" state i.e:

  • CNI configurations done
  • Essential secrets (I apply these directly with sops decrypt and pipe to kubectl, as sketched below)
  • Argo CD itself (installed via helm/helmfile)
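
That secrets step is essentially just something like this (the file path is a placeholder):

sops --decrypt bootstrap-secrets.yaml | kubectl apply -f -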

The App of Apps Pattern

It's a tree structure (of sorts) where you have one root application that points to all your other applications.

When Argo syncs this root app, it creates and manages everything defined in your repo.

[Diagram: the app of apps tree structure]
I ended up preferring Helm charts for this, though other methods exist.
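
As a rough sketch of what that can look like, a template in the root chart that stamps out one Application per entry in values.yaml (the value names and paths here are placeholders, not the exact files in my repo):

# templates/applications.yaml in the root chart
{{- range .Values.applications }}
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: {{ .name }}
  namespace: argocd
spec:
  project: default
  source:
    repoURL: {{ $.Values.repoURL }}
    targetRevision: HEAD
    path: {{ .path }}
  destination:
    server: https://kubernetes.default.svc
    namespace: {{ .namespace }}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
{{- end }}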

Setting Up Argo CD

Assuming you already installed Argo

First, grab the initial admin password:

k -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

Port forward so you can authenticate to the CLI (you can also access the UI)

k -n argocd port-forward services/argocd-server 8080:80

Login on the CLI

argocd login localhost:8080

Change the default password:

argocd account update-password --new-password "<your password>"

Add your git repo:

argocd repo add [email protected]:mrdvince/<your repo>.git --ssh-private-key-path <your ssh key path>

Creating the Root App

You can create the root app either via CLI:

argocd app create apps \
    --dest-namespace argocd \
    --dest-server https://kubernetes.default.svc \
    --repo [email protected]:mrdvince/<your repo>.git \
    --path apps/argo_apps

Then sync it:

argocd app sync apps

Or apply a manifest with kubectl:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: [email protected]:mrdvince/<your repo>.git
    targetRevision: HEAD
    path: apps/argo_apps/
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - RespectIgnoreDifferences=true
      - ApplyOutOfSyncOnly
Note: path: apps/argo_apps/ is relative to the base of the repo.

The GitOps Workflow

[Diagram: the GitOps workflow]

Once everything is set up, the workflow looks like this:

  1. Push changes to your git repo
  2. Argo CD detects changes
  3. Changes are pulled and compared with the cluster state
  4. If different, Argo CD applies the changes

Now just sit back and watch as Argo CD starts creating and managing all your applications defined in the git repo.

You should then be able to see a dashboard that looks like my screenshot below

[Screenshot: Argo CD dashboard showing the synced applications]
]]>
<![CDATA[Debugging and General tips (Day 10)]]>https://mrdvince.me/debugging-and-general-tips-day-10-2/678fdc3aea0ee00001ec022dTue, 21 Jan 2025 22:00:05 GMT

After setting up and debugging various parts, I thought I'd share some basic tips that have helped me along the way.

Managing Multiple Clusters

Here's how to merge multiple kubeconfig files:

KUBECONFIG=~/.kube/config:~/.kube/config.cluster2 kubectl config view --flatten > ~/.kube/config.merged
cp ~/.kube/config ~/.kube/config.backup
mv ~/.kube/config.merged ~/.kube/config

You can then rename contexts for better clarity:

kubectl config rename-context default prism
kubectl config rename-context kubernetes-admin@kubernetes atlas
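
Switching between clusters is then just:

kubectl config get-contexts
kubectl config use-context atlas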

And set proper permissions on your kube config:

chmod 600 ~/.kube/config

Node Scheduling Issues

If pods aren't scheduling on control plane nodes (I'm using 3 control plane nodes), check for taints:

kubectl get nodes -o json | jq '.items[].spec.taints'

To remove control-plane taints if needed:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Troubleshooting Tips

In general, most issues can be found and solved by following a pattern:

  • Get the resource
  • Describe it
  • And follow the trail of related resources
  • Check the related logs

An example of a certificate issue:

Certificate Issues

Follow the chain of resources when debugging cert-manager:

kubectl get certificate -n argocd
kubectl -n argocd describe certificate argocd-certificate
kubectl -n argocd describe certificaterequests.cert-manager.io argocd-certificate-1
kubectl -n argocd describe order argocd-certificate-1-1494176820

kubectl -n cert-manager logs pods/cert-manager-<some-hash>

Other times, just deleting a resource and letting it get recreated solves the issue. For example, when switching from staging to production Let's Encrypt, you may need to delete the old secrets or orders, and they should be recreated:

e.g. kubectl -n argocd delete secrets argocd-tls

Network Debugging

When services aren't reachable:

  • Check firewall rules and network policies between VLANs
  • Use dig or nslookup to verify DNS resolution
  • Verify LoadBalancer IP assignments
  • Use tcpdump and netstat for network debugging:
# Check listening ports
netstat -tlpn

# Monitor ARP requests
tcpdump -i any -n arp  
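
And for the DNS checks mentioned above, querying the DNS server directly is usually the quickest test (the server IP and hostname are placeholders):

# Query a record against a specific DNS server
dig @<dns server ip> <hostname> +short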

LoadBalancer Configuration

If setting up a new cluster using kubeadm (not on the cloud), use MetalLB or Cilium to hand out LoadBalancer IP addresses.

If using Cilium, here's a sample configuration:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "lb-pool"
spec:
  blocks:
    - cidr: "192.168.30.140/30"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: cilium-l2-announce
spec:
  externalIPs: true
  loadBalancerIPs: true
  interfaces:
    - eth0
All services run through Traefik, so a few LoadBalancer IPs are plenty.

Helm and Argo CD Debugging

To debug Argo CD applications, you can render out the chart:

helm template . -f values.yaml > rendered-app.yaml

And for helmfile:

helmfile template > rendered.yaml
]]>