CrowdSec is a security tool that detects and blocks malicious IPs using a collaborative approach to share threat intelligence across users.
I initially planned to run CrowdSec just on Traefik, but having it at the firewall level provides more protection for all devices on the network.
CrowdSec has a convenient plugin for OPNsense that makes installation straightforward:
Once installed, you'll find CrowdSec under the Services tab:
By default, CrowdSec creates floating rules to block incoming connections from malicious IP addresses.
However, we can use the automatically created crowdsec_blacklists and crowdsec6_blacklists aliases to create custom floating rules that block all outgoing connections to malicious IPs.
This is useful in case a device on the network is already compromised and tries to connect back to a blocklisted IP.
To verify that CrowdSec is working properly, you can temporarily ban an IP address:
cscli decisions add -t ban -d 1m -i <IP address>
This will ban the specified IP for one minute.
If you use your own IP, expect your connection to freeze, confirming that the ban is working.
To view active decisions (bans):
cscli decisions list
CrowdSec also has a Prometheus endpoint for metrics collection, so I'll look into integrating it with Grafana for visualization.
When working with infrastructure as code and Kubernetes, you inevitably face the challenge of managing secrets securely.
API tokens and other sensitive information shouldn't be stored in plain text in your Git repositories, but they still need to be accessible for deployments.
SOPS (Secrets OPerationS) is a powerful tool that supports multiple encryption providers including AWS KMS, GCP KMS, Azure Key Vault, age, and PGP.
But for those of us without cloud provider resources, age offers a lightweight, modern alternative for encryption.
The first question with age is: where do you store your keys securely, especially when you need to access them across multiple machines? And what happens if you reset or lose your machine and lose the keys with it?
I went looking for options and found Bitwarden Secrets Manager. It offers an elegant way to store cryptographic keys and access them securely, and if you're already using Bitwarden for password management, why not try it.
First, install age and SOPS:
# Install age and SOPS (commands will vary by OS; on macOS you can use brew)
age-keygen -o key.txt
The generated file contains two important pieces:
age1... (the public key, used for encryption)
AGE-SECRET-KEY-... (the private key, used for decryption)
Follow the Bitwarden Secrets Manager guide to set up your account and store both keys.
Install the Bitwarden Secrets CLI (bws) and set up your access token:
export BWS_ACCESS_TOKEN=<your token>
Add this function to your shell profile (e.g., .zshrc) to easily load keys when needed:
load_age_secrets() {
export SOPS_AGE_KEY=$(bws secret get <secret id> | jq -r .value)   # secret holding the private key
export AGE_PUBLIC_KEY=$(bws secret get <secret id> | jq -r .value) # secret holding the public key
echo "export SOPS_AGE_KEY='$SOPS_AGE_KEY'" > /tmp/.secrets_exports
echo "export AGE_PUBLIC_KEY='$AGE_PUBLIC_KEY'" >> /tmp/.secrets_exports
echo "Secrets loaded"
}
# this loads the keys as env variables if the values exist
[[ -f /tmp/.secrets_exports ]] && source /tmp/.secrets_exports
The <secret id> is the secret UUID, which can be found by running bws secret list.
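The caching through /tmp/.secrets_exports is plain shell: new shells pick up previously loaded keys without another bws call. A minimal sketch of the mechanism (with a dummy value, not a real key):

```shell
# Simulate what load_age_secrets writes after fetching from bws (dummy value)
echo "export SOPS_AGE_KEY='AGE-SECRET-KEY-DUMMY'" > /tmp/.secrets_exports

# What every new shell does on startup: source the cached exports if present
[ -f /tmp/.secrets_exports ] && . /tmp/.secrets_exports

echo "$SOPS_AGE_KEY"   # AGE-SECRET-KEY-DUMMY
```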
An example of using encrypted secrets with Terragrunt for infra provisioning (in this case, for Proxmox):
# auth.yaml
proxmox_token: "your_super_secret_token_that_no_one_should_know"
proxmox_user_id: "your_user_id_which_we_also_encrypted_because_why_not"
Using YAML because we will use Terragrunt's yamldecode function to parse the decrypted secrets later.
sops --encrypt --age $AGE_PUBLIC_KEY --in-place auth.yaml
--in-place overwrites the file with the encrypted version.
locals {
  secret_vars = yamldecode(sops_decrypt_file(find_in_parent_folders("auth.yaml")))
  # ...
  pm_api_token_secret = local.secret_vars.proxmox_token
  pm_api_token_id     = local.secret_vars.proxmox_user_id
}
generate "provider" {
path = "provider.tf"
if_exists = "overwrite_terragrunt"
contents = <<EOF
provider "proxmox" {
pm_api_url = "${local.pm_api_url}"
pm_api_token_id = "${local.pm_api_token_id}"
pm_api_token_secret = "${local.pm_api_token_secret}"
pm_tls_insecure = false
pm_parallel = 10
}
EOF
}
Terragrunt commands work as usual, e.g. when running terragrunt apply
the secret gets decrypted and used to authenticate with the Proxmox API.
You can also encrypt Kubernetes secrets:
# secret.yaml
apiVersion: v1
data:
key1: c3VwZXJzZWNyZXQ=
key2: dG9wc2VjcmV0
kind: Secret
metadata:
name: my-secret
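Note the data values above are just base64 of the plaintext (supersecret and topsecret), which is encoding, not encryption - hence the need for SOPS on top:

```shell
# base64 is reversible encoding, not encryption
echo -n supersecret | base64   # c3VwZXJzZWNyZXQ=
echo -n topsecret | base64     # dG9wc2VjcmV0
```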
sops --encrypt --age $AGE_PUBLIC_KEY --encrypted-regex '^(data|stringData)$' secret.yaml
To apply it, you can pipe the decrypted output to kubectl, e.g.:
sops --decrypt --encrypted-regex '^(data|stringData)$' secret.yaml | kubectl apply -f -
So this setup gives you a nice foundation for secret management.
Got a new drive to add to my storage pool, and TrueNAS Scale now supports RAIDZ VDEV extension - a relatively new feature, introduced in TrueNAS 24.10 (Electric Eel).
Steps:
See the docs here. The plus side of this process is that your NAS remains fully functional during the extension.
You can continue using all services while the VDEV rebuilds in the background.
There's an interesting caveat with the extension process:
The expanded vdev uses the pre-expanded parity ratio, which reduces the total vdev capacity. To reset the vdev parity ratio and fully use the new capacity, manually rewrite all data in the vdev. This process takes time and is irreversible.
In practical terms, this means you won't immediately get the full theoretical capacity increase. The system recovers this "lost headroom" over time as data naturally gets modified or deleted. From the TrueNAS docs:
Extended VDEVs recover lost headroom as existing data is read and rewritten to the new parity ratio. This can occur naturally over the lifetime of the pool as you modify or delete data. To manually recover capacity, simply replicate and rewrite the data to the extended pool.
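As a rough back-of-the-envelope illustration (hypothetical numbers, not my pool): a 3-wide RAIDZ1 of 12TB drives has a 2/3 data ratio; extend it to 4 drives and, until the data is rewritten, the old ratio still applies:

```shell
# Old data ratio (2 data : 1 parity) applied to the new 4-drive vdev
echo $((4 * 12 * 2 / 3))   # 32 -> roughly 32TB usable for now
# Ideal ratio after a full rewrite (3 data : 1 parity)
echo $((4 * 12 * 3 / 4))   # 36 -> roughly 36TB usable
```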
However, scripts like this exist for ZFS rebalancing - e.g. the linked one rewrites the data in place.
For those wanting to calculate potential capacity gains, TrueNAS provides a handy Extension Calculator.
The extension process is fairly time-consuming - my current extension has been running for over 5 hours.
However, the entire system remains usable during this time, with all services continuing to function (that said, there's not much load on the NAS; I can't say the same if it were under heavy use).
So I needed to share some services with friends outside my tailnet, but:
The current infrastructure setup includes:
After doing some research and trials, the approach I went with was to expose the Traefik LoadBalancer service directly to Tailscale using their Kubernetes operator.
Tailscale includes a blog post on how to do this; I recommend checking it out.
Important: Create an OAuth client in the Tailscale console with Devices Core and Auth Keys write scopes first (see the full post here).
Install the Tailscale operator using Helm:
Add the Tailscale Helm chart repository (https://pkgs.tailscale.com/helmcharts) to your local Helm repositories:
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update
helm upgrade \
--install \
tailscale-operator \
tailscale/tailscale-operator \
--namespace=tailscale \
--create-namespace \
--set-string oauth.clientId="<client_id>" \
--set-string oauth.clientSecret="<client_secret>" \
--wait
This can also be made part of a helmfile template or an Argo deployment.
With the operator running, exposing a service is as simple as adding an annotation:
annotations:
tailscale.com/expose: "true"
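For context, here's a minimal sketch of how that annotation sits on a Service manifest (the name and port are hypothetical, not my actual Traefik values):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik
  annotations:
    tailscale.com/expose: "true"   # the operator picks this up and exposes the service on the tailnet
spec:
  type: LoadBalancer
  ports:
    - port: 443
```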
From the Tailscale console:
One useful thing is to set up ACLs to restrict what autogroup:shared and specific tags (the operator is a tagged device) can access.
This ensures users only have access to the services you explicitly want to share.
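A rough sketch of what that could look like in the tailnet policy file (HuJSON; the tag name and port here are assumptions - adjust to your setup):

```json
{
  "acls": [
    // shared-in users may only reach HTTPS on the operator's tagged devices
    {"action": "accept", "src": ["autogroup:shared"], "dst": ["tag:k8s-operator:443"]}
  ]
}
```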
The benefits are essentially:
I found some old hard drives from my campus days (surprisingly still working) with a bunch of songs. Rather than letting these sit idle, figured it was time to make this collection accessible on the go.
So I went looking for something I could use and found Navidrome.
First, create two datasets through the TrueNAS GUI (I prefer this over regular folders for better permission control):
navidrome
├── data
└── music
TrueNAS Scale's Electric Eel release moved to Docker for apps (instead of Kubernetes)
Install:
This part assumes you already have Traefik set up as a reverse proxy with cert-manager and HTTPS redirect middleware configured.
If you're starting fresh, you'll want to get those pieces in place first.
Once Navidrome is running, we need to make it accessible through a reverse proxy. This requires two pieces:
An external service definition:
apiVersion: v1
kind: Service
metadata:
name: navidrome
namespace: routes
spec:
ports:
- port: <port>
targetPort: <port>
type: ExternalName
externalName: <Ip>
HTTP redirect route:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: navidrome-redirect
namespace: routes
spec:
entryPoints:
- web
routes:
- match: Host(`<host>`)
kind: Rule
middlewares:
- name: https-redirect
services:
- name: noop@internal
kind: TraefikService
And the actual route over HTTPS:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: navidrome
namespace: routes
spec:
entryPoints:
- websecure
routes:
- match: Host(`<host>`)
kind: Rule
services:
- name: navidrome
port: <port>
scheme: http
tls:
secretName: <ssl cert secret>
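For reference, the https-redirect middleware referenced in the redirect route (assumed already configured per the prerequisites) is just a redirectScheme middleware, along these lines:

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: https-redirect
  namespace: routes
spec:
  redirectScheme:
    scheme: https
    permanent: true
```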
While Navidrome's web interface is solid, one of its strengths is Subsonic API compatibility.
This means you can use various Subsonic-compatible apps as front-ends for your music collection.
I chose Symfonium as my client and it's been impressive.
Went for a photo walk, and with gapless playback and a smart queue that keeps playing similar tracks (like Spotify's song radio), it just works; I forgot it was a self-hosted thing. Also, thanks to Tailscale, I can stream my music anywhere without noticing any difference from other services.
Now this isn't a replacement, and I will still most definitely keep my Spotify playlists.
Upgrading K3s is remarkably straightforward. You just use the same install command you used when first creating your cluster. For me that's:
NB: This is a single node k3s cluster
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_EXEC='--disable=traefik,disable-kube-proxy,disable-network-policy --flannel-backend=none --write-kubeconfig-mode=644 --etcd-expose-metrics true' sh -
Looking at the install command, you might notice several flags:
--disable=traefik: Disabled because I'm running my own managed version of Traefik
--disable-kube-proxy, --flannel-backend=none: Both disabled as Cilium handles these functions (CNI and service networking)
--write-kubeconfig-mode=644: Sets readable permissions on the kubeconfig file right from the start
--etcd-expose-metrics true: Exposes etcd metrics.

In my case, the output showed:
[INFO] Finding release for channel stable
[INFO] Using v1.31.5+k3s1 as release
[INFO] Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.31.5+k3s1/sha256sum-amd64.txt
[INFO] Skipping binary downloaded, installed k3s matches hash
[INFO] Skipping installation of SELinux RPM
...
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
...
After the upgrade, your workloads should continue running without interruption. You can verify the new version with:
k3s --version
That's really all there is to it - K3s keeps things refreshingly simple.
I have been setting up Immich, i.e. destroying and recreating it until I got my photo organization and volumes just right.
The next step after testing was a mass import. The CLI tool makes mass imports pretty straightforward, but I kept having certain files fail to import, especially when trying to upload large files (like 2GB).
I would generally get a very generic upload failed error.
Everything was working fine for regular photos, and my phone syncing without issues.
Did some digging and I found that the issue was due to Traefik timeouts; see the Traefik page here.
So I adjusted the timeouts in the config i.e.:
ports:
  web:
    redirections:
      # ...
  websecure:
    tls:
      enabled: true
    transport:
      respondingTimeouts:
        readTimeout: 20m
        writeTimeout: 20m # 20 minutes - adjust based on your needs
And while at it updated traefik to v3.3.3 (helm chart 34.3.0)
and changed from redirectTo
to redirections
(diff):
ports:
web:
- redirectTo:
- port: websecure
+ redirections:
+ entryPoint:
+ to: websecure
+ scheme: https
+ permanent: true
Large file uploads now work just peachy.
PPS: Planning to move the days of homelab to a dedicated page and reduce the amount of logs on the homepage
While browsing and editing large RAW photos over SMB, I noticed some high latency and got to wondering if it could be reduced.
Some research later I found something called bufferbloat.
I found this analogy of bufferbloat from Waveform.com that's worth sharing here:
Think of your internet connection like a sink with a narrow drain (your bandwidth limit). When someone downloads a large file, it's like dumping a bucket of water into the sink. Now if you try to do something time-sensitive - like gaming or a video call - those packets are like drops of oil trying to get through a sink full of water. They have to wait for all that "water" to drain first, causing lag and delays. That's bufferbloat.
Check out Waveform's bufferbloat test tool, and a more detailed ELI5 explanation.
In OPNsense, we can address this using traffic shaping: setting up pipes and queues with FlowQueue-CoDel, which ensures that packets from small flows are sent in a timely fashion while large flows share the bottleneck's capacity.
I initially found guides for pfSense, but OPNsense has its own really nice guide on how to address bufferbloat here
Before messing around with creating pipes, queues, and rules, it's advisable to run some tests to establish a baseline.
My initial bufferbloat grade was a B, with some concerning latency spikes:
as seen in the screenshot below:
After setting up and tuning the traffic shaping rules:
as seen in the screenshot below:
So I traded some raw speed for consistency.
After seeing improvements on the WAN side, I got more specific with my internal network.
I set it up for:
I really just needed it for low latency, especially when editing RAW files over SMB, and it did help; not a drastic difference, but noticeable. So I disabled the WAN-side optimizations.
Today was all about updates.
What started as just routine maintenance turned into a reminder of why we keep backups (and backups of backups).
After the updates and reboots, OPNsense decided to forget about its VLANs and misconfigure WAN and LAN interfaces.
This cascaded into:
Everything seemed fixed until I noticed I still had no internet.
OPNsense looked good, but the DNS server was unreachable despite appearing online and healthy. So basically "everything's fine but nothing works."
After some troubleshooting, replacing the VM's NIC, and re-assigning it the same static IP in OPNsense, the node was reachable again and my DNS was working.
Most services recovered quickly once DNS and OPNsense were back, though TrueNAS took its time and couldn't update catalogs, so I added Quad9 as a DNS fallback for next time (because there probably will be one).
And the Kubernetes clusters (both k3s and the HA one)? Everything just came back online like nothing had happened, without needing to touch a single node (including the 2 HAProxy nodes).
At least now I have a fallback plan for DNS issues, and another validation of why Kubernetes is great for self-healing infrastructure.]]>
Spent today setting up Homepage to organize access to all the URLs I currently have.
Current dashboard
The base setup is defined in the settings.yaml file and looks like this:
title: <Your title name>
theme: dark
color: slate
background:
image: <image>
blur: sm
saturate: 50
brightness: 50
opacity: 50
....
<rest of config including layouts>
Services are grouped logically, making it easy to find what you need.
For icons, you can use:
When using Material Design Icons or Simple Icons, prefix the icon name accordingly, e.g. mdi-<icon-name> or si-<icon-name>.
An example definition in the services.yaml:
- Hypervisor:
- Avalon:
icon: proxmox.svg
href: <proxmox url>
description: Main Compute Node
- Aegis:
icon: proxmox.svg
href: <proxmox url>
description: Network Node
- Network:
- Vale:
icon: opnsense.svg
href: <url>
description: OPNsense
- DNS:
icon: adguard.svg
href: <url>
description: DNS Management
The page auto-refreshes as you edit the config, so you see the changes in real time.
You can also add widgets, e.g. showing the time:
- datetime:
text_size: xl
format:
timeStyle: short
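Another widget worth mentioning is the search bar (the provider here is just an example):

```yaml
- search:
    provider: duckduckgo   # or google, bing, custom
    target: _blank
```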
And bookmarks for quick access:
- Blogs:
- Local Blog:
- abbr: Blog
href: <your blog url>
I still need to:
For now, it's a functional start that makes navigating between services easier, and the cool thing is you can set it as the default browser landing page.
With Minio running on our new storage, we've now got S3-compatible storage. First use case: moving VM state off the local filesystem.
First, head to Minio and create an access key - you'll need both the Access Key and Secret Key for the next steps.
Download the file or copy the created keys (once you close the popup you can no longer copy the keys).
After installing the AWS CLI (see the install guide here), we need to configure it:
aws configure
AWS Access Key ID [None]: <Key ID copied from minio>
AWS Secret Access Key [None]: <Secret key copied from minio>
Default region name [None]: eu-west-1
Default output format [None]:
# Set signature version
aws configure set default.s3.signature_version s3v4
Add your Minio endpoint to the config (if you are using a reverse proxy make sure the URL is pointing to port 9000, not the GUI port):
# ~/.aws/config
[default]
region = eu-west-1
endpoint_url=https://minio-s3.<your domain name> <- Add this line
s3 =
signature_version = s3v4
Let's make sure everything works:
# Create a bucket
aws s3 mb s3://test-minio-bucket
make_bucket: test-minio-bucket
# List buckets
aws s3 ls
2025-01-29 20:33:06 test-minio-bucket
# Remove the test bucket
aws s3 rb s3://test-minio-bucket
remove_bucket: test-minio-bucket
To use this with Terragrunt, you'll need to add some config to the remote_state
block. (See example repo here).
Since we're using Minio rather than actual S3, we need to disable several S3-specific features - otherwise Terragrunt will try to do S3 specific modifications to the bucket settings:
# remote_state block
remote_state {
backend = "s3"
config = {
bucket = "terragrunt-state"
key = "${path_relative_to_include()}/baldr.tfstate"
region = local.aws_region
endpoint = local.environment_vars.locals.endpoint_url
# Skip various S3 features we don't need
skip_bucket_ssencryption = true
skip_bucket_public_access_blocking = true
skip_bucket_enforced_tls = true
skip_bucket_root_access = true
skip_credentials_validation = true
force_path_style = true
}
generate = {
path = "backend.tf"
if_exists = "overwrite_terragrunt"
}
}
Note: The endpoint needs to be set in the Terragrunt config even if it's in the AWS config file - I found Terragrunt doesn't pick it up from there.
This is a continuation of this post - the final piece arrived today: another 12TB drive to complete the storage setup.
Created a new pool using RAIDZ1 with a 256GB SSD as ZFS L2ARC read-cache. Unlike the previous mirror setup from my testing, RAIDZ1 gives me more usable space while still protecting against a single drive failure.
Made some datasets:
backups: For system and VM backups
k8s: Kubernetes persistent storage
iscsi: Testing ground for VM storage over iSCSI

I will create more datasets as the need arises.
Swapped out the built-in fan in the drive enclosure with a Noctua - because if you're running 24/7 storage, noise matters. The difference is noticeable (Noctua fans are awesome).
Starting the Minio setup - finally getting back to what started this whole thing.
This started as a simple "let's set up a container registry".
Instead, it turned into a deep dive into USB architecture, storage performance, and a lesson in why not all USB controllers are created equal, that lasted a whole weekend.
I needed to set up a container registry, which meant thinking about storage. The options were:
The registry actually supports using S3 as a backend for image storage. But cloud providers aren't in the budget, so the next best thing is Minio. I had recently got my hands on two 12TB drives (waiting on a third for RAIDZ1), so this seemed like a perfect use case.
But there was a catch - I'm running mini PCs. No fancy drive bays or PCIe expansion slots, just USB. And this marks the beginning of a very long rabbit hole learning about USB and PCIe devices and things I had no idea about.
The first bit of good news was that my external enclosure supports USB 3.2 Gen 2 and UASP (USB Attached SCSI Protocol). If you're like me, you had no idea what UASP was either.
You can see this with lsusb -t:
/: Bus 10.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 10000M
| Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
| Port 1: Dev 3, If 0, Class=Mass Storage, Driver=uas, 10000M
| Port 2: Dev 4, If 0, Class=Mass Storage, Driver=uas, 10000M
| Port 4: Dev 5, If 0, Class=Mass Storage, Driver=uas, 10000M
See that Driver=uas?
Digging deeper into the USB setup with lspci -k | grep -i usb we see:
06:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #3
06:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #4
07:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #8
07:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #5
07:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #6
Okay so I have USB 4 ports, cool, but little did I know these weren't all created equal...
Getting TrueNAS set up was straightforward enough - grab ISO, upload to Proxmox, create VM (using q35, UEFI, disabled memory ballooning), install.
Now the "fun" started when trying to get the drives to the VM.
First attempt: pass through the USB controller as a PCIe device.
So looking at our lsusb -t command output, we have:
To find which PCI device controls a USB bus, we run readlink /sys/bus/usb/devices/usb10, which shows:
../../../devices/pci0000:00/0000:00:08.3/0000:07:00.4/usb10
This path shows:
IOMMU groups show which devices can be passed through independently.
So checking IOMMU groups with find /sys/kernel/iommu_groups/ -type l | grep "0000:07:00":
/sys/kernel/iommu_groups/26/devices/0000:07:00.0
/sys/kernel/iommu_groups/27/devices/0000:07:00.3
/sys/kernel/iommu_groups/28/devices/0000:07:00.4
So if I understand this correctly, each device being in its own IOMMU group means it can be passed through independently.
I passed through the device in group 28 with "all functions" enabled. The drives disappeared from Proxmox (expected), but so did a bunch of other stuff (kind of makes sense given the PCI thing has other things attached to it not just the USB controller).
However the VM wouldn't even boot (interesting):
error writing '1' to '/sys/bus/pci/devices/0000:07:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:07:00.0', but trying to continue as not all devices need a reset
error writing '1' to '/sys/bus/pci/devices/0000:07:00.3/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:07:00.3', but trying to continue as not all devices need a reset
error writing '1' to '/sys/bus/pci/devices/0000:07:00.4/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:07:00.4', but trying to continue as not all devices need a reset
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1
Looking at what "all functions" is about, it basically means "pass everything about this device through".
So I unchecked "all functions" (which gave the host some of the previous "disappeared" devices) and the VM started.
Created a pool, everything seemed fine until:
Critical
Pool test state is SUSPENDED: One or more devices are faulted in response to IO failures
Well, that's not good, this usually suggests disk or hardware issues.
The TrueNAS UI also shows the pool is in an unhealthy state.
I go to the console and check the pool status with zpool status test, then check the errors in dmesg with dmesg | grep -i error. The errors point at sdc, both reads and writes.
I try smartctl and lsblk to see what's up with sdc, and the drive has disappeared (this also happened for the other drives later).
No sdc, yet the errors are full of references to sdc. This suggests the drive was present (as sdc) when the errors started occurring, but has since completely disappeared from the system - which could mean:
I do an ls -l /dev/disk/by-id/
and confirm the drives are still there, using different names.
usb-ASMT_<serial>-0:0 -> sde
usb-ASMT_<serial>-0:0 -> sdf
usb-ASMT_<serial>-0:0 -> sdg
So remember when I said all USB ports and USB controllers are not made the same?
After hours of debugging, it turns out that under heavy load the USB controller on the port I was using would "crash" and the drives would "shift around", getting remounted with different paths. (There's more to this.)
Not exactly ideal for a storage system.
So I went looking for ways to have TrueNAS use the drives' wwn identifiers instead of the dev path, but could not find anything that helped.
Time for Plan B. Instead of passing through the controller, pass through individual drives.
Install lshw and run lshw -class disk -class storage to see the drives and their serial numbers.
Then do an ls -l /dev/disk/by-id and copy the wwn path or the ata path of the drives.
1. Get drive info:
➜ ~ ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root 9 Jan 26 20:16 ata-<name-with-serial-number> -> ../../sde
lrwxrwxrwx 1 root root 9 Jan 26 20:16 ata-<name-with-serial-number> -> ../../sdd
lrwxrwxrwx 1 root root 9 Jan 26 20:16 ata-<name-with-serial-number> -> ../../sdf
2. Set up SCSI drives:
Then, using the wwn path or the ata path, set up the SCSI drives with the following commands:
qm set <vm-id> -scsi1 /dev/disk/by-id/ata-<name-with-serial-number>
qm set <vm-id> -scsi2 /dev/disk/by-id/ata-<name-with-serial-number>
qm set <vm-id> -scsi3 /dev/disk/by-id/ata-<name-with-serial-number>
3. Add serial numbers to config:
➜ ~ vim /etc/pve/qemu-server/<vm-id>.conf
... other config
scsi1: /dev/disk/by-id/ata-<name-with-serial-number>,size=11176G,serial=<serial-number>
scsi2: /dev/disk/by-id/ata-<name-with-serial-number>,size=11176G,serial=<serial-number>
scsi3: /dev/disk/by-id/ata-<name-with-serial-number>,size=250059096K,serial=<serial-number>
... other config
On the Proxmox GUI, you should see the drives attached and the serial numbers set under the Hardware tab.
And then I found that moving the USB cable to the USB-C port fixed the flakiness seen when using the other, albeit still USB 4, ports.
To make sure this works, I did some stress tests with fio, and well, the results speak for themselves:
Reading a 10G file in 3.89 seconds (2632MiB/s throughput):
fio --name=test --rw=read --bs=1m --size=10g --filename=./testfile
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][75.0%][r=2639MiB/s][r=2638 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=7607: Sun Jan 26 05:46:15 2025
read: IOPS=2632, BW=2632MiB/s (2760MB/s)(10.0GiB/3890msec)
clat (usec): min=352, max=1613, avg=379.02, stdev=30.05
lat (usec): min=352, max=1613, avg=379.07, stdev=30.06
clat percentiles (usec):
| 1.00th=[ 359], 5.00th=[ 363], 10.00th=[ 367], 20.00th=[ 367],
| 30.00th=[ 371], 40.00th=[ 371], 50.00th=[ 375], 60.00th=[ 379],
| 70.00th=[ 383], 80.00th=[ 388], 90.00th=[ 396], 95.00th=[ 404],
| 99.00th=[ 441], 99.50th=[ 465], 99.90th=[ 562], 99.95th=[ 1090],
| 99.99th=[ 1467]
bw ( MiB/s): min= 2604, max= 2650, per=100.00%, avg=2633.71, stdev=16.18, samples=7
iops : min= 2604, max= 2650, avg=2633.71, stdev=16.18, samples=7
lat (usec) : 500=99.79%, 750=0.12%, 1000=0.03%
lat (msec) : 2=0.07%
cpu : usr=0.28%, sys=99.49%, ctx=86, majf=0, minf=266
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=2632MiB/s (2760MB/s), 2632MiB/s-2632MiB/s (2760MB/s-2760MB/s), io=10.0GiB (10.7GB), run=3890-3890msec
Next, a bigger file, adding --direct=1 to the command to bypass the RAM cache.
Reading a 50G file in 129 seconds (396MiB/s throughput):
fio --name=test --rw=read --bs=1m --size=50g --filename=./bigtest --direct=1
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
test: Laying out IO file (1 file / 51200MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=267MiB/s][r=267 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=9526: Sun Jan 26 05:54:51 2025
read: IOPS=396, BW=396MiB/s (415MB/s)(50.0GiB/129239msec)
clat (usec): min=133, max=180737, avg=2522.21, stdev=3642.09
lat (usec): min=133, max=180738, avg=2522.37, stdev=3642.10
clat percentiles (usec):
| 1.00th=[ 149], 5.00th=[ 194], 10.00th=[ 212], 20.00th=[ 260],
| 30.00th=[ 799], 40.00th=[ 1500], 50.00th=[ 1876], 60.00th=[ 2245],
| 70.00th=[ 2737], 80.00th=[ 3458], 90.00th=[ 5997], 95.00th=[ 8029],
| 99.00th=[ 12387], 99.50th=[ 16712], 99.90th=[ 34341], 99.95th=[ 45876],
| 99.99th=[116917]
bw ( KiB/s): min=51200, max=2932736, per=100.00%, avg=406019.97, stdev=193023.81, samples=258
iops : min= 50, max= 2864, avg=396.50, stdev=188.50, samples=258
lat (usec) : 250=17.75%, 500=11.38%, 750=0.74%, 1000=1.46%
lat (msec) : 2=21.94%, 4=30.99%, 10=13.34%, 20=2.05%, 50=0.31%
lat (msec) : 100=0.02%, 250=0.02%
cpu : usr=0.14%, sys=10.08%, ctx=37023, majf=0, minf=269
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=51200,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=396MiB/s (415MB/s), 396MiB/s-396MiB/s (415MB/s-415MB/s), io=50.0GiB (53.7GB), run=129239-129239msec
Okay, I could live with those numbers as long as they are stable and consistent.
So I do 5 runs of reading an 800G file (which includes a write during the initial file creation) and writing a 900G file, with a mix of both reading and writing at the same time.
The idea is to see if something breaks, so I'm also monitoring the logs and drive temps.
I'll probably never experience this kind of load in one go outside of a resilver, so if this is stable I'm good with that.
I leave these running and go have a drink with some friends; it's the weekend after all.
Reading an 800G file:
fio --name=read --rw=read --bs=1m --size=800g --filename=./bigtest
read: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=310MiB/s][r=310 IOPS][eta 00m:01s]
read: (groupid=0, jobs=1): err= 0: pid=10516: Sun Jan 26 21:32:56 2025
read: IOPS=255, BW=256MiB/s (268MB/s)(800GiB/3205833msec)
clat (usec): min=356, max=303293, avg=3911.61, stdev=6703.53
lat (usec): min=356, max=303293, avg=3911.78, stdev=6703.52
clat percentiles (usec):
| 1.00th=[ 416], 5.00th=[ 429], 10.00th=[ 437], 20.00th=[ 457],
| 30.00th=[ 506], 40.00th=[ 603], 50.00th=[ 676], 60.00th=[ 3982],
| 70.00th=[ 4555], 80.00th=[ 5276], 90.00th=[ 11731], 95.00th=[ 12387],
| 99.00th=[ 20579], 99.50th=[ 24773], 99.90th=[ 77071], 99.95th=[125305],
| 99.99th=[198181]
bw ( KiB/s): min=43008, max=555008, per=100.00%, avg=261727.05, stdev=73610.29, samples=6411
iops : min= 42, max= 542, avg=255.54, stdev=71.87, samples=6411
lat (usec) : 500=29.19%, 750=21.95%, 1000=0.45%
lat (msec) : 2=1.53%, 4=7.32%, 10=24.63%, 20=13.79%, 50=0.95%
lat (msec) : 100=0.12%, 250=0.07%, 500=0.01%
cpu : usr=0.10%, sys=13.54%, ctx=402475, majf=0, minf=269
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=819200,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=256MiB/s (268MB/s), 256MiB/s-256MiB/s (268MB/s-268MB/s), io=800GiB (859GB), run=3205833-3205833msec
and writing a 900G file:
fio --name=write --rw=write --bs=1m --size=900g --filename=./big2testfile
write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=216MiB/s][w=216 IOPS][eta 00m:00s]
write: (groupid=0, jobs=1): err= 0: pid=24687: Sun Jan 26 23:32:36 2025
write: IOPS=179, BW=180MiB/s (188MB/s)(900GiB/5127844msec); 0 zone resets
clat (usec): min=118, max=400029, avg=5542.57, stdev=5208.15
lat (usec): min=120, max=400047, avg=5561.90, stdev=5208.03
clat percentiles (msec):
| 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 5], 20.00th=[ 5],
| 30.00th=[ 5], 40.00th=[ 5], 50.00th=[ 5], 60.00th=[ 5],
| 70.00th=[ 6], 80.00th=[ 6], 90.00th=[ 7], 95.00th=[ 8],
| 99.00th=[ 22], 99.50th=[ 35], 99.90th=[ 87], 99.95th=[ 101],
| 99.99th=[ 144]
bw ( KiB/s): min= 6144, max=2981888, per=100.00%, avg=184084.36, stdev=65812.35, samples=10254
iops : min= 6, max= 2912, avg=179.74, stdev=64.27, samples=10254
lat (usec) : 250=0.10%, 500=0.02%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=1.37%, 10=95.23%, 20=2.08%, 50=0.88%
lat (msec) : 100=0.20%, 250=0.07%, 500=0.01%
cpu : usr=0.45%, sys=3.40%, ctx=931630, majf=0, minf=16
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,921600,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=180MiB/s (188MB/s), 180MiB/s-180MiB/s (188MB/s-188MB/s), io=900GiB (966GB), run=5127844-5127844msec
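The simultaneous read/write runs mentioned above can also be done in a single fio invocation using the mixed rw mode; a sketch (file name and size are placeholders, --rwmixread sets the read share):

```shell
# Sequential mixed workload: roughly 50% reads, 50% writes in one job
fio --name=mixed --rw=rw --rwmixread=50 --bs=1m --size=100g --filename=./mixtest
```

This reports separate READ and WRITE lines in the run status group, which makes it easy to compare against the standalone runs.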
From the TrueNAS GUI
In general, I think I learned a whole lot about USB, things I had no idea about before.
I used this loop to check drive temps:
for drive in sda sdb sdc sdd; do echo "=== /dev/$drive ==="; smartctl -A /dev/$drive | grep -i temp; done
It's hard to see Argo CD mentioned without GitOps coming up (though to be fair, that's the point of Argo).
GitOps is a way to manage your Kubernetes clusters where your desired state lives in Git, and tools like Argo CD continuously sync this state to your cluster.
Think of it like "infrastructure as code" but for Kubernetes resources.
Well for starters, given how often I kept rebuilding everything from scratch, being able to just point Argo CD at my repo and have it apply everything was 👌.
Anyway, why GitOps?
GitOps helps keep your desired state versioned and reviewable in Git, makes rebuilding a cluster repeatable, and leaves an audit trail of every change.
Before jumping into it, you need your cluster in a "usable" state, i.e. secrets handled with sops (decrypt and pipe to kubectl).
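The decrypt-and-pipe step is just that; a sketch, assuming sops is set up with an age key and using a hypothetical file name:

```shell
# Decrypt a sops-encrypted Secret manifest and apply it directly,
# so the plaintext never lands on disk
sops -d secrets/cluster-secrets.enc.yaml | kubectl apply -f -
```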
It's a tree structure (of sorts) where you have one root application that points to all your other applications.
When Argo syncs this root app, it creates and manages everything defined in your repo.
I ended up preferring Helm charts for this, though other methods exist.
Assuming you've already installed Argo CD:
First, grab the initial admin password:
k -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
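For context, Kubernetes stores Secret values base64-encoded, which is why the jsonpath output gets piped through base64 -d; the decoding step alone looks like this (with a made-up password):

```shell
# Secret data fields are base64-encoded; decoding reverses it
encoded=$(printf 'hunter2' | base64)
printf '%s' "$encoded" | base64 -d
# prints: hunter2
```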
Port-forward so you can authenticate via the CLI (you can also access the UI this way):
k -n argocd port-forward services/argocd-server 8080:80
Log in on the CLI:
argocd login localhost:8080
Change the default password:
argocd account update-password --new-password "<your password>"
Add your git repo:
argocd repo add [email protected]:mrdvince/<your repo>.git --ssh-private-key-path <your ssh key path>
You can create the root app either via CLI:
argocd app create apps \
--dest-namespace argocd \
--dest-server https://kubernetes.default.svc \
--repo [email protected]:mrdvince/<your repo>.git \
--path apps/argo_apps
Then sync it:
argocd app sync apps
Or apply a manifest with kubectl:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: [email protected]:mrdvince/<your repo>.git
    targetRevision: HEAD
    path: apps/argo_apps/
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - RespectIgnoreDifferences=true
      - ApplyOutOfSyncOnly=true
Note: path: apps/argo_apps/ is relative to the base of the repo.
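For reference, each child app in that directory is just another Application manifest; a hypothetical one for a Traefik deployment living at apps/traefik in the same repo might look like:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: traefik
  namespace: argocd
spec:
  project: default
  source:
    repoURL: [email protected]:mrdvince/<your repo>.git
    targetRevision: HEAD
    path: apps/traefik
  destination:
    server: https://kubernetes.default.svc
    namespace: traefik
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

The root app picks this up automatically once it lands in the repo.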
Once everything is set up, the workflow is simple: commit and push changes to the repo, then sit back and watch as Argo CD creates and manages all the applications defined there.
You should then be able to see a dashboard that looks like my screenshot below
After setting up and debugging various parts, I thought I'd share some basic tips that have helped me along the way.
Here's how to merge multiple kubeconfig files:
KUBECONFIG=~/.kube/config:~/.kube/config.cluster2 kubectl config view --flatten > ~/.kube/config.merged
cp ~/.kube/config ~/.kube/config.backup
mv ~/.kube/config.merged ~/.kube/config
You can then rename contexts for better clarity:
kubectl config rename-context default prism
kubectl config rename-context kubernetes-admin@kubernetes atlas
And set proper permissions on your kube config:
chmod 600 ~/.kube/config
If pods aren't scheduling on control plane nodes (I'm using 3 control plane nodes), check for taints:
kubectl get nodes -o json | jq '.items[].spec.taints'
To remove control-plane taints if needed:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
In general, most issues can be found and solved by following a pattern:
An example of a certificate issue:
Follow the chain of resources when debugging cert-manager:
kubectl get certificate -n argocd
kubectl -n argocd describe certificate argocd-certificate
kubectl -n argocd describe certificaterequests.cert-manager.io argocd-certificate-1
kubectl -n argocd describe order argocd-certificate-1-1494176820
kubectl -n cert-manager logs pods/cert-manager-<some-hash>
Other times, just deleting a resource and letting it get recreated solves the issue. For example, when switching from staging to production Let's Encrypt, you may need to delete the old secrets or orders so they get recreated:
e.g. kubectl -n argocd delete secrets argocd-tls
When services aren't reachable:
- Use dig or nslookup to verify DNS resolution
- Use tcpdump and netstat for network debugging:
# Check listening ports
netstat -tlpn
# Monitor ARP requests
tcpdump -i any -n arp
If setting up a new cluster with kubeadm (not on the cloud), use MetalLB or Cilium to hand out load balancer IP addresses.
If using Cilium, here's a sample configuration:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "lb-pool"
spec:
  blocks:
    - cidr: "192.168.30.140/30"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: cilium-l2-announce
spec:
  externalIPs: true
  loadBalancerIPs: true
  interfaces:
    - eth0
All services run through Traefik, so a few load balancer IPs are plenty.
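To confirm Cilium actually handed out an address from the pool, you can list LoadBalancer services; a sketch:

```shell
# EXTERNAL-IP should show an address from the 192.168.30.140/30 block
kubectl get svc -A | grep LoadBalancer
```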
To debug Argo CD applications, you can render out the chart:
helm template . -f values.yaml > rendered-app.yaml
And for helmfile:
helmfile template > rendered.yaml