Machine and Container updates (Day 19)

Vince

05 Feb 2025 • 1 min read

Today was all about updates.

What started as just routine maintenance turned into a reminder of why we keep backups (and backups of backups).

The Update Plan

Update Proxmox nodes
Update VMs
Update OPNsense to version 25

Things Go Sideways

After the updates and reboots, OPNsense decided to forget about its VLANs and misconfigure WAN and LAN interfaces.

This cascaded into:

Everything losing connectivity
DNS becoming unreachable
General network chaos

Recovery Process

Direct connection to Proxmox node (thank goodness for out-of-band management)
Tried the built-in backup list - no luck
Remembered the lesson from the last time I had to do a reinstall at 1:20 am: keep config backups locally
Reset OPNsense, restored from local backup
Fixed an interface mismatch
Network starts coming back to life
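The "keep config backups locally" habit is easy to script. Here is a minimal Python sketch, assuming the firewall config has already been exported to a file; the function name `backup_config`, the `keep` parameter, and the paths in the usage note are all made up for illustration and are not part of OPNsense.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(source, destinations, keep=10):
    """Copy `source` into every directory in `destinations` under a timestamped name.

    Returns the list of files written. Only the newest `keep` copies are
    retained in each destination directory.
    """
    source = Path(source)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    written = []
    for dest in destinations:
        dest = Path(dest)
        dest.mkdir(parents=True, exist_ok=True)
        target = dest / f"{source.stem}-{stamp}{source.suffix}"
        shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
        written.append(target)
        # Timestamped names sort chronologically, so the oldest copies come first.
        copies = sorted(dest.glob(f"{source.stem}-*{source.suffix}"))
        for old in copies[:-keep]:
            old.unlink()
    return written
```

Something like `backup_config("config-export.xml", ["/mnt/usb/opnsense", "/home/me/backups/opnsense"])` would then leave a dated copy in two separate places (both paths are hypothetical), so one dead disk or one broken built-in backup list doesn't take the only copy with it.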

DNS

Everything seemed fixed until I noticed I still had no internet.

OPNsense looked good, but the DNS server was unreachable despite appearing online and healthy. So basically "everything's fine but nothing works."

After some troubleshooting, I replaced the VM's NIC and re-assigned it the same static IP in OPNsense. The node was then reachable and DNS was working again.

Most services recovered quickly once DNS and OPNsense were back, though TrueNAS took its time and couldn't update its catalogs, so I added Quad9 as a fallback DNS for next time (because there probably will be one).
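To illustrate what a fallback resolver buys you, here is a rough Python sketch that asks each DNS server in a list directly and returns the first one that actually answers. It uses only the standard library and hand-builds a minimal A-record query. Quad9's public address is 9.9.9.9; the internal address in the usage note and all function names are assumptions for the example.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID used to match the reply to our query


def build_query(name, query_id=QUERY_ID):
    """Build a minimal DNS query packet asking for the A record of `name`."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Question: encoded name, zero terminator, QTYPE=A (1), QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def server_answers(server, name, timeout=2.0):
    """Return True if `server` replies to the query with at least one answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return False
    if len(reply) < 12:
        return False
    reply_id, flags, _, answers, _, _ = struct.unpack("!HHHHHH", reply[:12])
    return reply_id == QUERY_ID and (flags & 0x000F) == 0 and answers > 0


def first_working_resolver(servers, name="example.com", probe=server_answers):
    """Return the first server in `servers` that answers, or None if none do."""
    for server in servers:
        if probe(server, name):
            return server
    return None
```

Calling `first_working_resolver(["192.168.1.53", "9.9.9.9"])` (the first address is a placeholder for the internal DNS server) would have returned `9.9.9.9` during the outage described here, provided the WAN side was up. This only probes servers; the actual fallback still has to be configured on each machine.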

So

Keep multiple backups in different locations (the built-in backups aren't always enough)
Added Quad9 to some nodes, like TrueNAS, as a fallback DNS for future resilience
When debugging network issues, don't trust what "looks fine" - verify connectivity layer by layer
Updates, while necessary, can turn out to be, well ...
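The "verify layer by layer" point can be sketched in a few lines of Python: run a list of checks from the bottom of the stack upward and stop at the first one that fails, so you know which layer to look at. This is only an illustration. The helper names are made up, the addresses in the comment are placeholders, and a TCP connect stands in for a ping because ICMP needs raw sockets.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def name_resolves(name):
    """Return True if the system resolver can resolve `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def run_layers(checks):
    """Run (label, check) pairs in order and stop at the first failure.

    Returns (results, failed_label); failed_label is None if everything passed.
    """
    results = []
    for label, check in checks:
        ok = check()
        results.append((label, ok))
        if not ok:
            return results, label
    return results, None


# Example wiring, with hypothetical addresses:
# checks = [
#     ("gateway web UI", lambda: tcp_reachable("192.168.1.1", 443)),
#     ("DNS server port", lambda: tcp_reachable("192.168.1.53", 53)),
#     ("name resolution", lambda: name_resolves("example.com")),
#     ("internet", lambda: tcp_reachable("example.com", 443)),
# ]
# results, failed = run_layers(checks)
```

In the situation above, the gateway check would have passed and the DNS server check would have failed, pointing at the DNS VM's network path instead of at OPNsense.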

And the Kubernetes clusters (both k3s and the HA one)? Everything just came back online like nothing had happened, without my needing to touch a single node (including the 2 HAProxy nodes).

At least now I have a fallback plan for DNS issues, and another validation of why Kubernetes is great for self-healing infrastructure.