Debugging and General Tips (Day 10)
Common debugging patterns and tips.
After setting up and debugging various parts of the cluster, I thought I'd share some basic tips that have helped me along the way.
Managing Multiple Clusters
Here's how to merge multiple kubeconfig files:
KUBECONFIG=~/.kube/config:~/.kube/config.cluster2 kubectl config view --flatten > ~/.kube/config.merged
cp ~/.kube/config ~/.kube/config.backup
mv ~/.kube/config.merged ~/.kube/config
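If you do this often, the same steps can be wrapped in a small shell function. This is a sketch, not a polished tool: `merge_kubeconfig` is a hypothetical helper name, and it assumes your main kubeconfig lives at `~/.kube/config`. It writes the merge to a temporary file first, so a failed `kubectl config view` never clobbers the working config, and it applies the `chmod 600` step below before moving the result into place.

```shell
# Hypothetical helper: merge extra kubeconfig files into ~/.kube/config.
# Usage: merge_kubeconfig ~/.kube/config.cluster2 [more files...]
# When two files define an entry with the same name, the file listed
# first in KUBECONFIG wins, so the existing config takes priority.
merge_kubeconfig() {
  if [ "$#" -eq 0 ]; then
    echo "usage: merge_kubeconfig FILE [FILE...]" >&2
    return 2
  fi
  local cfg="$HOME/.kube/config"
  local list="$cfg" f
  for f in "$@"; do
    list="$list:$f"
  done
  # Write the merged result to a temporary file first.
  if ! KUBECONFIG="$list" kubectl config view --flatten > "$cfg.merged"; then
    rm -f "$cfg.merged"
    return 1
  fi
  cp "$cfg" "$cfg.backup" || return 1   # keep the old config around
  chmod 600 "$cfg.merged"               # readable/writable by your user only
  mv "$cfg.merged" "$cfg"
}
```

Usage: `merge_kubeconfig ~/.kube/config.cluster2`, then rename the contexts as below.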
You can then rename contexts for better clarity:
kubectl config rename-context default prism
kubectl config rename-context kubernetes-admin@kubernetes atlas
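With both clusters in one file, it can be handy to run the same command against every context. A minimal sketch, assuming context names without whitespace (like prism and atlas above); `kall` is just a name made up for illustration. It relies on `kubectl config get-contexts -o name` and the global `--context` flag.

```shell
# Hypothetical helper: run one kubectl command against every context.
# Usage: kall get nodes
# Assumes context names contain no whitespace (the loop splits on it).
kall() {
  local ctx
  for ctx in $(kubectl config get-contexts -o name); do
    echo "=== $ctx ==="
    # Keep going if one cluster is unreachable, but report it on stderr.
    kubectl --context "$ctx" "$@" || echo "kall: command failed on $ctx" >&2
  done
}
```

For example, `kall get nodes -o wide` gives a quick look at both clusters at once.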
Then lock down the permissions on your kubeconfig so only your user can read or write it:
chmod 600 ~/.kube/config
Node Scheduling Issues
If pods aren't being scheduled on the control plane nodes (my cluster runs three of them), check for taints:
kubectl get nodes -o json | jq '.items[].spec.taints'
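The jq query above prints one taint list (or null) per node without saying which node it belongs to. A custom-columns query puts the node name next to each taint key and effect. A sketch; `show_taints` is a hypothetical helper name.

```shell
# Hypothetical helper: list every node with its taint keys and effects.
# Nodes without taints show <none> in the TAINTS and EFFECTS columns.
show_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect'
}
```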
To remove the control-plane taint from all nodes, if needed:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
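The --all flag strips the taint from every node. If you only want regular workloads on some control plane nodes, the same trailing-dash syntax works per node, and re-adding the taint is the same command without the dash. A sketch; the helper names are hypothetical, and it assumes the standard kubeadm taint node-role.kubernetes.io/control-plane:NoSchedule.

```shell
# Hypothetical helpers for the kubeadm control-plane taint on a single node.
CP_TAINT="node-role.kubernetes.io/control-plane:NoSchedule"

# Allow regular pods on NODE by removing the taint (note the trailing "-").
untaint_cp() {
  [ "$#" -eq 1 ] || { echo "usage: untaint_cp NODE" >&2; return 2; }
  kubectl taint nodes "$1" "${CP_TAINT}-"
}

# Put the taint back. NoSchedule keeps new pods without a matching
# toleration off NODE, but does not evict pods already running there.
retaint_cp() {
  [ "$#" -eq 1 ] || { echo "usage: retaint_cp NODE" >&2; return 2; }
  kubectl taint nodes "$1" "$CP_TAINT"
}
```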
Troubleshooting Tips
Most issues can be tracked down and solved by following the same pattern:
- Get the resource
- Describe it
- Follow the trail of related resources
- Check the related logs
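The first steps of that pattern are the same for almost any resource, so they can be bundled into one command. A sketch: `ktrace` is a hypothetical name, and the events query filters on involvedObject.name only, so it may also pick up events for other kinds that happen to share the name.

```shell
# Hypothetical helper: get, describe, and list events for one resource.
# Usage: ktrace NAMESPACE KIND NAME
ktrace() {
  if [ "$#" -ne 3 ]; then
    echo "usage: ktrace NAMESPACE KIND NAME" >&2
    return 2
  fi
  local ns=$1 kind=$2 name=$3
  kubectl -n "$ns" get "$kind" "$name" -o wide
  kubectl -n "$ns" describe "$kind" "$name"
  # Events for this object, oldest first, to see what happened and when.
  kubectl -n "$ns" get events \
    --field-selector "involvedObject.name=$name" \
    --sort-by=.lastTimestamp
}
```

For the certificate example below, `ktrace argocd certificate argocd-certificate` covers roughly the first two commands; the describe output then points to the next resource in the chain.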
Here's what that looks like for a certificate issue:
Certificate Issues
Follow the chain of resources when debugging cert-manager:
kubectl get certificate -n argocd
kubectl -n argocd describe certificate argocd-certificate
kubectl -n argocd describe certificaterequests.cert-manager.io argocd-certificate-1
kubectl -n argocd describe order argocd-certificate-1-1494176820
kubectl -n cert-manager logs pods/cert-manager-<some-hash>
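To see the whole chain at once, you can list every cert-manager resource type in the namespace with a single kubectl get, and read the controller logs through its Deployment so you don't need to look up the pod hash. A sketch, assuming cert-manager was installed with its default cert-manager Deployment in the cert-manager namespace; `cm_chain` is a hypothetical name. ACME Challenges are normally cleaned up once validation succeeds, so an empty Challenge list is not a problem by itself.

```shell
# Hypothetical helper: show the cert-manager resource chain in one namespace.
# Usage: cm_chain argocd
cm_chain() {
  if [ "$#" -ne 1 ]; then
    echo "usage: cm_chain NAMESPACE" >&2
    return 2
  fi
  # Certificate -> CertificateRequest -> Order -> Challenge (ACME issuers)
  kubectl -n "$1" get \
    certificates.cert-manager.io,certificaterequests.cert-manager.io,orders.acme.cert-manager.io,challenges.acme.cert-manager.io
  # kubectl picks one pod of the Deployment, so no pod hash is needed.
  kubectl -n cert-manager logs deploy/cert-manager --tail=50
}
```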
Other times, deleting a resource and letting it be recreated solves the issue. For example, when switching from the Let's Encrypt staging environment to production, you may need to delete the old secrets or orders so that cert-manager recreates them:
kubectl -n argocd delete secret argocd-tls
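After the secret is deleted, cert-manager should issue a new certificate and write it back under the same name. Instead of re-running kubectl get by hand, a small polling loop can wait for it. A sketch: `recreate_tls_secret` is a hypothetical helper, and the default of 60 tries at 5-second intervals (about five minutes) is an arbitrary choice, not a cert-manager setting. It waits for the secret itself rather than the Certificate's Ready condition, since that condition may still read True for a moment before cert-manager notices the deletion.

```shell
# Hypothetical helper: delete a TLS secret and poll until it is recreated.
# Usage: recreate_tls_secret NAMESPACE SECRET [MAX_TRIES] [INTERVAL_SECONDS]
recreate_tls_secret() {
  if [ "$#" -lt 2 ]; then
    echo "usage: recreate_tls_secret NAMESPACE SECRET [TRIES] [INTERVAL]" >&2
    return 2
  fi
  local ns=$1 secret=$2 tries=${3:-60} interval=${4:-5} i=0
  kubectl -n "$ns" delete secret "$secret" || return 1
  while [ "$i" -lt "$tries" ]; do
    # Success as soon as the secret exists again.
    if kubectl -n "$ns" get secret "$secret" >/dev/null 2>&1; then
      echo "secret $ns/$secret recreated"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $ns/$secret" >&2
  return 1
}
```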
Network Debugging
When services aren't reachable:
- Check firewall rules and network policies between VLANs
- Use dig or nslookup to verify DNS resolution
- Verify LoadBalancer IP assignments
- Use tcpdump and netstat for network debugging:
# Check listening ports
netstat -tlpn
# Monitor ARP requests
tcpdump -i any -n arp
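Two more checks fit here. Listing only LoadBalancer services makes a <pending> EXTERNAL-IP (no IP assigned yet) stand out, and running nslookup from a throwaway pod tests DNS from inside the cluster rather than from your workstation. A sketch: the helper names are hypothetical, busybox:1.36 is just one image that ships nslookup, and the default lookup assumes the standard cluster.local cluster domain. On newer distributions, ss -tlpn is the usual replacement for netstat -tlpn.

```shell
# Hypothetical helper: show only LoadBalancer services across all namespaces.
# Columns of `kubectl get svc -A`: NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP ...
# An EXTERNAL-IP of <pending> means no LoadBalancer IP has been assigned.
lb_services() {
  kubectl get svc -A | awk 'NR == 1 || $3 == "LoadBalancer"'
}

# Hypothetical helper: resolve a name from inside the cluster.
# Usage: cluster_dns_check [NAME]
cluster_dns_check() {
  local name=${1:-kubernetes.default.svc.cluster.local}
  # --rm -i attaches to the pod and deletes it once nslookup exits.
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.36 \
    -- nslookup "$name"
}
```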
LoadBalancer Configuration
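If you're setting up a new cluster with kubeadm outside a cloud provider, nothing assigns IPs to Services of type LoadBalancer by default (their EXTERNAL-IP stays <pending>), so use MetalLB or Cilium to hand out LoadBalancer IP addresses.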
If using Cilium, here's a sample configuration:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "lb-pool"
spec:
  blocks:
    - cidr: "192.168.30.140/30"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: cilium-l2-announce
spec:
  externalIPs: true
  loadBalancerIPs: true
  interfaces:
    - eth0
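After saving both manifests to a file and applying them, it's worth confirming that Cilium picked them up before debugging anything else. A sketch: cilium-lb.yaml is a hypothetical filename, and L2 announcements should also be switched on in Cilium's own configuration (the l2announcements.enabled Helm value), otherwise the policy won't announce anything.

```shell
# Hypothetical helper: apply the LB pool + L2 policy and show what Cilium sees.
# Usage: apply_cilium_lb [FILE]   (defaults to cilium-lb.yaml, a made-up name)
apply_cilium_lb() {
  local file=${1:-cilium-lb.yaml}
  kubectl apply -f "$file" || return 1
  # In recent Cilium versions the pool listing shows whether the pool
  # conflicts with another one and how many IPs are still available.
  kubectl get ciliumloadbalancerippools.cilium.io
  kubectl get ciliuml2announcementpolicies.cilium.io
}
```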
All my services are routed through Traefik, so a few LoadBalancer IPs are plenty.
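The pool size follows from the prefix length: a /30 leaves 32 - 30 = 2 host bits, so it holds 2^2 = 4 addresses, 192.168.30.140 through 192.168.30.143. Below is a small shell sketch for checking other ranges before putting them in a pool; `cidr_range` is a hypothetical helper and handles IPv4 only. Check the Cilium LB IPAM docs for whether the first and last address of a block are allocated in your version.

```shell
# Hypothetical helper: print first address, last address and size of an
# IPv4 CIDR block. Usage: cidr_range 192.168.30.140/30
cidr_range() {
  case $1 in
    */*) ;;
    *) echo "usage: cidr_range A.B.C.D/PREFIX" >&2; return 2 ;;
  esac
  local ip=${1%/*} prefix=${1#*/} a b c d n size first
  # Split the dotted quad into its four octets.
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  size=$(( 1 << (32 - prefix) ))   # 2^(host bits) addresses
  first=$(( n & ~(size - 1) ))     # clear the host bits
  printf '%s %s %s\n' "$(int_to_ip "$first")" \
    "$(int_to_ip $(( first + size - 1 )))" "$size"
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
  printf '%d.%d.%d.%d' $(( ($1 >> 24) & 255 )) $(( ($1 >> 16) & 255 )) \
    $(( ($1 >> 8) & 255 )) $(( $1 & 255 ))
}
```

Running `cidr_range 192.168.30.140/30` prints `192.168.30.140 192.168.30.143 4`.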
Helm and Argo CD Debugging
To debug Argo CD applications, render the chart locally:
helm template . -f values.yaml > rendered-app.yaml
And for helmfile:
helmfile template > rendered.yaml
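Rendering is most useful when you compare the output with what's actually running. kubectl diff accepts the rendered file and exits 0 when nothing differs, 1 when something does, and above 1 on errors. A sketch, assuming it's run from the chart directory with values.yaml next to it; `helm_live_diff` is a hypothetical name. Argo CD uses the Application name as the Helm release name unless one is set explicitly, so passing that same name here keeps the rendered resource names in line with what Argo CD produces.

```shell
# Hypothetical helper: render the chart in the current directory and diff it
# against the live cluster. Usage: helm_live_diff RELEASE NAMESPACE [VALUES]
# Returns kubectl diff's status: 0 = no changes, 1 = changes, >1 = error.
helm_live_diff() {
  if [ "$#" -lt 2 ]; then
    echo "usage: helm_live_diff RELEASE NAMESPACE [VALUES_FILE]" >&2
    return 2
  fi
  local release=$1 ns=$2 values=${3:-values.yaml} rc=0
  helm template "$release" . -n "$ns" -f "$values" > rendered-app.yaml || return 2
  kubectl diff -n "$ns" -f rendered-app.yaml || rc=$?
  return "$rc"
}
```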