It's Always DNS
I bought an Ikea shelf (Bekant) to house the stormlight NUCs and my Synology NAS. I shut down the entire cluster so that I could move the machines into the shelf. I built the shelf. I placed all the machines into the shelf and meticulously looped cables behind the shelf and plugged them into a powered-off surge protector. I hit the power switch on the surge protector and booted everything.
Since the NUCs were built with NVMe disks, they boot up instantly. Fatty was unfortunately behind.
I checked the kubernetes dashboard for the cluster and noticed that the docker registry and the nfs-client-provisioner services were down. They could not reach fatty (Synology NAS). Makes sense, fatty is old and have four spindle drives it needs to validate.
Once fatty was available, I restarted the pods in stormlight that were broken with no luck. The pods were down.
I looked into the error and realized they were failing to connect to https://fatty.stormlight.home
.
Mutha F'er. It's always F'n DNS.
This is what I get for making my router's primary DNS server fatty.
When fatty was offline, my router fell back to the secondary resolver. When the NUCs came online and started querying fatty.stormlight.home
, they cached the secondary resolver's response of NXDOMAIN
. And that's my problem.
The fix? Flush the DNS cache and restart.
I flushed dns (with ansible: ansible -i hosts all --become -a 'systemd-resolve --flush-caches'
) and then verified the NUCs were able to resolve fatty's DNS. Afterwards, I checked the kubernetes dashboard and everything came back online.
Anyways, here's what my corner looks like now:
