It's Always DNS

I bought an Ikea shelf (Bekant) to house the stormlight NUCs and my Synology NAS. I shut down the entire cluster so that I could move the machines into the shelf. I built the shelf. I placed all the machines into the shelf and meticulously looped cables behind the shelf and plugged them into a powered-off surge protector. I hit the power switch on the surge protector and booted everything.

Since the NUCs were built with NVMe disks, they booted up almost instantly. Fatty, unfortunately, lagged behind.

I checked the kubernetes dashboard for the cluster and noticed that the docker registry and the nfs-client-provisioner services were down. They could not reach fatty (my Synology NAS). Makes sense; fatty is old and has four spindle drives it needs to validate.

Once fatty was available, I restarted the broken pods in Stormlight with no luck. They stayed down.

I looked into the error and realized they were failing to connect to https://fatty.stormlight.home.

Mutha F'er. It's always F'n DNS.

This is what I get for making my router's primary DNS server fatty.

When fatty was offline, my router fell back to the secondary resolver. When the NUCs came online and started querying fatty.stormlight.home, they cached the secondary resolver's response of NXDOMAIN. And that's my problem.

The fix? Flush the DNS cache and restart.

I flushed DNS with ansible and then verified that the NUCs could resolve fatty's hostname. Afterwards, I checked the kubernetes dashboard, and everything came back online.
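The flush, plus a quick check afterwards, looked roughly like this (the verification command is just one way to confirm the records resolve again):

# flush systemd-resolved's cache on every node
ansible -i hosts all --become -a 'systemd-resolve --flush-caches'

# confirm fatty resolves again from each node
ansible -i hosts all -a 'systemd-resolve fatty.stormlight.home'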

Anyways, here's what my corner looks like now:

Stormlight: My Intel NUC Kubernetes Cluster

When I began building a kubernetes cluster in December 2019, I didn't have a great plan. I wanted to program with go, I wanted to learn kubernetes, and I definitely wanted Intel NUCs. As I researched technical decision after technical decision, I finally came to a list of desires for the cluster. First, I had to have a name. I named the cluster Stormlight.

Second, I crafted user stories I wanted for myself.

  1. I want to access my cluster services under the .stormlight.home domain so that I don't have to remember IP addresses and port numbers.
  2. I want a simple (to me) deployment system that doesn't require touching DNS, storage, or TLS certificate configuration for each service I deploy. If I do have to configure these components for a service, the settings should live within the kubernetes manifests.
  3. When I open cluster HTTP services in Chrome, I want a lock icon in the URL bar. I don't want to see the "Your connection is not private" warning and then click the "Proceed to ..." link. These annoy me.

With these out of the way, let's dive into the components that make up Stormlight.

Physical Hardware

Obviously, I'm using Intel NUCs. But, there are a few other devices on the network that help fulfill my needs. Here are the machines, their names, and some specs.

  • lightweaver
    • Kubernetes master
    • Intel NUC 8i3BEK M.2 SSD
    • 32GB RAM
    • 250GB SSD
  • skybreaker
    • Kubernetes worker
    • Intel NUC 8i3BEK M.2 SSD
    • 32GB RAM
    • 250GB SSD
  • windrunner
    • Kubernetes worker
    • Intel NUC 8i3BEK M.2 SSD
    • 64GB RAM (I got lucky here! Amazon shipped me 64GB instead of the originally purchased 32GB!)
    • 250GB SSD
  • fatty
    • Synology DS413j
    • 8TB storage
    • Some pathetic amount of CPU and RAM. This thing is old, slow, and still works. I can't really complain.
  • TRENDnet 8-Port Gigabit GREENnet Switch
  • Netgear Nighthawk Wifi Router

My home's network diagram looks like this.

Network Diagram

For all machines, I configure a static IP address on the wifi router. For the NUCs, I assign the IP address while the operating system is being installed. There's likely a simpler way to do this, but it worked, and I only had to do it three times.
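On Ubuntu 18.04, the result is a netplan file along these lines (the interface name and addresses here are illustrative, not my real values):

network:
  version: 2
  ethernets:
    eno1:                          # interface name varies by machine
      dhcp4: no
      addresses: [192.168.1.10/24]
      gateway4: 192.168.1.1
      nameservers:
        search: [stormlight.home]
        addresses: [192.168.1.5]   # fatty, the primary DNS server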

Software

  • Ubuntu 18.04 Server
  • DNS Server
    • Runs on fatty using Synology's DNS server package
    • Hosts the private zone stormlight.home
    • All machines have hostnames defined to make SSH easier
    • *.stormlight.home record points to lightweaver (more on this later)
  • NFS Server
  • Certificate Authority (self-hosted) for SSL certificate signing
    • Root CA for creating intermediate CAs
    • Intermediate CA for signing server certs
    • Both Root and Intermediate certs are installed on my laptop and all cluster machines
  • HAProxy
    • All traffic to the Stormlight is directed here (via *.stormlight.home DNS above)
    • Runs on lightweaver
    • Uses a wildcard cert for *.stormlight.home giving me a nice 🔒 icon in Chrome
    • Terminates SSL traffic
    • Proxies traffic to the local kubernetes ingress (see kubernetes configuration below)
  • Kubernetes v1.17
    • Installed with kubeadm
    • Uses a single host as the master (lightweaver)
    • Uses kubernetes self-signed certs (I built the CA after I set up kubernetes, so I didn't use my own CA at the time).

stormlight.home Domain

Kubernetes relies on load balancers, in the cloud or on-premises, to handle ingress traffic. For Stormlight, I could deploy services using NodePorts and access each service at <ip address of any NUC>:<NodePort>. But I find this inelegant. I want a domain name for Stormlight.

Therefore, I looked at using DNS. Initially, I wanted to use a public top-level domain. But this costs money, and I'm cheap. So I decided on the stormlight.home private domain. It's not entirely clear to me that the .home TLD is suitable for private use, but I'm comfortable dealing with this in the future.

stormlight.home is served by the DNS server running on fatty. My home router is configured to query fatty first before falling back to 1.1.1.1. Therefore, stormlight.home resolves whenever I'm connected to my home network.

stormlight.home has a handful of configured DNS records. Every machine in Stormlight has an entry. This keeps me from saving IP addresses in my SSH config to connect to my NUC machines. Aside from machine records, the DNS server has a wildcard record handling all other subdomains. This is how I send traffic to Stormlight.
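A couple of spot checks show the effect (the IP address here is an example, not my real one):

# a machine record resolves to that machine's address
dig +short lightweaver.stormlight.home    # -> 192.168.1.10

# any other subdomain hits the wildcard record and also resolves to lightweaver
dig +short mysvc.stormlight.home          # -> 192.168.1.10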

Ingress Traffic to Stormlight

HAProxy runs on lightweaver's port 443 and lets traffic into Stormlight for all *.stormlight.home subdomains. HAProxy terminates SSL (user stories #2 and #3) and forwards traffic, locally, to the nginx-ingress NodePort service running in kubernetes.
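The HAProxy config boils down to a frontend and a backend; here's a minimal sketch (the certificate path and NodePort number are assumptions, not my exact values):

frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/stormlight.home.pem   # wildcard cert for *.stormlight.home
    default_backend k8s-ingress

backend k8s-ingress
    # nginx-ingress NodePort on this same machine (lightweaver)
    server ingress 127.0.0.1:30080 check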

Tracing an HTTPS Request

Let's move our attention to a simplified HTTPS request for the fictional service mysvc. In reality, I removed the taint on the master so that the kubernetes scheduler places pods on all NUCs, including lightweaver; so theoretically, traffic could stay entirely on lightweaver if mysvc pods were running there. To keep things easier to comprehend, the diagram below simplifies kubernetes.
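For reference, untainting the master is a one-liner (lightweaver is my master node; this is the standard kubeadm taint key):

kubectl taint nodes lightweaver node-role.kubernetes.io/master-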

HTTPS Request Tracing
  1. A user requests https://mysvc.stormlight.home. This resolves to lightweaver's IP address because of the wildcard *.stormlight.home DNS record on my Synology NAS.
  2. The request routes to lightweaver's HAProxy.
  3. HAProxy terminates the SSL request.
  4. HAProxy forwards the HTTP request to the local kubernetes cluster's nginx ingress port.
  5. nginx ingress forwards to the mysvc kubernetes service.
  6. mysvc processes the request by forwarding to whatever deployment/replica/pods are running within the cluster.
  7. mysvc sends the response back to the nginx ingress.
  8. nginx ingress responds to HAProxy.
  9. HAProxy responds to the user.
  10. Hopefully, the user is happy. The user is me. I am happy.
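Since mysvc is fictional, a real service like kuard (configured below) works as a stand-in. With my CA certs installed on the laptop, a single request exercises the whole path:

# -v shows the TLS handshake with HAProxy and the proxied response from the pod
curl -v https://kuard.stormlight.home/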

Kubernetes

Let's move our attention to Kubernetes. Aside from nginx-ingress, there are a few other services.

Here is the complete list of services on Stormlight.

nginx-ingress

Stormlight uses nginx-ingress to route all HTTP traffic into the cluster. This really helps me with user story #2. I can configure the subdomain/path of a service running in Stormlight simply by creating an Ingress resource. I don't have to configure anything in fatty's DNS server or in HAProxy. I really dig this setup.

For example, here's a basic configuration for kuard (a handy debugging application) so that it uses https://kuard.stormlight.home as its domain.

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-kuard
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  # I can set this to whatever I want
  - host: kuard.stormlight.home
    http:
      paths:
      - path: /
        backend:
          serviceName: kuard
          servicePort: 80

nfs-client

A part of user story #2 deals with storage. While I could configure services to use local storage, I wanted to use fatty as well. Data on fatty has better durability (four 2TB drives) and is configured for cloud backups (I set that up a long time ago). I want to use local storage for performance, but anything important should be stored on fatty.

I dug into Kubernetes docs on storage options and ran around in circles. Should I use Volumes, Persistent Volumes, or Container Storage Interface plugins? After several days of reading, I landed on nfs-client from the external-storage github repo. To add to my initial confusion, external-storage states that the repository is deprecated and that I should use sig-storage-lib-external-provisioner instead. But the sig-storage-lib-external-provisioner page links back to external-storage for examples. Sigh. Luckily, nfs-client worked well and was easy to set up.

Here's how I configured nfs-client.

nfs-client creates a dynamic provisioner for fatty's NFS shares. With the provisioner, I expose two types of storage classes. The first one, called fatty-archives, archives the data when a PersistentVolumeClaim (PVC) is deleted. Therefore, I don't have to worry about losing data when I muck around with the cluster and accidentally delete PVCs.

The second storage class, called fatty, deletes the data when a PVC is removed. Honestly, I don't have much use for the fatty storage class yet, but it would let me scale pods across nodes while using the NFS mount for shared data.
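A minimal sketch of the two StorageClass objects looks like this (the provisioner name is the default from the nfs-client example manifests and may differ from mine):

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fatty-archives
provisioner: fuseim.pri/ifs        # the nfs-client provisioner
parameters:
  archiveOnDelete: "true"          # archive the data when the PVC is deleted
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fatty
provisioner: fuseim.pri/ifs
parameters:
  archiveOnDelete: "false"         # reclaim the data when the PVC is deleted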

Here's the configuration I use for my docker registry setup.

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: registry-data
  annotations:
    volume.beta.kubernetes.io/storage-class: "fatty-archives"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100G
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry
spec:
  selector:
    matchLabels:
      app: registry
  replicas: 1
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
        - name: registry
          image: registry
          imagePullPolicy: Always
          ports:
            - containerPort: 5000
          volumeMounts:
            - name: registry-data
              mountPath: /var/lib/registry
      volumes:
        - name: registry-data
          persistentVolumeClaim:
            claimName: registry-data

If you look carefully, you won't see fatty's hostname or NFS shares anywhere in this configuration. The NFS configuration is managed in a single place -- the nfs-client manifests. From a service's perspective, there's no dependency on NFS. The service depends only on a PVC (i.e., a storage request) and the volume configuration in the Deployment. In the future, if I decide to replace fatty with a new NAS (I'm due for an upgrade), the migration is a matter of configuring a new storage class, moving data from the old NAS to the new one, and updating the PVCs for my services. I don't have to hunt down every place where NFS information is configured in service manifests. Lovely!

Docker Registry

One of the biggest reasons I wanted durable network storage was to run a docker registry. Remember, I'm cheap. So paying for a registry was not something I wanted to do. I also didn't want to store the data on a single node. If I wanted to scale up the number of pods running the registry, I'd like to do so without worrying about where the data lives.

So Stormlight runs Docker's Registry at registry.stormlight.home. The images are stored on fatty through the nfs-client provisioner.
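Using it looks like any other registry; for example (the image name here is made up):

# build locally and push into the cluster registry
docker build -t registry.stormlight.home/myapp:latest .
docker push registry.stormlight.home/myapp:latest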

Master Component Backups

I worry about failures. I work with cloud providers at my day job, so failures are common and expected. At home, my computers fail far less than cloud instances (case in point: fatty). But, they will fail. And that makes me nervous. So, I made some contingency plans.

Kubernetes depends on etcd as its backing database. In a single-master configuration, there's one etcd instance. Failure of etcd renders the master services unusable. Services running on the cluster continue to run, but if a service pod dies, the master is unable to provision a replacement. Therefore, backing up etcd is useful.

I found this nice post on backing up the master. I tweaked the script so that it also backs up kubernetes' self-signed certs. This backup runs every hour and stores the data on fatty. The restoration process is scripted with ansible and is very similar to what's described in the linked blog post.
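The heart of the backup is an etcd snapshot plus a copy of the kubeadm-generated certs; here's a sketch of the idea (the backup paths are illustrative, and the real script follows the linked post):

# snapshot etcd using the client certs kubeadm generated for it
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

# also back up kubernetes' self-signed certs
tar czf /backup/kubernetes-pki.tar.gz /etc/kubernetes/pki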

This plan leaves me a little less nervous, but it doesn't consider a full lightweaver failure. With a full failure of lightweaver, the entire cluster becomes unusable. All *.stormlight.home subdomains will fail because the machine is offline.

Even writing this makes me nervous, but I think I'll deal with the failure in the future. I've built Stormlight using code. There are no manual steps. So, if I lose the cluster, I can rebuild by relying on my code. I'll open source my code in a future post.

That said, I might buy a few Raspberry Pis to build a multi-master cluster later. That could be another fun side project.

The Results: My Deploy Process

With all the above in place, I can begin programming my own services (about damn time). To deploy, all I need is a single manifest file with a Deployment, a Service, and an Ingress. Here's what kuard's manifest file looks like:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuard
spec:
  selector:
    matchLabels:
      app: kuard
  replicas: 1
  template:
    metadata:
      labels:
        app: kuard
    spec:
      containers:
      - image: gcr.io/kuar-demo/kuard-amd64:1
        imagePullPolicy: Always
        name: kuard
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: kuard
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  selector:
    app: kuard

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-kuard
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: kuard.stormlight.home
    http:
      paths:
      - path: /
        backend:
          serviceName: kuard
          servicePort: 80

With the above file, I run kubectl apply -f kuard.yml and then visit https://kuard.stormlight.home. That's my whole deploy process.

And if I need durable storage? I can use the fatty and fatty-archives storage classes, or hook into a NUC's local disk.

Onwards

That's Stormlight! I'll share my code in future posts. I need to spend some time cleaning up the code to make it a little easier to use first.

The Side Project: Keeping The Maker In Me Alive

I started building a homelab using Intel NUCs in December, but I never wrote down why I wanted to do this. Since my mind runs rampant with thoughts, I figure I should write this down before I lose it.

For the past two years, my work activities have led me farther away from coding. It was a conscious decision. I became a tech lead for a year. After that, I formed a new team as the engineering manager. I'll likely stick to this role for a while because it's challenging to solve problems amongst humans. I finally understand the saying that all problems are human problems. I definitely don't know how to solve them, though. That said, I sorely miss being in a flow state and building projects.

So, in the middle of last year, I thought about what I could work on.

Firstly, the project should keep my coding skills sharp. I immediately thought of go and how I wanted to learn it. Like really learn it. I built a few services with the language, but honestly, I'm still looking up packages and syntax rules.

Secondly, I want to learn kubernetes. I helped build the original kubernetes cluster at work (v1.3, yeesh!) but moved on to other projects that kept me at arm's length. I never learned the details well enough. I want to know kubernetes like I know the back of my hand.

Finally, right around the time I thought about these ideas, I fell in love with Intel NUCs. They're mini-computers (4" x 4"), and I love the look. I'm a big sucker for dope-ass-looking devices. They're also decently priced, and I had closet space to spare. So I purchased three of them back in December 2019.

My goals are:

  • set up a local kubernetes cluster so that I can launch go projects
  • write posts about the process
  • open source code for the system (first one was the stormlight-iso)

I've already built quite a bit of this system, so in the next post, I'll present what I've built.

Oh, and the name of the cluster is stormlight because I love the Stormlight Archive books from Brandon Sanderson.

Introducing stormlight-iso

I decided in December that I want to start coding again. I've been in engineering leadership roles for the past two and a half years, which has kept me away from coding. I miss being in a flow state, working on low-level technical problems for hours on end. One of the areas I'm missing out on is mastering kubernetes. I use kubernetes at work, but I'm removed enough from it that it's difficult to grok the details.

Therefore, last winter, I bought three Intel NUCs (8i3BEK1) to be the basis of my kubernetes homelab. After physically building each machine, connecting them to my network, and manually installing Ubuntu 18.04 three times, I decided that I didn't want to do this again. I am, to my wife's annoyance, forgetful. I have difficulty remembering to buy milk when going to the grocery store with the sole intention of buying milk. Imagine what the three NUCs looked like after I installed Ubuntu three times. Not very similar.

Luckily, Ubuntu has an installation method called preseeding that supplies pre-configured answers to the installer's dialog prompts. Essentially, this allowed me to remaster the installation ISO so that I did not have to enter responses manually. After following the instructions from the wiki, I created an ISO that installed Ubuntu Server from start to finish without any keyboard prompts. With the ISO, I installed Ubuntu identically on my three NUCs and went about my business installing Kubernetes.
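To give a flavor of preseeding, here are a few lines of the kind that end up in the preseed file (the values are examples, not my exact config):

# answer the installer's questions ahead of time
d-i debian-installer/locale string en_US.UTF-8
d-i keyboard-configuration/layoutcode string us
d-i passwd/username string stormlight
d-i partman-auto/method string regular
d-i pkgsel/include string openssh-server
# power off instead of rebooting when the install finishes
d-i debian-installer/exit/poweroff boolean true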

This development took several weeks because I became a father at the same time. And apparently, newborns need to feed every few hours. Though, I admit that's a cover-up for the real reason it took so long: I didn't know how to do this. I had never dealt with the debian installer (what Ubuntu uses for installation), manipulating initrd, or configuring VirtualBox images to mimic Intel NUCs for development. And then, to top it all off, I still had to deal with the differences between Linux and macOS tools.

Nevertheless, I codified my work into the stormlight-iso project on GitHub (stormlight is the name of my kubernetes cluster). Now I can forget the entire process without guilt. And if you'd like, you can forget how to do it too!

With that, I'll leave you at the beginning of the README.

stormlight-iso

This project builds an Ubuntu 18.04 ISO to install Ubuntu unattended (no keyboard interaction) on Intel NUC 8 Core i3 machines.

This project assumes:

  • Installation of Ubuntu via USB stick
  • ISO built on a Mac OSX machine
  • Intel NUC has a static IP assigned to it to SSH to the machine (or some way for you to find the IP of your machine after Ubuntu has been installed and booted)
  • A USB stick with at least 100MB of space

The project is designed to minimize the amount of physical effort to set up an Intel NUC because the author is lazy and forgetful. Also, the author has several Intel NUCs, and manually entering configuration values is error-prone. Here's what the installation process looks like.

  1. Build the stormlight.iso with preseed config and an ssh public key
  2. Create a bootable USB from the stormlight.iso
  3. Walk over to the Intel NUC, plug in USB stick, and power on the machine
  4. Wait until the machine powers itself down after the installation (roughly 10-15 mins). "Look ma, no keyboard!"
  5. Unplug USB stick and power on the machine.
  6. Walk back to your computer and SSH into the machine.

That's it!
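For step 2, creating the bootable USB on a Mac looks something like this (disk2 is an example device identifier -- double-check with diskutil list before writing anything):

diskutil list                                     # find the USB stick's device identifier
diskutil unmountDisk /dev/disk2
sudo dd if=stormlight.iso of=/dev/rdisk2 bs=1m
diskutil eject /dev/disk2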

Simulate Intel NUCs with VirtualBox

Since yesterday's update, I've made the repo! It has a minimal (read: shitty) README, but the code is all there. Just git clone the repo on a Linux machine and run make.

I need to add a simple way to check all dependencies. Maybe a command like make check-deps would work.

I also need to add a user-friendly way of letting people know that they should add a config/authorized_hosts file.

Simulating Intel NUCs with VirtualBox

I was thinking about writing some tests for this new repo. I'd like to run my testing on my Mac before making physical moves. Here are some commands I used to help manage the stormlight VirtualBox VM.

# Turn on and off the VM
VBoxManage startvm stormlight
VBoxManage controlvm stormlight poweroff

# Attach stormlight.iso as a DVD
VBoxManage storageattach stormlight --storagectl IDE --port 1 --device 0 --type dvddrive --medium ~/Downloads/stormlight.iso

# Setting medium to "none" removes the drive. 
VBoxManage storageattach stormlight --storagectl IDE --port 1 --device 0 --medium "none"

Creating a VM to Simulate an Intel NUC with VirtualBox

# making a virtualbox for stormlight
alias vb=VBoxManage

# create and register the VM with EFI firmware (like the NUC) and a NAT adapter
mkdir '/Users/pbui/VirtualBox VMs/test/'
vb createvm --name test --register --ostype "Ubuntu_64"
vb modifyvm test --firmware efi --rtcuseutc on --ioapic on --memory 1024 --vram 128 --nic1 nat

# create a 10GB disk image for the VM
vb createmedium disk --filename '/Users/pbui/VirtualBox VMs/test/test.vdi' --size 10240 --format VDI --variant Standard

# add an NVMe controller (to mimic the NUC's M.2 SSD) and an IDE controller for the installer ISO
vb storagectl test --name nvme --add pcie --controller NVMe
vb storagectl test --name ide --add ide --controller PIIX4

# attach the disk to the NVMe controller, marked as non-rotational (SSD)
vb storageattach test --storagectl nvme --port 0 --device 0 --type hdd --medium '/Users/pbui/VirtualBox VMs/test/test.vdi' --nonrotational on

# attach the stormlight ISO as a DVD on the IDE controller
vb storageattach test --storagectl ide --port 1 --device 0 --type dvddrive --medium ~/Downloads/stormlight.iso

# boot the VM and let the unattended install run
vb startvm test