My video streaming setup part 2: Running on Kubernetes

Created Mon, 18 Oct 2021 22:53:57 +0200 Modified Sun, 08 May 2022 21:11:35 +0000

In a previous blog post I showed how my video streaming setup works, but that did not tell you exactly how it runs on my home lab. I am using Kubernetes to orchestrate my containers, and in this blog post I will describe how I implemented the setup I described earlier.

In order for all that to work on Kubernetes we have a few challenges to overcome, mainly persistence and exposing the services to the outside world.

Side note: In general, if you need help running Kubernetes for your home lab, the k8s-at-home Discord is a great place to seek advice.

Deployment

To deploy those services I'm using the Helm charts from the great k8s-at-home project: https://github.com/k8s-at-home/charts. A Helm chart is basically comparable to a package you could install on your favorite Linux distribution, but it applies to Kubernetes. For now, I am deploying those components with helmfile like this. However, I will very soon be moving them to a new way of deploying which I find better: using Flux like this.
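For illustration, here is a minimal helmfile sketch of what deploying one of those charts can look like. The release name, namespace and values file are assumptions for the example, not my exact configuration.

```yaml
# helmfile.yaml - minimal sketch, assuming the k8s-at-home chart repository
# and a hypothetical Sonarr release; adapt names and values to your cluster.
repositories:
  - name: k8s-at-home
    url: https://k8s-at-home.com/charts/

releases:
  - name: sonarr
    namespace: media              # hypothetical namespace
    chart: k8s-at-home/sonarr
    values:
      - ./values/sonarr.yaml      # hypothetical values file
```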

Making services accessible to the outside world

I’m exposing the UIs of all those tools using Traefik as the ingress controller. I’d also like to try out the new and shiny Kubernetes Gateway API with Traefik like this. I do not need it, but I’d like to play a little bit with it.
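As a rough example, an Ingress handled by Traefik can look like the sketch below; the hostname, namespace, service name and port are placeholders, not my real configuration.

```yaml
# Minimal Ingress sketch routed through Traefik.
# Hostname, namespace, service name and port are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sonarr
  namespace: media
spec:
  ingressClassName: traefik
  rules:
    - host: sonarr.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sonarr
                port:
                  number: 8989    # Sonarr's default HTTP port
```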

Persistence

Since we have multiple nodes in our Kubernetes cluster, the services could run on any of them. Thus, we need the configuration of all the services to be persisted somewhere; otherwise, whenever a service restarted we would lose its configuration. We could store the data locally on a node and expose it to the pod running the service using a local PV, but that would mean that if the node on which the data is stored were unavailable, I would be unable to run the service. I need a better solution.
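To make that rejected option concrete, a local PV is pinned to one node roughly like this (names and paths are made up for the example):

```yaml
# Sketch of a local PersistentVolume: the nodeAffinity pins the data to a
# single node, which is exactly why this approach is fragile.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sonarr-config-local       # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/data/sonarr        # hypothetical path on the node's disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1          # the volume is only usable on this node
```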

What I needed was a distributed storage solution. I tried a few of them; this is how it went:

The ugly way

I first tried the NFS provisioner. I have a NAS at home that can expose a directory over NFS, so I installed nfs-subdir-external-provisioner using Helm. I knew that a lot of the services I am running use SQL databases internally, and I also knew those do not play well with NFS, but I decided to try it anyway because this solution was so simple to implement; if it failed it was not a big deal. Unsurprisingly this method, while being simple, did not work because of the incompatibility between NFS and SQL databases. However, in the process I discovered quite a few bugs, which pushed me to implement proper healthchecks for Sonarr, Radarr and Bazarr like this.
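For reference, installing the provisioner with helmfile can look roughly like this; the NAS address, export path and StorageClass name are assumptions:

```yaml
# Hedged sketch: nfs-subdir-external-provisioner pointed at a NAS export.
repositories:
  - name: nfs-subdir-external-provisioner
    url: https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/

releases:
  - name: nfs-subdir-external-provisioner
    namespace: kube-system
    chart: nfs-subdir-external-provisioner/nfs-subdir-external-provisioner
    values:
      - nfs:
          server: 192.168.1.10    # NAS address (hypothetical)
          path: /volume1/k8s      # exported directory (hypothetical)
        storageClass:
          name: nfs-client        # StorageClass the PVCs will reference
```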

The less ugly way

I decided to stop using the nfs-subdir-external-provisioner because Sonarr and Radarr would sometimes randomly crash because of NFS. I looked for different solutions, and one that seemed easy to implement was Longhorn. It tries to solve the distributed storage problem while keeping things as simple as possible for the operator. It basically stores multiple replicas of the data on a few nodes in the cluster and exposes them to the node running the pod that needs to access the data. Longhorn regularly checks for missing replicas and creates new ones on other nodes if necessary. I again used Helm to deploy Longhorn.
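The installation itself is only a few lines of helmfile; the replica count and default-class setting below are assumptions for the example:

```yaml
# Hedged sketch of a Longhorn install via helmfile.
repositories:
  - name: longhorn
    url: https://charts.longhorn.io

releases:
  - name: longhorn
    namespace: longhorn-system
    chart: longhorn/longhorn
    values:
      - defaultSettings:
          defaultReplicaCount: 2  # replicas kept per volume (assumption)
        persistence:
          defaultClass: true      # make "longhorn" the default StorageClass
```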

Moving from one StorageClass to another

Now that I had a new and shiny way of creating PVs, I needed to transfer the data from the old PVs to PVs provisioned from the new StorageClass. One way of cloning a PVC is to create the new PVC and specify the old one as its dataSource, like this. But it is not possible to change the StorageClass that way: it has to stay the same, so it is useless to us. Also, it requires the new PVC to be in the same namespace as the old one.
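For context, this is roughly what cloning through a dataSource looks like; the PVC names are hypothetical, and note that the StorageClass has to match the source, which is the deal-breaker here:

```yaml
# Sketch of PVC cloning via dataSource. Same namespace and same StorageClass
# as the source PVC are required, which is why it did not help in my case.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sonarr-config-clone       # hypothetical clone name
  namespace: media                # must be the source PVC's namespace
spec:
  storageClassName: nfs-client    # must match the source PVC's StorageClass
  dataSource:
    kind: PersistentVolumeClaim
    name: sonarr-config           # existing PVC being cloned (hypothetical)
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```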

One solution is to create a new PVC with the new StorageClass, stop all pods that mount the old PVC, spin up a temporary container that mounts both the old PV and the new PV, and use that container to copy the old data to the new PV. This works well, but the creation of the new PVCs is manual. I did not have a lot of PVs to transfer and downtime was not an issue, so I could have done things this way, but I wanted to try a better approach. Also, I wanted to change the namespace name, so I had to do something a bit more clever.
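A throwaway pod doing that copy could look like the sketch below (image, PVC names and paths are placeholders):

```yaml
# Hedged sketch of the manual approach: mount the old and new PVCs in a
# temporary pod, copy the data across, then delete the pod.
apiVersion: v1
kind: Pod
metadata:
  name: pv-copy
  namespace: media
spec:
  restartPolicy: Never
  containers:
    - name: copy
      image: alpine:3.15
      command: ["sh", "-c", "cp -a /old/. /new/"]
      volumeMounts:
        - name: old-data
          mountPath: /old
        - name: new-data
          mountPath: /new
  volumes:
    - name: old-data
      persistentVolumeClaim:
        claimName: sonarr-config           # old NFS-backed PVC (hypothetical)
    - name: new-data
      persistentVolumeClaim:
        claimName: sonarr-config-longhorn  # new Longhorn PVC (hypothetical)
```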

I settled on using Velero with the new and shiny Restic integration. Velero is a backup solution for Kubernetes that backs up Kubernetes objects. Velero now has an integration with Restic to back up the data stored in the PVs as well. And it has a super neat feature to restore the backed-up PVs using a different StorageClass. This made it easy to transfer the data from the PVs of my old NFS StorageClass to the new Longhorn StorageClass, and as a bonus I have a backup of the data.
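The StorageClass remapping is configured through a labeled ConfigMap in Velero's namespace, roughly like the sketch below; the class names here are assumptions matching my setup:

```yaml
# Hedged sketch of Velero's change-storage-class restore configuration:
# PVs/PVCs restored from the backup get "longhorn" instead of "nfs-client".
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  nfs-client: longhorn            # old StorageClass -> new StorageClass
```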

Velero can store the backup in an S3 bucket. This is great since I am running MinIO on my NAS to expose an S3 interface.
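Pointing Velero at that bucket looks roughly like this; the bucket name, endpoint and region are assumptions:

```yaml
# Hedged sketch of a Velero BackupStorageLocation backed by MinIO.
# MinIO speaks the S3 API, so the AWS object-store plugin is used.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero-backups        # hypothetical bucket on the NAS
  config:
    region: minio                 # placeholder region for MinIO
    s3ForcePathStyle: "true"
    s3Url: http://nas.local:9000  # MinIO endpoint (hypothetical)
```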

The ultimate solution

Longhorn has quite a few caveats. The most annoying is that Longhorn will sometimes become stuck because it “forgot” to delete the VolumeAttachment object that attaches a volume to a node. When this happens I have to manually delete the VolumeAttachment object to detach the volume and make it available to attach to a new node.

I am thinking of running Ceph to replace Longhorn; however, it is probably overkill, although it would be interesting from a learning perspective.

What’s next?

I now have a working setup that can automatically download content and stream it anywhere I want, just like Netflix! But there is a little issue. The machines in my home lab are not very powerful, and they sometimes struggle a bit when transcoding high quality content. Fortunately there is a way to remedy that! So far only software decoding has been used (meaning only the CPU decodes the video). This is inefficient, and it can be greatly improved by using hardware decoding (using the GPU).

To find out how I configured this, you can look at my next blog post.