When I first started the homelab, it was intended mainly for learning. However, as time has gone on, I’ve ended up using it more for home infrastructure. The result is an infrastructure that is fairly fragile and difficult to change, which defeats the purpose of using it as a learning tool. I’ve therefore decided that now is the time for a complete rebuild. This won’t be an overnight process, and I’ll probably need to retain the existing infrastructure and replace individual components one at a time, but I aim to go in with a plan this time rather than allowing things to develop organically.
To this end, I have identified some pain points with the existing setup and have a plan for how to improve them in the rebuild. Each step will be documented in future articles.
When I initially started, I used libvirt on Ubuntu as my hypervisor. While this worked pretty well at first, there are definitely a few pain points. There is no Windows management client, so a Linux PC is required for managing VMs. The networking is also a bit of a pain to set up compared to a dedicated hypervisor OS. As I’m running on a single host, I don’t want to deal with the additional complication of oVirt, so I plan to use Proxmox. This will allow me to use grouping to keep production and lab infrastructure separate. I can also use the built-in LXC support to create containers for workloads where a full VM isn’t necessary.
Even better, Proxmox is also supported by Ansible modules and Terraform providers, which will be useful further down the line when I implement configuration management.
As my ISP-provided IPv4 address is behind CGN, IPv6 has proved useful for allowing external access to internal services. However, running both IPv4 and IPv6 on the internal LAN has proved overly cumbersome, and I have usually ended up just sticking with IPv4. This time I plan to go IPv6-only for the LAN, using NAT64 to enable access to IPv4 services.
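To illustrate how NAT64 lets an IPv6-only client reach an IPv4 service, here is a minimal Python sketch of the address mapping, assuming the RFC 6052 well-known prefix `64:ff9b::/96`. The function names are hypothetical, and the actual gateway in the rebuild may well use a network-specific prefix instead.

```python
import ipaddress

# RFC 6052 well-known NAT64 prefix. Assumption: a real deployment may be
# configured with a different, network-specific prefix.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6, prefix=WELL_KNOWN_PREFIX):
    """Recover the embedded IPv4 address from a synthesized NAT64 address."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        raise ValueError(f"{addr} is not inside {prefix}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = synthesize_nat64("192.0.2.1")
    print(mapped)                # 64:ff9b::c000:201
    print(extract_ipv4(mapped))  # 192.0.2.1
```

In practice a DNS64 resolver usually performs this synthesis for hostnames that only have A records, so clients on the LAN never need to compute the mapped address themselves.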
Most of my existing resources have been configured manually. This lack of standardisation has resulted in a mix of different operating systems with different base configurations on each device. Going forward, all configuration will be set up using configuration management and stored in source control. This will make it easier to rebuild services, either for disaster recovery or just to make a copy for testing. My current plan is to provision all VMs using Terraform and then manage configuration through Ansible. As much as possible will be run on Docker or Kubernetes.
An advantage of using containers is that stateless workloads can be parallelised, making updates possible with no downtime. As I only have a single host, I can’t avoid downtime for host updates, but it should be possible to restart or update anything else without interrupting anything critical such as the firewall, DNS or file access.
I had previously run Zabbix, but due to the lack of standardisation, managing alert policies for all the different VMs posed an excessive maintenance burden, so it started to fall by the wayside. I will rebuild it using a standard set of basic alerts for disk space, updates, expiring SSL certificates, etc.
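As a rough illustration of what a standard set of basic alerts could look like, the Python sketch below applies one shared baseline to every host. This is not Zabbix configuration or its API; the `Baseline` class, the `evaluate` function and all threshold values are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Baseline:
    """One shared set of thresholds for every host (values are assumptions)."""
    min_free_disk_pct: float = 15.0
    max_pending_updates: int = 0
    min_cert_days: int = 21


def evaluate(free_disk_pct, pending_updates, cert_expiry, now, baseline=Baseline()):
    """Return the names of the baseline alerts that fire for one host."""
    alerts = []
    if free_disk_pct < baseline.min_free_disk_pct:
        alerts.append("disk_space")
    if pending_updates > baseline.max_pending_updates:
        alerts.append("updates")
    if cert_expiry - now < timedelta(days=baseline.min_cert_days):
        alerts.append("certificate")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # A host that is low on disk, has updates pending and a certificate
    # expiring in five days trips all three alerts.
    print(evaluate(10.0, 3, now + timedelta(days=5), now))
```

The point of the single baseline is that adding a new VM requires no per-host alert policy, which is what made the previous setup hard to maintain.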
Initially, I will back up all existing VMs and restore them as-is onto Proxmox. I will then begin developing the Kubernetes cluster and migrate all possible workloads onto it. This will require me to come up with a plan for persistent, highly available SQL and file storage. At this point I will also set up a new Zabbix server and figure out exactly what I want to monitor.
Once all internal workloads are migrated, I will replace the firewall and disable IPv4. Finally, I will do a full test of the infrastructure as code by setting up a clone of the network, which can then be used as the test network. Come back soon for future updates.