Migrate from Public Cloud: Building Kubernetes Bare-Metal Infrastructure

Introduction

With rising cloud prices and growing demand for infrastructure control, more enterprises are exploring on-premises options. For us, the answer was Kubernetes on bare-metal servers, a difficult but rewarding journey. This post walks through the entire process, including the hurdles we faced and how specific tools contributed to our success.

If you want to move workloads off the cloud while keeping flexibility and scalability, this extensive technical guide is for you.

Why Kubernetes on Bare Metal?

Kubernetes is frequently promoted as the go-to container orchestration solution. In our case, however, it did more than manage containers: it acted as a fleet management system, handling server-level duties that were previously the domain of separate tools.

This strategy enabled us to:

  1. Reduce reliance on cloud services, cutting infrastructure costs by more than 70%.
  2. Manage physical hardware and workloads centrally.
  3. Create an infrastructure that is optimized for maximum control and performance.

Despite the obvious benefits, running Kubernetes on bare metal is complicated. Typical problems include server provisioning, distributed storage management, and ensuring reliable networking. Here's how we overcame these hurdles.

Core Tools and Technologies

We used a number of sophisticated tools to streamline Kubernetes deployment and operation on bare metal. The following are the fundamental technologies that enabled this.

Talos Linux

Sidero Labs' Talos Linux is a minimal Linux distribution built specifically for running Kubernetes. Unlike general-purpose operating systems, Talos:

  1. Is lightweight (less than 50MB).
  2. Operates purely via an API, removing the need for a shell.
  3. Reduces the attack surface, considerably increasing security.

Here's how Talos operates in practice:

  1. Installation: You define your cluster setup in a declarative YAML file. Using talosctl, you provision the OS on your bare-metal nodes, turning them into Kubernetes-ready machines.
  2. Management: All interaction with Talos happens through its API via talosctl. There is no SSH access, which further hardens the nodes.

We assigned a junior DevOps engineer to handle the configuration. They built a fully functional Kubernetes cluster on our rack-colocated servers with minimal friction, which speaks to Talos' approachability despite its unconventional architecture.
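As a sketch of what that declarative setup looks like, here is an abbreviated Talos machine configuration; the cluster name, endpoint IP, and disk path are hypothetical placeholders, not our actual values:

```yaml
# Abbreviated Talos machine configuration, as generated by
# `talosctl gen config <cluster-name> <endpoint>`.
# Cluster name, endpoint, and disk path are placeholders.
machine:
  type: controlplane          # or "worker" for worker nodes
  install:
    disk: /dev/sda            # the node's single boot disk
cluster:
  clusterName: demo-cluster
  controlPlane:
    endpoint: https://10.0.0.10:6443
```

Applying this with `talosctl apply-config --nodes <node-ip> --file controlplane.yaml` provisions the node; there is no shell to log into afterwards.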

MetalLB for Load Balancing

Load balancing on bare-metal Kubernetes clusters is notoriously difficult because you don't get the managed load balancers that cloud providers offer. We used MetalLB to address this.

How MetalLB Works:

  1. Layer 2 Configuration: MetalLB runs in Layer 2 mode, answering ARP requests for service IPs with the MAC address of a cluster node so that traffic reaches the cluster. This let us make effective use of the public IP addresses assigned to our servers.
  2. Integration with Kubernetes: We deployed MetalLB into the cluster and specified IP address pools in a ConfigMap (newer MetalLB releases define pools via CRDs instead). MetalLB then assigned addresses from those pools to Services of type LoadBalancer.

Implementation Steps:

  1. Defined the range of public IP addresses available in our environment.
  2. Installed MetalLB via kubectl using the official YAML manifests.
  3. Configured services (e.g., Nginx and HAProxy) with LoadBalancer type, enabling MetalLB to automatically assign public IPs.
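A minimal sketch of this setup, assuming MetalLB's legacy ConfigMap format (releases since v0.13 use the `IPAddressPool` CRD instead) and a placeholder address range:

```yaml
# MetalLB Layer 2 address pool (legacy ConfigMap format; the
# address range below is a placeholder for our public IPs).
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: public-pool
      protocol: layer2
      addresses:
      - 203.0.113.10-203.0.113.20
---
# A Service of type LoadBalancer; MetalLB assigns it an IP
# from the pool above.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
```

Once applied, `kubectl get svc nginx` shows the assigned address in the EXTERNAL-IP column.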

As a result, our cluster's load balancing was easy, scalable, and cost-efficient.

Rancher Longhorn for Distributed Storage

Storage posed one of our most difficult challenges. Many distributed storage solutions, such as Ceph or OpenEBS, expect each storage node to have at least two disks: one for the operating system and another dedicated to storage. Our nodes had only a single boot disk each.

Enter Rancher Longhorn, a lightweight but robust distributed storage system:

  1. Longhorn let us make effective use of single-disk nodes.
  2. It handles cross-node replication automatically, ensuring data redundancy and availability.
  3. Even in our constrained hardware environment, configuration was uncomplicated.

Implementation Steps:

  1. Deployed Longhorn via Helm charts into the Kubernetes cluster.
  2. Configured the storage classes to use Longhorn’s dynamic provisioning.
  3. Tested replication and failover scenarios to ensure data integrity.
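As an illustration, a StorageClass backed by Longhorn's dynamic provisioner might look like this; the replica count and timeout are illustrative choices, not our exact settings:

```yaml
# StorageClass using Longhorn's CSI provisioner. With two
# replicas, every volume survives the loss of a single node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"       # copies kept on distinct nodes
  staleReplicaTimeout: "30"   # minutes before a dead replica is rebuilt
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Any PersistentVolumeClaim referencing `storageClassName: longhorn-replicated` then gets a replicated volume carved out of the nodes' single disks.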

Longhorn's ability to adapt to single-disk nodes was a game changer, since it allowed us to meet our storage needs without replacing hardware.

The Final Architecture

The final architecture looked like this:

  1. Bare-Metal Nodes: These nodes were provisioned with Talos Linux and served as the foundation of our Kubernetes cluster.
  2. Networking: MetalLB handled the public IP address pool and ensured that services were load balanced seamlessly.
  3. Storage: Rancher Longhorn provided a distributed storage solution tailored to our hardware limits.

To manage the cluster, we used Kubernetes' native capabilities and Talos Linux's API-driven approach. These technologies enabled us to manage provisioning, configuration, and maintenance without adding complexity or external dependencies.

Cost Savings and Resilience

One of the most significant benefits of this project was the cost reduction. By moving nearly all workloads from the cloud to our on-premises Kubernetes cluster, we cut infrastructure costs by over 70%. We did not abandon the cloud entirely, however; we kept cloud storage for:

  1. Archival backups, protecting against catastrophic cluster failures.
  2. Disaster recovery, as a secondary layer of data protection.

This hybrid strategy let us strike a balance between cost-efficiency and resilience. In the event of a complete failure, Infrastructure as Code (IaC) tooling combined with cloud backups would let us reconstruct the entire infrastructure in a matter of hours.

Future Plans

While the existing configuration is sturdy, we want to make significant improvements:

  1. Disaster Recovery Center (DRC): A facility dedicated to high availability and data security.
  2. Hardware Security Modules (HSMs): Dedicated hardware to protect critical cryptographic keys.
  3. Centralized NAS Storage: Better data sharing and accessibility across teams.

These changes will strengthen the infrastructure, ensuring data security, scalability, and long-term viability.

Key Takeaways

Building an on-premises Kubernetes cluster on bare metal is a daunting task, but with the appropriate tools and tactics, it is possible. Here are a few lessons learned:

  1. Utilize Specialized Tools: Solutions such as Talos Linux, MetalLB, and Rancher Longhorn make complex tasks easier.
  2. Prepare for Failures: Hybrid setups that combine on-premises clusters with cloud backups improve resilience.
  3. Automate Everything: Infrastructure as Code is critical for managing complicated systems.

Bare-metal Kubernetes is a feasible alternative for enterprises wishing to decrease expenses while gaining control over their infrastructure. While the path is difficult, the rewards—both financial and technical—are well worth it.

This experience has not only optimized our infrastructure but also paved the way for future solutions. We plan to present our findings at technical conferences and hope to inspire others to pursue similar paths. Stay tuned as we continue to fine-tune this system and push the limits of what is possible with Kubernetes on bare metal.

