Scalability: What is Kubernetes trying to achieve exactly?

Understanding the abstraction behind nodes and pods

8 min read · Sep 22, 2022

I’m sure you’ve heard about it, but if you’ve never used it, or you’re only starting out, Kubernetes can be extremely robust and confusing. Of course, everything is in the eye of the beholder, but that’s how I felt when I first started, and I’m sure I’m not alone in this.

In my recent work, I’ve been focusing a lot on scalability. I’ve been trying to create a pixel-streaming solution, which is a fancy term for streaming 3D applications from the cloud. The objective is simple to describe: run Unreal simulations on very powerful computers hosted in the cloud and stream them in real time to people’s personal devices, thus removing the need to own expensive hardware. While implementations can vary, the core is the same: every person who tries to get into an experience should be provided with a dedicated instance.

Source: Unreal Engine 5 Pixel Streaming Overview

In this article, I’ll be walking through the architectural solution for the use case in question. Besides a few remarks, I will try to avoid using any Kubernetes terms, and you will NOT become a Kubernetes expert by the time you finish reading this article. My goal is to provide readers with the right context to approach APIs like Kubernetes much quicker, and potentially become experts.

With that in mind, let’s get started!

Visualizing the cloud

I always thought of Kubernetes as a client tool. While this is somewhat true, because it has a client component, I was actually looking at it the wrong way around. Kubernetes was built as a framework for managing your resources, typically as a back-end service provider. Let me explain.

Imagine you have a farm of computers, potentially thousands of them. The computers vary in capabilities and hardware — some have more cores than others, and some have more memory; but in the grand scheme, you have a ton of resources. Your goal is to rent these resources to different tenants — you want people to pay you for compute power, kind of like real estate where you rent empty spaces so people can do as they wish. But what if you only own 4 bedroom apartments, and most tenants are interested in a single bedroom? Of course, you will rent a single bedroom to each tenant, and let them share an apartment. That’s what Kubernetes does, only with computers.
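To make the real estate analogy a bit more concrete, here is a minimal sketch of how tenants could be placed on the first computer that still has room. The names (`Computer`, `place`, `big-box`) are made up for illustration, and this is a simple first-fit strategy, not how the actual Kubernetes scheduler makes its decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Computer:
    """One machine in the farm, tracked by its remaining free resources."""
    name: str
    free_cores: int
    free_memory_gb: int
    tenants: list = field(default_factory=list)


def place(tenant: str, cores: int, memory_gb: int, farm: list):
    """Put the tenant on the first computer with enough room, or return None."""
    for computer in farm:
        if computer.free_cores >= cores and computer.free_memory_gb >= memory_gb:
            computer.free_cores -= cores
            computer.free_memory_gb -= memory_gb
            computer.tenants.append(tenant)
            return computer
    return None  # nobody has a big enough "bedroom" left


farm = [Computer("big-box", 16, 64), Computer("small-box", 4, 8)]
first = place("alice", cores=4, memory_gb=16, farm=farm)
second = place("bob", cores=4, memory_gb=16, farm=farm)
third = place("carol", cores=16, memory_gb=32, farm=farm)
```

Alice and Bob end up sharing the same computer, just like tenants sharing an apartment, while Carol's request can't be satisfied by any single machine.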

Kubernetes is a suite of tools that implements orchestration over a network of computers. In other words — it’s a real estate manager for computers. Here are some of the use cases Kubernetes was designed to tackle:

Keeping track of all the resources — see what’s available, what’s not available, and what’s malfunctioning.
If a computer suddenly stops working, perhaps because of a short-circuit, look for an alternative to accommodate the tenants who were hosted on that computer.
Ensure that things are separated, so tenants don’t access resources that don’t belong to them.
Keep things secure, so unwanted guests don’t access tenants’ networks, similar to what a firewall does.
Replicating tenant applications and splitting incoming traffic between multiple computers in an efficient manner, also known as “load balancing”.

All of the above requires a system that can diligently keep track of and manage everything. Kubernetes offers several components that, when put together, can achieve this objective. Here are some of these components that, in my opinion, are conceptually easier to understand than others:

kube-apiserver — A REST API server that can be used by tenants to send application deployment requests to the cloud.
kubelet — An agent that’s installed on each and every computer and essentially makes it a part of the Kubernetes network. It is responsible for deploying tenant applications and running health-checks based on a set of instructions sent from the apiserver.
etcd — A key-value database that stores variables about the Kubernetes cloud, e.g., configurations, and the status of each computer and the applications that should be running in them. It is used for coordinating system-wide tasks. In other words — this is the source of truth of the cloud at any given point.
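The "source of truth" idea can be sketched in a few lines. Below, a plain dictionary stands in for etcd, and a hypothetical `reconcile` function plays the role of an agent comparing what should be running on a computer with what is actually running. All names are assumptions for illustration. In a real cluster the kubelet gets its instructions through kube-apiserver rather than reading etcd directly.

```python
# A plain dict stands in for etcd: keys are computer names, values are the
# applications that SHOULD be running there (the desired state).
desired_state = {
    "computer-1": {"game-a", "game-b"},
    "computer-2": {"game-c"},
}


def reconcile(computer: str, running: set, store: dict) -> tuple:
    """Return (to_start, to_stop) so that `running` ends up matching the store."""
    wanted = store.get(computer, set())
    to_start = wanted - running  # desired but not running yet
    to_stop = running - wanted   # running but no longer desired
    return to_start, to_stop


# computer-1 currently runs one wanted app and one leftover app.
to_start, to_stop = reconcile("computer-1", {"game-b", "old-app"}, desired_state)
```

The point of the sketch is the direction of the comparison: the stored state says what should exist, and the agent on each computer works to close the gap.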

If you wish to dive into the full list of Kubernetes components, here is the official diagram, which can be found on the official Kubernetes website:

The full list of Kubernetes components, from the official Kubernetes website

As you can see, the Kubernetes specification likes to use abstractions on top of things that we’re already familiar with. A cluster means a farm, or a pool, and a node means a computer, or a machine. At some point in history, a node was even called a minion (as discussed in this GitHub issue from 2014). To me, names are one of the most confusing things about Kubernetes, because there are lots of them, and they don’t necessarily mean much. But once I understood that Kubernetes is nothing more than a framework for managing a pool of resources, things clicked much quicker. For reference, you’re welcome to have a look at this dictionary, which has a full list of Kubernetes terms and their definitions.

Storage is a whole different problem

Similar to a Kubernetes network of computers, when it comes to storage, there’s a network of hard-drives, and there are different strategies to use them. Before I elaborate, let’s think of some edge cases when it comes to storage:

A tenant would like to upload a game that’s 120 GB in size, but there are only hard-drives with 100 GB available at most.
One of the hard-drives stopped working, and it may have included important files that are mission-critical to some tenants.
Extending / removing / replacing hard-drives without interrupting day-to-day operations by tenants.
All of the above, while sharing storage between computers and maintaining the right read / write access.

These problems aren’t trivial to solve, and they require niche expertise in storage and how it works. Some cloud providers offer proprietary managed storage solutions, such as AWS EBS, GCE Disk, and Azure Disk. But if we imagine for a second that you have a whole room full of hard-drives, and you wish to rent them to potential tenants, you’ll probably want to do so with an open source distributed storage solution like Ceph.

Ceph is similar to Kubernetes in the sense that it manages a bunch of hard-drives like real estate. Here’s a diagram I took from Ross Turk’s amazing talk about Ceph that illustrates how Ceph handles data: it splits the data into chunks, like pieces of Lego, and stores them across multiple devices. I’m not going to get technical as to how it actually works, because the rabbit hole can get pretty deep, but you’re free to dive in:

Source: Ceph Intro and Architectural Overview by Ross Turk
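The Lego idea can be illustrated with a toy example: cut a file into fixed-size chunks and pick a couple of drives for each chunk, so no single drive has to hold the whole file and losing one drive doesn't lose the data. This is a simplified sketch with made-up names (`split_into_chunks`, `pick_drives`) and a plain hash for placement. It is not Ceph's actual placement algorithm, and it assumes the number of replicas doesn't exceed the number of drives.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut the data into pieces of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def pick_drives(chunk_id: str, drives: list, replicas: int) -> list:
    """Deterministically pick `replicas` distinct drives for a chunk."""
    digest = hashlib.sha256(chunk_id.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(drives)
    # Walk the drive list from the hashed starting point, wrapping around.
    return [drives[(start + i) % len(drives)] for i in range(replicas)]


drives = ["drive-0", "drive-1", "drive-2", "drive-3"]
data = b"x" * 10                      # stand-in for a large game file
chunks = split_into_chunks(data, 4)   # pieces of 4, 4 and 2 bytes

placement = {
    f"game.bin/{i}": pick_drives(f"game.bin/{i}", drives, replicas=2)
    for i in range(len(chunks))
}
```

Because placement is computed from the chunk's name, anyone who knows the drive list can work out where a chunk lives without asking a central index, which is the general flavor of the approach, even though the real thing is far more sophisticated.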

You may have noticed how Ceph treats hardware as a graph of nodes, very similar to Kubernetes. This means that when using a Kubernetes back-end, we’re most likely interfacing with a couple of clusters and not just one, a realization that made me better understand how the hardware is managed across all dimensions.

It’s just an HTTP server

To initiate a Kubernetes API call, we need to send a request to kube-apiserver. As mentioned earlier, kube-apiserver is a REST server that’s installed by the cloud provider. Accordingly, any HTTP client can be used to query the back-end, e.g., cURL or axios. However, requests need to be authorized, and responses need to be parsed, so you will see many Kubernetes client implementations, depending on the language and the environment that you’re using. One of the most obvious and popular Kubernetes clients is its CLI tool, kubectl.

With kubectl you can easily make authorized API calls, and the responses will always be parsed and displayed on the screen in a readable format, which is great when working with the terminal. We can even use kubectl in verbose mode to see details about the underlying HTTP call:

kubectl get pods --v=6

GET https://k8s.ord1.coreweave.com/api/v1/namespaces/tenant-user/pods?limit=500
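To show that there is nothing magical behind that call, here is a sketch that builds the same GET request with Python's standard library. The helper name `build_list_pods_request` and the placeholder token are assumptions for illustration. I'm assuming bearer-token authentication here, which is one of several ways a cluster can be set up. The request is only constructed, not sent, because actually sending it would also require the cluster's certificate and a real token.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_list_pods_request(server: str, namespace: str, token: str,
                            limit: int = 500) -> Request:
    """Build (but don't send) the HTTP request behind `kubectl get pods`."""
    query = urlencode({"limit": limit})
    url = f"{server}/api/v1/namespaces/{namespace}/pods?{query}"
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth method
        "Accept": "application/json",
    }
    return Request(url, headers=headers, method="GET")


request = build_list_pods_request(
    server="https://k8s.ord1.coreweave.com",
    namespace="tenant-user",
    token="PLACEHOLDER_TOKEN",  # not a real credential
)
print(request.get_method(), request.full_url)
```

The printed line matches the one kubectl shows in verbose mode, which is the whole point: kubectl is a convenient wrapper around plain HTTP.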

For now, you can think of pods as applications (well, kind of). If you’ve read this far, you probably understand that they can run on any computer in the cluster, picked depending on what was available.

Simulating a Kubernetes request

So, getting back to the original problem that kick-started this whole article: we want to connect users to computers in the cloud, so they can play games without owning expensive hardware. When a user is finished, the occupied hardware becomes available again to host the next user in the queue. Let’s try to tackle this!

We’re given the following:

A queue of users.
A Kubernetes cluster.
A Docker image with a game installed.

The recommended specs for the game to run are the following:

4 cores.
16 GB RAM.
50 GB storage.
RTX A4000.
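For reference, this is roughly how such specs could be expressed in a pod description, written here as a Python dictionary that can be serialized to JSON. Several things are assumptions on my part: the image URL and names are placeholders, the 50 GB is modeled as ephemeral storage (a persistent volume is another option), the GPU is requested through the `nvidia.com/gpu` resource, which requires the NVIDIA device plugin on the cluster, and picking a specific model such as the A4000 depends on provider-specific node labels that I'm leaving out.

```python
import json


def game_pod_manifest(name: str, image: str) -> dict:
    """Describe a pod that asks for the game's recommended hardware."""
    resources = {
        "cpu": "4",                   # 4 cores
        "memory": "16Gi",             # 16 GB RAM (binary units)
        "ephemeral-storage": "50Gi",  # 50 GB of scratch disk (assumption)
        "nvidia.com/gpu": "1",        # one GPU; the model is not selected here
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "game",
                "image": image,
                # Same values for requests and limits keeps the sketch simple.
                "resources": {
                    "requests": dict(resources),
                    "limits": dict(resources),
                },
            }],
        },
    }


manifest = game_pod_manifest("game-session", "registry.example.com/game:latest")
payload = json.dumps(manifest, indent=2)
```

The resulting `payload` is the kind of document a client would send to kube-apiserver when asking for a new game instance.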

Accordingly, we can use what we’ve learnt and describe the deployment process as follows. It’s a very dumbed-down and hypothetical description, so please take it with a grain of salt:

When a user gets to the front of the queue, send a request to kube-apiserver with the image URL and the desired hardware specs.
Kube-apiserver will use etcd to look up a computer with enough resources available to host the image.
If one is found, kube-apiserver will use the node ID to send a deployment request to its corresponding kubelet.
Once the deployment has succeeded, the kubelet will return a port to connect to the game stream.
Kube-apiserver will return the game port to the user, along with the target computer’s IP address.
The user can use http://{IP}:{port} to connect to the game stream via the web browser.

A simplified Kubernetes request simulation
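The steps above can also be sketched as a tiny in-memory simulation. Just like the description itself, it is hypothetical and dumbed down: `Node.deploy` stands in for the kubelet, `handle_request` stands in for kube-apiserver, a list of nodes stands in for what etcd would store, and the IP address, image URL, and starting port are arbitrary values picked for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    ip: str
    free_cores: int
    free_memory_gb: int
    free_gpus: int
    next_port: int = 30000  # arbitrary starting port

    def deploy(self, image: str, cores: int, memory_gb: int) -> int:
        """Stand-in for the kubelet: reserve resources, return a stream port."""
        self.free_cores -= cores
        self.free_memory_gb -= memory_gb
        self.free_gpus -= 1
        port = self.next_port
        self.next_port += 1
        return port


def handle_request(nodes: list, image: str, cores: int,
                   memory_gb: int) -> Optional[str]:
    """Stand-in for kube-apiserver: find a node, deploy, return a stream URL."""
    for node in nodes:
        if (node.free_cores >= cores and node.free_memory_gb >= memory_gb
                and node.free_gpus >= 1):
            port = node.deploy(image, cores, memory_gb)
            return f"http://{node.ip}:{port}"
    return None  # no computer has enough free resources


queue = deque(["alice", "bob", "carol"])
nodes = [Node("10.0.0.1", free_cores=8, free_memory_gb=32, free_gpus=2)]
sessions = {}

while queue:
    url = handle_request(nodes, "registry.example.com/game:latest", 4, 16)
    if url is None:
        break  # the user at the front keeps waiting for free hardware
    sessions[queue.popleft()] = url
```

With a single computer that fits two game instances, the first two users get a stream URL and the third stays in the queue until hardware frees up.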

I hope you enjoyed my prologue to Kubernetes and its underlying principles. If you decide to learn or use it, I would keep the following links handy: