Project: Kubernetes From Scratch

Photo by uncleboatshoes@flickr.com

After getting stuck into the world of Kubernetes last week at KubeCon, I’ve decided to explore the details of running a container environment as a little side-project.

I’ve been thinking about doing something with containers for a while, but wasn’t sure quite what to do. Now I’ve got a solid plan: build a Kubernetes environment pretty much from the ground up to learn how all the pieces interact.

There are easier ways to get a Kubernetes environment up and running. I got minikube running on my laptop during the conference, and there are projects like OpenShift and Rancher for more substantial deployments. These all assume a certain level of pre-deployed infrastructure, as do many of the guides provided by the Kubernetes community. Kelsey Hightower’s Kubernetes the Hard Way guide makes use of pre-existing IaaS services from GCP or AWS, for example.

Understanding The Abstractions

My goal is to learn about the interface between the actual physical infrastructure and the container layer, including the various abstractions that get put in place. I want to do this because it became clear to me during KubeCon that the Cloud Native folk don’t really interact with infrastructure much at all. They assume it’s there somewhere and Just Works™. They also pretty much ignore stateful data, but managing state is one of the most important things you need to do.

I’ve come from infrastructure land, but I’ve also written a bunch of code, so I can see things from both perspectives. State management is really, really important when you’re building IT systems that have any kind of value, because 90%+ of the value is in the state. This blog has lots of state, like this post itself. It’s stored in a database, which is backed by disk. It all happens to sit on a virtual server hosted in a cloud IaaS, but before then it lived on a VM hosted on a physical server in my lab, and before that it lived directly on a physical server in my lab.

Abstractions are useful, because they mean you can ignore some details and just get on with things, like ignoring how CPUs actually perform calculations with bunches of transistors. Keeping all those details in your mind simultaneously while you edit photos in an application is nigh impossible, and a waste of energy besides.

But those details start becoming really important when things don’t work. The physical location of infrastructure is important when you’re trying to move data around, because the speed of light is a constraint. Performance differs by orders of magnitude depending on how far away components are from each other, from on-chip cache to NUMA RAM access (local or remote CPU?) to storage I/O to network I/O and onwards.
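To put rough numbers on the speed-of-light constraint, here is a minimal sketch that computes the lower bound on round-trip time over a given distance. It assumes light in optical fibre travels at about two-thirds of its vacuum speed, and the distances are hypothetical, picked only to show the spread. Real latency will be higher once switches, routers, and software get involved.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3  # assumption: light in fibre travels at roughly 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fibre, ignoring all equipment delay."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR)
    return 2 * one_way_s * 1000  # there and back, converted to milliseconds


# Hypothetical distances, chosen only to illustrate the orders of magnitude.
examples = [
    ("same rack", 0.002),
    ("across a city", 50),
    ("between continents", 15_000),
]
for label, km in examples:
    print(f"{label:>20}: {min_round_trip_ms(km):.6f} ms")
```

No amount of clever software gets you under these numbers, which is why where the data physically sits still matters.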

As I’ve written before, someone has to care about resilience and performance. If the DevOps people in their abstract container land don’t, then the details have to be handled in the platform they’re using. That’s fine if your organisation just outsources the problem to AWS, GCE, Azure or some other IaaS cloud thing, but that creates a set of constraints. What if you want to be doing things locally? Maybe constantly pulling down images from the Internet over your crappy ADSL link isn’t feasible?

But now you have to provide that abstracted, resilient IaaS yourself. How do you do that?

Storage Matters

The second goal for my learning is to investigate the storage layer of Kubernetes. This is where state is handled, so how does it work? Do we assume all state lives in one or more etcd clusters? Where do those clusters store their data? If there’s some sort of external storage service, what is it made of? If you just plug in Elastic Block Store or a database service, those things have to exist somewhere in the first place. Again, are we reliant on cloud IaaS to use containers at all, or is it possible to do this entirely in-house?

From what I’ve seen of the storage layer of Kubernetes (and Docker AFAICT), persistent data is a relative latecomer to the implementations. Some of the architectural decisions seem a little odd to me with my storage hat on, so I want to dig into them and see if it’s just me not understanding things, or if they’ve been created by people with limited storage experience. I suspect the latter from what I’ve seen so far, but I may well be wrong.

Again, this relates to management of state, and containers came to the fore as an inherently stateless idea, so it’s not surprising to me that statefulness would be added later.

Lifecycle Management

The final part that I want to look at relates to how the abstractions interact, particularly with state.

If everything is stateless, swapping out servers and storage is easy. If state needs to be maintained across upgrades, things get a lot harder. If state needs to be maintained and upgrades are happening online while state is being changed (e.g. people can keep shopping while you upgrade the shopping site) then things are harder still. Now make the installation multi-site and we’re in distributed state management land, which is a capital H Hard Problem.

It’s also where enterprises need to be.

State management is why services have traditionally been silo implementations with scheduled offline maintenance. Shutting the system down, copying data to the new one, and then starting up the new hardware is much simpler than keeping everything online while you do a live data migration. But it’s hard on the people who want to be using the system to do things, because they can’t use internet banking while it’s offline. It’s also hard on the infrastructure teams who have to do the migration work, as anyone who’s done a data migration will tell you.

What I want is the same level of always-on, continuous deployment that the DevOps people crave, but at the infrastructure layer. I want to be able to wheel in new storage arrays (or just boxes full of SSDs, or whatever you base your storage on) and have the system automatically move data to the new gear. I want to drain data off the old storage just as DevOps folks can drain an application off a set of compute. Then I can remove the old gear without having to plan, schedule, and run a 32-hour migration effort over the Easter weekend because it’s the only time the business will let us have an outage. Those projects suck.
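As a thought experiment, the drain idea can be sketched as a toy model. Everything here is hypothetical: `StorageDevice`, `drain`, and the volume names are made up for illustration and don’t correspond to any Kubernetes or storage vendor API. The sketch marks the old device as closed to new data, then moves each volume to whichever other device has the most free space. A real system would also have to cope with data changing mid-copy, which is exactly the hard part.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical storage box holding named volumes (sizes in GB)."""
    name: str
    capacity_gb: int
    volumes: dict = field(default_factory=dict)  # volume name -> size in GB
    cordoned: bool = False  # a cordoned device accepts no new data

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.volumes.values())


def drain(old: StorageDevice, pool: list) -> list:
    """Move every volume off `old` onto other devices in `pool`.

    Returns the (volume, destination) moves in the order performed.
    Raises RuntimeError if some volume does not fit anywhere.
    """
    old.cordoned = True  # stop new placements first, like cordoning a node
    moves = []
    # Place the largest volumes first so they are not squeezed out later.
    for vol, size in sorted(old.volumes.items(), key=lambda kv: -kv[1]):
        targets = [
            d for d in pool
            if d is not old and not d.cordoned and d.free_gb() >= size
        ]
        if not targets:
            raise RuntimeError(f"no room for {vol} ({size} GB)")
        dest = max(targets, key=StorageDevice.free_gb)  # most free space wins
        dest.volumes[vol] = size  # stand-in for copying the data across
        del old.volumes[vol]      # then release it from the old device
        moves.append((vol, dest.name))
    return moves


old = StorageDevice("old-array", 100, {"db": 60, "logs": 30})
pool = [old, StorageDevice("new-a", 80), StorageDevice("new-b", 80)]
print(drain(old, pool))
print(old.volumes)  # empty once the drain completes
```

Once `old.volumes` is empty, the old gear can be wheeled out with nobody on the application side noticing.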

Stay Tuned

I’m taking lots of notes as I run through this project, so stay tuned for all the gory details of my stupidity and failure as I fumble my way through this complex array of new technology.

What fun!
