StormForge is infrastructure optimisation software for Kubernetes.
My feeling is that the best way to optimise your use of Kubernetes is to not use it in the first place. Most people absolutely do not need it.
Unfortunately, the thing that they actually need is Docker Swarm, but because Docker imploded and lost all of its fashionable tech cool points, people abandoned it. People still wanted to use containers, and still needed something to host and run them, but plain old VMs aren’t quite enough. Kubernetes is the main choice left, so it wins by default.
But it kinda sucks in a bunch of ways. Kubernetes is complicated and messy and hard to use well. This is normal for any technology at the beginning of its lifecycle. The operational tooling and known-good practices haven’t really settled down yet, nor has it been covered over with abstractions that deal with the complexity for you.
This provides an opportunity for products like StormForge: find all the ways you’re Doing It Wrong™ and help you to not do that, if only a little bit. “Try sucking less” isn’t very motivational, though, so we call it continuous improvement instead. A rose by any other name would smell as sweet.
People who have chosen to use Kubernetes generally won’t accept that the answer to their problem might be “stop using Kubernetes”, so there’s plenty of scope for optimisation software that helps them make it hurt less.
If you’re using Kubernetes at any kind of scale, figuring out how to use it more efficiently is challenging. Microservices mean many more degrees of freedom in your infrastructure, which gives you many more variables in your optimisation problem. It’s both easier to screw up and harder to do well, so augmenting your humans with software tools makes sense.
Yet I can’t shake the feeling that StormForge is more feature than product. It feels like something that should be part of a managed Kubernetes offering, or bundled with a bunch of other services to help you operate Kubernetes.
Stand Back, I’m Going To Try Science
StormForge tries to use a scientific approach of observation, theory, and experimentation to figure out the Pareto frontier of your particular optimisation problem.
The first approach, called Optimize Pro, uses load-testing in a non-production environment to simulate different scenarios and model what the outcome will look like. You can then select from a set of optimal configurations based on your business priorities, such as “best throughput at lowest cost”.
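To make the Pareto frontier idea concrete, here’s a minimal sketch of what “a set of optimal configurations” means: given candidate configurations scored on cost (lower is better) and throughput (higher is better), keep only the ones no other configuration beats on both axes. The config names and numbers are invented for illustration; this is my sketch of the concept, not StormForge’s actual data model or algorithm.

```python
def pareto_frontier(configs):
    """Return the configs not dominated by any other config.

    One config dominates another if it is no worse on both axes
    and strictly better on at least one.
    """
    frontier = []
    for c in configs:
        dominated = any(
            o["cost"] <= c["cost"] and o["throughput"] >= c["throughput"]
            and (o["cost"] < c["cost"] or o["throughput"] > c["throughput"])
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return frontier

# Hypothetical candidate configurations from a batch of load tests
candidates = [
    {"name": "2 pods, 500m CPU", "cost": 40, "throughput": 900},
    {"name": "4 pods, 250m CPU", "cost": 40, "throughput": 1100},
    {"name": "4 pods, 500m CPU", "cost": 80, "throughput": 1600},
    {"name": "8 pods, 500m CPU", "cost": 160, "throughput": 1700},
]

for c in pareto_frontier(candidates):
    print(c["name"], c["cost"], c["throughput"])
```

The first config gets filtered out because the second delivers more throughput at the same cost; “best throughput at lowest cost” is then a matter of picking a point from the surviving frontier that matches your priorities.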
There’s also a newer offering called Optimize Live, which uses observability data from production to model how things are going and infer what a better configuration might be. As more data gets collected, and different scenarios are tried in production as natural experiments (or live trials of StormForge’s predictions), the system can build more accurate models.
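The learn-from-production idea can be sketched very simply: keep a window of observed resource usage and derive a recommendation from it, so the suggestion improves as samples accumulate. This is purely my illustration of the concept under invented assumptions (a percentile-plus-headroom rule), not StormForge’s actual algorithm.

```python
from collections import deque

class RequestRecommender:
    """Toy recommender: suggest a CPU request from observed usage.

    Keeps a bounded window of production samples and recommends a
    high percentile of observed usage plus some headroom. The rule
    and all parameters here are hypothetical.
    """

    def __init__(self, window=1000, percentile=0.95, headroom=1.15):
        self.samples = deque(maxlen=window)  # old samples age out
        self.percentile = percentile
        self.headroom = headroom

    def observe(self, cpu_millicores):
        self.samples.append(cpu_millicores)

    def recommend(self):
        ordered = sorted(self.samples)
        idx = int(self.percentile * (len(ordered) - 1))
        return ordered[idx] * self.headroom

# Feed in some made-up production CPU usage samples (millicores)
rec = RequestRecommender()
for usage in [120, 150, 180, 140, 300, 160, 170]:
    rec.observe(usage)

print(f"suggested CPU request: {rec.recommend():.0f}m")
```

The point of the sketch is the feedback loop: every new observation refines the window, so the recommendation tracks what the workload actually does rather than what the developers guessed at deploy time.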
To support these approaches, StormForge also offers a Kubernetes load testing tool you can use if you don’t already have one of your own.
Better Than Excel?
Earlier in my career I spent several years as a performance and capacity consultant, so I’m fairly familiar with the broad nature of the challenge here, though my specific skills are likely well out of date (not much call for Solaris 7 or Oracle 9i any more). I don’t doubt that StormForge does work, and does provide real value to customers with Kubernetes optimisation challenges. But I’m skeptical of the “machine learning” claims made by StormForge. Most of what they described during their presentation at Cloud Field Day 13 sounded like stepwise linear regression to me.
I’m hopeful that it will indeed support more than a 2-parameter optimisation frontier in the near future, because otherwise I’m unclear on why I need the data-volume and processing overheads of machine learning to do something I could easily model in Excel. But I don’t think the claims of machine learning are where the value of StormForge actually lies.
I would suggest that the biggest benefit here isn’t so much the optimisation method as the approach of putting it into an automated pipeline. Pushing new code might upset the balance of the running system, so getting some kind of indication of the likely performance and tuning impact, backed by actual data instead of the wishful thinking of developers, would provide much needed assurance to operations.
The trick for StormForge is to figure out who has the budget to pay for this thing: developers, or operations? The structure of the organisation StormForge gets deployed into will have as much impact on its effectiveness as anything the software actually does.