Scale Computing In The Goldilocks Zone

Scale Computing know exactly what they’re doing, and they’re doing it well. The founders have built a system that is just what they wanted when they worked for mid-sized firms in the middle of America.

I had a couple of informal conversations about Scale with Silicon Valley types over the week of Storage Field Day 5, and what struck me was how they didn’t seem to understand the middle ground. Their thinking revolved around ‘Enterprise’ or ‘Apps’ (sometimes both at once), and they just didn’t understand what Scale are doing. I guess this was part of the famously insular Valley thinking I keep hearing about.

This is great news for Scale, because they can keep growing their base of mid-sized customers who want what they’re selling. My analysis of the situation in my prep-post was correct (which is nice) in that they’ve picked a specific set of customers, and customer problems, and built a product to match. Their product-market fit, to use the lingo, is spot on.

Hidden Complexity

Scale have done a lot of work to make what is inherently complex (a scale-out cluster of Linux servers) simple to use. They’ve built an orchestration layer that “hides complexity, and handles failures automatically”. Scale recognise that hardware fails, but so does software, so simply making everything “software defined” doesn’t magically make all your problems go away.

Scale’s solution is essentially a rule-based expert system, but it’s been written in a lovely, scalable way that I suspect many people won’t fully appreciate because they’re not as completely dorky as I am about this stuff. I know that at least one of my fellow delegates was a bit bored by this part of the presentation, but I thought it was super-neat.

The expert system has sensors (called Collectors), sanity checking (Checked Conditions), and then the rules engine (Conditions) for figuring out if a set of values constitutes a problem or not. I love that a history is stored so you can go back in time and replay what was going on around the time an incident occurred; it makes root-cause discovery so much easier.
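To make those three layers a bit more concrete, here's a minimal Python sketch of how I picture the pipeline. To be clear, this is my own illustration and not Scale's code: the function names, the `disk_latency_ms` field, the 50ms threshold and the `History` class are all assumptions I've made up for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical names throughout: this only illustrates the
# collector -> checked condition -> condition idea.
Reading = Dict[str, float]


@dataclass
class History:
    """Stores timestamped readings so an incident can be replayed later."""
    maxlen: int = 1000
    entries: Deque[Tuple[float, Reading]] = field(default_factory=deque)

    def record(self, ts: float, reading: Reading) -> None:
        self.entries.append((ts, reading))
        while len(self.entries) > self.maxlen:
            self.entries.popleft()  # drop the oldest reading

    def replay(self, start: float, end: float) -> List[Tuple[float, Reading]]:
        """Return every reading recorded between start and end, inclusive."""
        return [(ts, r) for ts, r in self.entries if start <= ts <= end]


def sane(reading: Reading) -> bool:
    """Checked condition: reject values that cannot be real measurements."""
    latency = reading.get("disk_latency_ms")
    return latency is not None and 0 <= latency < 60_000


def disk_slow(reading: Reading) -> bool:
    """Condition (rule): do these values constitute a problem?"""
    return reading["disk_latency_ms"] > 50.0  # assumed threshold


def evaluate(collector: Callable[[], Reading], history: History, now: float) -> str:
    """Run one cycle: collect, record, sanity check, then apply the rule."""
    reading = collector()
    history.record(now, reading)  # record everything, even bad readings
    if not sane(reading):
        return "invalid"
    return "problem" if disk_slow(reading) else "ok"
```

Calling `evaluate` on a schedule with a real collector, and later calling `history.replay(start, end)` around the time of an incident, is the "go back in time" part. Recording the reading before the sanity check is a deliberate choice in this sketch, since a garbage value from a sensor is often exactly what you want to see during root-cause analysis.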

On top of this is a state-machine that encodes the expert-system itself. This state-machine allows the system to automatically detect issues, and trigger actions designed to either fix the issue, or generate an alert if external intervention is required. The state-machine manages the transition from one state to another, such as Ok->Failing->Failed, or Failed->Reset->Resetting->Booting->Online->Rejoin->Ok, or some other complex state-transition.
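A state-machine like this can be sketched as a simple transition table. The state names below come straight from the two example paths above, but the event names, the `NodeStateMachine` class and the alerting behaviour are hypothetical, invented here purely for illustration; I have no idea what Scale's real implementation looks like inside.

```python
from typing import Callable, Dict, List, Tuple

# (current state, event) -> next state.
# States are from the example paths in the text; event names are made up.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Ok", "degraded"): "Failing",
    ("Failing", "gave_up"): "Failed",
    ("Failed", "reset_requested"): "Reset",
    ("Reset", "reset_started"): "Resetting",
    ("Resetting", "power_on"): "Booting",
    ("Booting", "boot_complete"): "Online",
    ("Online", "rejoin_requested"): "Rejoin",
    ("Rejoin", "rejoined"): "Ok",
}


class NodeStateMachine:
    """Tracks one node's state and alerts when no rule covers an event."""

    def __init__(self, state: str = "Ok", alert: Callable[[str], None] = print):
        self.state = state
        self.alert = alert
        self.trail: List[str] = [state]  # every state visited, in order

    def handle(self, event: str) -> str:
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # No automatic fix is known: ask for external intervention.
            self.alert(f"no rule for event {event!r} in state {self.state!r}")
            return self.state
        self.state = next_state
        self.trail.append(next_state)
        return next_state
```

Feeding the events in order walks a node from Ok through Failed and all the way back to Ok, and `trail` records the path taken. Treating "no matching rule" as the trigger for an alert is a simplification on my part; a real system would presumably attach actions to each transition as well.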

Reliability

The end result is a system that handles failures automatically, and for the ones it can’t handle, it’s easy to figure out what went wrong and fix it quickly. That makes the system reliable, which is important for the non-IT-expert audience they’re selling to. Scale give them computers that just do their thing, and if something does break, Scale do the hard work of fixing it.

And because of the way this sort of system is designed, you can keep adding to it and the value increases over time. Just as a good regression test suite helps you make your software better over time, Scale’s system will keep getting more and more robust as they keep identifying new corner cases and adding them to the expert system.

All this talk of expert systems and state-machines is very undergrad/Masters CompSci theory, but Scale have used knowledge of theory very well here to create a functional system that automates boring and repetitive tasks, which is what computers are really good at, and humans are not. It’s the sort of thing I’ve been banging on about for over a decade in my consulting, and my writing, and generally bothering all kinds of people about. And these folks have gone and built one, and it works.

Let me be absolutely clear: there is a fabulous software product here, and if Scale decided to, it could get spun out as an Enterprise software product that would be worth a hell of a lot of money. More than the hyper-converged infrastructure business they already have, in my opinion.

Competition

In my prep-post, I raised the question of competition from SaaS vendors (or indeed PaaS vendors) for Scale’s customers. Scale’s customers tend to go with on-site infrastructure because they simply don’t have access to the network connections that make SaaS viable. As someone who lives at the end of an 8Mbps down/1Mbps up ADSL link (if I’m lucky, and when it’s not raining), I completely understand.

Either higher bandwidth/reliability links aren’t available, or they’re prohibitively expensive when compared with an entry-level ~$25k installation of Scale Computing gear on-site. If you’re a mid-sized manufacturer with specialised (expensive) machines that need to talk to specialised software, running that software at the far end of a wet piece of string isn’t smart, but nor is eating up all your margin in network fees.

The running SFD5 theme of predictability and reliability rears its head again: these businesses don’t care about IT. They run a business that happens to need some IT so they can build their products or provide their services. They’re not high tech. They’re not large. Their accountant and their head of Sales are far more important to their business than their IT person.

The IT just needs to be there and to work. Scale provide that with a single platform and what appears to be great support, at a compelling price.

Scale isn’t for everyone, but for the people it is for, it’s just right.
