VMworld US Here I Come

For the first time ever, I’ll be at VMworld US this year, covering the event for iTNews.com.au.

Well, assuming the US government lets me have an official journalist-style I-visa after my interview in a couple of weeks. Since I’ll be getting paid (freelance, so only for what I actually produce and can sell, natch), I need a proper work visa, not the B1/B2 conference-style visa I would normally use under the Visa Waiver Program when I go to the US as just me.

Nerd Herd

Watching the Twitter feed last year, I could feel the energy of VMworld through the tiny mobile screen, and I had a palpable sense of missing something wonderful. It was like hearing all about this great party that you missed because you had to work. The nature of our far-flung industry is that I only tend to see my Internet Friends at the occasional conference, and the fact that I’m in Australia makes it that much harder to see them, because all the major conferences are either in the US or Europe.

It’s shaping up to be a great event, with all the action on the storage front in recent years, the explosion of software definitions, and a rekindled enthusiasm for networks. It’s a busy time for the industry, and I look forward to tapping into the zeitgeist at one of the premier technology conferences in the world.

Say Hi

If you’re there, do come up and say hi. I’ll be tweeting up a storm as @jpwarren as usual, so tweet at me if you think there’s something cool I should come see. I’ve even updated my avatar to a recent photo so you’ll know what I look like.

SolidFire

Dave from SolidFire was impressive.

He had a quiet confidence that comes from knowing exactly what he’s doing. He knows his own product very well, which he should, but he also knows a lot about competitor products, and the market in general. He was able to talk about all of them with the cool detachment that comes from focussed competence.

Go watch the videos. This is what really smart, competent people look like when they’re talking about their area of expertise. He’s the CEO, so you expect him to be a decent salesman for his product. He has to convince customers to buy his stuff, but he also has to convince employees to work for him, and investors to invest in him and the company. CEO is a sales job first and foremost. But Dave also knows the technology very, very well.

But let’s talk about the product.

Efficient, Real World Scale

My favourite thing about SolidFire is the ability to pull out a node and re-install it somewhere else, on the fly. Other shared-nothing type deployments should be able to do the same thing, so it’s probably not unique, but this is the first time I’d heard it pointed out explicitly as a feature.

In your traditional 2-site datacentre design, you’ll have a production site and a DR site. DR is probably smaller than prod, because you don’t plan to bring everything up if you lose all of prod, just the most critical stuff. You buy a big SAN for prod, and a slightly smaller one for DR, unless everything on the prod SAN is critical, which it often is. So now you buy two prod-sized things for your sites.

Except your capacity forecasts, if you have any, are always wrong. In six months, you need more capacity in prod, and have spare in DR. Or the other way around, because someone decided to run dev out of DR because otherwise you’re ‘wasting’ DR capacity. The budget cycle is 12 months long, and you’re constantly being asked to reduce it anyway, but the business changes strategic direction with the season.

If you can move a single node around, as you can with SolidFire, you can move hardware around to match actual capacity requirements, rather than what you guessed they might be six months ago. You can more closely match supply with measured demand, at the location of that demand, which is more efficient. If you get your forecasts wrong, or you simply change your mind, you can just move kit around to suit. You don’t have to go buy a bunch of extra kit simply because you put your existing kit in what turned out to be the wrong place.

Features like this allow IT to deal with the world as it is, not as we might wish it was. The business needs to respond quickly to a changing market, and needs IT to help them. If that means waiting 4 months to order and provision a SAN upgrade, that’s too long. Rebalancing existing capacity is the same thing as moving VMs around inside a cluster, or moving staff into different roles, or adjusting manufacturing to match customer orders.

Which is the holy grail of IT infrastructure: to turn infrastructure resources into a fungible commodity, so you can buy a global pool of ‘some CPU’ and ‘some storage’ and move it about to where it’s needed, on-demand. You can adjust the overall pool based on aggregate demand, dialling it up and down as required. But within that specific pool, you move portions of it around the network to match local spikes and troughs in demand. The better you are at it, the greater your efficiency.
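
To make that concrete, here’s a toy sketch in Python of the basic idea: measure what each site actually needs, then shuffle nodes from wherever there’s slack to wherever there’s a shortfall. The site names, node counts, and capacities are all invented for illustration; this isn’t SolidFire code, just the shape of the rebalancing problem.

```python
# A toy sketch of rebalancing a fungible pool of storage nodes between
# sites based on measured demand, not forecasts. All names and numbers
# are invented for illustration.

NODE_CAPACITY_TB = 10  # assume every node contributes the same usable capacity

# Nodes currently deployed at each site
nodes = {"prod": 8, "dr": 8}

# What monitoring says each site actually needs right now
measured_demand_tb = {"prod": 95, "dr": 40}


def nodes_needed(demand_tb):
    """Round up to the number of whole nodes required to cover demand."""
    return -(-demand_tb // NODE_CAPACITY_TB)  # ceiling division


def rebalance(nodes, demand):
    """Move spare nodes from under-used sites to over-committed ones."""
    required = {site: nodes_needed(d) for site, d in demand.items()}
    surplus = {s: nodes[s] - required[s] for s in nodes if nodes[s] > required[s]}
    shortfall = {s: required[s] - nodes[s] for s in nodes if nodes[s] < required[s]}

    moves = []
    for dst, need in shortfall.items():
        for src in list(surplus):
            take = min(need, surplus[src])
            if take > 0:
                moves.append((src, dst, take))
                surplus[src] -= take
                nodes[src] -= take
                nodes[dst] += take
                need -= take
            if need == 0:
                break
    return moves


for src, dst, count in rebalance(nodes, measured_demand_tb):
    print(f"move {count} node(s) from {src} to {dst}")
print("resulting layout:", nodes)
```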

Tradeoffs

Dave was very clear in his presentation about the tradeoffs SolidFire have made when designing for their target market. His comparisons of SolidFire with competitors Pure Storage and EMC XtremIO were all about the different tradeoffs the different companies have made. SolidFire focussed, in order, on scale, availability, and then cost. According to Dave, Pure focussed on time to market, then availability, then cost, and XtremIO focussed on time to market and scale. These all make sense when you think about what kind of company is behind each of the different products.

Here’s the slide from the presentation so you can see it clearly:

SolidFire Competitor Tradeoffs (Source: SolidFire)

Dave educated all of us about the logic behind the choices SolidFire have made. He stepped through it in a non-jargony, clear way that made it easy to understand. He was also quite clear on the type of market SolidFire are going after: cloud service providers and similar enterprises who need lots of very fast disk in big pools. Pure is best suited to specific pools of fast, available disk (like for databases). XtremIO is somewhere in between: predictably fast storage for several workloads, topping out at 40-80TB per pool.

SolidFire Target Market IOPS vs Size (Source: SolidFire)

What makes a good choice for you depends a lot on your individual requirements, so making any kind of broad statement about one or the other being ‘better’ overall is stupid and unhelpful. Various people will persist in making that kind of bogus claim, but hey, humans gotta human.

SolidFire looks like a really solid (sorry) product. I expect more good things from them in future, and they should prove a strong contender in the explosion of flash storage startups currently enveloping us.

EMC XtremIO

EMC XtremIO closeup (image courtesy of EMC)

This is my review of XtremIO based on EMC’s presentation at Storage Field Day 5.

Getting a clear picture of what XtremIO was about from EMC’s presentation was a significant challenge. It’s taken me a few re-watches of the video, and triangulation from other sources (including SolidFire’s presentation, ironically) to figure out how it works.

XtremIO ‘X-bricks’ are dual-active controllers connected to 25 eMLC flash SSDs in a separate drive shelf. The logical path to the SSDs involves two lookups: first the metadata (such as the ultimate location of the data block), then the data itself. Data (and metadata) is stored on the SSDs using a form of wide-striped 23+2 parity-RAID, which EMC call XDP because they don’t want to call it RAID for some reason. Differentiation from competitors, I assume. It looks, walks, and quacks like parity-RAID to me.
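
To illustrate what I mean by two lookups, here’s a toy Python model of how I picture it. This is my mental model, not EMC’s implementation; only the 23+2 geometry comes from the presentation, and all the names and numbers below are made up.

```python
# A toy model of a two-stage block lookup over a wide 23+2 parity stripe.
# This is my mental model of the architecture, not EMC code; only the
# 23 data + 2 parity geometry comes from the presentation.

DATA_DRIVES = 23
PARITY_DRIVES = 2
STRIPE_WIDTH = DATA_DRIVES + PARITY_DRIVES  # 25 SSDs per X-brick shelf

# Lookup 1: metadata held in DRAM maps a logical block address to a
# (stripe, position) pair -- the "ultimate location" of the data.
metadata = {
    0x0001: (0, 5),
    0x0002: (0, 17),
    0x0003: (3, 2),
}

# Lookup 2: the stripes themselves, laid out across the data drives.
stripes = {
    0: ["block-%d" % i for i in range(DATA_DRIVES)],
    3: ["block-%d" % i for i in range(DATA_DRIVES)],
}


def read_block(lba):
    """Two logical lookups: metadata first, then the data itself."""
    stripe_id, position = metadata[lba]   # lookup 1: where is it?
    return stripes[stripe_id][position]   # lookup 2: go get it


print(read_block(0x0001))  # -> block-5 from stripe 0
```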

The controllers are connected to each other over a fully-meshed RDMA network running over Infiniband, which is nicely speedy. It does limit the size of the cluster to the number of ports on the largest Infiniband switch on the market, but that’s not a big deal, because the maximum cluster size is currently 4 bricks/8 controllers.

The full-mesh architecture is a big part of where the consistent latency story comes from: the path length to each data block is the same no matter how you reach it, meaning the same number of hops, which provides the same latency (assuming all hops perform the same, which they should).

Of concern, though, is that adding bricks to the cluster is disruptive. Having to take the entire storage system offline for an upgrade is far from ideal, particularly in this day and age, and particularly for something that bills itself as a “scale-out” solution.

In fact, because of its tightly-coupled architecture, XtremIO is far more like a scale-up solution than scale-out. It’s the elasticity of being able to add, and remove, components that makes a solution ‘scale-out’ more than any other feature, in my opinion. This is generally achieved through some sort of shared-nothing style architecture, while XtremIO is more a ‘shared everything’ architecture because of the RDMA fabric, and the fact that data is only stored in one place. In my opinion, XtremIO is a lot more like a Symmetrix than a scale-out system, but let’s not get too hung up on that point.

No Compression

Another missing feature is compression. As per SolidFire’s presentation, this is tricky for XtremIO to do because of their choice of fixed 4k blocks for their RAID stripes, but one assumes they’re working on it. It’s not actually clear to me how much of an advantage compression is compared to deduplication, as I’ve not dug into the details of how other vendors do this, and how it performs in real world situations. There’s marketing material out there, but vendors will always put a positive spin on their own thing.
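
For the curious, here’s roughly what fixed-block deduplication looks like in general terms: carve the data into fixed 4k blocks, fingerprint each one, and store each unique block only once. This is a generic Python sketch, not a description of XtremIO’s actual implementation. Compression would then have to squeeze each unique block into something smaller than 4k, which is the part that’s awkward to reconcile with fixed-size blocks.

```python
# Toy illustration of fixed-block deduplication: chop data into 4KiB
# blocks, fingerprint each block, and store each unique block once.
# This is a generic sketch, not XtremIO's actual implementation.

import hashlib

BLOCK_SIZE = 4096  # fixed 4k blocks


def dedupe(data: bytes):
    store = {}      # fingerprint -> block contents (stored once)
    layout = []     # ordered fingerprints so the data can be rebuilt
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = block
        layout.append(fingerprint)
    return store, layout


# Highly repetitive data dedupes well: 10 identical blocks stored once.
data = b"A" * BLOCK_SIZE * 10
store, layout = dedupe(data)
print(f"{len(layout)} logical blocks, {len(store)} unique block(s) stored")
```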

Snapshots

Snapshots (and clones) are brand new for the platform (they were due to be announced at EMCWorld), which is a little odd, given that snapshots and clones are a core storage array feature these days. The implementation of snapshots on XtremIO also seems a bit odd, and sounds a lot like the way NetApp snapshots work in ONTAP 7-mode. XtremIO have traded space efficiency for speed: snapshots are fast, because they’re just a pointer to a metadata list. Snaps of snaps are also fast, because the pointers are partial lists.

The only issue happens if you delete a snapshot in the ‘middle’ of a tree of snaps. That removes pointers from the middle of your indirected pointer list, so you have to go through all the child snaps of the snap you want to delete, and update the pointers for all unchanged blocks to point at the snap’s parent blocks. It’s like cutting a branch at the fork, removing one branch of the fork, and then gluing the rest of the branch back on. Doing this merge is a background process, and is apparently not strictly necessary, plus the metadata is all held in DRAM, so it’s pretty fast.
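
Here’s my reading of that description as a toy Python sketch (my interpretation, not XtremIO’s code): each snapshot holds pointers only for the blocks that changed since its parent, reads walk up the chain, and deleting a snap in the middle means folding its pointers into its children before re-attaching them to the grandparent.

```python
# Toy model of chained snapshots as partial pointer lists, and the
# background merge needed when a snapshot in the middle of the chain
# is deleted. My interpretation of the description, not XtremIO code.

class Snapshot:
    def __init__(self, name, parent=None, changed=None):
        self.name = name
        self.parent = parent
        self.pointers = dict(changed or {})  # only blocks changed since parent

    def read(self, block):
        """Walk up the chain until some ancestor owns a pointer."""
        snap = self
        while snap is not None:
            if block in snap.pointers:
                return snap.pointers[block]
            snap = snap.parent
        raise KeyError(block)

    def delete(self, children):
        """Fold this snap's pointers into each child, then re-parent them."""
        for child in children:
            for block, loc in self.pointers.items():
                # Only blocks the child did NOT overwrite need new pointers.
                child.pointers.setdefault(block, loc)
            child.parent = self.parent


base = Snapshot("base", changed={1: "loc-A", 2: "loc-B"})
middle = Snapshot("middle", parent=base, changed={2: "loc-C"})
leaf = Snapshot("leaf", parent=middle, changed={3: "loc-D"})

print(leaf.read(2))    # -> loc-C, found via the middle snapshot
middle.delete([leaf])  # remove the snap in the middle of the tree
print(leaf.read(2))    # -> loc-C, now owned directly by the leaf
```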

Replication

Array native replication is also not available on XtremIO. If you want replication, you need to use VPLEX in front of XtremIO. The XtremIO team are apparently working with the RecoverPoint people to figure out how to put RecoverPoint techniques into the XtremIO software so you can do it natively, but it’s vapourware today.

Optimisation Means Choices

I actually think Dave Wright’s summation of the choices made by the different vendors is a great way to look at how the different systems he walked through work, and why. XtremIO’s architecture looks the way it does because of the choices they made to optimise in a certain direction. That makes it different from other solutions, but not necessarily better. It might suit certain use cases better than an alternative solution, but it won’t be a good fit for all use-cases.

Unfortunately, during the presentation the marketing message of “we’re the best!” eclipsed the fact that you can’t optimise for everything simultaneously. Too much of the message from EMC was that other choices are inherently, and objectively, bad rather than different choices made to optimise in a different way.

XtremIO have optimised for predictable low-latency block storage performance. The maximum capacity of the system isn’t as large as some other offerings, but that’s not what it’s for. (Tape is much better, but the latency sucks.) The choice of full-mesh RDMA fabric and fixed 4k block parity-RAID provides consistent latency. A fast Infiniband interconnect, DRAM storage of metadata, wide striping: all these choices contribute to fast performance. Time to market was also important for XtremIO, so some of their choices (UPS-backed DRAM, shared disk) have been made to get a product out there and selling quickly.

XtremIO look reasonable if you want predictable, low-latency performance for block storage in the 10-80 TB capacity range, and you’re pretty sure how much performance and capacity your application(s) will need, and you have the money to spend (or know how to drive a hard bargain). I suspect that it might also turn into a good choice if you’re already an EMC shop and want something that will work well with all your existing EMC products and tools. While that tight integration isn’t there today, it’ll probably arrive in relatively short order. If the integration is important to you, I’d be asking for a roadmap briefing, possibly under NDA, to understand if EMC’s roadmap aligns with yours.

For everyone else, though, XtremIO just doesn’t seem really ready yet. It feels like it was rushed to market, but has also been slowed down because they got bought partway through their product development cycle and had to suddenly integrate into the Greater EMC Federation.

Actual Customer Experience

Chris Gurley wrote up his experience with using XtremIO in a comprehensive and fair way. It looks like the impressions I got from SFD5 bear out in reality.

To be honest, Chris’s review makes me even more critical of XtremIO. The operational hassles of managing the gear make it unsuitable for anything but a workload that desperately needs the stable performance. The business value would have to seriously outweigh the costs of working around XtremIO’s limitations, and I’m not convinced it’s the only option.

XtremIO is now firmly in my “wait and see” bucket until they get the base level features up to parity with virtually every other modern array out there.

Cisco ACI and Lossy Control

The centralised command-and-control method doesn’t work at scale. Many people have discovered this independently, from the Romans to the Chinese. It’s just not humanly- (or even computerly-) possible to track that many interdependent variables and make smart decisions.

To manage a vast array of components, a central apparatus needs to understand every detail of the entire whole. You need to direct every single peasant about how many shoes to make, or what to plant (and when! and when to fertilise! and when to harvest!) and if you don’t get it all right, the entire system collapses under its own weight.

Instead, delegation works better. There is an overriding common goal, but groups of individuals, or elements, are able to make their own decisions (within certain spans of control) to propel the overall organisation towards the goal.

The trick is deciding how much to delegate, and how much to control. It’s not a binary choice, but a continuum.

This delegation choice correlates quite closely to the “promise theory” that is at the core of Cisco’s ACI, as explained to me by Joe Onisick and his betters. The principle sounds simple: define the overall goal, and let sub-groups figure out how to achieve it. Define overall policy, and let the switch (or webcam, or IP-phone, or firewall, or whatever) implement that policy based on how it needs to.
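
Here’s the shape of the idea as a toy Python sketch. This is not Cisco’s actual object model or API; it’s just an illustration of stating an intent once and letting each class of device render it in its own terms, with all the names and values invented.

```python
# The shape of the idea, not Cisco's actual object model: one declarative
# policy, and each class of device works out for itself how to honour it.

policy = {
    "name": "web-to-app",
    "from_group": "web",
    "to_group": "app",
    "port": 443,
}


def render_firewall(policy):
    """A firewall honours the policy as a permit rule."""
    return (f"permit tcp group:{policy['from_group']} "
            f"group:{policy['to_group']} eq {policy['port']}")


def render_switch(policy):
    """A switch honours the same policy as a contract between endpoint groups."""
    return (f"contract {policy['name']}: "
            f"{policy['from_group']} -> {policy['to_group']} "
            f"port {policy['port']}")


# The controller states the intent once; each element keeps its own promise.
for device, render in [("firewall", render_firewall), ("switch", render_switch)]:
    print(f"{device}: {render(policy)}")
```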

Promise Theory and Management

This is very similar to corporate management. You need to define sane, and well formed, overall policy for sub-units to implement. If the policy is insane, or contradictory, or just bad, then the most dutiful implementation will still result in failure. If you tell a firewall to allow all inbound traffic no matter the source, you’re going to have a bad time.

But if we assume the policy is good, computers are very good at following orders. They are deterministic, which means they only interpret policy in one way. They may be complex, sometimes vastly so, but they are not random.

Humans, however, are vastly more complex than computers. They can be chaotic, if not completely random, when compared with the regular performance of computers. A computer is rarely perverse: it will do what it is told (sometimes annoyingly so) while a human may decide to do the exact opposite of what they are instructed, because they feel like it.

Lossy Communication

Now add in the human propensity to misunderstand. Unlike a computer, which does only what you tell it to do, human language contains a vast capability to miscommunicate. What you think you are defining as policy may not be what a group of people (or an individual) think you mean. Communication between humans is lossy.

The courts are filled with cases of humans thinking different things about what was promised. These disagreements are frequently not rational, in the economic sense or indeed in the usual sense. Contract law exists largely because of this propensity of humans to misunderstand one another, sometimes deliberately.

While the lessons of scaling communication between many computer systems may well be applied to human communication, perhaps there is a lesson from human interaction for computer interaction as well?

After all, many billions of humans have been co-existing on this planet, more or less, for hundreds of years.

It hasn’t all gone well.

Future Potential

I think the potential for promise theory is very large, but it’s also quite recent, and the full impact of it hasn’t been quite fleshed out yet. Distributed systems are extremely complex to understand, and many people vastly smarter than I are attempting to wrangle them into usable shape.

To dismiss ACI, and promise theory, out-of-hand seems terribly foolish to me. But then so does assuming it is a panacea. It wasn’t that many years ago that separation of control and data was proclaimed as the new heir to the networking throne, so perhaps we should wait a little longer before we swear fealty to promise theory, or ACI, as our new King?

Video from Melbourne VMUG Meeting 14 May 2014

I recorded the presentations at the Melbourne VMUG meeting last week, and have finally uploaded them all to YouTube.

You can view the entire playlist here, or watch pieces embedded in the post below.

The slides for the talks are available here.

Introduction and Update

Craig Waters introduced the meeting and gave everyone an update on VMware related news. Melbourne had a massive contingent of vExperts this year, including myself.

Salvation Army

We had a presentation by VMUG members from the Salvation Army on how they run IT in far-flung places of the world on a shoe-string budget.

Storage Panel

We had presentations from four ‘startup’ style storage vendors: VMware VSAN, Nutanix, Tintri, and Pure Storage.

We finished off with a panel-style Q&A session with the audience.

I have renewed respect for people who do video production for a living.