SFD5 Journey Begins


In one more sleep, plus a couple of hours, my journey to SFD5 begins.

The tyranny of distance means that getting to any major tech event involves travelling 8,000 miles from where I live on the arse end of the world to where the rest of you live at the top of the Mercator projection.

It’s not always super-fun, and yet, like many of my antipodean brethren, I enjoy the adventure that travel brings. I’m flying United once again, because I’m not a princess and I respect other people paying for me to go all this way to harass their marketing people. That and I have prescription chemicals to assist me this time.

Most of all I’m looking forward to hanging out with Stephen, Tom, and Claire, and @plankers and @gallifreyan and @DeepStorageNet, and the folks from Prime Image who run the A/V feed for everyone following along online. No doubt I’ll make some new friends from the other SFD5 delegates I haven’t met yet.

And hopefully I’ll spend some quality time with the various tech vendors we meet. The types of people Stephen gets to present at Tech Field Day events aren’t your usual “gimme gimme” types (because Howard would annihilate them). They’re geeks like you and me, trying to bring cool tech into a market that finds it useful, and at a reasonable price. They don’t always 100% succeed, and that’s why we have these events: so that they can find out, from you and me, about how they can do better.

I’ll be tweeting like mad on the #SFD5 hashtag, or you can follow me directly as @jpwarren. Please get involved and tweet your questions (or snark!) to the delegates or just with the hashtag. We’re your proxies: your eyes, ears, and mouths, there in the room to ask direct questions and expect direct answers. Help us make this an event worth paying attention to by getting involved online, and we’ll all do our best to get you the answers you need.

SFD5 Prep Work: Veeam

Veeam have a long history of presenting at various Field Days, as you can see here. I wrote about them for Tech Field Day 9, in a prep post here, and in a review here.

Going on past performance, we can expect that Veeam will be presenting some sort of major product announcement at SFD5. Veeam V8 is coming out in the second half of the year, so perhaps we’ll be doing a “what’s new in V8” session?

Veeam just announced they’re going to support NetApp snapshots in V8, so I reckon that’s a solid bet for at least part of what we’ll cover.

Veeam and NetApp

This partnership announcement is interesting because, contrary to that El Reg article, NetApp do have their own backup software: SnapManager, including a version specifically for VMs, SnapManager for Virtual Infrastructure.

I’ve never really liked SnapManager. Back when I used to do NetApp architectures, the different types of SnapManager didn’t talk to one another, so there was no central point of control for the backup admins, and it wouldn’t integrate with other backup software that these heterogeneous environments used (like NetBackup, Tivoli, CommVault, etc.). We would joke that NetApp needed to create SnapManager for SnapManager.

NetApp started integrating snapshot management into Operations Manager, but it only worked in a very limited set of cases, and it didn’t talk to SnapManager. I seem to remember SnapManager didn’t support vFilers when we first looked at it, and I know Operations Manager struggled with them in early versions too. vFilers were always second-class citizens compared to non-virtual Filers. Maybe it’s better now, but as far as I can tell you still can’t do in-place 7-mode to C-mode upgrades yet, and you’d think that’d be more important to fix first.

Anyway, it’s interesting that NetApp are now partnering with Veeam. Unlike SnapManager, people seem to love them some Veeam, so perhaps NetApp have finally thrown in the towel on their own attempts at writing backup software, or at least given customers a decent alternative. I’ll be sure to drill into how the integration with NetApp snapshots (and SnapVault, and SnapMirror) works, because I know waaaay too much about them and the painful cases where they don’t work well.
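To make my questions concrete, here’s roughly the sequence I’d expect any array-snapshot-integrated backup to follow. This is purely my own conceptual sketch; the function names are hypothetical placeholders, not Veeam’s or NetApp’s actual APIs.

# Conceptual sketch of array-snapshot-integrated VM backup orchestration.
# All function names are hypothetical illustrations, not Veeam or NetApp APIs.

def quiesce_vm(vm):
    """Take a short-lived hypervisor snapshot so the VM's disks are consistent."""
    print(f"hypervisor: snapshot {vm} (quiesced)")

def array_snapshot(volume):
    """Ask the array for a snapshot of the volume holding the VM's datastore."""
    print(f"array: snapshot {volume}")
    return f"{volume}.snap"

def update_secondary(snapshot, relationship):
    """Kick off a SnapVault or SnapMirror update so the copy lands off-box."""
    print(f"array: replicate {snapshot} via {relationship}")

def release_vm(vm):
    """Remove the hypervisor snapshot; the VM only carries it briefly."""
    print(f"hypervisor: remove snapshot on {vm}")

def backup(vm, volume):
    quiesce_vm(vm)
    snap = array_snapshot(volume)         # fast, array-side snapshot
    release_vm(vm)                        # the VM stun window ends here
    update_secondary(snap, "SnapVault")   # longer-running, off the data path

backup("app-server-01", "vol_vmware_ds01")

It’s that last replication step where the painful edge cases I mentioned tend to live, so that’s where I’ll be asking questions.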

Other Options

If we don’t talk about Veeam and NetApp snapshots, I’m not sure what other new big-ish features we’ll be discussing.

Let’s go with some wild speculation. Then I can claim I’m brilliant if any of my guesses pay off. ;)

Veeam can already back up to lots of public clouds with their Cloud Edition, so perhaps we’ll learn about how it can target various private cloud things, like OpenStack? That’s not super-compelling, unless you want to be able to use your own gear like a public cloud.

Maybe Veeam will announce one or two other partnerships with storage array vendors. That might be a little tricky politically, given how recent the NetApp announcement was. If Veeam came out saying they have an EMC partnership as well, that’d be weird. EMC already has a backup product of their own in NetWorker (née Legato), so it’s highly unlikely. As El Reg mentioned, maybe HDS or Fujitsu or Dell? Even then, announcing another major storage array partnership this soon would put noses out of joint at NetApp.

Perhaps Veeam will announce some sort of networking partnership, maybe with Riverbed or similar around WAN acceleration? Alternately, maybe some sort of “software defined networking” company alliance, though I can’t think of any immediate benefits to Veeam. They already partner with Cisco to make VM restores on UCS faster. It won’t be VMware and VNX though, given the shenanigans at PEX.

Whatever it is, we’ll have a good time. It’s always fun getting right into the tech with the Veeam folks, and we know they’re not afraid of a whiteboard.

We’ll find out soon enough!

SFD5 Prep Work: Scale Computing


Scale sell a converged platform: compute and storage in pizza boxes that glom together to make a cluster of 3 to 8 nodes, like Voltron. It’s based on KVM, with proprietary bits added into the mix.

Scale make a big deal about “no licensing”, or avoiding what they call the ‘vTax’. I have to admire the choice of vTax as the descriptor; it’s some clever marketing. Scale emphasise that you don’t have to pay licensing fees for VMware (or Hyper-V, or VirtualBox, or whatever), which saves you money. Sure.

You know how much I hate an “it’s cheaper!” positioning, but in this case, it looks like it might actually be well aligned with their marketing strategy.

Let’s look at the specs of the top line nodes Scale sell, the HC3x, and I’ll explain my thinking.

Scale Specs

HC3x nodes have 6-core, 12-thread, 2.2GHz CPUs with 64GB of RAM (at 1333MHz). Storage is 4 SAS drives per node: 15k RPM 600GB, 10k RPM 900GB, or 10k RPM 1200GB. Networking is 2 x 10GbE active/passive (presumably for data and cluster comms) and 2 x 1GbE (for management).

Using Scale’s VM sizing assumptions, Scale say you can get 200 VMs on a cluster. That’s a small-to-medium-sized cluster, so now we can clearly see the target market for Scale: small-to-medium enterprise, and that’s medium in an Australian sense, not a US sense (everything is bigger in Texas).
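Back-of-envelope, and assuming the 200-VM figure is for a full 8-node cluster (my assumption, not Scale’s), the published numbers imply fairly modest per-VM allocations:

# Rough per-VM maths from the published HC3x numbers. The 8-node, 200-VM
# combination is my own assumption for illustration, not Scale's official sizing.
nodes = 8                  # maximum cluster size
ram_per_node_gb = 64
threads_per_node = 12      # 6 cores with hyper-threading
vms_per_cluster = 200      # Scale's headline figure

print(f"RAM per VM:     {nodes * ram_per_node_gb / vms_per_cluster:.1f} GB")   # ~2.6 GB
print(f"Threads per VM: {nodes * threads_per_node / vms_per_cluster:.2f}")     # ~0.48

Which squares with the target market: lots of small, boring workloads rather than a few monster VMs.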

The storage of the nodes is aggregated into one dirty great pool of “some storage” and all of it is available to all of the nodes if they want it, which is nice. There is a protocol abstraction layer, which means you can talk to the storage pool from remote servers over your favourite Ethernet-based protocol: SMB, NFS, or iSCSI. This means you can have both file and block storage from the one platform, like a NetApp (with no FC), but it does compute as well. Nifty!

VM files are stored internally on an NFS fileshare, but appear to the VM like a disk, so it’s similar to NFS storage pools on VMware. The virtual hard disks use the open qcow2 format. VMs recover from a node failure by restarting, not active/passive failover, so bear that in mind.
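qcow2 being an open, documented format is a genuine plus: you can poke at the images with standard KVM tooling, or even by hand. A trivial sketch of my own (nothing to do with Scale’s software) that checks a file’s qcow2 header magic and version:

import struct

QCOW2_MAGIC = 0x514649FB  # the bytes 'Q', 'F', 'I', 0xFB, per the open qcow2 spec

def qcow2_version(path):
    """Return the qcow2 version of an image file, or None if it isn't qcow2."""
    with open(path, "rb") as f:
        header = f.read(8)          # magic (4 bytes) + version (4 bytes), big-endian
    if len(header) < 8:
        return None
    magic, version = struct.unpack(">II", header)
    return version if magic == QCOW2_MAGIC else None

# Example (hypothetical path): qcow2_version("/var/lib/libvirt/images/vm01.qcow2") -> 3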

A RAM cache on each node helps to speed up access to commonly accessed blocks for VMs on a given node: they can read from cache without having to fetch from storage over the network. Scale also support a form of write-back caching for writes.
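The behaviour they describe is the classic node-local read cache with optional write-back. A minimal, purely illustrative sketch of the idea (my toy code, not Scale’s implementation):

from collections import OrderedDict

class NodeBlockCache:
    """Toy per-node RAM cache: reads are served from RAM when possible, writes
    are acknowledged from RAM and destaged to the (networked) pool later.
    Purely illustrative; nothing to do with Scale's actual code."""

    def __init__(self, pool, capacity=1024):
        self.pool = pool                # dict standing in for the cluster storage pool
        self.cache = OrderedDict()      # block_id -> data, in LRU order
        self.dirty = set()              # written blocks not yet flushed to the pool
        self.capacity = capacity

    def read(self, block_id):
        if block_id in self.cache:      # cache hit: no trip over the network
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.pool[block_id]      # miss: fetch from the pool
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        self._insert(block_id, data)    # write-back: acknowledge from RAM
        self.dirty.add(block_id)

    def flush(self):
        for block_id in self.dirty:     # destage dirty blocks to the pool
            self.pool[block_id] = self.cache[block_id]
        self.dirty.clear()

    def _insert(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            old_id, old_data = self.cache.popitem(last=False)   # evict LRU block
            if old_id in self.dirty:                            # dirty? destage it first
                self.pool[old_id] = old_data
                self.dirty.discard(old_id)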

Scale Marketing Strategy

Overall, the Scale offering seems to match their positioning well: it’s targeted at mid-size firms who don’t want all the extended bells and whistles (and prices!) that come with servers running VMware ESXi, or with a higher-end converged system like Nutanix or SimpliVity. It has plenty of the useful features needed to run typical mid-size workloads, and if the UX is as simple as they say, it’ll be easy for your typical IT generalist to operate.

IT for these midsize firms isn’t really value-adding; it’s just a cost of doing business. They have to have finance and accounting, HR, payroll, inventory tracking, email, a website, etc. There may be a couple of differentiating apps, but most of their value is in whatever product they sell, or the service they provide with humans rather than technology, because in an advanced economy the majority of GDP is services. There’s not much point in investing a lot in fancy technology, and training staff in how to drive it, when all it does for you is email, accounting, and payroll. This market doesn’t need, and often can’t afford, top-end performance or features.

If you can convince these firms that Scale gear is easier to run than a bunch of discrete servers (because your two IT folks can manage it more easily), and also you don’t have to pay for a bunch of features you don’t need, the business owner may well be swayed.

In fact, the biggest competitors for Scale and its ilk are cloud SaaS providers: GMail instead of Exchange, Office365 instead of Office on desktops. If you’re going to move to a new technology platform, do you choose the CapEx option of having gear on your premises, or do you just rent it from the cloud?

It’s an interesting conundrum, because the dust hasn’t settled on this argument yet; in fact, it’s barely gotten started. The illusion of control means smaller firms may well prefer having some tangible gear on their own premises to trusting their systems to a nebulous thing out there in the cloud, for the same reason that people feel better about driving than flying, even though driving is far more dangerous.

I look forward to hearing from Scale about what their customers are saying they value from Scale gear, and what kinds of objections they’re getting (and how they answer them). I think there’s a solid niche here in the mid-market for a fast-following, converged-infrastructure player.

Do take the time to read Scale’s Theory of Operations whitepaper that explains their technology and approach in more detail.

SFD5 Prep Work: SolidFire

I had a lot of trouble figuring out what SolidFire’s architecture looks like conceptually. I had to read a bunch of different whitepapers and ‘reference architecture’ documents on their website that were pretty light on conceptual detail, but had plenty of configuration file examples and other gritty detail that I don’t care about at this stage. The other material was super-high-level and didn’t really give me a nice two-page description, with a diagram, of how to plug one of these things into a storage network.

What I’ve been able to work out is this:

SolidFire is a shared-nothing distributed storage cluster, with a minimum of 5 nodes and up to 100 nodes in the cluster. Each node is 1RU high and has 10 SSDs in it, from 300GB to 960GB depending on the model. The nodes use replication of blocks for data protection, with 1 or more copies distributed around the cluster using SolidFire’s proprietary Helix data protection scheme, so it seems a lot like GFS, Lustre, or HDFS.
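My mental model of that style of replica placement, sketched in a few lines (entirely my own illustration of the general shared-nothing approach, not how Helix actually works):

import hashlib

NODES = [f"node{i:02d}" for i in range(1, 6)]   # minimum 5-node cluster
COPIES = 2                                       # primary block plus one replica

def placement(block_id, nodes=NODES, copies=COPIES):
    """Hash the block ID to a starting node, then put each additional copy on
    the next node around the ring so no two copies share a node. Illustrative
    only; SolidFire's Helix placement is proprietary."""
    start = int(hashlib.md5(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]

print(placement("lun7:block:000042"))   # e.g. ['node03', 'node04']

The point of spreading copies like this is that losing any one node or drive leaves every block with at least one surviving copy somewhere else, which is the failure maths I get into below.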

Each node has 2 x 10Gb SFP+ iSCSI ports, and there appears to be a Fibre Channel access node available as well that doesn’t have SSDs in it (but has 4 x 10Gb iSCSI ports as well as 4 x 16Gb FC ports); one assumes it participates in the cluster to provide FC access to the storage pool. I guess the extra iSCSI ports are there to avoid saturating the Ethernet side while serving the FC traffic, though the Ethernet is still 1.6x oversubscribed if you run all four FC ports at line rate.
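That oversubscription figure is just the port maths, assuming the FC node’s port counts above are right:

# Port maths for the (assumed) FC access node: 4 x 16Gb FC in, 4 x 10Gb Ethernet out.
fc_gbps = 4 * 16       # 64 Gb/s of potential FC traffic
eth_gbps = 4 * 10      # 40 Gb/s of Ethernet towards the storage nodes
print(f"Oversubscription: {fc_gbps / eth_gbps:.1f}x")   # 1.6x at full line rate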

I’m not sure how this storage pool is carved up into LUNs that get presented to hosts. Is it just one dirty big pool of storage? Are there logical groupings of LUNs into volumes (for replication to remote clusters, or snapshots, for example), or is everything done at a LUN level? I’d hope for the former, because databases hate it when their snapshots aren’t done across a consistency group. SolidFire mention grouping of something for storage QoS, as well as capacity, but I’m not sure if it’s something more than just LUNs. SolidFire do integrate with VMware vVols, which is nice, but that doesn’t explain how the multi-tenancy works inside the SolidFire cluster itself.

I get the failover ability of a shared-nothing cluster from a data storage perspective (you’d need 2 simultaneous failures of nodes or disks holding the same block replicas to lose data, and more than that if you keep more than two copies), but I’m not clear on how network port failover happens. How do the IPs for the iSCSI targets fail over between nodes? What about FC port WWNs? Do you have to configure multi-pathing for it to work? Do you design where the targets go, or does the system automatically figure it out for you?

Overall it seems like SolidFire may well have quite a nice offering here. Unlike Pure Storage, there’s no active/active HA pair of controllers mediating access to the whole cluster. If you go iSCSI, it looks like you could spread the LUN targets around a bunch of nodes and smooth out your I/O quite nicely. Similar to Pure and others, SolidFire has inline dedupe and compression. Pure boast $5-10/GB usable in their marketing materials, while SolidFire reckon they can come in at about $3/GB.

On the downside, shared-nothing scale-out systems can have issues when you have lots of nodes, because of the amount of inter-node communication required to keep state synchronised across the cluster. It’ll be interesting to hear how SolidFire have addressed this issue. Using the same 2 x 10GbE ports for both inter-node comms and serving data draws my attention as a possible bottleneck at scale, so I’d like to see what data SolidFire have about that.

Really, learning about SolidFire so far has raised more questions than it’s answered, so I look forward to hearing from their team about more of the details. Hopefully I’ll be able to get them to whiteboard the architecture of a SolidFire cluster and how the LUNs and multi-tenancy work, or figure it out well enough that I can draw you a picture myself.

SFD5 Prep Work: X-IO Storage

X-IO confuse me.

My overall impression of X-IO is that they had something interesting and special several years ago, but the changes to the industry have now passed them by.

Their 700 series arrays are hybrid SSD/HDD arrays with auto-tiering software to make the flash a kind of persistent cache. The 200 series arrays have 10k RPM SAS drives in them. Both lines do FC or iSCSI, so LUN-based block storage only. There is another line, the 2200 series, which does SMB 3.0 and NFS 4.1 NAS.

Their headline storage arrays appear to be active-active dual-controller designs, but with super-specialised proprietary hardware inside the box to get lower power consumption and better disk availability. The controllers apparently work like Tandems (now HP NonStop), where both controllers stay in sync for all operations, so if one of them dies, the other is already processing the same stuff and there’s no failover time (unlike a NetApp 7-mode takeover).

They use the term “in-situ remanufacturing” for what everyone else knows as hot-spares. Actually, that’s unfair. The physical HDDs live inside a proprietary module called a “datapac”, which wraps them in a bunch of proprietary hardware and software. They do RAID using ASICs to make it faster (they call it Matrix RAID, but the spec sheets still talk about RAID-5 and RAID-10 for working out usable capacity), and apparently monitor the drive telemetry to be able to spare out sub-drive components like platters. It sounds a bit like bad sector avoidance.
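The usable-capacity arithmetic the spec sheets imply is just the usual RAID maths; here’s a quick worked example with a hypothetical datapac of 10 x 900GB drives (my numbers for illustration, not an actual X-IO configuration):

# Usual RAID usable-capacity arithmetic. The 10-drive, 900GB datapac is a
# hypothetical example for illustration, not an actual X-IO configuration.
drives = 10
drive_gb = 900

raid5_usable = (drives - 1) * drive_gb      # one drive's worth of parity
raid10_usable = (drives // 2) * drive_gb    # everything mirrored

print(f"RAID-5 usable:  {raid5_usable} GB")   # 8100 GB
print(f"RAID-10 usable: {raid10_usable} GB")  # 4500 GB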

Straddle Strategy?

X-IO Value Proposition

X-IO’s value proposition is apparently that they’re the most cost-effective solution for people who want 40-800 IOPS per 100GB of storage. That’s a tough place to play.
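To put that band in concrete terms, for a hypothetical 10TB of capacity (my example, not an X-IO configuration):

# What "40-800 IOPS per 100GB" means for a hypothetical 10TB of capacity.
capacity_gb = 10_000
low, high = 40, 800      # IOPS per 100GB, X-IO's stated band

print(f"{capacity_gb * low / 100:,.0f} to {capacity_gb * high / 100:,.0f} IOPS")
# 4,000 to 80,000 IOPS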

Their product positioning is tricky: they’re not “cheap and deep”, nor are they high-performance. They’re not really scale-out. They don’t really stand out with any sort of unique features, which means they end up competing solely on cost.

I’m not sold on this strategy. You can’t do super-proprietary hardware and software in a cost-leader array unless it gives you major cost advantages. X-IO bang on about using enterprise-grade disks instead of low-duty-cycle disks to boost reliability. But that’s a value-add, not a low-cost feature. Huh?

And remember that flash prices are dropping really fast, as are the costs of HDDs. Maintaining any sort of cost advantage in a market that’s changing as rapidly as storage is a really tough business, and even worse for a startup without economies of scale or a major technology advantage over competitors. I just don’t see how X-IO can win, and continue to win, with this strategy.

If you go to the Applications section of X-IO’s website, you’ll see them claiming to be all things to all people, particularly if you hover over the navigation tab. OLTP and Server Virtualisation! VDI and Big Data! Cloud and Data Warehousing! The list of things that X-IO isn’t suitable for would be shorter.

X-IO seem really confused about what they want to be, so they’ve gone with a straddling strategy. They’ll get killed on performance by people like Pure Storage talking up equivalent IOPS/$/GB by using fancy compression and dedupe software (which I don’t see any mention of from X-IO). They’ll get killed on the cheap and deep stuff because others can do it with commodity storage and simpler hardware.

X-IO don’t have a niche to play in, which means they have to compete head-on with everyone else making a storage array these days. That’s a lot of competitors, and without some sort of differentiator, I just can’t find a reason to want to buy from them. Why would I choose X-IO over Pure Storage, or Nimble, or Tintri, or Tegile, or SolidFire, or Coho Data, or XtremIO, or StoreVirtual? It’s a massive list of alternatives, and that’s not even going into things like VSAN or Nutanix or SimpliVity.

I hope X-IO can talk me around in their presentation, and can help me to see what makes them special. And I really, really hope it isn’t simply cost.