SFD5 Prep Work: Veeam

Veeam have a long history of presenting at various Field Days, as you can see here. I wrote about them for Tech Field Day 9, in a prep post here, and in a review here.

Going on past performance, we can expect that Veeam will be presenting some sort of major product announcement at SFD5. Veeam V8 is coming out in the second half of the year, so perhaps we’ll be getting a “what’s new in V8” session?

Veeam just announced they’re going to support NetApp snapshots in V8, so I reckon that’s a solid bet for at least part of what we’ll cover.

Veeam and NetApp

This partnership announcement is interesting because, contrary to that El Reg article, NetApp do have their own backup software: SnapManager, including a version specifically for VMs, SnapManager for Virtual Infrastructure.

I’ve never really liked SnapManager. Back when I used to do NetApp architectures, the different types of SnapManager didn’t talk to one another, so there was no central point of control for the backup admins, and it wouldn’t integrate with other backup software that these heterogeneous environments used (like NetBackup, Tivoli, CommVault, etc.). We would joke that NetApp needed to create SnapManager for SnapManager.

NetApp started integrating snapshot management into Operations Manager, but it only worked in a very limited set of cases, and it didn’t talk to SnapManager. I seem to remember SnapManager didn’t support vFilers when we first looked at it, and I know Operations Manager struggled with them in early versions too. vFilers were always second-class citizens compared to non-virtual Filers. Maybe it’s better now, but as far as I can tell you still can’t do in-place 7-mode to C-mode upgrades, and you’d think that’d be more important to fix first.

Anyway, it’s interesting that NetApp are now partnering with Veeam. Unlike SnapManager, people seem to love them some Veeam, so perhaps NetApp have finally thrown in the towel on their own attempts at writing backup software, or at least given customers a decent alternative. I’ll be sure to drill into how the integration with NetApp snapshots (and SnapVault, and SnapMirror) works, because I know waaaay too much about them and the painful cases where they don’t work well.

Other Options

If we don’t talk about Veeam and NetApp snapshots, I’m not sure what other new big-ish features we’ll be discussing.

Let’s go with some wild speculation. Then I can claim I’m brilliant if any of my guesses pay off. ;)

Veeam can already back up to lots of public clouds with their Cloud Edition, so perhaps we’ll learn about how it can target various private cloud things, like OpenStack? That’s not super-compelling, unless you want to be able to use your own gear like a public cloud.

Maybe Veeam will announce one or two other partnerships with storage array vendors. That might be a little tricky politically, given how recent the NetApp announcement was. If Veeam came out saying they have an EMC partnership as well, that’d be weird. EMC already has a backup product of their own in NetWorker (née Legato), so it’s highly unlikely. As El Reg mentioned, maybe HDS or Fujitsu or Dell? Even then, announcing another major storage array partnership this soon would put noses out of joint at NetApp.

Perhaps Veeam will announce some sort of networking partnership, maybe with Riverbed or similar around WAN acceleration? Alternatively, maybe some sort of “software defined networking” company alliance, though I can’t think of any immediate benefits to Veeam. They already partner with Cisco to make VM restores on UCS faster. It won’t be VMware and VNX though, given the shenanigans at PEX.

Whatever it is, we’ll have a good time. It’s always fun getting right into the tech with the Veeam folks, and we know they’re not afraid of a whiteboard.

We’ll find out soon enough!

SFD5 Prep Work: Scale Computing

Scale sell a converged platform: compute and storage in pizza boxes that glom together to make a cluster of 3 to 8 nodes, like Voltron. It’s based on KVM, with proprietary bits added into the mix.

Scale make a big deal about “no licensing”, or avoiding what they call the ‘vTax’. I have to admire the choice of vTax as the descriptor; it’s some clever marketing. Scale emphasise that you don’t have to pay licensing fees for VMware (or Hyper-V, or VirtualBox, or whatever), which saves you money. Sure.

You know how much I hate an “it’s cheaper!” positioning, but in this case, it looks like it might actually be well aligned with their marketing strategy.

Let’s look at the specs of the top-of-the-line nodes Scale sell, the HC3x, and I’ll explain my thinking.

Scale Specs

HC3x nodes have 6-core, 12-thread 2.2GHz CPUs, with 64GB of RAM (at 1333MHz). Storage is 4 SAS drives: 15k RPM 600GB, 10k RPM 900GB, or 10k RPM 1200GB. Networking is 2 x 10GbE active/passive (presumably for data and cluster comms) and 2 x 1GbE (for management).

Using their VM sizing assumptions, Scale say you can get 200 VMs on a cluster. That’s a small to medium size cluster, so now we can clearly see the target market for Scale: small to medium enterprise, and that’s medium in an Australian sense, not a US sense (everything is bigger in Texas).

The storage of the nodes is aggregated into one dirty great pool of “some storage” and all of it is available to all of the nodes if they want it, which is nice. There is a protocol abstraction layer, which means you can talk to the storage pool from remote servers over your favourite Ethernet-based protocol: SMB, NFS, or iSCSI. This means you can have both file and block storage from the one platform, like a NetApp (with no FC), but it does compute as well. Nifty!

VM files are stored internally on an NFS fileshare, but appear to the VM like a disk, similar to NFS datastores on VMware. The virtual hard disks use the open qcow2 format. VMs recover from a node failure by restarting, not active/passive failover, so bear that in mind.

A RAM cache on each node helps to speed up access to commonly read blocks for VMs on a given node: they can read from cache without having to fetch from storage over the network. Scale also support a form of write-back caching to absorb some writes as well.
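Scale don’t publish the cache internals, so this is the general idea rather than their implementation: a per-node LRU map of hot blocks, with the slow network fetch only on a miss. A toy sketch:

```python
from collections import OrderedDict

class BlockReadCache:
    """Toy per-node LRU read cache: hot blocks served from RAM, misses
    fetched from the backing storage pool (a callable here)."""
    def __init__(self, fetch_from_pool, capacity_blocks=1024):
        self.fetch = fetch_from_pool
        self.capacity = capacity_blocks
        self.cache = OrderedDict()  # block_id -> data, in LRU order
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark most-recently-used
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        data = self.fetch(block_id)           # slow path: over the network
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least-recently-used
        return data
```

The write-back side would add a dirty-block map flushed to the pool asynchronously, which is where the interesting failure-mode questions live.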

Scale Marketing Strategy

Overall, the Scale offering seems to match their positioning well: it’s targeted at mid-size firms who don’t want all the extended bells and whistles (and prices!) that come with running VMware ESXi on their servers, or a higher-end converged system like a Nutanix or SimpliVity. It has plenty of the useful features needed to run typical mid-size workloads, and if the UX is as simple as they say, it’s easy for your typical IT generalist to operate.

IT for these midsize firms isn’t really value-adding, it’s just cost of doing business. They have to have finance and accounting, HR, payroll, inventory tracking, email, a website, etc. There will be a couple of differentiating apps, possibly, but most of their value is in whatever product they sell, or service they provide with humans, not technology, because in an advanced economy, the majority of GDP is services. There’s not much point in investing a lot in fancy technology, and training staff in how to drive it, when all it does for you is email, accounting, and payroll. This market doesn’t need, and often can’t afford, top-end performance or features.

If you can convince these firms that Scale gear is easier to run than a bunch of discrete servers (because your two IT folks can manage it more easily), and also you don’t have to pay for a bunch of features you don’t need, the business owner may well be swayed.

In fact, the biggest competitors for Scale and its ilk are cloud SaaS providers: GMail instead of Exchange, Office365 instead of Office on desktops. If you’re going to move to a new technology platform, do you choose the CapEx option of having gear on your premises, or do you just rent it from the cloud?

It’s an interesting conundrum, because the dust hasn’t settled on this argument yet; in fact, it’s barely gotten started. The illusion of control means smaller firms may well prefer to have some tangible gear in their own premises than to trust their systems to a nebulous thing out there in the cloud, for the same reason that people feel better about driving than flying, even though driving is far more dangerous.

I look forward to hearing from Scale about what their customers are saying they value from Scale gear, and what kinds of objections they’re getting (and how they answer them). I think there’s a solid niche here in the mid-market for a fast-following, converged-infrastructure player.

Do take the time to read Scale’s Theory of Operations whitepaper that explains their technology and approach in more detail.

SFD5 Prep Work: SolidFire

I had a lot of trouble figuring out what SolidFire’s architecture looks like conceptually. I had to read a bunch of different whitepapers and ‘reference architecture’ documents on their website that were pretty light on the conceptual detail, but had plenty of configuration file examples and other gritty detail that I don’t care about at this stage. The other stuff was super-high level and didn’t really give me a nice, two page description with a diagram of how to plug one of these things into a storage network.

What I’ve been able to work out is this:

SolidFire is a shared-nothing distributed storage cluster, with a minimum of 5 nodes, and up to 100 nodes in the cluster. Each node is 1RU high and has 10 SSDs in it, from 300GB to 960GB depending on the model. The nodes use replication of blocks for data protection, with 1 or more copies distributed around the cluster using SolidFire’s proprietary Helix data protection, so it seems a lot like GFS, Lustre, or HDFS.
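SolidFire haven’t published how Helix actually distributes the copies, but the general shared-nothing pattern is easy to sketch: hash each block to a home node, then place the remaining replicas on distinct nodes so a single node failure can’t take out every copy of a block. A toy illustration (not Helix itself):

```python
import hashlib

def place_replicas(block_id, nodes, copies=2):
    """Toy shared-nothing placement: hash the block ID to pick a home
    node, then put each extra copy on the next distinct node around the
    ring, so no single node holds every replica of a block."""
    digest = int(hashlib.sha256(block_id.encode()).hexdigest(), 16)
    start = digest % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]
```

With 2 copies, losing a block requires 2 simultaneous failures of exactly the nodes holding that block’s replicas; the hash spreads blocks evenly, so every node added also adds usable capacity and IOPS.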

Each node has 2 x 10Gb SFP+ iSCSI ports. There also appears to be a Fibre Channel access node available that doesn’t have SSDs in it (but has 4 x 10Gb iSCSI ports, as well as 4 x 16Gb FC ports); one assumes it participates in the cluster to provide FC access to the storage pool. I guess the extra iSCSI ports are to avoid saturating the Ethernet trying to serve 64Gb of potential FC traffic, though the Ethernet is still 1.6x oversubscribed if you run all the FC at line rate.
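That 1.6x figure is just the ratio of front-end FC bandwidth to back-end Ethernet bandwidth on the access node, which is easy to sanity-check:

```python
# Back-of-envelope check on the FC access node's oversubscription:
# front-end FC bandwidth versus back-end Ethernet bandwidth.
fc_gbps = 4 * 16   # 4 x 16Gb FC ports: 64Gb/s of potential FC traffic
eth_gbps = 4 * 10  # 4 x 10Gb iSCSI ports: 40Gb/s toward the cluster
oversubscription = fc_gbps / eth_gbps
print(oversubscription)  # 1.6
```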

I’m not sure how this storage pool is carved up into LUNs that get presented to hosts. Is it just one dirty big pool of storage? Are there logical groupings of LUNs into volumes (for replication to remote clusters, or snapshots, for example) or is everything done at a LUN level? I’d hope there are groupings, because databases hate it when their snapshots aren’t done across a consistency group. SolidFire mention grouping of something for storage QoS, as well as capacity, but I’m not sure if it’s something more than just LUNs. SolidFire do integrate with VMware vVols, which is nice, but that doesn’t explain how the multi-tenancy works inside the SolidFire cluster itself.

I get the failover ability of a shared-nothing cluster from a data storage perspective (you’d need 2 simultaneous failures of nodes or disks holding the same data block replicas to lose data, and more if you have 2 or more data copies) but I’m not clear on how network port failover happens. How do the IPs for the iSCSI targets fail over between nodes? What about FC port WWNs? Do you have to configure multi-pathing for it to work? Do you design where the targets go, or does the system automatically figure it out for you?

Overall it seems like SolidFire may well have quite a nice offering here. Unlike Pure Storage, there’s no active/active HA pair of controllers mediating access to the whole cluster. If you go iSCSI, it looks like you could spread the LUN targets around a bunch of nodes and smooth out your I/O quite nicely. Similar to Pure and others, SolidFire has inline dedupe and compression. Pure boast $5-10/GB usable in their marketing materials, while SolidFire reckon they can come in at about $3/GB.
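SolidFire don’t document their dedupe internals, but inline dedupe plus compression is conceptually simple: fingerprint each incoming block, and only store (compressed) bytes you haven’t seen before. A toy sketch of the idea:

```python
import hashlib
import zlib

class InlineDedupStore:
    """Toy inline dedupe + compression: each unique block is stored once
    (compressed), keyed by its content hash; duplicate writes only bump
    a reference count."""
    def __init__(self):
        self.blocks = {}    # content hash -> compressed block data
        self.refcount = {}  # content hash -> number of logical writers

    def write(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = zlib.compress(data)  # first copy: store it
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def read(self, key):
        return zlib.decompress(self.blocks[key])
```

The $/GB-usable marketing numbers from both vendors depend entirely on the dedupe and compression ratios your data actually achieves, which is worth remembering when comparing quotes.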

On the downside, shared-nothing scale-out systems can have issues when you have lots of nodes because of the amount of inter-node communication required to keep state synchronised across the cluster. It’ll be interesting to hear how SolidFire have addressed this issue. Using the same 2 x 10GbE ports for both inter-node comms and serving data draws my attention as a possible bottleneck at scale, so I’d like to see what data SolidFire have about that.

Really, learning about SolidFire so far has raised more questions than it’s answered, so I look forward to hearing from their team about more of the details. Hopefully I’ll be able to get them to whiteboard the architecture of a SolidFire cluster and how the LUNs and multi-tenancy works, or figure it out enough that I can draw you a picture myself.

SFD5 Prep Work: X-IO Storage

X-IO confuse me.

My overall impression of X-IO is that they had something interesting and special several years ago, but the changes to the industry have now passed them by.

Their 700 series arrays are hybrid SSD/HDD arrays with auto-tiering software that makes the flash a kind of persistent cache. The 200 series arrays have 10k RPM SAS drives in them. Both lines do FC or iSCSI, so LUN-based block storage only. There is another line, the 2200 series, which does SMB 3.0 and NFS 4.1 NAS.

Their headline storage arrays appear to be active-active dual-controller arrays, but with super-specialised proprietary hardware inside the box to get lower power consumption and better disk availability. The controllers apparently work like Tandems (now HP NonStop), where both controllers are in sync with each other for all operations, so if one of them dies, the other one is already processing the same stuff and there’s no failover time (unlike NetApp 7-mode).

They use the term “in-situ remanufacturing” for what everyone else knows as hot-spares. Actually, that’s unfair. The physical HDDs live inside a proprietary module called a “datapac” which wraps them in a bunch of proprietary hardware and software. They do RAID using ASICs to make it faster (they call it Matrix RAID, but the spec sheets still talk about RAID-5 and RAID-10 for working out usable capacity), and apparently monitor the drive telemetry to be able to spare out sub-drive components like platters. It sounds a bit like bad sector avoidance.
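Since the spec sheets still quote RAID-5 and RAID-10 for usable capacity, the usual back-of-envelope formulas apply (ignoring whatever the datapac’s sub-drive sparing reserves; the drive counts and sizes here are just examples, not an X-IO configuration):

```python
def usable_raid5(drives, drive_gb):
    """RAID-5: one drive's worth of capacity goes to parity."""
    return (drives - 1) * drive_gb

def usable_raid10(drives, drive_gb):
    """RAID-10: mirrored pairs, so half the raw capacity is usable."""
    return (drives // 2) * drive_gb

# A hypothetical set of ten 900GB drives:
print(usable_raid5(10, 900))   # 8100 GB usable
print(usable_raid10(10, 900))  # 4500 GB usable
```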

Straddle Strategy?

X-IO’s value proposition is apparently that they’re the most cost-effective solution for people who want 40-800 IOPS per 100GB of storage. That’s a tough place to play.

Their product positioning is tricky: they’re not “cheap and deep”, nor are they high-performance. They’re not really scale-out. They don’t really stand out with any sort of unique features, which means they end up competing solely on cost.

I’m not sold on this strategy. You can’t do super-proprietary hardware and software in a cost-leader array unless it gives you major cost advantages. X-IO bang on about using enterprise-grade disks instead of low-duty-cycle disks to boost reliability. But that’s a value add, not a low-cost feature. Huh?

And remember that flash prices are dropping really fast, as are the costs of HDDs. Maintaining any sort of cost advantage in a market that’s changing as rapidly as storage is a really tough business, and even worse for a startup without economies of scale or a major technology advantage over competitors. I just don’t see how X-IO can win, and continue to win, with this strategy.

If you go to the Applications section of X-IO’s website, you’ll see them claiming to be all things to all people, particularly if you hover over the navigation tab. OLTP and Server Virtualisation! VDI and Big Data! Cloud and Data Warehousing! The list of things that X-IO isn’t suitable for would be shorter.

X-IO seem really confused about what they want to be, so they’ve gone with a straddling strategy. They’ll get killed on performance by people like Pure Storage talking up equivalent IOPS/$/GB using fancy compression and dedupe software (which I don’t see a mention of from X-IO). They’ll get killed on cheap and deep because others can do it with commodity storage and simpler hardware.

X-IO don’t have a niche to play in, which means they have to compete head-on with everyone else making a storage array these days. That’s a lot of competitors, and without some sort of differentiator, I just can’t find a reason to want to buy from them. Why would I choose X-IO over Pure Storage, or Nimble, or Tintri, or Tegile, or SolidFire, or Coho Data, or XtremIO, or StoreVirtual? It’s a massive list of alternatives, and that’s not even going into things like VSAN or Nutanix or SimpliVity.

I hope X-IO can talk me around in their presentation, and can help me to see what makes them special. And I really, really hope it isn’t simply cost.

SFD5 Prep Work: Diablo Technologies

Diablo Technologies are a Canadian company who have developed flash that you can access like memory, called MCS: Memory Channel Storage. It’s like the inverse of a ramdisk, for those of us old enough to remember such things: put flash into your server, but access it over the memory channel.

Memory Channel Storage

The Diablo Technologies Memory Channel Storage Architecture

The advantages for performance are pretty clear: the fastest possible way to access data from the CPU is if it’s already loaded in a register somewhere:

cmp ecx, edx    ; compare two values already sitting in registers: no memory access at all

There aren’t many registers, so you have to put data somewhere else: RAM or other storage. I/O to memory is at least one order of magnitude (often more) faster than I/O to even directly attached disk, let alone disk on the other end of a network cable somewhere.

Plus, access to I/O goes over a data bus of some sort to talk to the CPU. The data bus is shared between many devices, and while some systems have more than one, they don’t have a dedicated bus for each I/O device, so you will get some sort of contention that has to be managed; only one thing can talk over the data bus at a time.

Diablo bypass that by plugging flash modules directly into the DDR3 DIMM slots in a server, making it look like memory. Well, not quite. There’s a mediating driver that manages the access to these speciality devices, and you need a UEFI/BIOS update, but they’re apparently functional on Windows, Linux and VMware, though I don’t know which specific OS versions are required.

Niche Problem

This is a solution for people who find flash too slow and memory too confining, and can’t afford to just buy loads of DRAM. You get faster access than flash over PCIe, and you get persistence not available with DRAM, and flash is cheaper than DRAM too. There aren’t many workloads that I’d imagine would require this sort of solution, depending on the price, but I can see situations where it would be useful: loading a lot of data into Hadoop for processing, for example, or a high-frequency trading application.

And really, why do we still have this weird bus-based access to I/O? I mean, with DMA controllers and memory-mapped I/O, we’re trying to access everything as if it was memory anyway. Why not just plug storage devices directly into memory?

And if we continue to see convergence of storage and compute, why bother with flash disks when you can just have flash memory? Leave the disk I/O for everything too slow for memory channel access and bring the data even closer to the CPU.


There is weirdness about this company, though. They were apparently founded in 2003, and their website doesn’t seem to have gotten a lot of love since then. There’s precious little in the Solutions section, for example. This is in spite of raising $36 million a bit over a year ago. It looks like the company has either been in stealth mode prior to 2012, or it’s just taken a long time to get this product designed and built.

One other thing that really needs fixing is whatever happened to the charts in this whitepaper [PDF]. Here’s the most egregious example:

Diablo Chart

This chart seems to suggest that you get two different latencies when you go above about 160,000 IOPS. That implies the latency-versus-IOPS relationship isn’t a simple function at all: the curve doubles back, with more than one y value for certain x values. ORLY? That would be a very surprising result. Show me the data, please, and how you managed to get it.

I reckon it’s more likely to be an error caused by an overly-enthusiastic marketing person who discovered Bézier curves in Illustrator.

But I’m happy to be corrected either way.


Diablo got in touch with me, and said that the charts look like this because there’s a third variable that isn’t shown: effective queue depth:


The plots show data collected at increasing effective Queue Depths. Eventually, increasing the eff QD will have diminishing IOPS returns while increasing latency. As the eff QD increases, it’s not unusual (as the data indicates) to see IOPS degrade at some point… likely as a result of tuning to maximize SSD performance at an eff QD “sweet spot”

These plots are, therefore, 2-dimensional projections of a 3-dimensional curve. Actually, it’s a bit worse than that: it’s a plot of 2 dependent variables with the independent variable left off.
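Diablo’s explanation is easy to reproduce with Little’s law (concurrency = throughput × latency): sweep queue depth against a latency that grows super-linearly past the sweet spot, and IOPS rises then falls while latency keeps climbing, so the latency-vs-IOPS projection folds back on itself. A toy model with made-up coefficients, purely to show the shape:

```python
def sweep_queue_depth(depths):
    """Toy model of an IOPS/latency sweep. Latency grows super-linearly
    with queue depth (invented coefficients), and by Little's law
    IOPS = queue_depth / latency. Past the sweet spot IOPS falls while
    latency keeps rising, so the latency-vs-IOPS projection folds back."""
    points = []
    for qd in depths:
        latency_s = 50e-6 + 2e-6 * qd + 0.5e-6 * qd ** 2  # invented curve
        iops = qd / latency_s
        points.append((qd, iops, latency_s))
    return points

points = sweep_queue_depth(range(1, 65))
peak_iops = max(iops for _, iops, _ in points)
# The deepest queue depth delivers fewer IOPS than the peak, at much
# higher latency: two different latencies for roughly the same IOPS.
```

Plot latency against IOPS from that sweep and you get exactly the doubled-back curve in the whitepaper, with the independent variable (queue depth) hidden.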


More Info

This little chart whoopsie aside, Diablo have apparently partnered with SanDisk to create this MCS stuff, with SanDisk doing the hardware and Diablo the software, more-or-less. The injection of funds, and a few key hires, indicate that Diablo are starting to ramp up their marketing and get the word out about what their MCS product can do.

I expect that Storage Field Day is part of that effort, so I look forward to hearing more from Diablo about where they are in the product development cycle, and what their go-to-market strategy is.

As an interesting aside, Michael Cornwall from Pure Storage is on the Technical Advisory Board for Diablo Technologies.