DFD1 Prep Post: Hedvig

It has become something of a tradition for me to write a bunch of preparation posts on the companies I’ll be seeing at each Tech Field Day event. With Data Field Day 1 just around the corner (next week! Ah!) it’s about time I wrote up a little something about the latest batch of presenters.


The first lucky company to fall under my beady gaze is Hedvig. I assume this is the anglicised spelling of something from German… which I’ve just looked up and apparently yes it is! Wikipedia says it’s the Scandinavian form of Hedwig, which is from Old High German (and various other sources do too).

It is a Germanic name consisting of the two elements hadu “battle, combat” and wig “fight, duel”.

I guess Hedvig Inc are fond of tautologies.

What Do They Do?

Hedvig make Software Defined Storage. I know, yawn, but wait! They seem to be doing sort-of nifty things here, just by virtue of the breadth of what their stuff does.

The software has essentially two bits:

  1. The Hedvig Storage Service, which is software you install on commodity x86 hardware and, apparently, ARM. The software joins all the hardware nodes together into a scale-out, distributed storage cluster. This is the kind of thing that any scale-out storage service needs to do, including the ones that run under the covers of the server-SAN/hyper-converged players.
  2. The Hedvig Storage Proxy, which is a software agent that you install as a VM or Docker container to provide access to the Hedvig Storage Service. This is your gateway stuff. Again, every other storage device ever has something like this. It might just be a filesystem abstraction that sits on top of the SCSI controller, but it’s what turns chunks of ones-and-zeros into something easier to use.
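
To make that two-bits split a little more concrete, here’s a toy Python sketch of the general pattern: a cluster that nodes join, plus a proxy that works out which nodes hold a given block. This is emphatically not Hedvig’s design, and I have no idea yet how their placement actually works. The class names, the rendezvous hashing and the replica count of 3 are all my own assumptions, picked purely for illustration.

```python
import hashlib


class ToyCluster:
    """Hypothetical stand-in for a scale-out storage service (not Hedvig's code)."""

    def __init__(self, replicas=3):
        self.replicas = replicas  # assumed replica count, purely illustrative
        self.nodes = {}           # node name -> {block_id: bytes}

    def join(self, name):
        """Add a node (x86, ARM, whatever) to the cluster."""
        self.nodes[name] = {}

    def owners(self, block_id):
        """Pick replica nodes by rendezvous (highest-random-weight) hashing."""
        def weight(node):
            digest = hashlib.sha256(f"{node}:{block_id}".encode()).hexdigest()
            return int(digest, 16)

        ranked = sorted(self.nodes, key=weight, reverse=True)
        return ranked[: self.replicas]


class ToyProxy:
    """Hypothetical gateway: turns read/write calls into per-node operations."""

    def __init__(self, cluster):
        self.cluster = cluster

    def write(self, block_id, data):
        # Store a copy of the block on every node that owns it.
        for node in self.cluster.owners(block_id):
            self.cluster.nodes[node][block_id] = data

    def read(self, block_id):
        # Return the first copy found among the owning nodes.
        for node in self.cluster.owners(block_id):
            if block_id in self.cluster.nodes[node]:
                return self.cluster.nodes[node][block_id]
        raise KeyError(block_id)


cluster = ToyCluster()
for name in ["x86-a", "x86-b", "arm-pi2", "x86-c"]:
    cluster.join(name)

proxy = ToyProxy(cluster)
proxy.write("vol1/block42", b"ones-and-zeros")
print(proxy.read("vol1/block42"))
```

The point of the split is that the caller only ever talks to the proxy; which nodes actually hold the bytes is the cluster’s problem.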

There are already two bits of intriguing stuff here. Firstly, ARM support for the storage engine. Low-power storage clusters, anyone? Who wants to fire up a Raspberry Pi 2 and test this out? I know I do.

And secondly, the access system as a Docker container? Talk about jumping on the bandwagon with both feet! Bare metal scale-out storage cluster is what I see. Fun fun.

But Wait, There’s More

Ok, so some novelty, but basically this has been done before, yes? Scale Computing have SCRIBE underneath their KVM-based hyper-converged offering, Nutanix have their own Paxos-based thing, and EMC have ScaleIO (which will apparently be downloadable for free from 29 May 2015), plus there are others.

Ok, sure. But here’s the thing, or rather, things.

The Storage Service can be installed on-site or in public clouds, or both, apparently “to create a single storage cluster that is implicitly hybrid.” Kinda like Microsoft StorSimple, only not tied to Azure. Nifty!

The Storage Proxy can run in a VM, or a container, and is designed to plug into existing infrastructure without changing existing administrator workflows too much. This is incredibly cool for reasons I’ll elaborate on in a minute. Which environments? vSphere, Hyper-V, KVM, OpenStack (via Cinder) and Xen, and bare metal! When was the last time you saw a startup supporting all of these platforms just out of stealth?

Ah, but how do you talk to this magical storage I hear you ask. Are we limited to the legacy evil that is LUNs? Or some painful file protocol like NFSv2?

[Figure: Hedvig Storage Platform Architecture]

Fear not! Hedvig will give you LUNs via iSCSI if you really want them. You can have NFS as well if you want, in both v3 and v4! My contacts at Hedvig are checking on NFS v4.1/pNFS support as well, so it might even do that already. SMB is coming (next release, 6-12 months, roadmap, don’t hold me to it) and will be SMBv3! Yay! It’s all Ethernet, no Fibre Channel, but who cares about Fibre Channel?

Did someone say object storage? Oh yeah, S3 and Swift support, too. Done.
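
For my own benefit, here’s that protocol list boiled down into a small Python lookup. The statuses are just my reading of the above, not an official support matrix, and the `protocols_with` helper is a name I made up for this sketch.

```python
# Protocol status as described in this post; not an official support matrix.
PROTOCOLS = {
    "iSCSI": ("block", "available"),
    "NFSv3": ("file", "available"),
    "NFSv4": ("file", "available"),
    "NFSv4.1/pNFS": ("file", "unconfirmed"),  # still being checked
    "SMBv3": ("file", "roadmap"),             # next release, no promises
    "S3": ("object", "available"),
    "Swift": ("object", "available"),
    "Fibre Channel": ("block", "not supported"),
}


def protocols_with(status, kind=None):
    """Return protocol names with the given status, optionally filtered by kind."""
    return sorted(
        name
        for name, (proto_kind, proto_status) in PROTOCOLS.items()
        if proto_status == status and (kind is None or proto_kind == kind)
    )


print(protocols_with("available"))
print(protocols_with("available", kind="object"))
print(protocols_with("roadmap"))
```

Block, file and object all out of one cluster is the bit that got my attention.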

And this is about the point where I sat back and thought “Hmm, perhaps these Hedvig people aren’t so boring after all”.

Not Storage Field Day

But this is a Data Field Day, not Storage Field Day, so I’ll try to focus less on the storage-y-ness and look more at what this means for data, and data management.

Tell me more about this magical “some storage” pool that can live both on-site and in the cloud, simultaneously. How does that affect data protection, snapshots, etc? Can I more easily find the data I want? Will this help me move my data between clouds and on-site more easily than with an export/import operation? (Now that I think about it a bit, I’d say it probably can, and it could be very cool, and I’ll need to think about this some more.)

How does this help me use data more effectively? Is it particularly good for connecting to data analysis tools? Can I connect it directly to R (or Stata, or SPSS, or SAS, or whatever tool you want to use)? What about Hadoop and friends?

What would you like to know more about?

I’m quite intrigued now, and in a way I didn’t think I was going to be. I might even be, astoundingly, a little bit excited about the implications of this stuff.
