Abstractions Help More Than Hinder

Only the other day I commented to a colleague that the seemingly constant outbreaks of storage industry infighting on Twitter (aka twitpiss) had pretty much stopped. Then I got back on Twitter after a long and busy day to see that one had erupted over the EMC XtremIO v3.0 upgrade thing. Sigh.

I don’t want to revisit the specific issue here, because plenty has been written about it elsewhere by smarter people than me. But I do want to talk about abstractions and how they’re a really good thing, and how they help in situations like this.

Abstractions Are Good

I never worry about which platter or track of a spinning hard drive my data is written to. It just doesn’t meaningfully affect anything I do. The firmware and drivers for using the disk take care of all those details for me, precisely so that I don’t need to care. Similarly, with the flash disk in my laptop, and all the SD cards I have for cameras and audio gear, I never worry about where in the cells my data actually goes. I just don’t care about wear-levelling on a daily basis, because other smart people have figured out a way to solve the problems, at least so that I don’t need to care when I’m typing this post or doing any of my other work.

All of that detail has been abstracted away, and it makes the storage more useful.

The same goes for RAID striping. When I use striping, I don’t care which block of data goes on which disk for each stripe. I just care (a bit) that the data uses more than one disk. I might care a bit if a specific disk breaks, but then, that’s why I design my RAID in such a way that I don’t really care, because all of the disks are the same. They’re commodity (Inexpensive Disk, right?) disks, so any one of them could break and I just swap it out and my data is fine. That’s the whole point.
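To make the striping point concrete, here is a minimal sketch of the bookkeeping I never have to think about. It assumes the simplest possible layout: a stripe unit of one block, no parity, blocks dealt round-robin across the disks. The names `locate_block` and `StripedVolume` are made up for illustration and don't come from any real RAID implementation.

```python
from typing import List, Optional, Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping with a one-block stripe unit and no parity.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


class StripedVolume:
    """A toy striped volume: callers see one flat run of blocks."""

    def __init__(self, num_disks: int, blocks_per_disk: int) -> None:
        # Each "disk" is just a list of block slots held in memory.
        self._disks: List[List[Optional[bytes]]] = [
            [None] * blocks_per_disk for _ in range(num_disks)
        ]

    def write(self, logical_block: int, data: bytes) -> None:
        disk, offset = locate_block(logical_block, len(self._disks))
        self._disks[disk][offset] = data

    def read(self, logical_block: int) -> Optional[bytes]:
        disk, offset = locate_block(logical_block, len(self._disks))
        return self._disks[disk][offset]


volume = StripedVolume(num_disks=3, blocks_per_disk=4)
volume.write(5, b"hello")
# The caller only ever deals in logical block numbers.
print(volume.read(5))
```

Whoever calls `read` and `write` never sees a disk index. That's the whole abstraction.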

My filesystem is another abstraction. I can put it on a specific disk, or a group of them configured with RAID. Now I’m dealing with groups of data in files, not blocks, because I don’t really care about blocks. Or superblocks. Or inodes. Because I don’t have to, and really, why should I?

As I keep abstracting away the details, I care less and less about what actually goes on at those lower levels, because I don’t have to. Smart people have worked very hard so that I don’t have to care if I don’t want to. I just choose the level of abstraction that suits what I want to do. I don’t need to understand how an internal combustion engine works in order to drive a car, and I don’t have the foggiest idea how capacitive smartphone screens work.

When to Care

I wonder, in this age of cloud computing and virtual machines and other such abstractions, why are we even having a conversation about disruptive upgrades to a storage controller? Why should this even matter today, on any vendor’s equipment? Why aren’t client systems able to just point at “some storage” and not care about what’s at the other end? Why are we still using endpoint-sensitive protocols like Fibre Channel, NFS, and SMB?

If I moved my filesystem from ext4 to btrfs (or ZFS), would it really change how I use Firefox? I notice that having an SSD in my laptop is faster (and smaller) than having an HDD, but other than that, there’s no functional difference. Why can’t I just decide that I’d like my storage to be over there now and have it just happen?

Why aren’t we abstracting above storage arrays more than we are, just as we’ve layered abstractions on top of blocks, platters, and disks?

Is it purely inertia from having so many legacy systems that don’t support scale-out clustering, or applications that are built to only understand legacy network storage protocols?

Or is what we have good enough for enough use cases that change just isn’t worth the bother? Do our abstractions above the array level need to get much better, much easier to use, so that, like smartphones, people flock to them because they offer so much more than what came before?

My Take

I think the next level of abstraction, above the array, is inevitable, and is already here in multiple forms. We already have scale-out storage, we have server SANs, and we have software-based client and server endpoints.

And I think that, somewhere, a consumer-oriented device will pave the way for substantial change. We already have people using iCloud, and Google Drive, and SkyDrive, and Dropbox, in ways that offer a dramatically better user experience than copying files and folders around. I long for a cross-platform, client-based driver of some kind that lets me abstract away all the storage devices that live out there so that I can easily, and portably, just point at “some storage” and be done with it.
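As a rough sketch of what pointing at “some storage” could look like from the client side, here is a tiny interface with two interchangeable backends and a function that moves data between them. Everything here is hypothetical: the `Storage` interface, the two backend classes, and `migrate` are names invented for this example, not any real product’s API, and a real driver would also have to handle consistency, failures, and access control.

```python
from pathlib import Path
from typing import Dict, List, Protocol


class Storage(Protocol):
    """Hypothetical minimal interface: all the client ever sees."""

    def put(self, name: str, data: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...
    def names(self) -> List[str]: ...


class MemoryStorage:
    """Backend that keeps everything in a dict."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._items[name] = data

    def get(self, name: str) -> bytes:
        return self._items[name]

    def names(self) -> List[str]:
        return list(self._items)


class DirectoryStorage:
    """Backend that keeps each item as a file in one directory.

    Assumes item names are plain file names with no path separators.
    """

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, data: bytes) -> None:
        (self._root / name).write_bytes(data)

    def get(self, name: str) -> bytes:
        return (self._root / name).read_bytes()

    def names(self) -> List[str]:
        return [p.name for p in self._root.iterdir() if p.is_file()]


def migrate(source: Storage, target: Storage) -> None:
    """Copy every item from source to target using only the interface."""
    for name in source.names():
        target.put(name, source.get(name))
```

Code written against `Storage` doesn’t change when the data moves. Only the object it’s handed does, which is the “my storage is over there now” experience I’m after.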

