This is part of a series on the SNIA Blogfest 2010.
NetApp were the fourth, and final, vendor of the day. We were met by John Martin, Principal Technologist, and a young guy from NetApp’s PR firm who was presumably there just to observe. We gathered around a conference table, and fired up laptops and wireless Internet (3 was working again, huzzah!) to connect back into the Twitterverse.
John had a slide deck prepared, and started by talking about “Key IS Issues and Trends”. “Do you have an internal SLA?” was a question that more companies are beginning to ask themselves, and more and more of them are beginning to be able to say “yes”. More of them have service catalogues for IT, and more are beginning to compare those internal services with what they can get from the cloud. If you know that it costs you $4500 to put a desktop on someone’s desk, you can start making intelligent decisions about whether VDI is a good idea for you. John also said that “everyone stopped buying infrastructure for about a year” during the GFC.
Shared Storage Platform of Choice
John moved on to describing the market for NetApp. He said that with the trend towards service catalogues and cloud services (bingo!), companies were building more generic services, rather than custom ones. He used a wedding dress analogy. Rather than spending a lot of money on an item you’ll use for one purpose, and that requires lots of fittings and adjustments and customisations, companies are moving more towards buying off the rack. IT designs things in standard sizes and then you just buy one in the size you need, which is cheaper and faster.
In this view of the world, John said that NetApp’s primary aim is to be the “platform for choice for shared storage infrastructure.” If you’re building a shared storage model, NetApp want to be your choice for the storage part. This was the first time on the day that I recall a vendor making a concise, clear statement about their goal like this.
John quoted numbers from an IDC report about NetApp shipped disk, but I’m not going to repeat them here because I haven’t seen the evidence for the stat. The IDC Quarterly Disk Storage Systems Tracker press release from September doesn’t gel with what John mentioned at the time, so he could be referring to something else. To be honest, I don’t really care, as these kinds of figures are a simple marketing tactic of social proof and I tend not to recommend a course of action just because “everyone else is doing it”, particularly when the proof for “everyone” depends very much on the statistical fine print.
Features For Now
John reminded us that NetApp have supported thin provisioning since ONTAP 7.0, and that you can swap between thin and thick provisioning with a simple command. If you decide you don’t want to thin provision any more (and you have the space) you can just turn off thin provisioning of volumes. Or you can turn on thin provisioning just as easily.
John also spoke about dedupe, and we delved into some of the technical details of the way it works. NetApp dedupe uses the MD5 hashing algorithm to select blocks that are candidates for dedupe. I asked about the potential for hash collisions (as has been proven to be possible for the MD5 method) and was assured that before a block is actually de-duplicated, the candidate blocks are fully compared; the MD5 is just a pre-selection method. Good to know.
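For the curious, here is a minimal sketch of the general "hash to pre-select, then compare in full" pattern John described. It is not NetApp's implementation. The `dedupe` function, its `fingerprint` parameter and the in-memory `store` list are all hypothetical names I've made up for illustration, and real systems work on fixed-size on-disk blocks rather than Python byte strings.

```python
import hashlib


def md5_fingerprint(block: bytes) -> bytes:
    """Cheap pre-selection fingerprint; collisions are possible."""
    return hashlib.md5(block).digest()


def dedupe(blocks, fingerprint=md5_fingerprint):
    """Deduplicate blocks using a fingerprint only as a candidate filter.

    Returns (store, refs): the list of unique blocks, and for each input
    block the index of the stored block it refers to.
    """
    store = []       # unique block contents (hypothetical backing store)
    by_digest = {}   # fingerprint -> indices of stored blocks sharing it
    refs = []
    for block in blocks:
        digest = fingerprint(block)
        match = None
        # The fingerprint only nominates candidates; the full byte-for-byte
        # comparison decides whether the blocks are really identical.
        for idx in by_digest.get(digest, []):
            if store[idx] == block:
                match = idx
                break
        if match is None:
            store.append(block)
            match = len(store) - 1
            by_digest.setdefault(digest, []).append(match)
        refs.append(match)
    return store, refs
```

Passing a deliberately weak fingerprint, such as `lambda b: b[:1]`, forces collisions and shows that two different blocks sharing a fingerprint are still stored separately, which is the whole point of the full compare.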
John mentioned QoS was available, which in NetApp parlance is called FlexShare. This isn’t real QoS; it’s a prioritisation/penalty mechanism that allows you to specify which volumes should lose first when resources become scarce. It’s analogous to the way networks prioritise low-latency traffic (like VoIP traffic) so that if there’s contention for bandwidth, those latency-sensitive workloads get priority over things that can cope with slower response times. You can do the same with NetApp volumes. It’s a bit of a blunt instrument, but it’s better than nothing.
John also mentioned a feature he called Dynamic Tiering (bingo!). Apparently this is the use of Performance Acceleration Modules (PAM). For those not in the know, PAM (or Flash Cache as it’s now called) is an add-in card of flash memory that acts like an extra bunch of RAM in the data path for NetApp devices. It’s not SSD. PAM behaves just like a RAM cache, and it can work with FlexShare if you want to prioritise certain volumes to have cached blocks held longer compared with others. This is quite neat for what it is. Which is not tiering in the sense of permanently (manually, or automatically) moving busy blocks of data to faster disk, or slower blocks to slower disk, such as in EMC FAST. Calling it Dynamic Tiering is misdirection, so remember if you hear that term from NetApp that they mean PAM.
Interestingly, John made a comment saying that with dedupe turned on, you were likely to experience a performance hit on random reads if you didn’t also have PAM in the system. I’d not heard that before, so I’d be keen to see some corroboration for that statement.
John made several references to NDA material, which we decided as a group not to hear. Blogfest is about bloggers writing, in public, about public information. I personally wasn’t interested in learning about stuff I couldn’t write about. What’s the point? I was also conscious of not wanting to taint the material that we had heard with NDA material. There was the further complication that no other vendor had offered NDA material, so that would have meant NetApp were on an uneven footing compared with the others; not something I felt comfortable with. So, we elected to issue a communal “Thanks, but no thanks” and moved on.
So we went into speculation land, with John sharing what felt a lot like his personal views on where things were going. I make a special point of this because John can be… opinionated… and I’m not entirely clear on where the line was between his personal views and the views of NetApp. Let’s chalk it up to the novelty of Blogfest, and relish the candidness of the commentary.
John said that NetApp’s scale-out version of ONTAP (C-mode in ONTAP 8.x) aims to provide customers with an ‘immortal cluster’ so you can do rolling upgrades of the gear without interrupting service. This is a great idea, and echoes what Clive from EMC was talking about at the start of the day, and it was present in the IBM session as well. Imagine being able to move workload around, non-disruptively (using a global namespace, among other things) so that you can do scheduled maintenance, or upgrade the hardware from the old gear you bought 5 years ago to the latest and greatest? Rolling upgrades, in other words. It’s not a new idea, but I’ve seen very few organisations actually do what’s required to implement it themselves. If a vendor can make it easy, or built-in, well. That would be something.
We chatted a bit about ONTAP 8 and how long it’d taken to start seeing these sorts of features make it into the ONTAP codebase since the Spinnaker acquisition in 2003. John made some unflattering comments that I won’t repeat, and to his credit, he did check himself before saying anything more. Suffice to say that the merging of the Spinnaker code and the ONTAP code has proved to be somewhat of a challenge, and that NetApp changed direction midway through the process. It seems that cost them some time (like, a couple of years) so hopefully we’ll now start to see more solid progress on this front.
John stated that “all backup vendors are moving to replication based backups.” I’m not entirely clear on what he meant by this, but I assume that he means disk based storage areas that are replicated to at least one site, rather than making multiple tape copies and then storing some, or all, of them offsite.
John believes that “stub based HSM [hierarchical storage management] is a crock.” He waxed lyrical about how tape is bad, and that disk based replications (i.e. snapshots and SnapMirror/SnapVault) are the One True Way.
Now, John is quite passionate about backup, and I have a tendency to agree about the benefits of disk based backup in many ways, but this kind of hyperbole kind of puts me off. Tape has a lot of benefits, and many customers are smart to use it where it makes sense to do so. To declare all of them as daft would be… well, off-putting. The case for disk based backup as a panacea for all customers’ backup needs is not well made. John’s bald assertions leant a little too far towards ideology for my liking. NetApp’s abortive foray into VTL systems doesn’t exactly cement their credibility in making these kinds of future gazing statements.
John has apparently written a whitepaper on why stub based archiving is bad, so hopefully he can pass it on to me and I’ll become enlightened. For now, I’m more inclined to view it as an extreme position taken to stir thought and debate.
John ended with two slides, one a list of areas he believes are over-rated, and the second a list of topics he believes are under-appreciated. I will call attention to two topics from each list:
Badly designed Ethernet storage networks got a mention, as did 8Gb FibreChannel. John’s intent here was to say that sure, badly designed Ethernet won’t work very well. Because it’s been badly designed. I would add poor implementation to this as well. Well designed networks, of any kind, will work well for the purpose they are put to, be it data traffic, storage traffic, or a hybrid. It takes skill and care to do this well, and picking on other vendors for failures of others is unfair and misses the point. A note for the more savvy: there are opportunities here for a smart company to make serious savings if you do what others say is impossible. As seen on Twitter recently: “History is littered with corpses that thought they were smarter than Ethernet.”
John’s point in dissing 8Gb FibreChannel was an echo of a previous statement he made saying that most FibreChannel links are about 5% utilised. FibreChannel networks in general have been drastically over-engineered and are fiendishly expensive. With FCoE enjoying greater acceptance, and multiple tens of gigabits available with Ethernet today, why would you deploy expensive hardware that only handles one protocol?
On the under-rated side of the ledger, SNIA gets a special mention. John called on SNIA to champion customers and drive the adoption of a common capacity efficiency measure of some sort to assist customers with assessing different vendors’ claims. With everyone making different, noisy, claims about how efficient they are compared with others, and the fine print challenging for industry professionals to follow, I remain unconvinced that a benchmark of some sort would actually improve things. Consider all the noise erupting recently over NetApp’s latest SPEC SFS numbers. Yawn.
And finally, John was the only vendor to mention #FCoTR and continued his habit of making bold statements that he may come to regret by urging us to consider the superior qualities of FC over Carrier Pigeon for today’s storage needs.
Merch: NetApp supplied some party pies and arancini ball type things, along with beer and wine. I had a beer, from memory a Cascade Premium, during the session, and then at the end had 2 arancini balls, 1 mini-quiche, 1 party pie, and 2 half-glasses of wine (no idea what it was, so it wasn’t amazing) before bailing for a plane home to Melbourne.
The core problem with stub based archiving (and where I agree with John) is that over time it just continues to magnify the dense filesystem issue. A dense filesystem is one where the time taken to walk the filesystem, either for a check or for a backup, becomes time-prohibitive.
Stub based archiving removes big files and leaves small pointer files behind to reference the removed files. This frees up space, which gets occupied by other files, which in turn, over time, get removed and replaced with small stubs.
Over time, as more files are added (due to space being freed by the archival process) and then later archived (resulting in more stubs), the density of the filesystem continues to explode. We solve a capacity issue, but in doing so create a new issue that operating system vendors/providers have really struggled to address – filesystem traversals become increasingly costly as the density grows.
I’d suggest that stub based archiving demonstrates a fundamental flaw in current filesystem architectures – the actual theory is sound, it’s just that the backing technology frequently fails our needs.
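To put rough numbers on the density argument, here is a toy model of repeated fill-and-archive cycles. Everything in it is an assumption for illustration: the function name `simulate_stub_archiving`, the idea that every live file is archived each cycle, and the uniform file and stub sizes. Real archiving policies are far more selective, so treat it as a sketch of the trend rather than a prediction.

```python
def simulate_stub_archiving(capacity_bytes, file_size, stub_size, cycles):
    """Return the number of filesystem entries at the end of each fill.

    Each cycle fills all free space with new files of file_size bytes,
    records the total entry count (stubs plus live files), then archives
    every live file, leaving a stub of stub_size bytes behind.
    """
    stubs = 0
    counts = []
    for _ in range(cycles):
        free = capacity_bytes - stubs * stub_size
        live_files = free // file_size
        counts.append(stubs + live_files)
        stubs += live_files  # each archived file leaves one stub behind
    return counts


# A 1 MB toy filesystem, 10 KB files, 100-byte stubs, three cycles.
print(simulate_stub_archiving(1_000_000, 10_000, 100, 3))  # [100, 199, 297]
```

Capacity used never exceeds the size of the filesystem, but the number of entries a backup or filesystem check has to walk nearly triples in three cycles, which is the problem described above.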
Thanks for commenting, Preston.
I can see how this is moving the problem. However, if the stubs weren’t created, then you’d still end up with a dense tree problem, but you’d also still have the capacity issue. At least one problem is being solved, even if it’s exposing another (potential) issue.
Certainly one should be aware of these issues when designing a system for scale, but that’s always true. Is this enough of a reason to abandon stub based archiving altogether?
It’s not sufficient reason for abandoning stub based archiving. However, any decision to use stub based archiving needs to be balanced against design and management issues that it introduces. Many companies mistakenly introduce stub based archiving without considering the implications. As long as you design with it in mind, you can at least work around the problem…