Flash Costs and Dedupe

Okay, flash storage vendors, enough is enough.

Several of you have been trying desperately to convince people that flash is cheap enough to replace spinning disk by measuring it wrong.

You keep trying to compare the price of deduped flash with raw spinning disk, and that’s just silly.

What Even Is Dedupe?

De-duplication is the act of reducing the amount of physical storage media you need by finding duplicate blocks in the stuff you’re storing. If you’ve got a bunch of VMs with the same OS image, then there are lots of duplicates. An encrypted database system? Not so much.
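To make that concrete, here is a minimal sketch of block-level dedupe in Python. It assumes fixed 4096-byte blocks and SHA-256 fingerprints, and the "OS image" and "encrypted" data are made up for illustration. Real arrays may use variable-size chunks and different hashing, so treat this as a toy, not as any vendor's implementation.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed fixed block size, in bytes


def dedupe_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Logical blocks divided by unique blocks (1.0 means no savings)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    # Fingerprint each block; identical blocks share a fingerprint.
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


# A made-up 4-block "OS image": each block is distinct within the image.
os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))

# Ten VMs cloned from the same image: lots of duplicate blocks.
vm_farm = os_image * 10

# Random bytes stand in for encrypted data: no two blocks should match.
encrypted_like = os.urandom(len(vm_farm))

print(dedupe_ratio(vm_farm))         # 10.0
print(dedupe_ratio(encrypted_like))  # 1.0, barring a wildly improbable collision
```

The VM farm collapses to a tenth of its size, while the random data doesn't shrink at all.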

Dedupe is particularly important for flash-based systems because flash has limited write endurance, and write amplification makes that worse. Basically, the more you write to the flash, the shorter its lifespan, so if you can reduce the amount of data you write by detecting duplicates on the way in, then you make the flash last longer. That’s why all-flash array vendors tend to have inline dedupe rather than post-process dedupe, and tend to have it from launch (though not all: Violin Memory, for example, only just announced it).
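A rough sketch of why inline dedupe saves writes, again in Python. The `InlineDedupeStore` class and its counters are hypothetical names invented for this example. It ignores metadata writes, reference counting, and hash-collision handling, all of which a real array has to deal with.

```python
import hashlib


class InlineDedupeStore:
    """Toy block store that checks for duplicates before writing."""

    def __init__(self):
        self.blocks = {}          # fingerprint -> block payload ("the media")
        self.logical_writes = 0   # blocks the host asked us to write
        self.physical_writes = 0  # blocks that actually hit the media

    def write(self, block: bytes) -> bytes:
        """Store a block, skipping the media write if it is a duplicate."""
        self.logical_writes += 1
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint not in self.blocks:
            self.blocks[fingerprint] = block
            self.physical_writes += 1
        return fingerprint


store = InlineDedupeStore()
for _ in range(5):
    store.write(b"A" * 4096)  # the same block, five times
store.write(b"B" * 4096)      # one different block

print(store.logical_writes, store.physical_writes)  # 6 2
```

A post-process design would land all six blocks on the media first and reclaim the duplicates later, so it saves space but not the wear from those initial writes.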

And hey, dedupe means you can store more ‘data’ on the same physical media, which is a good thing. Yay!

Dedupe Doesn’t Care About Media

The thing is, dedupe is all about the ones and zeros that make up the information. Dedupe is also available on arrays that use spinning disk, and has been for years (though some more than others).

Let’s say your data achieves a dedupe rate of 5x: that is, 500GB of data would only take up 100GB of space because each block has, on average, 4 duplicates. This dedupe rate will be the same whether the data is stored on flash or spinning rust (assuming the same algorithm is used, on which more in a moment).

If you dedupe your data, and store it on HDDs, you get a 5x reduction. If you store it on an all-flash array, same deal. So for the same raw capacity, the effective capacity of both options is exactly the same.

But what the flash vendors try to do is compare the cost of effective usable flash with raw HDD. And that’s bullshit.

The only way you can compare these scenarios is if there’s a different dedupe rate for HDD than for flash. And that comes down to a difference in algorithm. Right now, the flash vendors are assuming your algorithm is no dedupe at all, so you get a dedupe rate of 1x (no reduction whatsoever), and then they compare it to their optimistic 7x reduction or whatever they claim, and then start trying to say flash is as affordable as disk.

Which is bollocks.

If a flash vendor wants to come out and say that their dedupe algorithm is superior to the dedupe on a spinning disk array, fine. Let’s have that discussion. But to compare it with zero dedupe is to insult our intelligence.
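Here is the arithmetic as a small Python sketch. The raw prices are invented purely for illustration (they are not real quotes), the 7x ratio is the "optimistic" figure mentioned above, and `cost_per_effective_gb` is a hypothetical helper. No dedupe at all is a ratio of 1.

```python
def cost_per_effective_gb(raw_price_per_gb: float, dedupe_ratio: float) -> float:
    """Price per GB of data stored, after dedupe (ratio 1.0 = no dedupe)."""
    if dedupe_ratio < 1.0:
        raise ValueError("a dedupe ratio below 1.0 makes no sense")
    return raw_price_per_gb / dedupe_ratio


# ASSUMED raw prices in dollars per GB, made up for this example.
FLASH_RAW_PRICE = 5.00
HDD_RAW_PRICE = 1.00
DEDUPE_RATIO = 7.0  # the optimistic vendor claim

flash_deduped = cost_per_effective_gb(FLASH_RAW_PRICE, DEDUPE_RATIO)
hdd_raw = cost_per_effective_gb(HDD_RAW_PRICE, 1.0)
hdd_deduped = cost_per_effective_gb(HDD_RAW_PRICE, DEDUPE_RATIO)

# The vendor pitch: deduped flash versus raw HDD.
print(f"deduped flash ${flash_deduped:.2f} vs raw HDD ${hdd_raw:.2f}")

# The like-for-like comparison: same dedupe ratio on both.
print(f"deduped flash ${flash_deduped:.2f} vs deduped HDD ${hdd_deduped:.2f}")
```

With these made-up numbers, flash only looks cheaper in the first comparison. Apply the same ratio to both media and the raw price gap comes straight back.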

Economics Understanding Fail

The idea that flash would be cheaper than spinning disk also fails the economics test. Is spinning disk cheaper than tape? No. Why not? The price of spinning disk has been falling continuously for years.

Ah, but so has the price of tape. Technology advances, and tape has not stood still. You can fit 2500 GB on a single LTO-6 tape cartridge, raw. No compression, no dedupe, just raw capacity. A bit over 12 months ago, I wrote that that’s about 3 cents a GB. As demand for tape falls (assuming it does), the price for tape will fall as well.

But people are more willing to pay for the performance of flash, or the lower power consumption, or some other factor, just as they are willing to pay for the online nature of spinning disk instead of storing everything on cheap tape. Tape is linear, and latency is a killer. So too with flash versus spinning disk. The willingness to pay for a GB of flash is higher because it’s better than a GB of spinning disk!

And all the flash vendors are making this exact claim! Flash is better! It’s faster! It chews less power! It takes up less space! These are all true, and there are costs associated with them. But those costs aren’t enough to eclipse the price difference between HDD and flash.

And as the price of flash falls, the price of HDD will also fall, because people will be less willing to buy it. I mean, if you had a 500GB 15k RPM disk and a 500GB SSD, and both were the same price, which would you buy? The SSD, of course! Demand is lower for the HDD, so guess what happens? The price drops!

Yes, it ends up being more complex than that in real life (production capacity limits, horizontal and vertical differentiation, information asymmetries, etc.) but broadly this is true.

And that’s why we get these ridiculous comparisons between raw HDD and deduped flash costs. Because on a byte-for-byte comparison, flash costs more than disk.

Cut the Cost Crap

When will IT people in general learn to talk about value instead of cost savings?

Talk to me about how flash makes my databases faster. How I can get my sales figures processed faster, and see the trend reports earlier, so I can make better sales decisions. How I can figure out what my customers want by running analytics on my sales data faster. It’s faster, so I can run more, different analyses. Maybe I can run simulations, instead of just trying to guess based on reports? Maybe it helps me ship my products to customers quicker?

Talk to me about the qualitatively different ways flash can help my business do more of what it’s there to do, which isn’t to install storage systems.

I don’t store my OLTP database on tape, and I don’t put my backups on flash. I use different technologies for different purposes, and I don’t want One Ring to Rule Them All because then I’d get a pretty sucky outcome for everything, either over- or under-paying for my solutions.

One Comment

  1. Hi,

    But the legacy high-end arrays like VSP do not have dedupe and suck at thin provisioning. I know that in 3U, I can put as many VMs as I do with 3 or 4 complete racks of dinosaur systems and get better performance.

    Of course, every situation is unique, but in the long run, flash/memory will be the only medium where you store data. Ironically, those two terms have been Latin for ages ;-)
