Why NetApp Snapshots Are Awesome

I recently wrote a short article for a client on NetApp snapshots. It was for some internal marketing, to help application developers understand the technology a little better, so they could design their applications to take advantage of some of the features.

It reminded me that snapshots, that most basic of NetApp features, really are very cool. When something awesome (like lightbulbs) is part of your everyday life, it’s easy to forget just how awesome it really is.

So here’s a primer on NetApp snapshots for those who haven’t used them, and a reminder of their awesomeness for those of us who take them for granted.

Basic Snapshots

In the beginning, snapshots were pretty simple: a backup, only faster. Read everything on your primary disk, and copy it to another disk.

Simple. Effective. Expensive.

Think of these kinds of snapshots as being like a photocopier. You take a piece of paper, and write on it. When you want a snapshot, you stop writing on the paper, put it into the photocopier, and make a copy. Now you have 2 pieces of paper.

A big database might take up 50 pieces of paper. Taking a snapshot takes a while, because you have to copy each page. And the cost adds up. Imagine each piece of paper cost $5k, or $10k.

Still, it’s faster than hand-copying your address book into another book every week.

It’s not a perfect analogy, but it’s pretty close.
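
To put the photocopier in code: here’s a minimal Python sketch of a full-copy snapshot. The disk is modelled as a plain list of pages, and the function name and the $5k-per-page figure (the lower of the two numbers above) are just for illustration.

```python
def full_copy_snapshot(disk):
    """Photocopy every page: time and space grow with the size of the disk."""
    return list(disk)


# A "big database" of 50 pages, as in the analogy above.
disk = ["page-%d" % i for i in range(50)]

snapshot = full_copy_snapshot(disk)
disk[0] = "page-0 (edited)"          # the copy is unaffected by later writes

pages_copied = len(snapshot)         # every page was copied, changed or not
cost_per_page = 5000                 # illustrative figure from the analogy
snapshot_cost = pages_copied * cost_per_page

print(snapshot[0], pages_copied, snapshot_cost)
```

Every snapshot pays for all 50 pages again, even though only one of them changed.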

Copy-on-Write Snapshots

Having to copy all the data every time is a drag, because it takes up a lot of space, takes ages, and costs more. Both taking the snapshot, and restoring it, take a long time because you have to copy all the data.

But what if you didn’t have to? What if you could copy only the bits that changed?

Enter copy-on-write snapshots. The first snapshot records the baseline, before anything changes. Since nothing has changed yet, you don’t need to move data around.

But as soon as you want to change something, you need to take note of it somewhere. Copy-on-write does this by first copying the original data to a (special, hidden) snapshot area, and then overwriting the original data with the new data. Pretty simple, and effective.

And now it doesn’t take up as much space, because you’re just recording the changes, or deltas.

But there are some downsides.

The first time you change a block after taking a snapshot, the system has to read the old block, write it to the snapshot area, and then write the new block. So, for that write, the disk actually does two writes and one read. This slows things down.
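
The read-then-two-writes pattern is easy to see in a toy model. This is a sketch only, assuming a single snapshot and an in-memory list standing in for the disk; `CowVolume` and its counters are made-up names, not any vendor’s API.

```python
class CowVolume:
    """Toy copy-on-write volume with one snapshot and I/O counters."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data, overwritten in place
        self.snap_area = None        # hidden snapshot area; None until a snapshot exists
        self.reads = 0
        self.writes = 0

    def snapshot(self):
        # Nothing is copied when the snapshot is taken.
        self.snap_area = {}

    def write(self, index, data):
        if self.snap_area is not None and index not in self.snap_area:
            original = self.blocks[index]        # read the old block
            self.reads += 1
            self.snap_area[index] = original     # write it to the snapshot area
            self.writes += 1
        self.blocks[index] = data                # write the new block in place
        self.writes += 1

    def read_snapshot(self, index):
        if self.snap_area is None:
            raise ValueError("no snapshot has been taken")
        # Preserved blocks come from the snapshot area, the rest from live data.
        return self.snap_area.get(index, self.blocks[index])


vol = CowVolume(["a", "b", "c"])
vol.snapshot()
vol.write(1, "B")
print(vol.reads, vol.writes)                # 1 read, 2 writes for one logical write
print(vol.blocks[1], vol.read_snapshot(1))  # live data vs. the snapshot's view
```

In this model the extra read and write only happen the first time a block changes after the snapshot. Later writes to the same block go straight through, because the original is already safe.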

It’s a tradeoff. You lose a bit in write performance, but you don’t need as much disk to get snapshots. With some clever caching and other techniques, you can reduce the performance impact. Overall you save money and still get some good benefits, so it’s often worth it.

But what if you didn’t have to copy the original data?

NetApp Snapshots

NetApp snapshots (and ZFS snapshots, incidentally) do things differently. Instead of copying the old data out of the way before it gets overwritten, the NetApp just writes the new information to a free block somewhere else on disk, and leaves the old block where it is. Then, the pointers that tell the system where to find the data get updated to point to the new block. The old block is now held only by the snapshot, and the space it uses is counted against a portion of the volume set aside for snapshots, called the SnapReserve.

That’s why the SnapReserve fills up when you change data on a NetApp. And remember that deleting is a change, so deleting a bunch of data fills up the SnapReserve, too.

This method has a bunch of advantages. You’re only recording the deltas, so you get the disk savings of copy-on-write snapshots. But you’re not copying the original block out of the way, so you don’t have the performance slowdown. There’s a small performance impact, but updating pointers is much faster than copying blocks around. That’s why NetApp performance is just fine with snapshots turned on, and why they’re on by default.
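
Here’s a rough Python sketch of the pointer idea. It’s a toy, not how the filer is actually implemented: `PointerVolume` and its methods are hypothetical names, blocks live in an append-only list, and a snapshot is simply a saved copy of the pointer list (a real system would share the pointer structure rather than copy it).

```python
class PointerVolume:
    """Toy volume where writes go to new blocks and snapshots are saved pointers."""

    def __init__(self, blocks):
        self.store = list(blocks)                # physical blocks, never overwritten
        self.active = list(range(len(blocks)))   # logical index -> physical location
        self.snapshots = {}
        self.writes = 0

    def snapshot(self, name):
        # No data moves: only the pointers are recorded.
        self.snapshots[name] = list(self.active)

    def write(self, index, data):
        self.store.append(data)                  # one write, to a free block
        self.writes += 1
        self.active[index] = len(self.store) - 1 # repoint the active view

    def read(self, index, snapshot=None):
        pointers = self.active if snapshot is None else self.snapshots[snapshot]
        return self.store[pointers[index]]

    def blocks_held_by_snapshots(self):
        # Blocks the active view no longer uses but a snapshot still points at.
        held = set()
        for pointers in self.snapshots.values():
            held.update(pointers)
        return len(held - set(self.active))


vol = PointerVolume(["a", "b", "c"])
vol.snapshot("hourly.0")
vol.write(1, "B")
print(vol.writes)                                # one write, no read of the old block
print(vol.read(1), vol.read(1, "hourly.0"))      # new data vs. the snapshot's view
print(vol.blocks_held_by_snapshots())            # space that only the snapshot is using
```

Compare the write path with copy-on-write: there’s no read of the old block and no second write, just the new block and a pointer update. The last line is the toy equivalent of snapshot space filling up as data changes.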

It gets better.

Because the snapshot is just pointers, when you want to restore data (using SnapRestore) all you have to do is update the pointers to point to the original data again. This is faster than copying all the data back from the snapshot area over the original data, as in copy-on-write snapshots.

So taking a snapshot completes in seconds, even for really large volumes (like, terabytes), and so do restores. Seconds to snap back to a point in time. How cool is that?
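
A pointer-based restore can be sketched in a few lines of Python. Everything here is illustrative: the volume is a plain dictionary, and `snap_restore` is a made-up function named after SnapRestore, not its real interface.

```python
volume = {
    "store": ["a", "b", "c"],   # physical blocks, never overwritten
    "active": [0, 1, 2],        # pointers for the live filesystem
    "snapshots": {},            # snapshot name -> saved pointers
}


def take_snapshot(volume, name):
    volume["snapshots"][name] = list(volume["active"])


def overwrite(volume, index, data):
    volume["store"].append(data)                       # new data goes to a new block
    volume["active"][index] = len(volume["store"]) - 1


def snap_restore(volume, name):
    # Restoring is just pointing the live filesystem at the old blocks again.
    volume["active"] = list(volume["snapshots"][name])


def read_all(volume):
    return [volume["store"][p] for p in volume["active"]]


take_snapshot(volume, "before_upgrade")
overwrite(volume, 0, "A")
overwrite(volume, 2, "C")
changed = read_all(volume)

blocks_before = len(volume["store"])
snap_restore(volume, "before_upgrade")
restored = read_all(volume)
blocks_copied = len(volume["store"]) - blocks_before

print(changed, restored, blocks_copied)
```

The restore touches only the pointer list, so in this model it does the same amount of work for three blocks as it would for three million.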

But wait, there’s more.

Snapshots Are Views

It’s much better to think of snapshots as a View of your data as it was at the time the snapshot was taken. It’s a time machine, letting you look into the past.

Because it’s all just pointers, you can actually look at the snapshot as if it were the active filesystem. It’s read-only, because you can’t change the past, but you can actually look at it and read the data.

This is incredibly cool.

Seriously. It’s amazing. You get snapshots with almost no performance overhead, and you can browse through the data to see what it looked like yesterday, or last week, or last month. Online.

So if you accidentally delete a file, you don’t have to restore the entire G: drive, or suck the data off a copy on tape somewhere. You can just wander through the .snapshot (or ~snapshot) directory, find the file, and read it. You can even copy it back out into the active file system if you want.

All without ringing the helpdesk.
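
If you’d rather script the recovery than browse for it, something like this Python sketch would do. The snapshot name `nightly.0`, the file path, and the `recover_file` helper are all assumptions for illustration. A temporary directory stands in for a mounted share here so the example runs anywhere; on a real share, the .snapshot directory is provided by the filer.

```python
import shutil
import tempfile
from pathlib import Path


def recover_file(share_root, snapshot_name, relative_path):
    """Copy one file from a read-only snapshot view back into the live tree."""
    source = Path(share_root) / ".snapshot" / snapshot_name / relative_path
    target = Path(share_root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)     # copy contents and timestamps
    return target


# Stand-in for a mounted share: build a fake .snapshot tree in a temp directory.
share = Path(tempfile.mkdtemp())
old_copy = share / ".snapshot" / "nightly.0" / "docs" / "report.txt"
old_copy.parent.mkdir(parents=True)
old_copy.write_text("last night's version")

# The live file was "accidentally deleted", so only the snapshot copy exists.
restored = recover_file(share, "nightly.0", "docs/report.txt")
print(restored.read_text())
```

The copy goes from the snapshot view into the active file system, because the snapshot itself is read-only.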

Celebrate the Humble Snapshot

I’m really glad I had to write this up for people new to NetApp, because it’s given me a renewed appreciation for this most basic feature.

Have you been taking your snapshots for granted?

13 Comments

Emilio
23 October 2009 at 5:47 am

Justin, this post is such a great piece of work, in normal ‘English’ terms even a non technical guy can understand.

And yes, I share the love for Netapp and Solaris ZFS (despite some … how to say that … teething issues)

Thanks A lot for your contribution

Emilio (somewhere not in Australia, but a nice place as well)
Justin Warren
23 October 2009 at 6:07 am

Thanks for commenting, Emilio!

The idea of thinking about snapshots as Views came from something a colleague (Aman Dhillon) wrote, and it just clicked with me. It describes them so much better than ‘snapshots’ in the traditional copy-of-primary-data sense.
cheng anbao
28 April 2010 at 10:22 pm

Thanks for your wonderful sharing.
bertvanderlingen
14 February 2011 at 8:37 pm

ZFS is using Copy-on-Write Snapshots, not like Netapp.
That’s why almost an unlimited amount of snapshots are possible.

The netapp solution looks like an elegant snapshot implementation, you should be aware of the limitations.

You might want to have a look of this snapshot technology comparison:
http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html
HoosierStorageGuy
17 February 2012 at 8:56 am

Here is another article worth reading. One of the most objective comparisons I’ve seen comparing the two snapshot methodologies.

http://oraclestorageguy.typepad.com/oraclestorageguy/2007/07/oracle-backup-1.html
Shiva
13 February 2013 at 1:03 am

Great work done…. its very clear explanation
papandut
26 February 2013 at 11:56 am

great share… now i understand the difference between Netapp snapshot and IBM snapshot. tq for sharing
Hernán J. Larrea
27 March 2013 at 5:27 am

Awesome! Wanted to share an article I have written some ago with some insights on how the NetApp snaps work: http://blog.hernanjlarrea.com.ar/index.php/netapp-snapshot-technology-when-does-a-snapshot-grow/
karanampranathi
23 April 2013 at 5:46 pm

very Awesome and easy explanation
Raghav
24 November 2013 at 9:44 am

It is a very useful article.. thanks a lot
Desmond Whelan
24 January 2014 at 7:44 am

ZFS DOES use copy on write snapshots. It can do “unlimited” snapshots because it does not use a snap reserve. It uses open file system space. Of course, no such thing as unlimited but it is not constrained by a snap reserve but is restrained by file system space.