I recently wrote a short article for a client on NetApp snapshots. It was for some internal marketing, to help application developers understand the technology a little better, so they could design their applications to take advantage of some of the features.
It reminded me that snapshots, that most basic of NetApp features, really are very cool. When something awesome (like lightbulbs) is part of your everyday life, it’s easy to forget just how awesome it really is.
So here’s a primer on NetApp snapshots for those who haven’t used them, and a reminder of their awesomeness for those of us who take them for granted.
In the beginning, snapshots were pretty simple: a backup, only faster. Read everything on your primary disk, and copy it to another disk.
Simple. Effective. Expensive.
Think of these kinds of snapshots as being like a photocopier. You take a piece of paper, and write on it. When you want a snapshot, you stop writing on the paper, put it into the photocopier, and make a copy. Now you have two pieces of paper.
A big database might take up 50 pieces of paper. Taking a snapshot takes a while, because you have to copy each page. And the cost adds up. Imagine each piece of paper cost $5k, or $10k.
Still, it’s faster than hand-copying your address book into another book every week.
It’s not a perfect analogy, but it’s pretty close.
Having to copy all the data every time is a drag, because it takes up a lot of space, takes ages, and costs more. Both taking the snapshot, and restoring it, take a long time because you have to copy all the data.
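To put some numbers on the photocopier approach, here’s a minimal sketch in Python. It’s a toy model, not how any real array works: the `FullCopyVolume` class and its `blocks_copied` counter are made up for illustration.

```python
class FullCopyVolume:
    """Toy volume whose snapshots are complete copies of every block."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = {}
        self.blocks_copied = 0  # blocks copied by snapshots and restores so far

    def snapshot(self, name):
        # Every block is copied, whether it changed or not.
        self.snapshots[name] = list(self.blocks)
        self.blocks_copied += len(self.blocks)

    def restore(self, name):
        # Restoring copies every block back over the primary.
        self.blocks = list(self.snapshots[name])
        self.blocks_copied += len(self.blocks)


vol = FullCopyVolume(["page-%d" % i for i in range(50)])
vol.snapshot("monday")
vol.blocks[3] = "page-3 (edited)"
vol.snapshot("tuesday")
print(vol.blocks_copied)  # 100: two snapshots of 50 pages, though only one page changed
```

One edited page, and we still paid for a hundred page copies. That’s the drag.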
But what if you didn’t have to? What if you could copy only the bits that changed?
Enter copy-on-write snapshots. The first snapshot records the baseline, before anything changes. Since nothing has changed yet, you don’t need to move data around.
But as soon as you want to change something, you need to take note of it somewhere. Copy-on-write does this by first copying the original data to a (special, hidden) snapshot area, and then overwriting the original data with the new data. Pretty simple, and effective.
And now it doesn’t take up as much space, because you’re just recording the changes, or deltas.
But there are some downsides.
Each time you change a block of data, the system has to read the old block, write it to the snapshot area, and then write the new block. So, for each write, the disk actually does two writes and one read. This slows things down.
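Here’s a rough Python sketch of that write path, counting the disk operations. The `CopyOnWriteVolume` class is hypothetical, and it assumes a single snapshot where a block only gets copied out the first time it changes after the snapshot is taken.

```python
class CopyOnWriteVolume:
    """Toy copy-on-write volume with a single snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot_area = None  # None until a snapshot exists, then {index: original data}
        self.reads = 0             # disk reads performed by write()
        self.writes = 0            # disk writes performed by write()

    def snapshot(self):
        # Nothing has changed yet, so no data moves.
        self.snapshot_area = {}

    def write(self, index, data):
        if self.snapshot_area is not None and index not in self.snapshot_area:
            original = self.blocks[index]         # read the old block
            self.reads += 1
            self.snapshot_area[index] = original  # write it to the snapshot area
            self.writes += 1
        self.blocks[index] = data                 # write the new block in place
        self.writes += 1

    def read_snapshot(self, index):
        # Preserved copy if the block changed, otherwise the unchanged original.
        return self.snapshot_area.get(index, self.blocks[index])


vol = CopyOnWriteVolume(["a", "b", "c"])
vol.snapshot()
vol.write(1, "B")
print(vol.reads, vol.writes)  # 1 2
print(vol.read_snapshot(1))   # b
```

One logical write turned into one read and two writes, which is exactly the overhead described above.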
It’s a tradeoff. You lose a bit of write performance, but you don’t need as much disk to get snapshots. With some clever caching and other techniques, you can reduce the performance impact. Overall you save money and still get some good benefits, so it was often worth it.
But what if you didn’t have to copy the original data?
NetApp snapshots (and ZFS snapshots, incidentally) do things differently. Instead of copying the old data out of the way before it gets overwritten, the NetApp leaves the old block exactly where it is and writes the new data to a free block elsewhere on disk. Then the pointers that tell the active filesystem where to find the data get updated to point to the new block, while the snapshot keeps pointing at the old one. Old blocks that only a snapshot still points to are counted against a portion of the volume set aside for them, called the SnapReserve.
That’s why the SnapReserve fills up when you change data on a NetApp. And remember that deleting is a change, so deleting a bunch of data fills up the SnapReserve, too.
This method has a bunch of advantages. You’re only recording the deltas, so you get the disk savings of copy-on-write snapshots. But you’re not copying the original block out of the way, so you don’t have the performance slowdown. There’s a small performance impact, but updating pointers is much faster than copying blocks. That’s why NetApp performance is just fine with snapshots turned on, and why they’re on by default.
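For comparison, here’s the same kind of toy model for the pointer-based approach. Again, this is a sketch under assumptions, not NetApp’s actual on-disk layout: `PointerVolume`, its `store` list, and the pointer tables are all invented for illustration.

```python
class PointerVolume:
    """Toy volume where a snapshot is just a saved table of block pointers."""

    def __init__(self, contents):
        self.store = list(contents)                # physical blocks, append-only here
        self.active = list(range(len(contents)))   # logical block -> physical block
        self.snapshots = {}
        self.reads = 0    # disk reads performed by write()
        self.writes = 0   # disk writes performed by write()

    def snapshot(self, name):
        # Save the pointers only; no data block is copied.
        self.snapshots[name] = list(self.active)

    def write(self, logical, data):
        # The old block stays where it is. New data goes to a fresh block
        # and the active pointer is updated to find it.
        self.store.append(data)
        self.writes += 1
        self.active[logical] = len(self.store) - 1

    def read(self, logical, snapshot=None):
        table = self.active if snapshot is None else self.snapshots[snapshot]
        return self.store[table[logical]]


vol = PointerVolume(["a", "b", "c"])
vol.snapshot("before")
vol.write(1, "B")
print(vol.reads, vol.writes)                        # 0 1
print(vol.read(1), vol.read(1, snapshot="before"))  # B b
```

One logical write is one disk write, with no read and no copy, and the snapshot still sees the old data.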
It gets better.
Because the snapshot is just pointers, when you want to restore data (using SnapRestore) all you have to do is update the pointers to point to the original data again. This is faster than copying all the data back from the snapshot area over the original data, as in copy-on-write snapshots.
So taking a snapshot completes in seconds, even for really large volumes (like, terabytes), and so does restoring one. Seconds to snap back to a point in time. How cool is that?
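A pointer-only restore can be sketched in a few lines of Python. This is a toy model and not how SnapRestore is actually implemented; the `Volume` class is hypothetical and the snapshot name `hourly.0` is just an example.

```python
class Volume:
    """Toy volume that restores by swapping pointer tables."""

    def __init__(self, contents):
        self.store = dict(enumerate(contents))  # physical block id -> data
        self.active = list(self.store)          # logical block -> physical block id
        self.snapshots = {}

    def snapshot(self, name):
        self.snapshots[name] = tuple(self.active)

    def write(self, logical, data):
        new_id = len(self.store)
        self.store[new_id] = data       # old block is left untouched
        self.active[logical] = new_id

    def restore(self, name):
        # Only the pointer table is replaced; no data block is read or written.
        self.active = list(self.snapshots[name])

    def read(self, logical):
        return self.store[self.active[logical]]


vol = Volume(["a", "b", "c"])
vol.snapshot("hourly.0")
vol.write(0, "A")
vol.write(2, "C")
print([vol.read(i) for i in range(3)])  # ['A', 'b', 'C']
vol.restore("hourly.0")
print([vol.read(i) for i in range(3)])  # ['a', 'b', 'c']
```

The cost of `restore` depends on the size of the pointer table, not on how much data changed, which is why it stays quick even on big volumes.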
But wait, there’s more.
Snapshots Are Views
It’s much better to think of snapshots as a View of your data as it was at the time the snapshot was taken. It’s a time machine, letting you look into the past.
Because it’s all just pointers, you can actually look at the snapshot as if it were the active filesystem. It’s read-only, because you can’t change the past, but you can actually look at it and read the data.
This is incredibly cool.
Seriously. It’s amazing. You get snapshots with almost no performance overhead, and you can browse through the data to see what it looked like yesterday, or last week, or last month. Online.
So if you accidentally delete a file, you don’t have to restore the entire G: drive, or suck the data off a copy on tape somewhere. You can just wander through the .snapshot (or ~snapshot) directory and find the file, and read it. You can even copy it back out into the active file system if you want.
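If you’d rather script it than wander, a self-service recovery might look something like this in Python. It’s a sketch with assumptions: it assumes a `.snapshot` directory is visible alongside the file and contains one subdirectory per snapshot, and the function names are made up for this example.

```python
import shutil
from pathlib import Path


def find_snapshot_copies(file_path, snapshot_dir_name=".snapshot"):
    """Return (snapshot name, path) pairs for each snapshot holding a copy of the file."""
    file_path = Path(file_path)
    snap_root = file_path.parent / snapshot_dir_name
    if not snap_root.is_dir():
        return []
    copies = []
    for snap in sorted(snap_root.iterdir()):
        candidate = snap / file_path.name
        if candidate.is_file():
            copies.append((snap.name, candidate))
    return copies


def recover_file(file_path, snapshot_name, snapshot_dir_name=".snapshot"):
    """Copy the file from the named snapshot back into the active file system."""
    file_path = Path(file_path)
    source = file_path.parent / snapshot_dir_name / snapshot_name / file_path.name
    # The snapshot itself is read-only; only the active copy is written.
    shutil.copy2(source, file_path)
    return file_path
```

For example, `find_snapshot_copies("/mnt/projects/report.txt")` would list the snapshots that still hold the file, and `recover_file("/mnt/projects/report.txt", "hourly.0")` would copy that version back. The mount path and the snapshot name here are placeholders.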
All without ringing the helpdesk.
Celebrate the Humble Snapshot
I’m really glad I had to write this up for people new to NetApp, because it’s given me a renewed appreciation for this most basic feature.
Have you been taking your snapshots for granted?