Here are some tips for implementing thin-provisioning with NetApp controllers in a way that is operationally manageable.
This isn’t the only way; it’s just the one I’ve found still works well a couple of years after you turn it on, unlike some other methods that work well at first and then explode in ways that are hard to fix.
My Preferred Method
Create the biggest aggregates you can. Here’s an example command for 1 terabyte SATA disks, for ONTAP 7.3 and above; -r 12 sets the RAID group size, and the disk list is given as a count@size pair (substitute your own disk count and right-sized disk capacity):
aggr create SATA_72K_AGGR01 -r 12 <ndisks>@<disk-size>
When provisioning volumes, set volume space guarantee to ‘none’ (this is the thin provisioning bit):
vol create <volname> -s none SATA_72K_AGGR01 <size>
For existing volumes, you can set the space reserve like this:
vol options <volname> guarantee none
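To confirm a volume really is thin provisioned, run vol options with just the volume name; it prints all of the volume’s current option settings:
vol options <volname>
Look for guarantee=none in the output.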
For a “get out of jail” safety-belt, enable snapshot autodelete:
snap autodelete <volname> on
This will automatically delete snapshots, oldest first, until 10% of the volume is free.
N.B.: This is equivalent to taking some of your backup tapes out of your tape silo and setting them on fire to create space. It’s an emergency measure, not standard operating procedure. NetApp controllers will send SNMP alerts when it autodeletes snapshots, so monitor for them and increase the size of the volume if it triggers.
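If you do turn it on, the autodelete behaviour is tunable per volume. These are the standard 7-mode knobs; the values shown are illustrative, so check the defaults on your ONTAP release before relying on them:
snap autodelete <volname> trigger volume
snap autodelete <volname> target_free_space 10
snap autodelete <volname> delete_order oldest_first
snap autodelete <volname> show displays the current settings for a volume.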
Unless you don’t care about backups, in which case, give me your contact details so I can call you up and mock you when all your data gets lost. Because it will. You know this, right? Of course you do.
What This Does
This method does thin provisioning at the aggregate level. It gives you one big pool of storage to allocate from, and when people ask for too much (which they will), all the space they didn’t actually need stays in the pool for the two or three people who suddenly load the entire production database into development so they can “run some tests” (which they also will).
Using one big pool of storage maximises the amount of free-space sharing you do. It also minimises the chance that 5 dev teams will all load their production databases at the same time and use all the space.
vol autosize is another option people use, but I prefer the method above for two reasons:
- Autosize only does auto-grow. There’s no auto-shrink. With space guarantee none, when the space is freed, it automatically goes back into the aggregate’s free space pool.
- Autogrow only triggers 5 times in a short time period. If you set the grow increment too small, the volume will try to grow, hit that limit, and then stop growing, possibly causing out-of-space errors.
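For comparison, this is how autosize is enabled on a 7-mode volume; the maximum size and grow increment here are illustrative values, not recommendations:
vol autosize <volname> -m 2t -i 100g on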
Traps for the Unwary
Thin provisioning is a bet: you’re betting that not everyone will need all the storage they’ve asked for all at once. How big a bet you make is how much you over-allocate your aggregate.
Let’s say you reckon people ask for about 15% more storage than they end up using. That means you can use this method to create volumes that, in total, are allocated 115% of the space available in the aggregate.
Only no, because you need about 10% free space in the aggregate to avoid performance issues associated with WAFL searching for free blocks.
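Put some made-up numbers on that bet and the gap is obvious. This sketch assumes a 100 TB aggregate, 115% over-allocation, and a 10% free-space reserve; all three figures are hypothetical:

```shell
# Hypothetical figures: how much you can promise vs. actually store.
AGGR_TB=100          # usable aggregate capacity
OVERCOMMIT_PCT=115   # allocate volumes up to 115% of the aggregate
FREE_RESERVE_PCT=10  # keep ~10% free for WAFL's block allocator

echo "Allocatable: $(( AGGR_TB * OVERCOMMIT_PCT / 100 )) TB of volumes"
echo "Writable:    $(( AGGR_TB * (100 - FREE_RESERVE_PCT) / 100 )) TB of actual data"
```

So you can promise 115 TB of volumes, but only about 90 TB can ever be written before the aggregate is uncomfortably full.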
So plan to keep about 10-15% free space in the aggregate. Note that because you’re thin provisioning, this is easy to do.
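Keeping that free space honest is easy to script. Here’s a minimal sketch: parse 7-mode df -A output (aggregate, kbytes, used, avail, capacity) and warn below a threshold. The aggregate names and figures are canned sample data so the script runs standalone; in real life you’d pipe in the output of df -A from the filer instead:

```shell
#!/bin/sh
# Warn about any aggregate with less than THRESHOLD percent free space.
THRESHOLD=15

check_free() {
  awk -v thresh="$THRESHOLD" '
    NR > 1 && NF >= 5 {
      free_pct = ($4 / $2) * 100   # avail / kbytes
      if (free_pct < thresh)
        printf "WARNING: %s has only %.1f%% free\n", $1, free_pct
    }'
}

# Canned sample of `df -A` output; replace with the real thing.
cat <<'EOF' | check_free
Aggregate                kbytes        used       avail capacity
SATA_72K_AGGR01      1048576000   943718400   104857600      90%
SATA_72K_AGGR02      1048576000   524288000   524288000      50%
EOF
```

With the sample data, the first aggregate (10% free) draws a warning and the second (50% free) stays quiet.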
Don’t set volume fractional reserve to less than 100%; that’s thin provisioning inside the volume. You get the same benefit from aggregate-level thin provisioning, but you only have to monitor aggregate free space, not the free space in every volume. It’s easier.
Do respond to the alarms, don’t ignore them. If snapshot autodelete triggers, check your environment to see if things need to be moved around.