Thin Provisioning with NetApp: Operations Version

Here are some tips for implementing thin provisioning with NetApp controllers in a way that is operationally manageable.

This isn’t the only way; it’s just one I’ve found still works well a couple of years after you turn it on, unlike some other methods that work well at first and then explode in ways that are hard to fix.

My Preferred Method

Create the biggest aggregates you can. Here’s an example command for 1 terabyte SATA disks, for ONTAP 7.3 and above:

aggr create SATA_72K_AGGR01 -r 12 23@847
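
Here, -r 12 sets the RAID group size to 12 disks, and 23@847 asks for 23 disks of 847 GB each, which is the right-sized capacity of a 1 TB SATA drive. Once the aggregate is built, you can sanity-check the RAID layout and usable space with something like this (7-mode syntax; check the man pages on your ONTAP version):

aggr status SATA_72K_AGGR01 -r
df -Ag SATA_72K_AGGR01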

When provisioning volumes, set volume space guarantee to ‘none’ (this is the thin provisioning bit):

vol create <volname> -s none SATA_72K_AGGR01 <size>
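
To confirm the guarantee on a volume, list its options and look for guarantee=none:

vol options <volname>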

For existing volumes, you can set the space guarantee like this:

vol options <volname> guarantee none

For a “get out of jail” safety-belt, enable snapshot autodelete:

snap autodelete <volname> on

This automatically deletes snapshots, starting with the oldest, until the volume has 10% free space.
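
The trigger point, free space target and deletion order are all tunable, and the defaults vary between ONTAP releases, so it’s worth checking what you have and setting the ones you care about explicitly. Something like:

snap autodelete <volname> show
snap autodelete <volname> target_free_space 10
snap autodelete <volname> delete_order oldest_first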

N.B.: This is equivalent to taking some of your backup tapes out of your tape silo and setting them on fire to create space. It’s an emergency measure, not standard operating procedure. NetApp controllers send SNMP alerts when snapshots are autodeleted, so monitor for them and increase the size of the volume if autodelete triggers.

Unless you don’t care about backups, in which case, give me your contact details so I can call you up and mock you when all your data gets lost. Because it will. You know this, right? Of course you do.
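
To make sure those autodelete alerts actually reach your monitoring system, check that SNMP is enabled and that a trap host is configured. On 7-mode that’s roughly this, substituting your own monitoring host:

options snmp.enable on
snmp traphost add <monitoring-host>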

What This Does

This does thin provisioning at the aggregate level. It gives you a big pool of storage to allocate from, and when people ask for too much (which they will), all the space they didn’t actually need stays in the pool for the two or three people who suddenly load the entire production database into development so they can “run some tests” (which they also will).

Using one big pool of storage maximises the amount of free-space sharing you get. It also minimises the chance that 5 dev teams will all load their production databases at the same time and use all the space.
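
It also makes the monitoring simple: watch aggregate free space, and compare what’s been allocated against what’s actually used. Something like this (-g reports sizes in gigabytes):

df -Ag SATA_72K_AGGR01
aggr show_space -g SATA_72K_AGGR01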

vol autosize is another option people use, but I prefer the method above for two reasons:

  1. Autosize only does auto-grow. There’s no auto-shrink. With space guarantee none, when the space is freed, it automatically goes back into the aggregate’s free space pool.
  2. Autogrow only triggers 5 times in a short time period. If you set the grow amount too small, the volume will try to grow, hit the limit, and then stop growing, possibly causing out-of-space errors.
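
If you do use autosize anyway, make the increment big enough that one or two grows will cover a spike. The sizes here are purely illustrative:

vol autosize <volname> -m 2000g -i 100g on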

Traps for the Unwary

Thin provisioning is a bet: you’re betting that not everyone will need all the storage they’ve asked for all at once. How big a bet you make is how much you over-allocate your aggregate.

Let’s say you reckon people ask for about 15% more storage than they end up using. That means you could create volumes that, in total, are allocated 115% of the space available in the aggregate.

Only no, because you need about 10% free space in the aggregate to avoid performance issues associated with WAFL searching for free blocks.

So plan to keep about 10-15% free space in the aggregate. Note that because you’re thin provisioning, this is easy to do.
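
To put numbers on it: on a 10 TB aggregate, keeping 10% free means real usage needs to stay under about 9 TB, while the 15% over-ask means you could allocate around 11.5 TB of volumes against it. The monitoring job is then making sure actual usage stays on the right side of that 9 TB line.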

Don’t set volume fractional reserve to less than 100%; that’s thin provisioning inside the volume. You’re getting the same thing from aggregate thin provisioning, but you only have to monitor aggregate free space, not the free space in every volume. It’s easier.

Do respond to the alarms; don’t ignore them. If snapshot autodelete triggers, check your environment to see if things need to be moved around.


2 Comments

  1. Justin,
    Good post. Regarding the statement “Autogrow only triggers 5 times in a short time period”: how many times can autogrow execute in a given time period? Can you provide the document that you gathered this information from?
    Thank you

  2. Thanks Aaron,

    I can’t recall if it’s documented, and a quick search now doesn’t turn anything up. This information was gathered through experience at various client sites that hit the problem, where a volume tried to grow multiple times back to back because the autogrow amount was quite small but the max volume size was large.

    I’m not sure what defines a “short” period, nor whether it’s configurable. It seems to be a safety mechanism to prevent a misconfiguration of autogrow from letting a single volume grow huge due to a temporary spike in storage requirements. Autogrow seems designed more for gradual increases over time, where you don’t want to actively manage space in individual volumes.

    Essentially, make the autogrow amount large enough that it’ll only need to trigger once or twice to add enough space in a volume.

    Note this was on ONTAP 7.x up to about 7.3. Later versions may operate differently.
