I have a lot of experience with performance and capacity management.
Some years ago, a colleague and I got fed up with the failings of enterprise tools like Concord eHealth, and also the limitations of low-end tools like MRTG. We wanted something that was easy to use like MRTG, but powerful as well. Back then, this was a big gap.
So we did what any insane technical person does: we wrote our own. Several times. This was one of my failures at doing a software startup, really.
I won’t go into the details other than to say that I’ve personally written an SNMP discovery and polling engine, a charting library, and a web-application (using Zope!) to do performance and capacity monitoring, so I have a little bit of knowledge in this area.
Foglight covers all the basics. It discovers information about the virtualisation platforms using interfaces and credentials that you give it. It collects statistics. So does every other tool.
Stuff I Like
Foglight runs on Windows and Linux. So many of the tools dealing with virtualisation seem to be Windows only, which makes me sad.
The DVR capability. You can ‘rewind’ the view and see what was going on at a particular time. This would have been incredibly useful at any number of clients I’ve worked with when trying to diagnose performance issues. The constant back-and-forth of trying to get instrumentation set up, with people arguing about how to do performance monitoring, is one of the more frustrating things I’ve had to work through. Just being able to scroll back through the statistics history and the “what’s changing” views would make troubleshooting so much easier.
The auto-detect of a large variety of devices. This isn’t a hard feature to implement, but it is bloody tedious. Particularly when some vendors’ equipment crashes if you walk the SNMP MIB and hit certain OIDs (this has happened), and you can’t discover too quickly or you’ll denial-of-service the device’s CPU (I once accidentally swamped an oldish Cisco Catalyst switch this way). Being able to do this well highlights a level of code maturity that augurs well for a product.
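To give a feel for the "don't discover too quickly" problem, here is a minimal sketch of pacing requests in Python. This is my own illustration, not how Foglight does it: the `paced` helper and its parameters are made up for the example, and the loop only prints where a real poller would send an SNMP request. The three OIDs are the standard sysDescr, sysUpTime and sysName objects.

```python
import time

def paced(items, max_per_second, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than max_per_second (hypothetical helper)."""
    interval = 1.0 / max_per_second
    next_at = clock()
    for item in items:
        delay = next_at - clock()
        if delay > 0:
            sleep(delay)  # wait until the next request slot opens
        # Schedule the following slot; never try to "catch up" after a stall.
        next_at = max(next_at, clock()) + interval
        yield item

# Standard system-group OIDs: sysDescr, sysUpTime, sysName.
oids = ["1.3.6.1.2.1.1.1.0", "1.3.6.1.2.1.1.3.0", "1.3.6.1.2.1.1.5.0"]

for oid in paced(oids, max_per_second=20):
    # A real poller would issue the SNMP GET here; this sketch just prints.
    print("would query", oid)
```

A real discovery engine would probably also want a per-device limit and a list of OIDs to skip for known-fragile equipment, but the pacing idea is the same.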
Browser based console!
Pretty charts and sparklines! Tufte++!
Stuff I Don’t Like
Foglight does data rollups too early for my liking. Storage is cheap, so I don’t understand why you wouldn’t want to keep granular data for at least a month by default. As @plankers pointed out during the presentation, you lose information whenever you start to average. A year from now, sure, trends are probably what matters, but at many organisations you might not even get the “performance problem” ticket for a couple of weeks because of the bureaucracy of the place. If the data’s already rolled up, now you’re in finger-pointing hell because you have to wait until the next end-of-month run before you can help solve the problem. And then it won’t be solved until next month, so you’ve just cost the business three months of performance because you wanted to save a couple of gig of storage.
Having said that, the rollup intervals are tunable. I just wish vendors would update their defaults from 10 years ago to reflect the fact that storage is orders of magnitude cheaper now.
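To make the averaging point concrete, here is a small Python sketch. The numbers are invented for illustration (an hour of five-minute CPU samples with a ten-minute spike), and the `rollup` function is a hypothetical helper, not anything from Foglight.

```python
from statistics import mean

def rollup(samples, factor):
    """Average each consecutive group of `factor` samples into one point."""
    return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]

# Invented CPU utilisation (%), one sample every 5 minutes for an hour.
cpu = [20, 22, 21, 19, 98, 97, 23, 20, 21, 22, 20, 19]

hourly = rollup(cpu, 12)   # roll the whole hour up into a single point

print("peak in raw data:", max(cpu))
print("hourly rollup:   ", hourly)
```

The raw samples show the CPU pinned at 98%, while the rolled-up hour comes out at 33.5%, which looks like a perfectly healthy box. Storing a max alongside the average helps, but it still can't tell you when in the hour the spike happened or how long it lasted.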
I also don’t like the animated graphics. They’re shiny for the demo to management who sign the cheques, but annoyingly pointless for the admins who actually use the tool every day. @plankers was again right on the money when he highlighted the accessibility problems with just having colours (traffic lights) to indicate issues. There needs to be a shape change as well. All instrumentation software people should study very carefully the design issues with control panels of all kinds.
Hopefully your UX designers have, but many haven’t. Go find a good industrial designer to help out, preferably someone with experience in manufacturing plants like chemical factories. Those places don’t have time for fancy graphics; they need clear and unambiguous control panels that tell them just what they need to know. When you have a gas leak, or a chemical spill, “the circle is spinning faster” doesn’t cut it.
Foglight looks like a great tool and well worth checking out in more detail.