Issues of Scale

Over the years, I’ve had some frustrating times at work due to the… shortcomings… of some people who purport to be IT professionals. Once upon a time, a project team was in a dither about a ‘performance problem’ with the storage their application/system was using. I was working in the storage team at the time, and we checked it out. The storage was fine, and we demonstrated to the project that this was so, suggesting that perhaps the problem was elsewhere. It often is, but people are quick to blame either the storage, or the network, as the root cause of their woe. In my experience, badly written applications are more often to blame.

The project people wandered off to look elsewhere and that was the last I heard about it, until a couple of weeks later. Someone external to the project had been attempting to help them fix the myriad of issues they had with their application, mere weeks before GoLive, and they noticed a fundamental design flaw that would cause massive scaling problems; problems that would likely bring the application offline mere months after going live. Not good. It turned out that the ‘performance problem’ some weeks earlier had, in fact, been a symptom of this design flaw. It was a doozy.

For some reason, the application programmers had decided that their application would store all its images on the filesystem, and would store the path to the images in its database, rather than just storing the images in the database. They anticipated that there would be a couple of million or so of these images every month. So where do you think they put the images?

All in one directory.

Yeah. Millions of files in a single directory. And they were apparently surprised that this didn’t work terribly well (a storage problem, of course), so once they figured out that this was the cause of their ‘performance problem’, they changed the way the application worked, by having it create directories to contain similar image files. So instead of millions of image files, they had millions of directories, each containing 5 or so image files. And where did they put these millions of directories?

All in one directory.

The chief architect, lead programmers and everyone else on this project had somehow failed to understand that putting millions of things into a single directory will not perform well, even after this fact was pointed out to them by the application not working. Truly EPIC FAIL.

What amazes me is that only months earlier, we’d had to explain to a completely different project (with their own chief architect, lead programmers, etc.), slowly and painfully, this same concept: putting lots and lots of things into a single directory isn’t going to scale well, and is a Bad Idea. Two totally different groups of people, the same basic misunderstanding of scalability. We actually had to argue with them about why this wouldn’t work well.

This isn’t a new problem, and it has been solved many times before. Yet none of these people appeared to have any grasp of why they should think about the implications of scale.
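For the curious, one of the usual fixes looks something like the sketch below: hash the file name and use the first few characters of the hash as nested directory names, so the files spread themselves evenly across a fixed number of directories. To be clear, this is my own illustration, not what either project ended up doing. The function name `sharded_path`, the `/data/images` root and the file names are all made up, and the two-levels-of-two-characters layout is just an assumption you would tune to your own volumes.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, filename, levels=2, width=2):
    """Return a nested path for `filename` under `root` (hypothetical helper).

    The directory names come from the leading hex characters of a SHA-256
    hash of the file name. With the assumed defaults (2 levels, 2 hex
    characters each) there are 256 * 256 = 65,536 leaf directories, and no
    directory ever holds more than 256 subdirectories.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Slice the digest into `levels` chunks of `width` hex characters.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)


if __name__ == "__main__":
    # Illustrative file names only.
    for name in ("invoice-000001.png", "invoice-000002.png"):
        print(sharded_path("/data/images", name))
```

At a couple of million images a month, 65,536 leaf directories works out to roughly 30 new files per directory per month, which is a very long way from millions of entries in one place. The application still only needs the file name to find the image again, because the path is recomputed from the hash. If the volumes grow, adding a third level is a one-parameter change, although existing files would have to be moved to match.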

Here’s what really bothers me about this situation: The people who made these really obvious and very, very stupid mistakes, the Chief Architects, Lead Designers, or whatever, were charging lots and lots of money for their ‘services’. Many times I’ve been in the situation where I, or my team, have been called upon to work miracles to somehow get these fundamentally broken designs to function. Always in incredibly short timeframes, always with the expectation that stopping to figure out what went wrong isn’t acceptable because there are Deadlines. We don’t have time to do it right, we only have time to do it fast.

This is why businesses hate IT. These projects cost millions of dollars, run late, don’t deliver what they’re supposed to, and then cost even more money while you try to fix the mistakes. And no one has the time to understand why things went badly, because they’ve all lurched off to another project that’s in crisis for the same stupid reasons.

This is happening in lots and lots of very big companies right now. And it’s not going to get better any time soon, because these same people are going to put ‘Chief Architect of Big-Project-X at ImpressiveCorp’ on their CV and get a job somewhere else, where they can make the same stupid mistakes, leaving you to clean up the mess they’ve left behind. You probably won’t hear about these people, either, because a project has to fail really badly and publicly for anyone outside the company to notice, since companies don’t tend to advertise just how bad they are at running IT projects to their competitors or customers. That, and if you knew you had one of these clowns working at your company, would you want to hinder their rapid departure? Very Dilbert-esque.

If you’re a business person who’s sick of this happening in your company, stay tuned for my next article on what you can do to fix it.
