I was reflecting recently on Runecast’s presentation at Tech Field Day Extra, and it highlighted a persistent gap in how we operate our IT systems.
Successful systems spend most of their time being run. This is a simple and obviously true observation, and yet I keep being surprised by how often its implications get overlooked. Those implications seem similarly obvious and straightforward to me, but clearly not to everyone, which I find perplexing.
For example, there’s still a lot of marketing rhetoric aimed at how bad it allegedly is that the large majority of IT budget is spent on maintenance. There are plenty of articles talking about how organisations should be trying to shift spending into ‘innovation’ and change instead of maintenance. Various big name consultants have talked about it for years. Analysts write reports on it.
But consider what happens when an ‘innovation’ project is successful: it becomes important to the business and hangs around. Systems spend most of their useful life being used, not being built. That’s the whole point of building them. The only systems that spend most of their time in design or build phase are the projects that are failures because they never get to the part where they provide actual value.
The bias towards the shiny and new in technology has led to an under-appreciation of maintenance which, I would argue, has actually made the design and construction of systems worse. If systems are built to be thrown away and replaced with a new system, they are designed and constructed very differently from systems that will need to be maintained for a long time. Long-lived systems need to be easy to maintain, and also flexible so that they can be modified to adapt to a changing world.
Ease of change is at the heart of what Agile software development tried to achieve, but it’s also why people find Excel supremely useful, why CI/CD is popular, and why COBOL was designed to be readable by non-programmers.
And this is where things get thorny, because not all change is good. Ransomware infecting a system is change, but not a change that most people would welcome. So we now need a way to decide if a change is beneficial or not, and the subtlety of this decision process is where many organisations come unstuck.
Any tool like Runecast Analyser needs a standard of goodness to measure against, and for many systems this standard is assumed to be a fixed, permanent thing. In reality, the standard of goodness needs to change as technology improves and expectations change. Consider security patches: it used to be acceptable for a system to go months without receiving security patches (and for some systems this is still the case), but now, for most systems, especially those connected to the Internet, patching happens automatically. Any system that measures “have security patches been applied recently?” needs to take this into account.
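As a minimal sketch of this idea (the function and variable names here are my own illustration, not anything from Runecast), a patch-freshness check can take its standard of goodness as a parameter rather than hard-coding it, so the same measurement gives different answers as the policy tightens:

```python
from datetime import date, timedelta

def patch_compliant(last_patched: date, max_age: timedelta, today: date) -> bool:
    """Return True if the last patch falls within the current policy window.

    The policy window (max_age) is a parameter, not a constant: as
    expectations tighten, callers pass a smaller window and the same
    check yields a different verdict for the same system.
    """
    return today - last_patched <= max_age

today = date(2021, 6, 1)
last_patched = date(2021, 3, 15)  # 78 days ago

# Under an older standard (quarterly patching), this system passes...
old_standard = patch_compliant(last_patched, timedelta(days=90), today)
# ...but under a stricter modern standard (patch within 14 days), it fails.
new_standard = patch_compliant(last_patched, timedelta(days=14), today)
```

The system hasn’t changed between the two calls; only the standard has, which is exactly why the standard needs to live outside the check itself.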
But finding the problems we need to fix is only part of the challenge. We also need to design and build our systems so that problems are easy to fix, and not just today’s problems. Because standards change, what we consider perfectly acceptable today may become a problem tomorrow, and fixing it will mean our system has to change. Reducing the cost of change is possibly the most important focus for building a robust system.
Thus we see that flexibility and standardisation are inextricably linked. They are not a problem to be solved once and for all, but a tension to be managed in an ongoing, iterative process. We need to be constantly deciding what good looks like and if change is beneficial or harmful. And not everyone will agree, which means we need a system for managing this disagreement.
Maintaining IT systems ultimately depends on our ability to maintain human systems.
Which makes it quite a substantial challenge, but one worth working on, don’t you think?