Walking The Tightrope

Posted: 30/11/2010 in IT-centric Articles

We all know about the National Australia Bank snafu that has left thousands of people unable to access their funds over the past week. We can have nothing but sympathy for their customers' position, obviously. Banking is one of the fundamental planks of our societal infrastructure, and when it fails, the repercussions are often disastrous for people.

Last night came the "revelation" that the outage was caused by human error. Having been in the IT industry for 15 years, I was not surprised. It was always going to be human error at some level: either the architecture of the failed system was inadequate right from the start, the processes were flawed, or someone had Pressed The Wrong Button. Unfortunately, it turns out it was the last of these. I feel for the guy.

As a senior tech and architect, I am well acquainted with the risks of being in the IT game. People fail to appreciate the position we put ourselves in every day when we go to work. IT systems are by their very nature an intricate, complicated bitch of a house of cards, and we know that the balance of probability is that at some time in our careers we’re going to bump one and knock it down. This knowledge in some ways renders getting out of bed of a morning clinically insane. You never know when it is going to be your turn. One little mistake can take out a company for several days. When systems do break and break but good, they take Time to get going again. We have to wait on vendors, or rebuild systems from scratch, or restore terabytes of data from tape. It’s not always as simple as “Have a backup system”, as much as we like to think so.

Because computer technology pervades modern life, we as its keepers have the potential to take companies offline and adversely affect hundreds or even thousands of people. And there is just nowhere to hide.
This is compounded by the fact that most of the systems we look after run on products bought strictly as-is, with no liability express or implied, from software vendors whose myriad developers may or may not have been up to the task allotted to them. The same goes for the hardware these systems run on, because it too contains embedded code written by humans.
We rely upon these faceless people to get not only their code right, but also the documentation (the advice and procedures) that they publish. If it is wrong, we will be steering the ship straight at the rocks, full speed ahead.

So we tiptoe through, we check everything six times, and we test religiously, if we know what's good for us. Things take a long time to get done because we are making sure that every proposed action is going to be right; if it's wrong, the Big Boss tends to come calling.

As you move along in the industry, you can move into architecture and design positions, where you have the scope to screw up larger and larger systems and cause larger and larger outages. You would think there would be some commensurate prestige for the added responsibility, but the truth of the matter is that the field is so bloody esoteric that people just don't get what goes on behind the scenes. Compounding the thanklessness of the job is the fact that, whilst any outage or even slight degradation in service results in outraged screams from users, huge improvements in performance or functionality are quickly forgotten and taken for granted as the new minimum level of service, if indeed they're noticed at all.

It raises the question: why do we do it, and how do we cope? Sometimes we wonder, we really do. But we do it because we're good at it and, deep down, we don't trust anybody else to do it. We cope by getting Good, and by getting acquainted with every support resource in existence for the technology we support. It's get good or get out.

So next time you want to have a go at an IT guy for not getting things fixed fast enough, or for things being down in the first place, remember that he is constrained by the quality of the systems his managers let him buy & build, by the quality of the products and support resources supplied by the vendors he is told to use, by how fast data can move from place to place, and by a dozen other things. He's going as fast as he is physically able to.

If you want to get things moving as fast as they possibly can, bring coffee and pizza. And then let him work.

  1. Alan says:

    Interesting article. We are all faced with this sort of dilemma: computer IT, engineer, doctor, mechanic. One wrong move and it's a potential disaster: the banking system goes down, patients die, wheels fall off your car. You have a job that you're paid to do; do it the best you can and pray all is right.

  2. Arthur Mugatroyd says:

    And if a dentist or medical doctor makes a mistake, they get sued, and if the mistake knocks down the whole system, the patient dies.

  3. Mate, what a top-shelf article. I don't think I could have done better myself (and I wouldn't have helped things by including excesses of sarcasm and the cynicism that is inherent in IT people who have been in the game for this long 🙂)

  4. rfc1394 says:

    We also don't build into software systems the reliability they should have. We would refuse to permit a Taiwanese toaster manufacturer to build equipment as badly as we computer programmers get away with building software. It's one thing to write a quick and dirty application where, if it crashes, we can just run it again; it's like messing up a photocopy and simply making another one.

    When you’ve got huge amounts of money on the line or human survival involved, it should be subject to a higher standard. But nobody wants to pay for the cost of safety, at least in complex financial systems. They want it yesterday, the people maintaining the systems get no respect, and often the people hiring them consider them expendable cogs who can be replaced or farmed out through outsourcing to countries with cheaper labor.

    And if programmers are producing garbage, then that’s exactly what they’re asking for.
