Fixing machines rant
I’m interested in the idea of designing machines that are easier to operate and do not require an expert every time something goes wrong: machines designed with the aim of making faults visible. Such a machine could be a computer, a router, or a really simple consumer appliance. I wonder if this would mean changing the way machines are designed right from scratch, or perhaps just a different paradigm of programming.
For example, if a pair of scissors doesn’t work, there are only a few possible reasons. Either the blades are not sharp, or the screw is not oiled, or something is broken. Just by physical inspection, a person can tell what went wrong with the scissors.
Computers are far more complex, I’d say by several orders of magnitude. However, a computer is also composed of different parts. What would it take to localize a fault and zoom in on it? A fault can be in hardware or software.
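To make the "zoom in" idea a bit more concrete, here is a minimal sketch in Python. It assumes something the post does not claim real machines provide: that every part exposes a health check, and that a part reports unhealthy whenever one of its sub-parts does. The `Component` class, the `localize` function, and the example machine are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Component:
    """A hypothetical machine part with a self-reported health check."""
    name: str
    healthy: Callable[[], bool]
    parts: List["Component"] = field(default_factory=list)


def localize(component: Component, path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the paths to the deepest components that report unhealthy.

    Assumption: a component is unhealthy whenever any of its parts is,
    so healthy subtrees can be skipped without looking inside them.
    """
    here = path + (component.name,)
    if component.healthy():
        return []
    faults: List[Tuple[str, ...]] = []
    for part in component.parts:
        faults.extend(localize(part, here))
    # If no sub-part takes the blame, the fault is in this component itself.
    return faults or [here]


# Made-up example machine: the disk is the only broken part.
disk = Component("disk", lambda: False)
ram = Component("ram", lambda: True)
hardware = Component("hardware", lambda: False, [disk, ram])
software = Component("software", lambda: True)
machine = Component("machine", lambda: False, [hardware, software])

faults = localize(machine)
print(faults)
```

The hard part, of course, is the assumption itself: writing honest health checks for every part is exactly the problem the rest of this post worries about.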
So suppose we have a hypothetical device that does something like an X-ray or a CT scan of the machine and tells us exactly where a fault is. Then we’d have solved part of the error detection problem: that of detecting a hardware issue. Software is not visible, so obviously you can’t X-ray it. Software is usually a bunch of text files that are compiled and run. Now if the system that is looking for errors (which is itself software) has an error in one of its own files, how do you locate it?
That seems to translate into checking the operating system for errors. But this is a long shot, and the errors are hard to detect. How can we build a system designed to make its errors visible, whether in software or hardware?
Analogy with humans: we can often detect what is wrong with us. Hardware errors in humans are easy to detect because we have a basic system of a brain and senses that still work. But if one of the senses, say sight, were lost, you wouldn’t be able to tell if your skin looked red. You could ask someone else to look at it, or you could touch it and see if it felt different. So in humans, certain issues can be detected by examining all the ways the problem manifests itself, or by asking someone else for help.
What if humans have a software problem? Are they able to fix it?
A lot of the time we actually can, by looking for information about the problem. If something is worrying us, we can talk to other humans about it, or read up on it on the Internet. So we are using information that others have represented in a form we can communicate and understand. Then we compare their problem with ours, and if it matches, we see whether their solution is appropriate for us.
Now one solution is what computers in space do. In space, there is no expert on hand all the time to fix the system when it goes wrong (though astronauts do make an occasional visit to repair things). So one procedure that’s followed is that the system is started and checked one step at a time. The results are communicated to Earth (or Houston, if you prefer) over a very low-bandwidth link (10 bps). The system goes on checking one thing at a time until it locates the error. After that, I guess engineers on Earth send patches to fix the error, provided the computer is still capable of applying them.
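The step-at-a-time procedure can be sketched roughly as follows. This is only an illustration of the idea, not how any real spacecraft does it: the stage names, the `run_staged_checks` function, and the `report` callback (standing in for the slow link back to Earth) are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A check is a (name, function) pair; the function returns True if the stage is fine.
Check = Tuple[str, Callable[[], bool]]


def run_staged_checks(checks: List[Check], report: Callable[[str], None]) -> Optional[str]:
    """Run checks in order, report each result, and stop at the first failure.

    Returns the name of the first failing stage, or None if all stages pass.
    Each report is a short line, which suits a very low-bandwidth link.
    """
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            # A check that crashes is treated as a failure of that stage.
            ok = False
        report(f"{name}: {'OK' if ok else 'FAIL'}")
        if not ok:
            return name
    return None


# Made-up startup sequence in which the radio stage is faulty.
log: List[str] = []
checks: List[Check] = [
    ("power", lambda: True),
    ("memory", lambda: True),
    ("radio", lambda: False),
    ("payload", lambda: True),
]
first_fault = run_staged_checks(checks, log.append)
print(first_fault, log)
```

Stopping at the first failure is a design choice here: later stages may depend on earlier ones, so their results would not mean much once something upstream is broken.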
There is also the concept of fault-tolerant computers. Literally interpreted, that would mean a computer that keeps running despite faults: redundancy of a sort? Perhaps fault tolerance is cheaper than designing a system that tracks down the tiniest, meanest error. But this still doesn’t get at the problem of making errors visible so that they can be fixed easily. One could argue: haven’t you seen the error messages that pop up when there is a problem? Don’t they tell you what’s wrong? Doesn’t that solve the problem? See, people have already thought of this, and they have designed computers that spew out error messages when something goes wrong.
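One common form of redundancy is to run the same computation on several replicas and take a majority vote. The sketch below is a simplified illustration under that assumption; the `vote` function is a hypothetical name, not any particular system’s design. Note that the voter can do slightly more than mask the fault: it can also say which replica disagreed, which is a small step toward making the fault visible.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Majority vote over the outputs of redundant replicas.

    Returns the majority value and the indices of replicas that disagreed.
    Raises ValueError if no value is produced by more than half the replicas.
    """
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no majority among replicas")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Three replicas computed the same thing; the third one is faulty.
result, suspects = vote([42, 42, 41])
print(result, suspects)
```

The system keeps running with the correct answer, and the list of suspects tells a repair person where to start looking.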
A computer could be specialised to do one particular job and then made really reliable. But then it would turn into an appliance, not remain a computer. The power of a computer lies in its being a generic machine that allows new things to run on it. So is it going to be impossible to make the system better and more fixable by average users because it’s a generic platform? Are generic platforms simply never going to be child’s play to maintain?
