Computers, People and the Real World
Almost nobody wants an IT system. What they want is a better way of doing something, whether that is buying and selling shares on the Stock Exchange, flying an airliner or running a hospital. So the system they want will usually involve changes to the way people work, and interactions with physical objects and the environment. Drawing on examples including the new programme for IT in the NHS, this lecture explores what can go wrong when business change is mistakenly viewed as an IT project.
05 April 2016
Professor Martyn Thomas
The Inquiry into the failure of the London Ambulance Service in 1992
The London Ambulance Service (LAS) was founded in 1930 - previously the service was run by the Metropolitan Asylums Board. In 1965, when the Greater London Council was established, the LAS was also enlarged to take in part or all of eight other services. As with other ambulance services, responsibility was transferred to the NHS in 1974. It is broadly divided into an Accident and Emergency Service (A&E) and a non-emergency Patient Transport Service (PTS).
LAS covers a geographical area of around 620 square miles, from Heathrow in the west to Upminster in the east, and from Enfield in the north to Purley in the south. It serves more than seven million people who live and work in the London area. In 1990 it was (and possibly still is) the largest ambulance service in the world, carrying over 5,000 patients every day and receiving between 2,000 and 2,500 calls daily including between 1,300 and 1,600 999 calls.
Computer Aided Dispatch
A computer aided dispatch (CAD) system provides one or more of the primary command and control functions of:
a) call taking, accepting and verifying incident details including location;
b) resource identification, determining which ambulance to send;
c) resource mobilisation, communicating details of the incident to the appropriate ambulance to be despatched;
d) ambulance resource management, primarily the positioning of suitably equipped and staffed vehicles to minimise response times.
In addition a CAD system will provide management information to assess performance and help in medium and long term resource management and planning.
Depending on the functions to be performed a CAD system consists of a combination of:
a) CAD software;
b) CAD hardware;
c) gazetteer and mapping software;
d) communication interface (RIFS);
e) radio system;
f) mobile data terminals (MDTs);
g) automatic vehicle location system (AVLS).
LAS attempted to introduce computer aided despatch once before the project that led to the failure in 1992. The first project started in the early 1980s initially to supply a system without mobile data but including a new radio infrastructure. The design specification was changed in 1989 to include mobile data. The project was abandoned in the autumn of 1990 after load test performance criteria could not be met. The second CAD project started soon afterwards. Key dates for the second project were:
a) writing of system requirement specification. Autumn 1990 to February 1991. CAD systems used by other services were investigated but none of them had all the features that LAS required;
b) invitation to tender advertisement placed in the Journal of the European Communities, 7 February 1991;
c) Systems Design Specification, June and July 1991;
d) contract with Systems Options Ltd (SO) signed August 1991;
e) contract to supply mobile data equipment with Solo Electronic Systems Ltd (SOLO) signed September 1991;
f) original implementation planned for 8 January 1992.
Thirty-five companies originally expressed an interest and 17 suppliers subsequently provided full proposals for all or part of the system. Out of all of the proposals there was only one which met the total LAS requirement at an acceptable price. On the basis of the proposals as submitted, the optimum solution appeared to be the one proposed by the consortium consisting of Apricot, Systems Options (SO), and Datatrak.
The Inquiry report states that:
The proposal from Apricot is very much a hardware led proposal. Compared with most of the other bids there is little detail on the application software proposed. This reflects very much the lack of enthusiasm at the time of SO to invest a lot of time in preparing a proposal in which they felt they had little chance of success. However, their proposal does state that they can meet the total requirement within the timescales proposed. Their proposal also superficially suggests that they have experience of designing systems for emergency services. This is a true statement, but their expertise hitherto had actually been in administrative systems for such organisations rather than mission critical systems such as Command and Control. It should also be noted that the SO quotation for the CAD development was only £35,000 - a clear indication that they had almost certainly underestimated the complexity of the requirement (although it is recognised that as is common in the industry SO would also be making a small margin on the contract price for the hardware). It is worth noting also that, at a meeting between LAS and SO prior to contract award, it is minuted that SO were told that one of the reasons for abandonment of the earlier [CAD] system was the alleged inability of the software house to understand fully the complexity of the requirement.
A review of the tenders received and of the evaluation process indicates that Apricot/Systems Options/Datatrak was not the only permutation of bidders that had expressed an ability to meet both the requirement and the timescale. Marconi Command and Control, Technical Software Designers, Surf Technology and Solo Electronic Systems Ltd (SOLO) amongst the CAD bidders, working with a variety of partners, were also able to meet the need. However, the main distinguishing feature between these and the Apricot consortium was price. … The Apricot bid at £937,463 was some £700,000 cheaper than the next nearest bid. Amongst the papers relating to the selection process there is no evidence of key questions being asked about why the Apricot bid, particularly the software cost, was substantially lower than other bidders. Neither is there evidence of serious investigation, other than the usual references, of SO (or any other of the potential suppliers') software development experience and abilities.
The timetable was impossibly tight and, unsurprisingly, the project slipped. Up to early December 1991 it was still hoped that the original deadline of 8 January 1992 for full implementation would be met. However, by mid December it was clear that this could not be achieved. At that point the CAD software was incomplete and largely untested, the RIFS hardware and software was not fully delivered and tested, the installation of the Datatrak equipment was incomplete and its reporting accuracy still under question.
In January 1992 a first attempt was made at functional and load testing. However, as the software was incomplete at the time and not all elements of the system were available, these tests were inconclusive. Over the following months various systems elements (e.g. CAD software and Datatrak performance) were tested, but never as a fully integrated whole.
The Inquiry Report summarises the activities of the following nine months:
Following the failure to meet the original 8 January 1992 deadline for implementation of the full system, a decision was taken by the project group to implement a partial solution during January 1992 whereby the call taking routines would be implemented and the incident reports printed out for manual allocation and voice despatch. The gazetteer was also to be brought into service thus enabling control assistants to identify more easily the locations of incidents. Accordingly printers were installed to enable this to happen. This partial implementation was broadly successful although problems were experienced of screens locking up, occasional server failure, and on one day the failure to despatch a resource because a printer was turned off thus losing the incident in the printer memory buffer. Occasional problems such as these continued to occur thus undermining staff confidence in the system. Many of these problems stemmed from the fact that printers being used in this way were never part of the original specification, but were added in haste as a short term expedient to show some positive progress at an already published implementation date.
Over the following months, whilst work on all aspects of the system continued, various elements of the system were trialled. Following the partial success of implementing the computerised call taking and the gazetteer the next stage was to report the Datatrak locations to supplement the information available to allocators. This was done in conjunction with status reporting via the SOLO terminals by ambulance crews. This combination was trialled initially in the North East Division where the crews were deemed to be the most supportive.
Over the months of this partial trial many problems were identified including:
a) frequent incomplete status reporting by ambulance crews caused by inadequate training, communication failures and alleged wilful misuse;
b) inaccurate Datatrak location fixes caused by faulty equipment (poor installation or alleged sabotage), transmission blackspots, or software error. In this latter case an example of problems was the failure to identify every 53rd vehicle in the fleet. This was caused by an error in a formula provided by Datatrak to SO and not cleared until October 1992;
c) the inability of the system to cope easily with certain established working practices (e.g. the taking of a vehicle different to the one allocated by the system) - this causes exception messages to be raised for manual exception rectification - in itself a somewhat laborious process;
d) overload of the communications channels, particularly at crew shift change, resulting in late update of status information or failure of other messages to be received;
e) continued occasional problems with Central Ambulance Control hardware, particularly the freezing of workstations and the perceived system slowness;
f) bugs in the software causing it sometimes to fail to identify the nearest available vehicle;
g) continued difficulties with mobile data through failures to transmit or receive signals and, occasionally, through MDT lock up.
These and other difficulties resulted in the perceived and sometimes actual misallocation of vehicles as the system would always propose the nearest available resource with the correct status as known to the system. Many of the problems referred to above would contribute to an inappropriate vehicle being sent to an incident.
Under the rules of the system the software was able to recommend, and the call taker to accept and subsequently mobilise, a resource proposal if the resource was within 11 minutes of the incident. Only if this requirement could not be met would human judgement be required.
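The allocation rule described above can be illustrated with a minimal sketch. It is not the LASCAD code, which the Inquiry report does not reproduce: the `Vehicle` fields, the straight-line distance estimate and the assumed average speed are all hypothetical. The point it shows is that the proposal depends only on the position and status as known to the system, so a stale record produces a confident but wrong answer.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional

THRESHOLD_MINUTES = 11.0         # the 11-minute rule quoted above
ASSUMED_SPEED_KM_PER_MIN = 0.5   # hypothetical average speed (30 km/h)


@dataclass
class Vehicle:
    call_sign: str
    x_km: float   # last position reported to the system
    y_km: float
    status: str   # last status reported to the system


def estimated_minutes(v: Vehicle, x_km: float, y_km: float) -> float:
    """Crude travel-time estimate from straight-line distance (an assumption)."""
    return hypot(v.x_km - x_km, v.y_km - y_km) / ASSUMED_SPEED_KM_PER_MIN


def propose_resource(fleet: List[Vehicle], x_km: float, y_km: float) -> Optional[Vehicle]:
    """Return the nearest vehicle recorded as available, if within the threshold.

    Returns None when no recorded-available vehicle is within 11 minutes,
    which is the case that would be referred to a human allocator.
    """
    available = [v for v in fleet if v.status == "available"]
    if not available:
        return None
    nearest = min(available, key=lambda v: estimated_minutes(v, x_km, y_km))
    if estimated_minutes(nearest, x_km, y_km) <= THRESHOLD_MINUTES:
        return nearest
    return None


# A1 is recorded as available and close by. If its crew has in fact already
# left for another job and the status update was lost, the proposal is wrong,
# but the function has no way of knowing that.
fleet = [
    Vehicle("A1", 1.0, 0.0, "available"),
    Vehicle("A2", 3.0, 0.0, "available"),
    Vehicle("A3", 0.5, 0.0, "at hospital"),
]
proposal = propose_resource(fleet, 0.0, 0.0)
```

Nothing in such a rule can detect that its inputs are out of date, which is why late or missing status reports translated so directly into misallocated vehicles.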
Over the first nine months of 1992 the system was implemented piecemeal across the different LAS Divisions in the following phases (although there were some variations within each phase):
Phase 1: Using the call taking software and the gazetteer to help with the recording and location fixing of incidents. Printers then used to pass information to allocators who, using their traditional activation boxes, would identify the optimum resource and the crews or stations would be mobilised by radio or by telephone.
Phase 2: Call takers would take details using the computer system. The incidents would then be passed to the allocators' terminals. The system would be used to track vehicle locations using Datatrak and the MDTs would be used to notify status. Crew activation would be done by messages to the MDTs. This phase was implemented by varying degrees across different Divisions and shifts. In all cases within this phase the human allocators would determine the optimum resource using the system information being passed to them and their traditional activation box.
Phase 3: Full implementation whereby call takers would allocate using automated resource proposals if a resource would arrive within 11 minutes of activation. Otherwise the allocators would identify and allocate the most suitable resource. Phase 3 was designed to operate without paper backup.
Phases 1 and 2 would always operate on a Divisional basis (North West, North East, and South) whereas phase 3 would eventually operate pan-London. This is in effect what happened for the first time on 26 October 1992.
During these months the system was never stable. Changes and enhancements were being made continually to the CAD software. The Datatrak system was being similarly amended and enhanced.
The MDTs and the RIFS system were also undergoing continuous changes. Thus there was never a time when the project team could stand back and commission a full systems test. Ideally a phased implementation should have been planned for in the first place rather than added out of desperation. A properly phased and controlled implementation, under strong project management, would not have allowed the next phase to be implemented until there was total confidence in the integrity and acceptance of the current phase.
However, although a phased implementation was considered by LAS management at the commencement of the project, a positive decision was taken to go for full implementation in one phase. This was seen to be the only way in which the planned improvements in resource activation performance could be achieved.
In the months before 26 and 27 October the system was used in a semi manual fashion. Calls were taken via the system and paper copies printed as back up to screen based information. An allocator was assigned to each of the three Divisions and worked with a radio operator and despatcher. By this method of working, together with the paper back up, staff were able to update manually vehicle status and override suggested resource allocations where necessary to overcome problems.
The Decision to Go Live on 26 October 1992
What is clear from the Inquiry Team's investigations is that neither the Computer Aided Despatch (CAD) system itself, nor its users, were ready for full implementation on 26 October 1992. The CAD software was not complete, not properly tuned, and not fully tested. The resilience of the hardware under a full load had not been tested. The fall back option to the second file server had certainly not been tested. There were outstanding problems with data transmission to and from the mobile data terminals. There was some scepticism over the accuracy record of the Automatic Vehicle Location System (AVLS). Staff, both within Central Ambulance Control (CAC) and ambulance crews, had no confidence in the system and were not all fully trained. The physical changes to the layout of the control room on 26 October 1992 meant that CAC staff were working in unfamiliar positions, without paper backup, and were less able to work with colleagues with whom they had jointly solved problems before. There had been no attempt to foresee the effect of inaccurate or incomplete data available to the system (late status reporting/vehicle locations etc.). These imperfections led to an increase in the number of exception messages that would have to be dealt with and which in turn would lead to more call backs and enquiries. In particular the decision on that day to use only the computer generated resource allocations (which were proven to be less than 100% reliable) was a high risk move.
The system was fully implemented at 07.00 on 26 October 1992. Initially it was lightly loaded, which meant that staff could correct errors of position or status. As demand increased, however, the amount of incorrect location and status information in the system grew, with the result that the system made incorrect allocations, sending multiple vehicles to the same incident or failing to send the closest vehicle. The number of available vehicles decreased, and exception messages arrived faster than staff could handle them, so that they scrolled off the top of screens and were lost.
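The way exception messages were lost once they outpaced the staff can be shown with a small simulation. This is only a sketch under stated assumptions: the screen is modelled as a fixed number of lines, staff always deal with the oldest visible message, and the arrival and handling rates are invented for illustration rather than taken from the Inquiry report.

```python
from collections import deque


def simulate_exception_screen(arrivals_per_tick, handled_per_tick, screen_lines, ticks):
    """Return (handled, lost, still_on_screen) after the given number of ticks.

    Each tick, new exception messages are appended to the bottom of the screen.
    If the screen is already full, the oldest message scrolls off the top and
    is lost. Staff then clear up to handled_per_tick of the oldest messages.
    """
    screen = deque()
    handled = 0
    lost = 0
    next_id = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if len(screen) == screen_lines:
                screen.popleft()   # oldest message scrolls off unseen
                lost += 1
            screen.append(next_id)
            next_id += 1
        for _ in range(min(handled_per_tick, len(screen))):
            screen.popleft()       # staff deal with the oldest visible message
            handled += 1
    return handled, lost, len(screen)


# Light load: staff keep up and nothing is lost.
light = simulate_exception_screen(arrivals_per_tick=2, handled_per_tick=3,
                                  screen_lines=20, ticks=100)

# Heavy load: arrivals exceed handling capacity, the screen fills,
# and from then on messages are lost on every tick.
heavy = simulate_exception_screen(arrivals_per_tick=5, handled_per_tick=3,
                                  screen_lines=20, ticks=100)
```

In this model there is no gradual degradation: below the handling rate nothing is lost, and above it losses begin as soon as the screen fills and continue for as long as the overload lasts.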
The overall result was that delays and frustration increased, leading to more repeated calls and further delays. There were claims that 20 to 30 patients had died as a result of delays of up to ten hours in ambulances arriving, but this was not confirmed by subsequent inquests.
The following day, LAS reverted to the semi-manual system. This worked with reasonable success from the afternoon of 27 October 1992 up to the early hours of 4 November. However, shortly after 2am on 4 November 1992 the system slowed significantly and, shortly after this, locked up altogether. Attempts were made to re-boot workstations in the manner that CAC staff had previously been instructed to do in these circumstances. This re-booting failed to overcome the problem with the result that calls in the system could not be printed out and mobilisations via CAD from incident summaries could not take place. CAC management and staff, having assured themselves that all calls had been accounted for by listening to the voice tapes, and having taken advice from senior management, reverted fully to a manual, paper-based system with voice or telephone mobilisation. Later investigation showed that a programming error had led to server memory being allocated and not released, so that the system finally ran out of available memory. The back-up server could not be used because it had not been designed to work as part of the semi-manual system. A few days later the LAS Chief Executive resigned and a Public Inquiry was set up, which reported in February 1993.
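The class of defect described here, memory allocated and never released, can be illustrated in a few lines. The faulty LASCAD code itself is not available, so the sketch below is purely hypothetical: the class names, the per-incident buffer and the capacity limit are all invented to show the pattern, not to reconstruct the original program.

```python
class LeakyIncidentServer:
    """Hypothetical server that allocates a buffer per incident."""

    def __init__(self, capacity):
        self.capacity = capacity   # stand-in for the memory available
        self._buffers = {}

    def open_incident(self, incident_id):
        if len(self._buffers) >= self.capacity:
            raise MemoryError("no memory left for new incidents")
        self._buffers[incident_id] = bytearray(1024)

    def close_incident(self, incident_id):
        # Defect: the incident is finished but its buffer is never released.
        pass


class FixedIncidentServer(LeakyIncidentServer):
    def close_incident(self, incident_id):
        # Correct behaviour: release the buffer when the incident closes.
        self._buffers.pop(incident_id, None)


def incidents_until_failure(server, limit):
    """Open and close incidents one at a time.

    Returns the number of incidents processed before the server ran out of
    memory, or None if it survived the whole run.
    """
    for n in range(limit):
        try:
            server.open_incident(n)
        except MemoryError:
            return n
        server.close_incident(n)
    return None


leaky_result = incidents_until_failure(LeakyIncidentServer(capacity=100), limit=1000)
fixed_result = incidents_until_failure(FixedIncidentServer(capacity=100), limit=1000)
```

A defect of this kind passes any short functional test, because the failure depends on how long the system has been running rather than on any particular input. That is consistent with a system that ran for days before slowing and locking up, and it is one reason why sustained load testing matters.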
Lessons from LASCAD for other systems
The CAD system was overambitious and the required timetable was impossible. Despite this, several bidders had stated their willingness to contract to meet the requirements and the timetable. It is often the case that suppliers assume that they will be able to negotiate their way out of difficulties, usually as the result of changes that the customer finds that they need to make to the requirements, or because something that the customer has contracted to do has not been delivered. It is rarely possible to transfer to a supplier all of the business risks that an organisation faces if the system fails.
The bidder who least understands the complexity of a requirement is likely to put in the lowest bid.
The LASCAD architecture was too prone to cascade failures when data was wrong because the people involved in the system would not be able to deal with the growing number of problems. Many other systems have failed because defects in the software led to more demands on the human operators than they could handle. This was one of the issues identified in the nuclear accident at Three Mile Island, for example.
It is a mistake to use a computer system to impose new work processes on under-trained or reluctant staff. Front-line staff are often best placed to judge what is practical and, as the Inquiry Report itself says: "In any system implementation the people factor is as important, and arguably more important, than the technical infrastructure".
The cost and time required to redesign, to trial and to agree new changes to processes, to train the affected staff, and to provide additional staff whilst consultation, training and familiarisation are being undertaken, often exceeds the cost and time required to acquire and implement a new computer system to support the new work processes — but these staff costs are often underestimated or overlooked completely when preparing the business case and budget for a proposed change. Experienced managers increasingly recognise that there is no such thing as an "IT Project". Such projects are always business changes enabled by IT and they should be planned, costed and justified as business change projects.
The true requirements that lead to a need for a new computer system are almost always requirements in a much larger system. This makes it imperative to decide (both as the customer and as the supplier) what part of the larger system you are going to redesign and implement.
For example, the requirements for a control and safety system may be that trains can be scheduled efficiently whilst ensuring that they do not collide, or that goods are delivered efficiently and quickly from factory to customers. The former may require a signalling, train detection and points interlocking system, and the latter may require a computer system to manage workflow through a new warehouse, but these are not the places to start.
To take the warehouse example: an analysis that overlooks the role of the new warehouse in the overall distribution system for goods may fail to recognise that by redesigning the role of other depots and the routes by which goods are moved around, it may be possible to eliminate the need for this new warehouse entirely and to save both time and money.
In many organisations, front-line staff have optimised the way they work to achieve efficiencies or to reduce the risk of errors and the detailed reasons for their current work processes may not be written down or well-understood by other staff (and managers in particular). For example, when National Air Traffic Services were first considering how they could eliminate the need to print paper flight progress strips and distribute them to air traffic controllers, they discovered that the ways in which paper strips were positioned in their holders were used in unexpected ways to communicate between members of the controller team – for example to signal that an aircraft needed priority attention or that a military fast jet would soon be transiting the sector. Furthermore, these signals were part of the work culture of individual shifts and differed between shifts.
Another example of trying to impose an impractical way of working arose during the UK National Programme for IT in the NHS (NPfIT and Connecting for Health), when the introduction of smart cards to provide role-based access to patient records caused a dispute with front-line doctors and the British Medical Association. To provide role-based access to confidential information, doctors were instructed to log out when another doctor needed to look at the patient records. For obvious reasons of privacy, logging out cleared the screen and the other doctor logging in again led to a home screen from which it was necessary to navigate to the patient's records, a process that took many seconds (or longer if the system was heavily loaded). The doctors knew that this was impractical when a team were treating a patient (especially in an emergency) and shared their logins but they were threatened with disciplinary action.
NPfIT was seen by the Department of Health as a way in which they could ensure that best practice procedures were adopted by all NHS hospitals. It rapidly became apparent that there were good reasons why local practices differed; for example, in one case it was said that the process of discharging a patient had as the penultimate obligatory step that the medicines that the patient should take home were obtained from the pharmacy and that the computer system did not support this step being done earlier in the discharge process, but some local hospitals do not have a pharmacy that is open at weekends.
NPfIT set unrealistic timescales, just as LASCAD had done. The Connecting for Health team set out to let the contracts for NPfIT in record time and to implement the necessary IT systems in very short timescales; the briefing paper prepared for Prime Minister Tony Blair ahead of a key internal seminar that he chaired to discuss the programme and to agree the investment said that most of the functionality would have been implemented and in use within 3 years. These short timescales led to the contracts being let before any detailed requirements had been developed by Connecting for Health; several major companies refused to bid because of the risks involved and of those that bid and won contracts, some left the programme after a few years and there have been major litigations. The full, centralised Electronic Patient Records that were envisaged have not yet been implemented.
There is a growing recognition in academia that requirements are statements about the real world, not about the technology; that computer systems are almost always there to support the actions of humans, not the reverse; that the humans whose behaviour can affect the achievement of the requirements are part of the system, and their behaviour is affected by their training, culture, laws, physical environment and ethics. Systems where human behaviour is key to success are referred to as sociotechnical systems, and many aspects of sociotechnical systems were explored in a six-year Interdisciplinary Research Collaboration in Dependability (DIRC), funded by the UK Engineering and Physical Sciences Research Council (EPSRC). DIRC brought together computer scientists with psychologists, ethnographers, sociologists and economists to explore what affected the dependability of sociotechnical systems, with a focus on structure, timeliness, diversity, responsibility and risk. These themes led to many peer-reviewed publications and several books.
Where computer systems interact with humans they affect behaviour, often in unexpected ways. A DIRC study found that an advisory system to help radiographers interpret mammograms improved the performance of the least expert radiographers but degraded the performance of the most expert. When an expert radiographer is unsure whether an image on a mammogram indicates a tumour they would often choose to recall the patient for further examination; if the advisory computer system also did not highlight the image as a potential tumour the expert radiographer was influenced not to recall the patient.
Most computer systems in organisations are sociotechnical. The software supports work processes, workflows and interactions between human participants and these processes and interactions are often implicit. One consequence is that an organisation that buys commercial off-the-shelf (COTS) packaged software to support their accounting or the scheduling of their factory or the work of their hospital is also committing (consciously or not) to changing their work practices to conform to those that their chosen package supports.
In several litigations where I have been retained as an expert witness, the customer organisation has agreed that they would adopt the work processes supported by the package they have chosen, but without understanding in detail what this would imply and without reviewing the implications with their front-line staff. Sometimes the supplier of the package has led them to believe that they could modify the package to fit their unique needs; this has rarely proved a good option because any modifications have to be re-implemented in every new release of the package (which will be expensive and may be impossible). The package supplier may say that they will consider a customer's changes for incorporation into the base product but it is a mistake to rely on this because different customers' modifications conflict and the suppliers' business depends on maintaining a core product that is well-designed and reliable, which implies that they need to minimise complexity and the number of different versions that they have to support, for the reasons we have discussed in my earlier cyberliving lectures.
This means that buying packages from overseas for use in regulated or otherwise inflexible organisations is high risk because something that may work successfully in one environment may be inconvenient, inadequate or even illegal in a different country, culture or organisational structure. An Australian electricity supplier discovered this when the US package they bought to support their customer accounts proved not to support the legal requirements to treat many classes of vulnerable customers differently from others if their accounts fell into arrears. NPfIT encountered similar problems when suppliers attempted to introduce American healthcare computer systems (which had a great focus on billing) into the UK national healthcare system where treatment was free to everyone at the point of use. UK universities have had similar problems when the systems used in US universities proved to be incompatible with UK structures, especially with the college-based processes in Oxford and Cambridge.
No-one just wants an IT system; what they want is a way of achieving something in the real world, at some distance from the technology. It is important to decide what this is and to ensure that a computer system is the best way to achieve it. There are excellent and rigorous ways to describe, explore, design and analyse business processes.
Most IT systems interact with people, but this is often the wrong way to look at the requirements. In reality it will usually be found that the people and the computer system are part of a larger system and that the technology is there to support what the people are expected to do. The people should be seen as part of a larger system, not on the outside of a system boundary that has been tightly drawn around the new technology. This way, their expertise, culture and training needs are less likely to be overlooked.
The people who best understand what is involved in delivering a successful outcome in the real world are the front-line staff who have been doing it. It is necessary to involve these staff in designing the improved system, especially if the same staff will be using any new technology and following any redesigned work processes. Without their input it is likely that practicalities will not get sufficient attention and that front-line staff will not feel ownership of the new system and enthusiastic about adopting the new ways of working.
© Martyn Thomas CBE FREng, 2016