Contingency planning for computer disaster recovery is an important aspect of computer security. There are any number of events that might cause a 'data fall-out' in your company, but how might end users nullify the worst effects of such a scenario? Stewart McCall – winner of the prestigious ASIS Chapter 208 Ted Legg Memorial Essay Competition for 2002 – provides some solutions.
Organisations of all sizes are becoming ever more dependent on information technology (IT). Any loss of – or a reduction in – IT facilities, the associated access to data and indeed the data itself can adversely affect the operational effectiveness of given companies.

In general terms, IT security seeks to protect the data held on electronic systems by addressing three main areas: confidentiality (the restriction of information to only those authorised to see it), integrity (the preservation of information in its original form unless correctly amended or deleted by authorised personnel) and availability (where access to information, as and when required, is handled by IT systems).

Contingency planning (otherwise known as business continuity planning) is designed to facilitate integrity and availability following on from any disruption to an organisation's IT systems. Essentially, it's designed to minimise the effects of a disaster, and allow for a timely resumption of activities. The importance of contingency planning is acknowledged in its consideration as one of the five main security controls recommended in BS 7799-1:2000 'Information Technology: Code of Practice for Information Security Management' as common best practice for IT security.

Although an extreme case, the following quote (from The Financial Times) illustrates the benefits realised by contingency planning and effective management of the disaster recovery phase after the September 2001 terrorist attacks on the World Trade Centre...

"They were left without office space, computers and the basic management infrastructures they had taken for granted. Yet within two to three days, many of the former tenants of what had become a smoking mass of twisted steel and debris were back at work. Merrill Lynch relocated nearly 9,000 staff in midtown offices. Citigroup moved 16,000 staff to other locations. This astounding feat of relocation and business-almost-as-usual was a testament to excellent contingency planning".

Indeed, organisations may be unduly influenced by the events of September 11. Although the attacks on America were both dramatic and destructive, other less newsworthy events need to be considered during any form of risk analysis.

For example, a recent survey conducted by an IT continuity firm in the City of London concluded that companies were 40 times more likely to suffer disasters and disruptions to their business as a result of the efforts of their plumbers rather than any terrorist factions! Nonetheless, the importance of contingency planning for any form of disaster – let alone an IT-related event – cannot be overstated.

The impact on any organisation of not preparing to handle a crisis can be severe. It has been assessed that half of all businesses experiencing a disaster, and which have no effective plans for recovery, fall flat within the following 12 months. 47% of all businesses experiencing a fire or major theft go out of circulation within two years, while 44% of companies that lose their records in a disaster never resume trading.

A staggering 93% of companies that experience a significant data loss are out of business within five years.

Accidental, natural or deliberate?
In attempting to better understand and manage the wide range of threats engendered by a disaster, it's wise for security managers to break those threats down into three groups: accidental, natural and deliberate.

Accidental disasters or events are unintentional, normally happen without apparent cause and can occur unexpectedly. Human error is a fact of life, and has to be considered as much of a threat as carelessness, poor training, ignorance, stupidity, excessive enthusiasm or misunderstandings. All of these factors can result in errors that are difficult to detect, contain or remove.

Unexpected simple equipment malfunction may also be classed as 'accidental', and can often be more difficult to detect and repair than catastrophic failure – with far-reaching consequences. In addition, such faults may only come to light after they have begun to affect the data, by which time any sort of recovery may well be impossible.

Software malfunction is another consideration, mainly due to the near impossibility of completely testing each program. These programs can contain millions of lines of code, and will be tested many times by a manufacturer prior to release. However, regardless of the thoroughness of these tests, unwanted software problems can appear when least expected – and sometimes with disastrous results.

A more fundamental 'accidental' cause of a disaster relates to the power used to operate IT systems. Any loss of power or fluctuation in the power supply normally provided by a public utility can be due to a number of reasons beyond the control of the organisation on the receiving end (such as damage during repair or maintenance, or dramatic increases in demand causing the system to shut down).

Natural disasters are, of course, caused by environmental factors and events, including floods, earthquakes, storms, hurricanes and fires, etc. Generally speaking, IT equipment works best in a stable environment and, unless specially designed and constructed (as is often the case with military equipment) will not perform well in extremes of cold, heat, humidity, dust or dirt. Electrical storms can also result in total power losses, as well as power surges that may damage equipment.

Any form of deliberate threat will always involve a human element that can vary in terms of motives as well as the methods used in an attack. The financial sector is particularly keen to protect its IT systems from deliberate attack due to their attractiveness as a target for criminals. However, the amount of major criminal activity against organisations' IT systems is difficult to determine. Although a number of cases have come to light through the actions of the police, and subsequently in the courts, there may be far more unreported deliberate disasters than those reported.

Why? Possibly because it's perceived that an organisation's reputation would be damaged if it reveals that a major computer crime has taken place – it may hit investor confidence, leading to a slide in share values and profits.

That said, it's also the case that simple criminal activity involving the theft of expensive IT equipment (such as laptops from vehicles, or desktop computers from an office) may be every bit as damaging to an organisation. Although the theft may be costly and inconvenient to the individual or company concerned, the loss of vital or sensitive data stored on that equipment can have much wider ramifications. Additionally, a given company's Security Department would then have to consider whether the theft was a simple criminal act or something more sinister (commercial espionage, for example).

Contrary to popular opinion, espionage is not just conducted by foreign governments. In the industrial or commercial sector, where possession of information about a rival's performance, forecasts and plans is invaluable, any competitor must be regarded as hostile and, as such, capable of espionage.

Methods of espionage can vary, from the simple recruitment of personnel or equipment theft through to more sophisticated means such as technical eavesdropping, the modification of equipment (to record or re-route particular bits of information) or hacking into an organisation's IT systems.

Disenchanted employees who have access to the company's IT systems pose a very real and dangerous threat. An employee may channel his or her resentment of the organisation or management into the physical damage of equipment or the corruption of data. Such employees are also an ideal target for recruitment by a commercial competitor to act as an 'insider' and information saboteur.

The sabotage of vital data or equipment can take many forms, ranging from the destruction of vital equipment through arson to damage caused by explosives. Don't forget the loading of a computer virus, either! Sabotage is not only attributable to disenchanted employees, but can also be carried out by criminals attempting to cover their tracks, terrorists, foreign intelligence organisations, commercial rivals or subversive groups.

Identifying the threat to security
The long list of potential disasters detailed above will vary from organisation to organisation, and from region to region. Any form of contingency planning, then, must first start with a threat assessment designed to identify all potential crisis events for that particular site (or sites).

Due to budgetary constraints, no organisation can effectively attempt to counter all potential threats. Therefore, it's an important aspect of any risk analysis process that an order of probability and effect is attached to each potential occurrence. Otherwise, valuable resources could be wasted on events that may indeed be disastrous, but which are unlikely to ever occur.
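
As a rough illustration of attaching an order of probability and effect to each threat, the sketch below multiplies an assumed likelihood by an assumed severity score and sorts the results. The `Threat` class, the 1-to-5 impact scale and every figure shown are hypothetical, chosen only to demonstrate the ranking idea. A real risk analysis would draw its numbers from the site's own threat assessment.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    probability: float  # assumed annual likelihood, 0.0 to 1.0
    impact: int         # assumed severity score, 1 (minor) to 5 (catastrophic)

    @property
    def score(self) -> float:
        # Simple product of likelihood and severity; other weightings are possible.
        return self.probability * self.impact


def rank_threats(threats):
    """Return the threats ordered from highest to lowest score."""
    return sorted(threats, key=lambda t: t.score, reverse=True)


# Illustrative figures only; they are not taken from any survey.
threats = [
    Threat("Terrorist attack", probability=0.001, impact=5),
    Threat("Burst water pipe", probability=0.04, impact=3),
    Threat("Mains power failure", probability=0.30, impact=2),
]

for threat in rank_threats(threats):
    print(f"{threat.name}: {threat.score:.3f}")
```

With these made-up inputs the dramatic but rare event falls to the bottom of the list, which is the point of the exercise: resources go first to the threats that combine meaningful likelihood with meaningful effect.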

In conjunction with threat identification, an analysis of critical IT processes needs to take place so as to determine the company's minimum operating requirements. Some organisations may be able to operate effectively for a couple of days without access to some IT applications. Others may be considered vital, and are therefore afforded the highest recovery priority.

Whatever the scenario, this evaluation has to be conducted for each installation within an organisation rather than for the organisation as a whole. The findings should be presented to senior management so that they can assess the potential negative business impact of key applications should there be an interruption.

Management must then determine the time without such facilities that they're willing to accept. Such direction from management is vital in ensuring that resources are firstly made available, and then correctly targeted.

Effective disaster management
On completion of the threat assessment, the contingency planning phase should begin to put in place a framework that will manage – in both an effective and efficient manner – any form of IT disaster.

In basic terms, contingency planning is the process whereby advance preparation aims to engender the successful handling of any crisis. More commonly known as crisis management, the disaster recovery process covers the actual handling of that disaster using the resources and procedures established during the contingency planning phase. A good contingency plan will provide the organisation with a dedicated Crisis Management Team, one that is correctly funded and trained to deal with the needs of an organisation during any disaster. The Team ought to be staffed by personnel who are able to take executive decisions and provide technical expertise, and who in turn will be the hub of any disaster recovery process that may be enacted.

Potential risks and contingency plans must be subject to ongoing and regular reviews. Members of the Crisis Management Team should themselves be subject to continuous review, and trained to respond to any new risks and/or changes of personnel.

The importance of this particular point is highlighted by another quote from The Financial Times in an article that appeared just after the World Trade Centre atrocity...

"Some companies did tremendously well in getting their IT systems back to normal, and in organising their staff. However, for others the continuity plans they had in place fell apart, and key executives could not be located. Many benefited from the planning they had done for the Millennium Bug, but sometimes those plans were not communicated well enough to the right people. There was much confusion over who was supposed to be in charge."

Mitigating the overall effects
Depending on the results of any risk analysis concerning an organisation's reliance on IT, and on the identification of critical applications, certain measures can be taken to aid the recovery of an organisation after a disaster has occurred. There are several major areas that must be considered in terms of IT security to mitigate the worst effects of a disaster, namely data back-up, alternative operating sites and power.

A back-up is a copy of all or part of a file to assist in the reconstruction of lost data. This can be carried out in a number of ways. First, a complete back-up can copy everything on the IT system. Although ideal for regenerating a system after a disaster, complete back-ups can be time consuming (depending on the amount of information to be stored and the technical facilities used in conducting the back-up).

Second, a quicker back-up can be achieved by only copying those files that have been changed or created since the last back-up. This is known as a selective back-up (and is often called an incremental back-up). Third, a number of back-ups may be retained. These are referred to as revolving back-ups (each time a back-up is carried out, the oldest is replaced).
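
The selective and revolving approaches can be sketched in a few lines of Python. This is a minimal illustration rather than a production tool. The function names are hypothetical, the destination is assumed to sit outside the source directory, and the rotation step assumes each back-up lives in its own directory whose name sorts chronologically (a date such as 2002-01-03, for instance).

```python
import shutil
from pathlib import Path


def selective_backup(source: Path, dest: Path, last_backup_time: float) -> list:
    """Copy files under source that were modified after last_backup_time into dest.

    last_backup_time is a Unix timestamp. Returns the list of copies made.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            target = dest / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves file timestamps
            copied.append(target)
    return copied


def rotate_backups(backup_root: Path, keep: int) -> list:
    """Delete the oldest back-up directories so that at most `keep` remain.

    Assumes directory names sort in chronological order.
    Returns the list of directories that were removed.
    """
    backups = sorted(p for p in backup_root.iterdir() if p.is_dir())
    surplus = backups[:-keep] if keep > 0 else backups
    for old in surplus:
        shutil.rmtree(old)
    return surplus
```

Calling `selective_backup` with the time of the previous run copies only what has changed since then, while `rotate_backups` discards the oldest sets once the chosen number has been exceeded.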

Last but not least, in the evolving world of e-commerce, weekly or daily back-ups may not be adequate. In these cases, real time back-ups (called disk or data mirroring) may be the solution. Although more expensive than other forms of back-up, the cost can be justified in the case of some critical applications.

For example, it may be more appropriate for certain applications or systems in a bank or for an airline than a manufacturing organisation. A major disadvantage of this system is that it can provide little protection against viruses, sabotage, human error and other forms of online attack.

When dealing with back-ups, a further consideration is the location of that back-up once copying has taken place. It is standard practice to store back-ups a safe distance away from the original, because a back-up is useless if it is destroyed at the same time as the original. It's difficult to specify the exact distance at which a back-up should be stored. For example, in London it might be acceptable to store a back-up three to four miles away in order to counter the threat of a major fire, while a prominent bank in Seattle transports its back-ups no less than 280 miles to counter the possible threat of a catastrophic earthquake hitting the Pacific north west coast of America.

Depending on the time it takes for them to come into operation, alternative sites may be described under three categories: cold, warm and hot. A cold site is one that has been identified as suitable for hosting IT operations, but which is not equipped for immediate use.

It may be a suite of offices, a hotel conference room or some other large area that boasts the necessary facilities (power and communications access, etc) to support the organisation's IT operations.

Although relatively inexpensive in comparison with warm and hot sites, the major disadvantage of any cold site is that it will take time for equipment to be purchased or recovered from the primary site and set up to conduct full-time operations. This time period can vary from one week to a month depending on the complexity of the organisation, and the type of disaster.

Warm and hot sites are relatively similar in design and function. A hot site is a completely equipped, fully-functioning operations centre able to provide near-immediate resumption of critical IT operations. A warm site is partially equipped, and is thus only able to provide a resumption of IT systems over a couple of days.
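
The choice between site types follows from the downtime that management is willing to accept. The sketch below is one hypothetical way to express that decision. The recovery times are assumptions loosely based on the ranges given above (up to a month for a cold site, a couple of days for a warm site), the six-hour figure for a hot site is simply a guess at what 'near-immediate' means, and the options are listed in an assumed order of cost, cheapest first.

```python
# Assumed worst-case recovery times in days, listed cheapest first.
# The hot-site figure is a guess at "near-immediate", not a quoted value.
SITE_OPTIONS = [
    ("cold", 30.0),
    ("warm", 2.0),
    ("hot", 0.25),
]


def cheapest_adequate_site(max_downtime_days: float):
    """Return the cheapest site type that recovers within the tolerated downtime.

    Returns None if even the fastest option is too slow.
    """
    for site, recovery_days in SITE_OPTIONS:
        if recovery_days <= max_downtime_days:
            return site
    return None


print(cheapest_adequate_site(3))  # an application that can tolerate three days offline
```

An application that can wait three days is matched to a warm site under these assumptions, while one that cannot tolerate a single day would need a hot site.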

A given organisation may either build its own warm or hot site, or subscribe to a service provider. The ideal solution is to have a dedicated recovery site, although prohibitive costs mean that many choose to take up the service of a specialist provider who is ready to facilitate the use of a site during an emergency, and is aware of the company's needs.

One disadvantage of this approach is that a number of other organisations may have signed up to use the facilities as well, with the result that the entire space may not be usable during a major disaster.

Power to the IT systems
The basic requirement of any IT system is power, without which operations will cease immediately. In the case of critical systems, any period of power loss may be considered unacceptable. Additionally, other power problems such as surges and variations in voltage can lead to major problems with electrical equipment.

Power losses and/or variations in voltage may be countered by using an uninterruptible power supply (or UPS).

A typical UPS stores energy during normal operations in order that emergency power can be provided during any mains power loss. There are three main types of UPS technology: standby, line interactive and online.
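
How long that stored energy lasts depends on the size of the load. As a back-of-the-envelope sketch, runtime can be estimated by dividing the usable battery energy by the power drawn. The function name, the 90% efficiency default and the example figures are all assumptions for illustration, and real batteries deliver less than their rated capacity at high discharge rates, so a manufacturer's runtime chart should take precedence.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.9) -> float:
    """Estimate the minutes of back-up power available for a given load.

    battery_wh: rated battery capacity in watt-hours (assumed figure)
    load_w:     power drawn by the protected equipment in watts
    efficiency: assumed fraction of stored energy that reaches the load
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return battery_wh * efficiency / load_w * 60


# Hypothetical example: a 1,000 Wh battery supporting a 500 W server rack.
print(f"{ups_runtime_minutes(1000, 500):.0f} minutes")
```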

The most comprehensive form of power protection comes from an online UPS, which should be used for all of your company's most critical IT systems. In addition to providing emergency back-up power, an online UPS removes power variations by electronically regenerating the supply, resulting in a continuous stream of clean, regulated power to your precious IT equipment.

In the case of any prolonged power failure, security managers will need to consider the use of back-up generators. These can be hired or owned by the organisation ready for immediate use. Ideally, a standby generator should be connected to the mains supply and designed to start once a power failure has been detected.

The ASIS Chapter 208 Ted Legg Memorial Essay Competition 2002

The aim of the ASIS Chapter 208 Ted Legg Memorial Essay Competition is to encourage and engender professionalism and excellence in the security industry’s ‘stars of tomorrow’, and to foster the participation of students in the ASIS International organisation. The UK’s own Chapter 208 is the largest of its kind outside of the main Stateside organisation. The competition is open to any student registered with a university undertaking a recognised security risk management module or course at Masters level. Submitted essays must be 5,000 words long, with the assessors looking for (among other qualities): originality of ideas, an intelligent arrangement of facts and a clear argument about the chosen security topic. Some convincing conclusions are also highly important. Stewart McCall received a cheque for £500 from SMT Editor Brian Sims as the 2002 winner, while runner-up Simon Houghton (of the Security Policy Division at the Home Office) picked up a £200 prize. Simon’s essay – which examines the relationship of corporate security professionals with members of the police service, and vice versa – will be reproduced in the March edition of Security Management Today.