When Systems Break Down

Dec 15, 2009

What happens when one or more of the complex systems that keep our civilization running break down? As unpleasant as it is to contemplate, this question dominates the thinking of Dr. Wolfgang Kröger, Professor and Director of the Laboratory for Safety Analysis at ETH Zurich (Swiss Federal Institute of Technology). Kröger recently discussed the discipline of disaster analysis with a group of journalists focusing on Swiss emergency preparedness technology.

According to Kröger, “we seek to help avoid systems collapse through the application of high quality technical and organizational tools; the goal being to prevent minor incidents from expanding into catastrophic failures.”

Modern Systems
Modern industrial societies today are run by highly complex, interlocking technologies which, although man-made, may be poorly understood in the aggregate. To preserve the integrity of such systems, it is necessary to map their interactions, which become increasingly fragile as the systems grow more complex. Mitigating the consequences of a disaster must rest on a deep understanding of the behavior of the system as a whole, and on a willingness to become aware and to be prepared. This in turn requires defining targets and acceptability criteria.

Failure to follow such a mapping program can exact a high price. Kröger cited the catastrophic wheel failure on a German ICE high-speed train travelling at about 200 km/h near Eschede on 3 June 1998. The accident caused 101 fatalities, huge financial losses and the cancellation of train construction contracts throughout the world.

The effects of this accident reverberated throughout German society. To prevent a recurrence of such events, it is necessary to carry out a risk and vulnerability analysis that includes maintenance issues, and to have a scientific support structure in place that is capable of informing strategic decisions.

Information Systems Exacerbate the Risks
The study of complex systems has grown into a major discipline in the last decade, and it encompasses both man-made and natural systems. Indeed, systems biology seeks to understand such complex phenomena as cancer, thought processes and aging, using the same tools as those applied to complex human constructs, including the power grid, transportation and communications networks.

Kröger emphasizes that a broadened set of hazards and threats is now arising throughout the world, exacerbated by the pervasive use of modern information and communication technology, a growing number of malicious attacks, and the changing weather patterns that accompany global warming.

All complex systems display features that put them at risk of catastrophic and unpredictable failure: inadequate information about their constituent elements, non-linearities and feedback loops, each of which tends to create ugly surprises. Nowhere are these features more in evidence than in the control of the electric power grid. Kröger describes the UCTE, the Union for the Co-ordination of Transmission of Electricity, which coordinates the trans-boundary flow of electrical energy and serves 450 million people in the European Union and continental Europe.

In recent years, there have been major failures due to different causes which resulted in a domino effect and a collapse of a large part of the power grid. One of the most dramatic was an outage in 2003 that shut down most of Italy’s electrical power.

Domino effects occur within electrical grids when lines become overloaded: the lines heat up and sag, which causes flashovers (arcing across the too-short distance to obstacles such as trees) and lines being cut off, shifting their load onto the remaining lines. Mitigation measures include technical devices, improved topology and, most importantly, adequate load shedding. Much of Kröger's work involves modeling these events to better understand them.
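To make the domino mechanism and the role of load shedding concrete, here is a deliberately simplified sketch. It assumes a handful of parallel lines that share the demand equally, which is not how real power flow works (flows follow the network's electrical properties), and the function name, line names and capacities are hypothetical illustrations, not part of Kröger's models.

```python
from typing import Dict, List, Tuple


def simulate_cascade(capacities: Dict[str, float], demand: float,
                     initial_trip: str, shed_load: bool = False) -> Tuple[List[str], float]:
    """Toy cascade on parallel lines that share the demand equally (an assumption).

    Returns the lines still in service and the demand actually served.
    """
    in_service = sorted(name for name in capacities if name != initial_trip)
    served = demand
    while in_service:
        if shed_load:
            # Load shedding: drop just enough demand that an equal share
            # fits within the weakest surviving line.
            safe_total = len(in_service) * min(capacities[n] for n in in_service)
            served = min(served, safe_total)
        share = served / len(in_service)
        overloaded = [n for n in in_service if share > capacities[n]]
        if not overloaded:
            break  # stable operating point reached
        # Overloaded lines heat, sag and trip; their flow shifts onto the rest.
        in_service = [n for n in in_service if n not in overloaded]
    if not in_service:
        served = 0.0  # total collapse: nothing is delivered
    return in_service, served


if __name__ == "__main__":
    lines = {"A": 40.0, "B": 30.0, "C": 50.0}
    print(simulate_cascade(lines, demand=100.0, initial_trip="C"))
    # prints ([], 0.0)
    print(simulate_cascade(lines, demand=100.0, initial_trip="C", shed_load=True))
    # prints (['A', 'B'], 60.0)
```

In this toy case, losing line C pushes 50 units onto each surviving line, both trip, and the whole system goes dark. Shedding 40 units of demand keeps lines A and B within their limits, so most consumers are still served.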

More than the Sum of its Parts
Kröger and his associates have learned some important lessons from their analyses. Causes of system malfunctions include operating a structure beyond its original design parameters, unexpected responses of critical equipment and protective devices, lack of emergency preparedness, poor cross-border coordination, and failure to adequately implement security criteria. To understand these breakdowns and redesign the operational regimes responsible for them, Kröger's team has built an object-oriented modeling approach that describes the behavior of the components and their interactions. This includes stochastic ("Monte Carlo") simulation to investigate the macro-behavior of the system, which is no longer simply the sum of the micro-behavior of its parts. This modeling allows scenarios and states to emerge that were not predicted or pre-defined. A robust model can then predict the frequencies of these events and, moreover, their consequences.
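The general idea of combining component objects with Monte Carlo sampling can be illustrated with a small sketch. It is not the ETH Zurich model. The ring topology, the single "coupling" probability with which a failure spreads to each neighbor, and all names are hypothetical assumptions chosen only to show how system-wide outages can emerge from simple component-level rules.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Component:
    """One element of the system (e.g. a line, pump or router), modeled as an object."""
    name: str
    base_failure_prob: float  # chance of an independent failure in one trial
    neighbors: List["Component"] = field(default_factory=list, repr=False)
    failed: bool = False


def ring_network(n: int, base_failure_prob: float) -> List[Component]:
    """Hypothetical topology: n components, each coupled to its two ring neighbors."""
    comps = [Component(f"c{i}", base_failure_prob) for i in range(n)]
    for i, c in enumerate(comps):
        c.neighbors = [comps[(i - 1) % n], comps[(i + 1) % n]]
    return comps


def run_trial(comps: List[Component], coupling: float, rng: random.Random) -> int:
    """Sample one scenario and return how many components end up failed."""
    for c in comps:
        c.failed = rng.random() < c.base_failure_prob
    frontier = [c for c in comps if c.failed]
    while frontier:
        c = frontier.pop()
        for nb in c.neighbors:
            # Each new failure gives every healthy neighbor one chance to fail too.
            if not nb.failed and rng.random() < coupling:
                nb.failed = True
                frontier.append(nb)
    return sum(c.failed for c in comps)


def large_outage_frequency(comps: List[Component], coupling: float,
                           threshold: int, trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(at least `threshold` components fail in one trial)."""
    rng = random.Random(seed)
    hits = sum(run_trial(comps, coupling, rng) >= threshold for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    net = ring_network(20, base_failure_prob=0.05)
    print("independent failures only:", large_outage_frequency(net, 0.0, threshold=10))
    print("strongly coupled failures:", large_outage_frequency(net, 0.9, threshold=10))
```

With the same per-component failure rate, outages affecting half of the system are practically absent when components fail independently, but become common once failures can propagate between neighbors. No single component's description predicts that macro-behavior; it only appears when the interacting objects are simulated together.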

The application of these tools has yielded significant benefits. A study on the vulnerability of the drinking water system is now an integral part of the review of legal recommendations for protecting Switzerland's water supply system against sabotage. The study focuses on the potential risk of cyber attacks and of poisoning of the water after the point of last control.

Critical Infrastructure Warnings
“The work on other infrastructure components including the electrical grid, the Internet and rail lines, as well as interdependencies and couplings between them will provide input to the ongoing programs on critical infrastructure protection and hazard analysis,” Kröger states. These include two studies, “Risk Switzerland” and “Stressing issues of complexity and importance of SCADA-risks,” both undertakings of the Swiss Federal Office of Civil Protection.

One of the most important conclusions that Kröger draws from his investigations is that "until current research efforts to develop a much more secure internet are successful, the Internet should NOT be used for any function which is vital to the supervision, operation or control of any critical infrastructure."

Given the constant reports of security breaches in both the public and private sectors, this may be the most important, and the most difficult, recommendation for large organizations to adopt. To avoid such breaches, Kröger counsels that governments should rely on dedicated systems (their own fiber cables, non-commercial or diverse hardware and software) for data and command transfer rather than on commercial, open-access ones.

Kröger drew a sobering conclusion, a sort of Murphy’s Law, that given the complexity of the systems that run our society, “Even in the best of circumstances, preventive measures may fail, and severe problems occur.”  

K. John Morrow Jr. is a scientific writer and technical consultant, and sits on the editorial board of several biotechnology journals.
© FrontLine Security 2009