As Hurricane Katrina took aim at the Gulf Coast in late August, officials at ConocoPhillips kept a close eye on their refinery southeast of New Orleans. At the height of the storm, they decided to shut down the refinery's operations and move into disaster recovery mode.
"We monitored the weather, and at a certain point we had to take the facility down," says Bruce Colgate, ConocoPhillips' manager of automation for natural gas refining. In the face of gale force winds and flooding capable of collapsing entire cities, there's not much a company can do. "You keep a skeleton crew in there ... hunker down and hope for the best."
With 14 refineries, some situated directly in the path of passing hurricanes, ConocoPhillips adheres to a detailed disaster recovery plan. But it's tough to prepare for an event such as Katrina. In fact, there are few ways to fully prepare manufacturing operations for such unpredictable events as natural disasters, human errors or acts of terrorism.
But manufacturers must prepare nonetheless. In today's global manufacturing market, with its 24/7 operations, instantaneous communications and increasing digitization, the risks of not being as fully prepared as possible are larger than ever. Hurricane Katrina has sharply underscored the notion that preparedness has to rise to a much higher and more urgent level than ever before.
The pressure to do so has been increasing from other directions as well. Regulatory bodies, such as the Occupational Safety and Health Administration (OSHA) and the Environmental Protection Agency (EPA), as well as new government mandates such as Sarbanes-Oxley and the Bio-Terrorism Act, require companies to have a detailed disaster recovery plan. Facing such pressures, manufacturers are realizing that planning for the worst has become a much bigger job.
Manufacturers in the refining, chemical and energy industries that deal with explosive materials, as well as those in food and beverage - all areas in which an incident could result not only in lost production but also in loss of life - realize that a recovery plan alone is not enough. Aware that in most cases they are dealing with the possibility of human error, not natural disasters or terrorism, they focus on preventing serious or catastrophic incidents.
According to the Abnormal Situation Management (ASM) Consortium, a process industry group organized by Honeywell, human error is a significant factor in almost all accidents. An ASM study revealed that 42% of abnormal upsets occurring in plants are due to mistakes made by people. Equipment failures account for 36% of problems, and 22% result from a disruption in a particular manufacturing process. Even in these incidents, people are usually at the heart of the problem, pushing equipment over its limits or not following correct procedures.
Human Error Devastates
Take the explosion last March at BP Products North America Inc.'s Texas City refinery, which killed 15 workers and injured another 170 people. A preliminary investigation determined the explosion was a result of operator errors in the production of gasoline. The accident, still under investigation, could have been avoided. But all BP can do now is issue an apology to those harmed - which the president of the company did indeed do - and implement changes in practices, policies and technologies per the recommendation of the team investigating the incident.
The rule most companies follow is protect lives first, environment second and then financial assets. An accident in a chemical or processing plant, for instance, can cost billions of dollars in property damage, clean-up costs, legal fees, fines and market share loss.
With that in mind, every company should understand the lessons of the BP accident and of Hurricane Katrina. Management must adopt a proactive posture toward disaster avoidance. But how does a manufacturer get started?
First, assign a risk management team to read the incident reports and assess your own company's readiness around production. Next, build a disaster avoidance strategy based on best practices and technology that includes supply chain contingency planning and the ability to quickly switch manufacturing processes to another location.
More importantly, understand that the majority of major accidents start with small anomalies that need only to be identified immediately and managed appropriately. Putting the right tools, techniques and training in place can help avoid catastrophes.
Plan for Disaster
Taking such a proactive stance can help mitigate the risk of a disaster, which no manufacturer can afford today in a market that has become both global and increasingly real time. That's why risk assessment and mitigation are working their way into the corporate plan. "From an IT perspective, we've been doing disaster recovery for years and years, but now we refer to it as business continuity," says Charlie Massoglia, CIO of Dawn Food Products Inc., Jackson, MI. "It's not a question of how to recover hardware, it's how it fits in the overall plan for business to continue. ... 9/11 changed people's awareness of what the risks are to business."
Any time the issue of IT contingency planning comes up, the obvious answer is third-party disaster recovery services, such as those provided by SunGard Availability Services, Comdisco Inc. and others. It's a smart way to keep enterprise data tucked safely away in a remote location. But protecting the processes of manufacturing is trickier. If a fire breaks out on a factory line, for instance, you need to:
1) Identify the event via alarms, and contain it.
2) Take the facility down to a safe state via Safety Instrumented Systems (SIS).
3) Keep the integrity of the data intact using power management and change management tools.
4) Shift production to other sites.
5) Bring the facility back up without introducing new hazards.
That last point is an important, yet often overlooked, step. In the wake of Hurricane Katrina, the U.S. Chemical Safety and Hazard Investigation Board (CSB) issued a safety bulletin urging chemical and oil facilities to take special precautions when restarting processes. "From our past investigations we know firsthand the dangers of catastrophic incidents during startup," says CSB Chairman Carolyn Merritt, in a statement. "We are urging facilities to follow established startup procedures and checklists prior to restarting."
That means doing a visual inspection of storage tanks for evidence of floating displacement or damage, examining insulation systems, sewers, drains and furnaces, and testing electric motors and warning systems. When refineries or any other types of plants are shut down and fired back up, that's when they are operating in a non-steady state and are the most dangerous, say industry experts. To offset this scenario, safety needs to be cleverly designed into the manufacturing process using a layered, integrated approach that couples control with safety systems.
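The restart precautions described above amount to a checklist that blocks startup until every item has been signed off. The following Python sketch shows one way such a gate could be modeled. It is an illustration only, not CSB guidance or any vendor's procedure: the check names paraphrase the items in the paragraph above, and `RestartChecklist` and its methods are hypothetical.

```python
from dataclasses import dataclass, field

# Items paraphrased from the restart precautions in the text.
# The identifiers themselves are illustrative assumptions.
RESTART_CHECKS = (
    "storage_tanks_inspected",
    "insulation_examined",
    "sewers_and_drains_examined",
    "furnaces_examined",
    "electric_motors_tested",
    "warning_systems_tested",
)


@dataclass
class RestartChecklist:
    """Tracks which pre-restart checks have been signed off."""

    passed: set = field(default_factory=set)

    def sign_off(self, check: str) -> None:
        # Reject anything that is not on the agreed checklist.
        if check not in RESTART_CHECKS:
            raise ValueError(f"unknown check: {check}")
        self.passed.add(check)

    def outstanding(self) -> list:
        # Keep checklist order so the remaining work is easy to read.
        return [c for c in RESTART_CHECKS if c not in self.passed]

    def may_restart(self) -> bool:
        # Startup is allowed only when nothing is outstanding.
        return not self.outstanding()


checklist = RestartChecklist()
checklist.sign_off("storage_tanks_inspected")
print(checklist.may_restart())   # False: five checks remain
print(checklist.outstanding())
```

The point of the design is that the default answer is "no": a restart is permitted only when the list of outstanding checks is empty.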
The industry-accepted safety standard, the International Electrotechnical Commission's (IEC) 61508, covers the basic functional aspects of safety. IEC 61511, the process-industry standard derived from 61508, says a user must identify the hazards, decide the level of risk they can tolerate and specify safety requirements.
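To make the "decide the level of risk they can tolerate" step concrete, the sketch below turns a hazard frequency and a tolerable frequency into a required risk reduction factor and then into a safety integrity level (SIL), using the low-demand SIL bands from IEC 61508. The frequencies are invented for illustration, the function names are hypothetical, and a real SIL determination would use a method such as layer-of-protection analysis and would credit other safeguards first.

```python
def required_risk_reduction(unmitigated_per_year: float,
                            tolerable_per_year: float) -> float:
    """Risk reduction factor (RRF) the safety function must deliver."""
    if unmitigated_per_year <= 0 or tolerable_per_year <= 0:
        raise ValueError("frequencies must be positive")
    return unmitigated_per_year / tolerable_per_year


def sil_for_rrf(rrf: float) -> int:
    """Map an RRF to a SIL using the low-demand bands.

    SIL 1 covers RRF above 10 up to 100, SIL 2 up to 1,000,
    SIL 3 up to 10,000 and SIL 4 up to 100,000.
    Returns 0 when no SIL-rated function is needed under this rule.
    """
    if rrf <= 10:
        return 0
    for sil, upper in ((1, 100), (2, 1_000), (3, 10_000), (4, 100_000)):
        if rrf <= upper:
            return sil
    raise ValueError("required risk reduction exceeds SIL 4")


# Assumed numbers: the hazard would occur once every 10 years unmitigated,
# and the company tolerates it once every 5,000 years.
rrf = required_risk_reduction(unmitigated_per_year=0.1,
                              tolerable_per_year=0.0002)
print(round(rrf), sil_for_rrf(rrf))
```

With these assumed inputs the required reduction is about 500, which falls in the SIL 2 band.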
But just adhering to a written standard does not necessarily make a facility safe. The devil is in the details. "61511 is a lifecycle standard," says Andrew Dennant, business development manager for DeltaV Safety Instrumented Systems (SIS) at Emerson Process Management. "How it is implemented and maintained is the responsibility of the company."
Taking Responsibility
As recent announcements show, some automation vendors have adopted the approach of combining the automation controller and the safety controller into one product. Emerson, on the other hand, sells a safety system that runs completely separately from the main control system. That's because, according to Emerson, the role of the safety system is to kick in only when the main controller is not working and systematically take the plant down to a safe state. As a result, a company doesn't want a failed controller to impact the SIS, which could happen if the two were closely coupled, say Emerson officials.
Emerson's DeltaV and SIS controllers are joined via a gateway. The safety system has its own communication network that transfers shutdown signals. The issue, however, is that if the SIS is used only in an emergency, there's no way to know whether it is running properly. "Since a safety system normally just sits there, you have to test it to make sure it will work," says Dennant. "But the very testing of a safety system can cause a disaster if it's done improperly."
Emerson, however, has designed a way to automate testing of the safety system from remote locations - such as control rooms - instead of in the field where sensors and instrumentation reside and many of the anomalies occur. The company has built a digital communication system using the HART protocol to read back signals and tell an operator the status of a device. It can even send a signal to test equipment, for instance to see whether a valve is shutting correctly. Catching a failing device this way helps avoid an unnecessary shutdown.
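Why test frequency matters can be shown with a common textbook approximation for a single-channel safety function: the average probability of failure on demand is roughly the dangerous undetected failure rate times the proof-test interval, divided by two. The sketch below applies it. It is not Emerson's method, the failure rate is invented for illustration, and the approximation ignores diagnostics, repair time and common-cause failures.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_single_channel(dangerous_undetected_rate_per_hour: float,
                           proof_test_interval_hours: float) -> float:
    """Simplified average probability of failure on demand, one channel.

    Uses PFDavg ~= lambda_DU * TI / 2. A fault that no diagnostic sees
    stays hidden, on average, for half the interval between tests.
    """
    return dangerous_undetected_rate_per_hour * proof_test_interval_hours / 2


# Assumed rate of dangerous undetected failures per hour (illustrative only).
ASSUMED_RATE = 2e-6

for label, years in (("quarterly", 0.25), ("yearly", 1.0), ("every 5 years", 5.0)):
    pfd = pfd_avg_single_channel(ASSUMED_RATE, years * HOURS_PER_YEAR)
    print(f"{label:>14}: PFDavg ~= {pfd:.4f}")
```

Because the approximation is linear in the test interval, moving from a five-year interval to quarterly testing cuts the figure by a factor of 20, which is the practical argument for making tests cheap enough to run often.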
Sometimes Less Is More
The other critical element in a safety design is alarm management. In most process plants, some alarm systems are tied into the distributed control system (DCS). Companies need to be careful with the number of alarms, however. Sometimes less is more. According to the ASM study, the typical operator can handle two to three alarms at once, meaning the person understands what the alarm means and can take corrective action. "If an operator is getting 20 to 30 alarms a minute, they are not getting any information; it's overload," says ConocoPhillips' Colgate.
Indeed, according to a report issued after the UK Health and Safety Executive's (HSE) investigation of the 1994 explosion and fire at Texaco's Milford Haven, UK, refinery that injured 26 people, the key finding was that there were too many alarms that were poorly prioritized. "In the last 11 minutes before the explosion, the two operators had to recognize, acknowledge and act on 275 alarms," the report says. On top of that, the operators were inadequately trained to deal with a stressful plant upset.
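The Milford Haven figures work out to 275 / 11 = 25 alarms per minute on average, squarely in the range Colgate calls overload. A control system can watch for that condition with a sliding-window counter, sketched below in Python. The class name and the default threshold of 20 alarms per window (borrowed from Colgate's remark) are assumptions for illustration, not values from the HSE report or any alarm-management standard.

```python
from collections import deque


class AlarmRateMonitor:
    """Flags an alarm flood when too many alarms arrive within a window.

    Timestamps and the window must use the same unit (milliseconds below).
    """

    def __init__(self, window: int = 60_000, flood_threshold: int = 20) -> None:
        self.window = window
        self.flood_threshold = flood_threshold
        self._timestamps = deque()

    def record(self, timestamp: int) -> bool:
        """Record one alarm; return True if the window is now in flood."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window
        # Drop alarms that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.flood_threshold


print(275 / 11)  # 25.0 alarms per minute on average

# Replay 275 alarms spread evenly over 11 minutes (one every 2.4 seconds).
monitor = AlarmRateMonitor()
flood_flags = [monitor.record(i * 2_400) for i in range(275)]
print(flood_flags.count(True), "of 275 alarms arrived during a flood")
```

In this replay the flood flag trips within the first minute and never clears, which is the situation in which an operator stops getting usable information from the alarms.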
Regardless of how much technology is built into an operation, humans are still the most critical element in the control loop. "Companies need to start reengineering alarm systems so that it is a guide for the operator," says Peter Jofriet, marketing manager for refining at Honeywell Process Solutions. "They can be taken too far and overdesigned. Then you get into a culture of operating by alarm. That's something we don't want. We want them to operate on intuition and experience and on a continuous basis, not waiting for something to happen."
EEMS Italia SpA, a semiconductor test and assembly manufacturer, understands the concept of not sitting around and waiting for something to happen. The company, headquartered in Cittaducale, Italy, recently opened two new plants in Singapore and China. Adding more locations forced the company to rethink its infrastructure, keeping data integrity in mind.
The company wanted to manage all of its sites and standardize applications from a central location. EEMS standardized on the PROMIS manufacturing execution system (MES) from Brooks Software and outsourced the server farm it runs on to a local provider in Italy. "We centralized it to simplify support," says Elio Mungo, EEMS' IT director of products and customer center director. "In case of a problem, we can easily check and switch the data."
Complete Process Transfers
In addition, the Brooks software is multi-site enabled, which allows for load balancing or, if needed, disaster recovery. It includes a dispatching module to send pieces of a process to another factory. "It will run through scenarios, find the quickest transport time to another facility and [determine] the availability of that production area," says Jeff Nestel-Patt, Brooks' director of marketing. "The multi-site functionality is transferring the process that was being executed in the factory, including all of the steps and the WIP history."
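The selection Nestel-Patt describes, weighing transport time against the availability of the receiving production area, can be sketched as a simple filter-then-minimize step. This is not the PROMIS dispatching algorithm, whose internals the article does not describe; the `Site` fields, the function name and the sample sites are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    name: str
    transport_hours: float  # time to move work in process to this site
    free_capacity: int      # lots the production area can accept now


def pick_alternate_site(sites: Sequence[Site], lots_needed: int) -> Optional[Site]:
    """Return the quickest-to-reach site that can absorb the work, if any."""
    candidates = [s for s in sites if s.free_capacity >= lots_needed]
    # min() with default=None avoids an exception when no site qualifies.
    return min(candidates, key=lambda s: s.transport_hours, default=None)


# Made-up sites for illustration.
sites = [
    Site("Site A", transport_hours=30.0, free_capacity=5),
    Site("Site B", transport_hours=12.0, free_capacity=2),
    Site("Site C", transport_hours=18.0, free_capacity=10),
]
choice = pick_alternate_site(sites, lots_needed=4)
print(choice.name if choice else "no site available")
```

Here the nearest site is skipped because it lacks capacity, so the work goes to the next-quickest one. A production system would also have to check that the receiving site is qualified to run the process.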
At EEMS, all of the intellectual property about the manufacturing systems is maintained offsite. If something were to happen to the factory in China, Singapore or Italy, it is easy to transfer the processes to a different plant to pick up production. "The production can start the next day because everything is connected," Mungo says.
MES vendors realize that intellectual property - in the form of recipes and manufacturing instructions - is critical to recovery. "If a company doesn't at the very least have provisions so that they can archive and move IP offsite - even if they could physically recreate a facility somewhere else or transition manufacturing somewhere else - without capturing intellectual property, it could take a long time before getting back up and running," says Glenn Schulz, director of the risk management business at Rockwell Automation. Rockwell offers change management and services to archive a company's mission-critical data.
The second aspect of a successful manufacturing transition is getting inventory from one location to another. Many manufacturers don't factor transportation and fleet management into the disaster recovery plan. Nor do they consider the roles of the suppliers providing their inventory.
At Dawn Foods, for example, the company is gearing up to be in full compliance with the Bio-Terrorism Act by next month. The act requires companies in the food industry to be able to track by lot number everything that comes in or goes out of a facility. "This is a mandate; not a suggestion, but a requirement," says Dawn Foods' Massoglia.
To enable the company to get up to speed, Dawn Foods enlisted the help of eSync, a supply chain company that helped the manufacturer come up with a customized warehouse management lot tracking system that includes process changes, software integrated with its Geac Computer Corp.'s System21, RF and bar code technology. "We have to be able to trace our lot codes on any component and the finished goods from one level up and one level down the supply chain. And we have to comply within a four-hour window," says Bryan Sayles, Dawn Foods' business systems manager. That means that, in order to be fully compliant with the government mandate, food manufacturers need to know where they buy product from, what carriers were used, where materials were stored and where finished goods were shipped.
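The "one level up and one level down" requirement Sayles describes is essentially a pair of lookups: from a component lot back to its supplier, and forward to the customers that received finished goods containing it. The sketch below shows a minimal in-memory version. It is not eSync's or Dawn Foods' system; the `LotLedger` class, its methods and the sample lot codes are hypothetical, and carriers and storage locations are left out for brevity.

```python
from collections import defaultdict


class LotLedger:
    """Minimal one-up / one-down lot trace (illustrative only)."""

    def __init__(self) -> None:
        self._supplier_of = {}               # component lot -> supplier
        self._used_in = defaultdict(set)     # component lot -> finished lots
        self._shipped_to = defaultdict(set)  # finished lot -> customers

    def receive(self, component_lot: str, supplier: str) -> None:
        self._supplier_of[component_lot] = supplier

    def produce(self, finished_lot: str, component_lots) -> None:
        for lot in component_lots:
            if lot not in self._supplier_of:
                raise KeyError(f"component lot {lot} was never received")
            self._used_in[lot].add(finished_lot)

    def ship(self, finished_lot: str, customer: str) -> None:
        self._shipped_to[finished_lot].add(customer)

    def trace(self, component_lot: str) -> dict:
        """Answer: who supplied this lot, and who received goods made from it?"""
        finished = sorted(self._used_in.get(component_lot, ()))
        customers = sorted(
            {c for f in finished for c in self._shipped_to.get(f, ())}
        )
        return {
            "one_up_supplier": self._supplier_of.get(component_lot),
            "finished_lots": finished,
            "one_down_customers": customers,
        }


# Made-up lot codes and trading partners.
ledger = LotLedger()
ledger.receive("FLOUR-001", supplier="Mill A")
ledger.produce("MIX-100", ["FLOUR-001"])
ledger.ship("MIX-100", customer="Bakery X")
print(ledger.trace("FLOUR-001"))
```

Keeping both directions indexed at the time of receipt, production and shipment is what makes a four-hour response window realistic; reconstructing the links after the fact from paper records would not be.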
eSync designed a supply chain using a methodology that helps customers understand their choices in the face of a disaster. But companies need to design emergency planning into the supply chain as well. JPMorgan Chase Vastera has come up with 13 supply tips to "better weather the storm." (See "Supply Chain: How to Build a Flexible Process," on this page, for more information.) One of the points is to develop a flexible supply chain, which entails making sure it has the capacity to keep up with demand or can be slowed down to avoid unnecessary inventory buildup.
The question is: How? The answer is: information, a transportation network of multiple carriers and supplier redundancy. "The more information you have about your supply and demand is a key component," says John Brockwell, global supply chain management practice lead at JPMorgan Chase Vastera. "Also, have a network in terms of the transportation providers you use so that you have the ability to shift goods to different areas." That could mean pulling a crate off of a cargo ship and sending it via airplane instead of by truck in order to meet demand in a certain area. Lastly, "have redundancy built into your supply chain so that you can shift ports if there is a disaster or a labor strike. Use different carriers," Brockwell says.
But even with all of the best preparation and planning, an event such as Hurricane Katrina can happen and bring a company to its knees. That's where companies such as Agility Recovery Solutions and American Power Conversion (APC) can step in and help.
APC has designed a $1.5 million mobile data center that comes with an uninterruptible power supply (UPS) and the physical infrastructure of power cooling racks, IT racks and a network operations center that seats two people comfortably. It is also equipped with satellite so that, in the event of a disaster, operators have access to The Weather Channel or CNN.
A company considering purchasing a mobile unit from APC would want to make sure it has servers backed up with up-to-date data. Another option is signing up with Agility Recovery Solutions, which generally requires a $200-per-month fee to provide critical onsite services in times of need. The company will find a portable office or other space for an organization to move to in the event a facility is wiped out. It will also set up the phone systems, power generators and computers, and work with third-party data recovery services to make sure customers have access to the information they need to keep business going.
Regaining Control
Agility can't duplicate production operations, but it would help recover the business side of a manufacturing operation, such as billing, planning or scheduling. And even though the company specializes in recovery after the fact, Agility has the expertise to help manufacturers get ready. "We help them think about security, the environment, who is on the calling tree list, how many people have to be recovered quickly, what manufacturing processes can't be stopped, and how to get raw materials to a secondary site," says Bill Boyd, chairman of Agility Recovery Solutions in Charlotte, N.C. "If you've thought them through, most of the answers are quite simple. But if you don't think them through, and there is a disaster, it is very difficult."
Agility was in the middle of Katrina, helping a number of customers regain control of their businesses. Even in recovery, however, disaster planning must be an ongoing process. "Going through a hurricane like Katrina is not standard," says ConocoPhillips' Colgate. "But we go through certain levels of alertness and procedures several times a year ... just to be ready."