Project Quality & Alarm Management

I once worked for a desalination company whose design philosophy pivoted on a single rule:

Do not build the perfect plant; otherwise you will lose the job of fixing it until the end of its life.

How do you make a plant imperfect? Just forget about Alarm Management.

This comes naturally when the instrumentation and control engineering is outsourced - the preferred choice in mega-projects.

Outsourcing makes the design engineers think that an alarm is just a notification to the plant operator and adds no value to the company's expertise. Nothing could be further from the truth, because it is the operator who scores the plant quality, and that scoring happens when something goes awry.

Any plant has two fundamental states: normal and abnormal.

The abnormal state is the outer frontier of what we know about desalination or any other process. Beyond it lie uncharted waters, where continuous creation of know-how is a condition for staying afloat. The normal state is addressed by well-scoped operation and maintenance procedures. The abnormal state is addressed by Alarm Management, which offers a sure way to design perfection and customer satisfaction. It is also a road to a superior business culture, forged by the collaboration it requires between process, instrumentation & control, quality assurance, and O&M engineers.

Today this collaboration can be implemented inexpensively with available digital technologies. Compared with the not-so-distant past, this is a differentiator and a good reason to revisit everything we, and the internet, know about Alarm Management.

Googling invariably leads us to the monograph written by Douglas Rothenberg - "Alarm Management for Process Control". Though my digital pragmatism does not welcome this verbose book (656 pages), I consider it the ultimate knowledge base sufficient for alarm management implementation.

This implementation has two distinct parts - Alarm Design and Alarm Rationalization.

The former is expected to be done by the process engineer during the project engineering phase; the latter is done in collaboration with the control engineer after plant commissioning. Alarm Design lays the foundation and framework for Alarm Rationalization, which is driven by actual performance data. In turn, rationalization validates the design and builds or destroys customer satisfaction.

Should we redefine the Alarm Management priorities now that Artificial Intelligence (AI) is steadily winning over the hustle and bustle of preventive maintenance?

Task number one is still the same - uninterrupted plant operation at maximum production.

Alarm Management is a hidden treasure that AI proponents have not discovered yet: it is the best way of boosting plant production.

Most alarms guard against overloading, and overloading is the quick and easy way to get more for less. The initial settings selected by the plant designer are all conservative.

Settings re-validation is the primary target of continuous Alarm Rationalization. A classic example is the dual media gravity filtration. Its quality depends upon the fluid velocity. Good engineering practice recommends keeping it below 8-10 m/hour without regard to the medium type. I know of a plant where it took more than a year to select the best medium and move the alarm setting to 14 m/hour.

Task number two is the logical counterpart of the first: when and how to interrupt the plant operation to minimize possible damage - financial, environmental, and safety-related.

Because Alarm Rationalization involves reworking the SCADA software, the ball for digitizing these two tasks is in Alarm Design's court. This logical assertion may shake the current foundation of plant design and engineering to the ground. So buckle up!

I start with a brief overview of the Alarm Design framework implemented by crenger.com. It is the minimum viable framework; simplifying it further would miss the digitization targets.

The plant design shall start from the hierarchy of plant subsystems. It includes two distinct levels - process modules and operating modules - serving different objectives.

The process module is the lowest level; its task is described by a single verb like to pump, to filter, to backwash, etc. To produce some product we link several modules together.

A process module is not the same as a P&ID; it may cover only part of one, or extend into adjacent ones. (P&ID is a legacy notion primarily tied to the A3 paper format.)

Grouping P&ID items into process modules automatically solves the problem of Alarm Grouping - the starting point of Alarm Rationalization.

Inside the process module, the P&ID items shall be logically interlinked. For example, the level transmitter shall point to the vessel, the flowmeter to the piping piece, etc. This obvious requirement helps solve the companion task of Alarm Grouping - Alarm Suppression. For example, a pump overload accompanied by increased vibration may trigger both the high-high load alarm and the high-high vibration alarm. In this case, the latter shall be suppressed. This can be done automatically if both alarm sources point to the same pump. The foundation of Alarm Suppression is a cause-and-effect tree that originates in the process.
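The pump example can be sketched in a few lines of Python. This is only an illustration of the idea, not the crenger.com implementation: the tag names, the Alarm class, and the cause_rank field (the alarm's position in the cause-and-effect chain, 0 being the root cause) are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    tag: str         # alarm identifier (hypothetical naming scheme)
    equipment: str   # P&ID item the alarm source points to
    cause_rank: int  # assumed position in the cause-and-effect chain; 0 = root cause


def suppress_consequential(active):
    """Keep, for each equipment item, only the alarm closest to the root cause."""
    shown = {}
    for alarm in active:
        best = shown.get(alarm.equipment)
        if best is None or alarm.cause_rank < best.cause_rank:
            shown[alarm.equipment] = alarm
    return sorted(a.tag for a in shown.values())


active = [
    Alarm("P101.LOAD.HH", "P-101", 0),   # overload: the cause
    Alarm("P101.VIB.HH", "P-101", 1),    # vibration: the consequence
    Alarm("T201.LEVEL.LL", "T-201", 0),  # unrelated vessel alarm
]
print(suppress_consequential(active))
```

The vibration alarm disappears only because it points to the same pump as the load alarm; the vessel alarm is untouched.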

The operating module is a group of process modules; its task is to stop the propagation of the emergency shutdown (ESD) upstream. It may coincide with project areas (like intake, pretreatment, etc.) introduced to implement concurrent engineering - the workhorse of project management. In this case, the operating module is implicitly defined. ESD is a cornerstone of the alarm philosophy.

The plant, operating modules, and process modules have the same visible basic states: General Fault, Not Ready, Ready, Running, Not Healthy. The last one is an alias for abnormal operation.

The plant state is defined by the operating modules, while the operating module state is defined by the process modules. The process module state is a combination of the P&ID item states and actions. An example is the "open" state and "to open" action for a valve. It is a design mistake if the P&ID item does not belong to a process module.
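The last rule lends itself to an automatic check. A minimal sketch, assuming that the P&ID items and the module contents are available as plain lists of tags (the tags, module names, and function name are hypothetical, not the crenger.com data model):

```python
def orphan_items(pid_items, process_modules):
    """Return the P&ID items that are not assigned to any process module."""
    assigned = set()
    for items in process_modules.values():
        assigned.update(items)
    return sorted(set(pid_items) - assigned)


pid_items = ["P-101", "FT-101", "V-201", "LT-201", "XV-305"]
process_modules = {
    "to pump": ["P-101", "FT-101"],
    "to filter": ["V-201", "LT-201"],
}
print(orphan_items(pid_items, process_modules))  # the valve XV-305 is a design mistake
```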

The above-mentioned plant hierarchy solidifies and further extends the recommendations of the EEMUA 191 (UK) and NA 102 (Germany) standards.

"Whereas EEMUA 191 suggests the opportunity to link plant states to alarm and process conditions, NA 102 lays out the expectation to affirmatively do so" (Douglas Rothenberg).

Unlike that of the operating module, the process module's "Not Healthy" state has priority levels. They are implicitly defined by the module's service type: main, auxiliary, continuous, or batch. Each type may additionally be marked as standby or not.

Priority levels belong to Alarm Design; their combination defines how healthy the plant is. Service types are the input to Process Design and a first step in the equipment specification.

Alarm-process synergy reaches its peak in ESD design. I start its overview by clarifying the previously mentioned blocking of ESD propagation upstream.

Unlike process module interactions, which are described by a mathematical graph, the operating modules are always connected in a train. We start the plant by starting the first module while all successive modules are idle, then the second one, and so on. If the plant design supports this downstream propagation of startup, it is automatically ready to block the upstream propagation of ESD.

The ESD scenarios differ for modules upstream and downstream of the failed one. Downstream modules join the failed module and begin automated ESD. Upstream modules implement Safe Park by shedding the load to the minimum sustainable level.
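The split between the two scenarios can be written down directly. A minimal sketch, assuming the operating modules are listed in train order; the function name, the action labels, and the module names beyond intake and pretreatment are illustrative assumptions:

```python
def esd_plan(train, failed):
    """Assign an action to every operating module of the train.

    Modules upstream of the failed one go to Safe Park; the failed module
    and everything downstream of it run the automated ESD.
    """
    failed_index = train.index(failed)  # raises ValueError for an unknown module
    plan = {}
    for index, module in enumerate(train):
        plan[module] = "safe_park" if index < failed_index else "automated_esd"
    return plan


train = ["intake", "pretreatment", "swro", "post-treatment"]
for module, action in esd_plan(train, "swro").items():
    print(module, "->", action)
```

With a failure in the third module, intake and pretreatment shed load and keep running, which is what stops the shutdown from propagating upstream.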

The Palmachim plant (2007, Israel) is the first example of a Safe Park in desalination. During noon hours of high electricity tariffs, this plant stopped production for 5 out of 6 SWRO units, all equipment being kept in operation. The procedure was developed by the author of this paper.

This ESD strategy substantially reduces both the damage caused by interrupting plant operation and the time needed to restart the plant.

Automated ESD is a SCADA-programmed sequence of actions that does not require the operator's intervention. To explain its generic character, let's consider two pumps - the booster and the high-pressure (main) one - connected in a train. The process engineer knows that a booster shutdown will damage the main pump. Therefore the booster ESD must start the main pump ESD. This sequence works equally well when the main pump fails. In our case, the automated ESD replaces two conventional ESDs. But it requires process expertise.
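One way to picture the two-pump case is as a set of ESD links supplied by the process engineer: a trip of either item pulls in every item linked to it. The sketch below is an assumption about how such links could be walked, not a description of any SCADA product; the item names and the function are hypothetical.

```python
from collections import deque


def esd_sequence(links, tripped):
    """Return the items to shut down, starting from the tripped one.

    `links` is a list of (item, item) pairs meaning "an ESD of one
    must start the ESD of the other".
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    order, seen, queue = [], {tripped}, deque([tripped])
    while queue:
        item = queue.popleft()
        order.append(item)
        for linked in neighbours.get(item, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return order


links = [("booster_pump", "hp_pump")]
print(esd_sequence(links, "booster_pump"))  # booster first, then the main pump
print(esd_sequence(links, "hp_pump"))       # the same link serves the opposite case
```

A single link replaces the two conventional ESDs; the process expertise is in deciding which links exist.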

Moving to automated ESD requires understanding compound and recipe-driven alarms and interlocks. In crenger.com parlance, both are different flavors of event.

A compound event is a micro-sequence of simple actions, without the responses found in normal start/stop sequences. A recipe is a function that calculates an alarm setting from several signals. An example is the maximum flowrate of a pump, which depends upon the pump rotation speed. Recipes are indispensable in modern alarm design; they are all rooted in process cause-and-effect.
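As an illustration of a recipe, the sketch below scales the high-flow alarm setting with pump speed. It assumes a centrifugal pump for which the affinity law (flow proportional to rotation speed) is an acceptable approximation; the rated values, function names, and units are hypothetical.

```python
def max_flow_setting(rated_max_flow, rated_speed, speed):
    """Recipe: high-flow alarm setting as a function of the actual pump speed.

    Assumes flow scales linearly with speed (centrifugal pump affinity law).
    """
    if rated_speed <= 0:
        raise ValueError("rated_speed must be positive")
    return rated_max_flow * speed / rated_speed


def high_flow_alarm(flow, speed, rated_max_flow=1000.0, rated_speed=1500.0):
    """Compare the measured flow with the speed-dependent setting."""
    return flow > max_flow_setting(rated_max_flow, rated_speed, speed)


# At 1200 rpm the setting drops from 1000 to 800 m3/h,
# so 850 m3/h is an alarm at reduced speed but not at rated speed.
print(high_flow_alarm(850.0, 1200.0))
print(high_flow_alarm(850.0, 1500.0))
```

A fixed setting of 1000 m3/h would have missed the first case entirely.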

Having defined the alarm context, we can now move to the alarm-related information collection. It includes measured value validity range (number or recipe), scan rate, process safety time, deadband, and severity-emergency data. With crenger.com it is done concurrently with the process engineering.

The value validity range may include low-low/low or high/high-high values or both. If the process engineer disregards them, crenger.com sets them automatically.
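A validity range with optional limits can be checked as follows. This is a minimal sketch; the function name, the zone labels, and the choice to treat a value equal to a limit as already outside the range are assumptions.

```python
def classify(value, low_low=None, low=None, high=None, high_high=None):
    """Return the zone of a measured value; limits left as None are ignored."""
    if low_low is not None and value <= low_low:
        return "low-low"
    if low is not None and value <= low:
        return "low"
    if high_high is not None and value >= high_high:
        return "high-high"
    if high is not None and value >= high:
        return "high"
    return "normal"


# Only the upper limits are defined here.
print(classify(4.2, high=4.0, high_high=5.0))
# Only the lower limits are defined here.
print(classify(0.5, low_low=1.0, low=2.0))
```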

Scan rate and process safety time are interrelated: both reflect the abnormal process dynamics. The scan rate shows how fast the measured value is updated by the data acquisition system. Process safety time (PST) sets the limits for the operator SUDA - See, Understand, Decide, Act (Douglas Rothenberg). The PST selection is the process engineer's responsibility, and it has serious consequences. If PST is not sufficient for decision-making, two options are available. The first is to change alarm settings. The second is to move to automatic response - a drastic change in the plant automation design. Automatic response opens the door to the remotely operated plants.
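The PST check can be made explicit as a time budget. The sketch below assumes that the worst-case detection delay equals one scan period and that the four SUDA steps simply add up; both are simplifications, and the numbers and names are hypothetical.

```python
def response_mode(pst_s, scan_period_s, suda_s):
    """Decide whether the operator can respond within the process safety time.

    pst_s         -- process safety time, seconds
    scan_period_s -- assumed worst-case delay before the alarm is raised
    suda_s        -- (see, understand, decide, act) durations, seconds
    """
    needed = scan_period_s + sum(suda_s)
    if needed <= pst_s:
        return "operator"
    # PST is not sufficient: change the alarm settings or automate the response.
    return "retune_or_automate"


suda = (10, 30, 40, 20)  # hypothetical operator timings
print(response_mode(pst_s=120, scan_period_s=5, suda_s=suda))
print(response_mode(pst_s=90, scan_period_s=5, suda_s=suda))
```

In the second call the operator needs 105 seconds against a PST of 90, so one of the two options named above has to be taken.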

Deadband answers the question of how reliable the measured or calculated value is in the abnormal metastable operation. Here the process engineer does not make a final decision - she/he just sets intuitive values - a good beginning for alarm rationalization.
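The effect of a deadband is easiest to see on a noisy signal hovering around the setting. A minimal sketch for a high alarm, assuming the usual hysteresis convention (activate at the setpoint, clear only at setpoint minus deadband); the values are made up.

```python
def high_alarm_states(values, setpoint, deadband):
    """Return the alarm state after each sample of a high alarm with deadband."""
    active = False
    states = []
    for value in values:
        if not active and value >= setpoint:
            active = True                      # alarm activates at the setpoint
        elif active and value <= setpoint - deadband:
            active = False                     # clears only below the deadband
        states.append(active)
    return states


samples = [9.8, 10.1, 9.9, 10.0, 9.4, 9.9]
print(high_alarm_states(samples, setpoint=10.0, deadband=0.5))  # one alarm
print(high_alarm_states(samples, setpoint=10.0, deadband=0.0))  # chattering
```

With the deadband the operator sees one activation; without it the same signal raises the alarm twice.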

Severity-emergency data scope is described in the ISA-5.06.0-2007 "Functional requirements documentation.." standard. In its current version, crenger.com does not set these data automatically; it provides the user interface to accelerate the task.

The following four interrelated tasks of alarm design are by far the most time- and resource-consuming. But the beauty of these tasks is that they are generic; their results may be applied to any desalination plant. In other words, they are an entry into the Internet of Desalination Plants.

Abnormal life scenarios identification and recording
Alarm patterns identification
Alarm tree creation and "leaves" prioritization
An after-alarm sequence of the operator actions

Alarm pattern identification assesses the probability that alarm A and alarm B will start within a time shorter than the PST and be initiated (!) by the same root problem. If the answer is positive, they are siblings under the same alarm tree node.
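Given recorded abnormal scenarios, this probability can be estimated by counting. The sketch below assumes each recorded activation carries a timestamp, an alarm tag, and the root problem identified afterwards; the record layout and the function name are hypothetical.

```python
def sibling_probability(events, tag_a, tag_b, pst_s):
    """Fraction of A activations accompanied by a B activation that starts
    within pst_s seconds and shares the same recorded root problem.

    events -- list of (time_s, tag, root_problem) tuples
    """
    a_events = [e for e in events if e[1] == tag_a]
    if not a_events:
        return 0.0
    hits = 0
    for t_a, _, root_a in a_events:
        if any(
            tag == tag_b and abs(t - t_a) < pst_s and root == root_a
            for t, tag, root in events
        ):
            hits += 1
    return hits / len(a_events)


events = [
    (0, "A", "r1"), (5, "B", "r1"),      # close in time, same root: counts
    (100, "A", "r2"), (300, "B", "r2"),  # same root, but too far apart
    (500, "A", "r3"), (504, "B", "r4"),  # close in time, different roots
]
print(sibling_probability(events, "A", "B", pst_s=60))
```

Only the first pair qualifies, which is why the root problem has to be recorded and not inferred from timing alone.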

The alarm tree directly relates to the alarm categories. A Category 1 alarm is activated without restriction. This alarm is the tree root. Category 2 alarm activation is permitted until it is inhibited. It is a tree leaf (child) controlled by the root (parent). Category 3 alarm activation is inhibited until a permissive is given. It is a leaf from a priority-ordered sequence of leaves (siblings). Priority may be dynamically set by using the recipes interface.

The final note is about the operator's confirmatory actions. Their purpose is to validate the alarm. With the smart alarm management outlined above, this task should definitely be executed by software.
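The three categories reduce to a small activation rule. A minimal sketch, assuming the inhibit and permissive signals are supplied by the parent node of the tree; the function names and the convention that a lower number means a higher priority are assumptions.

```python
def may_activate(category, inhibited=False, permissive=False):
    """Return True if an alarm of the given category is allowed to activate."""
    if category == 1:
        return True               # root: no restriction
    if category == 2:
        return not inhibited      # permitted until inhibited
    if category == 3:
        return permissive         # inhibited until a permissive is given
    raise ValueError("category must be 1, 2 or 3")


def order_siblings(leaves):
    """Sort (tag, priority) leaves; a lower number means a higher priority."""
    return [tag for tag, _ in sorted(leaves, key=lambda leaf: leaf[1])]


print(may_activate(2, inhibited=True))    # child silenced by its parent
print(may_activate(3, permissive=True))   # leaf released by a permissive
print(order_siblings([("VIB.HH", 2), ("TEMP.HH", 1), ("FLOW.LL", 3)]))
```

Setting priorities dynamically then amounts to recomputing the second element of each pair, for example with a recipe.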