When selecting an approach to maintaining equipment, many organizations choose between failure modes and effects analysis (FMEA) and reliability centered maintenance (RCM). While both are important processes for maintaining equipment, FMEA alone is not usually robust enough.
First used in the 1940s by the U.S. military, FMEA is a common process analysis tool. It’s a step-by-step approach for identifying possible failures in a design, a manufacturing or assembly process, or a product or service. Currently, there are at least five published standards in different industries relating to different types of FMEA.
The RCM methodology was developed in 1978 in the airline industry. A report by Nowlan and Heap provides the first full discussion of reliability centered maintenance as a logical discipline for the development of scheduled maintenance programs. The objective of such programs is to realize the inherent reliability capabilities of the equipment for which they are designed, and to do so at minimal cost. SAE JA1011: Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes was first published in 1999, setting out the criteria that any process must satisfy to be called RCM.
A review of RCM2™ and FMEA by a corporate leader at a Fortune 500 manufacturer that applies FMEA at all its sites globally yielded this conclusion: “A typical FMEA will almost never eliminate a Maintenance Task. It will only add tasks.”
There are three significant differences between FMEA and RCM:
- RCM features more robust and structured decision-making logic for developing risk management strategies (controls); FMEA or FMECA is NOT a Maintenance Strategy Development Tool
- FMEA focuses on component parts, while RCM focuses on preserving the primary and secondary functions of the system
- RCM systematically analyzes the functions of protective devices and the risks associated with their failure
Let’s break down exactly how each approach works.
How FMEA and FMECA work as a maintenance approach in an organization
The FMEA process was first used in the 1940s by the U.S. military. The Failure Modes and Effects Analysis (FMEA) is a step-by-step approach for identifying possible failures in a design, a manufacturing or assembly process, or a product or service. It is a common process analysis tool.
- “Failure modes” means the ways, or modes, in which something might fail. Failures are any errors or defects, especially ones that affect the customer, and can be potential or actual.
- “Effects analysis” refers to studying the consequences of those failures.
The goal of FMEA is to take actions to eliminate or reduce failures, starting with the highest-priority ones.
To do this, failures are prioritized according to three criteria:
- How serious the consequences of the failure are
- How frequently the failure occurs
- How easily the failure can be detected
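As a minimal sketch of how these three criteria can be combined, many FMEA worksheets multiply a severity, an occurrence, and a detection rating into a Risk Priority Number (RPN) and rank failure modes by it. The 1 to 10 scales, the class and function names, and the example failure modes below are illustrative assumptions, not values taken from any particular FMEA standard.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One FMEA worksheet row (hypothetical structure for illustration)."""
    description: str
    severity: int    # 1 = negligible consequences .. 10 = most serious
    occurrence: int  # 1 = rarely occurs .. 10 = occurs very frequently
    detection: int   # 1 = almost always detected .. 10 = very hard to detect

    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("each rating must be between 1 and 10")
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Illustrative data only; ratings are made up for the example.
modes = [
    FailureMode("Pump seal leaks", severity=7, occurrence=4, detection=3),
    FailureMode("Bearing seizes", severity=8, occurrence=2, detection=5),
    FailureMode("Coupling guard loose", severity=3, occurrence=5, detection=2),
]

for m in prioritize(modes):
    print(m.rpn(), m.description)
```

A ranking like this tells the team which failure modes to address first. It does not say which kind of control is appropriate for each one.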
The FMEA process identifies highest risk failure modes that need to be reduced with some form of control source / strategy; however, the robust and structured decision-making logic of RCM is absent from the FMEA process.
Users of FMEA also document current knowledge and actions about the risks of failures for use in continuous improvement. FMEA is used during design to identify failures. Later, it’s used for control before and during ongoing operation of the process. Ideally, FMEA begins during the earliest conceptual stages of design and continues throughout the life of the product or service.
FMEA emphasizes the methodical study of component failures
The FMEA technique involves the methodical study of component failures. Users identify failure modes for each component in a system and evaluate the effect of each component failure on the system. As in HAZOP Analysis, users identify existing safeguards and document recommendations for further improvements. Because the FMEA technique focuses on component failures, problems related to what the material being handled or conveyed does to the component are not captured (e.g., a pump that conveys chlorine should have a certain set of seals). If one does not take the material being handled into account, the failure rates and effects will be missed. For this reason, FMEA is most appropriately used for processes that do not involve substances that greatly affect or influence the failure modes of the components being analyzed.
What is Failure Mode, Effects, and Criticality Analysis (FMECA)?
A Failure Mode, Effects, and Criticality Analysis is a process which evaluates, documents, and ranks the potential impact, or consequence, of each functional and hardware failure on the following:
- mission success,
- personnel safety,
- system performance,
- maintainability, and
- maintenance requirements.
The information provided by FMECA is one of the primary inputs for the RCM process.
The FMECA is an analytical tool used to determine functions, functional failures, failure modes, and failure effects for a given item of hardware. The standard used to develop FMECA is MIL-STD-1629A, Procedure for Performing a Failure Mode, Effects and Criticality Analysis. In addition to these four elements, the analysis can determine the consequences of each failure.
Applications of FMEA and FMECA may (or may not) begin with identifying functions. If functions are included, the functional definition is typically defined at the component level rather than at the functional system level. This can lead to missing or overlooking a substantial number of failure modes.
Understanding the RCM Methodology
The RCM methodology was developed in the airline industry as a logical discipline for the development of scheduled maintenance programs. The objective is to realize the inherent reliability capabilities of the equipment for which they are designed, and to do so at minimum cost. This process results in a structured decision-making guide to ensure that users implement technically feasible and worthwhile strategies to eliminate or reduce the risk associated with the failure to a tolerable level.
Each scheduled maintenance task in an RCM program is generated for an identifiable and explicit reason. The consequences of each possible failure mode are evaluated, and the failure modes are then classified according to the severity of their consequences.
Then for all significant items – those whose failure involves operating safety, environmental impacts, or has major economic consequences – users evaluate proposed tasks according to specific criteria of applicability and effectiveness. The resulting scheduled maintenance program thus includes only the necessary tasks to protect safety, environment, and operating reliability.
Note: A structured RCM decision logic diagram is used to select tasks (initially maintenance tasks only), applying criteria that identify the most applicable and effective task as the first option for dealing with potential failures of the asset. If no scheduled maintenance task can be identified, the decision logic defaults to a possible redesign.
In developing the RCM2 methodology, John Moubray acknowledged that the reliability of a physical asset is not only dependent upon how well users perform maintenance, but that reliability is a function of how well equipment is:
- designed
- installed / commissioned
- operated
- maintained
John Moubray updated the definition of RCM in 1996 in simple terms as follows:
“Reliability Centered Maintenance: a process used to determine what must be done to ensure that any physical asset continues to do whatever it is that the users want it to in its operating context.”
The SAE JA1011 Standard (1999) defines RCM as follows:
“RCM is a specific process used to identify the policies which must be implemented to manage the failure modes which could cause the functional failure of any physical asset in a given operating context.”
From the definition of RCM, it is clear that the operating context must be defined right from the start of the RCM process. To do this, the RCM Facilitator uses two worksheets: the Information Worksheet (IW) and the Decision Worksheet (DW). The Information Worksheet is completed with the information contained in the Operating Context and aligns with the FMEA process.
Defining the RCM Decision Logic, Task Evaluation, and Task Selection
Once the RCM Facilitator identifies significant functions in the FMEA process, they use RCM Decision Logic to further analyze them. The facilitator follows the RCM Decision Logic, which is a structured process (series of questions) used to determine what type of action would be appropriate to either eliminate or reduce the consequences of functional failures. The facilitator records the additional analysis of the failure modes in the RCM Decision Worksheet (DW). Every function has one or more failure modes. Each of these failure modes must be processed through the Decision Logic to determine if a preventive maintenance task can be developed to mitigate the consequences of its occurrence.
The Decision Logic requires that the following elements be considered for each failure mode being analyzed:
- Evidence of a functional failure to the operating crews or whether failures are hidden (protective systems)
- Inherent risk (consequences) associated with failure (safety, environmental, operational, economic)
- Evidence of reduced resistance to failure (total failure vs. partial failures)
- Age-reliability characteristics of each item (failure characteristics)
- Trade-off decision based on a comparison of the cost or revised risk of performing a predictive or preventive maintenance task to the cost or inherent risk of not performing the task
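The elements above can be sketched as a small decision function. This is a deliberately simplified illustration, not the RCM2 or SAE JA1011 decision diagram: the field names, the single cost-or-risk comparison, and the order of the default actions are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    SAFETY = "safety"
    ENVIRONMENTAL = "environmental"
    OPERATIONAL = "operational"
    ECONOMIC = "economic"


@dataclass
class FailureModeAssessment:
    """Inputs for one failure mode (hypothetical field names)."""
    name: str
    hidden: bool                  # True if the failure is not evident to operators
    consequence: Consequence
    task_feasible: bool           # a proactive task is technically feasible
    task_cost_or_risk: float      # cost or revised risk when the task is performed
    no_task_cost_or_risk: float   # cost or inherent risk when it is not


def select_policy(fm: FailureModeAssessment) -> str:
    """Pick a failure management policy using simplified, assumed rules."""
    # Trade-off: the task must be feasible AND worth doing.
    worth_doing = fm.task_feasible and (
        fm.task_cost_or_risk < fm.no_task_cost_or_risk
    )
    if worth_doing:
        return "proactive task"
    if fm.hidden:
        # Hidden failures (protective systems) default to a periodic check.
        return "failure-finding task"
    if fm.consequence in (Consequence.SAFETY, Consequence.ENVIRONMENTAL):
        # No suitable task and intolerable consequences: consider redesign.
        return "redesign"
    return "run to failure"


# Illustrative assessments only.
print(select_policy(FailureModeAssessment(
    "Bearing wear", False, Consequence.OPERATIONAL, True, 500.0, 20000.0)))
print(select_policy(FailureModeAssessment(
    "Relief valve stuck shut", True, Consequence.SAFETY, False, 0.0, 0.0)))
print(select_policy(FailureModeAssessment(
    "Indicator lamp burns out", False, Consequence.ECONOMIC, True, 300.0, 50.0)))
```

In this sketch a task that is feasible but costs more than the failure it prevents is rejected, which is the mechanism by which non-value-added tasks drop out of the program.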
The well-defined and structured RCM Decision Logic is the most significant difference between the FMEA and RCM processes. This has the potential of eliminating non-value-added tasks (maintenance tasks that are not technically feasible and / or worth doing) as well as providing the technical foundation for the maintenance program. Additionally, the operating context allows further optimization and technical basis for decision making.
RCM has a unique and well-defined approach for dealing with protective devices in order to manage the risk of multiple failures. FMEA deals primarily with single failure modes of the primary system. Therefore, the outcome of RCM is a far more sensible and defensible risk strategy definition for the system as a whole.
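To illustrate why protective devices need separate treatment, a multiple failure occurs when the protected function fails while its protective device is already in a failed state. A common simplification, assumed here, treats the two events as independent, so the probability of a multiple failure is the product of the two. The function name and the numbers are hypothetical.

```python
def multiple_failure_probability(p_protected_failure: float,
                                 protective_unavailability: float) -> float:
    """Probability of a multiple failure in a given period.

    p_protected_failure: probability that the protected function fails
        in the period.
    protective_unavailability: average fraction of time the protective
        device is in a failed (unavailable) state.
    Assumes the two events are independent.
    """
    for value in (p_protected_failure, protective_unavailability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return p_protected_failure * protective_unavailability


# Illustrative numbers: a 10% chance per year that the protected function
# fails, and a protective device that is unavailable 5% of the time.
p = multiple_failure_probability(0.10, 0.05)
print(f"{p:.4f}")  # prints 0.0050
```

An analysis that looks only at single failure modes of the primary system has no place to record the second factor, which is why the unavailability of the protective device has to be managed explicitly.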
While valuable, the FMEA process is not typically enough
The initial intent of the FMEA process was to evaluate the potential failure modes that would impact design integrity. The FMEA process has been adopted by many for application to in-service equipment and although valuable, the FMEA process focuses primarily on design integrity.
The RCM process was developed to specifically look at how maintenance should be performed to ensure equipment continues to do what the users intend it to do.
At Aladon, our experience shows that the RCM process provides a far more robust maintenance program (based on technical correctness), with fewer tasks than an FMEA analysis since RCM examines whether the proposed task is worth doing. Additionally, the RCM analysis is performed at system level so that it’s more inclusive of all equipment in the system. The FMEA process only focuses on the component level analysis, which means some equipment may be excluded from a thorough review.
FMEA or FMECA is NOT a Maintenance Strategy Development Tool. Once the failure modes and effects are identified, users are on their own to come up with an approach for how to manage the failures. RCM2 and RCM3 both include a rigorous formal process to develop a Failure Management Policy that is Technically Feasible and Worth Doing for each Failure Mode.
The RCM Process begins with defining the functions of the system in a specific Operating Context. The FMEA portion of the RCM process is driven by “What the User wants the Asset to do” (Functions) and “How it can fail to do it” (Failures) at the system level. By contrast, a standalone FMEA (or FMECA), if it includes functions at all, typically defines them at the component level. As such, an FMEA can overlook or miss a substantial number of the failure modes that are reasonably likely to occur.
An FMEA can be applied at the design phase in the interest of meeting a “rush” to get a line up and running. This will identify many of the possible failure modes and the tasks to manage them as best one can; however, once the design is complete and the line is up and running, one is still left with the objective of developing a formal failure management strategy to evaluate the policy decisions (Proactive Tasks, Failure Findings, Redesigns, and Run to Failures). This is what the RCM process will achieve.
Using FMEA substantially reduces the effort one needs to apply the RCM process. The Failure Modes and Effects from the FMEA can be imported into the RCM2 review. Then, the user can identify the remaining failure modes that are reasonably likely to occur in the present operating context. At this point the RCM facilitator can determine the Consequence Evaluation (at the System Level) and establish the Failure Management Policies.