
Case study: Root Cause Failure Analysis of a Centrifuge Failure

On November 19, 2013, a centrifuge experienced a catastrophic failure that threw the pillow block bearing and the planetary gearbox from the unit and caused secondary damage to the structure and the rotary drum thickener.

The 750-lb gearbox on the unit came free, flew 30 feet through the air, and damaged the 460 V power rails for the bridge crane. It landed and spun like a top on the concrete, etching a design into the surface. The bowl almost came out of the housing and was kept in place by only four bolts that were still intact on the cover.

Fortunately, there were no injuries. However, because of the severity of the failure, the leadership of the wastewater organization initiated a Root Cause Failure Analysis (RCFA).

Companies have typically used RCFA to determine what caused equipment or processes to deviate from the norm, rather than as a continuous improvement process. Aladon promotes the RCFA methodology as a continuous improvement process; used together with other reliability processes such as RCM and RBI, it provides a holistic approach to failure management. RCM3™ is applied proactively (before the failure occurs) and, to our knowledge, still provides the best proactive failure management strategy for all types of assets in all types of environments (operating contexts). RCFA is applied reactively (after the failure has happened) and, when integrated with RCM and RBI, allows reliability engineers to maximize equipment availability and improve asset performance. Using RCM3 and RCFA together leads to improved resource planning and forecasting, as well as effective corrective measures. Our RCFA process is fully integrated with our risk-based approaches: RCD, RCM3 and RBI3™.

Centrifuges have a specific shutdown process that allows for the bowl to spin down:

  1. When a unit needs to be shut down, the operator pushes the STOP button on the Distributed Control System (DCS) screen.
  2. The motor driving the bowl is de-energized and the bowl spins down to zero RPM in approximately three minutes.
  3. When the motor starter contactor is de-energized, the contactor auxiliary switch sends a signal to the Programmable Logic Controller (PLC) to start the lubrication pump timer.
  4. The lubrication pump system will continue to deliver 1 to 1.5 GPM of oil to the two pillow blocks on the centrifuge for 30 minutes.
  5. Once the PLC 30-minute timer is finished, the lubrication pump is shut down.
EXHIBIT 1 Normal shutdown – Stop is initiated, the lubrication timer runs for 30 minutes, and the bowl RPM drops to zero before the lubrication pump stops
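The shutdown sequence above can be sketched as a small model. This is a minimal illustration in Python, not the plant's ladder logic: the function name, its two boolean inputs, and the event-time representation are assumptions made for the example. The durations come from the steps above.

```python
LUBE_MINUTES = 30       # PLC lubrication timer, from step 4 above
SPIN_DOWN_MINUTES = 3   # approximate bowl spin-down time, from step 2 above


def simulate_shutdown(aux_switch_opens: bool, power_removed: bool):
    """Return (minute the bowl reaches zero RPM, minute the lube pump stops).

    None means the event never happens. Hypothetical model, not the PLC code.
    """
    # The bowl only coasts down if the starter really removes motor power.
    bowl_stops_at = SPIN_DOWN_MINUTES if power_removed else None
    # The lubrication timer is started by the auxiliary switch signal alone.
    lube_stops_at = LUBE_MINUTES if aux_switch_opens else None
    return bowl_stops_at, lube_stops_at


bowl_stop, lube_stop = simulate_shutdown(aux_switch_opens=True, power_removed=True)
print(bowl_stop, lube_stop)    # 3 30
print(lube_stop - bowl_stop)   # 27 minutes of lubrication after the bowl stops
```

In this model, as in the steps above, the timer depends only on the auxiliary switch signal and not on bowl speed.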

In this case, on a November morning in 2013, an operator pressed the shutdown button on the DCS and the centrifuge bowl graphic on the screen went from “red” to “green,” informing the operator that the centrifuge was stopping. However, the drive motor magnetic starter did not disengage, and the centrifuge bowl kept spinning at 2890 RPM. The PLC initiated the lubrication pump timer, and the lubrication pump followed the logic and shut down after 30 minutes. The centrifuge continued to spin at 2890 RPM until the back drive side pillow block bearing failed, causing a chain of events that bent the centrifuge frame and destroyed the drive pulley, feed tube bracket, drive belt tension rod, drive pulley belt cover, bowl assembly cover, planetary gearbox cover, planetary gearbox, and back drive shaft guard. The damage was estimated at $250,000.

EXHIBIT 2 Failed shutdown on 11/19/2013 – The stop was initiated, but the bowl did not slow down.

RCFA Process

In order to determine the causes of the failure, our team employed a six-step RCFA process to conduct the investigation. The process followed to analyze this failure of a centrifuge is similar to any equipment RCFA.

Step 1: Identify a failure that would benefit from the RCFA process, which in this case was obviously the Centrifuge.

Step 2: Gather initial information from those employees involved with the event. The RCFA facilitator completed the RCFA First Response Form here. The form is intended to be completed immediately in order to capture all relevant facts and circumstances surrounding the RCFA event under review.

Step 3: Make the RCFA First Response Form a formal document. The document was placed on a server so that all involved in the RCFA had access and could comment as needed during the process. Note, the RCFA facilitator owned the document and had to review and approve any comments/edits/changes/modifications.

Step 4: The most rigorous step in the process involved a detailed investigation of the data reported on the RCFA First Response Form. For this RCFA event, the following data was pulled into a central location and reviewed for relevance and connection to the cause of failure.

  1. Work order history that included both preventive and corrective work
  2. DCS history available in the historian
  3. Centrifuge ladder logic and programming
  4. Centrifuge control drawings
  5. Operators’ logbook
  6. Photographs

Step 5: Key stakeholders and staff involved with the event typically hold a formal meeting. In this RCFA case, the formal meeting was not conducted due to time constraints. Instead, the staff who witnessed the event and had intimate knowledge of the centrifuge and its controls were interviewed in depth individually. The individual interviews prolonged the process, and it is recommended that, for any future RCFA event, the RCFA facilitator hold formal meetings to ensure that all the data needed is available in a timely manner. In some instances, the facilitated meeting(s) may reveal that more data is needed or has become available.

Ultimately, the root cause of failure should be identified during this meeting, along with short- and long-term fixes. An activity log should document what actions will be taken, when they will be taken, and by whom. This document, along with the RCFA First Response Form and supporting data, should be kept on a server and owned by the facilitator.

Step 6: Reporting and documenting the findings so that future failures can be avoided.

The validation and implementation of the recommendations must be approved by the asset owner prior to implementation.

RCFA Centrifuge Results

After reviewing the compiled centrifuge documentation, the team determined that the cause of the failure was a malfunction of the magnetic motor starter for the 450 HP centrifuge drive motor. The signal was sent from the local HMI screen to the PLC to initiate the shutdown. The DCS indicated that the centrifuge was shutting down, since the bowl graphic on the DCS changed state from “red” (running) to “green” (shutdown). For this signal to be sent, the magnetic motor starter auxiliary switch had to change state from closed to open. The switch, located on the magnetic motor starter, must therefore have registered some movement in the starter: the contactor bar had to move at least 2 mm for the switch to change state. However, the contactor did not disengage far enough to stop the current flow to the bowl drive motor. Because the switch did change state, the PLC initiated the 30-minute lubrication pump countdown while the bowl maintained 2890 RPM for approximately 54 minutes, the last 24 minutes with no lubrication applied. At that point, the back drive pillow block bearing failed catastrophically, which had a ripple effect on the other centrifuge components and caused catastrophic damage.
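The timeline in the paragraph above reduces to simple arithmetic, worked here as a short Python sketch. The variable names are illustrative, and the revolution count assumes the bowl held a constant 2890 RPM for the whole unlubricated period, as the text describes.

```python
LUBE_TIMER_MIN = 30       # lubrication countdown started by the PLC
RUN_AFTER_STOP_MIN = 54   # approximate time the bowl kept spinning after the stop command
BOWL_RPM = 2890           # bowl speed maintained throughout

# Minutes the bearings ran after the lubrication pump had shut down.
unlubricated_min = RUN_AFTER_STOP_MIN - LUBE_TIMER_MIN
# Bowl revolutions during that period, assuming constant speed.
unlubricated_revs = BOWL_RPM * unlubricated_min

print(unlubricated_min)    # 24
print(unlubricated_revs)   # 69360
```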

Evidence to Support Magnetic Motor Contact Failure

After the failure, mechanics witnessed and photographed the main and delta magnetic motor starters in the pulled-in state. The photo below shows the contactors pulled in. The two starters on the left (main starter on the far left, delta starter in the middle, and “Y” starter on the right) are still pulled in.

EXHIBIT 3 Main and Delta Magnetic Motor Starters after Failure

Later the next day, the delta magnetic motor starter was found in the open position. No one reported physically separating or opening the contactor. Sometime later, the main magnetic motor starter was found in the open position.

On December 12, the RCFA facilitator and a mechanic took apart the magnetic motor starters to look for a cause of the failure.  The contacts on all starters looked to be in good condition.  Note that maintenance had cleaned the contactors on November 1.  This work appears in both the maintenance records and the logbook.

EXHIBIT 4 Condition of the starters (contacts from the main starter and the delta starter)

Failure Modes for Magnetic Motor Starter

Several failure modes are typical of magnetic motor starters. Each is explained below, together with what the inspection found:

  1. Contacts welding together – This occurs over time during “chattering” or severe current inrush. This was not the case in centrifuge 6, as the contacts were not welded and showed typical wear.
  2. Auxiliary switch failure – The switch can fail for any reason. Typically, the terminal connections or the mechanics within the switch fail. The switch on the main magnetic starter was found in working condition after the failure. The switch was moved several times and the PLC detected a change of state in each test.
  3. Spring failed in the mechanism – The springs can lose tension over time. However, the springs were checked and still had more than enough force to open the contactor.
  4. Mechanism dirty or “gummed up” – The mechanism moved freely when examined.
  5. Mechanical interlock failed – The mechanical interlock worked as designed when tested.
  6. Coil thermal expansion – In some cases, the coil keeping the contactor engaged will experience excessive heat and expand. The expanded coil then “grabs” the iron bar that pulls the contactor in to apply power or releases it to remove power. When the coil “grabs” the bar, the bar cannot slide through the magnetic coil to the open or closed position. This is the suspected failure mode. There is no visible evidence of it on the coils; the evidence that supports this conclusion is that the starters opened by themselves after the failure, in other words, once they had cooled down.
EXHIBIT 5
Main Magnetic Contactor Coil – No damage or evidence supporting thermal expansion
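The process of elimination applied to the six failure modes can be written down as a small table in code. This is only a sketch of the reasoning: the tuple layout and the ruled_out flags are an illustrative encoding of the findings listed above, not part of the original investigation records.

```python
# (failure mode, ruled out by inspection?, finding reported in the case study)
failure_modes = [
    ("Contacts welded together", True, "contacts not welded, typical wear"),
    ("Auxiliary switch failure", True, "PLC detected change of state in each test"),
    ("Spring failed in the mechanism", True, "springs had enough force to open the contactor"),
    ("Mechanism dirty or gummed up", True, "mechanism moved freely"),
    ("Mechanical interlock failed", True, "interlock worked when tested"),
    ("Coil thermal expansion", False, "no coil damage, but starters opened on their own after cooling"),
]

# Whatever inspection could not rule out remains as the suspected cause.
suspected = [name for name, ruled_out, _finding in failure_modes if not ruled_out]
print(suspected)  # ['Coil thermal expansion']
```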

Next Steps

The short-term fix to prevent this failure from occurring again was a modification to the Centrifuge Standard Operating Procedure (SOP). Upon shutdown, the operator verified bowl RPM reduction before leaving the screen. This visual observation ensured the contactors had indeed opened up and power had been removed from the drive motor.

The recommended permanent fix was to have the lubrication pump stop only after the bowl had a speed indication of zero RPM. There is an existing proximity switch that records bowl speed and is read by the PLC. An alternative was to use the existing current transformer in the control panel to initiate the lubrication countdown timer, which would ensure that no power is applied to the drive motor before the countdown starts.
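The recommended interlock can be sketched as follows. This is a hypothetical illustration in Python rather than PLC code: the class name, the 1 A "no current" threshold, the one-minute scan step, and the decision to combine both recommendations (current-based timer start plus a zero-speed check) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LubeInterlock:
    post_lube_minutes: float = 30.0  # same duration as the existing timer
    amps_threshold: float = 1.0      # hypothetical "no motor current" threshold
    timer_minutes: float = 0.0
    pump_running: bool = True

    def step(self, bowl_rpm: float, motor_amps: float, dt_minutes: float = 1.0) -> bool:
        """Advance the logic by one scan and return whether the pump runs."""
        if motor_amps > self.amps_threshold:
            self.timer_minutes = 0.0          # motor still powered: hold the timer at zero
        else:
            self.timer_minutes += dt_minutes  # countdown runs only with no motor current
        timer_done = self.timer_minutes >= self.post_lube_minutes
        # The pump may stop only when the timer has expired AND the bowl is at rest.
        self.pump_running = not (timer_done and bowl_rpm <= 0)
        return self.pump_running


# Stuck-starter scenario: 54 minutes at full speed with current still flowing.
stuck = LubeInterlock()
for _ in range(54):
    stuck.step(bowl_rpm=2890, motor_amps=400)  # 400 A is a placeholder value
print(stuck.pump_running)  # True
```

With this logic, a starter that fails to open keeps the lubrication pump running, because neither the current condition nor the zero-speed condition is ever met.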

Observations

It appears that PM task labor was sometimes not assigned to individual units, but rather all the units together. Labor, parts, etc., should be assigned at the asset level.

It was reported verbally, without logbook entries, that occasionally the centrifuges would trip out. When this trip occurred, the lubrication system stopped and the bowl would wind down without lubrication. It was further reported that the lubrication system could not be restarted manually at the local panel or DCS. The operator needed to start the lubrication system in these events.

There was only one vibration sensor, measuring acceleration, on the centrifuge. It was recommended that, at a minimum, manual velocity (in/sec) vibration readings should be taken and trended. If the trend was moving up and a remedy could be found, full-spectrum vibration analysis should be performed by a qualified vendor. Ideally, both pillow block bearings would have vibration sensors attached and real-time data captured. An alarm level could be established as well as a shutdown set point.
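A minimal sketch of how trended velocity readings could be checked against an alarm level and a shutdown set point is shown below. The threshold values, function names, and the least-squares trend test are all assumptions for illustration; actual limits would have to come from the equipment manufacturer or a vibration specialist.

```python
from statistics import mean

ALARM_IN_PER_SEC = 0.3      # hypothetical alarm level, in/sec
SHUTDOWN_IN_PER_SEC = 0.6   # hypothetical shutdown set point, in/sec


def trend_slope(readings):
    """Least-squares slope of the readings against their sample index."""
    xs = range(len(readings))
    x_bar, y_bar = mean(xs), mean(readings)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, readings))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def assess(readings):
    """Classify a series of velocity readings (oldest first)."""
    latest = readings[-1]
    if latest >= SHUTDOWN_IN_PER_SEC:
        return "shutdown"
    if latest >= ALARM_IN_PER_SEC:
        return "alarm"
    # Require at least three points before calling a trend.
    if len(readings) >= 3 and trend_slope(readings) > 0:
        return "rising"
    return "normal"


print(assess([0.10, 0.12, 0.15]))  # rising
```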

Thermography of the centrifuge control panels should be considered. While this would not have prevented the November 19 failure, it could help prevent future failures of starters and other systems within the control panel.

This case study was prepared by Jeffrey Sanford, Aladon senior consultant of reliability engineering and condition assessment, who has worked in maintenance management and reliability for 30 years with CH2MHILL/Jacobs. Jeffrey has been a CMRP for 14 years and an Aladon Network RCM2™ practitioner for nine years.

Jeffrey spent a decade working in the water and wastewater industry as a regional maintenance manager and worked as condition assessment global lead for eight years. He is owner of CH2MHILL ACES and has conducted over 100 condition assessments in 11 countries, including UAE, Australia, New Zealand, US, Canada, Bahrain, UK, Jordan, Kazakhstan and Czech Republic. He has performed RCM analyses and condition assessments for the following industries:

  • Aircraft
  • Automotive
  • Oil and Gas
  • Water and Wastewater
  • Power Generation