What are common mistakes in FMEA?

nathaliecolpa
Failure mode and effects analysis (FMEA) is only as effective as the process behind it. The most common FMEA mistakes include incomplete team composition, poor risk scoring, and failure to update the analysis after changes occur. These errors allow real failure modes to slip through undetected, turning a powerful reliability tool into a checkbox exercise. This article unpacks the most frequent pitfalls and how to avoid them.

Why do FMEA analyses fail to prevent real failures?

FMEA analyses fail to prevent real failures primarily because they are treated as documentation exercises rather than living engineering tools. When teams complete an FMEA once and file it away, the analysis quickly loses touch with the actual operating conditions of the equipment it was designed to protect. The result is a document that looks thorough on paper but offers little protection in practice.

The gap between FMEA theory and plant reality is almost always a process problem, not a methodology problem. FMEA as a framework is well-established and proven. What goes wrong is how organizations apply it. Common contributing factors include insufficient input from people who work directly with the equipment, risk scores that are assigned without real data, and corrective actions that are recorded but never implemented or tracked.

In asset-intensive industries where rotating equipment such as pumps, fans, compressors, and conveyors operate continuously, an outdated or poorly executed FMEA creates a false sense of security. Maintenance teams believe risks have been assessed when, in reality, the most damaging failure modes may have been overlooked entirely. Understanding where the process breaks down is the first step toward making FMEA genuinely effective.

What are the most common mistakes made during FMEA?

The most common FMEA mistakes are a scope that is too narrow, risk scoring based on opinion rather than evidence, missing failure modes caused by incomplete team knowledge, and corrective actions that are never closed out. Each of these errors independently weakens the analysis, and in combination they can render it almost useless.

The following mistakes appear consistently across industrial FMEA processes:

  • Starting too late in the design or modification process, when changes are expensive and options are limited
  • Defining failure modes at the symptom level rather than the root cause level, which leads to superficial mitigations
  • Omitting failure modes that are considered unlikely, removing low-probability but high-consequence events from the analysis entirely
  • Assigning severity, occurrence, and detection scores without supporting data, relying on gut feeling instead of maintenance records, historical failure data, or engineering analysis
  • Failing to distinguish between failure modes and failure effects, which creates circular logic in the risk assessment
  • Treating the FMEA as a one-time deliverable rather than a document that evolves with the equipment and its operating context
  • Recording recommended actions without assigning ownership or deadlines, so corrective measures remain theoretical

Each of these mistakes is avoidable with the right process discipline. The challenge is that FMEA requires sustained attention across multiple organizational functions, and shortcuts are easy to rationalize when teams are under time or resource pressure.

How does incorrect RPN scoring affect FMEA outcomes?

Incorrect Risk Priority Number (RPN) scoring distorts the entire prioritization process, causing teams to over-invest in low-risk failure modes while genuinely dangerous ones go unaddressed. Since RPN is the product of the severity, occurrence, and detection scores (RPN = S × O × D), an error in any one input scales the final number proportionally, and errors in several inputs compound into a misleading result.
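To make the compounding effect concrete, here is a minimal Python sketch. It assumes the common 1-to-10 scale for each score, which this article does not specify, and the function name `rpn` and the example scores are purely illustrative.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: the product of the three scores.

    Assumes each score is on a 1-10 scale (an assumption, adjust to your
    own scoring guideline).
    """
    scores = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score}")
    return severity * occurrence * detection


# The same hypothetical failure mode, scored two ways.
evidence_based = rpn(severity=8, occurrence=6, detection=7)  # backed by failure history
optimistic = rpn(severity=8, occurrence=3, detection=4)      # gut-feel occurrence and detection

print(evidence_based, optimistic)  # 336 96
```

In this made-up example, two moderately optimistic inputs cut the RPN to less than a third of the evidence-based value, which could be enough to drop the failure mode below an action threshold.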

The most damaging scoring error is underestimating severity. Teams sometimes assign low severity scores to failure modes that have a limited immediate impact on the component being assessed, without accounting for downstream consequences across the system. A bearing failure in a pump, for example, may score low if assessed in isolation, but could trigger an unplanned shutdown of an entire production line when the full system context is considered.

Occurrence scores are frequently optimistic because they rely on perceived frequency rather than actual failure history. Without access to maintenance records, mean time between failure data, or reliability databases, engineers default to optimistic estimates that understate real-world failure rates. Detection scores carry a similar problem: teams often assume that existing monitoring or inspection routines will catch a failure mode in time, without verifying that those routines are actually effective for the specific failure mechanism in question.

A further structural weakness of pure RPN-based prioritization is that it can rank a moderate-severity, moderate-occurrence failure mode above a catastrophic but rare one. Many organizations now supplement RPN with criticality matrices or separate severity thresholds that flag any high-severity failure mode for action regardless of its overall RPN score. This approach better reflects the real risk profile of industrial equipment.
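A severity threshold of this kind can be sketched in a few lines of Python. The threshold of 9, the RPN limit of 100, and the example failure modes are all assumptions chosen for illustration, not values taken from any standard.

```python
from dataclasses import dataclass

SEVERITY_THRESHOLD = 9  # assumed cut-off for "always act"
RPN_LIMIT = 100         # assumed RPN action limit


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def needs_action(fm: FailureMode) -> bool:
    """Flag on high severity alone, or on RPN above the limit."""
    return fm.severity >= SEVERITY_THRESHOLD or fm.rpn >= RPN_LIMIT


modes = [
    FailureMode("seal weep", severity=5, occurrence=6, detection=5),        # RPN 150
    FailureMode("shaft fracture", severity=10, occurrence=1, detection=4),  # RPN 40
    FailureMode("paint fading", severity=2, occurrence=3, detection=2),     # RPN 12
]

flagged = [fm.name for fm in modes if needs_action(fm)]
print(flagged)  # ['seal weep', 'shaft fracture']
```

With an RPN limit alone, the hypothetical shaft fracture (RPN 40) would not be flagged. The severity override catches it.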

What happens when FMEA teams are not cross-functional?

When FMEA teams lack cross-functional representation, the analysis reflects only the perspective of whoever is in the room. Critical failure modes get missed because the people who observe them daily are not part of the conversation. An engineering-only team will miss operational failure modes; a maintenance-only team may miss design-level vulnerabilities.

Effective FMEA requires input from at least four distinct knowledge domains:

  1. Design and process engineering, who understand system intent, load conditions, and design margins
  2. Maintenance and reliability teams, who have direct experience with how equipment actually fails in service
  3. Operations personnel, who understand how the equipment is used, misused, and stressed during normal production
  4. Safety and compliance functions, who can identify regulatory requirements and consequence categories that engineering teams may not prioritize

Without operators in the room, FMEA teams routinely miss failure modes caused by process variability, operator interaction, or abnormal but recurring operating conditions. Without maintenance input, they miss failure modes that are well-known on the floor but never formally documented. The result is an analysis that is technically coherent but operationally incomplete.
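A simple attendance check before a session can catch gaps in representation early. The sketch below assumes each attendee is tagged with one of four domain labels. The labels and names are hypothetical.

```python
# The four knowledge domains listed above, as short labels (assumed naming).
REQUIRED_DOMAINS = {"engineering", "maintenance", "operations", "safety"}


def missing_domains(attendees: dict) -> set:
    """Return the required domains that no attendee represents.

    `attendees` maps a person's name to their domain label.
    """
    return REQUIRED_DOMAINS - set(attendees.values())


session = {
    "A. Engineer": "engineering",
    "B. Designer": "engineering",
    "C. Planner": "maintenance",
}

print(sorted(missing_domains(session)))  # ['operations', 'safety']
```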

Cross-functional participation also improves the quality of detection and occurrence scoring. Maintenance engineers know which inspection methods actually catch failures early; operators know which alarms are routinely ignored. This ground-level knowledge transforms abstract scores into realistic assessments.

When should an FMEA be updated or reviewed?

An FMEA should be reviewed whenever there is a significant change to the equipment, its operating conditions, or the consequences of failure. Treating an FMEA as a static document is one of the most common reasons it fails to prevent real failures. The analysis must remain synchronized with the system it represents.

Specific triggers that should initiate an FMEA review include:

  • Equipment modifications or upgrades, including changes to components, control systems, or process parameters
  • Actual failures or near-misses that were not anticipated by the original analysis, which indicate gaps in the failure mode coverage
  • Changes in operating profile, such as increased throughput, new process fluids, extended operating hours, or changes in ambient conditions
  • New regulatory or safety requirements that alter consequence categories or introduce new risk thresholds
  • Planned maintenance interventions, particularly for high-criticality assets where each intervention introduces its own risk of human error or incorrect reassembly
  • Scheduled periodic review, typically on an annual or biennial cycle for critical rotating equipment, regardless of whether changes have occurred

In practice, many organizations set a fixed review interval and then trigger additional reviews on an event-driven basis. The key discipline is ensuring that the FMEA is formally revisited after any unplanned failure, rather than simply resolving the immediate issue and moving on without updating the risk register.
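The combination of a fixed interval and event-driven triggers could look roughly like the following Python sketch. The annual interval and the trigger labels are assumptions that mirror the list above. A real implementation would read them from a CMMS or change-management system.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # assumed annual cycle

# Hypothetical event labels mirroring the triggers listed above.
EVENT_TRIGGERS = {
    "modification",
    "unplanned_failure",
    "near_miss",
    "operating_profile_change",
    "regulatory_change",
    "maintenance_intervention",
}


def review_reasons(last_review: date, events_since_review: list, today: date) -> list:
    """Return the reasons an FMEA review is due; an empty list means none."""
    reasons = sorted(set(events_since_review) & EVENT_TRIGGERS)
    if today - last_review >= REVIEW_INTERVAL:
        reasons.append("periodic_review_due")
    return reasons


print(review_reasons(date(2024, 1, 10), ["unplanned_failure"], date(2024, 6, 1)))
# ['unplanned_failure']
```

An unplanned failure triggers a review here even though the periodic interval has not yet elapsed.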

How can FMEA findings be turned into actionable improvements?

FMEA findings become actionable when every recommended action has a named owner, a defined deadline, and a verification step that confirms the action was completed and effective. Without these three elements, even well-identified failure modes remain unmitigated because no one is accountable for closing the loop.

The transition from FMEA output to operational improvement requires a structured follow-through process. After the analysis is complete, recommended actions should be categorized by type: design changes, maintenance procedure updates, inspection frequency adjustments, spare parts stocking decisions, or monitoring and alarm configuration changes. Each category requires a different implementation path and a different owner.

Prioritization matters here. Not every finding can be addressed simultaneously, and teams that try to act on everything at once often accomplish nothing. Using the severity score as a hard filter (that is, addressing all high-severity failure modes first, regardless of RPN) ensures that the most consequential risks receive attention before resources are diluted across lower-priority items.

Closing actions also requires verification. A maintenance procedure update that exists in a document management system but has not been communicated to the maintenance team, or a new inspection step that was added to a checklist but not to the actual maintenance schedule, provides no real protection. Effective FMEA programs track action status through to confirmed implementation and measure whether the change had the intended effect on failure rates or detection capability.
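One way to keep "recorded" distinct from "implemented and verified" is to derive an action's status from its fields instead of letting someone set it by hand. The following Python sketch assumes a hypothetical `Action` record. The field names and status labels are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Action:
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None
    implemented_on: Optional[date] = None
    verified_effective: bool = False

    def status(self, today: date) -> str:
        """Derive status; an action only closes once verified."""
        if self.owner is None or self.due is None:
            return "unassigned"
        if self.implemented_on is None:
            return "overdue" if today > self.due else "open"
        return "closed" if self.verified_effective else "awaiting verification"


action = Action(
    "Add vibration check to weekly pump round",
    owner="J. Smith",
    due=date(2024, 3, 1),
    implemented_on=date(2024, 2, 20),
)
print(action.status(date(2024, 3, 15)))  # awaiting verification
```

The example action has been implemented on time but still does not count as closed, because nobody has confirmed that it had the intended effect.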

How Zytec's non-contact drives address the root causes FMEA identifies

FMEA consistently surfaces the same categories of failure risk for rotating equipment: misalignment-induced bearing and seal degradation, wear in mechanical couplings, vibration transmission, and the human error exposure that comes with frequent maintenance interventions. Zytec's non-contact magnetic drives are engineered to eliminate these failure mechanisms at the source rather than manage them through procedure.

Key ways Zytec technology resolves failure modes that FMEA teams regularly flag:

  • No mechanical contact between motor and driven equipment, eliminating wear-related failure modes in the coupling entirely and removing the need for lubrication or periodic element replacement
  • Tolerance for up to 12 mm of dynamic radial misalignment without transmitting forces into bearings or seals, directly addressing the misalignment failure modes that industry experience links to a significant share of rotating machinery breakdowns
  • Vibration reduction of up to 80%, extending the mean time between failures for downstream components including bearings, seals, gearboxes, and structural connections
  • Overload protection by design, preventing torque spikes from propagating into the drivetrain during fault conditions, a failure mode that conventional couplings cannot absorb
  • Reduction in maintenance intervention frequency, which structurally lowers the number of human error exposure events over the asset's lifecycle, an outcome directly aligned with ISO 45001 risk reduction principles
  • A maintenance-free operational lifespan exceeding 20 years for the coupling itself, which removes an entire category of scheduled maintenance tasks from the FMEA risk register

For industrial organizations that invest in thorough FMEA processes, Zytec's drives offer a way to close the gap between identified risk and implemented mitigation. If your FMEA keeps surfacing the same rotating equipment failure modes year after year, contact Zytec to explore how non-contact magnetic drive technology can eliminate those failure modes by design.

Frequently Asked Questions

How do we know if our current FMEA is actually reliable enough to trust?

Start by auditing your existing FMEA against three criteria: whether it was built with cross-functional input, whether RPN scores are supported by real maintenance data or failure history, and whether recommended actions have been formally closed out with verified implementation. If any of these are missing, the analysis should be treated as incomplete rather than valid. A practical first step is to pull your last three unplanned failures and check whether those failure modes appear in the current FMEA — if they don't, that's a clear signal the document needs revision.
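The failure-history check described above can be sketched as a simple lookup. This Python example assumes failures and FMEA entries are both recorded as (asset, failure mode) pairs with matching wording. Real records usually need manual mapping first, and the asset tags shown are hypothetical.

```python
def uncovered_failures(recent_failures, fmea_entries):
    """Return recent failures whose (asset, failure mode) pair is not in the FMEA.

    Matching is case-insensitive but otherwise exact, which is a simplification.
    """
    covered = {(asset.strip().lower(), mode.strip().lower()) for asset, mode in fmea_entries}
    return [
        (asset, mode)
        for asset, mode in recent_failures
        if (asset.strip().lower(), mode.strip().lower()) not in covered
    ]


fmea_entries = [
    ("Pump P-101", "bearing wear"),
    ("Pump P-101", "seal leakage"),
    ("Fan F-12", "imbalance"),
]
recent_failures = [
    ("Pump P-101", "Bearing wear"),
    ("Fan F-12", "coupling element fatigue"),
    ("Pump P-101", "seal leakage"),
]

print(uncovered_failures(recent_failures, fmea_entries))
# [('Fan F-12', 'coupling element fatigue')]
```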

What is a realistic team size and composition for an effective FMEA session?

An effective FMEA team typically consists of five to eight people, small enough to stay focused but large enough to cover the necessary knowledge domains. At minimum, you need representation from design or process engineering, maintenance, operations, and safety or compliance. For complex rotating equipment, adding a reliability engineer or OEM technical specialist is strongly recommended, particularly when scoring detection effectiveness for specific failure mechanisms. Larger groups tend to slow the process and dilute accountability, so if more stakeholders need input, structured pre-session interviews are more effective than expanding the core team.

Can FMEA be applied effectively to equipment that is already in service, or is it only useful during the design phase?

FMEA is fully applicable to in-service equipment and is often more valuable at that stage because real operational and failure history is available to inform scoring. The process is sometimes called a retrospective or operational FMEA, and it benefits directly from maintenance records, operator experience, and actual failure data that design-phase analyses cannot access. The main adjustment when working with existing assets is to prioritize failure modes based on current operating conditions rather than original design intent, particularly if the equipment has been modified, repurposed, or is running outside its original parameters.

What is the difference between FMEA and FMECA, and does it matter which one we use?

FMEA (Failure Mode and Effects Analysis) identifies failure modes and their effects on system function, while FMECA (Failure Mode, Effects, and Criticality Analysis) adds a formal criticality ranking step that categorizes failure modes by their combined probability and consequence severity. For most industrial rotating equipment applications, FMECA provides a more actionable output because it explicitly separates high-consequence failure modes from lower-risk ones, addressing one of the key weaknesses of pure RPN-based prioritization. If your current process relies solely on RPN and you find that catastrophic but rare failure modes are consistently ranked below moderate recurring ones, transitioning to an FMECA approach or adding a criticality matrix is worth the additional effort.

How should we handle failure modes where detection is genuinely difficult or impossible with current technology?

When a failure mode cannot be reliably detected before it causes a functional failure, the correct response is to address it through prevention rather than detection: eliminate the failure mechanism through design changes, reduce occurrence through improved operating conditions, or accept a planned replacement interval based on known wear rates. Assigning a poor detection score (a high number on the conventional scale, where the top score means the failure cannot be detected) and moving on without a mitigation response is one of the most dangerous outcomes of a poorly managed FMEA. If detection is genuinely not feasible, that finding should escalate directly to an engineering review, not remain as an open item in the risk register.

How do we prioritize FMEA actions when budget and maintenance resources are constrained?

When resources are limited, severity should be used as the primary filter rather than overall RPN, ensuring that any failure mode with a high severity score receives attention first regardless of how it ranks on occurrence or detection. Within the high-severity tier, focus next on failure modes with poor detection scores, since undetectable catastrophic failures represent the greatest unmanaged risk. For lower-severity items, batching similar actions together — such as updating multiple maintenance procedures in a single revision cycle or adjusting several inspection intervals at once — reduces the implementation burden without leaving critical risks unaddressed.
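The ordering described above can be expressed as a sort key. The Python sketch below assumes a severity threshold of 8 and the conventional scale on which a higher detection score means a harder-to-detect failure. The threshold and the example failure modes are hypothetical.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    severity: int
    occurrence: int
    detection: int  # higher = harder to detect (conventional scale, assumed)


def priority_key(fm: FailureMode, severity_threshold: int = 8):
    """High-severity tier first; within it, worst detection first; then RPN."""
    rpn = fm.severity * fm.occurrence * fm.detection
    if fm.severity >= severity_threshold:
        return (0, -fm.detection, -rpn)
    return (1, 0, -rpn)


def prioritize(modes, severity_threshold: int = 8):
    return sorted(modes, key=lambda fm: priority_key(fm, severity_threshold))


modes = [
    FailureMode("coupling wear", 5, 7, 4),     # RPN 140
    FailureMode("shaft fracture", 10, 1, 8),   # RPN 80
    FailureMode("bearing seizure", 9, 3, 3),   # RPN 81
    FailureMode("guard rattle", 3, 5, 2),      # RPN 30
]

print([fm.name for fm in prioritize(modes)])
# ['shaft fracture', 'bearing seizure', 'coupling wear', 'guard rattle']
```

A pure RPN ranking of the same list would put coupling wear first and shaft fracture third.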

What software or tools are commonly used to manage FMEA documentation and action tracking?

FMEA can be managed in dedicated reliability software platforms such as Relyence, Isograph, or PTC Windchill FMEA, which offer structured templates, RPN calculation, and action tracking in a single environment. For organizations without specialized tools, a well-structured spreadsheet combined with a formal action register in a CMMS (Computerized Maintenance Management System) is a practical and widely used alternative. The most important feature to prioritize in any tool is action ownership and status tracking — the ability to assign a named responsible party, a deadline, and a completion verification step — since this is where most FMEA processes break down regardless of how thorough the initial analysis was.