Fault Tree Analysis is the method of choice when you need to understand not just what caused a failure, but how combinations of events can conspire to produce a catastrophic outcome. Developed by Bell Labs in 1961 for the Minuteman missile program and later adopted by NASA, nuclear power, and the automotive safety industry, FTA uses Boolean logic to build a visual map from a top-level failure down to every root cause that could contribute to it. This guide explains the gate logic, walks through the 7-step process, shows three complete examples, and tells you exactly when to use FTA instead of FMEA or 5 Whys.
What Is Fault Tree Analysis?
A fault tree is a top-down, deductive diagram that starts with a single top event — an undesired failure, accident, or safety incident — and breaks it down into all possible causes using logic gates. The analysis answers: “What combinations of events, if they occurred simultaneously, would cause this top event?”
Three key characteristics define FTA:
- Top-down and deductive — you start with a known or hypothesized failure and work backwards to find causes. This is the opposite of FMEA, which starts from components and works upward.
- Boolean logic — causes are connected by AND gates and OR gates that define exactly how events combine.
- Combination failures — FTA excels at modelling scenarios where multiple safeguards must all fail simultaneously for the top event to occur. This is its primary advantage over simpler methods.
AND Gates and OR Gates: The Logic of FTA
Every event above the basic-event level connects to its inputs through a logic gate, in this guide either an AND gate or an OR gate. Getting these right is the most important skill in FTA.
⋁ OR Gate
The output event occurs if any one of the input events occurs. One failure is sufficient. Represents parallel failure paths — each path independently leads to the output.
Example: “Loss of power to a device” occurs if the power cable is cut OR the fuse blows OR the power supply unit fails.
⋀ AND Gate
The output event occurs only if all input events occur simultaneously. All safeguards must fail at once. Represents redundancy — where multiple independent protections exist.
Example: “Undetected fire” occurs only if Detector A fails AND Detector B fails AND the manual inspection is missed.
Why this matters for risk: OR gates increase system risk because any single failure propagates upward. AND gates represent safety redundancy — they are where defence-in-depth protects you. A system with all OR gates between the top event and basic events has no redundancy at all; every basic event is a single point of failure.
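The gate semantics above map directly onto Boolean `any` and `all`. The following is a minimal sketch, not a full FTA tool; the function names are hypothetical, the AND case mirrors the detector example, and the OR inputs are unnamed placeholders.

```python
def or_gate(*inputs: bool) -> bool:
    """Output event occurs if at least one input event occurs."""
    return any(inputs)


def and_gate(*inputs: bool) -> bool:
    """Output event occurs only if every input event occurs."""
    return all(inputs)


# AND gate: Detector A has failed, Detector B still works, inspection was missed.
undetected_fire = and_gate(True, False, True)

# OR gate: three hypothetical inputs, only the second one has failed.
output_lost = or_gate(False, True, False)

print(undetected_fire)  # False: one working safeguard blocks the output event
print(output_lost)      # True: a single failure is enough
```

Flipping a single input of the OR gate changes its output, whereas the AND gate only fires once every input is `True`. That asymmetry is the redundancy argument in code form.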
Other FTA Symbols
- Top Event — rectangle at the top. The specific undesired system state being analysed.
- Intermediate Event — rectangle in the middle layers. A fault condition that is itself caused by events below it.
- Basic Event — circle at the bottom. A root-level failure (component failure, human error, external event) that cannot be broken down further.
- Undeveloped Event — diamond. A basic event where further decomposition is possible but not yet done (insufficient data or out of scope).
- Transfer Gate — triangle. Used to connect subtrees when the diagram is too large to fit on one page.
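If you keep a fault tree in code rather than in a drawing, the symbols above can be captured in a small data model. This is one possible sketch: the class and field names are hypothetical, transfer gates are left out, and treating the manual inspection as an undeveloped event is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    TOP = "top event"
    INTERMEDIATE = "intermediate event"
    BASIC = "basic event"
    UNDEVELOPED = "undeveloped event"


# Diagram shape for each kind of event, as listed above.
SHAPE = {
    Kind.TOP: "rectangle",
    Kind.INTERMEDIATE: "rectangle",
    Kind.BASIC: "circle",
    Kind.UNDEVELOPED: "diamond",
}


@dataclass
class Event:
    name: str
    kind: Kind
    gate: Optional[str] = None  # "AND" or "OR"; None for leaf events
    inputs: List["Event"] = field(default_factory=list)


def leaves(event: Event) -> List[Event]:
    """Collect the events at the bottom of the tree (no inputs)."""
    if not event.inputs:
        return [event]
    return [leaf for child in event.inputs for leaf in leaves(child)]


def depth(event: Event) -> int:
    """Number of levels from this event down to the deepest leaf."""
    if not event.inputs:
        return 1
    return 1 + max(depth(child) for child in event.inputs)


top = Event("Undetected fire", Kind.TOP, gate="AND", inputs=[
    Event("Detector A fails", Kind.BASIC),
    Event("Detector B fails", Kind.BASIC),
    Event("Manual inspection missed", Kind.UNDEVELOPED),  # assumed undeveloped
])

for leaf in leaves(top):
    print(f"{leaf.name}: {SHAPE[leaf.kind]}")
print("levels:", depth(top))
```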
Minimal Cut Sets: Where the Real Risk Lives
After building the fault tree, the most important analytical step is finding the minimal cut sets (MCS).
A cut set is any combination of basic events whose simultaneous occurrence causes the top event. A minimal cut set is a cut set with nothing to spare: remove any one event from it and the remaining events no longer cause the top event. A tree usually has several minimal cut sets of different sizes.
Cut set priority ranking
Identifying MCSs tells you exactly where to focus corrective action: eliminate single-event MCSs first (add redundancy or remove the failure mode), then work through two-event MCSs.
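For small trees the minimal cut sets can be derived mechanically. The sketch below uses a simple top-down expansion: an OR gate merges the cut sets of its inputs, and an AND gate combines one cut set from each input. Any cut set that strictly contains another is then discarded. The tuple-based tree encoding and the events `A` to `D` are hypothetical, chosen only to show a non-minimal set being removed.

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a node as a set of frozensets of basic events."""
    if isinstance(node, str):  # a basic event is its own cut set
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":   # any input's cut set is enough
        return set().union(*child_sets)
    if gate == "AND":  # need one cut set from every input at once
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another cut set."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}


# Hypothetical tree: TOP = A OR (B AND C) OR (A AND D)
tree = ("OR", ["A", ("AND", ["B", "C"]), ("AND", ["A", "D"])])

# Rank by size so single points of failure come first.
ranked = sorted(minimal_cut_sets(tree), key=lambda s: (len(s), sorted(s)))
for mcs in ranked:
    label = "single point of failure" if len(mcs) == 1 else f"{len(mcs)}-event"
    print(sorted(mcs), label)
```

Here `{A, D}` is a cut set but not a minimal one, because `A` alone already causes the top event. Full expansion grows quickly on large trees, which is one reason dedicated software is used for those.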
How to Perform a Fault Tree Analysis: 7 Steps
1. Define the top event precisely. State the exact undesired outcome: “Main landing gear fails to extend on approach”, not “landing gear problem”. Vague top events produce vague trees. Include the system boundary and operating conditions.
2. Assemble a cross-functional team. Include design engineers, operators, maintenance staff, and safety specialists. FTA done alone misses failure modes that practitioners know from experience.
3. Identify the immediate causes of the top event. Ask: what direct conditions would cause this? Connect them with AND or OR as appropriate. This is the most judgement-intensive step, because getting the gate type wrong changes the entire risk picture.
4. Decompose recursively until you reach basic events. For each intermediate event, repeat step 3. Keep decomposing until you reach component-level failures or human errors that cannot be broken down further. Typical fault trees have 3–7 levels.
5. Identify all minimal cut sets. Work through the tree systematically using Boolean algebra or software to find every minimal combination of basic events that causes the top event. Flag all single-event MCSs as single points of failure.
6. Perform quantitative analysis if data is available. Assign failure probabilities to basic events (from reliability data, field history, or engineering estimates). Calculate top event probability using the cut set probabilities. This step is optional for a qualitative study but is typically required for safety certification.
7. Define corrective actions and maintain the tree. For each high-priority MCS: add redundancy, strengthen detection, or eliminate the failure path. Document the revised tree. Update the fault tree whenever the design, process, or operating environment changes.
Fault Tree Example 1: Aircraft Engine Shutdown (Aerospace)
Top Event: In-flight engine shutdown (IFSD)
Context: Simplified fault tree for a turbofan engine in-flight shutdown event, used as part of SAE ARP4761 safety analysis for aircraft type certification.
Fuel supply failure (AND gate): Requires primary fuel pump failure AND backup pump failure simultaneously — two-event MCS. Low probability due to redundancy.
Engine control failure (OR gate): FADEC Channel A fails OR Channel B fails OR both channels lose power. As modelled, these are three separate single-event MCS paths. Each is a single point of failure and must achieve a target probability of < 10⁻⁷ per flight hour.
Mechanical seizure (AND gate): Requires bearing failure AND oil starvation together — two-event MCS.
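Minimal cut sets
- {FADEC Channel A fails} (single event)
- {FADEC Channel B fails} (single event)
- {Both FADEC channels lose power} (single event)
- {Primary fuel pump fails, backup pump fails} (two events)
- {Bearing failure, oil starvation} (two events)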
Action: The three single-event FADEC MCSs drive the design to dual-channel architecture with independent power supplies and cross-channel monitoring. Each channel must independently meet the 10⁻⁷ per flight hour target.
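The cut sets implied by the three branches can be cross-checked by brute force: try every combination of basic events, smallest first, and keep each combination that triggers the top event without containing a smaller cut set. The sketch assumes the three branches join under a top-level OR gate, which the example does not state explicitly. The event identifiers are shortened, hypothetical names.

```python
from itertools import combinations

EVENTS = ["primary_pump", "backup_pump", "fadec_a", "fadec_b",
          "fadec_power", "bearing", "oil_starvation"]


def engine_shutdown(failed: set) -> bool:
    """Top event logic, assuming the three branches are joined by OR."""
    fuel = "primary_pump" in failed and "backup_pump" in failed    # AND gate
    control = bool(failed & {"fadec_a", "fadec_b", "fadec_power"})  # OR gate
    seizure = "bearing" in failed and "oil_starvation" in failed   # AND gate
    return fuel or control or seizure


def brute_force_mcs(events, top_event):
    """Enumerate event combinations by increasing size and keep the minimal ones."""
    found = []
    for size in range(1, len(events) + 1):
        for combo in combinations(events, size):
            candidate = set(combo)
            if any(smaller <= candidate for smaller in found):
                continue  # already covered by a smaller cut set
            if top_event(candidate):
                found.append(candidate)
    return found


mcs = brute_force_mcs(EVENTS, engine_shutdown)
for cut_set in mcs:
    print(len(cut_set), sorted(cut_set))
```

With seven basic events there are only 127 non-empty combinations, so exhaustive checking is instant. The approach stops being practical for large trees, since the number of combinations doubles with every added event.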
Fault Tree Example 2: Chemical Tank Overflow (Process Safety)
Top Event: Hazardous chemical tank overflow
Context: Process safety FTA for a storage tank containing a corrosive liquid at a chemical plant. The analysis follows IEC 61025, and fault tree analysis is one of the process hazard analysis methods accepted under OSHA Process Safety Management (PSM). The tank has two independent high-level shutdown systems.
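Top event decomposition (AND gate): Tank overflow occurs only if the fill system delivers too much product AND the overflow detection and shutdown fails to activate in time. Excess fill on its own is stopped by the shutdown systems, and a failed shutdown system on its own causes nothing until there is an overfill to stop.
Fill system delivers excess (OR gate):
- Fill valve sticks open
- Fill rate set incorrectly by operator
Detection and shutdown fails (AND gate): Both independent shutdown systems must fail simultaneously:
- High-level switch LS-101 fails AND High-level switch LS-102 fails
Minimal cut sets: Tank Overflow
- {Fill valve sticks open, LS-101 fails, LS-102 fails} (three events)
- {Fill rate set incorrectly by operator, LS-101 fails, LS-102 fails} (three events)
This assumes the shutdown systems can isolate the tank independently of the fill valve. If they act on the fill valve itself, a stuck-open valve defeats both and becomes a single-event MCS.
Actions: Both cut sets depend on the same pair of level switches, so a latent failure in LS-101 or LS-102 weakens every path to the top event. (1) Implement staggered testing of LS-101 and LS-102 on alternating weeks so that a latent failure in one switch is detected before the other also fails; (2) add a solenoid valve with position feedback and hard interlock on the fill valve; (3) add a flow meter with totalizer alarm to detect fill rate errors before overflow is reached.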
Fault Tree Example 3: Web Application Data Breach (Software / Cybersecurity)
Top Event: Unauthorized access to customer PII database
Context: Security FTA for a SaaS application handling customer personally identifiable information (PII). Performed as part of a SOC 2 Type II preparation and threat modelling exercise.
Top event decomposition (OR gate): Unauthorized PII access occurs via external attack OR insider threat OR accidental exposure.
External attack path (AND gate): Attacker reaches database AND authentication is bypassed:
- Network perimeter bypassed (firewall misconfiguration OR VPN credential compromise)
- AND application authentication bypassed (SQL injection OR stolen session token OR MFA not enforced)
Insider threat path (OR gate): Malicious employee with DB access OR compromised contractor credential — both are single-event MCSs.
Accidental exposure (OR gate): DB credentials in public code repo OR unencrypted backup on public storage — both single-event MCSs.
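Highest-priority minimal cut sets
- {DB credentials in public code repo} (single event)
- {Unencrypted backup on public storage} (single event)
- {Malicious employee with DB access} (single event)
- {Compromised contractor credential} (single event)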
Actions (in priority order): (1) Scan all repos with secret detection tool, rotate all exposed credentials immediately; (2) audit S3 bucket ACLs, enforce encryption at rest; (3) implement least-privilege DB access with role separation; (4) enforce MFA on all contractor accounts; (5) deploy WAF with SQLi ruleset and quarterly pen testing.
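The external attack path shows how an AND gate over two OR groups expands: every perimeter failure pairs with every authentication failure, giving six two-event cut sets alongside the four single-event ones. A short sketch of that expansion follows; the event names come from the example, while the variable names are hypothetical.

```python
from itertools import product

perimeter = ["firewall misconfiguration", "VPN credential compromise"]
authentication = ["SQL injection", "stolen session token", "MFA not enforced"]

# AND gate over two OR groups: one event from each group must occur.
external_cut_sets = [frozenset(pair) for pair in product(perimeter, authentication)]

# Insider threat and accidental exposure paths: each event alone reaches the top event.
single_event_cut_sets = [
    frozenset(["malicious employee with DB access"]),
    frozenset(["compromised contractor credential"]),
    frozenset(["DB credentials in public code repo"]),
    frozenset(["unencrypted backup on public storage"]),
]

# Smallest cut sets first, since they need the fewest things to go wrong.
all_cut_sets = sorted(single_event_cut_sets + external_cut_sets, key=len)
for cut_set in all_cut_sets:
    print(len(cut_set), " AND ".join(sorted(cut_set)))
```

Closing one event in a group, for example enforcing MFA, removes two of the six external cut sets at once, which is a useful way to compare candidate fixes.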
FTA vs FMEA vs 5 Whys vs Fishbone
| Tool | Direction | Timing | Best used for | Key strength |
|---|---|---|---|---|
| Fault Tree Analysis | Top-down | Proactive or reactive | Specific high-consequence event; combination failures; safety certification | Models AND logic — where multiple safeguards must all fail |
| FMEA | Bottom-up | Proactive | Comprehensive coverage of ALL failure modes in a product or process | Systematic — nothing slips through; produces prioritised RPN risk register |
| 5 Whys | Linear (single chain) | Reactive | Single known failure; quick root cause in 15–30 min | Fast, no training required, works with small teams |
| Fishbone diagram | Lateral (categories) | Reactive | Complex problem; brainstorm all possible causes across departments | Visual, team-based, covers 6M or 6P categories comprehensively |
| Pareto Analysis | Statistical | Reactive / analytical | Prioritise which problem to investigate first using 80/20 rule | Data-driven prioritisation — tells you where to focus first |
When to choose FTA specifically: Use FTA when (1) you are analysing a specific catastrophic or safety-critical event, (2) you need to model combinations of simultaneous failures where AND logic matters, (3) a safety standard requires it (SAE ARP4761, IEC 61025, ISO 26262, NRC PRA), or (4) you need to calculate failure probability from component reliability data. For everything else, 5 Whys or FMEA is faster and sufficient.
Quantitative FTA: Calculating Top Event Probability
When component failure rate data is available, FTA can produce a numerical probability for the top event. This is required in aerospace (FAA/EASA certification), nuclear (NRC probabilistic risk assessment), and automotive functional safety (ISO 26262).
The calculation uses the minimal cut sets:
- Within a cut set (AND logic): P(MCS) = P(A) × P(B), if A and B are independent
- Across cut sets (OR logic): P(top) ≈ P(MCS1) + P(MCS2) + P(MCS3), the rare-event approximation, valid when the probabilities are small
For a two-event MCS with P(A) = 10⁻³ and P(B) = 10⁻³ over the same one-hour exposure, the combined probability is 10⁻⁶ for that hour, provided the two failures are independent. This is how redundancy achieves the extreme reliability targets required in aviation and nuclear power.
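The arithmetic can be sketched in a few lines: each cut set probability is the product of its independent basic events, and the top event is approximated by summing across cut sets. The three single-event probabilities of 10⁻⁵ are hypothetical values added for illustration, and the function names are assumptions. The upper-bound function also treats the cut sets as independent of each other.

```python
from math import prod


def cut_set_probability(event_probs):
    """AND logic: every independent basic event in the cut set occurs."""
    return prod(event_probs)


def top_event_rare_event(cut_set_probs):
    """Rare-event approximation: sum of the cut set probabilities."""
    return sum(cut_set_probs)


def top_event_upper_bound(cut_set_probs):
    """1 minus the probability that no cut set occurs (cut sets assumed independent)."""
    return 1 - prod(1 - p for p in cut_set_probs)


redundant_pair = cut_set_probability([1e-3, 1e-3])  # two-event MCS from the text
single_points = [1e-5, 1e-5, 1e-5]                  # hypothetical single-event MCSs

all_probs = [redundant_pair] + single_points
approx = top_event_rare_event(all_probs)
bound = top_event_upper_bound(all_probs)

print(f"two-event cut set: {redundant_pair:.2e}")
print(f"top event (rare-event sum): {approx:.3e}")
print(f"top event (upper bound):    {bound:.3e}")
```

In this illustration the single-event cut sets dominate the total even though each is a hundred times less likely than either pump-style component on its own, which is why single points of failure are addressed first.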
For qualitative FTA (finding cut sets without probabilities), no special software is needed. For quantitative FTA with complex trees, tools like OpenFTA, CAFTA, or Relex are commonly used.
Common FTA Mistakes
- Using the wrong gate type. Confusing AND and OR fundamentally changes the risk picture. AND gates where OR gates belong give false confidence in redundancy that does not exist.
- Vague top event. “System failure” or “accident” is too broad. The top event must be specific enough that you can unambiguously say whether it occurred.
- Stopping too early. Stopping at intermediate events before reaching true basic events leaves the analysis incomplete. Go deep enough that corrective actions are actionable.
- Ignoring common cause failures. If two “independent” components share a common power supply, common maintenance team, or common environment, they are not truly independent. AND gates between them overestimate safety.
- Not updating the fault tree. A fault tree that does not reflect the current system design is misleading. Treat it as a living document — update it when the system changes.
- Confusing FTA with FMEA. FTA analyses one specific top event in depth. FMEA covers all failure modes broadly. They complement each other — FMEA finds failure modes you would not have thought to put in a fault tree.
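The common cause point in the list above can be made concrete with the beta-factor model, a widely used simplification in which a fraction β of each component's failure probability is attributed to a shared cause that fails both components together. The values p = 10⁻³ and β = 0.1 below are illustrative assumptions, not data from any real system.

```python
def independent_pair(p: float) -> float:
    """Probability that both components fail, assuming full independence."""
    return p * p


def beta_factor_pair(p: float, beta: float) -> float:
    """Both fail independently, or a shared cause takes out both at once."""
    independent_part = ((1 - beta) * p) ** 2
    common_part = beta * p
    return independent_part + common_part


p, beta = 1e-3, 0.1  # illustrative values only

naive = independent_pair(p)
with_common_cause = beta_factor_pair(p, beta)

print(f"independent AND gate: {naive:.3e}")
print(f"with common cause:    {with_common_cause:.3e}")
print(f"ratio: {with_common_cause / naive:.0f}x")
```

Under these assumptions the shared cause term dominates, and the redundant pair is roughly a hundred times more likely to fail than the naive product suggests.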
Frequently Asked Questions
What is fault tree analysis?
Fault Tree Analysis (FTA) is a top-down, deductive failure analysis method that uses Boolean logic (AND and OR gates) to map all possible combinations of events that could lead to a specific undesired top event — such as a system failure, accident, or safety incident. Starting from the top event, you decompose it through logic gates down to basic events (component failures or human errors). Identifying the minimal cut sets — the smallest combinations of basic events that cause the top event — reveals single points of failure and guides corrective action.
What is the difference between an AND gate and an OR gate in FTA?
An OR gate outputs a failure if ANY one of its input events occurs. It represents parallel failure paths where one failure alone is sufficient — these are higher risk because no redundancy protects the output. An AND gate outputs a failure only if ALL input events occur simultaneously — it represents true redundancy where multiple independent safeguards must all fail at once. OR gates increase risk; AND gates contain it. Getting the gate type right is the most critical skill in FTA.
What is a minimal cut set in fault tree analysis?
A minimal cut set (MCS) is the smallest combination of basic events whose simultaneous occurrence causes the top event. Single-element MCSs are the most critical — they are single points of failure (one event alone triggers the top event). Two-element MCSs require two simultaneous failures. Identifying MCSs tells you exactly where to focus risk reduction: always eliminate single-element MCSs first by adding redundancy or removing the failure mode.
When should you use fault tree analysis instead of FMEA or 5 Whys?
Use FTA when: (1) you need to analyse a specific high-consequence event (fire, crash, data breach), (2) combination failures matter — where multiple safeguards must fail simultaneously, (3) you need to calculate the probability of the top event from component failure rates, or (4) a safety standard requires it (SAE ARP4761, IEC 61025, ISO 26262). Use FMEA for comprehensive coverage of all failure modes. Use 5 Whys when a single failure has already occurred and you need the root cause quickly.
What industries use fault tree analysis?
FTA is mandatory or widely used in: aerospace (SAE ARP4761 for aircraft certification, FAA/EASA type certification), nuclear power (NRC probabilistic risk assessment), chemical and process industries (IEC 61025, OSHA PSM), automotive (ISO 26262 functional safety for electronic systems), medical devices (IEC 62304, ISO 14971), and defence (system safety programmes under MIL-STD-882). It is increasingly applied in software, cybersecurity, and financial risk modelling.
What is the difference between FTA and FMEA?
FTA is top-down: you start with one specific undesired top event and work downward to find all combinations of causes. FMEA is bottom-up: you start with every possible component failure and work upward to determine its effect on the system. FTA is better for modelling failure combinations (AND gate logic) and calculating system-level failure probability. FMEA is better for comprehensive coverage — ensuring no failure mode is missed. In safety-critical systems like aircraft and nuclear plants, both are typically required.
Can fault tree analysis be done without specialised software?
Yes — for qualitative FTA (identifying cut sets and logic structure), you can draw a fault tree on paper, a whiteboard, or any diagramming tool (Visio, Lucidchart, draw.io). Manual FTA is standard practice for team-based analysis and is entirely sufficient for most root cause analysis and safety review purposes. Quantitative FTA (calculating top event probability from component failure rates for large, complex trees) benefits from dedicated tools like OpenFTA, CAFTA, or Isograph. For learning and everyday quality work, manual FTA on a whiteboard is the right starting point.