The Problem
Safety engineering is in an uncomfortable position. The systems being engineered are larger and more complex than ever, the regulatory expectations are sharper, the workforce is stretched, and a wave of AI tools is arriving with promises that range from “we will make you faster” to “we will run the safety analysis for you.” Both promises are dangerous if they’re unexamined: the first because of how that speed is achieved, the second because no one should run the safety analysis for you.
To understand where SafeForge fits, it helps to walk through five problems that safety teams genuinely face right now. None of them is solved by adding another tool. Most of them are made worse by it. The reason SafeForge exists is to address them in a way that respects the work and the engineer doing it.
1. The trust gap in AI
The headline finding from the Stack Overflow 2025 Developer Survey is uncomfortable reading for the AI industry. Despite 84% of developers using AI tools, only 29% trust them — a drop from 40% the previous year. 46% don’t trust the accuracy of AI output, up from 31% a year earlier. Among experienced developers, only 2.6% claim “high trust” in AI output, and 20% express “high distrust.”
The Sonar / ITPro report on developer verification habits is starker still:
“96% of developers don’t fully trust AI-generated code is functionally correct, yet fewer than half review it before committing.” — ITPro, 2025
“45% of developers say debugging AI-generated code takes longer than writing it themselves.”
Sonar’s CEO summarises the pattern this creates:
“While AI has made code generation nearly effortless, it has created a critical trust gap between output and deployment.” — Tariq Shaukat, Sonar CEO
The KPMG / University of Melbourne Global AI Trust Study 2025 generalises the finding outside software. Worldwide, only 46% of people are willing to trust AI systems. 66% use AI regularly, but 56% have made work mistakes due to AI, and 66% rely on AI output without verifying accuracy. Trust has declined since the 2022 baseline.
Aviation professionals — the closest cohort to safety engineers in the published surveys — report a mean comfort, trust, and acceptance rating of 4.4 out of 7 for AI systems, with two-thirds rejecting at least one of the eight hypothetical AI scenarios EASA tested.
“Aviation professionals express uncertainty about AI’s ethical soundness… key concerns include AI performance, negative consequences for humans, data protection, accountability, and potential threats to aviation safety.” — EASA Ethics for AI in Aviation Survey
The trust gap matters because the obvious response — “we’ll just check the AI’s work” — is not what people actually do. Sonar found that fewer than half of developers review AI output before committing, despite 96% not trusting it. KPMG found that 66% rely on AI output without verifying it. Trust gaps don’t manifest as careful scepticism. They manifest as a quiet, growing dependence on something the user doesn’t actually believe.
For safety-critical work, this is the worst-shaped problem you can have. An AI tool that the engineer distrusts but uses anyway, with cursory review, against a deadline, on a system whose failure could hurt people. That’s not a productivity gain. That’s a hazard.
2. The complexity has run past the people
The size of modern safety datasets has grown faster than anyone’s ability to comprehend them. A modest rail signalling project might carry several hundred hazards, a thousand controls, two thousand requirements, and a junction matrix between them with tens of thousands of rows. A defence programme with multiple subsystems can produce hazard logs that span tens of thousands of entries. Nobody — no individual engineer, no review committee — can hold that volume in their head.
The IChemE study of nuclear power plant safety cases describes the consequence directly:
“Safety case shortcomings have persisted over the years, with significant implications in terms of accessibility and understanding of these documents by key end-users, which have severely restricted their use as an effective tool, leaving them ‘gathering dust on a shelf’.” — IChemE Hazards 29 Poster 01
“Technically sound but complex and complicated documents, not easily accessible to and therefore not used by those charged with ensuring safe operations, including operations and maintenance staff and managers accountable for safety.”
When the safety case is “gathering dust on a shelf,” the operational decisions get made on the basis of memory, habit, and the loudest engineer in the room. That’s not a defect of the safety case author — it’s a structural failure of the medium they were forced to work in. A 600-page PDF is not an information system. It’s a deliverable. Once it ships, nobody opens it again until the next regulator audit.
The same pattern shows up in hazard logs. The UK Ministry of Defence’s own Aerospace Safety Toolkit names the failure mode directly:
“If it is not sufficiently robust or well-structured, this may obscure the identification and clearance of hazards… if hazards are not well defined when they are entered into the hazard log, then the rigour enforced by the need for a clear audit trail of changes made may make it very difficult to maintain the hazard and accident records.” — ASEMS Online, UK MoD Aerospace Safety Toolkit
The Safety Artisan, summarising twenty-five years of practitioner experience, puts it more bluntly:
“In my 25+ years in System Safety, I’ve seen many bad hazard logs… there are an infinite number of ways of not doing them well. Most of them were hosted in Microsoft Excel, but there were also commercial tools and bespoke databases.” — The Safety Artisan, 2024
The volume defeats the format. A spreadsheet works fine for ten hazards. It works clumsily for a hundred. At a thousand, the spreadsheet has become a liability — a document that nobody fully understands, that everyone is afraid to touch in case they break something, and that increasingly works against the safety engineer rather than with them.
The AIChE 2025 review of process hazard analyses found that the US Chemical Safety Board identified problems with the conduct of the PHA in 21 of 46 accident investigations between 1998 and 2008, and named hazard identification (HAZID) as “still the weakest link in risk assessment” because of incompleteness. The hazards that hurt people are disproportionately the ones that fell off the bottom of the page.
3. The obscurity of “good practice”
What does “well-managed risk” actually look like? Ask three safety engineers and you’ll often get three different answers. Not because they’re confused, but because published reference examples of what a good hazard analysis, a well-supported control, or an adequate risk argument actually contains are scarce; the standards leave deliberate room for judgement; and most of the genuinely good work sits in unpublished hazard logs and safety cases inside consultancies and primes, where nobody outside ever sees it.
Ken Rivers CEng FIChemE, writing in The Chemical Engineer, names this directly:
“The biggest challenge is making good practice into common practice… Most – if not all – major incidents have causes that we have seen in earlier events… And yet knowing all of that, we still repeat the same mistakes. We still haven’t cracked how to fully apply the knowledge and insight which is available and can demonstrably help.” — Ken Rivers, The Chemical Engineer, 2021
Trevor Kletz, the dean of process safety, said the same thing more bluntly:
“Accidents are not caused by lack of knowledge, but by a failure to use the knowledge that is available.”
“Organisations have no memory. Only people have memory, and they move on.” — Trevor Kletz, Lessons from Disaster: How Organisations Have No Memory and Accidents Recur (IChemE, 1993)
The US Chemical Safety Board’s investigation of the Deepwater Horizon disaster in 2014 found, in the words of CSB Investigator Cheryl MacKenzie, an “eerie resemblance” between the offshore blowout and the 2005 BP Texas City refinery explosion. The lessons existed. The mechanisms for transferring those lessons forward did not work.
The European Process Safety Centre summarises the failure mode:
“The organizational discipline to seek out, internalize, and act on external lessons learned is consistently the weakest link in process safety management.” — EPSC, Learning lessons from major incidents (2022)
The academic survey of safety case practice across industries confirms the inconsistency at the discipline level. A 2017 review identified nine schools of thought on safety arguments with six distinct internal inconsistencies:
“Despite the increasing adoption of safety cases across different industries and the establishment of safety argument notations and techniques, the standard of safety case practice remains mixed, with clear examples of good practice across industry alongside many more examples where safety cases are not being used effectively.” — Graydon, NASA, 2017
A 2024 study of how practitioners actually gain confidence in assurance cases found that the workforce knows the system isn’t working:
“Industrial practice is a long way from academic practice around safety cases. So, the research that’s kinda like the leading-edge research is so far removed from what practices people need to get up a really big learning curve before you can actually apply it.” — Practitioner P15, arXiv 2024
“It could be that I put some numbers, and you put totally different numbers, and then who do we trust?” — Practitioner P12
A different ScienceDirect study, comparing safety practice “as required” versus “as observed,” concluded:
“For many different and complex reasons, ‘As Observed’ safety practice may not be equivalent to the safety practice ‘As Required’. Many of these interventions seem to have been largely ineffective, suggesting that they may not be addressing the real impediments to good safety engineering practice.” — ScienceDirect 2024
Translation: nobody quite agrees what good practice looks like, the people doing the work mostly don’t have access to the academic frontier, and the regulator’s expectations are clear in the abstract but interpreted differently in every project. New engineers entering the field can spend years assembling a personal map of what actually works, often by trial-and-error inside individual organisations, often without realising that the map they’re building is parochial to their employer.
This matters whether your project operates under SFAIRP, ALARP, ALARA, or any of the other risk-acceptance frameworks the major regulators use. The judgement of “is this risk well managed?” sits with the engineer, regardless of which legal test applies. But the engineer needs comparisons — to standards, to international practice, to known incidents, to what good actually looks like in an analogous project — to make that judgement defensibly. And those comparisons are often locked up in places they can’t easily reach.
4. The deadline at the end of the project
When good practice is unclear and the dataset has grown beyond comprehension, what fills the gap is the deadline. A hazard log isn’t maintained — it’s reconciled before an audit. A safety case isn’t argued — it’s assembled the week before a gate review. The work shifts from a continuous, judgement-led discipline to an end-of-project compliance push.
The Nimrod Review is the canonical case study of where this leads. Charles Haddon-Cave QC’s 2009 report into the loss of XV230 is required reading. The language he uses is unsparing:
“The Nimrod Safety Case was a lamentable job from start to finish. It was riddled with errors. It missed the key dangers.”
“There has been a yawning gap between the appearance and reality of safety.”
“The task of drawing up the Safety Case became essentially a paperwork and ‘tick-box’ exercise.”
“BAE Systems left 40% of the hazards ‘Open’ and 30% ‘Unclassified’.”
“The safety case was virtually worthless as a safety tool.”
“The MoD simply accepted BAE Systems’ work with little review or challenge and failed to ask intelligent questions.” — Risktec summary of Haddon-Cave Review
The Nimrod safety case had become “a documentary rather than analytical exercise” — produced by people who described the work as essentially archaeological, locating historical design data rather than performing fresh analysis. The hazard log was not a tool that helped them manage safety. It was a thing to be fixed, in time for sign-off, by a team who already privately believed the aircraft was “safe anyway.”
Australia’s rail regulator says the same thing in 2020s language:
“Risk assessment as an administrative task or hurdle rather than as a process to support or guide their decision-making… risk assessment may overlook certain risks which may result in mitigating controls not being introduced.” — ONRSR Safety Message
This is the failure mode that should keep every safety engineer awake. The hazard log wasn’t designed to be a thing to fix. It was designed to be the document that helps you see clearly the risks of what you are building. When it becomes a thing to fix, the work that the document was supposed to support — thinking carefully about what could go wrong — has already happened (or not) somewhere else.
The deeper diagnosis from Haddon-Cave’s report is uncomfortable: the people producing the Nimrod safety case were not unsophisticated. They were a major prime contractor and a major systems engineering organisation. What failed was the process and the tooling. When the tooling makes continuous engagement painful, continuous engagement does not happen, and the result is a “yawning gap between the appearance and reality of safety.”
5. The split between technical depth and lifecycle judgement
The fifth problem is the one that’s hardest to articulate publicly because it touches the workforce directly. The safety engineering profession has a wide gradient of experience. Some engineers are extraordinary at the technical depth of a particular issue — they know the failure modes of a particular sensor, the conditions under which a control software state machine will get into an unsafe state, or the human factors edge cases of a particular interface. They’ve earned that knowledge over a career.
But the lifecycle expectations of a safety case — what evidence to gather at concept versus PDR versus CDR versus TRR, what level of independence the verification needs, what the regulator will actually expect to see at SAR — are a different domain. They’re often poorly published. The mentors who carried that knowledge are increasingly retired. The 2025 NSPE engineering survey found 59% of engineering professionals are over 55, with succession planning now a critical industry concern. Aerospace and defence are losing institutional knowledge faster than they can backfill:
“The skill gap could cost a median aerospace and defense company approximately $300 million to $330 million per year in lost productivity. The retirement age and attrition rate in this sector is nearly 10% higher than the national industry average.” — McKinsey, The talent gap in aerospace and defense
The Grenfell Tower Inquiry’s Phase 2 report named the consequence in stark terms:
“The fire engineer involved was insufficiently qualified, lacked the requisite technical expertise to determine whether fire safety had been achieved, and failed to produce a final fire strategy.”
“Evidence indicating that the fire engineers considered their role as getting through building control rather than ensuring the safety of the public.” — CMS Law-Now summary
A safety engineer can be deeply expert at the part of safety they were trained on and still be unsupported in the lifecycle judgements that frame the case. That isn’t a moral failing. It’s an industry that hasn’t given its people the scaffolding they need.
What this looks like on a real project: an engineer who can identify five failure modes of a track circuit in conversation, but who isn’t sure what evidence the regulator will expect at the next gate, or how to argue that the residual risk is ALARP, or how to structure the safety case argument so that a reviewer can follow it. The technical knowledge is there. The lifecycle scaffolding is the gap.
This is the place where AI assistance is most useful — and most easily misused. A tool that gives the engineer reasonable knowledge at the point of decision, without taking the decision, can close the gap. A tool that “writes your hazard analysis for you” doesn’t close the gap. It papers over it.
How SafeForge Addresses It
SafeForge is built on a single principle that shapes our architecture, our roadmap, our user experience, and how we talk about our product:
AI empowers the decision-maker. AI does not make the decision.
We call this the SafeForge AI Dogma. It isn’t aspirational. It’s the rule we apply when we decide whether to build a feature, how to scope it, and what its UX must look like.
What the dogma means in practice
In SafeForge:
- AI never changes your data autonomously. Every AI-generated suggestion is a proposal. A human reads it, edits it or rejects it, and any accepted change is committed through our Change Request workflow. Nothing is silently applied.
- AI never writes the final word. When AI drafts a hazard description, a control suggestion, or a rejection justification, the output goes into an editable field marked as unconfirmed. If you accept it, it becomes your decision — credited to you and traceable in the audit log.
- AI does not run without consent. Organisation admins grant AI access explicitly. Users can see the prompts, see the model, see the credit cost. Nothing happens quietly in the background.
- AI-assisted and human-decided items carry the same accountability in the audit trail. Once a safety engineer confirms an AI suggestion, it carries their name and their accountability, because that is what happened.
We call this pattern the Confirmation Pattern. AI proposes. Human disposes. The audit log records both.
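To make the Confirmation Pattern concrete, here is a minimal sketch of it as a data contract, written as if SafeForge had a TypeScript service layer. The type and field names are illustrative assumptions, not SafeForge's actual API. The essential property is that a suggestion has no path into your data except through an engineer's explicit commit.

```typescript
// Illustrative sketch of the Confirmation Pattern; all names are
// hypothetical, not SafeForge's actual API.
type AiSuggestion = {
  id: string;
  field: string;           // e.g. "hazard.description"
  draftText: string;       // the AI's proposal; never applied directly
  model: string;           // which model produced it
  prompt: string;          // logged for traceability
  status: "unconfirmed";   // a suggestion is born unconfirmed
};

type CommittedChange = {
  suggestionId: string | null; // null when the edit was purely manual
  finalText: string;           // what the engineer actually saved
  committedBy: string;         // the engineer: accountability sits here
  changeRequestId: string;     // every commit flows through a CR
};

// The only path from suggestion to data: an engineer reads, edits, commits.
function commit(
  s: AiSuggestion,
  editedText: string,
  engineer: string,
  crId: string
): CommittedChange {
  return {
    suggestionId: s.id,
    finalText: editedText, // may differ from s.draftText after editing
    committedBy: engineer,
    changeRequestId: crId,
  };
}
```

The design point is that there is no `apply(suggestion)` call anywhere: the AI's draft and the committed change are different types, and only a human-invoked commit converts one into the other.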
Reasonable knowledge, not replaced judgement
Looking at the five problems above, a useful question is: what would help an engineer working under those conditions, without taking their judgement away?
The answer is something we keep returning to: reasonable knowledge at the point of decision.
Not a tool that answers the question for you. A tool that lays out the relevant context — the standard, the precedent, the gap, the risk profile, the linked entities — so that you, the engineer, can apply your judgement faster and with more confidence. The AI’s role is to surface what’s already there, structure what is implicit, and draft a starting point that you sharpen with your professional knowledge.
This is what AI should be in a safety-critical setting:
- A reviewer who reads your hazard log overnight and surfaces the gaps in control coverage at 8am
- An assistant who drafts a control description from your title, that you then edit to match what you actually mean
- A pattern-spotter who notices that a hazard you raised today looks similar to two others elsewhere in the project, in case you want to link them
- An organiser who reminds you what evidence is typically expected at the next lifecycle gate
- A drafter who proposes a rejection justification when you reject a change request, that you sharpen and sign
In every case, the AI’s contribution is to give you a head start. The decision is still yours. The accountability sits where it always sat: with the engineer.
Why this matters for the trust gap
If the trust gap shows up as quiet over-reliance on tools the user privately distrusts, the fix is to make the tool’s role explicit. Every AI-generated suggestion in SafeForge is visually distinct, marked as unconfirmed, and requires an active commit from the user. You can’t accidentally accept an AI suggestion. You have to read it, decide, and act.
This is slower than full automation, deliberately. The slowness is the point. We do not want a workflow where a busy engineer rubber-stamps AI output at 5pm on a Friday because the deadline is tomorrow. We want a workflow where the engineer’s attention is where the decision is.
Why this matters for complexity overload
When the dataset has grown past the practical comprehension of any individual, the role of AI is to act as a navigation aid, not a substitute. SafeForge’s AI features are scoped to “show me the gaps in my hazard log,” “draft me a starting point for this control description,” “suggest causes that look related to this hazard.” None of them produce final-form artefacts that bypass review.
The bow-tie editor and the structured data model do most of the comprehension work. The AI sits on top, helping you scan faster — but you are still the one looking.
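As an illustration of what a “show me the gaps” scan involves, here is a hypothetical sketch over a simplified hazard-log shape. The data types are assumptions for the example, not SafeForge's schema.

```typescript
// Simplified, hypothetical hazard-log shapes; not SafeForge's schema.
type Control = {
  id: string;
  kind: "preventative" | "mitigative";
  requirementIds: string[];
};
type Hazard = { id: string; title: string; controlIds: string[] };

// Surface hazards with no preventative coverage and controls with no
// linked requirements. The output is a report for the engineer to judge,
// never an automatic change to the log.
function coverageGaps(hazards: Hazard[], controls: Map<string, Control>) {
  const noPrevention = hazards.filter(
    h => !h.controlIds.some(id => controls.get(id)?.kind === "preventative")
  );
  const unevidenced = [...controls.values()].filter(
    c => c.requirementIds.length === 0
  );
  return { noPrevention, unevidenced };
}
```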
Why this matters for the deadline trap
The hazard log shouldn’t be a thing to fix at the end of a project. It should be the tool you use to manage safety throughout the project. SafeForge is designed for the during, not the just-before-the-review.
That’s why every change is small, attributable, and audit-trailed. Why the bow-tie is on screen as you edit, not buried in an export. Why the AI features are about ongoing maintenance — coverage gaps, link suggestions, lifecycle reminders — not about generating bulk content the night before sign-off. We want your hazard log to be the artefact that captures your real, continuous engagement — and the safety case argument that builds on top of it to be the natural product of work you’ve already done, not a thing assembled in a panic from whatever data was lying around.
Why this matters for the obscurity of good practice
If “what does good look like?” is locked up in unpublished safety cases inside primes and consultancies, the engineer’s job is to reconstruct it from the fragments they can reach — standards, incident databases, conversations with senior colleagues, the back issues of The Chemical Engineer. AI is uniquely well suited to this kind of retrieval and comparison work. It does not need to make the judgement to be enormously useful at informing it.
Inside SafeForge, this shows up across several features:
- Risk workshop assistance. When you run a structured risk identification workshop inside the app, the AI surfaces analogous hazards from your project history and from canonical reference data, so the workshop builds on top of what you already know rather than restarting from scratch each time.
- AI chat scoped to your project. You can ask the project conversational questions — “what controls address this threat?”, “what hazards depend on this assumption?”, “show me hazards with no preventative coverage” — and get answers grounded in your data rather than in a generic model.
- Top hazard summaries. For each hazard you mark as a top hazard, AI Assist drafts a Risk Summary, suggests comparable international incidents, and identifies the standards typically referenced for that hazard class. The engineer reviews, edits, and signs.
- Lifecycle warnings. As your project moves through gates and your data evolves, AI flags the changes that affect open hazards — a closed control whose hazard reopens, an assumption that has been invalidated, a control whose linked requirements have been modified, a top hazard that hasn’t been reviewed in 90 days. (A sketch of these checks follows below.)
The judgement of whether your risk meets ALARP, SFAIRP, ALARA, or any other test the regulator applies is yours. The standards, the international comparisons, the incident parallels, and the lifecycle warnings that inform that judgement are SafeForge’s job. We bring you what good looks like. You decide whether you’ve achieved it.
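To make the lifecycle warnings above concrete, here is a minimal sketch of the kind of rule they encode, assuming a simplified top-hazard record. The field names and thresholds are illustrative, not SafeForge's implementation.

```typescript
// Hypothetical top-hazard record for the example; not SafeForge's model.
type TopHazard = {
  id: string;
  lastReviewed: Date;
  invalidatedAssumptions: number;  // supporting assumptions that no longer hold
  openPreventativeControls: number;
};

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

// Each rule produces a warning for the engineer to act on; none of them
// changes the hazard's state automatically.
function lifecycleWarnings(h: TopHazard, now: Date): string[] {
  const warnings: string[] = [];
  if (now.getTime() - h.lastReviewed.getTime() > NINETY_DAYS_MS) {
    warnings.push(`Top hazard ${h.id} has not been reviewed in 90 days`);
  }
  if (h.invalidatedAssumptions > 0) {
    warnings.push(`Top hazard ${h.id} relies on an invalidated assumption`);
  }
  if (h.openPreventativeControls === 0) {
    warnings.push(`Top hazard ${h.id} has no open preventative controls`);
  }
  return warnings;
}
```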
Why this matters for the depth-vs-lifecycle gap
When an engineer is deeply expert in a technical area but newer to lifecycle assurance, AI is most useful as a guide to the lifecycle scaffolding. Inside SafeForge, the Gold Book panel surfaces lifecycle gate guidance — what’s expected at PHA, SHA, PDR, CDR, TRR, SAR — referenced against the major standards (DEF STAN 00-56, EN 50126, ARP 4761, MIL-STD-882E, RISSB, IEC 61508). The AI Assist features can flag when a control’s evidence base looks weak relative to the gate, or when a requirement is unverified, or when an assumption that supports several hazards has gone stale.
This is what “reasonable knowledge” looks like in practice. The engineer keeps the technical depth they earned. SafeForge fills in the lifecycle scaffolding so they’re not building it from scratch on every project.
What the dogma rules out
The dogma also tells you what we won’t build:
- AI does not approve change requests. That needs a human reviewer, every time.
- AI does not mark hazards closed. That’s an explicit human action.
- AI does not change data without a CR. Every edit goes through the same two-person review your manual edits do.
- AI does not act on data it hasn’t been given access to. You grant AI access at the organisation level, and you can revoke it at any time.
- We will not market AI as “hazard analysis automation” or “safety case automation.” It isn’t. It’s hazard analysis assistance — a tool that helps the engineer build the structured evidence base their wider safety case argument will draw on.
Why the regulators agree
The regulators have already drawn this line. The tools just haven’t always followed.
EASA’s Artificial Intelligence Concept Paper (Issue 2, March 2024) categorises AI assistance for safety-critical tasks into levels. SafeForge sits at Level 1B — assistance to the human, with the human retaining authority for every decision. Levels 2 (collaborative decision-making) and 3 (autonomous AI) introduce certification burdens that no mainstream hazard tool has cleared, and that we don’t believe should be cleared for safety case work.
UK MOD JSP 936 (Dependable AI in Defence, v1.1, November 2024) requires that AI outputs be traceable to the human decision they support, and that AI inference not be the sole basis for a safety-critical claim. The Confirmation Pattern is a direct response.
UK CAA CAP3064 (Response to Emerging AI-Enabled Automation, v1.3.2, November 2024) demands that AI tools preserve the engineer’s “meaningful human control” — the regulator’s term for what we call the Confirmation Pattern.
UK HSE’s 2024 Regulatory Position on AI is blunter still: organisations using AI in safety-related roles are responsible for demonstrating that the AI did not replace competent human judgement.
ISO/IEC TR 5469:2024 (functional safety and AI) and ISO/IEC 42001:2023 (AI management systems) both require clear demarcation between AI assistance and AI autonomy — exactly the boundary SafeForge enforces.
Legal analysis from Morgan Lewis (February 2026) makes the same point from a liability angle: if an AI tool makes a safety decision and something goes wrong, liability flows to whoever deployed it. The Confirmation Pattern keeps the accountability with the engineer — where the standard expects it.
What AI actually does in SafeForge
Concretely:
| Task | AI’s role | Your role |
|---|---|---|
| Column mapping on import | Proposes which spreadsheet columns map to which SafeForge fields | Confirm, edit, or reject the mapping |
| Hazard description drafting | Drafts a first-cut description from a short title | Edit to match your engineering judgement before saving |
| Bow-tie pre-populate | Suggests causes and controls for a new hazard | Accept the ones that fit, delete the rest |
| Control coverage analysis | Flags hazards with no preventative controls, controls with no linked requirements | Decide whether each gap is real or acceptable |
| Rejection draft | Drafts a justification when you reject a change request | Review, edit, and sign as your justification |
| Requirement adequacy check | Reviews whether a control’s linked requirements fully cover the control’s intent | Decide if the coverage is sufficient |
| Lifecycle gate guidance | Surfaces what’s typically expected at the upcoming gate, referenced to standards | Decide what’s relevant to your project context |
In every case, AI is the assistant. You are the engineer.
User Guide: Working With SafeForge AI
1. Enabling AI for your organisation
Only org admins can enable AI. Go to Settings → AI Configuration and read the consent text carefully. When you enable AI, the consent timestamp and the admin who consented are recorded, AI credits (pooled at the org level) become available, and all members of the org can trigger AI assist actions within the credit budget. If you disable AI, in-progress operations complete their current turn and stop. Nothing runs in the background without your active use.
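For illustration, the consent record described above might look something like the following sketch. The shape is an assumption for the example, not SafeForge's actual configuration schema.

```typescript
// Hypothetical org-level AI consent record; not SafeForge's actual schema.
type AiConsent = {
  enabled: boolean;
  consentedBy: string;  // the admin who enabled AI
  consentedAt: string;  // ISO 8601 timestamp recorded at consent
  creditBudget: number; // pooled credits available to the whole organisation
};

// AI actions start only under an active consent and within the budget;
// revoking consent stops new actions after the current turn completes.
function mayStartAiAction(consent: AiConsent, creditsUsed: number): boolean {
  return consent.enabled && creditsUsed < consent.creditBudget;
}
```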
2. AI column mapping on import
Upload a hazard log and SafeForge’s AI reads the first rows of each sheet and proposes a column mapping. You see the mapping in the wizard with a confidence indicator per column, the original column name alongside the SafeForge field, and an option to override any mapping. You confirm or edit before any data is imported. The AI is not deciding what gets imported — you are.
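As a sketch of what a mapping proposal carries, consider the following hypothetical shape; the field names are assumptions for illustration, not the import wizard's real data model.

```typescript
// Hypothetical column-mapping proposal; field names are illustrative.
type ColumnMapping = {
  sourceColumn: string;        // header as it appears in the uploaded sheet
  targetField: string | null;  // proposed SafeForge field, or null if unmatched
  confidence: number;          // 0..1, shown per column in the wizard
  overriddenBy: string | null; // set when the engineer changes the proposal
};

// Only mappings the engineer has explicitly confirmed feed the import.
function confirmedMappings(
  proposals: ColumnMapping[],
  confirmed: Set<string>
): ColumnMapping[] {
  return proposals.filter(m => confirmed.has(m.sourceColumn));
}
```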
3. Drafting with AI Assist
When you’re editing a hazard, you’ll see a small AI icon next to the description field. Click it to open an AI draft in the field, marked with a gold outline to indicate it’s unconfirmed. You read it, edit it to reflect your actual judgement, and save. The gold outline disappears and the change enters your CR. If you don’t save, the draft is discarded. AI never leaves content in your hazard log without your active commit.
4. Reviewing the audit trail
Every AI-assisted action is recorded with who triggered it, what model was used, how many credits it consumed, and the prompt and response. This isn’t AI transparency as a buzzword — it’s the evidence trail a regulator or your safety case auditor will want.
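To show what such an evidence trail can contain, here is a hypothetical audit record mirroring the fields listed above; the exact schema is an assumption for the example.

```typescript
// Hypothetical audit record for one AI-assisted action.
type AiAuditRecord = {
  triggeredBy: string;  // who clicked the AI assist action
  model: string;        // which model served the request
  creditsUsed: number;  // cost against the org's pooled credit budget
  prompt: string;       // exactly what was sent to the model
  response: string;     // exactly what came back
  timestamp: string;    // ISO 8601, when the action ran
};
```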
Our Six Foundational Principles
- AI empowers the decision-maker. AI does not make the decision.
- AI’s contribution is explicit. You see when AI helped, and when it didn’t.
- AI operates under consent. Organisations opt in. Users don’t surprise anyone.
- AI runs at Level 1B assistance. Not collaborative. Not autonomous.
- AI output is traceable. Prompt, model, credit cost — all logged.
- AI does not bypass safety process. Change Requests, two-person review, audit trails — all apply equally to AI-assisted changes.
Further Reading
Regulatory and standards positions on AI
- EASA Artificial Intelligence Concept Paper (Issue 2, March 2024) — The regulatory framework for AI in aviation safety, including the Level 1A / 1B / 2 / 3 categorisation of AI assistance.
- UK MOD JSP 936 — Dependable AI in Defence (v1.1, November 2024) — UK MoD’s policy on AI in safety-critical defence applications.
- UK CAA CAP3064 — Response to Emerging AI-Enabled Automation (v1.3.2, November 2024) — CAA guidance on meaningful human control over AI.
- ISO/IEC TR 5469:2024 — Functional safety and AI systems — Technical report on combining functional safety standards with AI.
- ISO/IEC 42001:2023 — AI management systems — First international standard for AI management.
- UK HSE — Regulatory position on AI — Health and Safety Executive’s stance on AI in safety-related roles.
- Safety Critical Systems Club — SCSC-153C — Safety Assurance Objectives for Autonomous Systems — UK practitioner-led guidance on AI / autonomy assurance.
AI trust research
- Stack Overflow Developer Survey 2025 — Developers remain willing but reluctant to use AI — The 29% trust finding and the year-on-year decline in trust.
- ShiftMag — 84% of developers use AI, yet most don’t trust it! — Useful summary of the Stack Overflow trust data.
- ITPro — Software developers not checking AI-generated code — Sonar’s “verification debt” report.
- KPMG / University of Melbourne — Global AI Trust Study 2025 — 46% global trust, 56% AI-related work mistakes.
- EASA Ethics for AI in Aviation Survey (2024/2025) — Aviation professionals’ 4.4/7 comfort rating with AI.
- ACM Computing Surveys — AI for Safety-Critical Industrial and Transportation Systems — Open challenges in reconciling AI with safety standards.
- Palo Alto Networks — Black Box AI — Background reading on the explainability problem.
Safety case practice and the “good practice” gap
- Ken Rivers — Safety Gaps: Good Practice is Still Not Common Practice (The Chemical Engineer, 2021) — On why painfully-learnt lessons fail to spread, and the role of accessible knowledge hubs.
- Trevor Kletz — Lessons from Disaster: How Organisations Have No Memory and Accidents Recur (IChemE, 1993) — The dean of process safety on why incidents of similar type recur within the same company every decade.
- EPSC — Learning lessons from major incidents (2022) — European Process Safety Centre on the organisational discipline gap.
- US Chemical Safety Board — Deepwater Horizon investigation — Source of the “eerie resemblance” finding between Deepwater Horizon and Texas City.
- Charles Haddon-Cave QC — The Nimrod Review (HC 1025, 2009) — The full report into the loss of XV230. The defining case study for the failure mode of safety-by-paperwork.
- Risktec — The Folly of ‘Paper Safety’ — Concise summary of Haddon-Cave’s findings and the SHAPED principles.
- ONRSR — Risk Assessments Undertaken As Administrative Tasks — Australian rail regulator’s 2020s framing of the same failure mode.
- Graydon (NASA, 2017) — The Safety Argumentation Schools of Thought — Nine schools, six inconsistencies.
- arXiv (2024) — How Practitioners Gain Confidence in Assurance Cases — Interview-based study of how engineers actually evaluate safety arguments.
- ScienceDirect (2024) — Understanding safety engineering practice — Comparing practice “as desired, as required, and as observed.”
- AIChE — PHAs: Common Misses with Significant Risks (2025) — On hazard identification as the weakest link in risk assessment.
- The Safety Artisan — Hazard Logs and Hazard Tracking Systems (2024) — Practitioner perspective on 25 years of hazard log dysfunction.
- ASEMS Online — UK MoD Aerospace Safety Hazard Log Toolkit — Official MoD guidance.
- Kelly & McDermid — Safety Case Construction and Reuse using Patterns (1997) — Foundational work on reusable safety argument patterns and Goal Structuring Notation.
Workforce and skills
- McKinsey — The talent gap: value at stake for aerospace and defense — $300–330M/year per company productivity loss from skill gaps.
- NSPE — Insights on Industry Issues and Opportunities 2025 — 59% of engineers over 55, succession planning crisis.
- CMS Law-Now — Grenfell Tower Inquiry Final Report key findings — Summary of the fire engineer competence findings.
Risk acceptance frameworks (background)
- HSE — Reducing Risks, Protecting People (R2P2) (2001) — Canonical UK risk-tolerability framework, including the gross-disproportion principle for ALARP.
SafeForge is an intelligent risk management platform for safety-critical industries. We think carefully about every AI feature before we build it. If you care about the same things, try SafeForge or download our free hazard log template to see the data model our AI works with.
Ready to try SafeForge?
Start your intelligent hazard management workflow with a free SafeForge account.