What does human-in-the-loop mean for AI agents?

It means agents handle routine decisions automatically but pause and route critical decisions — those involving money, quality, or customer commitments — to a person for approval, with full context and a confidence score. People stay in control of what matters.

How do confidence scores work?

A confidence score combines factors such as similarity to past situations, the outcomes of those situations, data completeness, and how often humans have overridden similar recommendations. It tells you how much to rely on a given recommendation.
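As a rough illustration, the four factors can be blended into a single number with a weighted average. This is a minimal sketch, not Techseria's scoring method: the `ConfidenceFactors` type, the `confidence_score` function, and the weights are all hypothetical and would need calibrating against real outcomes.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ConfidenceFactors:
    similarity: float         # 0..1, similarity to past situations
    past_success_rate: float  # 0..1, share of similar past decisions that turned out well
    data_completeness: float  # 0..1, share of required inputs that were present
    override_rate: float      # 0..1, how often humans overrode similar recommendations


# Hypothetical weights, chosen only for illustration. They sum to 1.0.
WEIGHTS = {
    "similarity": 0.3,
    "past_success_rate": 0.3,
    "data_completeness": 0.2,
    "human_agreement": 0.2,
}


def confidence_score(factors: ConfidenceFactors) -> float:
    """Return a 0-100 score as a weighted average of the four factors."""
    values = asdict(factors)
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Frequent human overrides should lower confidence, so invert the rate.
    values["human_agreement"] = 1.0 - values.pop("override_rate")
    score = sum(WEIGHTS[name] * values[name] for name in WEIGHTS)
    return round(100.0 * score, 1)
```

A weighted average is only one possible choice. Its advantage is that each factor's contribution stays easy to explain to a reviewer.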

Is it safe to let AI agents act autonomously in a factory?

With graduated autonomy, yes. You set thresholds for what can auto-execute, start cautious, expand only as confidence proves out, and retain a full audit trail — so autonomy is deliberate, reversible, and explainable rather than a black box.

AI and Data | June 14, 2026

Human-in-the-Loop AI in Manufacturing: Governance Without Killing Speed

Techseria Team

The appeal of AI agents in manufacturing is speed. A procurement agent that raises a Purchase Requisition within minutes of an MRP trigger. A quality agent that flags a process anomaly within seconds of a sensor reading. A maintenance agent that creates a work order before a bearing fails.

The concern about AI agents in manufacturing is also speed. What if the procurement agent raises a PO for the wrong quantity? What if the quality agent generates a material hold that stops a production line unnecessarily? What if the maintenance work order is wrong and technicians spend four hours on the wrong machine?

These concerns are legitimate. They are not reasons to avoid AI agents. They are requirements for how AI agents must be designed.

Human-in-the-Loop (HITL) architecture is the engineering answer to the governance problem. Properly implemented, it gives you the speed and consistency of autonomous AI action where the decision is clear-cut, and human judgment exactly where it is needed — without creating a system where every action requires approval and the speed advantage disappears.

The Real Tension in Manufacturing AI

Manufacturing has three characteristics that make HITL design non-negotiable:

Regulatory accountability. ISO 9001:2015 requires documented evidence of quality decisions — who made them, when, on what basis. If an AI agent creates a quality hold, the hold decision must be auditable. If a non-conformance is dispositioned as "use as is", that decision must have a human name attached to it. An AI system that acts without leaving an audit trail is not ISO 9001 compliant, regardless of whether its decisions are correct.

Financial consequence. A production hold on a high-volume line costs thousands of pounds per hour. An incorrectly routed PO for a long-lead-time component can delay a product launch. An unwarranted maintenance stop pulls technician resource and disrupts production. AI agents that act autonomously on high-value decisions without human review create financial risk that is difficult to quantify and harder to recover from.

Safety implications. In some manufacturing environments — food, pharmaceutical, aerospace — quality and process decisions have direct safety consequences. An AI agent that releases material from hold without appropriate review is not a process improvement. It is a liability.

HITL architecture addresses all three. The design question is not whether to include human oversight but where to include it, at what threshold, and how to make the oversight process efficient enough that it does not become a bottleneck.

The Confidence Threshold Architecture

Techseria's HITL design uses a three-zone confidence threshold model applied to every AI agent decision:

Zone 1: Autonomous action (confidence above 85%)

The agent acts without human approval. The action is logged with timestamp, rationale, and decision inputs. A human can review the log at any time but does not need to actively approve the decision.

Examples:

Procurement agent auto-creates a Purchase Requisition for a catalogued item from a preferred supplier within auto-approve value limits
Inventory agent updates safety stock levels within a pre-defined adjustment range
Production planning agent reschedules a work order within a 24-hour window to resolve a sequencing conflict

Zone 2: Escalation required (confidence 60–85%)

The agent makes a provisional decision and routes it to the appropriate human reviewer for confirmation before the action takes effect. The reviewer sees the agent's recommended action, the confidence score, and the factors driving the uncertainty. The reviewer approves, modifies, or rejects. The agent executes the approved version.

Examples:

Quality agent flags a process parameter as potentially out of control but the anomaly is ambiguous (trending toward control limit, not clearly outside it)
Procurement agent identifies a preferred supplier for a non-standard item but the item-supplier mapping is inferred, not explicit
Maintenance agent detects a sensor anomaly but the sensor has flagged false positives before (tracked in its history)

Zone 3: Human review mandatory (confidence below 60%)

The agent does not act. It creates an exception in the human review queue with its analysis of the situation, the data it observed, and the options it considered. A human makes the decision from scratch, using the agent's analysis as context. The human's decision is logged, and the agent's model is updated to improve future confidence in similar situations.

Examples:

Quality agent detects a pattern it has not seen before in the process history
Procurement agent encounters a requisition for an item with no supplier history and no comparable pricing data
Maintenance agent observes sensor readings that do not match any known failure mode pattern

The threshold values (85% and 60%) are starting points. They are calibrated per agent and per decision type during the parallel run phase of implementation. Some decision types warrant tighter thresholds: quality holds in a pharmaceutical environment might require 95%+ confidence for autonomous action. Others can operate with looser thresholds where the cost of delay exceeds the cost of a false positive.
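The three-zone model reduces to a small routing function. The sketch below is illustrative only: the `Zone`, `Thresholds`, and `route` names and the `pharma_quality_hold` decision type are hypothetical, and it assumes that a score of exactly 85% or exactly 60% falls into Zone 2, since Zone 1 is described as "above 85%" and Zone 3 as "below 60%".

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    AUTONOMOUS = "Zone 1: autonomous action"
    ESCALATE = "Zone 2: escalation required"
    MANDATORY_REVIEW = "Zone 3: human review mandatory"


@dataclass(frozen=True)
class Thresholds:
    autonomous_above: float = 85.0  # starting point from the article
    review_below: float = 60.0      # starting point from the article


# Hypothetical per-decision-type calibration, e.g. a stricter pharma quality hold.
THRESHOLDS_BY_DECISION = {
    "pharma_quality_hold": Thresholds(autonomous_above=95.0),
}


def route(confidence: float, decision_type: str = "default") -> Zone:
    """Map a 0-100 confidence score to one of the three zones."""
    if not 0.0 <= confidence <= 100.0:
        raise ValueError(f"confidence must be between 0 and 100, got {confidence}")
    t = THRESHOLDS_BY_DECISION.get(decision_type, Thresholds())
    if confidence > t.autonomous_above:
        return Zone.AUTONOMOUS
    if confidence < t.review_below:
        return Zone.MANDATORY_REVIEW
    # Boundary values (exactly 85 or 60 by default) land here by assumption.
    return Zone.ESCALATE
```

Keeping the thresholds in data rather than in code means calibration during the parallel run changes configuration, not logic.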

Exception Queue Design: Where Humans Actually Work

The exception queue is where Zone 2 and Zone 3 decisions surface for human action. Its design determines whether HITL is genuinely useful or whether it becomes a bottleneck that managers stop trusting.

Techseria implements exception queues in two channels, depending on client preference and existing tooling:

ERPNext portal: A dedicated view in the ERPNext interface showing all pending agent exceptions, sorted by urgency and value. Each exception card shows: the agent's recommended action, the confidence score, the triggering data, and the available responses (approve / modify / reject). The reviewer clicks to action; the agent executes the approved version within 60 seconds.

Microsoft Teams integration: For operations where managers are rarely at a desktop, a Teams bot delivers exception notifications with action buttons directly in the Teams mobile app. One-tap approve/reject with an optional comment. Responses sync back to ERPNext and trigger agent execution.

Both channels maintain the same audit trail. The exception record stores: original agent recommendation, confidence score, reviewer name, review timestamp, final decision, and any modification made to the agent's recommendation.
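One way to picture the exception record is as an immutable structure that a review completes exactly once. This is a sketch under assumptions: the `ExceptionRecord` type, its field names, and `record_review` are hypothetical and do not describe an ERPNext DocType or Techseria's schema.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional

ALLOWED_DECISIONS = {"approved", "modified", "rejected"}


@dataclass(frozen=True)
class ExceptionRecord:
    agent_recommendation: str
    confidence: float
    reviewer: Optional[str] = None
    reviewed_at: Optional[datetime] = None
    final_decision: Optional[str] = None
    modification: Optional[str] = None


def record_review(record: ExceptionRecord, reviewer: str, decision: str,
                  modification: Optional[str] = None,
                  now: Optional[datetime] = None) -> ExceptionRecord:
    """Return a reviewed copy; the original recommendation is never altered."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    if record.final_decision is not None:
        raise ValueError("exception has already been reviewed")
    if decision == "modified" and not modification:
        raise ValueError("a modified decision must describe the modification")
    return replace(
        record,
        reviewer=reviewer,
        reviewed_at=now or datetime.now(timezone.utc),
        final_decision=decision,
        modification=modification,
    )
```

Because the record is frozen, the agent's original recommendation and the reviewer's final decision sit side by side, which is what an auditor needs to compare.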

Exception queue design principles:

Urgency stratification — a quality hold on a running production line sits at the top of the queue with a countdown showing estimated cost of delay. A safety stock adjustment surfaces at the bottom with a 48-hour review window. Reviewers should know immediately which exceptions demand attention now.

Batch-able exceptions — for high-frequency, similar decisions (e.g., multiple inventory reorder approvals), the queue groups them with a bulk-approve function. A manager can approve 12 similar safety stock adjustments in 30 seconds.

Timeout handling — if a Zone 2 exception is not reviewed within the configured timeout period, it escalates automatically to a senior reviewer. If the escalation is also not actioned, the agent falls back to a safe default (typically: do nothing and flag for priority review). Systems do not stall; they escalate.
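Urgency ordering and timeout handling can be sketched in a few lines. The names below (`QueueItem`, `prioritise`, `next_step`) are hypothetical, urgency is approximated by an estimated cost of delay per hour, and the sketch assumes the senior reviewer gets the same timeout window as the first reviewer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class QueueItem:
    title: str
    cost_of_delay_per_hour: float  # hypothetical urgency measure, in pounds
    raised_at: datetime
    timeout: timedelta             # configured review window for this item


def prioritise(items: List[QueueItem]) -> List[QueueItem]:
    """Most expensive delay first, so reviewers see urgent exceptions on top."""
    return sorted(items, key=lambda item: item.cost_of_delay_per_hour, reverse=True)


def next_step(item: QueueItem, now: datetime) -> str:
    """Decide what happens to an unreviewed exception at time `now`."""
    waited = now - item.raised_at
    if waited < item.timeout:
        return "await_reviewer"
    if waited < 2 * item.timeout:
        # First window expired: hand over to a senior reviewer.
        return "escalate_to_senior"
    # Escalation also expired: do nothing and flag for priority review.
    return "safe_default"
```

A scheduler would call `next_step` periodically for every open exception, so nothing in the queue waits indefinitely.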

Specific Scenarios: How HITL Plays Out in Practice

Scenario 1: Procurement — auto-approve vs. escalate

A Purchase Requisition is triggered for a 500-unit order of a standard fastener from Supplier A (preferred, 94% lead time compliance score) at £1,800 total value, within the £2,000 auto-approve threshold.

Agent confidence: 97%. Action: autonomous. Requisition created and converted to PO. Log entry: "Auto-approved: preferred supplier, within threshold, lead time compliance 94%, no open disputes."

The next day, a requisition arrives for 200 units of a custom machined bracket from a new supplier at £3,400 total value. This is above the £2,000 threshold, and the supplier has no ERPNext performance history.

Agent confidence: 51%. Action: Zone 3 exception. Queue entry: "New supplier, no performance history, value above auto-approve threshold. Option considered: approve with 30-day payment terms and QC hold on first delivery. Requires buyer review."
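The two requisitions above can be replayed against a simple set of auto-approve checks. This sketch assumes the hard business rules (value limit, preferred supplier, performance history) are tested alongside the confidence score, which the scenario implies but does not spell out. The `Requisition` type and `auto_approve_blockers` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

AUTO_APPROVE_LIMIT_GBP = 2000.0  # threshold used in the scenario
AUTONOMOUS_ABOVE = 85.0          # Zone 1 starting threshold


@dataclass(frozen=True)
class Requisition:
    item: str
    total_value_gbp: float
    preferred_supplier: bool
    has_performance_history: bool
    confidence: float


def auto_approve_blockers(req: Requisition) -> List[str]:
    """Return the reasons a requisition cannot be auto-approved (empty if none)."""
    blockers = []
    if req.total_value_gbp > AUTO_APPROVE_LIMIT_GBP:
        blockers.append("value above auto-approve threshold")
    if not req.preferred_supplier:
        blockers.append("supplier is not preferred")
    if not req.has_performance_history:
        blockers.append("no performance history")
    if req.confidence <= AUTONOMOUS_ABOVE:
        blockers.append("confidence not above 85%")
    return blockers


fastener = Requisition("standard fastener", 1800.0, True, True, 97.0)
bracket = Requisition("custom machined bracket", 3400.0, False, False, 51.0)

for req in (fastener, bracket):
    blockers = auto_approve_blockers(req)
    status = "auto-approve" if not blockers else "route to buyer: " + ", ".join(blockers)
    print(f"{req.item}: {status}")
```

Returning the list of blockers, rather than a bare yes or no, gives the queue entry its explanation for free.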

Scenario 2: Quality — hold vs. flag

The quality agent detects a temperature reading 3.1 sigma above the mean on a curing oven. In historical data, this condition has led to confirmed defects 78% of the time when sustained for more than 2 minutes.

Sustained for 2.5 minutes. Agent confidence: 89%. Action: autonomous quality hold on current batch. Quality Inspection record created. Batch status set to "On Hold" in ERPNext. Shift supervisor notified. Log entry: "Autonomous hold: temperature 3.1σ above mean, 2.5 minutes sustained, historical defect correlation 78%, confidence 89%."

Shift supervisor receives the notification, reviews the oven readings on the dashboard, confirms the hold is warranted. Adds a comment: "Oven B heating element likely — maintenance notified." This comment becomes part of the Quality Inspection record.

Scenario 3: Maintenance — planned vs. immediate

Maintenance agent detects vibration anomaly on Motor 7B consistent with early-stage bearing inner race defect. Estimated time to failure: 14–21 days based on degradation rate.

Agent confidence: 81%. Action: Zone 2 escalation. Queue entry: "Early bearing defect signature on Motor 7B. Recommended action: schedule bearing replacement in next planned maintenance window (3 days). Parts required: [specific part number]. Confidence 81% — please confirm scheduling."

Maintenance manager reviews, approves the scheduling recommendation, adjusts the planned window by one day to align with an already-planned maintenance shift. Agent updates ERPNext Maintenance Schedule accordingly.

Audit Trail: ISO 9001 Compliance Built In

ISO 9001:2015 Clause 7.1.6 (Organisational knowledge) and Clause 8.5.1 (Control of production and service provision) require that quality-relevant decisions be documented and traceable. Clause 8.7 (Control of nonconforming outputs) requires documented evidence of the authority and accountability for disposition decisions on held material.

Techseria's HITL architecture generates a complete audit trail for every agent decision — autonomous or human-reviewed — stored in ERPNext with:

Timestamp of the triggering event
Triggering data (sensor reading, stock level, supplier score, etc.)
Agent confidence score and the factors contributing to it
Decision made (autonomous or reviewed)
If reviewed: reviewer name, review timestamp, original agent recommendation, final decision, any modification
Downstream action taken (record created, field updated, notification sent)
Outcome tracking (whether the decision proved correct, assessed 30 days post-decision)

The audit trail is queryable in ERPNext and can be exported for ISO 9001 surveillance audits. Quality managers find that AI-generated audit trails are more complete and consistent than manually maintained paper records — because the system logs every decision automatically, without relying on an inspector to fill in a form.
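To make the field list concrete, here is a minimal sketch of an audit entry built as a plain dictionary and exported as JSON Lines. The function names, field names, and the example timestamp are hypothetical. How the data is actually stored in ERPNext is not described in this article.

```python
import json
from typing import Any, Dict, List, Optional


def audit_entry(*, triggered_at: str, trigger_data: Dict[str, Any],
                confidence: float, factors: List[str], mode: str,
                downstream_actions: List[str],
                review: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one audit record; `mode` is 'autonomous' or 'reviewed'."""
    if mode not in ("autonomous", "reviewed"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "reviewed" and review is None:
        raise ValueError("a reviewed decision needs reviewer details")
    return {
        "triggered_at": triggered_at,
        "trigger_data": trigger_data,
        "confidence": confidence,
        "confidence_factors": factors,
        "mode": mode,
        "review": review,
        "downstream_actions": downstream_actions,
        "outcome": None,  # filled in later by outcome tracking
    }


def export_jsonl(entries: List[Dict[str, Any]]) -> str:
    """Serialise entries one per line, with stable key order for diffing."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in entries)


# Scenario 2 from above, with an illustrative timestamp.
oven_hold = audit_entry(
    triggered_at="2026-06-14T09:30:00Z",
    trigger_data={"sigma_above_mean": 3.1, "sustained_minutes": 2.5},
    confidence=89.0,
    factors=["historical defect correlation 78%"],
    mode="autonomous",
    downstream_actions=["Quality Inspection record created",
                        "Batch status set to On Hold",
                        "Shift supervisor notified"],
)
```

An append-only, line-per-decision export is easy to hand to an auditor and easy to filter, for example by `mode` or by date range.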

Rollback capability — for any automated action, a rollback function is available from the audit trail. If a quality hold was created incorrectly, a quality engineer can reverse it with one click from the hold record, with a mandatory comment explaining the reversal. The reversal is logged alongside the original hold.
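The rollback rule, a reversal that requires a comment and is logged next to the original hold, can be sketched as follows. The `Hold` type, the `reverse_hold` function, and the "Released" status are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hold:
    batch: str
    status: str = "On Hold"
    history: List[Dict[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The original hold is the first entry in the record's history.
        if not self.history:
            self.history.append({"event": "hold_created", "by": "quality agent"})


def reverse_hold(hold: Hold, engineer: str, comment: str) -> None:
    """Reverse a hold; a non-empty comment explaining the reversal is mandatory."""
    if hold.status != "On Hold":
        raise ValueError(f"batch {hold.batch} is not on hold")
    if not comment.strip():
        raise ValueError("a comment explaining the reversal is required")
    # Append rather than overwrite, so the original hold stays in the trail.
    hold.history.append(
        {"event": "hold_reversed", "by": engineer, "comment": comment.strip()}
    )
    hold.status = "Released"
```

Appending the reversal instead of deleting the hold keeps both decisions visible, which matters more to an auditor than a clean-looking record.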

How Techseria Designs HITL From Day One

HITL is not a feature added to a manufacturing AI system after deployment. It is an architectural decision that must be made before the first line of code is written. The confidence threshold model, the exception queue design, the audit trail schema, and the rollback mechanism all affect the fundamental structure of the LangGraph.js state machine.

In Techseria's engagement model, the HITL design workshop happens in week 1 of every manufacturing AI implementation. The outputs of that workshop:

Decision inventory — a complete list of every decision the agent will make, with the decision type, affected parties, financial value range, and regulatory context.
Threshold map — for each decision type, the initial confidence thresholds for autonomous action, escalation, and mandatory review.
Queue design specification — who sees which exceptions, in which channel, with what timeout.
Audit trail requirements — what fields must be captured per decision type to satisfy ISO 9001 and any sector-specific regulatory requirements.

This workshop output is the governance specification. It is signed off by the Operations Director, Quality Director, and IT representative before development begins. It is also the document that answers the audit question "how do you ensure AI decisions are accountable?" with a specific, documented, tested answer.
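A governance specification like this lends itself to a machine-checkable form. The sketch below is one hypothetical shape for it, not Techseria's actual template: `DecisionSpec`, `spec_problems`, and the field names are assumptions, while the three sign-off roles come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Iterable, List

REQUIRED_SIGNOFFS = {"Operations Director", "Quality Director", "IT representative"}


@dataclass(frozen=True)
class DecisionSpec:
    decision_type: str
    autonomous_above: float  # Zone 1 threshold, in percent
    review_below: float      # Zone 3 threshold, in percent
    reviewer_role: str
    channel: str             # e.g. "ERPNext portal" or "Microsoft Teams"
    timeout_hours: float


def spec_problems(specs: Iterable[DecisionSpec], signoffs: Iterable[str]) -> List[str]:
    """List everything that blocks development from starting (empty if none)."""
    problems = []
    for spec in specs:
        if not 0 <= spec.review_below < spec.autonomous_above <= 100:
            problems.append(f"{spec.decision_type}: thresholds out of order")
        if spec.timeout_hours <= 0:
            problems.append(f"{spec.decision_type}: timeout must be positive")
    for role in sorted(REQUIRED_SIGNOFFS - set(signoffs)):
        problems.append(f"missing sign-off: {role}")
    return problems
```

Running a check like this in the build pipeline would stop an agent from being deployed against a threshold map that was never signed off.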

The Governance-Speed Balance in Practice

The question operations teams ask most frequently: does all this governance slow the system down?

The answer depends on design quality. A well-designed HITL system with appropriate auto-approve thresholds means that 60–75% of agent decisions require no human action at all. They execute autonomously, log to the audit trail, and are visible for review if anyone wants to check. The remaining 25–40% surface in the exception queue with clear urgency stratification.
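Whether a deployment actually sits in that 60–75% band can be checked directly from the audit trail. A minimal sketch, assuming each logged decision carries a mode of either "autonomous" or "reviewed" (a hypothetical field name):

```python
from collections import Counter
from typing import Iterable


def autonomy_rate(modes: Iterable[str]) -> float:
    """Percentage of logged decisions that executed without human action."""
    counts = Counter(modes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no decisions logged")
    return 100.0 * counts["autonomous"] / total


def within_expected_band(rate: float, low: float = 60.0, high: float = 75.0) -> bool:
    """True if the rate falls inside the 60-75% range quoted above."""
    return low <= rate <= high
```

A rate well below the band suggests thresholds are too strict and the queue is becoming a bottleneck. A rate well above it is a prompt to check that risky decisions are not slipping through unreviewed.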

For the procurement agent, 62% of purchase events in a typical implementation are fully autonomous. A buyer spends 20–30 minutes per morning reviewing and clearing the exception queue. The rest of their time is on supplier relationship management, strategic sourcing, and the complex decisions that genuinely require judgment.

For the quality agent, shift supervisors spend approximately 15 minutes per shift reviewing the exception queue. Before AI, they spent 2–3 hours per shift on routine inspection tasks that are now handled autonomously with better consistency than manual inspection achieved.

Speed is preserved where clarity is high. Governance is applied where uncertainty warrants it. The result is a manufacturing operation that is simultaneously faster and more accountable than the manual alternative.

Talk to Techseria about governance-first AI design. Our HITL framework is included in every manufacturing AI engagement — there is no additional cost for audit trail capability, exception queue design, or ISO 9001 documentation support. Book a scoping call to discuss your specific regulatory context and governance requirements.
