Human-in-the-Loop AI: How to Automate Without Losing Control
The fear that stops mid-market businesses from deploying AI agents is rarely capability — it's control. If the agent makes the wrong call on a £150,000 purchase order or auto-approves a fraudulent invoice, who is accountable? How do you prove to auditors what the system decided and why?
The answer is human-in-the-loop (HITL) design — not as a workaround for AI limitations, but as a deliberate architectural choice that defines precisely where human judgment is required and where automation is safe. Done well, HITL lets you achieve automation rates above 90% while maintaining auditable oversight on every decision that warrants it.
This guide covers LangGraph.js's interrupt() mechanism with TypeScript implementation, the four core HITL patterns, compliance implications under GDPR and SOC 2, and how to calibrate automation levels for different process risks.
Why Full Automation Is the Wrong Goal
The instinct in most automation projects is to maximise the percentage of cases handled without human intervention. This is the wrong optimisation target.
The right target is: automate every decision that can be made reliably without human judgment; escalate every decision that cannot, with the context the human needs to decide quickly and accurately.
This reframing matters because:
- Some decisions have asymmetric consequences. An incorrectly auto-approved £5,000 invoice is recoverable. An incorrectly auto-approved £500,000 payment or an incorrectly signed supplier contract may not be.
- Trust in the system depends on predictable escalation. Stakeholders accept high automation rates when they trust that exceptions will reach them. When agents surprise users by handling something they expected to review, confidence erodes — even if the agent got it right.
- Regulatory frameworks require human oversight for certain decision categories. GDPR Article 22 restricts decisions based solely on automated processing that produce legal or similarly significant effects on individuals. SOC 2 Type II auditors typically expect documented human approval for material financial decisions.
LangGraph.js interrupt(): The Mechanism
LangGraph.js implements HITL through the `interrupt()` function, which pauses graph execution at a defined node, persists the current state, and waits for an external input to resume.
Here is a complete implementation for an invoice approval interrupt:
import { Annotation, StateGraph, interrupt } from "@langchain/langgraph";
// Define the workflow state
When `interrupt()` is called, LangGraph.js:
- Persists the full graph state to the configured checkpointer (PostgreSQL, Redis, or Azure Cosmos DB)
- Returns control to the calling application with the interrupt payload
- Waits — the graph does not time out; it can wait hours or days for a human response
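- Resumes when the application calls `graph.invoke(new Command({ resume: { approved: true, notes: "..." } }), { configurable: { thread_id: threadId } })`. The interrupted node re-runs from its start, and this time `interrupt()` returns the resume value instead of pausing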
This means human approvers can respond via email, Slack, a custom web UI, or any channel — the graph simply waits until the response is provided.
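The pause-and-resume cycle described above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes `@langchain/langgraph` is installed and uses the in-memory `MemorySaver` checkpointer for brevity, where a production deployment would use a durable one. The state fields, the `approvalGate` node name, the `HumanDecision` shape, and the sample invoice values are all hypothetical.

```typescript
import {
  Annotation,
  Command,
  END,
  MemorySaver,
  START,
  StateGraph,
  interrupt,
} from "@langchain/langgraph";

// Hypothetical workflow state for a single invoice.
const InvoiceState = Annotation.Root({
  invoiceId: Annotation<string>,
  amount: Annotation<number>,
  approved: Annotation<boolean>,
  notes: Annotation<string>,
});

// Assumed shape of the value the human reviewer sends back.
interface HumanDecision {
  approved: boolean;
  notes: string;
}

// On first execution the node pauses at interrupt(). On resume the node
// re-runs from its start and interrupt() returns the reviewer's decision.
const approvalGate = (state: typeof InvoiceState.State) => {
  const decision: HumanDecision = interrupt({
    invoiceId: state.invoiceId,
    amount: state.amount,
    question: "Approve this invoice?",
  });
  return { approved: decision.approved, notes: decision.notes };
};

const graph = new StateGraph(InvoiceState)
  .addNode("approvalGate", approvalGate)
  .addEdge(START, "approvalGate")
  .addEdge("approvalGate", END)
  .compile({ checkpointer: new MemorySaver() }); // interrupts require a checkpointer

async function runApprovalDemo(threadId: string) {
  const config = { configurable: { thread_id: threadId } };

  // The first call runs until interrupt() and then returns control to the caller.
  await graph.invoke({ invoiceId: "INV-001", amount: 15000 }, config);

  // Later (minutes or days), resume the same thread with the human's answer.
  return graph.invoke(
    new Command({ resume: { approved: true, notes: "Checked against PO" } }),
    config,
  );
}
```

Because the node re-runs from its start on resume, any side effects placed before the `interrupt()` call would execute twice, so it is usually safer to keep such work in a separate upstream node.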
The Four Core HITL Patterns
Pattern 1: Approval Gate
When to use: High-value or high-consequence decisions where auto-approval is appropriate below a threshold but explicit authorisation is required above it.
Architecture: The agent completes all automated processing (data extraction, validation, matching), then routes to an approval gate node. The gate evaluates configurable criteria (amount threshold, risk classification, supplier status) and either auto-approves or interrupts for human decision.
Real example: AP invoice processing. Invoices under £10,000 from verified suppliers with a clean 3-way match: auto-approved. All others: human approval required within 24 hours (SLA enforced by the agent, which sends escalation reminders after 12 hours of no response).
Client result: 94% of invoices auto-approved; 6% escalated. Finance team processes escalations in an average of 47 minutes compared to 3+ hours previously, because each escalation arrives with complete context — invoice image, extracted data, match discrepancy detail, and a draft approval recommendation.
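The SLA behaviour in the example above (a 24-hour deadline with a reminder after 12 hours of silence) could be sketched as a small pure function that a scheduler calls periodically. The type names, status labels, and function name are hypothetical; only the two durations come from the example.

```typescript
type EscalationStatus = "resolved" | "pending" | "reminder_due" | "sla_breached";

interface PendingEscalation {
  createdAt: Date;
  respondedAt?: Date; // set once the approver has decided
}

const HOUR_MS = 60 * 60 * 1000;
const REMINDER_AFTER_HOURS = 12; // from the example above
const SLA_HOURS = 24; // from the example above

// Classifies an escalation so a scheduler can decide whether to nudge the approver.
function escalationStatus(e: PendingEscalation, now: Date): EscalationStatus {
  if (e.respondedAt) return "resolved";
  const waitedHours = (now.getTime() - e.createdAt.getTime()) / HOUR_MS;
  if (waitedHours >= SLA_HOURS) return "sla_breached";
  if (waitedHours >= REMINDER_AFTER_HOURS) return "reminder_due";
  return "pending";
}

const created = new Date("2024-01-01T09:00:00Z");
// 13 hours after creation with no response
console.log(escalationStatus({ createdAt: created }, new Date("2024-01-01T22:00:00Z")));
// → reminder_due
```

Keeping the clock as an explicit `now` argument makes the rule easy to test and keeps timing policy out of the graph nodes themselves.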
Pattern 2: Exception Escalation
When to use: Processes where the agent should handle the expected range of cases and escalate only when confidence falls below threshold or when a case falls outside normal parameters.
Architecture: Each processing node returns both a result and a confidence score. A routing node evaluates confidence against configurable thresholds per decision type. Below-threshold cases interrupt to a human reviewer with the agent's best-effort analysis and the specific reason for low confidence.
TypeScript routing node:
const confidenceRouter = async (state: WorkflowState) => {
  const { extractionConfidence, classificationConfidence } = state;
  const EXTRACTION_THRESHOLD = 0.85;
Real example: Customer enquiry classification. The agent classifies incoming support tickets by type and urgency. Tickets where classification confidence exceeds 80% are auto-routed. Below 80%, the ticket is escalated with the agent's analysis and confidence breakdown — the human decides the classification, and this feedback can be used to improve the model over time.
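A self-contained sketch of the exception-escalation routing logic is shown below. It is an assumption-laden illustration rather than the original node: the `WorkflowState` shape, the route names, and the `routeByConfidence` function are hypothetical, the 0.85 extraction threshold comes from the snippet above, and the 0.80 classification threshold is borrowed from the ticket example.

```typescript
interface WorkflowState {
  extractionConfidence: number; // 0..1
  classificationConfidence: number; // 0..1
}

interface RoutingDecision {
  route: "auto_process" | "human_review";
  reasons: string[]; // included in the interrupt payload for the reviewer
}

const EXTRACTION_THRESHOLD = 0.85;
const CLASSIFICATION_THRESHOLD = 0.8; // assumed, based on the ticket example

// Escalates when any score is below its threshold and records why.
function routeByConfidence(state: WorkflowState): RoutingDecision {
  const reasons: string[] = [];
  if (state.extractionConfidence < EXTRACTION_THRESHOLD) {
    reasons.push(
      `extraction confidence ${state.extractionConfidence.toFixed(2)} is below ${EXTRACTION_THRESHOLD}`,
    );
  }
  if (state.classificationConfidence < CLASSIFICATION_THRESHOLD) {
    reasons.push(
      `classification confidence ${state.classificationConfidence.toFixed(2)} is below ${CLASSIFICATION_THRESHOLD}`,
    );
  }
  return { route: reasons.length === 0 ? "auto_process" : "human_review", reasons };
}

console.log(routeByConfidence({ extractionConfidence: 0.92, classificationConfidence: 0.74 }));
```

Returning the reasons alongside the route means the human reviewer sees exactly which score triggered the escalation, which is what makes the escalation quick to resolve.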
Pattern 3: Audit Checkpoint
When to use: Regulatory compliance scenarios where a human must review and sign off on automated decisions, even when the automation is reliable.
Architecture: The agent completes its full reasoning and prepares an action recommendation. Before execution, an audit checkpoint interrupt requires a named human to confirm. The confirmation is logged with timestamp, user ID, and the state at the time of confirmation — creating an immutable audit record.
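When this is required: Under GDPR Article 22, decisions based solely on automated processing that produce "legal or similarly significant effects" on individuals are permitted only under specific exceptions (explicit consent, contractual necessity, or authorisation in law), and where consent or contract is relied on the individual retains the right to obtain human intervention. For financial services firms, FCA guidelines on automated decision-making require documented oversight. Audit checkpoints provide this without eliminating the automation efficiency.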
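One way to approximate the audit record described above is a hash-chained log, sketched here under stated assumptions. The `AuditEntry` shape and the function names are hypothetical, and a hash chain makes tampering detectable rather than making the record truly immutable; write-once storage would still be needed for that.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  userId: string;
  decision: "confirmed" | "rejected";
  timestamp: string; // ISO 8601
  stateSnapshot: string; // JSON of the workflow state at confirmation time
  previousHash: string; // hash of the prior entry, "" for the first entry
  hash: string;
}

// Hashes every field except the hash itself.
function hashEntry(e: Omit<AuditEntry, "hash">): string {
  const payload = JSON.stringify([e.previousHash, e.userId, e.decision, e.timestamp, e.stateSnapshot]);
  return createHash("sha256").update(payload).digest("hex");
}

// Returns a new log with the confirmation appended and linked to the previous entry.
function appendEntry(
  log: readonly AuditEntry[],
  userId: string,
  decision: AuditEntry["decision"],
  state: unknown,
  now: Date,
): AuditEntry[] {
  const base = {
    userId,
    decision,
    timestamp: now.toISOString(),
    stateSnapshot: JSON.stringify(state),
    previousHash: log.length > 0 ? log[log.length - 1].hash : "",
  };
  return [...log, { ...base, hash: hashEntry(base) }];
}

// Recomputes every hash and link; false means some entry was altered or reordered.
function verifyLog(log: readonly AuditEntry[]): boolean {
  return log.every((entry, i) => {
    const { hash, ...rest } = entry;
    const expectedPrevious = i === 0 ? "" : log[i - 1].hash;
    return rest.previousHash === expectedPrevious && hashEntry(rest) === hash;
  });
}
```

A typical use would be to call `appendEntry` inside the node that runs after the checkpoint resumes, and to run `verifyLog` as part of periodic compliance checks.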
Pattern 4: Confidence-Threshold Routing
When to use: Multi-path workflows where different confidence levels warrant different levels of human involvement.
Architecture: Rather than binary auto/human routing, this pattern defines three tiers:
- High confidence (>90%): Fully automated, logged for audit
- Medium confidence (70–90%): Agent executes but flags for post-hoc human review (next-day review queue, not a blocker)
- Low confidence (<70%): Pre-execution interrupt, human decides
This tiered approach achieves higher automation rates than binary escalation while maintaining appropriate oversight at each tier.
Real example: Contract review for a professional services firm. Standard contracts from known client types at established rates: auto-processed. Contracts with non-standard clauses flagged by the agent: queued for legal review within 24 hours. Contracts with novel structures the agent can't assess reliably: immediate interrupt to senior legal reviewer.
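The three tiers above can be expressed as a single classification function. This is a sketch: the tier labels, the `TierThresholds` type, and the choice to treat the 70% and 90% boundaries as part of the medium tier are assumptions consistent with the ranges listed.

```typescript
type Tier = "auto_execute" | "execute_then_review" | "interrupt_before_execute";

interface TierThresholds {
  high: number; // strictly above this: fully automated
  low: number; // strictly below this: pre-execution interrupt
}

const DEFAULT_THRESHOLDS: TierThresholds = { high: 0.9, low: 0.7 };

// Maps a confidence score in [0, 1] to the level of human involvement.
function tierFor(confidence: number, t: TierThresholds = DEFAULT_THRESHOLDS): Tier {
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence must be between 0 and 1, got ${confidence}`);
  }
  if (confidence > t.high) return "auto_execute";
  if (confidence >= t.low) return "execute_then_review";
  return "interrupt_before_execute";
}

console.log(tierFor(0.95)); // → auto_execute
console.log(tierFor(0.8)); // → execute_then_review
console.log(tierFor(0.4)); // → interrupt_before_execute
```

Rejecting out-of-range or non-numeric scores outright, rather than silently routing them, keeps a malformed confidence value from being mistaken for a high-confidence case.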
GDPR and SOC 2 Implications
GDPR Article 22: Automated Decision-Making
Article 22 gives individuals the right not to be subject to decisions "based solely on automated processing" where those decisions produce "legal or similarly significant effects." This applies most directly to:
- Credit decisions
- Employment decisions (automated screening, termination)
- Loan approvals
- Insurance pricing
- Fraud assessments leading to account suspension
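For these categories, you need either a valid exception for solely automated decision-making (explicit consent, contractual necessity, or authorisation in law) or a human in the loop who "meaningfully reviews" the automated recommendation before a decision is made. The audit checkpoint pattern is designed to meet the second condition, provided the review is genuine.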
What "meaningfully reviews" means in practice: A human who rubber-stamps every automated recommendation does not provide meaningful review. The human must have access to the agent's reasoning, the data it used, and the ability to override, and must have sufficient time and context to reach an independent judgment. The agent's interrupt payload should provide all of this.
SOC 2 Type II Controls
For technology businesses pursuing a SOC 2 Type II report, automated financial approval workflows are typically examined under the Processing Integrity criteria, with changes to the automation rules falling under change management. Auditors will want:
- Evidence that material financial decisions have human approval (approval gate pattern)
- An audit log of all automated decisions with the data used (LangSmith tracing or equivalent)
- Evidence that the automation criteria have been reviewed and approved (documented thresholds)
- A process for changing the automation criteria (version-controlled configuration)
LangGraph.js's built-in state persistence and LangSmith integration provide the logging foundation. The approval gate pattern provides the human oversight evidence. Together, they satisfy the typical SOC 2 auditor's requirements for AI-assisted financial automation.
Calibrating Automation Levels by Process Risk
Not all processes warrant the same level of human oversight. The right framework for calibration:
| Risk Category | Process Examples | Recommended Pattern | Typical Auto Rate |
|---|---|---|---|
| Very High (material financial, regulatory) | Payments >£50k, contract signing, payroll changes | Audit Checkpoint: always human pre-approval | 0% auto-execute, 100% human-confirmed |
| High (operational, reversible within 24h) | Invoice approval £10k–£50k, supplier onboarding | Approval Gate: auto-approve below threshold, escalate above | 60–80% |
| Medium (routine operational) | Invoice processing <£10k, CRM updates, order routing | Exception Escalation: escalate low-confidence cases | 85–95% |
| Low (informational, easily reversed) | Report generation, email drafting for human send, data enrichment | Confidence-Threshold Routing | 95–99% |
These thresholds should be established collaboratively between operations, finance, compliance, and IT — not set arbitrarily by the development team.
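Once agreed, the calibration can live in version-controlled configuration and be used to flag drift. The sketch below encodes the four risk categories and their typical auto rates; the type names, the `RISK_POLICIES` constant, and the `autoRateDrift` helper are hypothetical.

```typescript
type RiskCategory = "veryHigh" | "high" | "medium" | "low";
type HitlPattern =
  | "auditCheckpoint"
  | "approvalGate"
  | "exceptionEscalation"
  | "confidenceThresholdRouting";

interface RiskPolicy {
  pattern: HitlPattern;
  expectedAutoRate: { min: number; max: number }; // fractions of cases, 0..1
}

// Values mirror the calibration framework above.
const RISK_POLICIES: Record<RiskCategory, RiskPolicy> = {
  veryHigh: { pattern: "auditCheckpoint", expectedAutoRate: { min: 0, max: 0 } },
  high: { pattern: "approvalGate", expectedAutoRate: { min: 0.6, max: 0.8 } },
  medium: { pattern: "exceptionEscalation", expectedAutoRate: { min: 0.85, max: 0.95 } },
  low: { pattern: "confidenceThresholdRouting", expectedAutoRate: { min: 0.95, max: 0.99 } },
};

// Compares an observed auto-approval rate with the expected band for a category.
function autoRateDrift(category: RiskCategory, observedRate: number): "below" | "within" | "above" {
  const { min, max } = RISK_POLICIES[category].expectedAutoRate;
  if (observedRate < min) return "below";
  if (observedRate > max) return "above";
  return "within";
}

console.log(autoRateDrift("medium", 0.942)); // → within
```

An "above" result for a high-risk category is the more worrying signal, since it suggests cases are bypassing the oversight that stakeholders agreed to.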
A Real Deployment: AP Automation with 94% Auto-Approval
A Techseria client — UK manufacturing, 800 invoices/month — set the following HITL parameters after a calibration workshop:
Auto-approval criteria (all must be met):
- Invoice amount ≤ £10,000
- Supplier in verified supplier list (approved in ERPNext)
- 3-way match passed (PO quantity ±2%, price ±0.5%)
- No duplicate invoice number detected
- Payment terms match contracted terms
Escalation triggers (any one triggers interrupt):
- Amount > £10,000 (routes to finance manager, 4-hour SLA)
- 3-way match discrepancy (routes to AP team lead with discrepancy detail)
- New supplier not yet verified (routes to procurement for supplier verification)
- Duplicate invoice number detected (routes to AP team with comparison)
- Extraction confidence < 85% on any field (routes to AP team for manual verification)
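The criteria and triggers above could be evaluated by a function along these lines. This is a simplified sketch, not the client's implementation: the `InvoiceCheck` fields and route labels are hypothetical, the match compares the invoice against the PO only (the goods-receipt leg of the 3-way match is omitted), and routing a payment-terms mismatch to the AP team is an assumption because no route is listed for it.

```typescript
interface InvoiceCheck {
  amountGbp: number;
  supplierVerified: boolean;
  poQuantity: number;
  invoicedQuantity: number;
  poUnitPrice: number;
  invoicedUnitPrice: number;
  duplicateNumberDetected: boolean;
  paymentTermsMatch: boolean;
  minFieldConfidence: number; // lowest extraction confidence across fields, 0..1
}

interface EscalationTrigger {
  reason: string;
  routeTo: "finance_manager" | "ap_team_lead" | "procurement" | "ap_team";
}

// Relative difference between actual and expected, as a fraction of expected.
const relativeDiff = (actual: number, expected: number): number =>
  expected === 0 ? (actual === 0 ? 0 : Infinity) : Math.abs(actual - expected) / Math.abs(expected);

// Returns every trigger that fired; an empty array means the invoice can be auto-approved.
function evaluateInvoice(inv: InvoiceCheck): EscalationTrigger[] {
  const triggers: EscalationTrigger[] = [];
  if (inv.amountGbp > 10000) {
    triggers.push({ reason: "amount above £10,000", routeTo: "finance_manager" });
  }
  const quantityOk = relativeDiff(inv.invoicedQuantity, inv.poQuantity) <= 0.02; // ±2%
  const priceOk = relativeDiff(inv.invoicedUnitPrice, inv.poUnitPrice) <= 0.005; // ±0.5%
  if (!quantityOk || !priceOk) {
    triggers.push({ reason: "3-way match discrepancy", routeTo: "ap_team_lead" });
  }
  if (!inv.supplierVerified) {
    triggers.push({ reason: "supplier not yet verified", routeTo: "procurement" });
  }
  if (inv.duplicateNumberDetected) {
    triggers.push({ reason: "duplicate invoice number", routeTo: "ap_team" });
  }
  if (inv.minFieldConfidence < 0.85) {
    triggers.push({ reason: "extraction confidence below 85%", routeTo: "ap_team" });
  }
  if (!inv.paymentTermsMatch) {
    // Assumed route: the source lists this as a criterion but names no escalation path.
    triggers.push({ reason: "payment terms differ from contract", routeTo: "ap_team" });
  }
  return triggers;
}
```

Collecting all triggers rather than stopping at the first gives the reviewer the full picture in one escalation instead of a series of partial ones.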
Results after 6 months:
- Auto-approval rate: 94.2% of invoices
- Average escalation resolution time: 47 minutes (vs 3.2 hours previously, because context is pre-packaged)
- False positive escalations (items escalated that turned out to be auto-approvable): 1.3% — well within acceptable parameters
- Fraudulent invoices caught by duplicate detection: 2 in 6 months (both would have been paid under the previous manual process)
Implementation Checklist
Before deploying any HITL-enabled agent in production:
- [ ] Auto-approval criteria defined, documented, and signed off by operations and finance
- [ ] Escalation paths defined: who receives which type of escalation, at what SLA
- [ ] Interrupt payload designed to give human reviewers everything they need without requiring them to navigate to source systems
- [ ] Audit logging configured: every auto-approval logged with state snapshot; every human approval logged with user ID, timestamp, and decision context
- [ ] GDPR assessment completed: are any decisions in scope for Article 22?
- [ ] Escalation reminder logic configured: what happens if a required human response doesn't arrive within SLA?
- [ ] Override and correction process defined: when an auto-approved decision is found to be wrong, what is the remediation process and how is it logged?
- [ ] Review cadence defined: auto-approval thresholds should be reviewed at least quarterly based on exception patterns
The Control You Actually Need
Full automation is not the goal. Appropriate automation — with human judgment deployed precisely where it adds value and oversight where it's required — is the goal. LangGraph.js's interrupt() mechanism makes this achievable with production-grade reliability and the audit trail that compliance teams and external auditors require.
[Book a Strategy Session with Techseria](/contact) — we'll design the HITL architecture for your specific workflows, including the calibration workshop that sets your automation thresholds with stakeholder buy-in. Fixed-fee delivery, production-ready in 8–14 weeks.


