Predictive Maintenance with AI Agents: Stop Fixing Machines After They Break
Every maintenance manager understands the failure spectrum intuitively. Reactive maintenance is expensive, disruptive, and embarrassing — the machine breaks, production stops, everyone scrambles. Preventive maintenance is better: you schedule work before things break. But preventive maintenance has its own problem, one that rarely gets discussed openly.
It is wasteful. And it still fails.
The Problem with Preventive Maintenance
Time-based preventive maintenance — change the oil every 1,000 hours, replace the bearing every 90 days, inspect the drive belt every 500 cycles — is an improvement over reactive maintenance. But it is built on an assumption that is almost never true: that all machines of the same type fail at the same rate, regardless of load, environment, quality of lubrication, operator behaviour, or production intensity.
Figures widely cited in maintenance engineering suggest that around 30% of preventive maintenance work is performed unnecessarily early, on components that still had significant useful life remaining. That work is pure cost: technician time, replacement parts, and the production downtime of a planned maintenance window.
Simultaneously, preventive maintenance (PM) schedules miss a category of failures entirely: condition-based failures that do not follow the calendar. A bearing that is running hot and vibrating outside normal parameters will fail whether or not the scheduled PM date has arrived. If the calendar says the next service is in six weeks, the bearing will not wait.
The result: manufacturers performing diligent PM programs still suffer unplanned downtime from condition-based failures. They also carry inflated maintenance costs from unnecessary scheduled work.
Predictive maintenance replaces both problems with a single approach: monitor the actual condition of the machine, intervene when the condition indicates a failure is approaching, and do nothing when the condition is healthy.
The Full Maintenance Spectrum
Understanding where your facility sits on the maintenance maturity curve helps identify the right investment level:
Reactive (Run-to-Failure) — no planned maintenance. Machines run until they break. Appropriate only for non-critical equipment with very low replacement cost and no production impact. Estimated 30–40% of manufacturers are primarily reactive.
Preventive (Time-Based) — scheduled maintenance regardless of actual condition. Standard industry practice for critical equipment. Reduces catastrophic failure risk but generates unnecessary work and misses condition-based failures. Estimated 50–60% of manufacturers are primarily preventive.
Predictive (Condition-Based) — maintenance performed when sensor data indicates a fault is developing. Eliminates unnecessary PM work and catches condition-based failures before they cause downtime. Fewer than 15% of mid-market manufacturers have mature predictive maintenance programs.
Prescriptive — AI not only predicts the failure but recommends the specific intervention, optimal timing, and spare parts needed. The leading edge of the capability, achievable once predictive programs are mature.
AI predictive maintenance moves facilities from preventive to predictive. The prescriptive layer can be added in a later phase once sufficient failure event data is accumulated.
How the AI Predictive Maintenance Agent Works
The architecture has four stages: sensor data collection, anomaly detection, maintenance agent decision, and ERPNext work order creation.
Stage 1: Sensor Data Collection
Three sensor types provide the signal for the majority of mechanical failure modes:
Vibration sensors — accelerometers mounted on bearings, gearboxes, and rotating shafts. Vibration signature changes precede bearing failure by days to weeks. Frequency-domain analysis (FFT) identifies specific fault types: inner race, outer race, ball, and cage defects each produce characteristic frequency signatures.
Temperature sensors — thermocouples or IR sensors on motors, bearings, and electrical cabinets. Temperature rise above baseline indicates friction increase (bearing degradation), electrical resistance increase (motor insulation breakdown), or cooling system failure.
Current draw sensors — clamp-on current transformers on motor power leads. Current signature analysis detects rotor bar breaks, load imbalance, and mechanical overload conditions that are invisible to vibration and temperature monitoring alone.
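The "characteristic frequency signatures" for vibration come from bearing kinematics. Given the number of rolling elements, the ball and pitch diameters, the contact angle and the shaft speed, the standard formulas give four frequencies: cage (FTF), outer race (BPFO), inner race (BPFI) and ball spin (BSF). The sketch below computes them and measures the spectral amplitude at one frequency with a direct single-bin DFT. It is a minimal illustration under stated assumptions:

- The bearing geometry and the signal are illustrative.
- The function names are hypothetical.
- A production pipeline would more likely run a windowed FFT, often on an envelope-demodulated signal.

```typescript
// Bearing geometry needed for the kinematic defect-frequency formulas.
interface BearingGeometry {
  rollingElements: number; // n: number of balls or rollers
  ballDiameter: number;    // d (same length unit as pitchDiameter)
  pitchDiameter: number;   // D
  contactAngleDeg: number; // phi; 0 for a deep-groove ball bearing under radial load
}

interface DefectFrequencies {
  ftf: number;  // fundamental train (cage) frequency
  bpfo: number; // ball pass frequency, outer race
  bpfi: number; // ball pass frequency, inner race
  bsf: number;  // ball spin frequency
}

function defectFrequencies(g: BearingGeometry, shaftHz: number): DefectFrequencies {
  const r = (g.ballDiameter / g.pitchDiameter) * Math.cos((g.contactAngleDeg * Math.PI) / 180);
  const half = shaftHz / 2;
  return {
    ftf: half * (1 - r),
    bpfo: g.rollingElements * half * (1 - r),
    bpfi: g.rollingElements * half * (1 + r),
    bsf: (g.pitchDiameter / (2 * g.ballDiameter)) * shaftHz * (1 - r * r),
  };
}

// Amplitude of one frequency component via a direct DFT evaluated at that frequency.
// Scaled so that a pure sinusoid of amplitude A returns approximately A.
function amplitudeAt(samples: number[], sampleRateHz: number, freqHz: number): number {
  let re = 0;
  let im = 0;
  for (let i = 0; i < samples.length; i++) {
    const phase = (2 * Math.PI * freqHz * i) / sampleRateHz;
    re += samples[i] * Math.cos(phase);
    im -= samples[i] * Math.sin(phase);
  }
  return (2 * Math.hypot(re, im)) / samples.length;
}

// Illustrative bearing (not from a real datasheet) on a 1,500 rpm shaft.
const bearing: BearingGeometry = { rollingElements: 9, ballDiameter: 7.94, pitchDiameter: 39.04, contactAngleDeg: 0 };
const shaftHz = 25;
const freqs = defectFrequencies(bearing, shaftHz);

// Synthetic 1 s capture at 10 kHz: 1x shaft tone plus a 0.5-amplitude outer-race tone.
const sampleRate = 10000;
const capture = Array.from({ length: sampleRate }, (_, i) =>
  Math.sin((2 * Math.PI * shaftHz * i) / sampleRate) +
  0.5 * Math.sin((2 * Math.PI * freqs.bpfo * i) / sampleRate));

console.log(freqs.bpfo.toFixed(1));                                   // 89.6
console.log(amplitudeAt(capture, sampleRate, freqs.bpfo).toFixed(2)); // close to 0.50
```

A rising amplitude at BPFO, or at its harmonics, relative to a healthy baseline is the kind of evidence later stages would treat as an outer-race signature. Some references quote BSF as twice the value computed here.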
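These sensors connect to Azure IoT Hub, which handles ingestion, timestamping, and routing to the analysis layer. IoT Hub scales to millions of messages per day, depending on tier and number of units. It manages device identity, connectivity, and security over MQTT, AMQP, and HTTPS. Industrial protocols such as OPC-UA or Modbus are typically bridged through Azure IoT Edge or a protocol gateway rather than translated by IoT Hub itself.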
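To illustrate what a device might send to IoT Hub, here is a hypothetical per-reading message shape with basic validation at the edge. The field names (`assetId`, `sensorType`, and so on) are assumptions; they are not a Techseria or Azure schema.

Two details matter for routing:

- Declaring the body as UTF-8 JSON is what allows IoT Hub message routing to query fields inside it.
- Copying `sensorType` into an application property lets a route query such as `sensorType = 'vibration'` send vibration data to its own endpoint.

```typescript
// Hypothetical per-reading message; field names are illustrative, not an Azure schema.
type SensorType = "vibration" | "temperature" | "current";

interface SensorReading {
  assetId: string;        // asset the sensor is mounted on (assumed to match the ERPNext Asset name)
  sensorType: SensorType;
  value: number;
  unit: string;
  timestamp: string;      // ISO 8601, stamped at the edge device
}

interface OutgoingMessage {
  body: string;
  contentType: string;
  contentEncoding: string;
  properties: Record<string, string>; // application properties, usable in routing queries
}

// Units accepted per sensor type (assumption; adjust to the installed hardware).
const EXPECTED_UNITS: Record<SensorType, string[]> = {
  vibration: ["mm/s", "g"],
  temperature: ["degC"],
  current: ["A"],
};

function buildMessage(r: SensorReading): OutgoingMessage {
  if (!Number.isFinite(r.value)) throw new Error(`non-numeric value from ${r.assetId}`);
  if (EXPECTED_UNITS[r.sensorType].indexOf(r.unit) === -1) {
    throw new Error(`unexpected unit ${r.unit} for ${r.sensorType}`);
  }
  if (Number.isNaN(Date.parse(r.timestamp))) throw new Error(`bad timestamp ${r.timestamp}`);
  return {
    body: JSON.stringify(r),
    // Declaring UTF-8 JSON is what lets IoT Hub routing queries look inside the body.
    contentType: "application/json",
    contentEncoding: "utf-8",
    properties: { sensorType: r.sensorType, assetId: r.assetId },
  };
}

const msg = buildMessage({
  assetId: "PUMP-07", sensorType: "vibration", value: 4.2, unit: "mm/s",
  timestamp: "2024-05-01T10:15:00Z",
});
console.log(msg.properties["sensorType"]); // vibration
```

With the Node.js device SDK (`azure-iot-device`), these fields would map onto a `Message` object's `contentType`, `contentEncoding`, and `properties` before calling the client's `sendEvent`. Check the SDK documentation for exact usage.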
Stage 2: Anomaly Detection
The agent uses two complementary anomaly detection approaches:
Azure Anomaly Detector (for well-behaved time series) — Microsoft's managed service applies statistical models to time-series data and returns anomaly scores. Effective for detecting sudden level shifts and trend changes in temperature and current data.
LSTM (Long Short-Term Memory) neural network (for complex multivariate patterns) — for equipment with multiple interacting parameters, a custom LSTM model trained on historical sensor data learns the normal operational pattern of the machine and flags deviations. This approach catches the subtle, multi-parameter patterns that precede bearing failure more accurately than univariate statistical methods.
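For the Azure Anomaly Detector path, the univariate API takes a series of timestamp/value points. Among other fields, it returns a per-point `isAnomaly` flag and an expected value. Microsoft does not publish the models behind the service. It has also announced that Azure AI Anomaly Detector will be retired on 1 October 2026, so confirm availability before building on it.

The stand-in below is a plain rolling z-score, a different and much simpler technique. It is included only to show the input/output shape and how a level shift is picked up. The window length and the `k` multiplier are illustrative.

```typescript
// Stand-in detector (rolling z-score); this is NOT the algorithm Azure Anomaly Detector uses.
interface Point { timestamp: string; value: number; }
interface Detection extends Point { expected: number; isAnomaly: boolean; }

// Flag a point when it lies more than k standard deviations from the mean of the
// previous windowLen points. Points without a full window of history are never flagged.
function rollingZScore(series: Point[], windowLen = 24, k = 3): Detection[] {
  return series.map((p, i) => {
    if (i < windowLen) return { ...p, expected: p.value, isAnomaly: false };
    const hist = series.slice(i - windowLen, i).map((q) => q.value);
    const mean = hist.reduce((a, b) => a + b, 0) / windowLen;
    const variance = hist.reduce((a, b) => a + (b - mean) ** 2, 0) / windowLen;
    const std = Math.sqrt(variance) || 1e-9; // guard against a perfectly flat history
    return { ...p, expected: mean, isAnomaly: Math.abs(p.value - mean) / std > k };
  });
}

// 48 hourly bearing-temperature readings (illustrative) that step up by 10 degC at hour 40.
const temps: Point[] = Array.from({ length: 48 }, (_, i) => ({
  timestamp: new Date(Date.UTC(2024, 0, 1, i)).toISOString(),
  value: i >= 40 ? 70 : 60 + (i % 2 === 0 ? 0.3 : -0.3),
}));
const flagged = rollingZScore(temps).filter((d) => d.isAnomaly).map((d) => d.timestamp);
console.log(flagged[0]); // 2024-01-02T16:00:00.000Z
```

Once a new level persists, the trailing window absorbs it. This detector therefore flags the onset of a shift rather than the whole shifted period, which is usually what a maintenance alert needs.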
The choice between Azure Anomaly Detector and LSTM is made per machine type based on data volume and failure mode complexity. Simple equipment with clear failure signatures uses the Azure managed service; complex, high-value equipment with multi-axis sensor arrays uses the custom LSTM approach.
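For the custom LSTM path, the network itself is out of scope for a short example. The step that turns its output into anomaly flags can still be sketched. One common pattern, assumed here and not confirmed as Techseria's method, works in three steps:

1. The network forecasts the next multivariate reading.
2. Each timestep is scored by its scaled prediction error.
3. The alert threshold is set at a high percentile of scores on held-out normal data.

`Forecaster` is a hypothetical interface standing in for the trained LSTM. The usage lines plug in a naive "repeat the last reading" forecaster just to exercise the scoring. An LSTM autoencoder would use reconstruction error in the same way.

```typescript
// Hypothetical stand-ins: a Vector is one timestep of readings, e.g.
// [vibration RMS, bearing temperature, motor current]; a Forecaster is the trained
// LSTM's "predict the next timestep from the recent timesteps" function.
type Vector = number[];
type Forecaster = (recent: Vector[]) => Vector;

// Scaled prediction error: each channel is divided by its typical residual size so
// that different units are comparable, then channels are combined (Euclidean norm).
function errorScore(actual: Vector, predicted: Vector, scale: Vector): number {
  let sum = 0;
  for (let c = 0; c < actual.length; c++) {
    const r = (actual[c] - predicted[c]) / scale[c];
    sum += r * r;
  }
  return Math.sqrt(sum);
}

// Score every timestep that has a full window of history behind it.
// scores[k] corresponds to data[k + windowLen].
function scoresFor(model: Forecaster, data: Vector[], windowLen: number, scale: Vector): number[] {
  const scores: number[] = [];
  for (let t = windowLen; t < data.length; t++) {
    scores.push(errorScore(data[t], model(data.slice(t - windowLen, t)), scale));
  }
  return scores;
}

// Alert threshold: the p-th percentile of scores on held-out normal data.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

// Usage with a naive persistence forecaster in place of the LSTM.
const persistence: Forecaster = (recent) => recent[recent.length - 1];
const normal: Vector[] = Array.from({ length: 200 }, (_, t) =>
  [1 + 0.01 * Math.sin(t), 50 + 0.1 * Math.sin(t / 3), 10 + 0.05 * Math.cos(t)]);
const faulty = normal.map((v, t) => (t === 150 ? [v[0], v[1] + 5, v[2]] : v)); // temperature jump
const scale: Vector = [0.01, 0.1, 0.05];
const threshold = percentile(scoresFor(persistence, normal, 5, scale), 99);
const faultyScores = scoresFor(persistence, faulty, 5, scale);
console.log(faultyScores[150 - 5] > threshold); // true
```

Scaling each channel by its typical residual keeps amps, degrees, and mm/s comparable. Without it, the channel with the largest raw units would dominate the score.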
Stage 3: The LangGraph.js Maintenance Agent
The maintenance agent receives anomaly scores from the detection layer and makes decisions through a state machine:
Anomaly Assessment Node — evaluates incoming anomaly scores against thresholds calibrated for each piece of equipment. Considers historical false positive rate for this equipment and sensor combination.
Failure Mode Classification Node — if an anomaly is confirmed, classifies the probable failure mode based on the sensor pattern. A vibration anomaly at bearing defect frequency with rising temperature suggests imminent bearing failure. A current spike with normal vibration and temperature suggests an electrical load issue.
Urgency Scoring Node — calculates a time-to-failure estimate based on the rate of change of the anomaly signature. Assigns urgency: immediate (within 24 hours), short-term (within 7 days), planned (within 30 days).
ERPNext Action Node — creates the appropriate ERPNext record based on urgency.
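The four nodes can be sketched as a small pipeline over a shared state object. In LangGraph.js, each function would become a node on a `StateGraph`, and the below-threshold exit would be a conditional edge. This sketch stays in plain TypeScript so it runs without the library.

Three parts are assumptions:

- The classification rules are just the two examples from the text.
- The time-to-failure estimate is a simple least-squares extrapolation of a severity score. The text says only "rate of change".
- All state field names are hypothetical.

```typescript
type Urgency = "immediate" | "short-term" | "planned";

interface AgentState {
  assetId: string;
  anomalyScore: number;   // latest score from the detection layer
  threshold: number;      // per-asset, per-sensor calibrated threshold
  vibrationAtBearingFreq: boolean;
  temperatureRising: boolean;
  currentSpike: boolean;
  // Recent severity readings; 1.0 is assumed to mean "failure threshold reached".
  severityHistory: { hoursAgo: number; severity: number }[];
  confirmed?: boolean;
  failureMode?: string;
  hoursToFailure?: number;
  urgency?: Urgency;
  action?: string;
}

// Anomaly Assessment Node
function assessAnomaly(s: AgentState): AgentState {
  return { ...s, confirmed: s.anomalyScore >= s.threshold };
}

// Failure Mode Classification Node: the two rules from the text; anything else is unclassified.
function classifyFailureMode(s: AgentState): AgentState {
  let failureMode = "unclassified anomaly";
  if (s.vibrationAtBearingFreq && s.temperatureRising) {
    failureMode = "bearing degradation";
  } else if (s.currentSpike && !s.vibrationAtBearingFreq && !s.temperatureRising) {
    failureMode = "electrical load issue";
  }
  return { ...s, failureMode };
}

// Least-squares slope of severity over time, extrapolated to severity 1.0.
function estimateHoursToFailure(history: { hoursAgo: number; severity: number }[]): number {
  if (history.length < 2) return Infinity;
  const ts = history.map((p) => -p.hoursAgo);
  const ys = history.map((p) => p.severity);
  const mt = ts.reduce((a, b) => a + b, 0) / ts.length;
  const my = ys.reduce((a, b) => a + b, 0) / ys.length;
  let cov = 0;
  let varT = 0;
  for (let i = 0; i < ts.length; i++) {
    cov += (ts[i] - mt) * (ys[i] - my);
    varT += (ts[i] - mt) ** 2;
  }
  const slope = varT > 0 ? cov / varT : 0; // severity units per hour
  if (slope <= 0) return Infinity;         // not worsening
  const nowSeverity = my + slope * (0 - mt); // fitted value at t = 0 (now)
  return Math.max(0, (1 - nowSeverity) / slope);
}

// Urgency Scoring Node: anything beyond 7 days is treated as "planned" in this sketch.
function scoreUrgency(s: AgentState): AgentState {
  const hoursToFailure = estimateHoursToFailure(s.severityHistory);
  const urgency: Urgency =
    hoursToFailure <= 24 ? "immediate" : hoursToFailure <= 24 * 7 ? "short-term" : "planned";
  return { ...s, hoursToFailure, urgency };
}

// ERPNext Action Node: decides which record to write; the API call itself is omitted.
function erpAction(s: AgentState): AgentState {
  const action = s.urgency === "planned" ? "update Maintenance Schedule" : "create Maintenance Visit";
  return { ...s, action };
}

function runAgent(input: AgentState): AgentState {
  const assessed = assessAnomaly(input);
  if (!assessed.confirmed) return { ...assessed, action: "none" }; // below threshold: stop here
  return erpAction(scoreUrgency(classifyFailureMode(assessed)));
}

const base: AgentState = {
  assetId: "GBX-02", anomalyScore: 0.9, threshold: 0.7,
  vibrationAtBearingFreq: true, temperatureRising: true, currentSpike: false,
  severityHistory: [
    { hoursAgo: 48, severity: 0.2 }, { hoursAgo: 24, severity: 0.5 }, { hoursAgo: 0, severity: 0.8 },
  ],
};
const outcome = runAgent(base);
console.log(outcome.failureMode, outcome.urgency, outcome.action);
// bearing degradation immediate create Maintenance Visit
```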
Stage 4: ERPNext Work Order Creation
The agent writes to ERPNext's maintenance module via API:
- Maintenance Visit (`/api/resource/Maintenance Visit`) for immediate and short-term urgency — creates an unscheduled maintenance task with the affected asset, failure mode description, recommended action, and triggering sensor data summary.
- Maintenance Schedule (`/api/resource/Maintenance Schedule`) update for planned urgency — reschedules the next planned maintenance event to the predicted optimal intervention window.
- Asset (`/api/resource/Asset`) — logs the anomaly event against the asset record for trend history.
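On the ERPNext side this is plain REST. Frappe exposes `POST /api/resource/<DocType>` with a JSON body and accepts `Authorization: token <api_key>:<api_secret>`. The sketch below only builds the request for a Maintenance Visit; it does not send it.

Check these assumptions against your instance:

- Stock ERPNext's Maintenance Visit is a customer-service doctype that expects a `customer`. Internal-equipment deployments often customise it or use Assets-module doctypes such as Asset Maintenance Log.
- The `custom_*` fields are hypothetical custom fields.
- Any anomaly log on the Asset record would likewise be a custom addition.

```typescript
// Builds (does not send) a Frappe REST request. Stock field names are used where noted;
// every custom_* field is a hypothetical custom field that would have to exist on the doctype.
type Urgency = "immediate" | "short-term" | "planned";

interface AnomalyEvent {
  assetId: string;
  failureMode: string;
  recommendedAction: string;
  sensorSummary: string;
  urgency: Urgency;
}

interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Doctype names contain spaces ("Maintenance Visit"), so they are URL-encoded.
function resourceUrl(baseUrl: string, doctype: string, name?: string): string {
  const root = baseUrl.replace(/\/+$/, "");
  const path = `/api/resource/${encodeURIComponent(doctype)}`;
  return root + path + (name ? `/${encodeURIComponent(name)}` : "");
}

function maintenanceVisitRequest(
  baseUrl: string, apiKey: string, apiSecret: string,
  ev: AnomalyEvent, customer: string, visitDate: string, // visitDate as YYYY-MM-DD
): HttpRequest {
  const doc = {
    customer,                         // stock doctype is customer-facing and expects one
    mntc_date: visitDate,
    maintenance_type: "Unscheduled",
    custom_asset: ev.assetId,         // hypothetical custom fields from here down
    custom_failure_mode: ev.failureMode,
    custom_recommended_action: ev.recommendedAction,
    custom_sensor_summary: ev.sensorSummary,
    custom_urgency: ev.urgency,
  };
  return {
    method: "POST",
    url: resourceUrl(baseUrl, "Maintenance Visit"),
    headers: {
      "Content-Type": "application/json",
      Authorization: `token ${apiKey}:${apiSecret}`, // Frappe API key/secret authentication
    },
    body: JSON.stringify(doc),
  };
}

// "Internal Maintenance" is a placeholder customer record, purely illustrative.
const req = maintenanceVisitRequest("https://erp.example.com/", "KEY", "SECRET", {
  assetId: "CNC-03", failureMode: "bearing degradation",
  recommendedAction: "replace spindle bearing", sensorSummary: "BPFO amplitude 3x baseline; +8 degC",
  urgency: "short-term",
}, "Internal Maintenance", "2024-05-02");
console.log(req.url); // https://erp.example.com/api/resource/Maintenance%20Visit
```

On Node 18 or later, the request could be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Because the doctype name contains a space, it travels as `Maintenance%20Visit`.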
Maintenance managers see the exception in ERPNext's maintenance queue with full context: which machine, which sensor, what the anomaly looks like, what failure mode is probable, and how urgent the intervention is. The manager makes the call on when and how to respond. The agent identifies and communicates; the human decides and dispatches.
Real Metrics: What Predictive Maintenance Delivers
Techseria client outcomes across predictive maintenance implementations:
28% reduction in unplanned downtime events: measured as the number of unexpected production stoppages caused by equipment failure, comparing the 12 months before and after implementation. The reduction is not uniform: rotating equipment with strong vibration signatures (motors, pumps, compressors) shows 40–50% improvement; hydraulic and electrical equipment shows 15–25%.
18% reduction in maintenance cost — achieved by eliminating unnecessary PM work on components that sensor data shows are healthy. The savings offset sensor and software costs within the first year for most implementations.
63% of failures detected with at least 72 hours advance warning — allowing maintenance to be planned, parts to be ordered, and work to be scheduled for a low-impact window. The remaining 37% are detected with less than 72 hours warning — still actionable but requiring expedited response.
Sensor Infrastructure: The Physical Prerequisite
The most common question we receive: do we need to retrofit sensors on every machine?
No. How many machines need sensors depends on their criticality and on the sensor coverage you already have:
Start with critical equipment — identify the 10–20% of machines whose failure would immediately stop production, cause quality escapes, or violate safety requirements. These are the candidates for predictive maintenance investment.
Audit existing sensor coverage: many machines have temperature sensors feeding SCADA systems whose readings are never analysed, and OPC-UA-capable CNC machines log vibration and load data that sits unused. The sensor audit (weeks 1–2 of implementation) identifies what is already available.
Retrofit cost — where sensors must be added: industrial vibration sensors run £80–£300 per installation point; temperature sensors £50–£150; current transducers £100–£250. A typical critical machine requires 3–6 sensor points. Total retrofit cost for 10 machines: £5k–£20k depending on machine type and accessibility.
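A quick range check on the retrofit figures uses only the per-point prices above. For 10 machines at 3–6 points, the total runs from £1,500 (every point at the cheapest sensor) to £18,000 (every point at the most expensive). The quoted £5k–£20k therefore presumably also covers items such as mounting, cabling, gateways, or installation labour. That breakdown is an assumption, since the text does not itemise it. The helper names below are hypothetical.

```typescript
// Per-point unit prices quoted above (GBP). Hardware only: no installation or gateways.
interface PriceRange { min: number; max: number; }

const SENSOR_PRICES: { type: string; price: PriceRange }[] = [
  { type: "vibration", price: { min: 80, max: 300 } },
  { type: "temperature", price: { min: 50, max: 150 } },
  { type: "current", price: { min: 100, max: 250 } },
];

// Widest possible bounds: every point at the cheapest, or at the most expensive, sensor type.
function boundsFor(machines: number, points: PriceRange): PriceRange {
  const cheapest = Math.min(...SENSOR_PRICES.map((s) => s.price.min));
  const dearest = Math.max(...SENSOR_PRICES.map((s) => s.price.max));
  return { min: machines * points.min * cheapest, max: machines * points.max * dearest };
}

// Tighter range for a known sensor mix per machine, e.g. { vibration: 2, current: 1 }.
function mixRange(machines: number, mix: Record<string, number>): PriceRange {
  let min = 0;
  let max = 0;
  for (const s of SENSOR_PRICES) {
    const count = mix[s.type] ?? 0;
    min += count * s.price.min;
    max += count * s.price.max;
  }
  return { min: machines * min, max: machines * max };
}

console.log(boundsFor(10, { min: 3, max: 6 }));                           // { min: 1500, max: 18000 }
console.log(mixRange(10, { vibration: 2, temperature: 1, current: 1 })); // { min: 3100, max: 10000 }
```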
The sensor hardware cost is separate from the software development cost and is typically procured directly by the client or through Techseria's hardware partner network.
Implementation Timeline: 14–18 Weeks
The timeline is longer than pure software implementations because sensor installation requires physical access to equipment, which must be coordinated with production schedules and sometimes requires planned maintenance windows.
Weeks 1–2: Asset criticality mapping and sensor audit. Identify target equipment, assess existing sensor coverage, plan retrofit requirements, establish baseline failure rate and unplanned downtime metrics from maintenance logs.
Weeks 3–5: Sensor installation and Azure IoT Hub deployment. Physical sensor installation on target equipment (scheduled during planned downtime windows), Azure IoT Hub configuration, data pipeline from sensors to cloud.
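Weeks 6–8: Data collection and model training. 4–6 weeks of live sensor data captured; because this phase itself lasts three weeks, that assumes capture begins as soon as sensors come online during weeks 3–5. Normal operational profiles established. LSTM models trained where applicable. Azure Anomaly Detector baselines established.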
Weeks 9–11: Agent development and ERPNext integration. LangGraph.js maintenance agent built, Maintenance Visit and Schedule API integration tested, alert routing and notification configuration.
Weeks 12–13: Parallel validation. Agent runs live against incoming sensor data. All generated alerts reviewed by maintenance manager and validated: are they real anomalies or false positives? Thresholds tuned.
Weeks 14–18: Live deployment and handover. The agent runs unattended, raising ERPNext records for the maintenance team to act on. Maintenance team trained on exception queue management. 30-day post-go-live review at end of week 18 to assess detection accuracy and tune models.
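The parallel-validation step in weeks 12–13 produces labelled data: every alert carries the maintenance manager's verdict. The text does not specify how thresholds are tuned from that data. One simple method, assumed here, is to pick the lowest score threshold whose precision on the reviewed alerts meets a target. That keeps as many confirmed detections as possible at that precision. The scores and verdicts below are made up for illustration.

```typescript
// Alerts reviewed during parallel validation: detector score plus the manager's verdict.
interface ReviewedAlert { score: number; realAnomaly: boolean; }

// Share of alerts at or above the threshold that the manager confirmed as real.
function precisionAt(alerts: ReviewedAlert[], threshold: number): number {
  const fired = alerts.filter((a) => a.score >= threshold);
  if (fired.length === 0) return 1; // nothing fires, so there are no false positives
  return fired.filter((a) => a.realAnomaly).length / fired.length;
}

// Lowest threshold (among observed scores) whose precision meets the target. Lower
// thresholds fire more alerts, so this keeps the most confirmed detections at that precision.
function tuneThreshold(alerts: ReviewedAlert[], targetPrecision: number): number | undefined {
  const candidates = Array.from(new Set(alerts.map((a) => a.score))).sort((a, b) => a - b);
  return candidates.find((t) => precisionAt(alerts, t) >= targetPrecision);
}

// Illustrative review outcomes, not real data.
const reviewed: ReviewedAlert[] = [
  { score: 0.55, realAnomaly: false }, { score: 0.6, realAnomaly: false },
  { score: 0.7, realAnomaly: true }, { score: 0.8, realAnomaly: false },
  { score: 0.85, realAnomaly: true }, { score: 0.9, realAnomaly: true },
];
console.log(tuneThreshold(reviewed, 0.75)); // 0.7
```

Reviewed alerts only measure false positives. Failures that never raised an alert do not appear in this data, so missed detections have to be checked separately against the maintenance log.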
Investment: £25,000–£55,000 Fixed Fee (Software Only)
- £25k–£35k: 5–10 monitored machines, Azure Anomaly Detector-based detection (no custom LSTM), single site.
- £35k–£55k: 10–25 machines, custom LSTM models for complex equipment, multi-site deployment, advanced prescriptive recommendation layer.
Sensor hardware: £5k–£20k additional, sourced separately. Azure IoT Hub and compute: £200–£800/month depending on message volume and model inference frequency. Annual software maintenance: 15–18% of build cost.
For a facility with 8 production lines where a single line stoppage costs £4k–£8k per hour, preventing even 3–4 unplanned stops per year, each lasting several hours, can recover the entire implementation cost within months.
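Whether a few avoided stops pay back the investment within months depends heavily on how long each stop lasts, which the text does not state. The sketch below makes that dependency explicit. Every input in the example is an assumption:

- £35k build, £10k of sensors and £500/month of Azure costs
- 16.5% annual maintenance
- four avoided stops a year, four hours each, at £6k per hour

```typescript
// All inputs are assumptions to replace with site data; stop duration is not given in the text.
interface PaybackInputs {
  softwareFeeGbp: number;
  sensorHardwareGbp: number;
  azureMonthlyGbp: number;
  annualMaintenanceRate: number; // fraction of the software fee, e.g. 0.15 to 0.18
  stopsAvoidedPerYear: number;
  hoursPerStop: number;
  costPerHourGbp: number;
}

// Months until cumulative net savings cover the upfront spend (simple, undiscounted).
function paybackMonths(i: PaybackInputs): number {
  const upfront = i.softwareFeeGbp + i.sensorHardwareGbp;
  const monthlyRunning = i.azureMonthlyGbp + (i.softwareFeeGbp * i.annualMaintenanceRate) / 12;
  const monthlySavings = (i.stopsAvoidedPerYear * i.hoursPerStop * i.costPerHourGbp) / 12;
  const net = monthlySavings - monthlyRunning;
  return net > 0 ? upfront / net : Infinity; // never pays back if running costs exceed savings
}

const example: PaybackInputs = {
  softwareFeeGbp: 35000, sensorHardwareGbp: 10000, azureMonthlyGbp: 500,
  annualMaintenanceRate: 0.165, stopsAvoidedPerYear: 4, hoursPerStop: 4, costPerHourGbp: 6000,
};
console.log(paybackMonths(example).toFixed(1)); // 6.4
```

Short stops change the picture completely. With three avoided one-hour stops at £4k per hour, savings of about £1,000 a month barely exceed running costs of about £981 a month, and payback stretches to roughly 200 years. Stop duration is the figure to pin down first.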
Find out which of your machines is most at risk right now. Techseria's maintenance readiness assessment analyses your current downtime logs, identifies your highest-risk equipment, and estimates the detection value of predictive monitoring — before you commit to an implementation. Fixed-fee assessment, completed in 5 business days.