How to Build a Data Pipeline Between Your CRM, ERP and Finance System
If your sales team closes a deal in Salesforce, your ops team should not have to re-enter it in ERPNext. If your finance system is pulling numbers that contradict what your ERP shows, you have a data pipeline problem — and it is costing you more than you think.
The average mid-market business wastes 4.5 hours per person per week on manual data entry and reconciliation across disconnected systems, according to Gartner. At 50 people, that is roughly 11,000 hours per year. At an average blended cost of £35/hour, you are losing £385,000 annually to a solvable infrastructure problem.
This guide gives you the architecture decisions you need to make before a single line of code is written — and the specific technology choices that make a CRM-ERP-Finance integration maintainable rather than a perpetual support burden.
The First Decision: ETL vs ELT
ETL (Extract, Transform, Load) transforms your data before it lands in the destination. You clean, deduplicate, and reshape it in transit. This made sense when storage was expensive and destination systems were rigid.
ELT (Extract, Load, Transform) loads raw data first, then transforms it inside the destination warehouse using SQL or a tool like dbt. This is the right pattern for most mid-market integrations today because:
- Storage in Azure Synapse or Azure SQL is cheap — raw data retention costs pence per GB
- You can re-run transformations when business logic changes without re-extracting source data
- dbt creates auditable, version-controlled transformation logic
- Debugging is far easier when you can inspect the raw layer
The exception: if you are dealing with PII that must not land in a staging environment unmasked, you may need ETL with in-flight masking. But for most CRM-ERP-Finance pipelines, ELT is the right choice.
The Second Decision: Batch vs Streaming
Batch processing moves data on a schedule — hourly, nightly, or triggered. Azure Data Factory handles this well with pipeline runs and tumbling window triggers.
Streaming moves data in near real-time via event streams — Kafka, Azure Event Hubs, or webhooks. This is necessary when downstream systems need sub-minute latency: live inventory availability on an e-commerce platform, real-time credit limit enforcement during order entry.
For most mid-market CRM-ERP-Finance integrations, batch is the right answer. Your CFO's weekly P&L does not need real-time data. Your sales ops team's account sync can tolerate a 15-minute delay. Streaming adds significant operational complexity — partitioning, offset management, consumer group configuration — that is not justified unless you have a clear latency requirement.
Rule of thumb: if the business process that consumes the data runs daily or weekly, use batch. If the process needs to react within minutes, use streaming.
The Third Decision: Direct DB vs API vs Webhook
Direct database access is the fastest option for reading large volumes of data. If you have read access to the source system's database, you can bulk-extract efficiently. The problem: most SaaS platforms (Salesforce, HubSpot) do not expose their database, and on-premise ERPs that do expose their DB often change schema between versions without warning.
API-based extraction is the standard pattern for SaaS-to-SaaS integration. Salesforce's REST API, HubSpot's CRM API, and ERPNext's Frappe REST API are all well-documented and version-stable. Azure Data Factory has native connectors for Salesforce and REST endpoints that handle pagination, rate limiting, and incremental extraction via watermark columns.
Webhook-based ingestion is event-driven — the source system pushes to you when something changes. HubSpot and Salesforce both support webhooks. This reduces polling overhead but requires you to maintain a reliable endpoint and handle retries when your receiver is down. For event-critical flows (deal closed → create work order in ERPNext), webhooks are worth the complexity.
The Recommended Stack: Azure Data Factory + Azure Synapse + dbt
For mid-market businesses already in or moving toward the Microsoft Azure ecosystem, this stack delivers the best balance of capability, cost, and maintainability:
Azure Data Factory handles orchestration and extraction. It connects natively to Salesforce, REST APIs, Azure SQL, and file-based sources. Its visual pipeline designer reduces the code surface area. Scheduling, monitoring, alerting, and retry logic are built in. At scale, ADF pipelines for a mid-market integration cost £200–£800/month in Azure compute and activity costs.
Azure Synapse Analytics (or Azure SQL Database for smaller workloads) is the warehouse layer. Raw data lands here in schema-on-write tables. For organisations with under 500GB of analytical data and fewer than 20 concurrent users, Azure SQL is often sufficient at a fraction of Synapse's cost.
dbt (data build tool) runs transformations inside the warehouse. Your business logic — "a closed-won Salesforce opportunity becomes a confirmed sales order in ERPNext at over £5,000 value" — lives in dbt models as SQL, committed to Git, tested with schema assertions, and documented automatically. When the business rule changes, you change one SQL file, not a maze of stored procedures.
Power BI sits on top for reporting. With DirectQuery against Synapse or import mode against Azure SQL, your dashboards always reflect the current transformation layer.
Two Proven Integration Patterns
Pattern 1: Salesforce to ERPNext to Power BI
This is the most common pattern for businesses using Salesforce as their CRM and ERPNext as their ERP.
Step 1: ADF pipeline extracts Salesforce Opportunities (stage = Closed Won) via REST API on a 15-minute schedule. Raw data lands in `raw.sf_opportunities` in Azure Synapse.
Step 2: dbt model `stg_sf_opportunities` cleans and standardises the data — normalising customer names, mapping Salesforce Account IDs to ERPNext Customer codes, converting currency.
Step 3: A second ADF pipeline pushes validated records to ERPNext via the Frappe REST API, creating Sales Orders. A lookup table maps Salesforce product codes to ERPNext item codes.
Step 4: ERPNext's Sales Order, Delivery Note, and Sales Invoice data is extracted nightly into `raw.erpnext_transactions`.
Step 5: dbt models join Salesforce pipeline data with ERPNext fulfilment data, creating a unified `fct_order_to_cash` fact table in Synapse.
Step 6: Power BI report surfaces order-to-cash cycle time, fulfilment rate by product line, and revenue recognised vs booked — all from one source of truth.
Pattern 2: HubSpot to ERPNext to Azure SQL
For businesses on HubSpot CRM with ERPNext for operations, the pattern is similar but uses HubSpot's webhook-first architecture.
HubSpot fires a webhook on deal stage change. An Azure Function receives it, validates the payload, and writes to a staging table in Azure SQL. A dbt job transforms the record and pushes it to ERPNext via REST. Finance reporting runs against Azure SQL views combining ERPNext invoice data with HubSpot deal attribution.
The Three Data Quality Problems You Will Hit
1. Duplicate Records
When Salesforce has "Acme Corp" and ERPNext has "ACME Corporation Ltd", your pipeline creates two separate customer records in your warehouse. Over 12 months, you accumulate hundreds of duplicates that make revenue reconciliation impossible.
Solution: implement a master data management (MDM) layer as a dbt model before any downstream writes. Use fuzzy matching on company name plus postcode or company number (from Companies House API for UK businesses). Flag unmatched records for human review rather than auto-creating. Techseria's implementations typically catch 15–25% of incoming records as potential duplicates in the first 30 days.
2. Schema Drift
Salesforce admins add a custom field. HubSpot releases a new API version. ERPNext is upgraded and a column is renamed. Your pipeline breaks silently — or worse, loads nulls without alerting anyone.
Solution: dbt schema tests catch this at transformation time. ADF pipeline alerts fire on extraction failures. A schema registry table logs the expected shape of each source and raises incidents when it changes. This is not optional — schema drift is the single most common cause of silent data quality failures in production pipelines.
3. Null Propagation
A required field in your finance system comes through as null from the CRM. The pipeline accepts it, the transformation layer passes it through, and your finance report silently excludes 8% of revenue because the region field is blank.
Solution: enforce not-null constraints as dbt tests on every staging model. Fail the pipeline loudly rather than load incomplete data quietly. Build a data quality dashboard in Power BI that tracks null rates per field per source over time — this becomes your early warning system.
Build Cost and Timeline
Cost range: £20,000–£60,000 depending on:
- Number of source systems (each additional system adds £4,000–£8,000 in extraction logic)
- Complexity of transformation rules (flat mapping vs complex business logic)
- Data volume and Azure infrastructure sizing
- Whether MDM/deduplication logic is required
Timeline: 6–12 weeks
- Weeks 1–2: Architecture design, source system API access, environment setup
- Weeks 3–5: Extraction layer build (ADF pipelines per source)
- Weeks 6–8: Transformation layer build (dbt models, data quality tests)
- Weeks 9–10: Power BI dashboards and stakeholder UAT
- Weeks 11–12: Production deployment, monitoring setup, documentation
Ongoing maintenance cost if built right: £800–£1,500/month for Azure infrastructure, dbt Cloud, and minor pipeline updates. Schema changes and new source system additions handled on a retainer or time-and-materials basis.
Ongoing maintenance cost if built wrong (point-to-point connectors, no transformation layer, no monitoring): 2–4 days of developer time per month firefighting broken syncs, plus data reconciliation time for finance staff. Typical hidden cost: £3,000–£6,000/month.
What Built Right Looks Like After 12 Months
One Techseria client — a UK-based manufacturer with 120 employees — came to us after 18 months of a HubSpot-ERPNext integration built by their internal team using Zapier and custom Python scripts. They had 847 duplicate customer records, a 12% null rate on the revenue region field, and their finance team spent 3 days every month reconciling numbers before the board pack could be produced.
After rebuilding on the ADF + dbt stack, they went from a 3-day month-end reconciliation to a 4-hour process. Duplicate records dropped to zero new duplicates per month. The finance team's time was redirected to analysis rather than data hygiene. ROI was realised within 4 months.
The Decision You Are Really Making
When you build a data pipeline, you are not just solving today's integration problem. You are making a 3–5 year architecture commitment. Every source system you add, every report your leadership team requests, every M&A integration you face — all of it will run through this layer.
Built right, it becomes a competitive asset: faster decisions, cleaner data, lower operational overhead. Built wrong, it becomes technical debt that compounds monthly.
Techseria delivers fixed-fee data pipeline projects on the Azure stack, with a defined scope, a defined timeline, and a defined cost. No open-ended engagements. No surprises.
Ready to get a fixed-fee quote for your CRM-ERP-Finance integration? Book a 45-minute architecture session with our data engineering team. We will assess your current systems, identify the three highest-impact integration points, and give you a written scope and cost estimate within 5 business days.
[Book a Strategy Session →]
Ready to accelerate your operations?
See how custom AI solutions, ERPNext integration, and workflow automations can lower your operating costs. Book your free 30-minute Workflow Audit with a senior engineer.


