How to Choose an AI Agent Development Company: 12 Questions to Ask
The AI agent development market has expanded faster than expertise has. In 2023, "LangChain developer" was a specialist credential. By 2025, nearly every digital agency and offshore dev shop had added "AI agents" to its homepage. The result: buyers face a market where claims are uniform but capability varies enormously.
The 12 questions below are the ones Techseria's clients have told us they wish they'd asked before choosing a previous vendor. Each question targets a specific dimension of genuine production expertise — and the answer patterns tell you clearly whether you're talking to someone who has shipped working systems or someone who has watched demos.
Question 1: "Show me production code for a human-in-the-loop interrupt."
What you're testing: Whether they actually know LangGraph.js, or whether they're describing LangChain concepts and calling it LangGraph.
A good answer looks like: They walk you through LangGraph.js's `interrupt()` function and explain how state is persisted at the interrupt checkpoint — so when a human approver responds, the workflow resumes exactly where it paused. They show you TypeScript code, not Python (LangGraph.js is the JavaScript/TypeScript implementation). They explain how they expose the interrupt state to a human-facing UI (email, Slack, or web interface).
// What a real implementation looks like
const approvalNode = async (state: WorkflowState) => {
  const decision = interrupt({
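The fragment above is cut off, so here is a minimal, dependency-free sketch of the mechanism it points at: persist the state when the workflow pauses, then reload that exact state when the approver responds. It deliberately does not call LangGraph.js. `CheckpointStore`, `pauseForApproval`, and `resumeWithDecision` are hypothetical names, and the in-memory object stands in for a durable store. In LangGraph.js itself, the pause is an `interrupt()` call inside a node of a graph compiled with a checkpointer, and the resume is a `Command` carrying a `resume` value.

```typescript
// Hypothetical, dependency-free model of pause-and-resume. Not the LangGraph.js API.
type WorkflowState = {
  invoiceId: string;
  amount: number;
  status: "pending_approval" | "approved" | "rejected";
};

type Checkpoint = { threadId: string; state: WorkflowState; question: string };

// Stand-in for a durable store such as PostgreSQL; a plain object is lost on restart.
class CheckpointStore {
  private saved: Record<string, Checkpoint> = {};

  save(cp: Checkpoint): void {
    this.saved[cp.threadId] = cp;
  }

  // Returns the checkpoint and removes it, so a decision is applied only once.
  take(threadId: string): Checkpoint {
    const cp: Checkpoint | undefined = this.saved[threadId];
    if (cp === undefined) throw new Error(`No paused workflow for thread ${threadId}`);
    delete this.saved[threadId];
    return cp;
  }
}

// Pause: persist the state and return the payload a UI would show the approver.
function pauseForApproval(store: CheckpointStore, threadId: string, state: WorkflowState): Checkpoint {
  const cp: Checkpoint = {
    threadId,
    state: { ...state, status: "pending_approval" },
    question: `Approve payment of ${state.amount} for invoice ${state.invoiceId}?`,
  };
  store.save(cp);
  return cp;
}

// Resume: reload the saved state and apply the human decision.
function resumeWithDecision(store: CheckpointStore, threadId: string, approved: boolean): WorkflowState {
  const { state } = store.take(threadId);
  return { ...state, status: approved ? "approved" : "rejected" };
}

const approvals = new CheckpointStore();
const paused = pauseForApproval(approvals, "thread-1", {
  invoiceId: "INV-001",
  amount: 1200,
  status: "pending_approval",
});
console.log(paused.question);
const finalState = resumeWithDecision(approvals, "thread-1", true);
console.log(finalState.status);
```

The point to probe with a vendor is the store: if the paused state lives only in process memory, a restart between the request and the approval loses the workflow.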
A bad answer looks like: They describe human-in-the-loop conceptually without showing code. They show Python LangChain code (not LangGraph.js). They say "we handle that with a webhook" without explaining the state persistence mechanism. They haven't heard the term `interrupt()` and start talking about "callback functions."
Why it matters: Human-in-the-loop is not optional for business-critical workflows. An agent that auto-executes financial transactions without appropriate human checkpoints is a compliance and operational risk.
Question 2: "What is your data architecture approach before you write any agent code?"
What you're testing: Whether they understand that AI agents are only as good as the data they operate on — and whether they do the work to establish clean, consistent data sources before building.
A good answer looks like: They describe a structured discovery phase (typically 2–5 days) in which they audit your existing systems, identify the single source of truth for each data entity, document data quality issues, and map the integration points before any development begins. They understand the ERPNext single-source-of-truth principle: all business entities should live in the ERP, with other systems reading from it rather than maintaining parallel records.
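As a rough illustration of what one step of that audit might produce, the sketch below compares customer records from a secondary system against the ERP, treated as the source of truth. The record shape, the issue labels, and the function name `auditAgainstErp` are all hypothetical; a real audit would cover more entities and more fields.

```typescript
// Hypothetical record shape; real doctypes carry many more fields.
interface CustomerRecord {
  id: string;
  email: string;
}

interface AuditFinding {
  id: string;
  issue: "missing_in_erp" | "email_mismatch";
}

// Treats the ERP as the source of truth and reports where another system disagrees.
function auditAgainstErp(erp: CustomerRecord[], other: CustomerRecord[]): AuditFinding[] {
  const erpById: Record<string, CustomerRecord> = {};
  for (const record of erp) erpById[record.id] = record;

  const normalise = (email: string): string => email.trim().toLowerCase();
  const findings: AuditFinding[] = [];
  for (const record of other) {
    const master: CustomerRecord | undefined = erpById[record.id];
    if (master === undefined) {
      findings.push({ id: record.id, issue: "missing_in_erp" });
    } else if (normalise(master.email) !== normalise(record.email)) {
      findings.push({ id: record.id, issue: "email_mismatch" });
    }
  }
  return findings;
}

const erpCustomers: CustomerRecord[] = [
  { id: "C1", email: "a@x.com" },
  { id: "C2", email: "b@x.com" },
];
const crmCustomers: CustomerRecord[] = [
  { id: "C1", email: "A@x.com " }, // same address after normalisation
  { id: "C2", email: "old@x.com" },
  { id: "C3", email: "c@x.com" },
];
const auditFindings = auditAgainstErp(erpCustomers, crmCustomers);
console.log(JSON.stringify(auditFindings));
```

A findings list like this is the kind of artefact worth asking for at the end of discovery, before any agent code is written.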
A bad answer looks like: They jump straight to "what do you want the agent to do?" without asking about your data architecture. They propose building the agent first and "dealing with data issues if they come up." They haven't worked with ERPNext before and don't ask about your ERP schema.
Why it matters: The majority of AI agent failures in production are data failures, not model failures. An agent that retrieves inconsistent or incomplete data will produce unreliable outputs regardless of how sophisticated its reasoning is.
Question 3: "What LangGraph.js version are you currently building with, and what changed in the last major release?"
What you're testing: Whether they're actively building with LangGraph.js or just familiar with it conceptually from documentation.
A good answer looks like: They know the current version, can describe a specific API change or new feature in a recent release (e.g., the streaming enhancements, the LangGraph Platform deployment model, changes to the checkpoint mechanism), and have an opinion about which features they use and why.
A bad answer looks like: Vague answers about "staying current with the ecosystem." Inability to name a specific version. Descriptions of the LangChain Expression Language (LCEL) rather than LangGraph.js graph primitives.
Question 4: "Walk me through your error handling and retry architecture."
What you're testing: Production maturity. Demo systems have happy paths. Production systems handle 401s, timeouts, rate limits, and partial failures.
A good answer looks like: They describe their standard approach to: (1) transient errors — exponential backoff with jitter; (2) authentication failures — token refresh or alert escalation; (3) business logic errors — human-in-the-loop escalation with context; (4) partial workflow failures — saga/compensating transaction pattern with state checkpointing. They understand that different error types need different handling and that retrying an authentication error is not the right response.
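To make the distinction concrete, here is a minimal sketch of routing errors by category and computing an exponential backoff delay with full jitter. The `AgentError` shape, the action names, and the default timings are assumptions for illustration, not a standard. Treating an error with no HTTP status as transient is also an assumption; a real system would inspect the error more carefully.

```typescript
// Hypothetical categories mirroring the four cases in the answer above.
type ErrorAction = "retry" | "refresh_token" | "escalate_to_human" | "compensate";

interface AgentError {
  httpStatus?: number;       // present for failed HTTP calls
  businessRule?: string;     // present when a domain rule rejected the action
  completedSteps?: string[]; // present when a multi-step write failed part-way
}

function classify(err: AgentError): ErrorAction {
  if (err.completedSteps !== undefined && err.completedSteps.length > 0) return "compensate";
  if (err.businessRule !== undefined) return "escalate_to_human";
  const code = err.httpStatus;
  if (code === 401) return "refresh_token"; // retrying with the same token cannot succeed
  // Assumption: no status (e.g. a network timeout), 408, 429 and 5xx are transient.
  const transient = code === undefined || code === 408 || code === 429 || code >= 500;
  return transient ? "retry" : "escalate_to_human";
}

// "Full jitter": a random delay between 0 and the capped exponential ceiling.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 200,
  capMs: number = 10000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retries only errors classified as transient; everything else is rethrown at once.
async function callWithRetry<T>(fn: () => Promise<T>, maxAttempts: number = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const lastAttempt = attempt + 1 >= maxAttempts;
      if (classify(err as AgentError) !== "retry" || lastAttempt) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

The jitter matters when many agent runs fail at once: without it, they all retry at the same instants and hit the rate limit again together.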
A bad answer looks like: "We use try-catch." Generic descriptions of error handling without specifics on how they handle each error category differently. No mention of compensating transactions for distributed write failures.
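For readers unfamiliar with compensating transactions, the idea can be sketched in a few lines: every step that writes to a system is paired with an action that undoes it, and on failure the completed steps are undone in reverse order. `SagaStep` and `runSaga` are hypothetical names. The sketch is synchronous for brevity, whereas real steps would be asynchronous API calls, and it assumes the compensations themselves succeed.

```typescript
// Hypothetical saga runner: each step pairs an action with the compensation that undoes it.
interface SagaStep {
  label: string;
  run: () => void;
  compensate: () => void;
}

function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
      log.push(`done:${step.label}`);
    } catch (_err) {
      log.push(`failed:${step.label}`);
      // Undo the steps that already succeeded, most recent first.
      for (const prev of completed.reverse()) {
        prev.compensate();
        log.push(`undone:${prev.label}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Example: the invoice is created, the payment step fails, the invoice is rolled back.
const ledger: string[] = [];
const sagaResult = runSaga([
  {
    label: "create_invoice",
    run: () => { ledger.push("invoice"); },
    compensate: () => { ledger.pop(); },
  },
  {
    label: "post_payment",
    run: () => { throw new Error("gateway timeout"); },
    compensate: () => { /* nothing was written, so nothing to undo */ },
  },
]);
console.log(sagaResult.log.join(", "));
```

A vendor with production experience should also be able to say what happens when a compensation fails, which is usually where the human escalation path comes back in.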
Question 5: "Who owns the IP, and how is it structured in the contract?"
What you're testing: Whether you will own what you pay for. This is a basic but frequently overlooked commercial question: some development firms retain IP in agent code; others include licence-back clauses that constrain your use.
A good answer looks like: Clear statement that you own all code, prompts, configurations, and integration patterns developed for your project. Transfer of IP included in the master services agreement, not an optional add-on.
A bad answer looks like: Hedging about what constitutes "their methodology" versus "your code." Statements that the underlying framework components remain theirs (reasonable) but that the agent logic and prompt engineering are also theirs (not acceptable). Request for an ongoing licence fee for you to run your own system.
Question 6: "What's your post-go-live support model and what does it cost?"
What you're testing: Whether they've thought beyond the build phase. AI agents require maintenance: LLM API updates, model version changes, prompt drift, integration endpoint changes, and ongoing calibration.
A good answer looks like: A defined support tier with specific SLAs (e.g., P1 production down = 4-hour response; P2 degraded performance = 8-hour response). A clear statement of what triggers a maintenance release versus a change request. Annual retainer options with named capacity for enhancements. Documentation of the monitoring and alerting they set up, so you can see issues before they become incidents.
A bad answer looks like: "We'll be available if you need us." No defined SLA. No proactive monitoring. Treating post-launch as purely reactive rather than an ongoing relationship with defined parameters.
Question 7: "Have you integrated agents with ERPNext? What was the hardest part?"
What you're testing: Specific ERPNext technical experience. Any vendor can claim ERPNext experience; the follow-up question surfaces whether it's real.
A good answer looks like: They describe specific ERPNext API patterns: Frappe REST conventions, doctype filtering syntax, the Document Event webhook system, and token authentication with an API key:secret pair. They mention a specific challenge: schema complexity in customised instances, handling the linked document reference pattern in Frappe, or managing the difference between draft and submitted document states in their tool nodes.
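For reference, the conventions named above look roughly like this in a tool node. The sketch builds a list request against Frappe's `/api/resource/{doctype}` endpoint with JSON-encoded `fields` and `filters` parameters, and a `token` authorisation header. The host `erp.example.com`, the credentials, and the helper names `listUrl` and `authHeader` are hypothetical; check the field names against your own instance, since customised doctypes differ.

```typescript
// Frappe document states: 0 = draft, 1 = submitted, 2 = cancelled.
type DocStatus = 0 | 1 | 2;

// A Frappe list filter is a [fieldname, operator, value] triple.
type Filter = [string, string, string | number];

// Builds a list URL for the Frappe REST API. Helper name is hypothetical.
function listUrl(
  baseUrl: string,
  doctype: string,
  fields: string[],
  filters: Filter[],
  pageLength: number = 20,
): string {
  const query = [
    `fields=${encodeURIComponent(JSON.stringify(fields))}`,
    `filters=${encodeURIComponent(JSON.stringify(filters))}`,
    `limit_page_length=${pageLength}`,
  ].join("&");
  return `${baseUrl}/api/resource/${encodeURIComponent(doctype)}?${query}`;
}

// Token authentication uses the API key and secret joined by a colon.
function authHeader(apiKey: string, apiSecret: string): Record<string, string> {
  return { Authorization: `token ${apiKey}:${apiSecret}` };
}

// Only submitted invoices: drafts can still change, so an agent should not act on them.
const submitted: DocStatus = 1;
const invoiceUrl = listUrl(
  "https://erp.example.com",
  "Sales Invoice",
  ["name", "grand_total"],
  [["docstatus", "=", submitted]],
);
const invoiceHeaders = authHeader("my-key", "my-secret"); // placeholder credentials
console.log(decodeURIComponent(invoiceUrl));
```

Requesting only `name` and `grand_total` rather than whole documents keeps responses small and limits what the agent sees.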
A bad answer looks like: "Yes, we've worked with ERP systems" without ERPNext-specific detail. Describing ERPNext as if it's a standard database (it's built on the Frappe framework, which is document-oriented and has its own conventions, such as draft and submitted states, that go beyond plain CRUD APIs). Not knowing what a "doctype" is.
Question 8: "How do you handle GDPR compliance in the agent architecture?"
What you're testing: Whether they've thought about data protection at the architectural level, not as an afterthought.
A good answer looks like: They describe data minimisation at the tool node level (agents retrieve only the fields they need, not full records). They explain how PII is handled in LLM prompts (whether personal data is sent to the model, and if so, under what legal basis and with what contractual protections). They know whether your Azure OpenAI deployment processes data in-region and can confirm it under your DPA. They have an approach for audit logging that captures what the agent processed without creating additional GDPR risk.
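A minimal sketch of what data minimisation at the tool node can look like: project each record down to an allowlist of fields, redact email addresses from free text before it reaches a prompt, and log which fields were sent rather than their values. The field names, the allowlist, and the helper names are hypothetical, and the email pattern is a simple approximation rather than a complete PII detector.

```typescript
// Hypothetical allowlist: the only fields this tool node may pass to the model.
const ALLOWED_FIELDS: readonly string[] = ["invoice_id", "amount", "due_date"];

type FlatRecord = Record<string, string | number>;

// Keeps only allowlisted fields, in allowlist order; everything else is dropped.
function minimise(record: FlatRecord, allowed: readonly string[]): FlatRecord {
  const out: FlatRecord = {};
  for (const key of allowed) {
    const value: string | number | undefined = record[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// Replaces anything that looks like an email address. An approximation only.
function redactEmails(text: string): string {
  return text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[email]");
}

// The audit entry records field names, not values, so the log holds no personal data.
function auditEntry(recordId: string, sent: FlatRecord): { recordId: string; fieldsSent: string[] } {
  return { recordId, fieldsSent: Object.keys(sent).sort() };
}

const fullInvoice: FlatRecord = {
  invoice_id: "INV-1",
  amount: 100,
  customer_email: "jane.doe@example.com",
  iban: "XX00 0000 0000", // placeholder value
};
const forPrompt = minimise(fullInvoice, ALLOWED_FIELDS);
const safeNote = redactEmails("Contact jane.doe@example.com today");
const logged = auditEntry("INV-1", forPrompt);
console.log(JSON.stringify(forPrompt), safeNote, JSON.stringify(logged));
```

An allowlist fails safe: a field added to the ERP later is excluded by default, whereas a blocklist would silently start sending it to the model.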
A bad answer looks like: "Azure is GDPR-compliant so you're covered." Inability to describe specific data flows. No discussion of which legal basis applies to automated processing decisions.
Question 9: "What does your pricing model look like, and can you deliver on a fixed fee?"
What you're testing: Commercial transparency and willingness to take delivery risk.
A good answer looks like: A phased fixed-fee structure with defined deliverables at each phase. A discovery phase at low cost before the main build commitment is made. Clear definition of what's in scope and what constitutes a change request. A stated process for scope changes (agreed written change request, quoted separately).
A bad answer looks like: Time-and-materials only for the full project — this transfers all delivery risk to you. Inability to scope without weeks of paid discovery at vague rates. No defined deliverables at each phase.
Question 10: "Show me a deployment architecture diagram for a recent project."
What you're testing: Whether they design complete systems, not just agent code.
A good answer looks like: A diagram showing: where the LangGraph.js agent runs (container, serverless, dedicated VM), how it connects to external APIs (direct, via API gateway, via message queue), where state is persisted (PostgreSQL, Redis, Azure Cosmos DB), what the monitoring stack looks like, and how the human-in-the-loop interface integrates. Bonus: they explain why they made each architectural choice.
A bad answer looks like: "We can deploy it wherever you need." No concrete architecture. Inability to explain the trade-offs between different deployment approaches. No mention of state persistence.
Question 11: "What percentage of your engagements result in production-live agents, and what's the typical timeline from project start to go-live?"
What you're testing: Delivery track record. Many AI agent projects are prototypes that never reach production.
A good answer looks like: A high percentage (>80%) of completed engagements reaching production. Specific timelines with context: "A single-integration agent typically goes live in 8–10 weeks. Multi-system orchestration takes 12–20 weeks. Projects that stall usually do so at the data quality phase, not the development phase." They're honest about where projects get delayed and have learned from it.
A bad answer looks like: Inability to give a number. "It depends" without any reference range. Defensive responses suggesting the question is unreasonable.
Question 12: "What are the three most common reasons AI agent projects fail, and how do you prevent each?"
What you're testing: Hard-won production experience. Anyone who has shipped multiple agents will have a clear answer to this, specific to the problems they've actually seen.
What the best answers cover:
- Data quality failure — agents built before establishing clean, consistent data sources produce unreliable outputs from day one. Prevention: mandatory data architecture phase before development.
- Scope creep into unbounded AI — projects that start with clear process automation expand into "make the agent smarter" requests that have no defined success criteria. Prevention: fixed-scope phases with defined acceptance criteria.
- No human-in-the-loop design — agents deployed with no escalation path for exceptions eventually hit edge cases they can't handle, make wrong decisions autonomously, and lose user trust. Prevention: designing the human escalation layer before the automation layer.
A bad answer looks like: Blaming clients. Describing generic software project failure reasons with no AI-specific content. Inability to give a concrete example from their own experience.
The Red Flags List
Beyond specific question answers, watch for these patterns:
- No production references. Legitimate firms can connect you with clients who will confirm go-live deployments. Be sceptical of case studies with no named clients.
- Proposes starting with a "proof of concept" that isn't scoped to production. PoCs that don't address the production architecture (authentication, error handling, observability) add cost without reducing delivery risk.
- Can't explain their stack. LangGraph.js, TypeScript, Azure/AWS deployment, state persistence layer, observability tooling — they should be able to describe their standard stack and why they chose it.
- Oversells the AI. Promises of "fully autonomous" agents without discussing human oversight, confidence thresholds, or exception handling are red flags.
- No fixed-fee option. Time-and-materials projects without defined deliverables transfer all cost risk to the client.
What Genuine Expertise Looks Like
Techseria's AI agent practice is built on LangGraph.js, TypeScript, and Azure — the same stack we recommend to clients. We've delivered production agents for AP automation, customer onboarding, supply chain management, and sales pipeline intelligence. Every engagement includes a data architecture phase before build, a fixed-fee structure with defined deliverables, and a post-go-live support model with defined SLAs.
We're happy to walk through any of these 12 questions in a strategy session — and we'll show you the code, not just describe it.
[Book a Strategy Session with Techseria](/contact) — bring your automation requirements and we'll demonstrate our approach, answer every question on this list, and scope a delivery plan within 5 business days.