Agents that work in production need architecture, not prompts.
Most AI pilots fail not because the model can't do the task, but because the system around the model isn't built for production work. Here's how we build ours.
The difference between a prompt and an agent.
A prompt is a single call to a model. An agent is the system around the model: it gathers context, decides, acts, and can be held accountable for the result.
Our architecture principles.
We use classical software for the roughly 90% of logic that is rule-based and reserve the LLM for the genuine judgement calls. That keeps cost, latency, and unpredictability low.
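A minimal sketch of that split, with hypothetical names (`KNOWN_VENDORS`, `classify_invoice`, and the `call_llm` stand-in are illustrative, not our actual implementation): deterministic rules handle the predictable cases, and only the ambiguous remainder pays for a model call.

```python
KNOWN_VENDORS = {"Acme", "Globex"}  # illustrative allow-list

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; hypothetical for this sketch.
    return "needs_review"

def classify_invoice(invoice: dict) -> str:
    # Deterministic rules cover the predictable majority of cases.
    if invoice["amount"] <= 0:
        return "reject"
    if invoice["vendor"] in KNOWN_VENDORS and invoice["amount"] < 500:
        return "auto_approve"
    # Only genuinely ambiguous invoices reach the LLM.
    return call_llm(f"Classify this invoice: {invoice}")
```

Because the rules run first, most requests never touch the model at all, which is where the cost and latency savings come from.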
Every agent decision has a confidence score. Above the threshold, it acts. Below it, a human sees it in a workbench with the evidence attached.
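The gating logic above can be sketched as follows; the `Decision` shape, the threshold value, and the in-memory workbench queue are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    confidence: float
    evidence: list  # whatever the agent saw when deciding

THRESHOLD = 0.85  # illustrative; tuned per decision type in practice

def route(decision: Decision, workbench: list) -> str:
    # Above the threshold, the agent acts autonomously.
    if decision.confidence >= THRESHOLD:
        return decision.action
    # Below it, the decision is queued for a human, evidence attached.
    workbench.append(decision)
    return "escalated"
```

The important property is that escalation carries the evidence with it, so the human reviewing the workbench never starts from a blank screen.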
When a human resolves an exception, the agent learns the pattern. Next time, the threshold is lower or the rule is internalised. ROI compounds over time.
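One simple way to picture that feedback loop, under assumed names (`resolved_patterns`, `pattern_key`): once a human has resolved a given pattern, the same case bypasses the confidence gate entirely, which is the "rule is internalised" path.

```python
# Maps an exception pattern to the action a human chose for it.
resolved_patterns: dict = {}

def resolve_exception(pattern_key: str, human_action: str) -> None:
    # The human's resolution is internalised as a rule for this pattern.
    resolved_patterns[pattern_key] = human_action

def decide(pattern_key: str, llm_action: str, confidence: float,
           threshold: float = 0.85) -> str:
    # A previously internalised pattern skips the confidence gate.
    if pattern_key in resolved_patterns:
        return resolved_patterns[pattern_key]
    return llm_action if confidence >= threshold else "escalate"
```

The first occurrence escalates; every later occurrence of the same pattern is handled automatically, which is why the ROI compounds.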
Every step an agent takes — every input, every reasoning step, every output — is logged. Regulated operators (healthcare, finance, aviation) need this to deploy at all.
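A sketch of how such step-level logging can be wired in without touching the step logic itself; the decorator, the in-memory `AUDIT_LOG`, and the example step are hypothetical (a real deployment would write to durable, append-only storage).

```python
import json
import time

AUDIT_LOG: list = []  # stand-in for a durable audit store

def audit(step: str):
    # Decorator that records every input and output of an agent step.
    def wrap(fn):
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            AUDIT_LOG.append({
                "step": step,
                "ts": time.time(),
                "input": json.dumps([args, kwargs], default=str),
                "output": json.dumps(result, default=str),
            })
            return result
        return inner
    return wrap

@audit("extract_amount")
def extract_amount(invoice: dict) -> float:
    # An example agent step; the decorator logs it transparently.
    return invoice["amount"]
```

Because the trail is produced mechanically rather than by convention, an auditor can replay exactly what the agent saw and did at every step.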
We can run the same agentic system against Claude in the cloud, a local Ollama instance serving Qwen 3 72B or Llama 3.3 70B, or a hybrid of the two. The architecture doesn't change; only the inference endpoint does.
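That separation can be expressed as a small endpoint abstraction; the URLs, model tags, and the `complete` stand-in below are illustrative, not real client code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    # Only this record changes between deployments.
    name: str
    base_url: str  # illustrative URLs
    model: str     # illustrative model tags

ENDPOINTS = {
    "cloud": Endpoint("cloud", "https://api.anthropic.com", "claude-sonnet"),
    "local": Endpoint("local", "http://localhost:11434", "qwen3:72b"),
}

def complete(endpoint: Endpoint, prompt: str) -> str:
    # Stand-in for the actual HTTP call. Everything above this
    # function is identical regardless of which endpoint is chosen.
    return f"[{endpoint.name}:{endpoint.model}] response to: {prompt}"
```

Swapping cloud for on-prem is then a configuration change, not a rewrite, which is what makes the hybrid deployment story credible.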