Every founder I talk to in Riyadh asks the same question about agentic AI: how much does this actually cost? Not the hand-wavy “a few cents per call” answer — the real, end-of-month, blew-the-budget number. Here’s what I’ve learned running production agents inside MLO Technologies.

The four cost layers no one tells you about

An AI agent isn’t just an API call. It’s a stack, and each layer has its own meter running. When I priced out our first production agent in Q1 2026, the breakdown looked like this:

  1. Model tokens — input + output across the whole task chain
  2. Tool-call infrastructure — every search, scrape, database hit and external API
  3. Orchestration overhead — retries, validation, parallel branches, memory
  4. Human-in-the-loop fallback — escalation paths when the agent fails

Most cost calculators stop at layer one. That’s where most cost surprises live too.

What a single “task” actually costs

Take a routine task like “research a Saudi e-commerce competitor and produce a one-page brief.” On Claude Sonnet 4.6 with proper prompt caching, the agent uses ~85K input tokens and ~6K output tokens across 12 tool calls. That’s roughly $0.45–$0.90 per brief depending on cache hit rate. Sounds cheap. Multiply by 200 briefs/day across the team and you’re at $90–$180/day, or ~$3,500/month — and that’s just the model layer. Add tool-call infra (we use Brave + Perplexity APIs) and you’re closer to $5,000.

The cache hit rate is everything

Anthropic’s prompt caching is the single biggest lever. The difference between a 0% and 90% cache hit rate is the difference between a project that ships and one that gets killed at the next budget review. Structure prompts like static documents — system, then long context, then the variable user query last. Our agents went from $0.90 to $0.18 per task once we got this right.

Where founders actually lose money

Three patterns I see repeatedly:

  • Loop-without-limit — an agent retries a failing tool 30 times. One bad task can cost $50.
  • Over-thinking budget — extended thinking on Claude with no token cap. Useful, but ruinous if uncapped.
  • Sub-agent sprawl — a parent agent spawns three children that each spawn three more. Costs go quadratic, not linear.

Every production agent we ship now has a hard cost ceiling per task and per day. If it hits the ceiling, it pages a human.

The honest verdict for 2026

Agentic AI in 2026 is genuinely cheaper than the human equivalent for most knowledge tasks — but only if you build for it. Treat token spend like cloud spend in 2014: instrument it, alert on it, optimise it. The founders winning here are the ones who stopped thinking about model choice and started thinking about token economics.

If you’re building agents and want to compare notes — drop me a line. I’m always trading numbers with other operators.