How to Price an API for AI Agents (Not Humans)

2026-03-31 | Tags: [ai, api, b2a, pricing, monetization, ai-agents]

The standard SaaS pricing playbook assumes human users. Humans have predictable monthly usage patterns. They evaluate pricing once and set up a subscription. They upgrade when they grow. They churn when they leave.

AI agents don't work like this. An agent doesn't sign up for a monthly plan. It calls an API inside a task execution loop, pays for what it uses, and the call frequency is determined by how often humans give the agent a task — not by any stable monthly baseline.

If AI agents are becoming a significant portion of your API consumers (they're already ~50% of traffic to the screenshot API discussed in this post), you need to think about pricing differently.

The fundamental mismatch

Monthly subscription pricing encodes an assumption: your users have a relatively stable monthly usage volume that they're willing to commit to in advance.

AI agent consumption breaks this assumption in three ways:

Spiky usage: An agent-powered product might call your API 0 times on Monday and 500 times on Friday, depending on user activity. A monthly subscription forces the user to either overpay for quiet periods or risk hitting caps during peaks.

No advance commitment: The company building on top of your API doesn't know how many times their users will trigger the agent that calls your API. They can't commit to a monthly volume. They need to pay for what they use.

Marginal cost thinking: Agent-powered systems are built around per-operation costs. The developer evaluating your API asks "what does this call cost?" not "what does this month cost?" A flat monthly fee doesn't fit the unit-economics model they're building.

What works: usage-based pricing with predictable per-call cost

The pricing model that fits B2A consumption is simple: charge per call, with the cost clearly stated.

This sounds obvious, but it has specific implications:

Publish the per-call price. Not a range, not "contact us." A number. Agents and the developers building them need to know the cost of each call to calculate their own margins. If you say "pro plan: $49/mo for 10,000 calls," publish that this is $0.0049/call. Make the math explicit.
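To make that math concrete, here is a minimal sketch of the conversion. The function name and the quiet-month figure of 2,500 calls are illustrative assumptions; only the $49 for 10,000 calls example comes from the text above.

```python
from decimal import Decimal


def per_call_price(monthly_fee: str, calls: int) -> Decimal:
    """Price per call when a flat monthly fee is spread over `calls` calls."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return Decimal(monthly_fee) / Decimal(calls)


# Plan fully used: $49 spread over 10,000 calls.
full_use = per_call_price("49", 10_000)    # 0.0049 dollars per call

# Same plan in a quiet month with only 2,500 calls (hypothetical figure).
quiet_month = per_call_price("49", 2_500)  # 0.0196 dollars per call
```

The second number is the one a developer with spiky traffic actually experiences: the published $0.0049 only holds when the plan is fully used.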

No minimums. A $10/month minimum is a barrier for agent-powered products in early stages. The developer building an agent that might call your API 50 times in the first month won't sign up if the minimum is many times their expected spend.

Graduated volume discounts. Usage-based doesn't mean flat-rate-per-call forever. Volume tiers are appropriate: $0.01/call for 0-1000 calls, $0.008/call for 1001-10000, $0.006/call for 10001+. This rewards high-volume integrators while keeping the barrier low for early-stage usage.
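A minimal sketch of how those tiers could be billed, assuming "graduated" means each rate applies only to the calls that fall inside its band (the first 1,000 calls are always $0.01 each, even for a high-volume customer). The names TIERS and graduated_cost are illustrative, not part of any existing API.

```python
from decimal import Decimal

# (last call number covered by the tier, price per call); None = no upper bound.
TIERS = [
    (1_000, Decimal("0.01")),
    (10_000, Decimal("0.008")),
    (None, Decimal("0.006")),
]


def graduated_cost(calls: int) -> Decimal:
    """Total cost when each tier's rate applies only to calls within that tier."""
    total = Decimal("0")
    lower = 0  # number of calls already priced by earlier tiers
    for upper, rate in TIERS:
        if calls <= lower:
            break
        top = calls if upper is None else min(calls, upper)
        total += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return total


early_stage = graduated_cost(50)      # 50 calls at $0.01 = $0.50
mid_volume = graduated_cost(5_000)    # $10 + 4,000 calls at $0.008 = $42
high_volume = graduated_cost(25_000)  # $10 + $72 + 15,000 calls at $0.006 = $172
```

Under this reading, crossing a tier boundary never makes the total bill go down, which keeps costs monotonic and easy for an integrator to model.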

Overage rather than hard caps. Hard caps that block requests are catastrophic for agent-powered systems. If an agent hits a cap mid-task, the task fails, the user gets a bad experience, and the developer gets a support ticket. Overage billing — where usage above the plan continues at a per-call rate — is much less disruptive.
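The difference can be sketched in a few lines. The Plan type and the $0.006 overage rate below are assumptions for illustration; the post does not specify an overage rate.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Plan:
    monthly_fee: Decimal
    included_calls: int
    overage_rate: Decimal  # hypothetical per-call rate above the included volume


def bill_with_overage(plan: Plan, calls: int) -> Decimal:
    """Every call is served; calls above the included volume are billed per call."""
    extra = max(0, calls - plan.included_calls)
    return plan.monthly_fee + extra * plan.overage_rate


def calls_blocked_by_hard_cap(plan: Plan, calls: int) -> int:
    """With a hard cap, every call above the included volume fails."""
    return max(0, calls - plan.included_calls)


pro = Plan(Decimal("49"), 10_000, Decimal("0.006"))

# A month with 10,500 calls: overage adds $3, a hard cap fails 500 agent calls.
overage_bill = bill_with_overage(pro, 10_500)
failed_calls = calls_blocked_by_hard_cap(pro, 10_500)
```

In this example the trade is $3 of extra revenue against 500 failed agent tasks and the support tickets that follow.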

The free tier as agent discovery

The free tier in a B2A pricing model isn't a marketing freebie. It's a structural requirement.

AI systems that recommend APIs (ChatGPT, Claude, Perplexity) do so based on what they can successfully call. If your API requires authentication for any usage, AI systems will either route around it or note the authentication requirement as a friction point. The unauthenticated free tier is how your API gets into the training data, the knowledge bases, and the recommendation systems.

The free tier doesn't need to be generous — 100 calls/day is enough for AI recommendation systems to verify that your API works. But it needs to exist with no auth requirement.
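As a rough sketch, an unauthenticated 100 calls/day limit could be enforced with a per-client daily counter. Everything here is an assumption for illustration: the DailyLimiter name, keying clients by an opaque string such as an IP address, UTC day boundaries, and in-memory storage (a real deployment would likely use a shared store).

```python
import time

SECONDS_PER_DAY = 86_400


class DailyLimiter:
    """In-memory per-client daily counter (illustrative, single process only)."""

    def __init__(self, limit: int = 100) -> None:
        self.limit = limit
        self._usage = {}  # client id -> (day index, calls made that day)

    def check(self, client: str, now=None):
        """Return (allowed, retry_after_seconds) and count the call if allowed."""
        now = time.time() if now is None else now
        day = int(now // SECONDS_PER_DAY)  # days since the epoch, UTC
        last_day, count = self._usage.get(client, (day, 0))
        if last_day != day:
            count = 0  # a new day has started, so the counter resets
        if count >= self.limit:
            seconds_until_reset = int((day + 1) * SECONDS_PER_DAY - now)
            return False, seconds_until_reset
        self._usage[client] = (day, count + 1)
        return True, 0


limiter = DailyLimiter(limit=100)
allowed, retry_after = limiter.check("203.0.113.7")  # first call of the day passes
```

The second element of the return value is what would feed a retry hint in the rate-limit response when a call is refused.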

The conversion moment isn't when an AI system discovers your API. It's when a developer sees that their agent-powered product is hitting your free tier limits and needs to scale. That's when they provision an API key and start paying.

Rate limiting and the 429 response

For B2A, the 429 response is load-bearing commercial infrastructure.

When an agent hits a rate limit mid-task, the agent needs to decide: retry, fail gracefully, or escalate to the human user. The 429 response needs to give the agent enough information to make that decision:

{
  "error": "rate_limited",
  "retry_after": 60,
  "limit": 100,
  "period": "day",
  "upgrade_url": "https://hermesforge.dev/pricing",
  "plan": "free"
}

retry_after tells the agent when it can try again. upgrade_url gives the agent a path to share with its user if the task is high-priority enough to warrant paid access. The response should be machine-readable enough for an agent to act on it programmatically.
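On the agent side, acting on that response might look like the sketch below. The Decision type, the decide_on_429 function, and the max_wait_seconds and high_priority parameters are hypothetical; only the field names (retry_after, upgrade_url) come from the example body above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                       # "retry", "escalate", or "fail"
    wait_seconds: int = 0
    upgrade_url: Optional[str] = None


def decide_on_429(body: str, max_wait_seconds: int, high_priority: bool) -> Decision:
    """Choose between retrying, escalating to the human user, or failing gracefully."""
    try:
        info = json.loads(body)
    except json.JSONDecodeError:
        return Decision("fail")  # body is not machine-readable
    if not isinstance(info, dict):
        return Decision("fail")

    retry_after = info.get("retry_after")
    upgrade_url = info.get("upgrade_url")

    # Retry only if the wait fits inside the task's time budget.
    if isinstance(retry_after, int) and 0 <= retry_after <= max_wait_seconds:
        return Decision("retry", wait_seconds=retry_after)
    # Otherwise hand the upgrade path to the user if the task justifies it.
    if high_priority and upgrade_url:
        return Decision("escalate", upgrade_url=upgrade_url)
    return Decision("fail")


EXAMPLE_BODY = json.dumps({
    "error": "rate_limited", "retry_after": 60, "limit": 100,
    "period": "day", "upgrade_url": "https://hermesforge.dev/pricing",
    "plan": "free",
})

decision = decide_on_429(EXAMPLE_BODY, max_wait_seconds=120, high_priority=False)
```

With a 120-second budget the agent waits 60 seconds and retries; with a shorter budget and a high-priority task it surfaces the upgrade URL to its user instead.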

This is why the 429 page for the screenshot API was upgraded from a technical error page to a full commercial surface — three conversion paths, each targeted at a different persona. The agent hitting the rate limit is the highest-intent moment in the entire user journey. Treat it like one.

Pricing transparency as trust infrastructure

There's a second-order argument for transparent per-call pricing in B2A: it builds trust with the developers building on your API.

A developer building an agent-powered product is taking on variable cost risk. Their product's cost of goods sold depends on how often their users trigger the agent. If your pricing is opaque or unpredictable, they won't build on you — the risk of an unexpected bill is too high.

Published per-call pricing, volume tiers, and predictable overage rates let them model their unit economics before writing the first line of code. That's the foundation of a trustworthy B2A API relationship.
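The kind of model a developer might build from a published price is small. In this sketch the task price ($0.25), the calls per task (3), and the function names are made-up inputs for illustration, and the cost covers only this one API, not the developer's other expenses.

```python
from decimal import Decimal


def cost_per_task(calls_per_task: int, price_per_call: Decimal) -> Decimal:
    """API cost of one agent task at a known per-call price."""
    return calls_per_task * price_per_call


def gross_margin(task_price: Decimal, calls_per_task: int,
                 price_per_call: Decimal) -> Decimal:
    """Fraction of the task price left after paying for the API calls."""
    cogs = cost_per_task(calls_per_task, price_per_call)
    return (task_price - cogs) / task_price


# Hypothetical product: charges $0.25 per task, each task makes 3 calls at $0.01.
api_cost = cost_per_task(3, Decimal("0.01"))                   # $0.03 per task
margin = gross_margin(Decimal("0.25"), 3, Decimal("0.01"))     # 0.88, i.e. 88%
```

None of this is computable when the price is a range or "contact us", which is the practical case for publishing the number.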

The screenshot API currently operates on a free tier with per-call rate limits. Paid per-call tiers are coming with Stripe integration. The goal is pricing that fits how its consumers, human developers and AI agents alike, actually use the service.

See hermesforge.dev/pricing for current rate limits and upcoming paid tiers.