API Key Management for Always-On Agents: Provisioning, Rotation, and Revocation

2026-04-02 | Tags: [stripe, billing, api, b2a, api-keys, security, ai-agents, rotation, revocation]

The standard advice for API key security is: rotate regularly, revoke immediately if compromised. This works fine for human developers who can receive a Slack message, copy a new key, redeploy their service, and move on.

It does not work when the key consumer is an agent running on a schedule in someone else's infrastructure, possibly managed by a developer who has moved on to another project and isn't checking their email.

B2A API key management requires different defaults.

The provisioning moment is the only moment you have full attention

When a developer creates an API key, that is the highest-attention moment in the key's lifecycle. They are actively working on integration. They will read instructions. They will configure their agent. They will test that the key works.

Everything you want them to do — scope the key, set a budget, configure a webhook endpoint, note the rotation schedule — must happen at provisioning time. After that, attention drops to near zero unless something breaks.

The provisioning flow should:

Force a scope selection, not default to full access. "What will this key be used for?" with checkboxes for endpoint groups. Narrow scope limits blast radius if the key is compromised.
Set a default budget cap and make it visible. The developer should leave provisioning knowing what their monthly maximum is and how to change it.
Require a webhook URL or email for billing alerts. The channel that worked for human developers (email to the account owner) doesn't work reliably for agents. A webhook endpoint that posts to a Slack channel or PagerDuty integration is more reliable.
Generate a key ID alongside the secret. The key ID is not secret — it can appear in logs, dashboards, and error messages. The secret is shown once and never again. This separation matters for the revocation workflow.

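import hashlib
import secrets

KEY_STORE: dict[str, dict] = {}  # in-memory stand-in for the real key table

def hash_secret(secret: str) -> str:
    # Built-in hash() is not cryptographic. SHA-256 keeps this sketch stdlib-only; bcrypt also works.
    return hashlib.sha256(secret.encode()).hexdigest()

def provision_key(account_id: str, scope: list[str], budget_cents: int | None) -> dict:
    key_id = "hf_key_" + secrets.token_hex(8)   # e.g. "hf_key_01abc..." (loggable, not secret)
    secret = secrets.token_urlsafe(32)          # shown once; only its hash is stored
    KEY_STORE[key_id] = {"secret_hash": hash_secret(secret), "account_id": account_id,
                         "scope": scope, "budget_cents": budget_cents}
    return {
        "key_id": key_id,
        "secret": secret,   # show once, never again
        "scope": scope,
        "budget_cents": budget_cents,
    }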

Rotation for always-on agents

The standard rotation procedure (generate new key, update config, verify, revoke old key) assumes the operator can do all four steps in sequence. For always-on agents, step two may require a deployment pipeline, a config management update, or outreach to a team that owns the agent.

Design for overlap windows, not instant cutover:

Old key: valid for N days after a new key is generated
New key: valid immediately
Both keys: simultaneously valid during the overlap window
Overlap window: configurable at provisioning time (default: 7 days)

This lets operators generate the new key, update their agent configuration on their own schedule, verify the agent is working on the new key, then let the old key expire without an urgent cutover.

Key state machine:
  ACTIVE → ROTATING (new key generated, overlap window starts)
  ROTATING → EXPIRED (overlap window ends, old key deactivated)
  ACTIVE → REVOKED (immediate, no overlap)
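
One way to sketch this state machine together with the overlap window is shown below. The `ApiKey` class and its method names are hypothetical; only the three transitions and the 7-day default come from the text above.

```python
from datetime import datetime, timedelta
from enum import Enum


class KeyState(Enum):
    ACTIVE = "active"
    ROTATING = "rotating"
    EXPIRED = "expired"
    REVOKED = "revoked"


# Only the transitions listed in the state machine are allowed.
ALLOWED_TRANSITIONS = {
    (KeyState.ACTIVE, KeyState.ROTATING),
    (KeyState.ROTATING, KeyState.EXPIRED),
    (KeyState.ACTIVE, KeyState.REVOKED),
}


class ApiKey:
    def __init__(self, key_id: str, overlap_days: int = 7) -> None:
        self.key_id = key_id
        self.state = KeyState.ACTIVE
        self.overlap = timedelta(days=overlap_days)
        self.expires_at = None  # set when rotation starts

    def _move(self, new_state: KeyState) -> None:
        if (self.state, new_state) not in ALLOWED_TRANSITIONS:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def start_rotation(self, now: datetime) -> None:
        """Called on the old key when its replacement is generated."""
        self._move(KeyState.ROTATING)
        self.expires_at = now + self.overlap

    def revoke(self) -> None:
        """Immediate revocation, no overlap window."""
        self._move(KeyState.REVOKED)

    def is_valid(self, now: datetime) -> bool:
        # Expire lazily: the first check after the window ends flips the state.
        if self.state is KeyState.ROTATING and now >= self.expires_at:
            self._move(KeyState.EXPIRED)
        return self.state in (KeyState.ACTIVE, KeyState.ROTATING)
```

The sketch implements only the transitions listed, so a key that is already ROTATING cannot be revoked here. A production system would probably want that path as well, which is an extension beyond what the state machine above specifies.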

The API response for a ROTATING key should include key_status and key_expires_at fields, plus a link to the rotation docs, so agents that log full API responses surface the upcoming expiry without requiring the operator to check the dashboard.

{
  "status": 200,
  "data": { ... },
  "key_status": "rotating",
  "key_expires_at": "2026-04-19T00:00:00Z",
  "rotation_docs_url": "https://api.example.com/docs/key-rotation"
}

Revocation without downtime

Immediate revocation is necessary when a key is compromised. But "immediate" has a different cost for an always-on agent than for a human user.

A human hitting a 401 opens the dashboard and regenerates. An agent hitting a 401 either fails silently (if the developer hasn't configured error handling), retries forever (if error handling is too aggressive), or pages the on-call developer at 3am.

Design the revocation flow to minimize unintended outage:

For compromise revocation: revoke with no overlap window. Before revoking, send an alert to every configured channel (email, webhook) with a 5-minute warning: "this key will be revoked in 5 minutes due to suspicious activity". This gives automated systems a chance to respond. After revocation, the 401 response should carry the error_code "key_revoked" so automated monitoring can distinguish a security revocation from a billing suspension or an invalid key.

For billing suspension (budget exceeded): use a soft limit, not immediate revocation. Return 429 with a suspended_until field. The agent can pause and retry. The developer gets a billing alert. The key resumes automatically when the budget is increased or the billing period resets.
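
The two flows can be sketched as follows. The function names, the `PendingRevocation` type, and the idea of passing alert channels as plain callables are assumptions for illustration; the 5-minute window, the 429 status, and the suspended_until field come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

GRACE = timedelta(minutes=5)


@dataclass(frozen=True)
class PendingRevocation:
    key_id: str
    revoke_at: datetime


def schedule_compromise_revocation(
    key_id: str,
    channels: list[Callable[[str], None]],
    now: datetime,
) -> PendingRevocation:
    """Alert every configured channel, then return when the key must be revoked."""
    revoke_at = now + GRACE
    message = (
        f"Key {key_id} will be revoked at {revoke_at.isoformat()} "
        "due to suspicious activity"
    )
    for send in channels:
        try:
            send(message)
        except Exception:
            # A broken alert channel must never delay a security revocation.
            continue
    return PendingRevocation(key_id, revoke_at)


def budget_suspension_response(resume_at: datetime) -> tuple[int, dict]:
    """Soft limit: HTTP 429 with the time at which the agent may retry."""
    body = {
        "error_code": "budget_exceeded",
        "suspended_until": resume_at.isoformat(),
    }
    return 429, body
```

The caller is expected to hand `revoke_at` to whatever job scheduler it already uses. Keeping the alert loop tolerant of failures matters because the channels most likely to be broken belong to the keys most likely to be unattended.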

Distinguish the error codes:

| Condition | HTTP Status | `error_code` |
|---|---|---|
| Invalid key (typo, never existed) | 401 | `invalid_key` |
| Key revoked (security) | 401 | `key_revoked` |
| Key expired (rotation window ended) | 401 | `key_expired` |
| Key suspended (budget) | 429 | `budget_exceeded` |
| Key suspended (abuse) | 429 | `rate_limit_exceeded` |

Agents with good error handling can treat key_expired and key_revoked differently from invalid_key. The first two mean a key that once worked has been retired, so the fix is for the developer to issue and deploy a new one; the last usually means a configuration error.
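
An agent can turn these codes into distinct behaviours with a simple lookup. The sketch below assumes the error body is a parsed dict with an error_code field; the `Action` names are hypothetical labels, not part of any API.

```python
from enum import Enum


class Action(Enum):
    FIX_CONFIG = "fix_config"                    # key was never valid
    REQUEST_NEW_KEY = "request_new_key"          # key was retired; developer must act
    PAUSE_UNTIL_RESUMED = "pause_until_resumed"  # wait for suspended_until
    BACK_OFF = "back_off"                        # slow down, then retry
    STOP_AND_ALERT = "stop_and_alert"            # unknown error: do not retry forever


ACTIONS = {
    "invalid_key": Action.FIX_CONFIG,
    "key_revoked": Action.REQUEST_NEW_KEY,
    "key_expired": Action.REQUEST_NEW_KEY,
    "budget_exceeded": Action.PAUSE_UNTIL_RESUMED,
    "rate_limit_exceeded": Action.BACK_OFF,
}


def classify_error(body: dict) -> Action:
    """Map an error body to an action, failing safe on unknown codes."""
    return ACTIONS.get(body.get("error_code"), Action.STOP_AND_ALERT)
```

Defaulting unknown codes to stop-and-alert avoids the retry-forever failure mode described earlier.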

Key ID in logs

One pattern that consistently helps: make the key ID (not the secret) appear in every API response, even successful ones.

{
  "status": 200,
  "key_id": "hf_key_01abc...",
  "data": { ... }
}

When a developer is debugging an agent that's been running for months, they often don't remember which key it's using. Having the key ID in the response means they can grep their agent's logs for hf_key_01abc and immediately know which key to check in the dashboard. Without this, they're guessing which of their five keys belongs to which agent.
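
When the logs are large, the same lookup can be scripted. This sketch assumes key IDs follow the "hf_key_" prefix shown in the example, followed by letters and digits; the real ID alphabet may differ, so adjust the pattern accordingly.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed ID shape, based only on the "hf_key_01abc..." example.
KEY_ID_PATTERN = re.compile(r"hf_key_[0-9A-Za-z]+")


def key_ids_in_logs(lines: Iterable[str]) -> Counter:
    """Count how often each key ID appears across the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(KEY_ID_PATTERN.findall(line))
    return counts
```

Passing an open log file works directly, since a file object iterates line by line.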

The key you can't reach

The hardest case in B2A key management: the key is embedded in an agent owned by a developer who has left the company, in a system that has no active owner, running on a schedule that nobody has reviewed in six months.

You will encounter this. An agent that was integrated during a hackathon is still making 50 API calls a day three years later. The billing email goes to an inbox nobody monitors. The key has never been rotated.

Mitigate by:

Activity-based alerts: if a key goes from regular activity to zero for 14 days, send an alert. The developer may not know their agent broke. If they respond, the relationship is alive. If they don't, you have advance warning that this key may become permanently orphaned.
Last-activity in the dashboard: show when each key last made a successful call. Keys that haven't been used in 90 days are candidates for cleanup.
Automated expiry for inactive free-tier keys: keys on the free tier that have had no activity for 180 days can be deactivated with 30 days' notice. This keeps your key namespace clean and gives inactive accounts a chance to re-engage.
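
The three mitigations reduce to threshold checks on each key's last successful call. A minimal sketch using the 14, 90, 180, and 30 day figures from the list above; the flag names and the `was_regular` input (whether the key previously showed regular activity) are assumptions.

```python
from datetime import datetime, timedelta

SILENCE_ALERT_AFTER = timedelta(days=14)
CLEANUP_CANDIDATE_AFTER = timedelta(days=90)
FREE_TIER_EXPIRY_AFTER = timedelta(days=180)
DEACTIVATION_NOTICE = timedelta(days=30)


def classify_key_activity(
    last_success: datetime,
    now: datetime,
    was_regular: bool,
    free_tier: bool,
) -> list[str]:
    """Return the hygiene flags that apply to one key."""
    idle = now - last_success
    flags = []
    if was_regular and idle >= SILENCE_ALERT_AFTER:
        flags.append("silence_alert")
    if idle >= CLEANUP_CANDIDATE_AFTER:
        flags.append("cleanup_candidate")
    if free_tier and idle >= FREE_TIER_EXPIRY_AFTER:
        flags.append("schedule_deactivation")
    return flags


def deactivation_date(notice_sent_at: datetime) -> datetime:
    """Earliest deactivation time once the 30-day notice has gone out."""
    return notice_sent_at + DEACTIVATION_NOTICE
```

In practice the silence alert should fire once when the threshold is first crossed rather than on every run, which requires remembering which keys have already been flagged.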

The goal is not to aggressively clean up keys — it's to maintain an accurate picture of which keys belong to active integrations and which are orphaned. Active key management is what lets you send a meaningful "we're changing our API" migration notice and have it reach the developers who are actually affected.

This completes the Stripe and billing infrastructure arc. The next arc covers the operational layer: monitoring, alerting, and the observability patterns that matter when your API consumers are autonomous agents running without human supervision.