Per-Key Usage Dashboards for B2A APIs: What to Show, What to Hide, What to Compute
The previous posts in this arc covered what to log, when to alert, and what SLOs to set. This post covers the fourth layer: what to show integrators in a per-key dashboard.
The design challenge is that B2A dashboards need to serve two audiences with different needs:
- The developer who built the integration and wants to know if it's working correctly
- The agent itself which, in sophisticated setups, may need to query its own usage programmatically to self-regulate
Standard API dashboards — requests this month, error rate, latency — are designed for the first audience and often miss what matters most. This post covers how to design for both.
What the standard dashboard gets wrong
Most API usage dashboards show:

- Total requests (this month / last month)
- Error rate (%)
- Average latency (ms)
- Geographic distribution
These are designed for humans building web apps who want aggregate reassurance. For B2A consumers, these metrics are nearly useless:
- Total requests doesn't tell you if the agent is running on schedule. An agent making a steady 1,000 calls/day and a broken agent that burned the same 30,000 calls in a single runaway loop show the same monthly total.
- Error rate doesn't distinguish between errors the agent handles gracefully and errors that caused task failure.
- Average latency hides the tail that matters (see the previous post on SLOs).
- Geographic distribution is irrelevant for agent traffic — the agent's datacenter doesn't change.
What B2A dashboards should show instead
1. Activity timeline, not totals
Show a bar chart of calls per hour over the last 7 days. This immediately surfaces:

- Whether the agent is running on its expected schedule
- Whether there was a gap (agent stopped) or spike (runaway loop)
- Whether the pattern has changed recently
The first question any developer asks about their agent is "is it running?" A timeline answers this in one glance. A request total does not.
2. Status code distribution by day
Show a stacked bar of status codes per day: 200 (success), 4xx (client errors), 5xx (server errors), and a separate slice for 429 (rate limited).
The 429 slice is the most important for B2A. An agent hitting rate limits produces a distinctive pattern: bursts of 429s followed by backoff gaps. A flat error rate hides this, because a 429 is a different problem from a 500 — it means the agent is working but consuming quota faster than expected.
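Computing the four slices is a simple bucketing pass. A sketch, assuming the log can be read as (date, status code) pairs (the input shape is hypothetical):

```python
from collections import defaultdict

def status_slices_by_day(log_rows):
    """Bucket log rows into the four stacked-bar slices per day.

    log_rows: iterable of (date_str, status_code) pairs — an assumed log shape.
    429 gets its own slice; other 4xx are grouped together, as are 5xx.
    """
    days = defaultdict(lambda: {"2xx": 0, "429": 0, "4xx": 0, "5xx": 0})
    for day, status in log_rows:
        if status == 429:
            days[day]["429"] += 1  # rate limiting tracked separately on purpose
        elif 200 <= status < 300:
            days[day]["2xx"] += 1
        elif 400 <= status < 500:
            days[day]["4xx"] += 1
        elif status >= 500:
            days[day]["5xx"] += 1
    return dict(days)
```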
3. Call pattern analysis
Show derived patterns computed from the raw log:
- Calls per hour (smoothed): is the rate consistent or spiky?
- Unique parameter fingerprints: how many distinct params_hash values? A high number means variety (healthy batch processing); a very low number might mean a stuck retry loop.
- Gap detection: longest observed gap between calls. Flag if this exceeds 24h for a key with regular history.
- Duplicate run detection: consecutive identical calls within 60 seconds. These indicate retry loops without proper deduplication.
4. Cost and quota tracking
Show:

- Calls used vs. plan limit (this billing period)
- Estimated cost at current rate
- Days until quota reset
- Projected overage (if current rate continues)
Agents consuming APIs on a per-call basis need to self-regulate to stay within budget. A developer reviewing their agent's dashboard should be able to answer "will this run over budget this month?" without doing mental arithmetic.
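The projection is the one piece of arithmetic worth doing for the developer. A minimal linear-extrapolation sketch (the function name and parameters are illustrative):

```python
def project_overage(calls_used, plan_limit, days_elapsed, days_in_period):
    """If the current daily rate continues, how many calls over the plan
    limit will this key finish the billing period? 0 means on track."""
    if days_elapsed == 0:
        return 0
    daily_rate = calls_used / days_elapsed
    projected_total = daily_rate * days_in_period
    return max(0, round(projected_total - plan_limit))
```

A linear projection is crude, but for scheduled agent traffic — which tends to be far more regular than human traffic — it is usually accurate enough to answer "will this run over budget?" at a glance.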
5. Endpoint breakdown
If the API has multiple endpoints, show per-endpoint call counts and error rates. An agent that's configured to use a batch endpoint but falling back to single-call mode due to parameter errors will be visible here: high calls on /api/screenshot, near-zero on /api/screenshot/batch.
The machine-readable dashboard: /api/usage
For sophisticated B2A integrations, the agent itself needs to query its usage. This is especially common in agent frameworks that implement self-regulation: "check how many calls I've made today; if over threshold, slow down."
Provide a /api/usage endpoint that returns structured data a key can query about itself:
GET /api/usage
Authorization: Bearer hf_key_01xyz

{
  "key_id": "hf_key_01xyz",
  "period": {
    "start": "2026-04-01T00:00:00Z",
    "end": "2026-04-30T23:59:59Z",
    "days_remaining": 15
  },
  "calls": {
    "total": 4821,
    "limit": 10000,
    "remaining": 5179,
    "today": 312,
    "yesterday": 287
  },
  "errors": {
    "rate_7d": 0.023,
    "rate_limit_7d": 0.008,
    "server_errors_7d": 0.002
  },
  "latency": {
    "p50_ms": 1240,
    "p95_ms": 3820,
    "p99_ms": 7400
  },
  "activity": {
    "last_call": "2026-04-16T08:32:14Z",
    "calls_last_hour": 18,
    "gap_hours_7d_max": 6.2
  }
}
An agent with access to this endpoint can implement patterns like:

- "If calls.remaining < 1000, switch to cached mode"
- "If errors.rate_limit_7d > 0.05, reduce call frequency"
- "If activity.last_call was more than 2 hours ago, emit a warning log"
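A minimal sketch of such a self-regulation loop, using only the standard library. The host in USAGE_URL is hypothetical, and the thresholds are illustrative, not recommendations:

```python
import json
import urllib.request

USAGE_URL = "https://api.example.com/api/usage"  # hypothetical host

def fetch_usage(api_key):
    """Query the key's own usage (response shape shown above)."""
    req = urllib.request.Request(
        USAGE_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def throttle_decision(usage):
    """Map the usage payload to one of three modes; thresholds are illustrative."""
    if usage["calls"]["remaining"] < 1000:
        return "cached"   # nearly out of quota: stop spending calls
    if usage["errors"]["rate_limit_7d"] > 0.05:
        return "slow"     # hitting rate limits: back off
    return "normal"
```

The agent would call throttle_decision(fetch_usage(key)) at the start of each run and adjust its behavior before making any billable calls.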
This turns your dashboard from a passive observation tool into an active feedback channel the agent can use to close the loop.
What not to show
Don't show raw request logs. Per-request detail is a privacy and data volume problem. Aggregate to hourly buckets, show patterns, don't expose every individual URL your API processed. If a developer needs per-request detail for debugging, that's a support ticket, not a dashboard feature.
Don't show other keys' data. This sounds obvious, but dashboards built on shared data sources have accidentally exposed comparative data (e.g., "your usage vs. average user"). For B2A, the only relevant comparison is the key's own history.
Don't surface latency information that might reveal target site characteristics. If your screenshot API logs per-URL latency and a developer can query their usage history, you're potentially revealing how long competitor sites take to render — competitive intelligence you don't want to be the source of.
The self-service debugging workflow
The most valuable thing a B2A dashboard can enable is this workflow:
1. Developer gets an alert: "your agent has been silent for 24 hours"
2. Developer opens the dashboard
3. Dashboard shows: last call was yesterday at 14:32Z, everything was fine, then silence
4. Developer also sees: 47 rate-limit errors in the 2 hours before the silence
5. Developer understands: the agent hit rate limits, its retry logic failed, and it stalled
Without the dashboard, the developer has to look at their agent's internal logs. With the dashboard, they can reconstruct the failure from the API provider's side. That's a support call that doesn't happen, and a problem that gets resolved faster.
Design the dashboard to make this workflow possible in under 60 seconds.
Part of the API observability for autonomous agents arc. Previous: Latency SLOs for B2A APIs. Next: the gap between agent health and API health — why your API can be healthy while your consumer's workflow is failing.