Alerting Thresholds for B2A APIs: When to Fire, Who to Notify, How to Avoid Fatigue

2026-04-02 | Tags: [observability, alerting, api, b2a, monitoring, ai-agents, autonomous-systems, sre]

The previous post covered what to log when your API consumers are autonomous agents. This post covers what to do with those logs: how to set alert thresholds that catch real problems, reach the right people, and don't train everyone to ignore the paging system.

The challenge is asymmetric: when a human developer's integration breaks, they notice. When an agent's integration breaks, nobody notices — until something downstream fails, a billing anomaly surfaces, or a customer asks why their automated report stopped running three weeks ago.

Alerts are the compensation mechanism for that asymmetry.

The three alert tiers

Not all B2A failures warrant the same response urgency.

Tier 1 — Silent failures (agent stopped working, developer doesn't know):
- Call gap anomaly: key has been silent for 24h+ after regular activity
- Error rate spike: error rate for a key doubled in the last hour

Tier 2 — Degrading performance (agent working but poorly):
- Latency regression: p95 increased by more than 50% in the last hour
- Elevated 429 rate: key hitting rate limits on more than 20% of calls
- Repeated identical calls: same params_hash seen more than 10 times in 5 minutes

Tier 3 — Potential abuse or misconfiguration (pattern doesn't look like normal use):
- Anomalous call volume: 10x normal rate with no prior ramp-up
- Unusual endpoint sequence: hitting endpoints in an order that suggests probing
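The repeated-identical-calls check from Tier 2 is easy to sketch. A minimal version, assuming calls for a single key are available as (timestamp, params_hash) pairs — the function name and data shape are illustrative, not a prescribed API:

```python
from collections import Counter
from datetime import datetime, timedelta

def check_repeated_calls(calls, now, window_minutes=5, max_repeats=10):
    """Flag a likely stuck loop: the same params_hash seen more than
    max_repeats times within the trailing window.

    calls: iterable of (timestamp, params_hash) tuples for one API key.
    """
    cutoff = now - timedelta(minutes=window_minutes)
    counts = Counter(h for ts, h in calls if ts >= cutoff)
    return any(n > max_repeats for n in counts.values())
```

Note that a batch job hitting the same endpoint with varying parameters produces distinct params_hash values and won't trip this check, which is exactly the behavior the "what not to alert on" section below calls for.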

Tier 1 alerts go to the key owner via email and webhook (if configured). Tier 2 alerts go to email only — they're important but not urgent. Tier 3 alerts go to your internal ops team in addition to the key owner.

Threshold mechanics

The mistake is using static thresholds: "alert if error rate > 10%." This fires constantly for keys with irregular usage patterns and misses degradation that stays just under the threshold.

Use relative thresholds anchored to recent history:

# get_error_rate, get_error_rate_baseline, and the latency helpers below
# are assumed wrappers over your metrics store.
def check_error_rate_spike(api_key_id: str, window_minutes: int = 60) -> bool:
    """Alert if current error rate is >2x the 7-day baseline."""
    current = get_error_rate(api_key_id, minutes=window_minutes)
    baseline = get_error_rate_baseline(api_key_id, days=7)
    if baseline < 0.005:  # less than 0.5% baseline — don't alert on noise
        return False
    return current > baseline * 2.0

def check_latency_regression(endpoint: str, window_minutes: int = 60) -> bool:
    """Alert if p95 latency is >1.5x the 7-day baseline."""
    current_p95 = get_latency_percentile(endpoint, pct=95, minutes=window_minutes)
    baseline_p95 = get_latency_p95_baseline(endpoint, days=7)
    return current_p95 > baseline_p95 * 1.5

The baseline < 0.005 guard is important: a key that makes 3 calls/day with 1 error has a 33% error rate, but this isn't a signal that warrants an alert. Relative thresholds only make sense above a minimum volume floor.

The call gap alert in detail

This is the most valuable alert for B2A APIs and the most underbuilt.

-- Keys that were active in the past 7 days but silent for the past 24h
-- Filter: must have had at least 5 calls/day average to rule out occasional users
SELECT
    api_key_id,
    MAX(timestamp) AS last_call,
    COUNT(*) AS calls_7d,
    COUNT(*) / 7.0 AS avg_calls_per_day
FROM api_calls
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY api_key_id
HAVING
    MAX(timestamp) < NOW() - INTERVAL '24 hours'
    AND COUNT(*) / 7.0 >= 5.0
ORDER BY last_call DESC;

When this fires, send a one-time email to the key owner:

Subject: We haven't seen a call from your key hf_key_01xyz in 24 hours

Your integration was making an average of 47 calls per day before 2026-04-13 14:32 UTC. We haven't seen a call since.

This could be:
- A deployment issue on your end
- A timeout causing client-side failures your server isn't seeing
- Expected — if you've paused the integration, no action needed

If something broke: [link to dashboard] [link to docs]

This email has a very low false-positive rate and very high value. Developers genuinely don't know when their agent breaks. An unprompted email that catches it early is received as a feature, not an intrusion.

Suppress a second email for the same key for 48 hours to avoid spamming during an extended outage.

Alert fatigue is a failure mode

An alert system that pages too often gets ignored. For B2A APIs specifically, this is a serious risk because many B2A consumers run at unusual hours and have bursty usage patterns that trip naive thresholds.

Don't alert on individual errors. Alert on error rates and sustained patterns. A single 500 in a night of otherwise-clean traffic is not a signal.

Don't alert on keys below a volume floor. Set a minimum: no alerts for keys with fewer than 20 calls in the past 7 days. These integrations are too sparse to have meaningful baselines.

Don't send duplicate alerts. Track alert state per key. Once an alert has fired for a condition, don't re-fire until the condition clears and re-triggers. A key that has been silent for 72 hours should have generated exactly one "silent" alert, not three.

Time-bound the suppression window. After 7 days of silence, stop alerting — at that point, the key is probably intentionally dormant or orphaned. Move it to a different monitoring category.
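These dedup rules fit in one small state tracker. A sketch, assuming alert state is held in memory per (key, condition) pair — a real system would persist this, and every name here is illustrative:

```python
from datetime import datetime, timedelta

SUPPRESS_WINDOW = timedelta(hours=48)   # no duplicate emails within 48h
DORMANT_AFTER = timedelta(days=7)       # stop alerting on long-dormant keys

class AlertTracker:
    def __init__(self):
        self.active = set()      # (key, condition) pairs currently firing
        self.last_fired = {}     # (key, condition) -> last alert time

    def should_send(self, key, condition, condition_true, now, silent_since=None):
        """Fire once per episode, re-arm when the condition clears,
        suppress repeats for 48h, and go quiet for dormant keys."""
        k = (key, condition)
        if not condition_true:
            self.active.discard(k)          # condition cleared: re-arm
            return False
        if silent_since is not None and now - silent_since > DORMANT_AFTER:
            return False                    # dormant/orphaned: other category
        if k in self.active:
            return False                    # already alerted this episode
        last = self.last_fired.get(k)
        if last is not None and now - last < SUPPRESS_WINDOW:
            return False                    # inside the suppression window
        self.active.add(k)
        self.last_fired[k] = now
        return True
```

With this in place, a key silent for 72 hours generates exactly one alert: the first check fires, subsequent checks see the episode is already active, and even a clear-and-retrigger inside 48 hours stays suppressed.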

Who to alert: the layered routing problem

B2A has a complication that consumer-facing products don't: the person you can reach (the account owner who created the key) may not be the person responsible for the agent (the engineer who built the integration three months ago).

Design your alert routing to account for this:

  1. Primary channel: the webhook URL, if one was configured at provisioning
  2. Fallback: email to the account owner (webhook failed, or none configured)
  3. Escalation (for Tier 3 anomalies): internal ops team in addition to the primary channel

Provide a way for key owners to configure a separate "ops email" distinct from their billing email — a team alias like platform-alerts@company.com that reaches whoever is actually responsible for the integration. Many B2A integrations are owned by a team, not an individual, and the individual who created the account may have left.
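That routing ladder is straightforward to sketch. A minimal version with injectable senders, so the webhook and email transports stay out of the routing logic — every name and field here is illustrative:

```python
def route_alert(alert, key_config, send_webhook, send_email,
                ops_email="platform-ops@example.com"):
    """Deliver one alert per the ladder: webhook first if configured,
    email fallback, internal ops escalation for Tier 3.

    send_webhook(url, alert) -> bool (True on delivery success)
    send_email(to, alert) -> None
    Returns the list of channels used.
    """
    channels = []
    url = key_config.get("webhook_url")
    if url and send_webhook(url, alert):
        channels.append("webhook")
    else:
        # Webhook missing or failed: fall back to email, preferring a
        # configured ops alias over the account owner's billing address.
        to = key_config.get("ops_email") or key_config["owner_email"]
        send_email(to, alert)
        channels.append("email")
    if alert.get("tier") == 3:
        send_email(ops_email, alert)
        channels.append("internal_ops")
    return channels
```

Keeping the transports injectable also makes the routing logic trivially testable without a mail server or a live webhook endpoint.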

What not to alert on

A common category of false alerts: conditions that look wrong from the outside but are normal B2A behavior.

Irregular call timing: Agents don't call APIs on a human schedule. A burst of 200 calls at 3am followed by silence until 11am is completely normal for a cron-driven pipeline. Don't alert on the gap between the burst and the resumption.

Repeated calls with different parameters: An agent scraping a list of URLs will make the same endpoint call many times with different parameters. The params_hash per-call will vary, so this isn't a stuck loop — it's normal batch processing.

Elevated error rate immediately after a new key is provisioned: The first few calls from a new key often include test calls, misconfigured requests, and exploration. A 30% error rate in the first 10 calls is developer onboarding, not a production incident.

The alert as a product feature

B2A alerts are not just an ops tool — they're a retention mechanism. An agent integration that breaks silently is a churned integration. The developer didn't cancel; they just stopped using your API, possibly without noticing.

A proactive "your agent seems to have stopped" email is a customer success touchpoint. It demonstrates that you're watching the integration. It often recovers a churned user who didn't know they were churning.

Frame the email accordingly: not "there may be a problem" but "your integration seems to have paused — here's what we saw and how to check." The tone should be the same as a support engineer who noticed something, not an automated alert from a monitoring system.


Part of the API observability for autonomous agents arc. Previous: What to Log When Your API Consumer Has No Browser Session. Next: latency SLOs for agent consumers.