Agent Error Recovery: Handling CAPTCHAs, Login Walls, and 500s from Screenshots

2026-03-25 | Tags: [screenshot-api, ai-agents, error-handling, python, llm, vision, robustness, automation]

The first time I ran the deploy verification agent against production, it reported success on every page. I felt good about it. Then I realized every page had been redirected to a maintenance page: all five screenshots showed the same "We'll be right back" banner, and the agent had decided each one met its goal (page looks healthy, no error indicators).

That failure mode is specific: the agent had no way to distinguish "goal achieved" from "entirely different page that happens to look okay." It needed to recognize unexpected page states — not just success and failure within the expected page, but the full taxonomy of what can go wrong when you navigate to a URL.

Here's the error recovery taxonomy I've settled on after running these agents in production for several months.

The Unexpected State Taxonomy

Expected states:
  ✓ Goal achieved — page matches expectation
  ✗ Goal failed — page loaded but expected state is wrong

Unexpected states (require special handling):
  🔒 Auth wall — login/signup page instead of destination
  🤖 CAPTCHA — challenge gate blocking access
  ⚠️  Error page — 4xx/5xx rendered in the page content
  🔧 Maintenance — maintenance/deployment in progress page
  🌐 Domain mismatch — redirected to unexpected domain
  ⌛ Loading timeout — page captured mid-load
  📭 Empty page — blank or near-blank render
  🔄 Unexpected redirect — final URL differs significantly from requested URL

Each of these requires a different recovery strategy. Lumping them together as "failed" loses the information needed to recover.
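The classifier has to infer "redirected to an unexpected domain" from pixels alone. When you do have the final URL (the capture() helper in Step 1 doesn't return one, so this assumes it comes from elsewhere, such as your screenshot API's response or a separate request), a deterministic host comparison is a cheap cross-check. This is a minimal sketch: `is_domain_mismatch`, the `allowed_hosts` parameter, and the rule of ignoring a leading `www.` are all hypothetical choices, not part of any API.

```python
from urllib.parse import urlsplit


def _normalized_host(url):
    """Lowercased hostname with a leading 'www.' stripped (an assumed normalization rule)."""
    host = (urlsplit(url).hostname or '').lower()
    return host[4:] if host.startswith('www.') else host


def is_domain_mismatch(requested_url, final_url, allowed_hosts=()):
    """
    True when the final URL lands on a different host than the requested one.
    Hosts listed in allowed_hosts (e.g. a known SSO provider) are not treated as mismatches.
    """
    requested = _normalized_host(requested_url)
    final = _normalized_host(final_url)
    if final == requested:
        return False
    allowed = {h.lower() for h in allowed_hosts}
    return final not in allowed


print(is_domain_mismatch('https://app.example.com/dash', 'https://www.app.example.com/dash'))  # False
print(is_domain_mismatch('https://app.example.com/dash', 'https://parked.example.net/'))        # True
```

A different subdomain of your own site (say, an auth subdomain) counts as a mismatch under this rule, which is why the allow-list exists.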

Step 1: Classify the Page State

The classifier runs before the goal check. It makes a single LLM call per page, focused only on state detection.

import os
import io
import base64
import json
import requests
from PIL import Image
from openai import OpenAI

client = OpenAI()
API_KEY = os.environ['SCREENSHOT_API_KEY']
BASE_URL = 'https://hermesforge.dev/api/screenshot'

PAGE_STATES = [
    'goal_achieved',
    'goal_failed',
    'auth_wall',
    'captcha',
    'error_page',
    'maintenance',
    'domain_mismatch',
    'loading_timeout',
    'empty_page',
    'unexpected',
]


def capture(url, delay=2000, width=1280, height=900):
    resp = requests.get(
        BASE_URL,
        params={
            'url': url,
            'width': width,
            'height': height,
            'format': 'png',
            'delay': delay,
        },
        headers={'X-API-Key': API_KEY},
        timeout=60,
    )
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content)).convert('RGB')


def image_to_base64(img):
    buf = io.BytesIO()
    img.save(buf, format='PNG')
    return base64.b64encode(buf.getvalue()).decode('utf-8')


def classify_page_state(img, goal, requested_url, actual_url=None):
    """
    Classify what kind of page the agent has encountered.
    Returns structured state classification.
    """
    img_b64 = image_to_base64(img)

    url_note = ''
    if actual_url and actual_url != requested_url:
        url_note = f'\nNOTE: Requested URL was {requested_url} but final URL is {actual_url}. This may indicate a redirect.'

    resp = client.chat.completions.create(
        model='gpt-4o',
        messages=[{
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': (
                        f'You are an agent navigation classifier. Analyze this screenshot and determine the page state.\n\n'
                        f'Goal: {goal}\n'
                        f'URL: {requested_url}{url_note}\n\n'
                        f'Classify the page state as one of:\n'
                        f'- goal_achieved: Page matches the goal expectation\n'
                        f'- goal_failed: Page loaded correctly but goal is not met (e.g. wrong content, error state within the app)\n'
                        f'- auth_wall: A login, signup, or authentication page is shown instead of the destination\n'
                        f'- captcha: A CAPTCHA, bot challenge, or human verification gate is shown\n'
                        f'- error_page: An HTTP error page (404, 500, 403, etc.) or application error is shown\n'
                        f'- maintenance: A maintenance mode, deployment in progress, or "be right back" page\n'
                        f'- domain_mismatch: Page content appears to be from a completely different site/domain\n'
                        f'- loading_timeout: Page appears to still be loading (spinner, blank, partial content)\n'
                        f'- empty_page: Page is blank or near-blank with no meaningful content\n'
                        f'- unexpected: None of the above — describe what you see\n\n'
                        f'Return JSON: {{"state": "...", "confidence": 0.0-1.0, "evidence": "what you see that led to this classification", "recovery_hint": "suggested action"}}'
                    )
                },
                {
                    'type': 'image_url',
                    'image_url': {
                        'url': f'data:image/png;base64,{img_b64}',
                        'detail': 'high',
                    }
                },
            ],
        }],
        response_format={'type': 'json_object'},
        max_tokens=300,
    )

    return json.loads(resp.choices[0].message.content)
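Even with `response_format={'type': 'json_object'}`, the model's JSON can drift from the contract in the prompt: an invented label, a missing key, or confidence returned as a string. A small normalizer placed between classify_page_state and the recovery logic keeps downstream code predictable. This is a sketch; `normalize_classification` is a hypothetical helper, and `VALID_STATES` simply mirrors the PAGE_STATES list above. You would wire it in as `normalize_classification(classify_page_state(img, goal, url))`.

```python
import math

# Mirrors PAGE_STATES from the classifier code
VALID_STATES = frozenset({
    'goal_achieved', 'goal_failed', 'auth_wall', 'captcha', 'error_page',
    'maintenance', 'domain_mismatch', 'loading_timeout', 'empty_page', 'unexpected',
})


def normalize_classification(raw):
    """
    Coerce a classifier response dict into a predictable shape:
    unknown or missing states become 'unexpected'; confidence becomes a float in [0, 1].
    """
    state = raw.get('state')
    if not isinstance(state, str) or state not in VALID_STATES:
        state = 'unexpected'

    try:
        confidence = float(raw.get('confidence', 0.0))
    except (TypeError, ValueError):
        confidence = 0.0  # Unparseable confidence is treated as no confidence
    if math.isnan(confidence):
        confidence = 0.0
    confidence = min(max(confidence, 0.0), 1.0)

    return {
        'state': state,
        'confidence': confidence,
        'evidence': str(raw.get('evidence', '')),
        'recovery_hint': str(raw.get('recovery_hint', '')),
    }


# An invented label and a string confidence: state becomes 'unexpected', confidence 0.9
print(normalize_classification({'state': 'paywall', 'confidence': '0.9'}))
```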

Step 2: Recovery Strategies Per State

Each state has a defined recovery strategy. The key design principle: recovery strategies should be stateless and idempotent — they can be retried safely without side effects.

import time
from enum import Enum
from dataclasses import dataclass, field
from typing import Optional


class RecoveryAction(Enum):
    RETRY_WITH_LONGER_DELAY = 'retry_with_longer_delay'
    RETRY_WITH_AUTH = 'retry_with_auth'
    SKIP_PAGE = 'skip_page'
    ABORT_MISSION = 'abort_mission'
    ALERT_HUMAN = 'alert_human'
    MARK_AS_FAILED = 'mark_as_failed'


@dataclass
class PageResult:
    url: str
    state: str
    goal: str
    success: bool
    evidence: str
    recovery_action: Optional[RecoveryAction] = None
    retry_count: int = 0
    notes: list = field(default_factory=list)


RECOVERY_STRATEGIES = {
    'goal_achieved': {
        'action': None,  # No recovery needed
        'retryable': False,
        'success': True,
    },
    'goal_failed': {
        'action': RecoveryAction.MARK_AS_FAILED,
        'retryable': False,
        'success': False,
    },
    'auth_wall': {
        'action': RecoveryAction.RETRY_WITH_AUTH,
        'retryable': True,
        'max_retries': 1,
        'success': False,
        'note': 'Session may have expired or page requires authenticated access',
    },
    'captcha': {
        'action': RecoveryAction.ALERT_HUMAN,
        'retryable': False,
        'success': False,
        'note': 'CAPTCHA gates cannot be bypassed by automated agents',
    },
    'error_page': {
        'action': RecoveryAction.RETRY_WITH_LONGER_DELAY,
        'retryable': True,
        'max_retries': 2,
        'retry_delay': 30,  # seconds
        'success': False,
        'note': 'Server error — may be transient',
    },
    'maintenance': {
        'action': RecoveryAction.RETRY_WITH_LONGER_DELAY,
        'retryable': True,
        'max_retries': 3,
        'retry_delay': 120,  # seconds — maintenance windows often last minutes
        'success': False,
        'note': 'Deployment or maintenance in progress',
    },
    'domain_mismatch': {
        'action': RecoveryAction.ALERT_HUMAN,
        'retryable': False,
        'success': False,
        'note': 'Unexpected redirect to different domain — possible DNS issue or domain expiry',
    },
    'loading_timeout': {
        'action': RecoveryAction.RETRY_WITH_LONGER_DELAY,
        'retryable': True,
        'max_retries': 2,
        'retry_delay': 5,
        'success': False,
        'note': 'Page may need more time to render',
    },
    'empty_page': {
        'action': RecoveryAction.RETRY_WITH_LONGER_DELAY,
        'retryable': True,
        'max_retries': 1,
        'retry_delay': 3,
        'success': False,
        'note': 'Blank page — possible JS rendering failure or very slow load',
    },
    'unexpected': {
        'action': RecoveryAction.ALERT_HUMAN,
        'retryable': False,
        'success': False,
        'note': 'Unrecognized page state — human review required',
    },
}
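Because the agent looks strategies up with `RECOVERY_STRATEGIES.get(state, RECOVERY_STRATEGIES['unexpected'])`, a missing or malformed entry doesn't crash anything; it quietly degrades to human review. A startup check makes that failure loud. This is a sketch: `validate_strategies` is hypothetical, and its rules (every state has an entry, retryable entries have a positive `max_retries`, success entries carry no action) are my reading of the table's conventions.

```python
def validate_strategies(states, strategies):
    """Return a list of human-readable problems with a recovery strategy table."""
    problems = []
    for state in states:
        if state not in strategies:
            problems.append(f'{state}: no strategy defined')
    for state, strategy in strategies.items():
        for key in ('action', 'retryable', 'success'):
            if key not in strategy:
                problems.append(f'{state}: missing {key!r}')
        if strategy.get('retryable') and strategy.get('max_retries', 0) < 1:
            problems.append(f'{state}: retryable but max_retries < 1')
        if strategy.get('success') and strategy.get('action') is not None:
            problems.append(f'{state}: success state should not carry a recovery action')
    return problems


# A deliberately broken table: 'captcha' is missing and 'maintenance' has no retry budget
example_table = {
    'goal_achieved': {'action': None, 'retryable': False, 'success': True},
    'maintenance': {'action': 'retry_with_longer_delay', 'retryable': True, 'success': False},
}
for problem in validate_strategies(['goal_achieved', 'maintenance', 'captcha'], example_table):
    print(problem)
```

Run against PAGE_STATES and the RECOVERY_STRATEGIES table above, this returns an empty list, so `assert not validate_strategies(PAGE_STATES, RECOVERY_STRATEGIES)` at startup is cheap insurance for future edits.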

Step 3: The Recovery-Aware Agent

def check_page_with_recovery(url, goal, auth_cookies=None, max_retries=None):
    """
    Navigate to a page, classify its state, and apply the appropriate recovery strategy.
    Returns a PageResult with full context.
    """
    result = PageResult(url=url, state='unknown', goal=goal, success=False, evidence='')
    delay = 2000

    for attempt in range(10):  # Outer limit — never infinite
        try:
            img = capture(url, delay=delay)
            classification = classify_page_state(img, goal, url)

            state = classification.get('state', 'unexpected')
            strategy = RECOVERY_STRATEGIES.get(state, RECOVERY_STRATEGIES['unexpected'])

            result.state = state
            result.evidence = classification.get('evidence', '')
            result.retry_count = attempt

            # Success — done
            if strategy['success']:
                result.success = True
                return result

            # Non-retryable state
            if not strategy['retryable']:
                result.recovery_action = strategy['action']
                result.notes.append(strategy.get('note', ''))
                result.notes.append(f'Recovery hint: {classification.get("recovery_hint", "N/A")}')
                return result

            # Auth walls escalate to a human: cookie injection would require the screenshot
            # API to support it (or a proxy approach), which this code doesn't assume
            if state == 'auth_wall':
                if auth_cookies:
                    result.notes.append('Auth cookies provided, but cookie injection is not supported')
                result.recovery_action = RecoveryAction.ALERT_HUMAN
                result.notes.append('Session needs refreshing: human login required')
                return result

            # Check retry budget (an explicit max_retries=0 is honored rather than ignored)
            if max_retries is not None:
                effective_max = max_retries
            else:
                effective_max = strategy.get('max_retries', 1)
            if attempt >= effective_max:
                result.recovery_action = RecoveryAction.MARK_AS_FAILED
                result.notes.append(f'Retry budget exhausted after {attempt + 1} attempts')
                return result

            # Apply retry strategy
            if state in ('loading_timeout', 'empty_page'):
                delay = min(delay * 2, 8000)  # Double the capture delay, max 8 seconds
                print(f'  Slow page: retrying with {delay}ms delay')

            elif state in ('error_page', 'maintenance'):
                wait_seconds = strategy.get('retry_delay', 10)
                print(f'  {state} — waiting {wait_seconds}s before retry')
                time.sleep(wait_seconds)

            result.notes.append(f'Attempt {attempt + 1}: {state} — retrying')

        except requests.RequestException as e:  # HTTP errors, timeouts, connection failures
            result.state = 'capture_failed'
            result.evidence = str(e)
            result.recovery_action = RecoveryAction.MARK_AS_FAILED
            return result

    result.recovery_action = RecoveryAction.MARK_AS_FAILED
    result.notes.append('Exceeded maximum retry attempts')
    return result
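Because check_page_with_recovery calls capture() and the LLM directly, exercising its retry paths costs real API calls. One way to test the control flow offline is to re-sketch it with capture and classification folded into an injected `observe` function and with `sleep` injectable. This is a simplified, hypothetical re-sketch (`simulate_recovery`, `scripted`, and the trimmed `STRATEGIES` table are illustrative names), not a drop-in replacement; it models auth walls as non-retryable escalations and skips the notes bookkeeping.

```python
def simulate_recovery(observe, strategies, max_attempts=10, initial_delay_ms=2000,
                      sleep=lambda seconds: None):
    """
    Simplified re-sketch of check_page_with_recovery's control flow for offline tests.
    observe(delay_ms) returns a state label (standing in for capture + classify);
    sleep(seconds) is injected so retries don't actually wait.
    Returns (final_state, outcome, attempts) with outcome 'success', 'human' or 'failed'.
    """
    delay = initial_delay_ms
    state = None
    for attempt in range(max_attempts):
        state = observe(delay)
        strategy = strategies.get(state, strategies['unexpected'])
        if strategy['success']:
            return state, 'success', attempt + 1
        if not strategy['retryable']:
            outcome = 'human' if strategy['action'] == 'alert_human' else 'failed'
            return state, outcome, attempt + 1
        if attempt >= strategy.get('max_retries', 1):
            return state, 'failed', attempt + 1  # Retry budget exhausted
        if state in ('loading_timeout', 'empty_page'):
            delay = min(delay * 2, 8000)  # Re-capture with a longer render delay
        elif state in ('error_page', 'maintenance'):
            sleep(strategy.get('retry_delay', 10))  # Wait out a transient server state
    return state, 'failed', max_attempts


# Trimmed strategy table; auth_wall is modeled as a non-retryable escalation
STRATEGIES = {
    'goal_achieved': {'action': None, 'retryable': False, 'success': True},
    'goal_failed': {'action': 'mark_as_failed', 'retryable': False, 'success': False},
    'auth_wall': {'action': 'alert_human', 'retryable': False, 'success': False},
    'captcha': {'action': 'alert_human', 'retryable': False, 'success': False},
    'maintenance': {'action': 'retry_with_longer_delay', 'retryable': True,
                    'max_retries': 3, 'retry_delay': 120, 'success': False},
    'loading_timeout': {'action': 'retry_with_longer_delay', 'retryable': True,
                        'max_retries': 2, 'success': False},
    'unexpected': {'action': 'alert_human', 'retryable': False, 'success': False},
}


def scripted(states):
    """Return an observe() stub that replays states in order, plus the delays it was given."""
    seen_delays = []
    queue = list(states)

    def observe(delay_ms):
        seen_delays.append(delay_ms)
        return queue.pop(0)
    return observe, seen_delays


observe, delays = scripted(['loading_timeout', 'loading_timeout', 'goal_achieved'])
print(simulate_recovery(observe, STRATEGIES), delays)
```

Scripting the states makes each path (slow page that recovers, maintenance that outlasts the budget, an unknown label) a one-line test.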

Step 4: Running a Resilient Mission

def run_mission(pages, mission_name='Agent Mission'):
    """
    Run a multi-page verification mission with full error recovery.
    """
    print(f'\n=== {mission_name} ===')
    results = []
    human_review_required = []

    for url, goal in pages:
        print(f'\nChecking: {url}')
        result = check_page_with_recovery(url, goal)
        results.append(result)

        if result.success:
            print(f'  ✓ {result.state}')
        elif result.recovery_action == RecoveryAction.ALERT_HUMAN:
            print(f'  ⚠️  {result.state} — human review required')
            human_review_required.append(result)
        else:
            print(f'  ✗ {result.state}: {result.evidence[:100]}')

        if result.notes:
            for note in result.notes:
                print(f'    → {note}')

    # Summary
    passed = sum(1 for r in results if r.success)
    failed = sum(1 for r in results if not r.success and r.recovery_action != RecoveryAction.ALERT_HUMAN)
    needs_human = len(human_review_required)

    print(f'\n=== Results: {passed} passed, {failed} failed, {needs_human} need human review ===')

    if human_review_required:
        print('\nPages requiring human review:')
        for r in human_review_required:
            print(f'  - {r.url}: {r.state}')
            for note in r.notes:
                print(f'    {note}')

    return results, human_review_required

Real-World Error Patterns

After running this in production for several months, here's what I actually see and how each gets handled:

Auth walls (most common): Session tokens expire. The agent hits a login page instead of the dashboard. Recovery: flag for human review with a note that session needs to be refreshed. Alternatively, inject a fresh auth token as a cookie if your screenshot API supports it.

Loading timeouts (second most common): SPAs with heavy API calls need 3000-4000ms to fully render. The default 2000ms catches ~80% of pages. The rest need longer delays. The doubling strategy (2s → 4s → 8s) resolves 95% of these without human intervention.
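The doubling schedule can be computed up front, which makes the worst-case render wait explicit. A small sketch; `delay_schedule` is a hypothetical helper whose defaults mirror Step 3: a 2000 ms starting delay, an 8000 ms cap, and loading_timeout's budget of two retries.

```python
def delay_schedule(initial_ms=2000, cap_ms=8000, max_retries=2):
    """Capture delays for the first attempt plus each retry, doubling and capped at cap_ms."""
    delays = [initial_ms]
    for _ in range(max_retries):
        delays.append(min(delays[-1] * 2, cap_ms))
    return delays


print(delay_schedule())               # [2000, 4000, 8000]
print(delay_schedule(max_retries=1))  # [2000, 4000]  (empty_page's budget)
print(sum(delay_schedule()) / 1000)   # 14.0 seconds of delay across all attempts
```

The 14 seconds covers only the configured capture delays; screenshot API and LLM latency come on top of that.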

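Maintenance pages: These cluster around deploy windows. If you run deployment verification immediately after a deploy, it races against the maintenance page being taken down. The 120-second retry strategy (three retries, so up to six minutes of waiting) catches the end of most maintenance windows automatically; a longer window exhausts the retry budget and the page is marked as failed.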
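The arithmetic behind that bound: the Step 3 loop sleeps `retry_delay` seconds before each retry of an error_page or maintenance state, so the worst-case idle time is `max_retries × retry_delay`. A minimal sketch with a hypothetical helper; the dicts copy the relevant fields from the strategy table. It only applies to the states the loop actually sleeps on; the `retry_delay` values on loading_timeout and empty_page are not used by that loop, which lengthens the capture delay instead.

```python
def worst_case_sleep_seconds(strategy, default_delay=10):
    """Seconds spent sleeping if every retry in the budget is used (ignores capture and LLM time)."""
    if not strategy.get('retryable'):
        return 0
    # default_delay mirrors strategy.get('retry_delay', 10) in the Step 3 loop
    return strategy.get('max_retries', 1) * strategy.get('retry_delay', default_delay)


maintenance = {'retryable': True, 'max_retries': 3, 'retry_delay': 120}
error_page = {'retryable': True, 'max_retries': 2, 'retry_delay': 30}
print(worst_case_sleep_seconds(maintenance))  # 360
print(worst_case_sleep_seconds(error_page))   # 60
```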

CAPTCHAs: Usually appear when the same IP makes many requests in a short window. The screenshot API's IP may get flagged on high-security pages (banking, government sites). No automated recovery — alert human and investigate rate limiting.
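Nothing recovers a page once a CAPTCHA is shown, but if the trigger is request rate from one IP, spacing out captures within a mission may make them rarer. One possible mitigation is a minimum-interval throttle, sketched below under that assumption; `MinIntervalThrottle` and any interval you pick are hypothetical and not tuned against any site's limits. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


class MinIntervalThrottle:
    """Block until at least min_interval seconds have passed since the previous wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # Time of the previous request, or None before the first one

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()
```

In run_mission you would create one throttle (say `MinIntervalThrottle(5.0)`) and call `throttle.wait()` before each `check_page_with_recovery(url, goal)`. Retries inside that function also issue captures, so they are not covered unless you call the throttle there too.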

Domain mismatches: These are the scariest ones. I've seen them from: domain expiry (the registrar's parking page showed up), CDN misconfiguration (wrong origin served), and legitimate A/B tests (Optimizely redirected to a variant domain). All require human investigation — automated recovery would be dangerous.

Empty pages: Often a race condition with client-side rendering frameworks. The JavaScript bundle hasn't executed yet when the screenshot is captured. Doubling the delay resolves these reliably.
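A near-blank render can often be flagged before spending an LLM call. Pillow's `Image.histogram()` on a grayscale (`'L'`) image returns 256 bin counts; if almost every pixel falls into one bin, the page is probably blank. This is a heuristic sketch: `looks_blank` and the 0.98 threshold are hypothetical and untuned, and a legitimately sparse page can trip it, so treat a hit as a hint rather than a verdict. In the agent it would be called as `looks_blank(img.convert('L').histogram())`. A spinner on a white background may also trip it, which is harmless here because empty_page and loading_timeout both retry with a longer delay.

```python
def dominant_fraction(histogram):
    """Fraction of pixels in the most populated bin of a 256-bin grayscale histogram."""
    total = sum(histogram)
    if total == 0:
        return 1.0  # No pixels at all: treat as blank
    return max(histogram) / total


def looks_blank(histogram, threshold=0.98):
    """Heuristic: True when nearly every pixel shares one gray level."""
    return dominant_fraction(histogram) >= threshold


# Synthetic histograms: a 100x100 all-white image, and one where 10% of pixels are dark text
all_white = [0] * 255 + [10_000]
with_text = [1_000] + [0] * 254 + [9_000]
print(looks_blank(all_white), looks_blank(with_text))  # True False
```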

The Confidence Threshold

One more useful pattern: for states where the classifier is uncertain, treat low confidence as "unexpected" regardless of the classification:

def get_effective_state(classification):
    """
    Downgrade low-confidence classifications to 'unexpected'.
    A confident wrong classification is worse than an honest 'I don't know'.
    """
    confidence = classification.get('confidence', 0.0)
    state = classification.get('state', 'unexpected')

    if confidence < 0.75:
        return 'unexpected'  # Trigger human review when uncertain

    return state

The threshold of 0.75 came from testing — below that, the classification is wrong often enough that treating it as "unexpected" and triggering human review is the better outcome.
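"Came from testing" can be made repeatable. Given a hand-labeled set of `(confidence, was_correct)` pairs from past runs, you can compute, for each candidate threshold, how often the accepted classifications were wrong and what share of pages would be routed to human review. A sketch with a hypothetical `threshold_report` helper; it uses the same rule as get_effective_state (accept when confidence is at least the threshold). The sample labels below are made up purely to exercise the function and are not measurements.

```python
def threshold_report(samples, thresholds):
    """
    samples: list of (confidence, was_correct) pairs from hand-labeled past runs.
    For each threshold, return (threshold, error rate among accepted, share sent to review).
    """
    report = []
    for t in thresholds:
        accepted = [correct for conf, correct in samples if conf >= t]
        errors = sum(1 for correct in accepted if not correct)
        error_rate = errors / len(accepted) if accepted else 0.0
        review_share = 1 - len(accepted) / len(samples) if samples else 0.0
        report.append((t, error_rate, review_share))
    return report


# Illustrative, made-up labels (not real measurements)
samples = [(0.95, True), (0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
for t, err, review in threshold_report(samples, [0.5, 0.75, 0.9]):
    print(f'threshold {t:.2f}: error rate {err:.0%}, sent to review {review:.0%}')
```

The trade-off is explicit in the output: raising the threshold lowers the error rate among accepted classifications but sends more pages to a human.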

Putting It Together

The error recovery layer transforms agents from fragile to production-grade. Without it, any unexpected page state — a maintenance window, an expired session, a slow CDN — silently produces wrong results. With it, the agent either recovers automatically (transient errors) or surfaces the right information for human resolution (structural problems).

The taxonomy is the key insight. "Failed" is not a useful state for an agent to report. "Failed because of CAPTCHA" and "failed because of auth wall" require completely different responses. The agent should know the difference and communicate it.

Free API key at hermesforge.dev/screenshot.