Screenshot API Error Handling: Retries, Timeouts, and Debugging Failures

2026-05-13 | Tags: [screenshot-api, tutorial, error-handling, debugging, reliability, how-to, recipe]

Screenshot API calls fail. Pages time out, selectors don't match, rate limits are hit, and dynamic content doesn't finish rendering in time. Building a reliable screenshot pipeline means handling these failures gracefully rather than letting them bubble up as unhandled exceptions. Here's how.

The Error Taxonomy

Screenshot API errors fall into four categories, each requiring a different response:

HTTP status      | Category                  | Action
-----------------|---------------------------|--------------------------------
400              | Client error (bad params) | Fix the request, don't retry
401              | Authentication failure    | Check API key, don't retry
404              | URL not found             | Log and skip, don't retry
408 / timeout    | Render timeout            | Retry with longer timeout
429              | Rate limit                | Retry after backoff
500              | Server error              | Retry with exponential backoff
Connection error | Network failure           | Retry immediately

Client errors (4xx except 408 and 429) are your problem to fix. Server errors (5xx), render timeouts, and rate limits are transient, so they benefit from retry logic.
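One way to keep that decision in a single place is a small lookup function. This is a minimal sketch: the name classify_status and the action labels it returns are illustrative, not part of the API.

```python
def classify_status(status_code):
    """Map an HTTP status code to the action suggested by the table above.

    The returned labels are hypothetical names chosen for this sketch.
    """
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 408:
        return "retry_longer_timeout"
    if status_code == 429:
        return "retry_after_backoff"
    if 400 <= status_code < 500:
        return "fix_request"      # 400, 401, 404 and other client errors
    if status_code >= 500:
        return "retry_backoff"    # transient server errors
    return "fix_request"          # unexpected codes: treat as non-retryable


for code in (200, 400, 404, 408, 429, 500):
    print(code, classify_status(code))
```

Retry loops can then branch on the label instead of repeating status-code checks.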

Basic Retry with Exponential Backoff

import requests
import time
import random

def screenshot_with_retry(url, params, api_key, max_retries=3):
    """Capture screenshot with exponential backoff retry."""
    headers = {"X-API-Key": api_key}
    last_error = None

    for attempt in range(max_retries + 1):
        try:
            response = requests.get(
                "https://api.hermesforge.dev/api/screenshot",
                params={"url": url, **params},
                headers=headers,
                timeout=45
            )

            if response.status_code == 200:
                return response.content

            # Don't retry client errors: raise requests.HTTPError right away
            if response.status_code in (400, 401, 403, 404):
                response.raise_for_status()

            last_error = f"HTTP {response.status_code}"

            # Rate limit: respect Retry-After header if present
            if response.status_code == 429:
                if attempt < max_retries:
                    try:
                        retry_after = int(response.headers.get("Retry-After", 60))
                    except ValueError:
                        retry_after = 60  # header held an HTTP-date, not seconds
                    time.sleep(retry_after)
                continue

            # Anything else (5xx, 408) falls through to exponential backoff

        except requests.Timeout:
            last_error = "Timeout"
        except requests.ConnectionError as e:
            last_error = f"Connection error: {e}"

        if attempt < max_retries:
            wait = (2 ** attempt) + random.uniform(0, 1)  # jitter
            time.sleep(wait)

    raise RuntimeError(f"Screenshot failed after {max_retries + 1} attempts: {last_error}")


# Usage
try:
    image = screenshot_with_retry(
        "https://example.com",
        {"width": 1280, "format": "webp"},
        "your_api_key"
    )
    with open("output.webp", "wb") as f:
        f.write(image)
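except requests.HTTPError as e:
    # Non-retryable client error (400, 401, 403, 404)
    print(f"Request rejected: {e}")
except RuntimeError as e:
    print(f"Capture failed: {e}")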

The jitter (random.uniform(0, 1)) prevents thundering herd problems when multiple workers retry simultaneously. Without it, workers that failed together would retry on exactly the same schedule and hit the server at the same moments.
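The wait calculation can also be pulled out into helpers, which makes it easy to cap the backoff and to handle a Retry-After header that carries an HTTP-date rather than a number of seconds (both forms are allowed by the HTTP specification). A sketch under stated assumptions: backoff_delay and parse_retry_after are hypothetical helper names, and the 30-second cap and 60-second default are arbitrary choices.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=1.0):
    """Exponential delay for a 0-based attempt, capped, plus random jitter."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0, jitter)


def parse_retry_after(value, default=60.0, now=None):
    """Return seconds to wait for a Retry-After header value.

    Accepts delta-seconds ("120") or an HTTP-date. Falls back to
    `default` when the header is missing or cannot be parsed.
    """
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


for attempt in range(4):
    print(f"attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
print(parse_retry_after("120"))
```

The cap keeps late attempts from sleeping for minutes if max_retries is ever raised.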

Handling Render Timeouts

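Timeouts are often a signal that a page needs more time to render, not that the request should be abandoned. The function below retries with progressively longer delay values while keeping a generous 30-second render timeout: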

def screenshot_with_adaptive_timeout(url, api_key, base_delay=0):
    """Try with progressive delays for slow-rendering pages."""
    delays = [base_delay, base_delay + 1000, base_delay + 3000]

    for delay in delays:
        try:
            response = requests.get(
                "https://api.hermesforge.dev/api/screenshot",
                params={
                    "url": url,
                    "width": 1280,
                    "delay": delay,
                    "timeout": 30000   # 30s render timeout
                },
                headers={"X-API-Key": api_key},
                timeout=60  # HTTP client timeout > render timeout
            )
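            if response.status_code == 200:
                return response.content, delay
            if response.status_code in (400, 401, 403, 404):
                response.raise_for_status()  # more delay won't fix a bad request
            # Other statuses fall through to the next, longer delay

        except requests.Timeout:
            if delay == delays[-1]:
                raise
            continue

    # Every attempt returned a non-200 status
    return None, None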


image, used_delay = screenshot_with_adaptive_timeout(
    "https://app.example.com/dashboard",
    "your_api_key",
    base_delay=500
)
if image is None:
    print("Capture failed at every delay setting")
elif used_delay > 500:
    print(f"Note: page needed {used_delay}ms delay to render")

The HTTP client timeout (60s) must be longer than the render timeout parameter (30s). If they're equal or inverted, the HTTP client will close the connection before the server finishes rendering.
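To avoid hand-tuning the two numbers, the client timeout can be derived from the render settings. A rough sketch: client_timeout_seconds is a hypothetical helper, the 15-second margin is an arbitrary allowance for network overhead, and it assumes (conservatively) that the delay is spent in addition to the render timeout.

```python
def client_timeout_seconds(render_timeout_ms, delay_ms=0, margin_s=15.0):
    """Pick an HTTP client timeout (seconds) that outlasts server-side rendering.

    Assumes the `timeout` and `delay` parameters are in milliseconds, as in
    the example above, and treats the delay as extra time on top of the
    render timeout.
    """
    if render_timeout_ms < 0 or delay_ms < 0:
        raise ValueError("timeouts and delays must be non-negative")
    return (render_timeout_ms + delay_ms) / 1000.0 + margin_s


print(client_timeout_seconds(30000, delay_ms=3000))  # 48.0
```

Pass the result as the timeout argument to requests.get so the client always waits longer than the server is allowed to render.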

Selector Failure Handling

When capturing by CSS selector, the element might not exist on the page:

class SelectorNotFoundError(Exception):
    pass

def capture_element_safe(url, selector, api_key, fallback=None):
    """Capture element by selector with optional fallback."""
    for sel in filter(None, [selector, fallback]):
        response = requests.get(
            "https://api.hermesforge.dev/api/screenshot",
            params={"url": url, "selector": sel, "width": 1280},
            headers={"X-API-Key": api_key},
            timeout=30
        )

        if response.status_code == 200:
            return response.content

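        if response.status_code == 400:
            try:
                message = response.json().get("error", "")
            except ValueError:
                message = ""  # body was not JSON
            if "selector" in str(message).lower():
                continue  # Try fallback

        # Any other failure is not a selector problem: surface it as HTTPError
        response.raise_for_status()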

    raise SelectorNotFoundError(
        f"Selector '{selector}' not found (fallback: '{fallback}')"
    )


try:
    image = capture_element_safe(
        "https://example.com",
        ".PricingTable-v2",
        "your_api_key",
        fallback="[data-testid='pricing-table']"
    )
except SelectorNotFoundError as e:
    # Log and capture full page as fallback
    print(f"Element not found: {e} — falling back to full page")
    response = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={"url": "https://example.com", "width": 1280},
        headers={"X-API-Key": "your_api_key"},
        timeout=30
    )
    response.raise_for_status()
    image = response.content

Circuit Breaker for Batch Operations

When running hundreds of screenshots in batch, a circuit breaker prevents cascading failures from hammering a degraded API:

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: int = 60  # seconds
    failures: int = 0
    last_failure_time: Optional[datetime] = None
    state: str = "closed"  # closed, open, half-open

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.recovery_timeout):
                self.state = "half-open"
            else:
                raise RuntimeError("Circuit breaker open — API unreachable")

        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.record_failure()
            raise

        # Any success clears the count, so only consecutive failures trip the breaker
        self.reset()
        return result

    def record_failure(self):
        self.failures += 1
        self.last_failure_time = datetime.now()
        if self.failures >= self.failure_threshold:
            self.state = "open"

    def reset(self):
        self.failures = 0
        self.state = "closed"


breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=60)

def capture(url, api_key):
    response = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={"url": url, "width": 1280},
        headers={"X-API-Key": api_key},
        timeout=30
    )
    response.raise_for_status()
    return response.content


urls = ["https://example.com/page-1", "https://example.com/page-2"]  # ... more URLs
results = {}

for url in urls:
    try:
        results[url] = breaker.call(capture, url, "your_api_key")
    except (RuntimeError, requests.RequestException) as e:
        # RuntimeError: the breaker is open. RequestException: the call itself failed.
        results[url] = None
        print(f"Skipped {url}: {e}")

After 5 consecutive failures, the circuit opens and all subsequent calls fail immediately — no wasted time waiting on timeouts. After 60 seconds, it tries one request in half-open state. On success, it resets.
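Waiting a full minute to see the half-open transition is impractical in tests. One option is a variant whose clock can be swapped out. This is a sketch rather than a drop-in replacement: ClockedBreaker and its clock argument are hypothetical names, and it uses time.monotonic by default so wall-clock adjustments cannot affect the recovery window.

```python
import time


class ClockedBreaker:
    """Circuit breaker variant whose clock can be replaced in tests."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None
        self.state = "closed"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"   # allow one trial request
            else:
                raise RuntimeError("circuit open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.failures = 0                  # success: back to normal operation
        self.state = "closed"
        return result


# Walk through the transitions with a fake clock instead of sleeping.
now = [0.0]
breaker = ClockedBreaker(failure_threshold=2, recovery_timeout=60, clock=lambda: now[0])


def failing():
    raise ValueError("simulated API failure")


for _ in range(2):
    try:
        breaker.call(failing)
    except ValueError:
        pass
print(breaker.state)                               # open

now[0] = 61.0                                      # pretend 61 seconds passed
print(breaker.call(lambda: "ok"), breaker.state)   # ok closed
```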

Debugging Rendering Failures

When a screenshot succeeds but shows unexpected content (blank page, error state, loading spinners), capture the page twice, once immediately and once after a delay, and compare the results:

import os

def debug_screenshot(url, api_key, output_dir="/tmp/debug"):
    """Capture screenshot with full debug information."""
    os.makedirs(output_dir, exist_ok=True)

    response = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={
            "url": url,
            "width": 1280,
            "full_page": "true",
            "delay": 0,           # No delay — capture initial load state
            "format": "png"
        },
        headers={"X-API-Key": api_key},
        timeout=45
    )

    # Fail fast so an error body is never saved as a .png
    response.raise_for_status()

    # Save screenshot
    with open(f"{output_dir}/initial.png", "wb") as f:
        f.write(response.content)

    # Also capture with delay to compare
    response_delayed = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={
            "url": url,
            "width": 1280,
            "full_page": "true",
            "delay": 3000,
            "format": "png"
        },
        headers={"X-API-Key": api_key},
        timeout=45
    )

    response_delayed.raise_for_status()
    with open(f"{output_dir}/after_3s.png", "wb") as f:
        f.write(response_delayed.content)

    print(f"Debug screenshots saved to {output_dir}/")
    print(f"Initial: {len(response.content):,} bytes")
    print(f"After 3s: {len(response_delayed.content):,} bytes")

    # If sizes differ significantly, content was still loading at t=0
    size_ratio = len(response_delayed.content) / max(len(response.content), 1)
    if size_ratio > 1.5:
        print("⚠ Page content changed significantly after delay — increase delay parameter")
    elif len(response.content) < 5000:
        print("⚠ Initial screenshot is very small — page may be blank or erroring")

Comparing screenshot sizes at t=0 vs t=3s is a quick heuristic for detecting loading issues. A significantly larger delayed screenshot means the page wasn't done rendering.
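The same heuristic can be written as a pure function, which makes it testable without calling the API. The sketch below also checks the 8-byte PNG signature, since a response that does not start with it is probably an error body rather than an image. The name diagnose_capture_pair and its return labels are hypothetical; the 1.5 ratio and 5000-byte thresholds are the ones used above.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header of every PNG file


def diagnose_capture_pair(initial, delayed, growth_ratio=1.5, min_bytes=5000):
    """Classify two PNG captures of one page: bytes at t=0 and after a delay."""
    if not (initial.startswith(PNG_SIGNATURE) and delayed.startswith(PNG_SIGNATURE)):
        return "not_png"          # likely an error body, not an image
    if len(delayed) / max(len(initial), 1) > growth_ratio:
        return "still_loading"    # content grew a lot after the delay
    if len(initial) < min_bytes:
        return "possibly_blank"   # tiny image: blank page or error state
    return "stable"


# Synthetic byte strings stand in for real screenshots here.
small = PNG_SIGNATURE + b"\x00" * 1000
large = PNG_SIGNATURE + b"\x00" * 20000
print(diagnose_capture_pair(small, large))   # still_loading
print(diagnose_capture_pair(large, large))   # stable
```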

Logging Failures for Monitoring

In production, log enough context to diagnose failures without re-running:

import logging
import json

logging.basicConfig(
    filename="/var/log/screenshot_errors.log",
    level=logging.ERROR,
    format="%(asctime)s %(message)s"
)

def logged_screenshot(url, params, api_key):
    try:
        response = requests.get(
            "https://api.hermesforge.dev/api/screenshot",
            params={"url": url, **params},
            headers={"X-API-Key": api_key},
            timeout=45
        )
        response.raise_for_status()
        return response.content

    except requests.HTTPError as e:
        logging.error(json.dumps({
            "event": "screenshot_http_error",
            "url": url,
            "status": e.response.status_code,
            "body": e.response.text[:500],
            "params": {k: v for k, v in params.items() if k != "inject_js"}
        }))
        raise

    except requests.Timeout:
        logging.error(json.dumps({
            "event": "screenshot_timeout",
            "url": url,
            "params": {k: v for k, v in params.items() if k != "inject_js"}
        }))
        raise

    except requests.ConnectionError as e:
        logging.error(json.dumps({
            "event": "screenshot_connection_error",
            "url": url,
            "error": str(e)
        }))
        raise

Log the URL, the params (minus large inject_js content), and the first 500 characters of the response body. The response body from a 4xx error usually contains a specific error message that tells you exactly what to fix.
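Filtering the params is easier to keep consistent across handlers when it lives in one helper. A minimal sketch, assuming string values are the only ones that can grow large: loggable_params is a hypothetical name, and unlike the inline filter above it keeps the dropped key with a placeholder so the log still shows the parameter was sent.

```python
import json


def loggable_params(params, drop=("inject_js",), max_value_len=200):
    """Return a copy of params that is compact enough to write to a log line."""
    cleaned = {}
    for key, value in params.items():
        if key in drop:
            cleaned[key] = "<omitted>"   # record presence, not content
            continue
        if isinstance(value, str) and len(value) > max_value_len:
            value = value[:max_value_len] + "...(truncated)"
        cleaned[key] = value
    return cleaned


params = {"width": 1280, "inject_js": "x" * 10_000, "selector": ".hero"}
print(json.dumps(loggable_params(params)))
# {"width": 1280, "inject_js": "<omitted>", "selector": ".hero"}
```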

Common Failures and Their Fixes

Symptom                       | Likely cause                | Fix
------------------------------|-----------------------------|---------------------------------------------------------------
Blank white screenshot        | Page requires auth          | Pass cookies or auth headers
Partially loaded content      | Insufficient delay          | Increase delay parameter
Spinner visible in screenshot | JS not finished             | Use wait_for selector or increase delay
Element not found (400)       | Selector doesn't match      | Check selector against live page, add fallback
Timeout on every request      | Page never finishes loading | Use delay instead of waiting for load; check for infinite loops
Screenshot cuts off content   | full_page not set           | Add full_page=true
Blurry text in screenshot     | 1x DPR                      | Add device_pixel_ratio=2
Cookie banner visible         | No consent handling         | Inject CSS to hide .cookie-banner, #cookie-consent

Most "unexpected content" failures are delay-related — the screenshot captured before the page finished rendering. When in doubt, add delay before investigating other causes.
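Several of the fixes in the table are just parameter changes, so they can be applied programmatically once a symptom has been identified. A sketch only: the parameter names come from the table, while apply_fixes, the symptom labels, and the 3000 ms default are assumptions made for illustration.

```python
def apply_fixes(params, symptoms, wait_for=None, delay_ms=3000):
    """Return a new params dict with overrides for each observed symptom.

    Symptom labels are hypothetical; `wait_for` is a CSS selector the caller
    expects to appear once loading has finished.
    """
    updated = dict(params)
    for symptom in symptoms:
        if symptom == "spinner_visible" and wait_for:
            updated["wait_for"] = wait_for
        elif symptom in ("partial_content", "spinner_visible"):
            # Never lower a delay the caller already set
            updated["delay"] = max(updated.get("delay", 0), delay_ms)
        elif symptom == "cut_off":
            updated["full_page"] = "true"
        elif symptom == "blurry_text":
            updated["device_pixel_ratio"] = 2
        else:
            raise KeyError(f"no known fix for symptom: {symptom}")
    return updated


print(apply_fixes({"width": 1280}, ["cut_off", "blurry_text"]))
# {'width': 1280, 'full_page': 'true', 'device_pixel_ratio': 2}
```

Symptoms that need credentials or injected CSS are left out because they depend on the target site.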