Screenshot API Error Handling: Retries, Timeouts, and Debugging Failures
Screenshot API calls fail. Pages time out, selectors don't match, rate limits are hit, and dynamic content doesn't finish rendering in time. Building a reliable screenshot pipeline means handling these failures gracefully rather than letting them bubble up as unhandled exceptions. Here's how.
The Error Taxonomy
Screenshot API errors fall into four broad groups (client errors, render timeouts, rate limits, and server or network failures), each requiring a different response:
| HTTP status | Category | Action |
|---|---|---|
| 400 | Client error (bad params) | Fix the request, don't retry |
| 401 | Authentication failure | Check API key, don't retry |
| 404 | URL not found | Log and skip, don't retry |
| 408 / timeout | Render timeout | Retry with longer timeout |
| 429 | Rate limit | Retry after backoff |
| 500 | Server error | Retry with exponential backoff |
| Connection error | Network failure | Retry with backoff |
Client errors (4xx except 408 and 429) are your problem to fix. Server errors (5xx), render timeouts, and rate limits are transient, so they benefit from retry logic.
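The table above can be collapsed into a small lookup, shown here as a minimal sketch. The function name `classify_status` and its action labels are illustrative assumptions, not part of any API; treating every unlisted 4xx code as non-retryable and every 5xx code as retryable is also an assumption extrapolated from the table.

```python
def classify_status(status_code):
    """Map an HTTP status code to a suggested action (labels are illustrative)."""
    if status_code == 200:
        return "ok"
    if status_code == 408:
        return "retry_longer_timeout"
    if status_code == 429:
        return "retry_after_backoff"
    if 400 <= status_code < 500:
        # 400, 401, 404 and other client errors: fix the request instead
        return "do_not_retry"
    if status_code >= 500:
        return "retry_exponential_backoff"
    return "unexpected"


for code in (400, 404, 408, 429, 503):
    print(code, classify_status(code))
```

Keeping the decision in one function means the retry loop and the logging code can agree on what counts as retryable.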
Basic Retry with Exponential Backoff
import random
import time

import requests


def screenshot_with_retry(url, params, api_key, max_retries=3):
    """Capture screenshot with exponential backoff retry."""
    headers = {"X-API-Key": api_key}
    last_error = None

    for attempt in range(max_retries + 1):
        wait = (2 ** attempt) + random.uniform(0, 1)  # backoff plus jitter
        try:
            response = requests.get(
                "https://api.hermesforge.dev/api/screenshot",
                params={"url": url, **params},
                headers=headers,
                timeout=45
            )

            if response.status_code == 200:
                return response.content

            # Don't retry client errors
            if response.status_code in (400, 401, 403, 404):
                response.raise_for_status()

            # Anything else (429, 5xx, ...) is treated as retryable
            last_error = f"HTTP {response.status_code}"

            # Rate limit: respect Retry-After header if present
            if response.status_code == 429:
                retry_after = response.headers.get("Retry-After", "60")
                # The header may hold an HTTP date instead of seconds; use 60s then
                wait = int(retry_after) if retry_after.isdigit() else 60

        except requests.Timeout:
            last_error = "Timeout"
        except requests.ConnectionError as e:
            last_error = f"Connection error: {e}"

        # Sleep only if another attempt will follow
        if attempt < max_retries:
            time.sleep(wait)

    raise RuntimeError(f"Screenshot failed after {max_retries + 1} attempts: {last_error}")
# Usage
try:
    image = screenshot_with_retry(
        "https://example.com",
        {"width": 1280, "format": "webp"},
        "your_api_key"
    )
    with open("output.webp", "wb") as f:
        f.write(image)
except (RuntimeError, requests.HTTPError) as e:
    # HTTPError is raised for the non-retryable 4xx responses
    print(f"Capture failed: {e}")
The jitter (random.uniform(0, 1)) prevents thundering herd problems when multiple workers retry simultaneously. Without jitter, all retries hit the server at the same moment.
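The backoff arithmetic and the Retry-After handling can be isolated into two small helpers. This is a sketch under stated assumptions: `backoff_delay` and `retry_after_seconds` are hypothetical names, the 30-second cap is an assumed value, and the HTTP-date branch reflects the general HTTP definition of Retry-After rather than anything confirmed about this API.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff capped at `cap` seconds, plus up to 1s of jitter."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0, 1)


def retry_after_seconds(value, default=60.0):
    """Parse a Retry-After value given as delta-seconds or as an HTTP date."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default wait
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now", never a negative sleep
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


# Each worker draws its own jitter, so simultaneous retries spread out
for attempt in range(4):
    print(f"attempt {attempt}: sleep {backoff_delay(attempt):.2f}s")
```

The cap matters for long batches: without it, a higher `max_retries` would push the later waits into minutes.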
Handling Render Timeouts
Timeouts are often a signal that a page needs more time to render, not that the request should be abandoned:
def screenshot_with_adaptive_timeout(url, api_key, base_delay=0):
    """Try with progressive delays for slow-rendering pages."""
    delays = [base_delay, base_delay + 1000, base_delay + 3000]

    for delay in delays:
        try:
            response = requests.get(
                "https://api.hermesforge.dev/api/screenshot",
                params={
                    "url": url,
                    "width": 1280,
                    "delay": delay,
                    "timeout": 30000  # 30s render timeout
                },
                headers={"X-API-Key": api_key},
                timeout=60  # HTTP client timeout > render timeout
            )
            if response.status_code == 200:
                return response.content, delay
        except requests.Timeout:
            if delay == delays[-1]:
                raise
            continue

    # Every attempt returned a non-200 response
    return None, None


image, used_delay = screenshot_with_adaptive_timeout(
    "https://app.example.com/dashboard",
    "your_api_key",
    base_delay=500
)
if image is None:
    print("Capture failed at every delay")
elif used_delay > 500:
    print(f"Note: page needed {used_delay}ms delay to render")
The HTTP client timeout (60s) must be longer than the render timeout parameter (30s). If they're equal or inverted, the HTTP client will close the connection before the server finishes rendering.
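One way to keep the two timeouts from drifting apart is to derive the client timeout from the render settings. The helper below is a sketch: `client_timeout_seconds` and the 15-second margin are assumptions, and it conservatively assumes the `delay` is spent on top of the render timeout, which the API may or may not do.

```python
def client_timeout_seconds(render_timeout_ms, delay_ms=0, margin_s=15.0):
    """Return an HTTP client timeout (seconds) that outlasts the server-side render."""
    if render_timeout_ms < 0 or delay_ms < 0:
        raise ValueError("timeouts and delays must be non-negative")
    # Convert milliseconds to seconds, then leave headroom for network transfer
    return (render_timeout_ms + delay_ms) / 1000 + margin_s


print(client_timeout_seconds(30000))
print(client_timeout_seconds(30000, delay_ms=3500))
```

The result can be passed straight to the `timeout=` argument of `requests.get`, so changing the render timeout never silently inverts the relationship.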
Selector Failure Handling
When capturing by CSS selector, the element might not exist on the page:
class SelectorNotFoundError(Exception):
    pass


def capture_element_safe(url, selector, api_key, fallback=None):
    """Capture element by selector with optional fallback."""
    for sel in filter(None, [selector, fallback]):
        response = requests.get(
            "https://api.hermesforge.dev/api/screenshot",
            params={"url": url, "selector": sel, "width": 1280},
            headers={"X-API-Key": api_key},
            timeout=30
        )

        if response.status_code == 200:
            return response.content

        if response.status_code == 400:
            try:
                message = str(response.json().get("error", ""))
            except ValueError:  # error body was not JSON
                message = response.text
            if "selector" in message.lower():
                continue  # Try fallback

        response.raise_for_status()

    raise SelectorNotFoundError(
        f"Selector '{selector}' not found (fallback: '{fallback}')"
    )


try:
    image = capture_element_safe(
        "https://example.com",
        ".PricingTable-v2",
        "your_api_key",
        fallback="[data-testid='pricing-table']"
    )
except SelectorNotFoundError as e:
    # Log and capture full page as fallback
    print(f"Element not found: {e}; falling back to full page")
    response = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={"url": "https://example.com", "width": 1280},
        headers={"X-API-Key": "your_api_key"},
        timeout=30
    )
    response.raise_for_status()  # don't treat an error body as an image
    image = response.content
Circuit Breaker for Batch Operations
When running hundreds of screenshots in a batch, a circuit breaker stops your workers from hammering a degraded API and turning one outage into cascading failures:
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: int = 60  # seconds
    failures: int = 0
    last_failure_time: Optional[datetime] = None
    state: str = "closed"  # closed, open, half-open

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.recovery_timeout):
                self.state = "half-open"
            else:
                raise RuntimeError("Circuit breaker open: API unreachable")

        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.record_failure()
            raise

        # Any success clears the count, so only consecutive failures open the circuit
        self.reset()
        return result

    def record_failure(self):
        self.failures += 1
        self.last_failure_time = datetime.now()
        if self.failures >= self.failure_threshold:
            self.state = "open"

    def reset(self):
        self.failures = 0
        self.state = "closed"


breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=60)


def capture(url, api_key):
    response = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={"url": url, "width": 1280},
        headers={"X-API-Key": api_key},
        timeout=30
    )
    response.raise_for_status()
    return response.content


urls = ["https://example.com/page-1", "https://example.com/page-2"]  # ...and so on
results = {}

for url in urls:
    try:
        results[url] = breaker.call(capture, url, "your_api_key")
    except (RuntimeError, requests.RequestException) as e:
        # RuntimeError: breaker is open; RequestException: the capture itself failed
        results[url] = None
        print(f"Skipped {url}: {e}")
After 5 consecutive failures, the circuit opens and all subsequent calls fail immediately — no wasted time waiting on timeouts. After 60 seconds, it tries one request in half-open state. On success, it resets.
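To watch the state transitions without waiting a real minute, here is a compact, self-contained sketch with an injectable clock. It is a simplified variant rather than the class above: `TinyBreaker`, `FakeClock`, and `flaky` are hypothetical names, and the clock is a plain number of seconds (as `time.monotonic` returns) instead of `datetime.now()`.

```python
import time


class TinyBreaker:
    """Minimal circuit breaker with an injectable clock (illustrative only)."""

    def __init__(self, failure_threshold=3, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.recovery_timeout:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None
        return result


class FakeClock:
    """Clock whose time only moves when the test advances it."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def flaky():
    raise ConnectionError("simulated outage")


clock = FakeClock()
demo_breaker = TinyBreaker(failure_threshold=3, recovery_timeout=60.0, clock=clock)
states = []

for _ in range(3):
    try:
        demo_breaker.call(flaky)
    except ConnectionError:
        pass
states.append(demo_breaker.state)   # threshold reached

clock.now += 61                     # recovery window elapses
states.append(demo_breaker.state)

demo_breaker.call(lambda: "ok")     # successful probe
states.append(demo_breaker.state)

print(states)
```

Injecting the clock keeps the breaker testable, and a monotonic clock avoids surprises if the system time is adjusted during a long batch.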
Debugging Rendering Failures
When a screenshot succeeds but shows unexpected content (blank page, error state, loading spinners), capture the page twice, once immediately and once after a delay, and compare the results:
import os


def debug_screenshot(url, api_key, output_dir="/tmp/debug"):
    """Capture screenshot with full debug information."""
    os.makedirs(output_dir, exist_ok=True)

    response = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={
            "url": url,
            "width": 1280,
            "full_page": "true",
            "delay": 0,  # No delay: capture initial load state
            "format": "png"
        },
        headers={"X-API-Key": api_key},
        timeout=45
    )
    response.raise_for_status()  # don't save an error body as a PNG

    # Save screenshot
    with open(f"{output_dir}/initial.png", "wb") as f:
        f.write(response.content)

    # Also capture with delay to compare
    response_delayed = requests.get(
        "https://api.hermesforge.dev/api/screenshot",
        params={
            "url": url,
            "width": 1280,
            "full_page": "true",
            "delay": 3000,
            "format": "png"
        },
        headers={"X-API-Key": api_key},
        timeout=45
    )
    response_delayed.raise_for_status()

    with open(f"{output_dir}/after_3s.png", "wb") as f:
        f.write(response_delayed.content)

    print(f"Debug screenshots saved to {output_dir}/")
    print(f"Initial: {len(response.content):,} bytes")
    print(f"After 3s: {len(response_delayed.content):,} bytes")

    # If sizes differ significantly, content was still loading at t=0
    size_ratio = len(response_delayed.content) / max(len(response.content), 1)
    if size_ratio > 1.5:
        print("⚠ Page content changed significantly after delay; increase delay parameter")
    elif len(response.content) < 5000:
        print("⚠ Initial screenshot is very small; page may be blank or erroring")
Comparing screenshot sizes at t=0 vs t=3s is a quick heuristic for detecting loading issues. A significantly larger delayed screenshot means the page wasn't done rendering.
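The size comparison itself can be pulled out into a pure function, which makes the heuristic easy to test without calling the API. This is a sketch: `diagnose_sizes` is a hypothetical name, the 1.5 ratio and 5000-byte floor are the thresholds used above, and unlike the code above it reports both conditions when both apply.

```python
def diagnose_sizes(initial_bytes, delayed_bytes, growth_ratio=1.5, min_bytes=5000):
    """Return a list of warnings based on two screenshot sizes in bytes."""
    warnings = []
    # Guard against division by zero for an empty initial capture
    if delayed_bytes / max(initial_bytes, 1) > growth_ratio:
        warnings.append("content still loading at t=0: increase delay")
    if initial_bytes < min_bytes:
        warnings.append("initial capture very small: page may be blank or erroring")
    return warnings


print(diagnose_sizes(40_000, 100_000))
print(diagnose_sizes(50_000, 52_000))
```

File size is only a proxy for rendered content, so treat a clean result as "probably fine", not as proof that the page finished rendering.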
Logging Failures for Monitoring
In production, log enough context to diagnose failures without re-running:
import json
import logging

logging.basicConfig(
    filename="/var/log/screenshot_errors.log",
    level=logging.ERROR,
    format="%(asctime)s %(message)s"
)


def logged_screenshot(url, params, api_key):
    # Leave large injected scripts out of every log record
    safe_params = {k: v for k, v in params.items() if k != "inject_js"}
    try:
        response = requests.get(
            "https://api.hermesforge.dev/api/screenshot",
            params={"url": url, **params},
            headers={"X-API-Key": api_key},
            timeout=45
        )
        response.raise_for_status()
        return response.content

    except requests.HTTPError as e:
        logging.error(json.dumps({
            "event": "screenshot_http_error",
            "url": url,
            "status": e.response.status_code,
            "body": e.response.text[:500],
            "params": safe_params
        }))
        raise

    except requests.Timeout:
        logging.error(json.dumps({
            "event": "screenshot_timeout",
            "url": url,
            "params": safe_params
        }))
        raise

    except requests.ConnectionError as e:
        logging.error(json.dumps({
            "event": "screenshot_connection_error",
            "url": url,
            "error": str(e)
        }))
        raise
Log the URL, the params (minus large inject_js content), and the first 500 characters of the response body. The response body from a 4xx error usually contains a specific error message that tells you exactly what to fix.
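If other parameters can also grow large, the filtering can be generalized into a helper that drops bulky keys and truncates long values. This is a sketch under assumptions: `loggable_params`, the 200-character limit, and the truncation marker are all illustrative choices.

```python
import json


def loggable_params(params, drop=("inject_js",), max_len=200):
    """Copy params for logging: drop bulky keys and truncate long string values."""
    safe = {}
    for key, value in params.items():
        if key in drop:
            continue
        if isinstance(value, str) and len(value) > max_len:
            value = value[:max_len] + "...[truncated]"
        safe[key] = value
    return safe


record = {
    "event": "screenshot_timeout",
    "url": "https://example.com",
    "params": loggable_params(
        {"width": 1280, "inject_js": "x" * 10_000, "selector": ".hero"}
    ),
}
print(json.dumps(record))
```

The original dictionary is left untouched, so the same `params` can still be sent with the request.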
Common Failures and Their Fixes
| Symptom | Likely cause | Fix |
|---|---|---|
| Blank white screenshot | Page requires auth | Pass cookies or auth headers |
| Partially loaded content | Insufficient delay | Increase delay parameter |
| Spinner visible in screenshot | JS not finished | Use wait_for selector or increase delay |
| Element not found (400) | Selector doesn't match | Check selector against live page, add fallback |
| Timeout on every request | Page never finishes loading | Use delay instead of waiting for load; check for infinite loops |
| Screenshot cuts off content | full_page not set | Add full_page=true |
| Blurry text in screenshot | 1x DPR | Add device_pixel_ratio=2 |
| Cookie banner visible | No consent handling | Inject CSS to hide .cookie-banner, #cookie-consent |
Most "unexpected content" failures are delay-related — the screenshot captured before the page finished rendering. When in doubt, add delay before investigating other causes.