Caching Screenshot API Responses: When to Cache, When to Skip, and How to Invalidate

2026-05-10 | Tags: [screenshot-api, caching, redis, cdn, performance, production, architecture]

A screenshot request costs real resources: a browser launch, a page load, rendering time, I/O. For a URL you've already captured recently, serving a cached version is strictly better — faster response, lower compute cost, same result.

But 'same result' is the assumption that fails in practice. Pages change. A screenshot of a dashboard from three minutes ago may already be stale. A screenshot of a static marketing page from yesterday is fine. The caching question is not 'should I cache?' but 'how long is this result valid, and how do I know when it isn't?'

The Three-Layer Cache Model

Effective screenshot caching operates at three layers:

Layer 1: Application cache (Redis) — per-URL, per-parameter caching with configurable TTL. Fastest. Lives in process memory or a local Redis instance. Hits don't touch the browser.

Layer 2: CDN cache — for publicly-accessible screenshots (shareable links, embed URLs), a CDN layer serves cached images globally. Adds geographic performance. Cache-Control headers control TTL.

Layer 3: Client-side cache — the caller caches results and avoids redundant requests entirely. You provide ETag and Last-Modified headers; the client handles the rest.

Each layer has different characteristics, costs, and appropriate use cases. Most production systems need at least layers 1 and 3; add layer 2 only if you serve screenshots publicly.

Layer 1: Redis Application Cache

The cache key should include everything that affects the output: URL, viewport dimensions, device scale, full-page flag, wait delay, format, and any injected JavaScript hash.

import hashlib
import json
import redis
from typing import Optional

r = redis.Redis(host='localhost', port=6379, db=0)

def make_cache_key(url: str, params: dict) -> str:
    canonical = {
        'url': url,
        'width': params.get('width', 1280),
        'height': params.get('height', 720),
        'scale': params.get('scale', 1),
        'full_page': params.get('full_page', False),
        'format': params.get('format', 'png'),
        'wait': params.get('wait', 0),
        'js_hash': hashlib.sha256(
            params.get('js', '').encode()
        ).hexdigest()[:8]
    }
    key_data = json.dumps(canonical, sort_keys=True)
    return f"screenshot:{hashlib.sha256(key_data.encode()).hexdigest()}"

def get_cached(cache_key: str) -> Optional[bytes]:
    return r.get(cache_key)

def set_cached(cache_key: str, image_bytes: bytes, ttl_seconds: int):
    r.setex(cache_key, ttl_seconds, image_bytes)
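The sort_keys=True call is what makes the key order-independent; without it, the same parameters in a different dict order would hash to a different key and split the cache. A quick standalone check of that property:

```python
import hashlib
import json

def canonical_digest(params: dict) -> str:
    # json.dumps(sort_keys=True) canonicalizes the dict, so insertion
    # order never changes the digest; only the values do.
    data = json.dumps(params, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()

a = canonical_digest({'width': 1280, 'height': 720})
b = canonical_digest({'height': 720, 'width': 1280})
assert a == b                                                  # order-independent
assert canonical_digest({'width': 1281, 'height': 720}) != a   # value-sensitive
```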

The TTL is where judgment lives. A reasonable starting point:

def get_ttl(url: str, params: dict) -> int:
    # Never cache if caller explicitly requests fresh capture
    if params.get('no_cache'):
        return 0

    # Short TTL for dynamic content patterns
    dynamic_patterns = [
        '/dashboard', '/admin', '/realtime',
        'feed', 'live', 'stream'
    ]
    if any(p in url for p in dynamic_patterns):
        return 60  # 1 minute

    # Medium TTL for likely-stable content
    if url.endswith(('.html', '.htm', '/')):
        return 3600  # 1 hour

    # Longer TTL for static assets and documentation
    static_patterns = ['/docs', '/blog', '/about', '/pricing']
    if any(p in url for p in static_patterns):
        return 86400  # 24 hours

    # Default: 15 minutes
    return 900

The actual TTL strategy should be informed by your users. If you have analytics on which URLs are recaptured most frequently — and with what interval — that data should drive your defaults.
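One way to make that concrete (a sketch with an in-memory log for illustration; a production version would keep per-URL capture timestamps in a Redis sorted set): log each capture, then derive the TTL from a fraction of the median interval between recaptures.

```python
import statistics
from collections import defaultdict

capture_log = defaultdict(list)  # url -> capture timestamps (seconds)

def record_capture(url: str, ts: float):
    capture_log[url].append(ts)

def observed_ttl(url: str, fraction: float = 0.5, default: int = 900) -> int:
    """TTL derived from usage: half the median recapture interval."""
    times = sorted(capture_log[url])
    if len(times) < 3:
        return default  # not enough data yet, fall back to the static default
    intervals = [b - a for a, b in zip(times, times[1:])]
    return max(60, int(statistics.median(intervals) * fraction))

# Hypothetical data: a URL recaptured roughly every 10 minutes
for i in range(6):
    record_capture('https://example.com/pricing', i * 600.0)
print(observed_ttl('https://example.com/pricing'))  # 300
```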

Serving from Cache

import time

def handle_screenshot_request(url: str, params: dict) -> dict:
    cache_key = make_cache_key(url, params)
    ttl = get_ttl(url, params)

    if ttl > 0:
        cached = get_cached(cache_key)
        if cached:
            remaining_ttl = max(r.ttl(cache_key), 0)  # ttl() returns -1/-2 for missing/persistent keys
            return {
                'image': cached,
                'cache': 'hit',
                'cache_age': ttl - remaining_ttl,
                'cache_ttl': remaining_ttl,
                'headers': {
                    'X-Cache': 'HIT',
                    'X-Cache-Age': str(ttl - remaining_ttl),
                    'Cache-Control': f'max-age={remaining_ttl}'
                }
            }

    # Cache miss — capture fresh
    image = capture_screenshot(url, params)

    if ttl > 0:
        set_cached(cache_key, image, ttl)

    return {
        'image': image,
        'cache': 'miss',
        'cache_age': 0,
        'cache_ttl': ttl,
        'headers': {
            'X-Cache': 'MISS',
            'X-Cache-Age': '0',
            'Cache-Control': f'max-age={ttl}' if ttl > 0 else 'no-store'
        }
    }

The X-Cache header lets callers see whether they're getting a fresh or cached result without having to instrument response times themselves. This is worth exposing — it helps users understand why a screenshot looks stale, and it gives you debugging visibility.

Layer 3: ETag and Conditional Requests

For clients that repeatedly screenshot the same URL, ETags eliminate redundant data transfer even when the cache is warm:

import hashlib
from typing import Optional

def make_etag(image_bytes: bytes) -> str:
    return f'"{hashlib.sha256(image_bytes).hexdigest()[:16]}"'

def handle_with_etag(url: str, params: dict, request_etag: Optional[str] = None) -> dict:
    result = handle_screenshot_request(url, params)
    etag = make_etag(result['image'])

    # If client has current version, return 304 Not Modified
    if request_etag and request_etag == etag:
        return {
            'status': 304,
            'image': None,
            'headers': {
                'ETag': etag,
                'X-Cache': result['headers']['X-Cache']
            }
        }

    return {
        'status': 200,
        'image': result['image'],
        'headers': {
            **result['headers'],
            'ETag': etag
        }
    }

A well-behaved client sending If-None-Match: "abc123" and receiving a 304 avoids both the capture cost (if cache hit) and the transfer cost (always). For monitoring use cases where the client polls the same URL repeatedly, this can reduce bandwidth by 90%+.
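The client side of this contract is small. A sketch (the `fetch` callable is a stand-in for whatever HTTP client the caller uses; it returns status, ETag, and body):

```python
from typing import Callable, Optional, Tuple

Fetch = Callable[[str, Optional[str]], Tuple[int, Optional[str], Optional[bytes]]]

class EtagCache:
    """Remembers the last ETag and body per URL, sends If-None-Match,
    and keeps the old body when the server answers 304."""
    def __init__(self):
        self._store = {}  # url -> (etag, body)

    def get(self, url: str, fetch: Fetch) -> bytes:
        etag, body = self._store.get(url, (None, None))
        status, new_etag, new_body = fetch(url, etag)
        if status == 304:
            return body  # server confirmed our copy is current
        self._store[url] = (new_etag, new_body)
        return new_body

# Stub server: answers 304 when the client's ETag matches
def fake_fetch(url, if_none_match):
    current = '"abc123"'
    if if_none_match == current:
        return 304, current, None
    return 200, current, b'png-bytes'

c = EtagCache()
c.get('https://example.com', fake_fetch)   # first call: full 200 response
c.get('https://example.com', fake_fetch)   # second call: 304, cached body reused
```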

Cache Invalidation

The hardest problem in caching is knowing when a cached result is wrong. Three mechanisms:

1. URL-based purge — caller explicitly invalidates a cached URL:

def purge_url(url: str, params: dict = None):
    if params:
        # Purge one specific parameter combination
        r.delete(make_cache_key(url, params))
    else:
        # Purging every cached version of a URL is harder than it looks:
        # the keys are opaque hashes, so SCAN (safer than KEYS in
        # production) can enumerate them but cannot tell which ones
        # belong to this URL. That needs a URL-to-key mapping.
        raise NotImplementedError('requires a secondary index')

A cleaner approach for URL-based purge is to maintain a secondary index: when caching, also add the cache key to a Redis set keyed by URL hash. Purging the URL means deleting all keys in that set.
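A sketch of that secondary index. The helpers take the Redis client as a parameter; a minimal in-memory stand-in (hypothetical, implementing only the operations used) lets the logic run without a server:

```python
import hashlib

def url_index_key(url: str) -> str:
    return f"screenshot_index:{hashlib.sha256(url.encode()).hexdigest()}"

def set_cached_indexed(r, url: str, cache_key: str, image: bytes, ttl: int):
    r.setex(cache_key, ttl, image)
    r.sadd(url_index_key(url), cache_key)  # remember this key belongs to url

def purge_url_indexed(r, url: str):
    index = url_index_key(url)
    keys = r.smembers(index)
    if keys:
        r.delete(*keys)  # drop every cached variant of this URL
    r.delete(index)      # and the index itself
    # Entries for already-expired keys linger in the set until the next
    # purge; DEL on a missing key is a harmless no-op.

# Minimal in-memory stand-in for the four Redis ops used above
class FakeRedis:
    def __init__(self):
        self.kv, self.sets = {}, {}
    def setex(self, k, ttl, v):
        self.kv[k] = v
    def sadd(self, k, m):
        self.sets.setdefault(k, set()).add(m)
    def smembers(self, k):
        return self.sets.get(k, set())
    def delete(self, *ks):
        for k in ks:
            self.kv.pop(k, None)
            self.sets.pop(k, None)

fr = FakeRedis()
set_cached_indexed(fr, 'https://example.com', 'screenshot:aaa', b'img1', 60)
set_cached_indexed(fr, 'https://example.com', 'screenshot:bbb', b'img2', 60)
purge_url_indexed(fr, 'https://example.com')
assert not fr.kv and not fr.sets
```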

2. Webhook-triggered invalidation — if you control the target site, post a webhook when content updates:

from flask import Flask, request

app = Flask(__name__)

@app.route('/api/cache/invalidate', methods=['POST'])
def invalidate_webhook():
    # Verify the signature over the raw body, not re-serialized JSON
    if not verify_hmac(request.get_data(), request.headers.get('X-Signature')):
        return {'error': 'invalid signature'}, 401

    url = (request.json or {}).get('url')
    if url:
        purge_url(url)
        return {'purged': url}, 200

    return {'error': 'url required'}, 400
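The `verify_hmac` helper is assumed above; a minimal sketch (the secret is hypothetical, and the signature should always be computed over the raw request body, never over re-serialized JSON):

```python
import hashlib
import hmac

WEBHOOK_SECRET = b'change-me'  # hypothetical shared secret

def verify_hmac(raw_body: bytes, signature: str) -> bool:
    if not signature:
        return False
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    # compare_digest defends the comparison against timing attacks
    return hmac.compare_digest(expected, signature)

body = b'{"url": "https://example.com/pricing"}'
sig = hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
assert verify_hmac(body, sig)
assert not verify_hmac(body, 'deadbeef')
```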

3. TTL expiry — the default. When neither purge nor webhooks are available, expiry is the only mechanism: set conservative TTLs for dynamic content and generous TTLs for static content.

What Not to Cache

Some requests should never be cached:

import urllib.parse

def should_cache(url: str, params: dict) -> bool:
    # Explicit opt-out
    if params.get('no_cache'):
        return False

    # Cache-busting parameters in the URL
    parsed = urllib.parse.urlparse(url)
    query = urllib.parse.parse_qs(parsed.query)
    if any(k in query for k in ['ts', 'timestamp', 't', 'v', 'ver', 'bust', 'nocache']):
        return False

    # Custom JS injection — results are non-deterministic
    if params.get('js'):
        return False

    return True

The Cache Warming Problem

For high-traffic URLs, a cold cache means every request triggers a capture until the first one completes — the thundering herd problem. One mitigation: probabilistic early expiration.

import random
import math

def get_with_probabilistic_refresh(cache_key: str, ttl: int, beta: float = 1.0) -> Optional[bytes]:
    """
    XFetch-style early expiration: start recomputing before the TTL
    expires, with probability that rises as expiry approaches.
    Prevents a thundering herd at the moment of cache expiration.
    """
    cached = r.get(cache_key)
    if not cached:
        return None

    remaining = r.ttl(cache_key)
    if remaining <= 0:
        return None

    # Compute probability of early refresh
    # Higher beta = more aggressive early refresh
    elapsed = ttl - remaining
    if elapsed > 0:
        early_refresh_prob = math.exp(-remaining / (beta * math.log(elapsed + 1)))
        if random.random() < early_refresh_prob:
            return None  # Trigger refresh

    return cached

This is probably overkill for most screenshot APIs, but the pattern is worth knowing — it's the right solution when cache expiry causes visible latency spikes.
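For reference, the full XFetch formulation from the cache-stampede literature also weighs delta, the time a recompute actually takes: refresh when now - delta * beta * ln(rand) >= expiry. A sketch, with delta and expiry assumed to be stored alongside the cached value:

```python
import math
import random
import time

def should_refresh_early(expiry: float, delta: float, beta: float = 1.0,
                         now: float = None, rand: float = None) -> bool:
    """Canonical XFetch test: refresh if now - delta*beta*ln(rand) >= expiry.
    delta is the observed recompute time; rand is uniform in (0, 1]."""
    now = time.time() if now is None else now
    rand = max(random.random(), 1e-12) if rand is None else rand
    return now - delta * beta * math.log(rand) >= expiry

# Far from expiry, an early refresh is vanishingly unlikely
assert not should_refresh_early(expiry=1000.0, delta=2.0, now=0.0, rand=0.5)
# At or past expiry, a refresh always fires
assert should_refresh_early(expiry=10.0, delta=2.0, now=20.0, rand=0.5)
```

Because ln(rand) is negative, the subtraction pushes `now` forward; slow recomputes (large delta) therefore start refreshing earlier, which is exactly when you need the head start.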

Measuring Cache Effectiveness

The metrics that matter: hit rate, average response time by cache status, storage size, and invalidation lag.

# Track in Redis itself
def record_cache_event(event_type: str):  # 'hit' or 'miss'
    pipe = r.pipeline()
    pipe.incr(f"cache_stats:{event_type}:total")
    hour_key = f"cache_stats:{event_type}:{int(time.time() // 3600)}h"
    pipe.incr(hour_key)
    pipe.expire(hour_key, 7 * 86400)  # keep hourly buckets for a week
    pipe.execute()

def get_hit_rate() -> float:
    hits = int(r.get('cache_stats:hit:total') or 0)
    misses = int(r.get('cache_stats:miss:total') or 0)
    total = hits + misses
    return hits / total if total > 0 else 0.0

A healthy screenshot API cache should show 60-80% hit rate for steady-state traffic. Lower means TTLs are too short or the URL space is too varied. Higher might mean TTLs are too long and stale results are being served.
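Hit rate alone hides the second metric listed above, response time by cache status. A sketch that keeps samples in process and reports a nearest-rank p95 (production would ship these to a metrics backend instead):

```python
from collections import defaultdict

latencies = defaultdict(list)  # 'hit' / 'miss' -> response times in ms

def record_latency(cache_status: str, ms: float):
    latencies[cache_status].append(ms)

def p95(cache_status: str) -> float:
    samples = sorted(latencies[cache_status])
    if not samples:
        return 0.0
    idx = max(0, int(len(samples) * 0.95) - 1)  # nearest-rank percentile
    return samples[idx]

# Hypothetical samples: hits are milliseconds, misses are full captures
for ms in [3, 4, 5, 4, 3]:
    record_latency('hit', ms)
for ms in [900, 1200, 1500]:
    record_latency('miss', ms)
assert p95('hit') <= 5 and p95('miss') >= 900
```

If the two series start converging, the cache is no longer buying much; if misses spike while hit rate holds, the capture path itself regressed.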

The Bottom Line

Cache decisions are business decisions disguised as technical ones. A 24-hour TTL for a pricing page screenshot means your screenshots might show yesterday's prices. A 60-second TTL for a dashboard screenshot might still be too stale for a financial monitoring use case. The right TTL depends on what the screenshot is being used for — and that's something your caller knows better than you do.

Exposing TTL as a parameter (&ttl=3600) and defaulting to something sensible gives callers control while protecting your infrastructure. The cache should serve the use case, not constrain it.
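Honoring a caller-supplied &ttl= while still protecting the cache comes down to a clamp (the bounds here are hypothetical):

```python
MIN_TTL, MAX_TTL, DEFAULT_TTL = 0, 86400, 900  # hypothetical server bounds

def effective_ttl(requested: str = None) -> int:
    """Clamp a caller-supplied &ttl= query value into the allowed range."""
    if requested is None:
        return DEFAULT_TTL
    try:
        ttl = int(requested)
    except ValueError:
        return DEFAULT_TTL  # unparseable input falls back to the default
    return max(MIN_TTL, min(MAX_TTL, ttl))

assert effective_ttl() == 900
assert effective_ttl('3600') == 3600
assert effective_ttl('999999') == 86400  # capped at 24 hours
assert effective_ttl('-5') == 0          # clamps to ttl=0, i.e. no caching
```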


Part of the production patterns series. Previous: Rate Limiting.