Screenshot APIs for Automated Visual Regression Testing

2026-04-24 | Tags: [testing, api, screenshot, automation, python, ci-cd]

Visual regression testing catches bugs that unit tests miss: a CSS change that breaks the layout on mobile, a z-index that hides a button, a font that fails to load. But running headless browsers in CI is heavy — slow build times, flaky infrastructure, maintenance overhead.

A screenshot API moves that headless browser to someone else's infrastructure. Your CI pipeline makes an HTTP request and gets back an image.

The Basic Pattern

import requests
import hashlib
from pathlib import Path

def capture_screenshot(url: str, api_key: str) -> bytes:
    resp = requests.get(
        "https://hermesforge.dev/api/screenshot",
        params={"url": url, "width": 1280, "format": "png"},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,  # remote rendering can take a while; don't hang CI forever
    )
    resp.raise_for_status()
    return resp.content

def hash_image(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

For regression testing, the workflow is:

  1. Baseline: capture screenshots before a deploy, store them
  2. Comparison: capture screenshots after a deploy
  3. Diff: compare the two — flag differences above a threshold
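The hash helper earns its keep in step 3: if the before and after bytes produce the same digest, nothing changed and the pixel diff can be skipped entirely. A minimal sketch of that fast path:

```python
import hashlib

def images_match(baseline: bytes, candidate: bytes) -> bool:
    """Fast pre-check before any pixel diff: identical bytes hash
    identically, so equal digests mean zero visual change."""
    return (hashlib.sha256(baseline).hexdigest()
            == hashlib.sha256(candidate).hexdigest())
```

Note the converse doesn't hold: different bytes can still be visually identical (PNG encoding isn't canonical), which is why a hash mismatch should trigger a pixel diff rather than an immediate failure.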

Pixel-Level Diffing with Pillow

from PIL import Image, ImageChops
import io
import numpy as np

def pixel_diff(before: bytes, after: bytes) -> float:
    """Returns the fraction of pixels that changed."""
    img_before = Image.open(io.BytesIO(before)).convert("RGB")
    img_after = Image.open(io.BytesIO(after)).convert("RGB")

    # Resize to same dimensions if needed
    if img_before.size != img_after.size:
        img_after = img_after.resize(img_before.size)

    diff = ImageChops.difference(img_before, img_after)
    arr = np.array(diff)
    changed_pixels = np.any(arr > 10, axis=2).sum()
    total_pixels = arr.shape[0] * arr.shape[1]
    return changed_pixels / total_pixels

A 1% pixel change might indicate a font shift. A 30% change probably means something broke.
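The per-channel threshold of 10 in the code above exists so that anti-aliasing jitter doesn't register as a change. On a toy 2x2 diff array, the counting step works like this:

```python
import numpy as np

# 2x2 image diff: top row is noise, bottom row is a real change
diff = np.array([
    [[0, 0, 0], [5, 8, 3]],        # unchanged / sub-threshold jitter
    [[200, 150, 90], [0, 0, 12]],  # big change / one channel over threshold
], dtype=np.uint8)

changed = np.any(diff > 10, axis=2)  # pixel counts if ANY channel moved > 10
ratio = changed.sum() / changed.size
print(changed.tolist())  # [[False, False], [True, True]]
print(ratio)             # 0.5
```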

CI Integration (GitHub Actions)

name: Visual Regression

on:
  pull_request:
    branches: [main]

jobs:
  visual-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install requests Pillow numpy

      - name: Capture baseline (production)
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
        run: |
          python3 scripts/capture_baseline.py \
            --url https://yoursite.com \
            --output baseline.png

      - name: Deploy preview
        id: deploy  # needed so steps.deploy.outputs.url resolves below
        run: ./deploy-preview.sh

      - name: Capture after deploy
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
          PREVIEW_URL: ${{ steps.deploy.outputs.url }}
        run: |
          python3 scripts/capture_baseline.py \
            --url $PREVIEW_URL \
            --output after.png

      - name: Compare
        run: python3 scripts/compare_screenshots.py baseline.png after.png --threshold 0.02

      - name: Upload diff on failure
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: visual-diff
          path: diff.png

The Full Comparison Script

#!/usr/bin/env python3
"""compare_screenshots.py — fail CI if images differ beyond threshold"""

import sys
import argparse
import io
import numpy as np
from PIL import Image, ImageChops, ImageDraw

def compare(before_path: str, after_path: str, threshold: float) -> bool:
    before = Image.open(before_path).convert("RGB")
    after = Image.open(after_path).convert("RGB")

    if before.size != after.size:
        print(f"Size mismatch: {before.size} vs {after.size}")
        after = after.resize(before.size)

    diff = ImageChops.difference(before, after)
    arr = np.array(diff)

    changed = np.any(arr > 10, axis=2).sum()
    total = arr.shape[0] * arr.shape[1]
    ratio = changed / total

    print(f"Changed pixels: {changed:,} / {total:,} ({ratio:.2%})")

    if ratio > threshold:
        # Save highlighted diff
        highlight = before.copy()
        mask = np.any(arr > 10, axis=2)
        highlight_arr = np.array(highlight)
        highlight_arr[mask] = [255, 0, 0]  # red overlay on changed regions
        Image.fromarray(highlight_arr).save("diff.png")
        print(f"FAIL: {ratio:.2%} changed, threshold is {threshold:.2%}")
        print("Diff saved to diff.png")
        return False

    print(f"PASS: {ratio:.2%} changed (within threshold)")
    return True

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("before")
    parser.add_argument("after")
    parser.add_argument("--threshold", type=float, default=0.01)
    args = parser.parse_args()

    ok = compare(args.before, args.after, args.threshold)
    sys.exit(0 if ok else 1)

Capturing Multiple Pages

For sites with more than one page to check:

PAGES_TO_CHECK = [
    "/",
    "/pricing",
    "/docs",
    "/blog",
    "/login",
]

def run_regression(base_url: str, api_key: str, output_dir: str):
    Path(output_dir).mkdir(exist_ok=True)

    for path in PAGES_TO_CHECK:
        url = base_url.rstrip("/") + path
        slug = path.strip("/").replace("/", "-") or "home"

        print(f"Capturing {url}...")
        img = capture_screenshot(url, api_key)

        with open(f"{output_dir}/{slug}.png", "wb") as f:
            f.write(img)

        print(f"  Saved {len(img):,} bytes")
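Once both runs are on disk, comparison is just pairing files by slug. One way to do it, with the diff function passed in as a callable (e.g. the pixel_diff helper from earlier) so the pairing logic stays independent of the image library:

```python
from pathlib import Path

def compare_dirs(baseline_dir: str, after_dir: str,
                 diff_fn, threshold: float = 0.02) -> list[str]:
    """Return slugs whose changed-pixel ratio exceeds the threshold.

    diff_fn takes (before_bytes, after_bytes) and returns a float in
    [0, 1] -- e.g. the pixel_diff function defined earlier.
    """
    failures = []
    for before_path in sorted(Path(baseline_dir).glob("*.png")):
        after_path = Path(after_dir) / before_path.name
        if not after_path.exists():
            failures.append(before_path.stem)  # page missing from the new run
            continue
        ratio = diff_fn(before_path.read_bytes(), after_path.read_bytes())
        if ratio > threshold:
            failures.append(before_path.stem)
    return failures
```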

Caveats

Dynamic content: Pages with live data (timestamps, stock prices, personalized content) will always show pixel differences. Screenshot pages you control, or use selectors to mask dynamic regions before comparison.
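One way to mask a dynamic region, assuming you know its pixel coordinates (the box below is illustrative): paint the same opaque rectangle onto both images before diffing, so whatever lives there can never register as a change.

```python
from PIL import Image, ImageDraw

def mask_regions(img: Image.Image,
                 boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    """Paint solid black over each (left, top, right, bottom) box so
    dynamic content there never counts as a pixel difference."""
    masked = img.copy()
    draw = ImageDraw.Draw(masked)
    for box in boxes:
        draw.rectangle(box, fill=(0, 0, 0))
    return masked

# Apply the SAME boxes to both images before diffing, e.g.:
# before_img = mask_regions(before_img, [(900, 10, 1270, 40)])  # live clock
# after_img = mask_regions(after_img, [(900, 10, 1270, 40)])
```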

Rate limits: If your CI checks 20 pages on every PR, you'll hit daily limits fast. Cache baseline screenshots in CI artifacts rather than re-capturing production on every run.
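A small guard in the capture script makes caching easy: if a baseline was restored into the workspace (from a CI artifact or cache), reuse it instead of spending an API call. A sketch, with the capture passed in as a zero-argument callable:

```python
from pathlib import Path

def get_baseline(path: str, capture_fn) -> bytes:
    """Reuse a baseline restored into the workspace (e.g. from a CI
    artifact); otherwise capture one and save it for future runs.

    capture_fn is any zero-arg callable returning PNG bytes, e.g.
    lambda: capture_screenshot(url, api_key).
    """
    p = Path(path)
    if p.exists():
        return p.read_bytes()  # cache hit: no API call, no rate-limit cost
    data = capture_fn()
    p.write_bytes(data)
    return data
```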

Timing: A screenshot captures state at a moment in time. For pages with animations or async content, add a delay parameter or use the API's wait_for option to ensure the page is fully rendered before capture.

When to Use This

Screenshot API visual regression makes sense when:

  - You don't want to maintain a headless browser in CI
  - Your team is small and a simple pixel-diff is good enough
  - You're checking marketing pages, landing pages, or other visually stable content

For full visual testing suites on complex applications, dedicated tools like Percy or Chromatic are more appropriate. But for lightweight "did we break the homepage" checks, an API call + Pillow is hard to beat.

Full API docs: hermesforge.dev/docs