# Screenshot APIs for Automated Visual Regression Testing
Visual regression testing catches bugs that unit tests miss: a CSS change that breaks the layout on mobile, a z-index that hides a button, a font that fails to load. But running headless browsers in CI is heavy — slow build times, flaky infrastructure, maintenance overhead.
A screenshot API moves that headless browser to someone else's infrastructure. Your CI pipeline makes an HTTP request and gets back an image.
## The Basic Pattern
```python
import requests
import hashlib
from pathlib import Path


def capture_screenshot(url: str, api_key: str) -> bytes:
    resp = requests.get(
        "https://hermesforge.dev/api/screenshot",
        params={"url": url, "width": 1280, "format": "png"},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,  # rendering a page can take a while; don't hang CI forever
    )
    resp.raise_for_status()
    return resp.content


def hash_image(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()
```
For regression testing, the workflow is:
- Baseline: capture screenshots before a deploy, store them
- Comparison: capture screenshots after a deploy
- Diff: compare the two — flag differences above a threshold
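The `hash_image` helper above gives a cheap short-circuit for the diff step: if two screenshots hash identically, they are byte-for-byte the same and the pixel diff can be skipped. The converse does not hold, since PNG encoding is not guaranteed to be stable across captures, so a hash mismatch only means "run the real diff". A minimal sketch:

```python
import hashlib


def hash_image(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()


def needs_pixel_diff(before: bytes, after: bytes) -> bool:
    """True when the screenshots differ at the byte level.

    Identical hashes mean identical images; different hashes only mean
    the cheap check was inconclusive and the pixel diff should run.
    """
    return hash_image(before) != hash_image(after)
```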
## Pixel-Level Diffing with Pillow
```python
import io

import numpy as np
from PIL import Image, ImageChops


def pixel_diff(before: bytes, after: bytes) -> float:
    """Returns the fraction of pixels that changed."""
    img_before = Image.open(io.BytesIO(before)).convert("RGB")
    img_after = Image.open(io.BytesIO(after)).convert("RGB")
    # Resize to same dimensions if needed
    if img_before.size != img_after.size:
        img_after = img_after.resize(img_before.size)
    diff = ImageChops.difference(img_before, img_after)
    arr = np.array(diff)
    # A pixel counts as changed if any channel differs by more than 10
    changed_pixels = np.any(arr > 10, axis=2).sum()
    total_pixels = arr.shape[0] * arr.shape[1]
    return changed_pixels / total_pixels
```
A 1% pixel change might indicate a font shift. A 30% change probably means something broke.
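The thresholds are easy to sanity-check without calling the API: the same computation `pixel_diff` performs can be run on synthetic numpy arrays with a known changed region.

```python
import numpy as np

# Two synthetic 100x100 RGB "screenshots": identical except a 10x50 block,
# i.e. 500 of 10,000 pixels (5%) changed.
before = np.zeros((100, 100, 3), dtype=np.uint8)
after = before.copy()
after[:10, :50] = 255

# Same math pixel_diff applies after ImageChops.difference
diff = np.abs(before.astype(int) - after.astype(int))
changed = np.any(diff > 10, axis=2).sum()
ratio = changed / (diff.shape[0] * diff.shape[1])
print(f"{ratio:.0%}")  # 5%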
## CI Integration (GitHub Actions)
```yaml
name: Visual Regression
on:
  pull_request:
    branches: [main]
jobs:
  visual-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install requests Pillow numpy
      - name: Capture baseline (production)
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
        run: |
          python3 scripts/capture_baseline.py \
            --url https://yoursite.com \
            --output baseline.png
      - name: Deploy preview
        id: deploy  # deploy-preview.sh must write "url=..." to $GITHUB_OUTPUT
        run: ./deploy-preview.sh
      - name: Capture after deploy
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
          PREVIEW_URL: ${{ steps.deploy.outputs.url }}
        run: |
          python3 scripts/capture_baseline.py \
            --url "$PREVIEW_URL" \
            --output after.png
      - name: Compare
        run: python3 scripts/compare_screenshots.py baseline.png after.png --threshold 0.02
      - name: Upload diff on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diff
          path: diff.png
```
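The workflow calls `scripts/capture_baseline.py`, which isn't shown above. Here is a minimal sketch of what it might look like, assuming the endpoint and parameters from the basic pattern; it uses stdlib `urllib` so the script itself has no dependencies (swap in `requests` if you prefer):

```python
#!/usr/bin/env python3
"""capture_baseline.py (sketch): fetch one screenshot and save it to disk."""
import argparse
import os
import sys
import urllib.parse
import urllib.request


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", required=True, help="Page to screenshot")
    parser.add_argument("--output", required=True, help="Where to save the PNG")
    parser.add_argument("--width", type=int, default=1280)
    return parser


def main() -> int:
    args = build_parser().parse_args()
    api_key = os.environ.get("SCREENSHOT_API_KEY")
    if not api_key:
        print("SCREENSHOT_API_KEY is not set", file=sys.stderr)
        return 1
    query = urllib.parse.urlencode(
        {"url": args.url, "width": args.width, "format": "png"}
    )
    req = urllib.request.Request(
        f"https://hermesforge.dev/api/screenshot?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
    with open(args.output, "wb") as f:
        f.write(data)
    print(f"Saved {len(data):,} bytes to {args.output}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```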
## The Full Comparison Script
```python
#!/usr/bin/env python3
"""compare_screenshots.py — fail CI if images differ beyond threshold"""
import sys
import argparse

import numpy as np
from PIL import Image, ImageChops


def compare(before_path: str, after_path: str, threshold: float) -> bool:
    before = Image.open(before_path).convert("RGB")
    after = Image.open(after_path).convert("RGB")
    if before.size != after.size:
        print(f"Size mismatch: {before.size} vs {after.size}")
        after = after.resize(before.size)
    diff = ImageChops.difference(before, after)
    arr = np.array(diff)
    # A pixel counts as changed if any channel differs by more than 10
    mask = np.any(arr > 10, axis=2)
    changed = mask.sum()
    total = arr.shape[0] * arr.shape[1]
    ratio = changed / total
    print(f"Changed pixels: {changed:,} / {total:,} ({ratio:.2%})")
    if ratio > threshold:
        # Save a copy of the baseline with changed pixels highlighted
        highlight_arr = np.array(before)
        highlight_arr[mask] = [255, 0, 0]  # red overlay on changed regions
        Image.fromarray(highlight_arr).save("diff.png")
        print(f"FAIL: {ratio:.2%} changed, threshold is {threshold:.2%}")
        print("Diff saved to diff.png")
        return False
    print(f"PASS: {ratio:.2%} changed (within threshold)")
    return True


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("before")
    parser.add_argument("after")
    parser.add_argument("--threshold", type=float, default=0.01)
    args = parser.parse_args()
    ok = compare(args.before, args.after, args.threshold)
    sys.exit(0 if ok else 1)
```
## Capturing Multiple Pages
For sites with more than one page to check:
```python
from pathlib import Path

PAGES_TO_CHECK = [
    "/",
    "/pricing",
    "/docs",
    "/blog",
    "/login",
]


def run_regression(base_url: str, api_key: str, output_dir: str):
    # capture_screenshot() comes from the basic pattern above
    Path(output_dir).mkdir(exist_ok=True)
    for path in PAGES_TO_CHECK:
        url = base_url.rstrip("/") + path
        slug = path.strip("/").replace("/", "-") or "home"
        print(f"Capturing {url}...")
        img = capture_screenshot(url, api_key)
        with open(f"{output_dir}/{slug}.png", "wb") as f:
            f.write(img)
        print(f"  Saved {len(img):,} bytes")
```
## Caveats
**Dynamic content:** Pages with live data (timestamps, stock prices, personalized content) will always show pixel differences. Screenshot pages you control, or use selectors to mask dynamic regions before comparison.
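One way to mask a dynamic region is to zero out the same rectangle in both images before diffing, so whatever renders there can never count as a change (a numpy sketch; the coordinates are placeholders for wherever your dynamic content sits):

```python
import numpy as np


def mask_region(arr: np.ndarray, top: int, left: int,
                height: int, width: int) -> np.ndarray:
    """Return a copy of the image array with the given rectangle blacked out."""
    out = arr.copy()
    out[top:top + height, left:left + width] = 0
    return out

# Apply the same mask to both arrays before computing the diff, e.g.:
#   masked_before = mask_region(before_arr, 0, 0, 40, 300)
#   masked_after = mask_region(after_arr, 0, 0, 40, 300)
```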
**Rate limits:** If your CI checks 20 pages on every PR, you'll hit daily limits fast. Cache baseline screenshots in CI artifacts rather than re-capturing production on every run.
**Timing:** A screenshot captures state at a moment in time. For pages with animations or async content, add a delay parameter or use the API's wait_for option to ensure the page is fully rendered before capture.
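A small helper can fold those wait options into the capture parameters. The `delay` and `wait_for` names follow the mention above, but treat the exact spelling and units as assumptions to verify against the API docs:

```python
def screenshot_params(url: str, delay_ms: int = 0, wait_for: str = "") -> dict:
    """Build capture query params, adding render-wait options when set.

    `delay` (milliseconds) and `wait_for` (CSS selector) are assumed
    parameter names; check hermesforge.dev/docs for the real ones.
    """
    params = {"url": url, "width": 1280, "format": "png"}
    if delay_ms:
        params["delay"] = delay_ms
    if wait_for:
        params["wait_for"] = wait_for
    return params
```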
## When to Use This
Screenshot API visual regression makes sense when:

- You don't want to maintain a headless browser in CI
- Your team is small and a simple pixel-diff is good enough
- You're checking marketing pages, landing pages, or other visually stable content
For full visual testing suites on complex applications, dedicated tools like Percy or Chromatic are more appropriate. But for lightweight "did we break the homepage" checks, an API call + Pillow is hard to beat.
Full API docs: hermesforge.dev/docs