How I Built a Visual Regression Testing System in a Weekend
Last weekend I got tired of shipping CSS changes that broke things I couldn't see in unit tests.
You know the problem: you refactor a button component, the tests pass, the PR gets merged, and two days later someone files a bug with a screenshot showing a broken layout on mobile. The logic was fine. The tests were fine. The pixels were wrong.
So I built a visual regression system. It took about a day and a half. Here's exactly what I did.
The Problem With the Standard Approach
The usual advice is: use Playwright or Puppeteer with toMatchSnapshot(). I've done this. The issues:
- Flakiness. Headless Chromium screenshots vary slightly between runs — antialiasing, font rendering, subpixel positioning. You end up with a 2-5px tolerance that masks real regressions.
- Infrastructure overhead. Getting Chromium to run reliably in CI is a project of its own. Different results on Mac vs Linux vs the CI container.
- Slow feedback. A full visual regression suite against 50 pages takes 8-12 minutes in Playwright. That's a long time to wait for a failing check.
I wanted something simpler: call an API, get an image, diff it. No browser to configure. No Chromium binaries to install. Just HTTP.
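The whole approach boils down to one HTTP GET per page. As a quick sketch of the request shape (using the endpoint from later in this post; `YOUR_KEY` is a placeholder — nothing is actually sent here):

```python
import requests

# Build (but don't send) the screenshot request, to show the shape of the call.
# Endpoint and parameter names match the capture script later in the post.
req = requests.Request(
    'GET',
    'https://hermesforge.dev/api/screenshot',
    params={'url': 'https://example.com', 'width': 1280, 'format': 'png'},
    headers={'X-API-Key': 'YOUR_KEY'},  # placeholder key
).prepare()
print(req.url)
```

Everything else in this post is bookkeeping around that one call.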
The Architecture
Three components:
- Baseline capture: screenshot every page in the sitemap, store as reference images
- Comparison capture: on each PR/deploy, screenshot the same pages on a preview URL
- Diff detection: compare images pixel-by-pixel, fail if diff exceeds threshold
PR opened
→ Deploy preview URL (Vercel/Netlify handles this)
→ Run comparison captures against preview URL
→ Diff each page against stored baseline
→ Comment on PR with diff report
→ Block merge if any page exceeds 1% pixel difference
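The final step of that flow is just a thresholding decision over per-page results. A minimal sketch of the gate (`merge_gate` and the exact dict shape are my naming; the real report format appears in Step 3):

```python
def merge_gate(results, threshold_pct=1.0):
    # A page fails if its screenshot is missing or its pixel diff exceeds the threshold.
    failures = [
        r['file'] for r in results
        if r.get('status') == 'missing' or r.get('change_pct', 0.0) > threshold_pct
    ]
    return {'blocked': bool(failures), 'failures': failures}

report = merge_gate([
    {'file': 'index.png', 'change_pct': 0.02},
    {'file': 'pricing.png', 'change_pct': 4.7},
    {'file': 'about.png', 'status': 'missing'},
])
print(report)  # {'blocked': True, 'failures': ['pricing.png', 'about.png']}
```

Treating a missing screenshot as a failure matters: a page that silently drops out of the comparison set is itself a regression.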
Step 1: Capture the Baseline
I wrote a script that reads the sitemap and captures every URL:
import requests
import xml.etree.ElementTree as ET
import os
import time
from pathlib import Path

API_KEY = os.environ['SCREENSHOT_API_KEY']
BASE_URL = 'https://hermesforge.dev/api/screenshot'
BASELINE_DIR = Path('visual-baselines')

def get_sitemap_urls(sitemap_url):
    resp = requests.get(sitemap_url, timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    ns = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
    return [loc.text for loc in root.findall('sm:url/sm:loc', ns)]

def capture_page(url, width=1280, height=800):
    params = {
        'url': url,
        'width': width,
        'height': height,
        'format': 'png',
        'full_page': 'true',
        'delay': 500,  # ms to wait for JS to settle
    }
    resp = requests.get(
        BASE_URL,
        params=params,
        headers={'X-API-Key': API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content

def url_to_filename(url):
    # https://example.com/products/shoes -> products_shoes
    host_and_path = url.split('://', 1)[-1]
    path = host_and_path.split('/', 1)[1] if '/' in host_and_path else ''
    path = path.rstrip('/') or 'index'
    return path.replace('/', '_').replace('?', '_').replace('=', '_')[:100]

def capture_baseline(sitemap_url):
    BASELINE_DIR.mkdir(exist_ok=True)
    urls = get_sitemap_urls(sitemap_url)
    print(f"Capturing {len(urls)} pages...")
    for i, url in enumerate(urls):
        filename = url_to_filename(url) + '.png'
        path = BASELINE_DIR / filename
        if path.exists():
            print(f"  [{i+1}/{len(urls)}] SKIP {url} (baseline exists)")
            continue
        try:
            image = capture_page(url)
            path.write_bytes(image)
            print(f"  [{i+1}/{len(urls)}] OK {url} -> {filename} ({len(image)//1024}KB)")
        except Exception as e:
            print(f"  [{i+1}/{len(urls)}] FAIL {url}: {e}")
        time.sleep(0.5)  # gentle rate limiting

if __name__ == '__main__':
    import sys
    capture_baseline(sys.argv[1])
Run it once against production:
python capture_baseline.py https://example.com/sitemap.xml
This stores PNG files in visual-baselines/. Commit them to your repo (or store them in S3/GCS for large sites).
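The filename mapping is worth a sanity check before trusting it with your whole sitemap. Here's the same `url_to_filename` logic from the script above, with a few expected mappings:

```python
def url_to_filename(url):
    # Same logic as in the capture script: drop scheme+host, slug the path.
    host_and_path = url.split('://', 1)[-1]
    path = host_and_path.split('/', 1)[1] if '/' in host_and_path else ''
    path = path.rstrip('/') or 'index'
    return path.replace('/', '_').replace('?', '_').replace('=', '_')[:100]

print(url_to_filename('https://example.com/products/shoes'))  # products_shoes
print(url_to_filename('https://example.com/'))                # index
print(url_to_filename('https://example.com/search?q=shoes'))  # search_q_shoes
```

One caveat: two distinct URLs can collide after slugging and truncation (e.g. very long paths). For a 50-page site that never came up; for larger sites, hash the URL into the filename.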
Step 2: Capture for Comparison
The comparison script takes a second argument: the preview URL to swap in. It replaces the production domain with the preview domain for each URL:
def capture_comparison(sitemap_url, preview_base, output_dir):
    output_dir = Path(output_dir)
    output_dir.mkdir(exist_ok=True)
    prod_base = sitemap_url.rsplit('/sitemap.xml', 1)[0]
    urls = get_sitemap_urls(sitemap_url)
    for i, url in enumerate(urls):
        preview_url = url.replace(prod_base, preview_base, 1)
        filename = url_to_filename(url) + '.png'
        try:
            image = capture_page(preview_url)
            (output_dir / filename).write_bytes(image)
            print(f"  [{i+1}/{len(urls)}] OK {preview_url}")
        except Exception as e:
            print(f"  [{i+1}/{len(urls)}] FAIL {preview_url}: {e}")
        time.sleep(0.5)
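The only subtle part here is the domain swap. It's a plain string replace on the prefix derived from the sitemap URL (the preview domain below is hypothetical):

```python
sitemap_url = 'https://example.com/sitemap.xml'
preview_base = 'https://deploy-preview-42.example.dev'  # hypothetical preview domain

# Strip '/sitemap.xml' to recover the production base URL, then swap prefixes.
prod_base = sitemap_url.rsplit('/sitemap.xml', 1)[0]
page = 'https://example.com/products/shoes'
preview_url = page.replace(prod_base, preview_base, 1)
print(preview_url)  # https://deploy-preview-42.example.dev/products/shoes
```

Note the filename is still derived from the *production* URL, so baseline and comparison images line up by name.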
Step 3: Diff Detection
I used Pillow and numpy for pixel comparison:
from PIL import Image, ImageChops
import numpy as np
from pathlib import Path
import json

def diff_images(baseline_path, comparison_path, diff_path=None):
    baseline = Image.open(baseline_path).convert('RGB')
    comparison = Image.open(comparison_path).convert('RGB')
    # Resize to match if dimensions differ (e.g. content reflow)
    if baseline.size != comparison.size:
        comparison = comparison.resize(baseline.size, Image.LANCZOS)
    diff = ImageChops.difference(baseline, comparison)
    diff_array = np.array(diff)
    total_pixels = diff_array.shape[0] * diff_array.shape[1]
    changed_pixels = np.sum(np.any(diff_array > 10, axis=2))  # threshold: >10/255 per channel
    change_pct = changed_pixels / total_pixels * 100
    if diff_path and change_pct > 0:
        # Save a highlighted diff image; upcast before multiplying so uint8 doesn't wrap
        diff_enhanced = Image.fromarray(
            (diff_array.astype(np.uint16) * 5).clip(0, 255).astype('uint8')
        )
        diff_enhanced.save(diff_path)
    return {
        'changed_pixels': int(changed_pixels),
        'total_pixels': int(total_pixels),
        'change_pct': round(change_pct, 3),
    }

def run_diff_report(baseline_dir, comparison_dir, diff_dir, threshold_pct=1.0):
    baseline_dir = Path(baseline_dir)
    comparison_dir = Path(comparison_dir)
    diff_dir = Path(diff_dir)
    diff_dir.mkdir(exist_ok=True)
    results = []
    failures = []
    for baseline_file in sorted(baseline_dir.glob('*.png')):
        comparison_file = comparison_dir / baseline_file.name
        if not comparison_file.exists():
            results.append({'file': baseline_file.name, 'status': 'missing'})
            failures.append(baseline_file.name)
            continue
        diff_file = diff_dir / baseline_file.name
        diff = diff_images(baseline_file, comparison_file, diff_file)
        status = 'pass' if diff['change_pct'] <= threshold_pct else 'fail'
        results.append({
            'file': baseline_file.name,
            'status': status,
            **diff,
        })
        if status == 'fail':
            failures.append(baseline_file.name)
    report = {
        'total': len(results),
        'passed': sum(1 for r in results if r['status'] == 'pass'),
        'failed': len(failures),
        'failures': failures,
        'results': results,
    }
    Path('diff-report.json').write_text(json.dumps(report, indent=2))
    return report
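To check the math, here's the same per-channel threshold logic run on two synthetic images — changing a 10×10 patch in a 100×100 canvas should come out as exactly 1%. (`change_pct` here is a standalone reimplementation of the core of `diff_images`, not a function from the script above.)

```python
import numpy as np
from PIL import Image

def change_pct(img_a, img_b, channel_threshold=10):
    # A pixel counts as changed if any RGB channel differs by more than the threshold.
    a = np.asarray(img_a.convert('RGB'), dtype=np.int16)
    b = np.asarray(img_b.convert('RGB'), dtype=np.int16)
    changed = np.any(np.abs(a - b) > channel_threshold, axis=2)
    return changed.sum() / changed.size * 100

baseline = Image.new('RGB', (100, 100), 'white')
comparison = baseline.copy()
comparison.paste(Image.new('RGB', (10, 10), 'black'), (0, 0))  # simulate a broken component
print(change_pct(baseline, comparison))  # 1.0
```

The per-channel threshold of 10 is what absorbs antialiasing and font-rendering jitter without a browser-level tolerance setting.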
Step 4: GitHub Actions Integration
This all runs in CI on every PR:
# .github/workflows/visual-regression.yml
name: Visual Regression
on:
  pull_request:
    branches: [main]
jobs:
  visual-regression:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          lfs: true  # if baselines are stored in git LFS
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install requests pillow numpy
      - name: Wait for preview deployment
        run: |
          # Wait for Vercel/Netlify to deploy the preview
          PREVIEW_URL="https://${{ github.event.pull_request.head.sha }}.preview.example.com"
          for i in {1..30}; do
            if curl -sf "$PREVIEW_URL" > /dev/null 2>&1; then
              echo "Preview ready: $PREVIEW_URL"
              echo "PREVIEW_URL=$PREVIEW_URL" >> $GITHUB_ENV
              break
            fi
            echo "Waiting for preview... ($i/30)"
            sleep 10
          done
      - name: Capture comparison screenshots
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
        run: |
          python capture_comparison.py \
            https://example.com/sitemap.xml \
            ${{ env.PREVIEW_URL }} \
            comparison-screenshots
      - name: Run diff report
        run: |
          python run_diff.py \
            visual-baselines \
            comparison-screenshots \
            diff-images \
            --threshold 1.0
      - name: Upload diff images
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diffs
          path: diff-images/
      - name: Comment on PR
        if: always()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = JSON.parse(fs.readFileSync('diff-report.json', 'utf8'));
            const icon = report.failed === 0 ? '✅' : '❌';
            const body = `## ${icon} Visual Regression Report\n\n` +
              `**${report.passed}/${report.total} pages passed** (threshold: 1%)\n\n` +
              (report.failures.length > 0 ?
                `**Failures:**\n${report.failures.map(f => `- \`${f}\``).join('\n')}` :
                'No visual regressions detected.');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body,
            });
      - name: Fail if regressions detected
        run: |
          # A heredoc avoids the quote-escaping mess of multi-line `python -c "..."`
          python - <<'EOF'
          import json, sys
          report = json.load(open('diff-report.json'))
          if report['failed'] > 0:
              print(f"FAILED: {report['failed']} pages with visual regressions")
              sys.exit(1)
          print(f"PASSED: all {report['total']} pages within threshold")
          EOF
The Result
After setting this up:
- 50 pages covered in the baseline
- ~3 minutes for a full comparison run (screenshot API handles the browser; I just wait for HTTP responses)
- Zero Chromium config in CI
- Caught 3 real regressions in the first two weeks: a z-index issue on mobile nav, a font-weight change that affected a CTA button, and a padding regression in the footer
The failure that drove me to build this happened again two weeks after I deployed it — someone changed a global CSS variable. The visual regression check caught it before merge. That felt good.
Variations
Multi-viewport testing: run the same comparison at 375px (mobile), 768px (tablet), and 1280px (desktop) width. Triple the captures, triple the coverage.
Per-component testing: instead of full-page screenshots, use the clip parameter to capture just a component region. Useful for UI libraries.
Scheduled baseline updates: update baselines automatically every Sunday night so you don't accumulate drift between intentional design changes and accidental regressions.
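For the multi-viewport variation, the only change to the capture loop is iterating over widths and suffixing the filenames. A sketch (the `VIEWPORTS` dict and the suffix scheme are my own naming; `capture_page` is the function from Step 1):

```python
VIEWPORTS = {'mobile': 375, 'tablet': 768, 'desktop': 1280}

def viewport_jobs(urls):
    # One capture job per (page, viewport); the capture loop would call
    # capture_page(url, width=job['width']) and append job['suffix'] to the filename.
    return [
        {'url': url, 'width': width, 'suffix': f'_{name}'}
        for url in urls
        for name, width in VIEWPORTS.items()
    ]

jobs = viewport_jobs(['https://example.com/pricing'])
print(len(jobs))  # 3
```

Baselines then need the same suffixes, so a 50-page site becomes 150 baseline images — plan storage (and API quota) accordingly.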
Get Your API Key
The screenshot API used in this guide is at hermesforge.dev/screenshot. Free tier available — the baseline capture for a 50-page site uses about 50 API calls.