Using Screenshots to Automate Acceptance Testing Without Playwright
The first thing people say when I mention screenshot-based testing is: "Why not just use Playwright?"
Fair question. Playwright is excellent. But it carries significant weight: a browser binary (100MB+), a test runner, a DOM query model, async/await coordination, and the full surface area of a browser automation API. For many acceptance testing scenarios — especially in CI, or when you own the rendering environment — you don't need any of that.
What you actually need is: does this page look right? Is the right content visible? Does it render without errors?
Screenshots can answer those questions directly.
The Basic Idea
The workflow is:
- Capture a screenshot of a page (or set of pages) in your staging environment
- Compare pixel-by-pixel against a stored baseline
- Fail if the diff exceeds a threshold
- On approval, update the baseline
That's it. No DOM selectors, no page.waitForSelector(), no browser-specific quirks. The screenshot is the assertion.
Setting Up the Test Runner
import requests
import os
import io
from pathlib import Path

from PIL import Image
import numpy as np

API_KEY = os.environ['SCREENSHOT_API_KEY']
SCREENSHOT_URL = 'https://hermesforge.dev/api/screenshot'
BASELINE_DIR = Path('test/baselines')
DIFF_DIR = Path('test/diffs')

BASELINE_DIR.mkdir(parents=True, exist_ok=True)
DIFF_DIR.mkdir(parents=True, exist_ok=True)

def capture(url, width=1280, height=900, delay=1000, full_page=False, js=None):
    params = {
        'url': url,
        'width': width,
        'height': height,
        'format': 'png',
        'full_page': str(full_page).lower(),
        'delay': delay,
    }
    if js is not None:
        params['js'] = js  # forwarded to the API's script-injection parameter (param name assumed; check your API docs)
    resp = requests.get(
        SCREENSHOT_URL,
        params=params,
        headers={'X-API-Key': API_KEY},
        timeout=60,
    )
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content)).convert('RGB')
def pixel_diff(img_a, img_b):
    """Return (changed_pct, diff_image) between two same-size images."""
    arr_a = np.array(img_a, dtype=np.int16)
    arr_b = np.array(img_b, dtype=np.int16)
    delta = np.abs(arr_a - arr_b).max(axis=2)  # max channel diff per pixel
    changed = (delta > 10).astype(np.uint8)    # 10/255 tolerance for anti-aliasing and font-rendering noise
    changed_pct = changed.mean() * 100
    # Produce a diff image: white background, red where changed
    diff = Image.new('RGB', img_a.size, 'white')
    diff_arr = np.array(diff)
    diff_arr[changed == 1] = [255, 0, 0]
    diff_img = Image.fromarray(diff_arr.astype(np.uint8))
    return changed_pct, diff_img
class ScreenshotTest:
    def __init__(self, name, url, threshold_pct=0.5, **capture_kwargs):
        self.name = name
        self.url = url
        self.threshold_pct = threshold_pct
        self.capture_kwargs = capture_kwargs

    @property
    def baseline_path(self):
        return BASELINE_DIR / f'{self.name}.png'

    def run(self, update_baseline=False):
        print(f' [{self.name}] Capturing...')
        current = capture(self.url, **self.capture_kwargs)

        if update_baseline or not self.baseline_path.exists():
            current.save(self.baseline_path)
            print(f' [{self.name}] Baseline saved.')
            return True, 0.0, None

        baseline = Image.open(self.baseline_path).convert('RGB')

        # Resize if dimensions changed (layout shift)
        if current.size != baseline.size:
            print(f' [{self.name}] Size changed: {baseline.size} → {current.size}')
            current = current.resize(baseline.size, Image.LANCZOS)

        pct, diff_img = pixel_diff(baseline, current)
        diff_path = DIFF_DIR / f'{self.name}_diff.png'
        diff_img.save(diff_path)

        passed = pct <= self.threshold_pct
        status = 'PASS' if passed else 'FAIL'
        print(f' [{self.name}] {status}: {pct:.2f}% pixels changed (threshold: {self.threshold_pct}%)')
        return passed, pct, diff_path
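Before defining a full suite, you can sanity-check the runner against a single page. A minimal smoke test (the staging URL is a placeholder for your own):

# First run saves a baseline; second run captures again and diffs against it.
test = ScreenshotTest('smoke-homepage', 'https://staging.yoursite.com/')
test.run()  # → [smoke-homepage] Baseline saved.
test.run()  # → PASS, assuming the page renders deterministically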
Defining Your Test Suite
TESTS = [
    ScreenshotTest('homepage', 'https://staging.yoursite.com/',
                   threshold_pct=0.5),
    ScreenshotTest('pricing', 'https://staging.yoursite.com/pricing',
                   threshold_pct=0.5),
    ScreenshotTest('dashboard', 'https://staging.yoursite.com/app/dashboard',
                   threshold_pct=1.0, delay=2000),  # charts need extra time
    ScreenshotTest('signup', 'https://staging.yoursite.com/signup',
                   threshold_pct=0.3),  # tighter: any UI change here should be intentional
    ScreenshotTest('docs-index', 'https://staging.yoursite.com/docs',
                   threshold_pct=0.5, full_page=True),
    ScreenshotTest('mobile-homepage', 'https://staging.yoursite.com/',
                   threshold_pct=0.5, width=390, height=844),  # iPhone 14
]
Running the Suite
import sys

def run_suite(tests, update_baseline=False):
    results = []
    for test in tests:
        try:
            passed, pct, diff_path = test.run(update_baseline=update_baseline)
            results.append({
                'name': test.name,
                'passed': passed,
                'pct': pct,
                'diff': str(diff_path) if diff_path else None,
            })
        except Exception as e:
            print(f' [{test.name}] ERROR: {e}')
            results.append({'name': test.name, 'passed': False, 'pct': None, 'error': str(e)})
        print()

    total = len(results)
    failed = [r for r in results if not r['passed']]
    passed_count = total - len(failed)
    print(f'Results: {passed_count}/{total} passed')
    if failed:
        print('Failed tests:')
        for r in failed:
            if 'error' in r:
                print(f' - {r["name"]}: ERROR — {r["error"]}')
            else:
                print(f' - {r["name"]}: {r["pct"]:.2f}% changed, diff at {r["diff"]}')
    return len(failed) == 0

if __name__ == '__main__':
    update = '--update' in sys.argv
    if update:
        print('Updating baselines...')
    ok = run_suite(TESTS, update_baseline=update)
    sys.exit(0 if ok else 1)
Run normally: python test_visual.py — exits 1 if any test fails.
Update baselines (after intentional changes): python test_visual.py --update.
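If your project already standardizes on pytest, the same suite can run under it with a thin wrapper. A sketch, assuming the runner above is saved as test_visual.py so TESTS is importable (pytest is not otherwise a dependency of the runner):

import pytest
from test_visual import TESTS

# One parametrized test case per ScreenshotTest, labeled by name
@pytest.mark.parametrize('case', TESTS, ids=lambda t: t.name)
def test_visual(case):
    passed, pct, diff_path = case.run()
    assert passed, f'{case.name}: {pct:.2f}% pixels changed (diff: {diff_path})'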
CI Integration
# .github/workflows/visual-tests.yml
name: Visual Acceptance Tests

on:
  pull_request:
    branches: [main]

jobs:
  visual:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install requests Pillow numpy

      - name: Run visual tests
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
        run: python test_visual.py

      - name: Upload diffs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diffs
          path: test/diffs/
          retention-days: 7
Diffs are uploaded as CI artifacts on failure. Pull the artifact to see exactly which pixels changed.
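Locally, the GitHub CLI can pull those artifacts without clicking through the web UI (the run ID comes from gh run list):

# Find the failed run, then download its diff images
gh run list --workflow visual-tests.yml
gh run download <run-id> --name visual-diffs --dir ./diffs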
Handling Dynamic Content
Some pages have content that legitimately changes on every load: timestamps, "last updated" text, live counters, ads. There are three ways to handle this:
Option 1: Mask the dynamic region before comparison
def mask_region(img, box):
    """Set a rectangle to solid grey before diffing."""
    arr = np.array(img).copy()
    x1, y1, x2, y2 = box
    arr[y1:y2, x1:x2] = [200, 200, 200]
    return Image.fromarray(arr)

# Usage: mask the timestamp region at the top-right
baseline_masked = mask_region(baseline, (1100, 20, 1280, 60))
current_masked = mask_region(current, (1100, 20, 1280, 60))
pct, diff = pixel_diff(baseline_masked, current_masked)
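If a page needs the same masks on every run, it may be worth folding them into the comparison step. A minimal helper built on mask_region and pixel_diff above (the second box is a made-up example region):

def diff_with_masks(baseline, current, boxes):
    """Grey out each (x1, y1, x2, y2) box in both images, then diff."""
    for box in boxes:
        baseline = mask_region(baseline, box)
        current = mask_region(current, box)
    return pixel_diff(baseline, current)

# Usage: ignore the timestamp and a live-visitor counter
pct, diff = diff_with_masks(baseline, current,
                            [(1100, 20, 1280, 60), (40, 700, 220, 740)])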
Option 2: Use a higher threshold for pages with dynamic content
ScreenshotTest('dashboard', url, threshold_pct=3.0) # 3% accepts small counter changes
Option 3: Inject JS to freeze dynamic content
freeze_js = """
// Replace all timestamps with a fixed value
document.querySelectorAll('[data-timestamp]').forEach(el => {
    el.textContent = 'Jan 1, 2026';
});

// Hide live chat widget
const chat = document.querySelector('#intercom-container');
if (chat) chat.style.display = 'none';
"""

capture(url, js=freeze_js, delay=1500)
What This Catches That Unit Tests Miss
This workflow catches the class of failure that's invisible to unit tests:
CSS regressions: A z-index change buries a button. A margin change truncates text. Font loading fails and all text renders in Times New Roman. None of these show up in DOM state a test can assert on — the DOM is fine; the visual output is broken.
Third-party content changes: Your payment form iframe changed its styling. Your embedded map started showing an error. Your CDN-hosted logo returned a broken image. Unit tests don't touch third-party frames.
Deployment failures: You deployed a build that included a broken CSS bundle. The page renders but everything is unstyled. Visual tests catch this; unit tests pass because the JS still executes.
Responsive breakpoints: Mobile layout is broken at 390px but fine at 1280px. You won't catch this unless you specifically test mobile viewports — and most component test setups run headless at a single fixed width.
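One way to make that coverage systematic is to generate a test per (page, viewport) pair instead of hand-writing each entry. A sketch using the ScreenshotTest class above (the breakpoint set and page list are illustrative, not prescriptive):

VIEWPORTS = {'mobile': (390, 844), 'tablet': (768, 1024), 'desktop': (1280, 900)}
PAGES = {
    'homepage': 'https://staging.yoursite.com/',
    'pricing': 'https://staging.yoursite.com/pricing',
}

# One ScreenshotTest per (page, viewport) combination
TESTS += [
    ScreenshotTest(f'{page}-{vp}', url, width=w, height=h)
    for page, url in PAGES.items()
    for vp, (w, h) in VIEWPORTS.items()
]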
What It Doesn't Catch
Screenshot testing is a complement to unit tests, not a replacement.
- Logic errors: Correct-looking UI with wrong data. A graph that renders fine but uses the wrong dataset.
- Interaction flows: Clicking a button and checking what happens.
- Performance regressions: The page looks the same but takes 8 seconds to load.
- Accessibility: Correct pixels, wrong semantics.
For those, use your existing test suite. Screenshot testing handles the visual contract.
Practical Baseline Management
Baselines are checked into version control alongside the test code. This means:
- Visual changes require an explicit --update + git add test/baselines/ — intentional, reviewable
- CI uses the baseline from the commit being tested, not from main
- PR reviews include baseline diffs when screenshots change
# After updating baselines, show what changed
git diff --stat test/baselines/
# → test/baselines/pricing.png | Bin 45000 -> 48200 bytes
# → test/baselines/homepage.png | Bin 38000 -> 39100 bytes
Reviewers can download the updated baseline PNGs and inspect them alongside the diff images. It becomes part of the normal code review flow.
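One caveat: every --update adds new binary blobs to git history, so a large suite bloats the repository over time. Git LFS is one mitigation, if your host supports it:

git lfs install
git lfs track "test/baselines/*.png"
git add .gitattributes
# Existing baselines stay plain git blobs; git lfs migrate import
#   --include="test/baselines/*.png" can rewrite history to convert them.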
Setup
pip install requests Pillow numpy
That's the full dependency list. No browser binary, no Playwright, no test framework. The test runner is 80 lines of Python.
Get Your API Key
Free API key at hermesforge.dev/screenshot. A full 6-test suite against your staging environment costs 6 API calls per run — well within any free tier.