Using Screenshots to Automate Acceptance Testing Without Playwright
The first thing people say when I mention screenshot-based testing is: "Why not just use Playwright?"
Fair question. Playwright is excellent. But it carries significant weight: a browser binary (100MB+), a test runner, a DOM query model, async/await coordination, and the full surface area of a browser automation API. For many acceptance testing scenarios — especially in CI, or when you own the rendering environment — you don't need any of that.
What you actually need is: does this page look right? Is the right content visible? Does it render without errors?
Screenshots can answer those questions directly.
The Basic Idea
The workflow is:
- Capture a screenshot of a page (or set of pages) in your staging environment
- Compare pixel-by-pixel against a stored baseline
- Fail if the diff exceeds a threshold
- On approval, update the baseline
That's it. No DOM selectors, no page.waitForSelector(), no browser-specific quirks. The screenshot is the assertion.
Setting Up the Test Runner
import requests
import os
import io
from pathlib import Path

from PIL import Image
import numpy as np

API_KEY = os.environ['SCREENSHOT_API_KEY']
SCREENSHOT_URL = 'https://hermesforge.dev/api/screenshot'
BASELINE_DIR = Path('test/baselines')
DIFF_DIR = Path('test/diffs')

BASELINE_DIR.mkdir(parents=True, exist_ok=True)
DIFF_DIR.mkdir(parents=True, exist_ok=True)

def capture(url, width=1280, height=900, delay=1000, full_page=False, js=None):
    params = {
        'url': url,
        'width': width,
        'height': height,
        'format': 'png',
        'full_page': str(full_page).lower(),
        'delay': delay,
    }
    if js is not None:
        params['js'] = js  # forwarded to the API's script-injection parameter (param name assumed; check your API docs)
    resp = requests.get(
        SCREENSHOT_URL,
        params=params,
        headers={'X-API-Key': API_KEY},
        timeout=60,
    )
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content)).convert('RGB')
def pixel_diff(img_a, img_b):
    """Return (changed_pct, diff_image) between two same-size images."""
    arr_a = np.array(img_a, dtype=np.int16)
    arr_b = np.array(img_b, dtype=np.int16)
    delta = np.abs(arr_a - arr_b).max(axis=2)  # max channel diff per pixel
    changed = (delta > 10).astype(np.uint8)    # 10/255 tolerance for anti-aliasing and font-rendering noise
    changed_pct = changed.mean() * 100
    # Produce a diff image: white background, red where changed
    diff = Image.new('RGB', img_a.size, 'white')
    diff_arr = np.array(diff)
    diff_arr[changed == 1] = [255, 0, 0]
    diff_img = Image.fromarray(diff_arr.astype(np.uint8))
    return changed_pct, diff_img
class ScreenshotTest:
    def __init__(self, name, url, threshold_pct=0.5, **capture_kwargs):
        self.name = name
        self.url = url
        self.threshold_pct = threshold_pct
        self.capture_kwargs = capture_kwargs

    @property
    def baseline_path(self):
        return BASELINE_DIR / f'{self.name}.png'

    def run(self, update_baseline=False):
        print(f' [{self.name}] Capturing...')
        current = capture(self.url, **self.capture_kwargs)

        if update_baseline or not self.baseline_path.exists():
            current.save(self.baseline_path)
            print(f' [{self.name}] Baseline saved.')
            return True, 0.0, None

        baseline = Image.open(self.baseline_path).convert('RGB')

        # Resize if dimensions changed (layout shift)
        if current.size != baseline.size:
            print(f' [{self.name}] Size changed: {baseline.size} → {current.size}')
            current = current.resize(baseline.size, Image.LANCZOS)

        pct, diff_img = pixel_diff(baseline, current)
        diff_path = DIFF_DIR / f'{self.name}_diff.png'
        diff_img.save(diff_path)

        passed = pct <= self.threshold_pct
        status = 'PASS' if passed else 'FAIL'
        print(f' [{self.name}] {status}: {pct:.2f}% pixels changed (threshold: {self.threshold_pct}%)')
        return passed, pct, diff_path
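Before defining a full suite, you can sanity-check the runner against a single page. A minimal smoke test (the staging URL is a placeholder for your own):

# First run saves a baseline; second run captures again and diffs against it.
test = ScreenshotTest('smoke-homepage', 'https://staging.yoursite.com/')
test.run()  # → [smoke-homepage] Baseline saved.
test.run()  # → PASS, assuming the page renders deterministically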
Defining Your Test Suite
TESTS = [
    ScreenshotTest('homepage', 'https://staging.yoursite.com/',
                   threshold_pct=0.5),
    ScreenshotTest('pricing', 'https://staging.yoursite.com/pricing',
                   threshold_pct=0.5),
    ScreenshotTest('dashboard', 'https://staging.yoursite.com/app/dashboard',
                   threshold_pct=1.0, delay=2000),  # charts need extra time
    ScreenshotTest('signup', 'https://staging.yoursite.com/signup',
                   threshold_pct=0.3),  # tighter: any UI change here should be intentional
    ScreenshotTest('docs-index', 'https://staging.yoursite.com/docs',
                   threshold_pct=0.5, full_page=True),
    ScreenshotTest('mobile-homepage', 'https://staging.yoursite.com/',
                   threshold_pct=0.5, width=390, height=844),  # iPhone 14
]
Running the Suite
import sys

def run_suite(tests, update_baseline=False):
    results = []
    for test in tests:
        try:
            passed, pct, diff_path = test.run(update_baseline=update_baseline)
            results.append({
                'name': test.name,
                'passed': passed,
                'pct': pct,
                'diff': str(diff_path) if diff_path else None,
            })
        except Exception as e:
            print(f' [{test.name}] ERROR: {e}')
            results.append({'name': test.name, 'passed': False, 'pct': None, 'error': str(e)})
        print()

    total = len(results)
    failed = [r for r in results if not r['passed']]
    passed_count = total - len(failed)
    print(f'Results: {passed_count}/{total} passed')
    if failed:
        print('Failed tests:')
        for r in failed:
            if 'error' in r:
                print(f' - {r["name"]}: ERROR — {r["error"]}')
            else:
                print(f' - {r["name"]}: {r["pct"]:.2f}% changed, diff at {r["diff"]}')
    return len(failed) == 0

if __name__ == '__main__':
    update = '--update' in sys.argv
    if update:
        print('Updating baselines...')
    ok = run_suite(TESTS, update_baseline=update)
    sys.exit(0 if ok else 1)
Run normally: python test_visual.py — exits 1 if any test fails.
Update baselines (after intentional changes): python test_visual.py --update.
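If your project already standardizes on pytest, the same suite can run under it with a thin wrapper. A sketch, assuming the runner above is saved as test_visual.py so TESTS is importable (pytest is not otherwise a dependency of the runner):

import pytest
from test_visual import TESTS

# One parametrized test case per ScreenshotTest, labeled by name
@pytest.mark.parametrize('case', TESTS, ids=lambda t: t.name)
def test_visual(case):
    passed, pct, diff_path = case.run()
    assert passed, f'{case.name}: {pct:.2f}% pixels changed (diff: {diff_path})'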
CI Integration
# .github/workflows/visual-tests.yml
name: Visual Acceptance Tests

on:
  pull_request:
    branches: [main]

jobs:
  visual:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install requests Pillow numpy

      - name: Run visual tests
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
        run: python test_visual.py

      - name: Upload diffs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diffs
          path: test/diffs/
          retention-days: 7
Diffs are uploaded as CI artifacts on failure. Pull the artifact to see exactly which pixels changed.
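Locally, the GitHub CLI can pull those artifacts without clicking through the web UI (the run ID comes from gh run list):

# Find the failed run, then download its diff images
gh run list --workflow visual-tests.yml
gh run download <run-id> --name visual-diffs --dir ./diffs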
Handling Dynamic Content
Some pages have content that legitimately changes on every load: timestamps, "last updated" text, live counters, ads. There are three ways to handle this:
Option 1: Mask the dynamic region before comparison
def mask_region(img, box):
    """Set a rectangle to solid grey before diffing."""
    arr = np.array(img).copy()
    x1, y1, x2, y2 = box
    arr[y1:y2, x1:x2] = [200, 200, 200]
    return Image.fromarray(arr)

# Usage: mask the timestamp region at the top-right
baseline_masked = mask_region(baseline, (1100, 20, 1280, 60))
current_masked = mask_region(current, (1100, 20, 1280, 60))
pct, diff = pixel_diff(baseline_masked, current_masked)
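If a page needs the same masks on every run, it may be worth folding them into the comparison step. A minimal helper built on mask_region and pixel_diff above (the second box is a made-up example region):

def diff_with_masks(baseline, current, boxes):
    """Grey out each (x1, y1, x2, y2) box in both images, then diff."""
    for box in boxes:
        baseline = mask_region(baseline, box)
        current = mask_region(current, box)
    return pixel_diff(baseline, current)

# Usage: ignore the timestamp and a live-visitor counter
pct, diff = diff_with_masks(baseline, current,
                            [(1100, 20, 1280, 60), (40, 700, 220, 740)])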
Option 2: Use a higher threshold for pages with dynamic content
ScreenshotTest('dashboard', url, threshold_pct=3.0) # 3% accepts small counter changes
Option 3: Inject JS to freeze dynamic content
freeze_js = """
// Replace all timestamps with a fixed value
document.querySelectorAll('[data-timestamp]').forEach(el => {
    el.textContent = 'Jan 1, 2026';
});

// Hide live chat widget
const chat = document.querySelector('#intercom-container');
if (chat) chat.style.display = 'none';
"""

capture(url, js=freeze_js, delay=1500)
What This Catches That Unit Tests Miss
This workflow catches the class of failure that's invisible to unit tests:
CSS regressions: A z-index change buries a button. A margin change truncates text. Font loading fails and all text renders in Times New Roman. None of these show up in DOM state a test can assert on — the DOM is fine; the visual output is broken.
Third-party content changes: Your payment form iframe changed its styling. Your embedded map started showing an error. Your CDN-hosted logo returned a broken image. Unit tests don't touch third-party frames.
Deployment failures: You deployed a build that included a broken CSS bundle. The page renders but everything is unstyled. Visual tests catch this; unit tests pass because the JS still executes.
Responsive breakpoints: Mobile layout is broken at 390px but fine at 1280px. You won't catch this unless you specifically test mobile viewports — and most component test setups run headless at a single fixed width.
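One way to make that coverage systematic is to generate a test per (page, viewport) pair instead of hand-writing each entry. A sketch using the ScreenshotTest class above (the breakpoint set and page list are illustrative, not prescriptive):

VIEWPORTS = {'mobile': (390, 844), 'tablet': (768, 1024), 'desktop': (1280, 900)}
PAGES = {
    'homepage': 'https://staging.yoursite.com/',
    'pricing': 'https://staging.yoursite.com/pricing',
}

# One ScreenshotTest per (page, viewport) combination
TESTS += [
    ScreenshotTest(f'{page}-{vp}', url, width=w, height=h)
    for page, url in PAGES.items()
    for vp, (w, h) in VIEWPORTS.items()
]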
What It Doesn't Catch
Screenshot testing is a complement to unit tests, not a replacement.
- Logic errors: Correct-looking UI with wrong data. A graph that renders fine but uses the wrong dataset.
- Interaction flows: Clicking a button and checking what happens.
- Performance regressions: The page looks the same but takes 8 seconds to load.
- Accessibility: Correct pixels, wrong semantics.
For those, use your existing test suite. Screenshot testing handles the visual contract.
Practical Baseline Management
Baselines are checked into version control alongside the test code. This means:
- Visual changes require an explicit --update + git add test/baselines/ — intentional, reviewable
- CI uses the baseline from the commit being tested, not from main
- PR reviews include baseline diffs when screenshots change
# After updating baselines, show what changed
git diff --stat test/baselines/
# → test/baselines/pricing.png | Bin 45000 -> 48200 bytes
# → test/baselines/homepage.png | Bin 38000 -> 39100 bytes
Reviewers can download the updated baseline PNGs and inspect them alongside the diff images. It becomes part of the normal code review flow.
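One caveat: every --update adds new binary blobs to git history, so a large suite bloats the repository over time. Git LFS is one mitigation, if your host supports it:

git lfs install
git lfs track "test/baselines/*.png"
git add .gitattributes
# Existing baselines stay plain git blobs; git lfs migrate import
#   --include="test/baselines/*.png" can rewrite history to convert them.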
Setup
pip install requests Pillow numpy
That's the full dependency list. No browser binary, no Playwright, no test framework. The test runner is 80 lines of Python.
Get Your API Key
Free API key at hermesforge.dev/screenshot. A full 6-test suite against your staging environment costs 6 API calls per run — well within any free tier.