Automating Multi-Language UI Testing with Screenshot API

2026-04-16 | Tags: [tutorial, screenshot-api, i18n, localization, testing, python]

Localization bugs are invisible until they aren't. A German string that's 40% longer than its English source overflows a button. An Arabic page that hasn't been mirrored renders RTL text in an LTR container. A Japanese font fallback renders at a smaller size than the design expects. These bugs don't surface in unit tests or Lighthouse audits — they surface when a user in Munich or Cairo or Tokyo reports that something looks broken.

Screenshot APIs make it practical to build localization QA into your CI/CD pipeline: capture your app in every supported locale, compare against a baseline, and flag visual regressions before they reach production.

What Localization Visual Testing Catches

Text overflow: Translated strings frequently expand relative to English. German is typically 30-40% longer. Finnish and Polish can run even longer. Buttons, labels, and navigation items designed for English text may clip, wrap unexpectedly, or push adjacent elements out of position.

RTL layout breaks: Arabic, Hebrew, Persian, and Urdu read right-to-left. A properly localized app mirrors its entire layout. A broken RTL implementation might mirror text direction without mirroring the layout, or mirror some components but not others. Only a visual capture shows the full picture.

Missing translations: Untranslated strings appear in the source language, creating a jarring mixed-language UI. Screenshot comparison against a full-translation baseline makes these gaps visible.

Font rendering differences: CJK (Chinese, Japanese, Korean) fonts render at different sizes and weights than Latin fonts. A line that fits perfectly with an English font may wrap with a CJK font at the same point size.

Locale-specific formatting: Dates, numbers, and currencies can render differently than expected (wrong separators, wrong order, wrong currency symbol placement).
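
To make the text-overflow risk concrete before any screenshots are taken, the sketch below measures how much each translated string grows relative to its English source. The catalog layout (flat dicts of key to string), the `expansion_pct` and `flag_long_strings` names, and the 30% threshold are assumptions for illustration, not part of any particular i18n library.

```python
def expansion_pct(source: str, translated: str) -> float:
    """Percentage growth in character count relative to the source string."""
    if not source:
        return 0.0
    return (len(translated) - len(source)) / len(source) * 100


def flag_long_strings(source: dict, translated: dict, threshold_pct: float = 30.0) -> list:
    """Return (key, growth) pairs for strings that grow more than threshold_pct."""
    flagged = []
    for key, text in source.items():
        if key in translated:
            growth = expansion_pct(text, translated[key])
            if growth > threshold_pct:
                flagged.append((key, round(growth, 1)))
    return flagged


# Hypothetical catalogs for illustration
en = {"save": "Save", "settings": "Settings"}
de = {"save": "Speichern", "settings": "Einstellungen"}
print(flag_long_strings(en, de))
```

Character count is only a proxy for rendered width, so a check like this narrows down which strings to look at; the screenshots remain the authoritative signal.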

Setting Up Locale Test Pages

To screenshot your app in a specific locale, you need a URL that loads with that locale active. Most modern frameworks support this either through URL-based locale routing or query parameter overrides:

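import hashlib
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

import requests

# Read the key from the environment so it stays out of source control;
# the CI workflow later in this post passes SCREENSHOT_API_KEY as a secret.
SCREENSHOT_API_KEY = os.environ.get("SCREENSHOT_API_KEY", "your-api-key")
SCREENSHOT_API_URL = "https://hermesforge.dev/api/screenshot"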

# URL patterns for locale-specific pages
# Adapt to your routing strategy
LOCALE_URL_PATTERNS = {
    "url_prefix": "https://your-app.com/{locale}/dashboard",   # /de/dashboard, /ja/dashboard
    "query_param": "https://your-app.com/dashboard?lang={locale}",
    "subdomain": "https://{locale}.your-app.com/dashboard",    # de.your-app.com
}


def locale_url(base_pattern: str, locale: str) -> str:
    """Generate locale-specific URL from a pattern."""
    return base_pattern.format(locale=locale)


# Locales to test, with metadata for visual configuration
LOCALES = [
    {"code": "en", "name": "English", "direction": "ltr", "font_scale": 1.0},
    {"code": "de", "name": "German", "direction": "ltr", "font_scale": 1.0},
    {"code": "fr", "name": "French", "direction": "ltr", "font_scale": 1.0},
    {"code": "ja", "name": "Japanese", "direction": "ltr", "font_scale": 0.95},
    {"code": "zh", "name": "Chinese", "direction": "ltr", "font_scale": 0.95},
    {"code": "ar", "name": "Arabic", "direction": "rtl", "font_scale": 1.0},
    {"code": "he", "name": "Hebrew", "direction": "rtl", "font_scale": 1.0},
    {"code": "ko", "name": "Korean", "direction": "ltr", "font_scale": 0.95},
]
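
As a quick check of the routing patterns above, this sketch expands each pattern for a couple of locale codes. It repeats `locale_url` and the pattern dict so it runs on its own; `expand` is a hypothetical helper and the `your-app.com` hosts are placeholders.

```python
PATTERNS = {
    "url_prefix": "https://your-app.com/{locale}/dashboard",
    "query_param": "https://your-app.com/dashboard?lang={locale}",
    "subdomain": "https://{locale}.your-app.com/dashboard",
}


def locale_url(base_pattern: str, locale: str) -> str:
    """Generate a locale-specific URL from a pattern."""
    return base_pattern.format(locale=locale)


def expand(pattern_name: str, codes: list) -> list:
    """Expand one named pattern for every locale code."""
    return [locale_url(PATTERNS[pattern_name], code) for code in codes]


for name in PATTERNS:
    print(name, expand(name, ["de", "ja"]))
```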

Capturing Locale Screenshots

def capture_locale(
    url: str,
    locale: dict,
    page_label: str,
    output_dir: str = "./locale_screenshots",
) -> Optional[dict]:
    """
    Capture a page in a specific locale.
    RTL locales use the same viewport but may need dir attribute inspection.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Standard desktop viewport for most locale testing
    # Use 1280px to ensure text has room to overflow visibly
    params = {
        "url": url,
        "format": "png",
        "width": 1280,
        "full_page": "true",
        "wait": "networkidle",
        "block_ads": "true",
    }

    response = requests.get(
        SCREENSHOT_API_URL,
        params=params,
        headers={"X-API-Key": SCREENSHOT_API_KEY},
        timeout=30,
    )

    if response.status_code != 200:
        print(f"  Failed {locale['code']}: HTTP {response.status_code}")
        return None

    content_hash = hashlib.sha256(response.content).hexdigest()[:12]
    filename = f"{page_label}_{locale['code']}_{content_hash}.png"
    filepath = Path(output_dir) / filename
    filepath.write_bytes(response.content)

    return {
        "locale": locale["code"],
        "locale_name": locale["name"],
        "direction": locale["direction"],
        "url": url,
        "page": page_label,
        "file": str(filepath),
        "content_hash": content_hash,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def capture_all_locales(
    url_pattern: str,
    page_label: str,
    locales: list = LOCALES,
) -> list[dict]:
    """Capture a page across all supported locales."""
    results = []
    for locale in locales:
        url = locale_url(url_pattern, locale["code"])
        print(f"  Capturing {locale['name']} ({locale['code']})...")
        result = capture_locale(url, locale, page_label)
        if result:
            results.append(result)
    return results
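
Screenshot requests can fail transiently, and a single failed locale leaves a hole in the comparison. One option is to wrap the capture call in a small retry helper. This is a generic sketch, not part of the screenshot API: `with_retries`, its parameters, and the convention that `None` means failure (matching `capture_locale` above) are assumptions.

```python
import time
from typing import Callable, Optional


def with_retries(
    fn: Callable[[], Optional[dict]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fn until it returns a non-None result or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts (exponential backoff).
    The sleep function is injectable so the helper can be tested without waiting.
    """
    for attempt in range(attempts):
        result = fn()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None


# Intended use with the function defined earlier (not executed here):
#   result = with_retries(lambda: capture_locale(url, locale, page_label))
```

Each retry is another API call, so keep `attempts` small if you are close to a daily quota.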

Visual Comparison Against English Baseline

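English is typically the source locale. Compare every other locale against it to detect layout breaks. Translated text always changes some pixels, so treat page-height growth as the primary overflow signal and the pixel-diff percentage as a rough indicator whose threshold needs tuning per app. RTL locales are mirrored by design and will show a large pixel difference even when the layout is correct: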

from PIL import Image, ImageChops
import numpy as np


def compare_to_baseline(
    baseline_path: str,
    locale_path: str,
    diff_output_path: str,
    overflow_threshold_pct: float = 5.0,
) -> dict:
    """
    Compare a locale screenshot to the English baseline.
    Detects: overflow (image is taller), significant layout shifts, missing content.
    """
    baseline = Image.open(baseline_path).convert("RGB")
    locale_img = Image.open(locale_path).convert("RGB")

    # Height difference: translated text wrapping → taller page
    height_diff = locale_img.height - baseline.height
    height_diff_pct = (height_diff / baseline.height) * 100

    # Width should be identical (same viewport)
    width_mismatch = locale_img.width != baseline.width

    # For the pixel diff, align at the top and crop both images to the shorter height
    if locale_img.height != baseline.height:
        # Rows below the shorter image are not compared; height_diff covers them
        comparison_height = min(baseline.height, locale_img.height)
        baseline_crop = baseline.crop((0, 0, baseline.width, comparison_height))
        locale_crop = locale_img.crop((0, 0, locale_img.width, comparison_height))
    else:
        baseline_crop = baseline
        locale_crop = locale_img

    if baseline_crop.size != locale_crop.size:
        locale_crop = locale_crop.resize(baseline_crop.size, Image.LANCZOS)

    diff = ImageChops.difference(baseline_crop, locale_crop)
    diff_array = np.array(diff)
    significant_pixels = (diff_array > 15).any(axis=2).mean() * 100

    # Highlight diff
    diff_highlight = np.array(locale_crop).copy()
    changed_mask = (diff_array > 15).any(axis=2)
    diff_highlight[changed_mask] = [255, 80, 80]
    blended = (np.array(locale_crop) * 0.7 + diff_highlight * 0.3).astype(np.uint8)
    Image.fromarray(blended).save(diff_output_path)

    return {
        "height_diff_px": height_diff,
        "height_diff_pct": round(height_diff_pct, 1),
        "width_mismatch": width_mismatch,
        "pixel_diff_pct": round(float(significant_pixels), 2),
        "overflow_detected": height_diff_pct > overflow_threshold_pct,
        # Cast to a plain bool: numpy.bool_ is not JSON-serializable
        "significant_change": bool(significant_pixels > 10.0),
        "diff_path": diff_output_path,
    }


def run_locale_comparison(
    captures: list[dict],
    baseline_locale: str = "en",
    output_dir: str = "./locale_diffs",
) -> list[dict]:
    """
    Compare all locale captures against the baseline locale.
    Returns comparison results with overflow detection.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Find baseline
    baseline = next((c for c in captures if c["locale"] == baseline_locale), None)
    if not baseline:
        print(f"No baseline capture found for locale '{baseline_locale}'")
        return []

    results = []
    for capture in captures:
        if capture["locale"] == baseline_locale:
            continue

        diff_path = Path(output_dir) / f"diff_{capture['locale']}.png"
        comparison = compare_to_baseline(
            baseline["file"],
            capture["file"],
            str(diff_path),
        )
        comparison["locale"] = capture["locale"]
        comparison["locale_name"] = capture["locale_name"]
        comparison["direction"] = capture["direction"]

        if comparison["overflow_detected"]:
            print(f"  OVERFLOW: {capture['locale_name']} — page is {comparison['height_diff_pct']}% taller than English")
        elif comparison["significant_change"]:
            print(f"  CHANGED: {capture['locale_name']} — {comparison['pixel_diff_pct']}% pixel difference")
        else:
            print(f"  OK: {capture['locale_name']}")

        results.append(comparison)

    return results
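
The priority order used in the loop above (overflow first, then significant change) can be factored into a pure function, which makes the thresholds easy to unit test without any images. This is a sketch: `classify` and `height_diff_pct` are hypothetical helpers, and the defaults of 5% and 10% simply mirror the values used in `compare_to_baseline`.

```python
def height_diff_pct(baseline_height: int, locale_height: int) -> float:
    """Height growth of the locale page relative to the baseline, in percent."""
    return (locale_height - baseline_height) / baseline_height * 100


def classify(
    baseline_height: int,
    locale_height: int,
    pixel_diff_pct: float,
    overflow_threshold_pct: float = 5.0,
    change_threshold_pct: float = 10.0,
) -> str:
    """Return 'overflow', 'changed', or 'ok', checking overflow first."""
    if height_diff_pct(baseline_height, locale_height) > overflow_threshold_pct:
        return "overflow"
    if pixel_diff_pct > change_threshold_pct:
        return "changed"
    return "ok"


print(classify(2000, 2150, 3.0))   # page grew 7.5%
print(classify(2000, 2040, 12.0))  # page grew 2%, but 12% of pixels differ
print(classify(2000, 2000, 4.0))
```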

Detecting Missing Translations

Missing translations show the source language string instead of the translated one. This creates a visual pattern: a block of English text surrounded by translated text. You can detect this heuristically by checking whether the page has very low visual difference from the English baseline — if a "translated" page looks nearly identical to English, translations may be missing:

def detect_missing_translations(
    baseline_capture: dict,
    locale_captures: list[dict],
    similarity_threshold_pct: float = 3.0,
) -> list[dict]:
    """
    Flag locales where the page looks suspiciously similar to English.
    High similarity = likely missing translations.
    """
    suspects = []
    for capture in locale_captures:
        if capture["locale"] == baseline_capture["locale"]:
            continue

        # Quick hash check: identical hash = definitely no translation
        if capture["content_hash"] == baseline_capture["content_hash"]:
            suspects.append({
                "locale": capture["locale"],
                "reason": "Identical to English (same content hash)",
                "confidence": "certain",
            })
            continue

        # Pixel similarity check
        baseline_img = np.array(Image.open(baseline_capture["file"]).convert("RGB"))
        locale_img = np.array(Image.open(capture["file"]).convert("RGB"))

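        # Align heights, and widen to a signed type before subtracting:
        # uint8 arithmetic wraps around (10 - 20 becomes 246) and inflates the diff
        h = min(baseline_img.shape[0], locale_img.shape[0])
        delta = np.abs(baseline_img[:h].astype(np.int16) - locale_img[:h].astype(np.int16))
        diff_pct = float((delta.mean(axis=2) > 15).mean() * 100)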

        if diff_pct < similarity_threshold_pct:
            suspects.append({
                "locale": capture["locale"],
                "reason": f"Only {diff_pct:.1f}% visual difference from English",
                "confidence": "likely",
            })

    return suspects
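
The visual heuristic can be cross-checked against the translation files themselves. The sketch below assumes message catalogs are available as flat dicts of key to string, which is an assumption about your project rather than something the screenshot pipeline provides; `find_untranslated` and the `allow_identical` parameter are hypothetical names.

```python
def find_untranslated(
    source: dict,
    target: dict,
    allow_identical: frozenset = frozenset(),
) -> dict:
    """Return keys that are missing from target or identical to the source text.

    allow_identical lists keys that are expected to match (brand names, units).
    """
    missing = sorted(key for key in source if key not in target)
    identical = sorted(
        key
        for key in source
        if key in target and target[key] == source[key] and key not in allow_identical
    )
    return {"missing": missing, "identical": identical}


# Hypothetical catalogs for illustration
en = {"title": "Pricing", "cta": "Start free trial", "brand": "Acme", "footer": "Contact"}
de = {"title": "Preise", "cta": "Start free trial", "brand": "Acme"}
print(find_untranslated(en, de, allow_identical=frozenset({"brand"})))
```

A locale flagged by both this check and the screenshot comparison is a strong candidate for a real gap, while a locale flagged only visually may simply have a layout dominated by images.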

Full Localization QA Pipeline

import json


def run_localization_qa(
    url_pattern: str,
    page_label: str,
    output_base: str = "./locale_qa",
) -> dict:
    """
    Full localization QA run for a single page.
    Captures all locales, compares to English baseline, generates report.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_base) / page_label / timestamp
    screenshots_dir = run_dir / "screenshots"
    diffs_dir = run_dir / "diffs"

    print(f"Locale QA: {page_label}")
    print(f"Pattern: {url_pattern}")

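    # Step 1: Capture all locales into this run's screenshots directory.
    # capture_all_locales() has no output_dir parameter, so call capture_locale() directly.
    captures = []
    for locale in LOCALES:
        url = locale_url(url_pattern, locale["code"])
        result = capture_locale(url, locale, page_label, str(screenshots_dir))
        if result:
            captures.append(result)
    print(f"Captured {len(captures)} locales")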

    # Step 2: Compare against English baseline
    comparisons = run_locale_comparison(captures, "en", str(diffs_dir))

    # Step 3: Detect missing translations
    baseline = next((c for c in captures if c["locale"] == "en"), None)
    if baseline:
        missing_suspects = detect_missing_translations(baseline, captures)
    else:
        missing_suspects = []

    # Step 4: Build report
    overflows = [c for c in comparisons if c["overflow_detected"]]
    significant_changes = [c for c in comparisons if c["significant_change"] and not c["overflow_detected"]]
    rtl_locales = [c for c in captures if c["direction"] == "rtl"]

    report = {
        "page": page_label,
        "timestamp": timestamp,
        "locales_tested": len(captures),
        "overflows": overflows,
        "significant_changes": significant_changes,
        "missing_translation_suspects": missing_suspects,
        "rtl_locales_captured": [c["locale"] for c in rtl_locales],
        "all_comparisons": comparisons,
        "captures": captures,
    }

    # Save report
    (run_dir / "report.json").parent.mkdir(parents=True, exist_ok=True)
    (run_dir / "report.json").write_text(json.dumps(report, indent=2))

    print(f"\nSummary:")
    print(f"  Overflows: {len(overflows)}")
    print(f"  Significant changes: {len(significant_changes)}")
    print(f"  Missing translation suspects: {len(missing_suspects)}")

    return report


# Usage
report = run_localization_qa(
    url_pattern="https://your-app.com/{locale}/pricing",
    page_label="pricing_page",
)
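
The JSON report is convenient for machines, less so for reviewers. As a sketch, the function below turns a report dict with the keys built in `run_localization_qa` into a short plain-text summary that could be posted as a PR comment. `summarize_report` is a hypothetical helper, and the sample report is made-up data for illustration.

```python
def summarize_report(report: dict) -> str:
    """Render the QA report as a short plain-text summary."""
    lines = [f"Locale QA for {report['page']}: {report['locales_tested']} locales tested"]
    for item in report["overflows"]:
        lines.append(f"- OVERFLOW {item['locale']}: {item['height_diff_pct']}% taller than baseline")
    for item in report["significant_changes"]:
        lines.append(f"- CHANGED {item['locale']}: {item['pixel_diff_pct']}% pixel difference")
    for item in report["missing_translation_suspects"]:
        lines.append(f"- UNTRANSLATED? {item['locale']}: {item['reason']}")
    if len(lines) == 1:
        lines.append("- No issues flagged")
    return "\n".join(lines)


# Made-up report containing only the fields the summary reads
sample = {
    "page": "pricing_page",
    "locales_tested": 8,
    "overflows": [{"locale": "de", "height_diff_pct": 7.5}],
    "significant_changes": [],
    "missing_translation_suspects": [
        {"locale": "fr", "reason": "Only 1.2% visual difference from English"}
    ],
}
print(summarize_report(sample))
```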

Integrating with CI/CD

Add locale QA to pull requests that touch i18n strings or layout:

# .github/workflows/locale-qa.yml
name: Localization Visual QA

on:
  pull_request:
    paths:
      - 'src/locales/**'
      - 'src/components/**'
      - 'src/styles/**'

jobs:
  locale-qa:
    runs-on: ubuntu-latest
    steps:
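      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install requests pillow numpy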
      - name: Run locale QA
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
        run: |
          python3 scripts/locale_qa.py \
            --pages pricing,dashboard,settings \
            --fail-on overflow
      - name: Upload locale screenshots
        if: always()  # keep the screenshots and diffs even when the QA step fails
        uses: actions/upload-artifact@v4
        with:
          name: locale-qa-${{ github.sha }}
          path: ./locale_qa/
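
The workflow calls `scripts/locale_qa.py` with `--pages` and `--fail-on`, but the script itself is not shown. A minimal sketch of its argument handling and exit-code logic might look like the following. The flag semantics (comma-separated page names; `--fail-on` choosing which report field fails the build) are inferred from the workflow and are assumptions, as are the `parse_args` and `exit_code` names and the `change` and `missing` choices.

```python
import argparse

# Maps a --fail-on value to the report field that should fail the build
FAIL_FIELDS = {
    "overflow": "overflows",
    "change": "significant_changes",
    "missing": "missing_translation_suspects",
}


def split_pages(value: str) -> list:
    """Turn 'pricing,dashboard' into ['pricing', 'dashboard']."""
    return [page.strip() for page in value.split(",") if page.strip()]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Localization visual QA")
    parser.add_argument("--pages", required=True, type=split_pages,
                        help="comma-separated page labels")
    parser.add_argument("--fail-on", choices=sorted(FAIL_FIELDS), default=None,
                        help="report category that should fail the build")
    return parser.parse_args(argv)


def exit_code(reports: list, fail_on) -> int:
    """Return 1 if any report has entries in the chosen category, else 0."""
    if fail_on is None:
        return 0
    field = FAIL_FIELDS[fail_on]
    return 1 if any(report[field] for report in reports) else 0
```

In a full script, each page label would be passed to `run_localization_qa` and the process would end with `sys.exit(exit_code(reports, args.fail_on))`, so a non-zero status fails the pull request check.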

Rate limit planning: each locale of each page costs one API call, so a page tested in 8 locales uses 8 calls, and a 10-page application uses 80 calls per CI run. At the Starter tier (200/day), that allows two full runs per day with 40 calls to spare. At the Pro tier (1000/day), a 12-page app with 10 locales (120 calls per run) fits eight full runs per day, which leaves room for re-runs on failures.
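
The same arithmetic as a small sketch, so you can plug in your own page and locale counts. The function names are hypothetical; the quotas are the daily limits quoted in this post.

```python
def calls_per_run(pages: int, locales: int) -> int:
    """One API call per page per locale."""
    return pages * locales


def full_runs_per_day(daily_quota: int, pages: int, locales: int) -> int:
    """How many complete locale suites fit in the daily quota."""
    return daily_quota // calls_per_run(pages, locales)


print(calls_per_run(10, 8))             # 80
print(full_runs_per_day(200, 10, 8))    # 2
print(full_runs_per_day(1000, 12, 10))  # 8
```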

hermesforge.dev — screenshot API for automated UI testing. Free: 10/day. Starter: $4/30 days (200/day). Pro: $9 (1000/day). Business: $29 (5000/day).