Automating Multi-Language UI Testing with Screenshot API
Localization bugs are invisible until they aren't. A German string that's 40% longer than its English source overflows a button. An Arabic page that hasn't been mirrored renders RTL text in an LTR container. A Japanese font fallback renders at a smaller size than the design expects. These bugs don't surface in unit tests or Lighthouse audits — they surface when a user in Munich or Cairo or Tokyo reports that something looks broken.
Screenshot APIs make it practical to build localization QA into your CI/CD pipeline: capture your app in every supported locale, compare against a baseline, and flag visual regressions before they reach production.
What Localization Visual Testing Catches
Text overflow: Translated strings frequently expand relative to English. German is typically 30-40% longer. Finnish and Polish can run even longer. Buttons, labels, and navigation items designed for English text may clip, wrap unexpectedly, or push adjacent elements out of position.
RTL layout breaks: Arabic, Hebrew, Persian, and Urdu read right-to-left. A properly localized app mirrors its entire layout. A broken RTL implementation might mirror text direction without mirroring the layout, or mirror some components but not others. Only a visual capture shows the full picture.
Missing translations: Untranslated strings appear in the source language, creating a jarring mixed-language UI. Screenshot comparison against a full-translation baseline makes these gaps visible.
Font rendering differences: CJK (Chinese, Japanese, Korean) fonts render at different sizes and weights than Latin fonts. A line that fits perfectly with an English font may wrap with a CJK font at the same point size.
Locale-specific formatting: Dates, numbers, and currencies rendered differently than expected (wrong separators, wrong order, wrong currency symbol placement).
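Text expansion can also be estimated before any screenshot is taken. The sketch below assumes translations are available as flat key-to-string dictionaries; the function name `expansion_report` and the 40% threshold are illustrative choices, not part of any library. It lists the keys whose translation grows the most, which are the strings most likely to overflow:

```python
def expansion_report(source: dict, translated: dict, threshold_pct: float = 40.0) -> list:
    """Return (key, growth_pct) pairs where the translation is much longer than the source."""
    flagged = []
    for key, src_text in source.items():
        dst_text = translated.get(key)
        if not src_text or dst_text is None:
            continue  # skip empty sources and missing translations
        growth_pct = (len(dst_text) - len(src_text)) / len(src_text) * 100
        if growth_pct > threshold_pct:
            flagged.append((key, round(growth_pct, 1)))
    # Largest growth first
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hand-written sample strings, for illustration only
en = {"save": "Save", "settings": "Settings", "cancel": "Cancel"}
de = {"save": "Speichern", "settings": "Einstellungen", "cancel": "Abbrechen"}
for key, growth in expansion_report(en, de):
    print(f"{key}: +{growth}%")
```

Character counts only approximate rendered width, so this is a way to prioritise which pages to capture, not a replacement for the screenshot.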
Setting Up Locale Test Pages
To screenshot your app in a specific locale, you need a URL that loads with that locale active. Most modern frameworks support this through URL-prefix routing, a query parameter override, or a per-locale subdomain:
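import hashlib
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

import requests

# Read the key from the environment so the CI secret is picked up;
# the literal is only a placeholder for local experiments
SCREENSHOT_API_KEY = os.environ.get("SCREENSHOT_API_KEY", "your-api-key")
SCREENSHOT_API_URL = "https://hermesforge.dev/api/screenshot"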
# URL patterns for locale-specific pages
# Adapt to your routing strategy
LOCALE_URL_PATTERNS = {
    "url_prefix": "https://your-app.com/{locale}/dashboard",  # /de/dashboard, /ja/dashboard
    "query_param": "https://your-app.com/dashboard?lang={locale}",
    "subdomain": "https://{locale}.your-app.com/dashboard",  # de.your-app.com
}


def locale_url(base_pattern: str, locale: str) -> str:
    """Generate locale-specific URL from a pattern."""
    return base_pattern.format(locale=locale)
# Locales to test, with metadata for visual configuration
LOCALES = [
    {"code": "en", "name": "English", "direction": "ltr", "font_scale": 1.0},
    {"code": "de", "name": "German", "direction": "ltr", "font_scale": 1.0},
    {"code": "fr", "name": "French", "direction": "ltr", "font_scale": 1.0},
    {"code": "ja", "name": "Japanese", "direction": "ltr", "font_scale": 0.95},
    {"code": "zh", "name": "Chinese", "direction": "ltr", "font_scale": 0.95},
    {"code": "ar", "name": "Arabic", "direction": "rtl", "font_scale": 1.0},
    {"code": "he", "name": "Hebrew", "direction": "rtl", "font_scale": 1.0},
    {"code": "ko", "name": "Korean", "direction": "ltr", "font_scale": 0.95},
]
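To see how a pattern and a locale list combine into concrete work, the sketch below expands one pattern into a list of capture jobs. The `{page}` placeholder and the `build_capture_jobs` helper are assumptions added for illustration; the patterns in this article hard-code a single page.

```python
from itertools import product


def build_capture_jobs(pattern: str, pages: list, locale_codes: list) -> list:
    """Expand a URL pattern into one job per (page, locale) pair."""
    jobs = []
    for page, code in product(pages, locale_codes):
        jobs.append({
            "page": page,
            "locale": code,
            "url": pattern.format(locale=code, page=page),
        })
    return jobs


jobs = build_capture_jobs(
    "https://your-app.com/{locale}/{page}",  # hypothetical pattern with a {page} placeholder
    pages=["pricing", "dashboard"],
    locale_codes=["en", "de", "ar"],
)
for job in jobs:
    print(job["locale"], job["url"])
print(len(jobs), "captures")
```

The job count is pages times locales, which is also the number of API calls a full run will spend.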
Capturing Locale Screenshots
def capture_locale(
    url: str,
    locale: dict,
    page_label: str,
    output_dir: str = "./locale_screenshots",
) -> Optional[dict]:
    """
    Capture a page in a specific locale.
    RTL locales use the same viewport; confirming the dir attribute needs a separate DOM check.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # A fixed 1280px desktop viewport keeps the width identical across locales,
    # so longer translations show up as wrapping or extra page height
    params = {
        "url": url,
        "format": "png",
        "width": 1280,
        "full_page": "true",
        "wait": "networkidle",
        "block_ads": "true",
    }
    try:
        response = requests.get(
            SCREENSHOT_API_URL,
            params=params,
            headers={"X-API-Key": SCREENSHOT_API_KEY},
            timeout=30,
        )
    except requests.RequestException as exc:
        # A timeout or connection error should skip this locale, not abort the run
        print(f"  Failed {locale['code']}: {exc}")
        return None
    if response.status_code != 200:
        print(f"  Failed {locale['code']}: HTTP {response.status_code}")
        return None
    # A short content hash makes filenames unique and allows cheap equality checks later
    content_hash = hashlib.sha256(response.content).hexdigest()[:12]
    filename = f"{page_label}_{locale['code']}_{content_hash}.png"
    filepath = Path(output_dir) / filename
    filepath.write_bytes(response.content)
    return {
        "locale": locale["code"],
        "locale_name": locale["name"],
        "direction": locale["direction"],
        "url": url,
        "page": page_label,
        "file": str(filepath),
        "content_hash": content_hash,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
def capture_all_locales(
    url_pattern: str,
    page_label: str,
    output_dir: str = "./locale_screenshots",
    locales: list = LOCALES,
) -> list[dict]:
    """Capture a page across all supported locales, saving files under output_dir."""
    results = []
    for locale in locales:
        url = locale_url(url_pattern, locale["code"])
        print(f"  Capturing {locale['name']} ({locale['code']})...")
        result = capture_locale(url, locale, page_label, output_dir)
        if result:
            results.append(result)
    return results
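A single transient failure, such as a timeout or a rate-limit response, should not silently drop a locale from the run. A minimal retry sketch follows, assuming the capture function returns `None` on failure as `capture_locale` does. The helper name `with_retries` and the delay values are illustrative and not part of the screenshot API:

```python
import time
from typing import Callable, Optional


def with_retries(
    capture: Callable[[], Optional[dict]],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call capture() until it returns a result, doubling the delay between tries."""
    for attempt in range(attempts):
        result = capture()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 2s, then 4s, with the defaults
    return None


# Demo with a stand-in capture function that fails twice, then succeeds
calls = {"count": 0}


def flaky_capture() -> Optional[dict]:
    calls["count"] += 1
    return {"locale": "de"} if calls["count"] >= 3 else None


delays = []  # record the delays instead of actually sleeping
print(with_retries(flaky_capture, sleep=delays.append))
print(delays)
```

In the capture loop this would wrap the existing call, for example `with_retries(lambda: capture_locale(url, locale, page_label))`. Each retry is another API call, so it counts against the daily quota.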
Visual Comparison Against English Baseline
English is typically the source locale. Compare all other locales against it to detect layout breaks:
from PIL import Image, ImageChops
import numpy as np


def compare_to_baseline(
    baseline_path: str,
    locale_path: str,
    diff_output_path: str,
    overflow_threshold_pct: float = 5.0,
) -> dict:
    """
    Compare a locale screenshot to the English baseline.
    Detects: overflow (image is taller), significant layout shifts, missing content.
    """
    baseline = Image.open(baseline_path).convert("RGB")
    locale_img = Image.open(locale_path).convert("RGB")
    # Height difference: translated text wrapping → taller page
    height_diff = locale_img.height - baseline.height
    height_diff_pct = (height_diff / baseline.height) * 100
    # Width should be identical (same viewport)
    width_mismatch = locale_img.width != baseline.width
    # For the pixel diff, align at the top by cropping both images to the shorter height
    if locale_img.height != baseline.height:
        comparison_height = min(baseline.height, locale_img.height)
        baseline_crop = baseline.crop((0, 0, baseline.width, comparison_height))
        locale_crop = locale_img.crop((0, 0, locale_img.width, comparison_height))
    else:
        baseline_crop = baseline
        locale_crop = locale_img
    # Only reached when the widths differ; resizing keeps the diff computable
    if baseline_crop.size != locale_crop.size:
        locale_crop = locale_crop.resize(baseline_crop.size, Image.LANCZOS)
    diff = ImageChops.difference(baseline_crop, locale_crop)
    diff_array = np.array(diff)
    # A pixel counts as changed when any channel differs by more than 15 (out of 255)
    changed_mask = (diff_array > 15).any(axis=2)
    significant_pixels = changed_mask.mean() * 100
    # Highlight diff: tint changed pixels red on top of the locale screenshot
    diff_highlight = np.array(locale_crop).copy()
    diff_highlight[changed_mask] = [255, 80, 80]
    blended = (np.array(locale_crop) * 0.7 + diff_highlight * 0.3).astype(np.uint8)
    Image.fromarray(blended).save(diff_output_path)
    return {
        "height_diff_px": height_diff,
        "height_diff_pct": round(height_diff_pct, 1),
        "width_mismatch": width_mismatch,
        "pixel_diff_pct": round(float(significant_pixels), 2),
        "overflow_detected": height_diff_pct > overflow_threshold_pct,
        "significant_change": significant_pixels > 10.0,
        "diff_path": diff_output_path,
    }
def run_locale_comparison(
    captures: list[dict],
    baseline_locale: str = "en",
    output_dir: str = "./locale_diffs",
) -> list[dict]:
    """
    Compare all locale captures against the baseline locale.
    Returns comparison results with overflow detection.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # Find baseline
    baseline = next((c for c in captures if c["locale"] == baseline_locale), None)
    if not baseline:
        print(f"No baseline capture found for locale '{baseline_locale}'")
        return []
    results = []
    for capture in captures:
        if capture["locale"] == baseline_locale:
            continue
        diff_path = Path(output_dir) / f"diff_{capture['locale']}.png"
        comparison = compare_to_baseline(
            baseline["file"],
            capture["file"],
            str(diff_path),
        )
        comparison["locale"] = capture["locale"]
        comparison["locale_name"] = capture["locale_name"]
        comparison["direction"] = capture["direction"]
        if comparison["overflow_detected"]:
            print(f"  OVERFLOW: {capture['locale_name']} page is {comparison['height_diff_pct']}% taller than English")
        elif comparison["significant_change"]:
            print(f"  CHANGED: {capture['locale_name']} has a {comparison['pixel_diff_pct']}% pixel difference")
        else:
            print(f"  OK: {capture['locale_name']}")
        results.append(comparison)
    return results
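Comparing against English suits locales that share its layout, but a correctly mirrored RTL page is likely to differ from the English capture across most of the page, so it will tend to be reported as changed on every run. One option is to compare each locale against its own last approved capture and fall back to English only when none exists. A sketch, assuming approved captures are stored as `<baseline_dir>/<page>_<locale>.png`; that layout and the `pick_baseline` helper are hypothetical and not something the code in this article produces:

```python
import tempfile
from pathlib import Path
from typing import Optional


def pick_baseline(
    baseline_dir: str,
    page: str,
    locale: str,
    fallback_locale: str = "en",
) -> Optional[Path]:
    """Prefer the locale's own approved capture; fall back to the source locale."""
    for code in (locale, fallback_locale):
        candidate = Path(baseline_dir) / f"{page}_{code}.png"
        if candidate.exists():
            return candidate
    return None


# Demo with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pricing_en.png").touch()
    (Path(tmp) / "pricing_ar.png").touch()
    print(pick_baseline(tmp, "pricing", "ar").name)  # own baseline exists
    print(pick_baseline(tmp, "pricing", "de").name)  # falls back to English
    print(pick_baseline(tmp, "checkout", "de"))      # no baseline at all
```

The returned path can be passed as `baseline_path` to `compare_to_baseline`, which turns the check into a regression test per locale instead of a cross-language comparison.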
Detecting Missing Translations
Missing translations show the source-language string instead of the translated one, which produces a block of English text surrounded by translated text. A cheap heuristic is to check how much the page differs visually from the English baseline: if a "translated" page looks nearly identical to English, translations are probably missing. This catches pages that are largely untranslated; a single missed string changes too few pixels to stand out:
def detect_missing_translations(
    baseline_capture: dict,
    locale_captures: list[dict],
    similarity_threshold_pct: float = 3.0,
) -> list[dict]:
    """
    Flag locales where the page looks suspiciously similar to the baseline.
    High similarity = likely missing translations.
    """
    suspects = []
    # Load the baseline once; int16 keeps the subtraction below from wrapping around,
    # which it would do on the default uint8 arrays
    baseline_img = np.array(Image.open(baseline_capture["file"]).convert("RGB")).astype(np.int16)
    for capture in locale_captures:
        if capture["locale"] == baseline_capture["locale"]:
            continue
        # Quick hash check: identical hash = definitely no translation
        if capture["content_hash"] == baseline_capture["content_hash"]:
            suspects.append({
                "locale": capture["locale"],
                "reason": "Identical to English (same content hash)",
                "confidence": "certain",
            })
            continue
        # Pixel similarity check
        locale_img = np.array(Image.open(capture["file"]).convert("RGB")).astype(np.int16)
        # Align both dimensions so the arrays have the same shape
        h = min(baseline_img.shape[0], locale_img.shape[0])
        w = min(baseline_img.shape[1], locale_img.shape[1])
        diff = np.abs(baseline_img[:h, :w] - locale_img[:h, :w])
        diff_pct = (diff.mean(axis=2) > 15).mean() * 100
        if diff_pct < similarity_threshold_pct:
            suspects.append({
                "locale": capture["locale"],
                "reason": f"Only {diff_pct:.1f}% visual difference from English",
                "confidence": "likely",
            })
    return suspects
Full Localization QA Pipeline
import json


def run_localization_qa(
    url_pattern: str,
    page_label: str,
    output_base: str = "./locale_qa",
) -> dict:
    """
    Full localization QA run for a single page.
    Captures all locales, compares to English baseline, generates report.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_base) / page_label / timestamp
    screenshots_dir = run_dir / "screenshots"
    diffs_dir = run_dir / "diffs"
    print(f"Locale QA: {page_label}")
    print(f"Pattern: {url_pattern}")
    # Step 1: Capture all locales
    captures = capture_all_locales(url_pattern, page_label, str(screenshots_dir))
    print(f"Captured {len(captures)} locales")
    # Step 2: Compare against English baseline
    comparisons = run_locale_comparison(captures, "en", str(diffs_dir))
    # Step 3: Detect missing translations
    baseline = next((c for c in captures if c["locale"] == "en"), None)
    if baseline:
        missing_suspects = detect_missing_translations(baseline, captures)
    else:
        missing_suspects = []
    # Step 4: Build report
    overflows = [c for c in comparisons if c["overflow_detected"]]
    significant_changes = [c for c in comparisons if c["significant_change"] and not c["overflow_detected"]]
    rtl_locales = [c for c in captures if c["direction"] == "rtl"]
    report = {
        "page": page_label,
        "timestamp": timestamp,
        "locales_tested": len(captures),
        "overflows": overflows,
        "significant_changes": significant_changes,
        "missing_translation_suspects": missing_suspects,
        "rtl_locales_captured": [c["locale"] for c in rtl_locales],
        "all_comparisons": comparisons,
        "captures": captures,
    }
    # Save report
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "report.json").write_text(json.dumps(report, indent=2))
    print("\nSummary:")
    print(f"  Overflows: {len(overflows)}")
    print(f"  Significant changes: {len(significant_changes)}")
    print(f"  Missing translation suspects: {len(missing_suspects)}")
    return report


# Usage
report = run_localization_qa(
    url_pattern="https://your-app.com/{locale}/pricing",
    page_label="pricing_page",
)
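The JSON report is convenient for machines; for a pull request comment or a CI log, a short text summary is easier to scan. A sketch, assuming the report keys produced by `run_localization_qa` (`summarize_report` is a hypothetical helper name, and the sample data is hand-written):

```python
def summarize_report(report: dict) -> str:
    """Render a localization QA report as a few plain-text lines."""
    lines = [f"Locale QA for {report['page']}: {report['locales_tested']} locales tested"]
    for item in report["overflows"]:
        lines.append(f"  OVERFLOW {item['locale']}: +{item['height_diff_pct']}% page height")
    for item in report["significant_changes"]:
        lines.append(f"  CHANGED {item['locale']}: {item['pixel_diff_pct']}% of pixels differ")
    for item in report["missing_translation_suspects"]:
        lines.append(f"  UNTRANSLATED? {item['locale']}: {item['reason']}")
    if len(lines) == 1:
        lines.append("  No issues flagged")
    return "\n".join(lines)


# Hand-written sample in the same shape as the report
sample = {
    "page": "pricing_page",
    "locales_tested": 8,
    "overflows": [{"locale": "de", "height_diff_pct": 7.2}],
    "significant_changes": [],
    "missing_translation_suspects": [
        {"locale": "ko", "reason": "Only 1.4% visual difference from English"}
    ],
}
print(summarize_report(sample))
```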
Integrating with CI/CD
Add locale QA to pull requests that touch i18n strings or layout:
# .github/workflows/locale-qa.yml
name: Localization Visual QA
on:
  pull_request:
    paths:
      - 'src/locales/**'
      - 'src/components/**'
      - 'src/styles/**'
jobs:
  locale-qa:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: python3 -m pip install requests pillow numpy
      - name: Run locale QA
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
        run: |
          python3 scripts/locale_qa.py \
            --pages pricing,dashboard,settings \
            --fail-on overflow
      - name: Upload locale screenshots
        if: always()  # keep the screenshots and diffs even when the QA step fails
        uses: actions/upload-artifact@v4
        with:
          name: locale-qa-${{ github.sha }}
          path: ./locale_qa/
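The workflow calls `scripts/locale_qa.py`, which is not shown in this article. A minimal sketch of its command-line layer follows. The flag semantics (`--pages` as a comma-separated list, `--fail-on` selecting which finding fails the build) are inferred from the workflow, the `change` and `missing` values are additions for illustration, and `run_page` stands in for a call to `run_localization_qa`:

```python
import argparse

# Maps a --fail-on value to the report key that must be empty for the build to pass
FAIL_KEYS = {
    "overflow": "overflows",
    "change": "significant_changes",
    "missing": "missing_translation_suspects",
}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Localization visual QA")
    parser.add_argument("--pages", required=True, help="comma-separated page labels")
    parser.add_argument("--fail-on", choices=sorted(FAIL_KEYS), default=None)
    args = parser.parse_args(argv)
    args.pages = [p.strip() for p in args.pages.split(",") if p.strip()]
    return args


def exit_code(reports: list, fail_on) -> int:
    """Return 1 if any report contains the finding selected by --fail-on."""
    if fail_on is None:
        return 0
    key = FAIL_KEYS[fail_on]
    return 1 if any(report.get(key) for report in reports) else 0


def main(argv=None, run_page=None) -> int:
    args = parse_args(argv)
    # run_page is a placeholder; the real script would call run_localization_qa here
    run_page = run_page or (lambda page: {"page": page, "overflows": []})
    reports = [run_page(page) for page in args.pages]
    return exit_code(reports, args.fail_on)


# Demo with a stand-in that reports an overflow on the pricing page only
code = main(
    ["--pages", "pricing,dashboard", "--fail-on", "overflow"],
    run_page=lambda page: {"page": page, "overflows": [{"locale": "de"}] if page == "pricing" else []},
)
print("exit code:", code)
```

In the actual script the last lines would be replaced by `sys.exit(main())`, so that a non-zero code fails the CI step.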
Rate limit planning: each page costs one API call per locale, so 8 locales means 8 calls per page, and a 10-page application uses 80 calls per CI run. At the Starter tier (200/day), that covers the full locale suite twice per day (160 calls). At the Pro tier (1000/day), a 12-page app with 10 locales uses 120 calls per run, which leaves room for about eight runs per day, including re-runs on failures.
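The same arithmetic as a small helper, in case you want to check a different page or locale count against a daily quota. The function name is illustrative; the quotas are the tier limits quoted in this article:

```python
def runs_per_day(pages: int, locales: int, daily_quota: int) -> tuple:
    """Return (calls per full run, full runs that fit in the daily quota)."""
    calls_per_run = pages * locales  # one API call per page per locale
    return calls_per_run, daily_quota // calls_per_run


print(runs_per_day(pages=10, locales=8, daily_quota=200))    # (80, 2)
print(runs_per_day(pages=12, locales=10, daily_quota=1000))  # (120, 8)
```

Retried captures count as extra calls, so it is worth leaving some headroom below the computed maximum.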
hermesforge.dev — screenshot API for automated UI testing. Free: 10/day. Starter: $4/30 days (200/day). Pro: $9 (1000/day). Business: $29 (5000/day).