Screenshot API for Legal Tech: Evidence Preservation and Case Documentation

2026-05-17 | Tags: [screenshot-api, legal-tech, evidence, compliance, automation, tutorials]

Legal tech has a specific relationship with screenshots that no other industry shares: in litigation, a screenshot can be evidence. How it was captured, when it was captured, whether it has been modified — these questions have legal consequences. A screenshot used as evidence in a commercial dispute needs a different capture pipeline than a screenshot used for a newsletter.

This post covers the screenshot API patterns that serve legal technology platforms: evidence preservation, case material documentation, court-ready archive generation, and automated notice capture.

Before the code, the constraint: legal evidence screenshots must satisfy authenticity requirements. In most jurisdictions this means:

This immediately rules out most convenient screenshot API defaults. Block ads? No. Inject CSS to hide cookie banners? No. WebP compression? Use PNG instead — lossless preserves hash integrity better than lossy formats, though either works as long as you hash after capture.

Evidence Preservation Pipeline

import requests
import hashlib
import json
import hmac
from datetime import datetime, timezone
from pathlib import Path

SCREENSHOT_API = "https://hermesforge.dev/api/screenshot"
API_KEY = "your_api_key_here"

def capture_legal_evidence(url: str, matter_ref: str = None,
                            notes: str = None) -> dict:
    """
    Capture a URL as legal evidence. No modifications — raw first-visit state.

    Returns capture metadata including hash for chain-of-custody records.
    """
    captured_at = datetime.now(timezone.utc).isoformat()

    resp = requests.get(SCREENSHOT_API, params={
        "url": url,
        "width": 1440,         # Standard desktop viewport
        "full_page": True,      # Complete page, not just above fold
        "wait_for": "networkidle",  # Wait for dynamic content
        "format": "png"         # Lossless — no quality degradation
        # No block_ads, no inject_css, no inject_js
        # Evidence must represent what a user actually saw
    }, headers={"X-API-Key": API_KEY})
    resp.raise_for_status()

    image_bytes = resp.content
    image_hash = hashlib.sha256(image_bytes).hexdigest()

    return {
        "url": url,
        "matter_ref": matter_ref,
        "notes": notes,
        "captured_at": captured_at,
        "sha256": image_hash,
        "file_size_bytes": len(image_bytes),
        "image_bytes": image_bytes
    }

Chain-of-Custody Archive

Legal evidence requires a chain of custody: a record showing who captured the evidence, when, and that it hasn't been altered since. HMAC-chained manifests provide this:

class LegalEvidenceArchive:
    def __init__(self, archive_dir: str, matter_ref: str, secret_key: bytes):
        """
        Create an evidence archive for a specific legal matter.

        archive_dir: directory to store evidence files
        matter_ref: case/matter reference number
        secret_key: HMAC key — store securely, required for chain verification
        """
        self.matter_ref = matter_ref
        self.archive_dir = Path(archive_dir) / matter_ref
        self.archive_dir.mkdir(parents=True, exist_ok=True)
        self.secret_key = secret_key
        self.manifest_path = self.archive_dir / "chain_of_custody.jsonl"

    def preserve_evidence(self, url: str, exhibit_label: str = None,
                           notes: str = None) -> dict:
        """
        Capture and archive a URL as evidence for this matter.
        Returns the manifest entry for records.
        """
        capture = capture_legal_evidence(url, self.matter_ref, notes)
        timestamp = capture["captured_at"]

        filename = f"exhibit_{exhibit_label or 'evidence'}_{timestamp.replace(':', '-')}.png"
        file_path = self.archive_dir / filename
        file_path.write_bytes(capture["image_bytes"])

        entry = {
            "matter_ref": self.matter_ref,
            "exhibit_label": exhibit_label,
            "url": url,
            "notes": notes,
            "captured_at": timestamp,
            "file": filename,
            "sha256": capture["sha256"],
            "file_size_bytes": capture["file_size_bytes"]
        }

        # HMAC chain: sign this entry + previous entry's HMAC
        prev_hmac = self._get_last_hmac()
        chain_input = json.dumps(entry, sort_keys=True) + (prev_hmac or "")
        entry["hmac"] = hmac.new(
            self.secret_key,
            chain_input.encode("utf-8"),
            hashlib.sha256
        ).hexdigest()

        with open(self.manifest_path, "a") as f:
            f.write(json.dumps(entry) + "\n")

        return entry

    def verify_chain(self) -> dict:
        """Verify the chain of custody is intact. Returns status + any break location."""
        if not self.manifest_path.exists():
            return {"valid": True, "entries": 0, "matter_ref": self.matter_ref}

        entries = []
        with open(self.manifest_path) as f:
            for line in f:
                if line.strip():
                    entries.append(json.loads(line))

        prev_hmac = None
        for i, entry in enumerate(entries):
            stored_hmac = entry.pop("hmac")
            chain_input = json.dumps(entry, sort_keys=True) + (prev_hmac or "")
            expected = hmac.new(
                self.secret_key,
                chain_input.encode("utf-8"),
                hashlib.sha256
            ).hexdigest()

            if stored_hmac != expected:
                return {
                    "valid": False,
                    "matter_ref": self.matter_ref,
                    "break_at_entry": i,
                    "break_at_timestamp": entry.get("captured_at"),
                    "exhibit": entry.get("exhibit_label")
                }

            entry["hmac"] = stored_hmac
            prev_hmac = stored_hmac

        return {
            "valid": True,
            "matter_ref": self.matter_ref,
            "entries": len(entries)
        }

    def export_manifest(self) -> list[dict]:
        """Export all chain-of-custody entries for this matter."""
        if not self.manifest_path.exists():
            return []
        entries = []
        with open(self.manifest_path) as f:
            for line in f:
                if line.strip():
                    entries.append(json.loads(line))
        return entries

    def _get_last_hmac(self) -> str | None:
        if not self.manifest_path.exists():
            return None
        last_line = None
        with open(self.manifest_path) as f:
            for line in f:
                if line.strip():
                    last_line = line.strip()
        return json.loads(last_line).get("hmac") if last_line else None

Legal teams need to monitor for notices posted on regulatory portals, court filing systems, or competitor websites. When a material legal notice appears, time matters:

import sqlite3
from datetime import datetime, timezone

class LegalNoticeMonitor:
    def __init__(self, db_path: str = "legal_notices.db",
                 screenshot_dir: str = "notice_screenshots"):
        self.screenshot_dir = Path(screenshot_dir)
        self.screenshot_dir.mkdir(exist_ok=True)
        self.db_path = db_path
        self._init_db()

    def _init_db(self):
        with sqlite3.connect(self.db_path) as conn:
            conn.execute("""
                CREATE TABLE IF NOT EXISTS monitored_pages (
                    id TEXT PRIMARY KEY,
                    name TEXT,
                    url TEXT,
                    category TEXT,
                    check_interval_minutes INTEGER DEFAULT 60,
                    last_checked TEXT,
                    last_hash TEXT,
                    change_count INTEGER DEFAULT 0
                )
            """)
            conn.execute("""
                CREATE TABLE IF NOT EXISTS notice_snapshots (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    page_id TEXT,
                    captured_at TEXT,
                    file_path TEXT,
                    image_hash TEXT,
                    change_detected BOOLEAN DEFAULT 0,
                    alerted BOOLEAN DEFAULT 0
                )
            """)

    def add_monitored_page(self, page_id: str, name: str, url: str,
                            category: str = "regulatory",
                            check_interval_minutes: int = 60):
        """Register a legal notice page for monitoring."""
        with sqlite3.connect(self.db_path) as conn:
            conn.execute("""
                INSERT OR IGNORE INTO monitored_pages
                (id, name, url, category, check_interval_minutes)
                VALUES (?, ?, ?, ?, ?)
            """, (page_id, name, url, category, check_interval_minutes))

    def check_page(self, page_id: str) -> dict:
        """Check a monitored page for changes. Returns result with change flag."""
        with sqlite3.connect(self.db_path) as conn:
            row = conn.execute(
                "SELECT url, last_hash, name, category FROM monitored_pages WHERE id = ?",
                (page_id,)
            ).fetchone()

        if not row:
            return {"error": f"unknown page_id: {page_id}"}

        url, prev_hash, name, category = row

        # For legal notice monitoring: capture unmodified
        resp = requests.get(SCREENSHOT_API, params={
            "url": url,
            "width": 1440,
            "full_page": True,
            "wait_for": "networkidle",
            "format": "png"
            # No modifications — legal notice capture
        }, headers={"X-API-Key": API_KEY})

        if resp.status_code != 200:
            return {"page_id": page_id, "error": f"HTTP {resp.status_code}"}

        image_bytes = resp.content
        image_hash = hashlib.sha256(image_bytes).hexdigest()
        timestamp = datetime.now(timezone.utc).isoformat()
        change_detected = prev_hash is not None and prev_hash != image_hash

        filename = f"{page_id}_{timestamp.replace(':', '-')}.png"
        file_path = self.screenshot_dir / filename
        file_path.write_bytes(image_bytes)

        with sqlite3.connect(self.db_path) as conn:
            conn.execute("""
                INSERT INTO notice_snapshots
                (page_id, captured_at, file_path, image_hash, change_detected)
                VALUES (?, ?, ?, ?, ?)
            """, (page_id, timestamp, str(file_path), image_hash, change_detected))

            conn.execute("""
                UPDATE monitored_pages SET last_checked = ?, last_hash = ?,
                    change_count = change_count + ? WHERE id = ?
            """, (timestamp, image_hash, 1 if change_detected else 0, page_id))

        return {
            "page_id": page_id,
            "name": name,
            "category": category,
            "change_detected": change_detected,
            "file": str(file_path),
            "hash": image_hash
        }

Court Filing Snapshot Pipeline

When documents are filed in court systems, legal teams often need visual records of the filing confirmation pages — timestamped evidence that a filing was submitted and accepted:

def capture_filing_confirmation(court_system_url: str, case_ref: str,
                                  filing_type: str,
                                  archive: LegalEvidenceArchive) -> dict:
    """
    Capture a court filing confirmation page as evidence.

    Call this immediately after a successful filing, while the confirmation
    page is still displayed. Filing confirmation pages often expire on refresh.
    """
    exhibit_label = f"filing_confirmation_{filing_type}"
    notes = f"Filing confirmation for {filing_type} in case {case_ref}"

    return archive.preserve_evidence(
        url=court_system_url,
        exhibit_label=exhibit_label,
        notes=notes
    )

Generating Court-Ready Evidence Reports

Legal tech platforms often need to produce formatted exhibit bundles from collected evidence:

def generate_evidence_report(archive: LegalEvidenceArchive,
                               output_path: str = None) -> dict:
    """
    Generate a structured evidence report from an archive.

    Returns metadata for all exhibits + chain verification status.
    Suitable for attachment to legal submissions.
    """
    entries = archive.export_manifest()
    chain_status = archive.verify_chain()

    report = {
        "matter_ref": archive.matter_ref,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "chain_of_custody_valid": chain_status["valid"],
        "total_exhibits": len(entries),
        "exhibits": []
    }

    for entry in entries:
        exhibit = {
            "label": entry.get("exhibit_label", "unlabelled"),
            "url": entry["url"],
            "captured_at": entry["captured_at"],
            "sha256": entry["sha256"],
            "file": entry["file"],
            "notes": entry.get("notes")
        }
        report["exhibits"].append(exhibit)

    if output_path:
        with open(output_path, "w") as f:
            json.dump(report, f, indent=2)

    return report
Workload Frequency Volume Calls/day Tier
Single matter evidence capture As-needed ~10 exhibits 10 Free (50/day)
Active litigation monitoring Hourly 10 pages 240 Starter ($4)
Regulatory notice monitoring 30-min 20 pages 960 Pro ($9)
Full law firm portfolio Combined ~200 matters 2,000+ Business ($29)

Legal evidence capture has the most restrictive parameter set of any industry covered in this series. Nearly every "nice to have" screenshot option — ad blocking, CSS injection, stability CSS, WebP format — becomes a liability in legal contexts because it changes what was captured relative to what a user saw.

The legal tech screenshot pipeline is defined by what you don't do as much as what you do. Lossless PNG, no modifications, full-page, networkidle wait — and then HMAC chain everything.

The other distinguishing feature is time-sensitivity without scheduling. Unlike compliance runs (daily cron) or monitoring (hourly cron), legal evidence often needs to be captured at a specific transactional moment: when a notice appears, when a filing confirms, when a webpage changes in a material way. Event-driven capture, not scheduled capture, is the primary pattern.


Hermesforge Screenshot API: full-page PNG capture, JavaScript rendering, configurable viewports. Get a free API key — 50 calls/day, no signup required.