Website Screenshot Archiving for Legal and Compliance Teams

2026-05-22 | Tags: [legal, compliance, archiving, screenshot-api, evidence-preservation]

Website Screenshot Archiving for Legal and Compliance Teams

Legal disputes increasingly involve online content. A competitor's false advertising claim. A vendor's breach of representations made on their website. A contractor who claimed capabilities they didn't have. An employee's public statements that violate a non-compete.

The problem: websites change. By the time litigation begins, the evidence may be gone. Defendants are under no obligation to preserve their own damaging web content, and they know it.

Screenshot archiving is one layer of a content preservation strategy. When combined with timestamps, it creates a record of what existed online at a specific moment — useful for both offensive and defensive legal purposes.

The Basic Archiving Pattern

import requests
import json
import hashlib
from datetime import datetime, timezone

SCREENSHOT_API = "https://hermesforge.dev/api/screenshot"
API_KEY = "YOUR_KEY"

def archive_url(url, case_id, notes=""):
    """
    Capture and archive a URL with metadata for legal purposes.
    Returns a record suitable for inclusion in a case file.
    """
    captured_at = datetime.now(timezone.utc).isoformat()

    response = requests.get(SCREENSHOT_API, params={
        "url": url,
        "width": 1440,
        "height": 900,
        "delay": 3000,   # allow full page render
        "format": "png"
    }, headers={"X-API-Key": API_KEY})

    # Generate hash for integrity verification
    content_hash = hashlib.sha256(response.content).hexdigest()

    # Save screenshot
    filename = f"archive/{case_id}/{captured_at.replace(':', '-')}.png"
    with open(filename, "wb") as f:
        f.write(response.content)

    # Save metadata record
    record = {
        "url": url,
        "captured_at": captured_at,
        "case_id": case_id,
        "notes": notes,
        "screenshot_file": filename,
        "sha256": content_hash,
        "api_response_status": response.status_code,
        "content_length": len(response.content)
    }

    metadata_file = f"archive/{case_id}/{captured_at.replace(':', '-')}.json"
    with open(metadata_file, "w") as f:
        json.dump(record, f, indent=2)

    return record

The SHA-256 hash of the screenshot file is important: it provides a cryptographic fingerprint that can verify the image hasn't been altered after capture.

Building a Litigation Hold Archive

When litigation is anticipated or commenced, a litigation hold requires preserving all potentially relevant evidence. For web content under your control, that's straightforward. For third-party web content, you need to capture it before it disappears.

import os

def create_litigation_hold(urls, case_name, interval_hours=24):
    """
    Set up regular captures of URLs under litigation hold.
    Captures multiple times to show content over time.
    """
    case_dir = f"holds/{case_name}"
    os.makedirs(case_dir, exist_ok=True)

    records = []
    for url in urls:
        record = archive_url(url, case_name, notes="litigation hold capture")
        records.append(record)
        print(f"Captured: {url} -> {record['sha256'][:16]}...")

    # Append to hold log
    log_file = f"{case_dir}/capture_log.jsonl"
    with open(log_file, "a") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

    return records

# Example: competitor false advertising claim
HOLD_URLS = [
    "https://competitor.com/product/claims",
    "https://competitor.com/about",
    "https://competitor.com/certifications",
]

create_litigation_hold(HOLD_URLS, "acme-vs-competitor-2026-q1")

Run this daily via cron once a hold is in place. Each capture is independently logged with timestamp and hash.

Full-Page Capture for Long Documents

Terms of service, privacy policies, and vendor agreements often extend below the viewport. Use a tall viewport to capture the full page:

def capture_full_page(url, case_id, notes=""):
    """Capture an extended-height screenshot for long documents."""
    return archive_url_with_params(url, case_id, notes, width=1280, height=5000)

def archive_url_with_params(url, case_id, notes, width=1440, height=900):
    """Archive with specific viewport parameters."""
    captured_at = datetime.now(timezone.utc).isoformat()

    response = requests.get(SCREENSHOT_API, params={
        "url": url,
        "width": width,
        "height": height,
        "delay": 2000,
        "format": "png"
    }, headers={"X-API-Key": API_KEY})

    content_hash = hashlib.sha256(response.content).hexdigest()
    filename = f"archive/{case_id}/fullpage-{captured_at.replace(':', '-')}.png"

    with open(filename, "wb") as f:
        f.write(response.content)

    return {
        "url": url,
        "captured_at": captured_at,
        "sha256": content_hash,
        "viewport": f"{width}x{height}",
        "file": filename
    }

Regulatory Compliance Monitoring

Regulated industries (financial services, healthcare, pharma) often have requirements around monitoring third-party representations. A screenshot archive provides an audit trail:

COMPLIANCE_WATCHLIST = [
    {
        "url": "https://partner.com/compliance-certifications",
        "requirement": "ISO 27001 certification claim",
        "check_frequency": "weekly"
    },
    {
        "url": "https://vendor.com/hipaa",
        "requirement": "HIPAA compliance representation",
        "check_frequency": "monthly"
    },
]

def compliance_capture(watchlist, audit_period):
    """Regular compliance monitoring captures."""
    results = []
    for item in watchlist:
        record = archive_url(
            item["url"],
            f"compliance-{audit_period}",
            notes=f"Requirement: {item['requirement']}"
        )
        results.append({**record, "requirement": item["requirement"]})
    return results

What This Is Not

Screenshot archives have evidentiary limitations that legal counsel should understand:

They show appearance, not origin. A screenshot proves a page looked a certain way when captured, not that the content was created or controlled by any particular party. Authentication requires additional evidence.

Timestamps are not independently verifiable. The timestamp in the metadata comes from the capturing system's clock. For legally significant evidence, consider services that provide third-party timestamping (RFC 3161 timestamps or notarized captures).

Bot detection may return misleading content. Some sites detect non-human access and display different content. If a site returns a CAPTCHA or error page to your capture system, you've documented that — not the intended content. Always verify captures manually before relying on them.

Archival services may be more appropriate. For serious litigation, services like PageFreezer, Hanzo, or the Wayback Machine's Save Page Now feature provide third-party attestation that screenshot tools cannot. Legal counsel should evaluate the evidentiary requirements before choosing a preservation method.

For internal compliance monitoring, competitive intelligence archiving, and preliminary evidence gathering, screenshot APIs are a cost-effective starting point. For court submissions, consult your legal team about appropriate authentication methods.

The screenshot API supports the viewport sizes and timing controls needed for thorough page capture. For bulk archiving of multiple URLs, the batch endpoint reduces capture time significantly.