Inside the Screenshot API: What Happens Between Request and Response
Most screenshot APIs look the same from the outside: send a URL, get an image back. The differences that matter — reliability, speed, accuracy, failure handling — are invisible at the API surface. They live in the implementation.
Here is what happens between your API call and the image in the response.
The Request Arrives
You send a POST request to /api/screenshot with a JSON body: a URL, maybe a viewport width, a format preference, a wait condition. The API server receives it, validates the parameters, checks your rate limit, and hands the job to a worker process.
The critical constraint at this stage: concurrency limits. A screenshot involves launching a real browser, which consumes memory. A VPS with 4GB RAM can sustain maybe 3-4 concurrent headless browsers before it starts swapping. Production screenshot APIs manage a pool — they queue requests and dispatch workers as they become available. If you send 10 simultaneous requests, the first few run immediately; the rest wait.
This is why screenshot APIs have rate limits measured in calls per day rather than per second. The actual bottleneck isn't bandwidth or CPU — it's browser process slots.
The Browser Launches
The workhorse for most screenshot APIs is Playwright or Puppeteer, both of which control a headless Chromium instance. The browser launches with specific flags:
- --no-sandbox (required in containers, where the Chromium sandbox lacks the kernel privileges it needs)
- --disable-dev-shm-usage (prevents shared-memory exhaustion in Docker, where /dev/shm defaults to 64MB)
- --disable-gpu (no GPU in most VPS environments)
- --headless=new (Chromium's newer headless mode, more faithful to headed rendering)
A fresh browser context is created for each request — not a fresh browser process. The distinction matters: launching a new Chromium process takes 1-3 seconds. Creating a new context within an existing process takes milliseconds. Good screenshot APIs reuse browser processes across requests and create isolated contexts per request.
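In Playwright's async Python API, that split looks roughly like this sketch. capture and serve are invented names, and a real service would pull URLs from a queue rather than a list:

```python
# Sketch of process reuse with per-request contexts, using Playwright's
# async Python API. capture() and serve() are illustrative names only.

async def capture(browser, url: str) -> bytes:
    """Screenshot one URL in a fresh, isolated context."""
    context = await browser.new_context(
        viewport={"width": 1280, "height": 800}
    )
    try:
        page = await context.new_page()
        await page.goto(url, timeout=30_000)
        return await page.screenshot()
    finally:
        await context.close()  # drops cookies/storage for this request only

async def serve(urls: list) -> list:
    from playwright.async_api import async_playwright  # heavy dep, deferred
    async with async_playwright() as p:
        # The expensive step (1-3 seconds) happens once per process...
        browser = await p.chromium.launch(headless=True)
        try:
            # ...while each per-request context costs only milliseconds.
            return [await capture(browser, u) for u in urls]
        finally:
            await browser.close()
```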
The Page Loads
Navigation starts. Playwright calls page.goto(url) with a timeout — typically 15-30 seconds. What happens next depends on the page:
For a simple static page: the HTML downloads, DOM parses, CSS loads, a few images load. Navigation completes in under a second.
For a JavaScript-heavy page: the HTML is a shell. The JS bundle loads, executes, makes API calls, and renders the actual content. Playwright's waitUntil: 'networkidle' option waits until there have been no network requests for at least 500ms — usually a reasonable proxy for "the page is done loading." (Puppeteer's closest equivalent, networkidle2, tolerates up to 2 open connections.)
For pages with lazy loading: content below the fold may not exist in the DOM at all. If the viewport is 1280×800, only the top 800px of content has been rendered. This is why some screenshots show blank sections — the content exists in the live page but hasn't been fetched yet because it's not in the viewport.
For pages that redirect: navigation follows redirects automatically. A URL that redirects through a tracking link before landing on the actual page will screenshot the final destination, not the intermediate pages.
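Put together, the navigation step might look like this sketch, assuming page is a Playwright Page from an existing context (navigate is an invented helper, not a library API):

```python
# Sketch of the navigation step. navigate() is an invented helper.

async def navigate(page, url: str, timeout_ms: int = 30_000) -> str:
    # 'networkidle' resolves once the network has been quiet for ~500ms;
    # the timeout caps SPAs that poll forever and never go idle.
    response = await page.goto(url, wait_until="networkidle",
                               timeout=timeout_ms)
    if response is None or not response.ok:
        raise RuntimeError(f"navigation failed for {url}")
    # Redirects are followed automatically; page.url is the final destination.
    return page.url
```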
The Viewport
The viewport is the browser window size, set before navigation begins. Default is typically 1280×800 (desktop), but this is configurable — 375×812 for mobile, 1920×1080 for large desktop, custom dimensions for specific use cases.
Viewport matters more than most people expect. It affects:
- Layout: responsive designs render differently at different widths. A page that looks fine at 1280px may have a broken layout at 375px.
- Content: as noted above, content below the viewport height may not be loaded or rendered.
- Screenshots vs. full-page screenshots: a viewport screenshot captures exactly the viewport rectangle. A full-page screenshot requires scrolling the page programmatically, capturing each section, and stitching the images together — or using Playwright's fullPage: true option, which captures the full scrollable height of the document in one pass.
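For full-page captures of lazy-loading pages, a common trick is to scroll through the document first so below-the-fold content gets fetched. This is a sketch assuming a Playwright Page; the step size and 250ms delay are arbitrary tuning values:

```python
# Sketch: scroll through the document so lazy loaders below the fold fire,
# then capture with full_page. Assumes a Playwright Page object.

async def full_page_screenshot(page, step: int = 800) -> bytes:
    height = await page.evaluate("document.body.scrollHeight")
    for y in range(0, height, step):
        await page.evaluate(f"window.scrollTo(0, {y})")
        await page.wait_for_timeout(250)  # give lazy loaders time to fetch
    await page.evaluate("window.scrollTo(0, 0)")  # reset before capture
    return await page.screenshot(full_page=True)
```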
Ad and Tracker Blocking
Most production screenshot APIs block advertising and tracking domains. This matters for two reasons:
Performance: third-party ad scripts are frequently slow, sometimes timing out after 5-10 seconds waiting for a bid response. Blocking them can cut page load time by 50% or more.
Visual accuracy: ad slots that fail to load show either blank space or broken placeholder images. Blocking them before the page loads means the layout renders without the gap.
The blocking list typically covers 25-50 major ad and tracker domains: DoubleClick, Google Ads, Facebook Pixel, Hotjar, Intercom, Segment, etc. It does not need to be comprehensive — 80% of the performance benefit comes from the top 10 blocked domains.
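A minimal sketch of that interception, using Playwright's page.route; the domain list here is a small illustrative sample, not a complete blocklist:

```python
from urllib.parse import urlparse

# Small illustrative sample; production lists cover a few dozen domains.
BLOCKED_DOMAINS = {
    "doubleclick.net", "googlesyndication.com", "facebook.net",
    "hotjar.com", "intercom.io", "segment.com",
}

def should_block(url: str) -> bool:
    host = urlparse(url).hostname or ""
    # Match the domain itself and any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

async def install_blocking(page):
    # Playwright route interception: abort matching requests before they
    # ever leave the browser.
    async def handler(route):
        if should_block(route.request.url):
            await route.abort()
        else:
            await route.continue_()
    await page.route("**/*", handler)
```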
Cookie Banner Handling
Cookie consent banners are increasingly a problem. A banner that covers 40% of the viewport makes the screenshot useless for most applications.
There are two approaches:
JavaScript injection: inject a script before the page loads that removes elements matching common cookie banner selectors. This works for most well-structured banners but misses custom implementations.
Waiting and clicking: after the page loads, find an "Accept" or "Agree" button and click it. This handles more banners but adds latency and can fail if the click target is ambiguous.
Most screenshot APIs support at least the injection approach, often configurable via API parameter.
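The injection approach can be as simple as a stylesheet added before page scripts run. The selector list below is hypothetical (real services maintain much larger ones), and hide_banners_css / inject_banner_blocker are invented names:

```python
# Hypothetical selector list; real services maintain much larger ones.
BANNER_SELECTORS = [
    "#onetrust-banner-sdk",
    "#cookie-banner",
    "[class*='cookie-consent']",
]

def hide_banners_css(selectors=BANNER_SELECTORS) -> str:
    """Build a stylesheet that hides anything matching the selectors."""
    return ", ".join(selectors) + " { display: none !important; }"

async def inject_banner_blocker(page):
    # add_init_script runs before any page script, so the banner is hidden
    # before it can ever paint.
    css = hide_banners_css()
    await page.add_init_script(
        "document.addEventListener('DOMContentLoaded', () => {"
        "  const s = document.createElement('style');"
        f"  s.textContent = {css!r};"
        "  document.head.appendChild(s);"
        "});"
    )
```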
The Screenshot
Once the page is loaded and stable, Playwright calls page.screenshot(). Note that Playwright only emits PNG or JPEG natively; an API that offers WebP captures one of those and transcodes it, for example with Pillow:

screenshot_bytes = await page.screenshot(
    type='png',
    full_page=False,
    clip={'x': 0, 'y': 0, 'width': 1280, 'height': 800}
)

# Playwright has no native WebP output; transcode with Pillow
from io import BytesIO
from PIL import Image
buffer = BytesIO()
Image.open(BytesIO(screenshot_bytes)).save(buffer, 'WEBP', quality=85)
screenshot_bytes = buffer.getvalue()
PNG: lossless, larger files (typically 300KB-2MB for a full desktop viewport). Best for: pixel-accurate comparison, text legibility, archival.
WebP: lossy compression at configurable quality. At quality=85, typically 50-70% smaller than PNG with no perceptible quality loss for most pages. Best for: thumbnails, previews, bandwidth-sensitive applications.
The screenshot is returned in the HTTP response body with the appropriate Content-Type header. Total time from request to response: 2-8 seconds for a typical page, depending on page complexity and whether any waiting conditions are used.
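The response step amounts to pairing the bytes with the right MIME type. A minimal sketch; build_response is an invented helper, not tied to any framework:

```python
# Map the output format to its MIME type so clients can decode the body
# without guessing. build_response() is illustrative only.
CONTENT_TYPES = {
    "png": "image/png",
    "jpeg": "image/jpeg",
    "webp": "image/webp",
}

def build_response(image: bytes, fmt: str):
    """Return (status, headers, body) for a finished screenshot."""
    return 200, {"Content-Type": CONTENT_TYPES[fmt]}, image
```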
Why Some Sites Fail
Timeout: the page takes longer than the configured timeout to reach a stable state. Common causes: JavaScript-heavy SPA waiting for slow API data, long-running analytics scripts, pages that continuously poll and never reach networkidle.
Bot detection: some sites detect headless Chromium via user agent, navigator.webdriver property, or behavioral analysis and serve a different page — often a CAPTCHA or empty content. Production APIs patch the most common detection signals but cannot reliably circumvent determined bot blocking.
HTTPS certificate errors: self-signed or expired certs cause navigation to fail by default. Can be bypassed with ignoreHTTPSErrors: true but should be optional — you generally want to know if you're screenshotting a broken cert.
Cross-origin restrictions: some pages block iframe embedding or have CORS policies that prevent certain resources from loading in headless context. The visual result is a partially rendered page.
Memory exhaustion: a page with extremely heavy JavaScript can crash the browser process. The API returns an error rather than a screenshot. Mitigation: process isolation so one crash doesn't affect other concurrent requests.
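These failure modes are easiest to consume when mapped to machine-readable errors. A sketch; the needles and error codes here are illustrative, not a documented schema:

```python
# Map common failure text to (HTTP status, error code) pairs. The needles
# and codes are illustrative, not a documented schema.
FAILURES = {
    "Timeout":        (504, "navigation_timeout"),
    "net::ERR_CERT":  (502, "certificate_error"),
    "Target crashed": (502, "browser_crash"),
}

def classify_failure(message: str) -> tuple:
    for needle, (status, code) in FAILURES.items():
        if needle in message:
            return status, code
    return 502, "navigation_failed"
```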
The Rate Limit Response
When you exceed your daily limit, you get a 429 Too Many Requests response with headers like X-RateLimit-Remaining: 0 and X-RateLimit-Reset: 1711065600. A well-implemented screenshot API tells you exactly when you can retry — in Unix timestamp or ISO 8601 format, not just "try again later."
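On the client side, those headers make backoff trivial. A minimal sketch, assuming the reset header carries a Unix timestamp:

```python
import time

# Client-side sketch: honor X-RateLimit-Reset instead of retrying blindly.
def seconds_until_retry(headers: dict, now=None) -> float:
    """Seconds to wait after a 429, given its response headers."""
    reset = int(headers.get("X-RateLimit-Reset", 0))  # Unix timestamp
    if now is None:
        now = time.time()
    return max(0.0, reset - now)
```

A caller sleeps for this long before retrying instead of hammering the endpoint.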
This is the moment where an API either serves or frustrates you. If you're building something that needs more than the free tier, the 429 response is where the conversion happens.
hermesforge.dev — screenshot API with machine-readable errors, WebP support, and ad blocking. 50 screenshots/day free, no card required.