Web Scraping With vs Without JavaScript Execution: When to Use Each

2026-04-24 | Tags: [scraping, python, javascript, api, web, automation]

There are two fundamentally different ways to extract content from a web page. Which one you choose has more impact on your scraper's reliability than any other single decision.

The Two Models

HTTP scraping: Make an HTTP request, parse the response body. Fast, lightweight, works with requests + BeautifulSoup.

Rendered scraping: Launch a browser, load the page, wait for JavaScript to execute, then read the DOM. Slower, heavier, but captures what users actually see.

The distinction matters because modern web applications are split between these two content models:

| Content Model | How It Works | HTTP Scraping | Rendered Scraping |
| --- | --- | --- | --- |
| Server-rendered HTML | Server sends complete HTML | Works | Works |
| Client-side rendering (React, Vue) | JS builds the DOM after load | Fails | Works |
| Lazy-loaded content | Images/data load on scroll | Fails | Works (with scroll simulation) |
| API-driven pages | XHR/fetch calls populate content | Fails | Works |
| Static pages | HTML in the response | Works | Works (overkill) |

HTTP Scraping: When It Works

For server-rendered pages, HTTP scraping is the right choice. It's an order of magnitude faster and uses a fraction of the resources.

import requests
from bs4 import BeautifulSoup

def scrape_static(url: str) -> dict:
    headers = {"User-Agent": "Mozilla/5.0 (compatible; MyBot/1.0)"}
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.content, "html.parser")
    return {
        "title": soup.find("title").get_text(strip=True) if soup.find("title") else None,
        "h1": soup.find("h1").get_text(strip=True) if soup.find("h1") else None,
        "meta_description": (
            soup.find("meta", attrs={"name": "description"}) or {}
        ).get("content"),
    }

How to tell if a page is server-rendered: view source in your browser (Cmd+U / Ctrl+U). If you see the content in the HTML, HTTP scraping works. If you see a nearly empty <div id="root">, it's client-rendered.

Rendered Scraping: When You Need It

For React/Vue/Angular apps, single-page applications, and any page that loads content via JavaScript, you need to execute the JavaScript. That means a real browser.

from playwright.sync_api import sync_playwright

def scrape_rendered(url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        title = page.title()
        h1 = page.query_selector("h1")
        h1_text = h1.inner_text() if h1 else None

        browser.close()
        return {"title": title, "h1": h1_text}

This is dramatically slower — typically 3–10 seconds per page vs under 1 second for HTTP scraping. But it works on content that HTTP scraping can't touch.

The Third Option: Screenshot + Vision

For some use cases, you don't need structured data at all. You need to understand what the page looks like.

import requests
import anthropic
import base64

def analyze_page_visually(url: str, question: str, api_key: str) -> str:
    # Capture screenshot via API (no local browser needed)
    img_resp = requests.get(
        "https://hermesforge.dev/api/screenshot",
        params={"url": url, "width": 1280},
        headers={"Authorization": f"Bearer {api_key}"}
    )
    img_resp.raise_for_status()

    img_b64 = base64.standard_b64encode(img_resp.content).decode()

    # Ask a vision model about it
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": img_b64
                    }
                },
                {"type": "text", "text": question}
            ]
        }]
    )
    return message.content[0].text

Screenshot + vision is useful for:

- Accessibility audits: "Does this page have sufficient color contrast?"
- Layout checks: "Is the call-to-action button above the fold?"
- Competitor analysis: "What is this company's pricing structure?"
- Visual QA: "Does this page look broken at 1280px width?"

Unlike DOM-based scrapers, vision models handle arbitrary visual complexity — charts, tables, sidebars, mixed layouts. They degrade gracefully on novel page structures rather than throwing selector errors.

Detecting Which Model a Page Uses

Before writing a scraper, do a quick probe:

import requests

def detect_rendering_model(url: str) -> str:
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    content = resp.text

    # Signs of client-side rendering
    csr_signals = [
        'id="root"', 'id="app"', 'id="__next"',
        "react", "vue", "angular", "__NEXT_DATA__",
        "bundle.js", "main.js", "app.js"
    ]

    # Signs of server-rendered content
    ssr_signals = ["<h1", "<article", "<main", "<section", "<p class="]

    csr_count = sum(1 for s in csr_signals if s.lower() in content.lower())
    ssr_count = sum(1 for s in ssr_signals if s in content)

    if csr_count > 3 and ssr_count < 5:
        return "likely client-rendered — use Playwright or screenshot API"
    elif ssr_count > 5:
        return "likely server-rendered — HTTP scraping should work"
    else:
        return "unclear — test both"

print(detect_rendering_model("https://twitter.com"))
# likely client-rendered — use Playwright or screenshot API

print(detect_rendering_model("https://news.ycombinator.com"))
# likely server-rendered — HTTP scraping should work

Performance Comparison

For a batch of 100 pages:

| Method | Time | Memory | CPU |
| --- | --- | --- | --- |
| HTTP + BeautifulSoup | ~30s | ~50MB | Low |
| Playwright (local) | ~8 min | ~500MB | High |
| Screenshot API | ~4 min | ~20MB | Minimal |

The screenshot API is slower than HTTP scraping but much lighter than running Playwright locally — no browser installation, no memory overhead, no flaky CDP connections.

When to Switch Methods

Start with HTTP scraping. If you get empty or malformed content, check if the page is client-rendered. If it is, either:

  1. Find the underlying API: Open browser devtools → Network → XHR/Fetch. The page is probably calling a JSON API you can hit directly — faster than any browser approach.

  2. Use Playwright: When you need DOM access on a rendered page.

  3. Use a screenshot API: When you need visual content, don't want local browser overhead, or are running in an environment without browser support (Lambda functions, GitHub Actions with memory constraints).
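Option 1 above is often the biggest win. Once devtools reveals the JSON endpoint a page calls, you can request it directly; the endpoint URL, parameters, and `"items"` key below are hypothetical stand-ins for whatever the Network tab actually shows:

```python
import requests

def fetch_page_data(page: int) -> list[dict]:
    """Hit the JSON endpoint the page itself calls, skipping the browser entirely."""
    resp = requests.get(
        # Hypothetical endpoint discovered in devtools → Network → XHR/Fetch.
        "https://example.com/api/v2/products",
        params={"page": page, "per_page": 50},
        headers={
            "User-Agent": "Mozilla/5.0 (compatible; MyBot/1.0)",
            "Accept": "application/json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["items"]
```

You get structured data with no HTML parsing at all, though private APIs can change without notice and may require auth headers copied from the browser session.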

The right tool depends on what data you need and where your pipeline runs.