Find Broken Links Before Your Users Do

2026-04-26 | Tags: ["dead-links", "seo", "api", "automation", "web"]

Every website accumulates broken links over time. Pages get moved, external sites go down, typos slip through code review. The result: users hitting 404 pages, search engines downranking your site, and a general sense that nobody's maintaining things.

Most teams find broken links when a user reports them. Here's how to find them first.

Quick Scan: One Command

curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com" | python3 -m json.tool

This crawls the page at the given URL and checks every link it finds. The response includes:

{
  "url": "https://example.com",
  "total_links": 47,
  "broken_links": [
    {
      "url": "https://example.com/old-page",
      "status": 404,
      "source": "https://example.com",
      "anchor_text": "Read more"
    }
  ],
  "working_links": 46,
  "broken_count": 1,
  "crawl_time_seconds": 3.2
}

No API key needed. No installation. Just curl.
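In a script, the response is easy to consume with the standard library. A minimal sketch, using a canned response in the shape shown above in place of a live call:

```python
import json

# Canned response matching the schema documented above (stands in for a live call)
sample = '''{
  "url": "https://example.com",
  "total_links": 47,
  "broken_links": [
    {"url": "https://example.com/old-page", "status": 404,
     "source": "https://example.com", "anchor_text": "Read more"}
  ],
  "working_links": 46,
  "broken_count": 1,
  "crawl_time_seconds": 3.2
}'''

data = json.loads(sample)
for link in data["broken_links"]:
    print(f"[{link['status']}] {link['url']} (from {link['source']})")
```

Swap the canned string for the body of a real request and the loop works unchanged.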

To scan beyond a single page, use the depth parameter:

# Crawl up to 2 levels deep
curl -s "https://hermesforge.dev/api/deadlinks?url=https://mysite.com&depth=2"

For large sites, add max_duration to prevent timeouts:

# Crawl with a 30-second time limit
curl -s "https://hermesforge.dev/api/deadlinks?url=https://bigsite.com&depth=3&max_duration=30"

If the crawl hits the time limit, you'll get partial results with whatever was checked so far — better than no results.

CI/CD Integration

Add broken link checking to your deployment pipeline:

#!/bin/bash
# check-links.sh — Exit 1 if broken links found
RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=${DEPLOY_URL}&depth=2")
BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_count', 0))")

if [ "$BROKEN" -gt 0 ]; then
  echo "FAILED: Found $BROKEN broken links"
  echo "$RESULT" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for link in data.get('broken_links', []):
    print(f\"  {link['status']} {link['url']} (from {link.get('source', '?')})\")
"
  exit 1
fi

echo "PASSED: No broken links found"
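If your pipeline runs Python rather than bash, the same gate can be sketched with the standard library only. The endpoint and response fields come from the examples above; `fetch_report` makes a live network call, so the demonstration below runs `gate` on a canned report instead:

```python
import json
import urllib.parse
import urllib.request

API = "https://hermesforge.dev/api/deadlinks"  # endpoint from this article

def fetch_report(url: str, depth: int = 2) -> dict:
    """Fetch a link report for one site (live network call)."""
    qs = urllib.parse.urlencode({"url": url, "depth": depth})
    with urllib.request.urlopen(f"{API}?{qs}") as resp:
        return json.load(resp)

def gate(report: dict) -> int:
    """Return a CI exit code: 1 if broken links were found, else 0."""
    for link in report.get("broken_links", []):
        print(f"  {link.get('status')} {link.get('url')} (from {link.get('source', '?')})")
    return 1 if report.get("broken_count", 0) > 0 else 0

# Offline demonstration on a canned report:
exit_code = gate({
    "broken_count": 1,
    "broken_links": [{"status": 404, "url": "https://example.com/old-page",
                      "source": "https://example.com"}],
})
```

In CI you would end with `sys.exit(gate(fetch_report(os.environ["DEPLOY_URL"])))`.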

GitHub Actions

# .github/workflows/link-check.yml
name: Link Check
on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 09:00 UTC (Actions cron runs in UTC)
  workflow_dispatch:

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        run: |
          RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=https://mysite.com&depth=2")
          BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_count', 0))")
          if [ "$BROKEN" -gt 0 ]; then
            echo "::error::Found $BROKEN broken links"
            echo "$RESULT" | python3 -m json.tool
            exit 1
          fi
          echo "No broken links found"

Weekly Monitoring Script

#!/bin/bash
# weekly-link-audit.sh
SITES=(
  "https://mysite.com"
  "https://docs.mysite.com"
  "https://blog.mysite.com"
)

echo "=== Weekly Link Audit — $(date -u) ==="

for SITE in "${SITES[@]}"; do
  RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=${SITE}&depth=2&max_duration=60")
  BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_count', 0))")
  TOTAL=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total_links', 0))")

  if [ "$BROKEN" -gt 0 ]; then
    echo "WARNING: $SITE — $BROKEN/$TOTAL links broken"
    echo "$RESULT" | python3 -c "
import sys, json
for link in json.load(sys.stdin).get('broken_links', []):
    print(f\"  [{link['status']}] {link['url']}\")
"
  else
    echo "OK: $SITE — $TOTAL links checked, all working"
  fi
  echo ""
done
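To run the audit unattended, a crontab entry like the following works; the script path and log location here are placeholders, not prescribed values:

```
# m h dom mon dow  command
0 9 * * 1  /opt/scripts/weekly-link-audit.sh >> /var/log/link-audit.log 2>&1
```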

Output Formats

The API supports multiple output formats for different workflows:

# JSON (default)
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&format=json"

# CSV — import into spreadsheets
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&format=csv"

# Markdown — paste into issues or docs
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&format=markdown"
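If you already have a JSON report, a local conversion to CSV can be sketched as follows; the column layout here is an assumption for illustration, not necessarily the API's own CSV schema:

```python
import csv
import io
import json

# A JSON report in the shape shown earlier in this article
report = json.loads('''{
  "broken_links": [
    {"url": "https://example.com/old-page", "status": 404,
     "source": "https://example.com", "anchor_text": "Read more"}
  ]
}''')

# Write broken links as CSV rows into an in-memory buffer
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["status", "url", "source", "anchor_text"])
writer.writeheader()
writer.writerows(report["broken_links"])
print(buf.getvalue())
```

Replace `io.StringIO()` with an open file handle to write the CSV to disk.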

Why Broken Links Matter

  1. SEO impact: Google treats broken links as a signal of poor site quality. A site with many 404s will rank lower than a well-maintained competitor.

  2. User trust: Clicking a link and getting a 404 breaks the user's flow. Do it twice and they leave.

  3. Link rot is inevitable: Studies show that about 2% of web links break per year. A site with 500 links will accumulate ~10 broken links per year without maintenance.

  4. External links are unpredictable: You control your own pages, but external sites can change URLs, shut down, or block requests at any time.
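The rough 2%-per-year figure in point 3 compounds over time. A quick projection, assuming a constant annual rot rate:

```python
# Project surviving links over five years at a constant ~2% annual rot rate
links = 500
rate = 0.02

for year in range(1, 6):
    surviving = links * (1 - rate) ** year
    print(f"year {year}: ~{surviving:.0f} working, ~{links - surviving:.0f} broken")
```

Year one matches the article's back-of-the-envelope figure of roughly 10 broken links; by year five, around 48 of the 500 have rotted.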

Rate Limits

The free tier is rate-limited, but for most monitoring setups (a few scheduled scans per week, like the scripts above) it is more than enough.


Try it now: curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com" | python3 -m json.tool