Automate SEO Audits: Find Broken Links with an API

2026-04-25 | Tags: [dead-links, seo, automation, api, web-quality]

Broken links hurt SEO. Google treats excessive 404 errors as a quality signal, users bounce when they hit dead ends, and link equity leaks through broken outbound links. Finding broken links manually is tedious. An API makes it automatic.

Quick Check: Single Page

The fastest way to check one page for broken links:

curl -s "https://hermesforge.dev/api/deadlinks?url=https://yoursite.com&quick=true" | python3 -m json.tool

The quick=true parameter skips browser rendering and checks links via HTTP HEAD requests. Results come back in under a second for most pages.

Response:

{
  "url": "https://yoursite.com",
  "total_links": 47,
  "broken_links": 3,
  "links": [
    {
      "url": "https://oldpartner.com/page",
      "status": 404,
      "source_page": "https://yoursite.com",
      "anchor_text": "Our Partner"
    },
    {
      "url": "https://expired-domain.io",
      "status": 0,
      "source_page": "https://yoursite.com",
      "anchor_text": "Resources"
    }
  ]
}
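If you only need the headline numbers, the counts can be pulled out of the JSON inline with python3, the same pattern the CI examples later in this post use:

```shell
# Extract just the broken/total counts from the response
curl -s "https://hermesforge.dev/api/deadlinks?url=https://yoursite.com&quick=true" \
  | python3 -c "import sys, json; d = json.load(sys.stdin); print(f\"{d['broken_links']} of {d['total_links']} links broken\")"
```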

Full Site Crawl

To crawl an entire site and check every link on every page:

curl -s "https://hermesforge.dev/api/deadlinks?url=https://yoursite.com" | python3 -m json.tool

Without quick=true, the API crawls the site using a real browser (Playwright), follows internal links, and checks every outbound URL. This catches links that are only visible after JavaScript renders.

Python Integration

import requests
import json

API = "https://hermesforge.dev/api/deadlinks"

def check_site(url, quick=False):
    """Check a site for broken links."""
    resp = requests.get(API, params={
        "url": url,
        "quick": str(quick).lower(),
    }, timeout=300)  # full crawls can take minutes; quick checks return fast
    resp.raise_for_status()
    return resp.json()

# Quick single-page check
result = check_site("https://yoursite.com/blog", quick=True)
print(f"Found {result['broken_links']} broken links out of {result['total_links']}")

for link in result.get("links", []):
    if link.get("status") in (404, 0):
        print(f"  BROKEN: {link['url']} (status {link['status']})")
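The status values are worth triaging. Judging from the example response above, 0 marks a link that never returned an HTTP response at all (the expired domain), while 4xx and 5xx are ordinary HTTP errors. A small illustrative helper (not part of the API) makes reports easier to read:

```python
def classify(status):
    """Human-readable bucket for a link-check status code.

    Status 0 appears to mean no HTTP response was received at all
    (DNS failure, connection refused, expired domain).
    """
    if status == 0:
        return "unreachable"
    if 400 <= status < 500:
        return "client error"
    if status >= 500:
        return "server error"
    return "ok"

# Example against the fields returned by the API
for link in [{"url": "https://oldpartner.com/page", "status": 404},
             {"url": "https://expired-domain.io", "status": 0}]:
    print(f"{classify(link['status']):>12}: {link['url']}")
```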

Scheduled Monitoring

Check your site weekly and get notified about new broken links:

import requests
import smtplib
from email.mime.text import MIMEText
from datetime import datetime

API = "https://hermesforge.dev/api/deadlinks"

def weekly_link_check():
    result = requests.get(API, params={
        "url": "https://yoursite.com",
        "quick": "true",
    }, timeout=60).json()

    broken = [link for link in result.get("links", []) if link.get("status") in (404, 0, 500)]

    if not broken:
        print(f"{datetime.now()}: All {result['total_links']} links healthy")
        return

    # Build report
    report = f"Broken Link Report - {datetime.now().strftime('%Y-%m-%d')}\n"
    report += f"Site: {result['url']}\n"
    report += f"Total links: {result['total_links']}\n"
    report += f"Broken: {len(broken)}\n\n"

    for link in broken:
        report += f"  [{link.get('status', '?')}] {link['url']}\n"
        if link.get('anchor_text'):
            report += f"       Anchor: {link['anchor_text']}\n"
        report += f"       Found on: {link.get('source_page', 'unknown')}\n\n"

    print(report)
    # Send via email, Slack webhook, etc.

weekly_link_check()
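The report above lists every broken link on every run. To alert only on newly broken links, persist the previous run's results and diff against them. A minimal sketch (the state-file location is an arbitrary choice):

```python
import json
from pathlib import Path

def new_broken(broken_urls, state_file):
    """Return URLs broken now that were not broken last run; update state."""
    known = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    current = set(broken_urls)
    state_file.write_text(json.dumps(sorted(current)))
    return sorted(current - known)

if __name__ == "__main__":
    state = Path("known_broken.json")  # any writable path works
    fresh = new_broken(["https://oldpartner.com/page"], state)
    if fresh:
        print(f"{len(fresh)} newly broken link(s): {fresh}")
```

Feed it the `broken` list from the weekly check and only email when `new_broken()` returns something.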

Add to crontab for weekly execution:

# Every Monday at 6am (crontab comments must be on their own line)
0 6 * * 1  python3 /opt/scripts/weekly_link_check.py

CI/CD Integration

Catch broken links before they reach production. Add to your deployment pipeline:

#!/bin/bash
# check_links.sh — fail deployment if broken links found

RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=${STAGING_URL}&quick=true")
BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_links',0))")

if [ "$BROKEN" -gt 0 ]; then
  echo "DEPLOY BLOCKED: $BROKEN broken links found on staging"
  echo "$RESULT" | python3 -m json.tool
  exit 1
fi

echo "Link check passed: 0 broken links"

GitHub Actions

name: Link Check
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # Weekly Monday 6am

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        run: |
          RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=https://yoursite.com&quick=true")
          BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_links',0))")
          echo "Broken links: $BROKEN"
          echo "$RESULT" | python3 -m json.tool
          if [ "$BROKEN" -gt 0 ]; then
            echo "::error::Found $BROKEN broken links"
            exit 1
          fi

Checking Multiple Sites

Monitor a portfolio of sites:

import requests

API = "https://hermesforge.dev/api/deadlinks"

sites = [
    "https://site1.com",
    "https://site2.com",
    "https://blog.company.com",
    "https://docs.company.com",
]

for site in sites:
    result = requests.get(API, params={"url": site, "quick": "true"}, timeout=60).json()
    broken = result.get("broken_links", 0)
    total = result.get("total_links", 0)
    status = "OK" if broken == 0 else f"BROKEN ({broken})"
    print(f"{status}: {site} — {total} links checked")
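Checking sites one at a time gets slow as the portfolio grows. A sketch of the same loop parallelized with a thread pool (the fetch parameter is there so the network call can be swapped out):

```python
from concurrent.futures import ThreadPoolExecutor

API = "https://hermesforge.dev/api/deadlinks"

def check_one(site):
    import requests  # third-party: pip install requests
    data = requests.get(API, params={"url": site, "quick": "true"}, timeout=60).json()
    return site, data.get("broken_links", 0), data.get("total_links", 0)

def check_portfolio(sites, fetch=check_one, workers=4):
    """Run fetch(site) across sites concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, sites))

if __name__ == "__main__":
    sites = ["https://site1.com", "https://site2.com", "https://blog.company.com"]
    for site, broken, total in check_portfolio(sites):
        status = "OK" if broken == 0 else f"BROKEN ({broken})"
        print(f"{status}: {site} — {total} links checked")
```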

Output Formats

The API supports multiple output formats for different workflows:

# JSON (default)
curl -s "${API}?url=https://yoursite.com&quick=true"

# CSV for spreadsheets
curl -s "${API}?url=https://yoursite.com&quick=true&format=csv"

# Markdown for reports
curl -s "${API}?url=https://yoursite.com&quick=true&format=markdown"

# GitHub annotations (for CI/CD)
curl -s "${API}?url=https://yoursite.com&quick=true&format=github"

Large Sites: Timeout Control

For large sites, use max_duration to get partial results instead of a timeout error:

# Crawl for up to 30 seconds, return whatever was found
curl -s "${API}?url=https://large-site.com&max_duration=30"

Threshold-Based Checks

Set a maximum number of allowed broken links:

# Fail only if more than 5 broken links found
curl -s "${API}?url=https://yoursite.com&threshold=5&check_only=true"

Returns a simple pass/fail response for pipeline integration.

Why Broken Links Hurt SEO

  1. Crawl budget waste — Googlebot spends time on 404 pages instead of indexing new content
  2. Link equity loss — PageRank flows through links. Broken outbound links leak authority
  3. User experience — Bounce rate increases. Time on site decreases
  4. Trust signals — A site full of 404s looks unmaintained to both users and search engines

Regular automated checking catches problems before they accumulate.

Getting Started

No signup needed:

# Check any page right now
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&quick=true" | python3 -m json.tool

For higher rate limits in automated monitoring, get a free API key at /api. Full documentation at /tools/deadlinks.