# Automate SEO Audits: Find Broken Links with an API
Broken links hurt SEO. Google treats excessive 404 errors as a quality signal, users bounce when they hit dead ends, and link equity leaks through broken outbound links. Finding broken links manually is tedious. An API makes it automatic.
## Quick Check: Single Page
The fastest way to check one page for broken links:
```bash
curl -s "https://hermesforge.dev/api/deadlinks?url=https://yoursite.com&quick=true" | python3 -m json.tool
```
The `quick=true` parameter skips browser rendering and checks links via HTTP HEAD requests. Results come back in under a second for most pages.
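For intuition, here is a rough local equivalent of what a HEAD-based quick check does — a sketch using only the standard library, not the API's actual implementation:

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request

class LinkExtractor(HTMLParser):
    """Collect absolute hrefs from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self.links.append(href)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

def head_status(url, timeout=5):
    """Return the HTTP status of a HEAD request, or 0 on network failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
    except (urllib.error.URLError, OSError):
        return 0
```

Returning 0 on network failure mirrors the API's convention for unreachable hosts (DNS failure, connection refused), as in the response example below.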
Response:
```json
{
  "url": "https://yoursite.com",
  "total_links": 47,
  "broken_links": 3,
  "links": [
    {
      "url": "https://oldpartner.com/page",
      "status": 404,
      "source_page": "https://yoursite.com",
      "anchor_text": "Our Partner"
    },
    {
      "url": "https://expired-domain.io",
      "status": 0,
      "source_page": "https://yoursite.com",
      "anchor_text": "Resources"
    }
  ]
}
```
## Full Site Crawl
To crawl an entire site and check every link on every page:
```bash
curl -s "https://hermesforge.dev/api/deadlinks?url=https://yoursite.com" | python3 -m json.tool
```
Without `quick=true`, the API crawls the site using a real browser (Playwright), follows internal links, and checks every outbound URL. This catches links that are only visible after JavaScript renders.
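Both modes are just query parameters on the same GET endpoint, so a small URL-builder helper keeps scripts tidy. `crawl_url` below is my own convenience function (not part of the API); it also URL-encodes the target site, which the raw curl examples skip:

```python
from urllib.parse import urlencode

API = "https://hermesforge.dev/api/deadlinks"

def crawl_url(site, quick=False, **params):
    """Build the request URL; omitting quick triggers the full browser crawl."""
    query = {"url": site}
    if quick:
        query["quick"] = "true"
    query.update(params)  # pass-through for extra parameters
    return f"{API}?{urlencode(query)}"
```

For example, `crawl_url("https://yoursite.com", quick=True)` produces the quick-check URL with the site properly percent-encoded.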
## Python Integration
```python
import requests

API = "https://hermesforge.dev/api/deadlinks"

def check_site(url, quick=False):
    """Check a site for broken links."""
    resp = requests.get(API, params={
        "url": url,
        "quick": str(quick).lower(),
    })
    return resp.json()

# Quick single-page check
result = check_site("https://yoursite.com/blog", quick=True)
print(f"Found {result['broken_links']} broken links out of {result['total_links']}")
for link in result.get("links", []):
    if link.get("status") in (404, 0):
        print(f"  BROKEN: {link['url']} (status {link['status']})")
```
## Scheduled Monitoring
Check your site weekly and get notified about new broken links:
```python
import requests
from datetime import datetime

API = "https://hermesforge.dev/api/deadlinks"

def weekly_link_check():
    result = requests.get(API, params={
        "url": "https://yoursite.com",
        "quick": "true",
    }).json()

    broken = [l for l in result.get("links", []) if l.get("status") in (404, 0, 500)]
    if not broken:
        print(f"{datetime.now()}: All {result['total_links']} links healthy")
        return

    # Build report
    report = f"Broken Link Report - {datetime.now().strftime('%Y-%m-%d')}\n"
    report += f"Site: {result['url']}\n"
    report += f"Total links: {result['total_links']}\n"
    report += f"Broken: {len(broken)}\n\n"
    for link in broken:
        report += f"  [{link.get('status', '?')}] {link['url']}\n"
        if link.get('anchor_text'):
            report += f"    Anchor: {link['anchor_text']}\n"
        report += f"    Found on: {link.get('source_page', 'unknown')}\n\n"

    print(report)
    # Send via email, Slack webhook, etc.

weekly_link_check()
```
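The report can be delivered anywhere that accepts an HTTP POST. A minimal sketch for a Slack incoming webhook — the webhook URL is a placeholder you create in Slack, and `notify_slack` is illustrative, not part of the API:

```python
import json
import urllib.request

def slack_payload(report):
    """Wrap the plain-text report in Slack's incoming-webhook JSON shape."""
    return {"text": report}

def notify_slack(report, webhook_url):
    """POST the report to a Slack incoming webhook; True on HTTP 200."""
    body = json.dumps(slack_payload(report)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```

Call `notify_slack(report, webhook_url)` in place of the `print(report)` line above.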
Add to crontab for weekly execution:
```
# Every Monday at 6am
0 6 * * 1 python3 /opt/scripts/weekly_link_check.py
```
## CI/CD Integration
Catch broken links before they reach production. Add to your deployment pipeline:
```bash
#!/bin/bash
# check_links.sh — fail deployment if broken links found
RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=${STAGING_URL}&quick=true")
BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_links',0))")

if [ "$BROKEN" -gt 0 ]; then
  echo "DEPLOY BLOCKED: $BROKEN broken links found on staging"
  echo "$RESULT" | python3 -m json.tool
  exit 1
fi

echo "Link check passed: 0 broken links"
```
### GitHub Actions
```yaml
name: Link Check

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # Weekly Monday 6am

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        run: |
          RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=https://yoursite.com&quick=true")
          BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_links',0))")
          echo "Broken links: $BROKEN"
          echo "$RESULT" | python3 -m json.tool
          if [ "$BROKEN" -gt 0 ]; then
            echo "::error::Found $BROKEN broken links"
            exit 1
          fi
```
## Checking Multiple Sites
Monitor a portfolio of sites:
```python
import requests

API = "https://hermesforge.dev/api/deadlinks"

sites = [
    "https://site1.com",
    "https://site2.com",
    "https://blog.company.com",
    "https://docs.company.com",
]

for site in sites:
    result = requests.get(API, params={"url": site, "quick": "true"}).json()
    broken = result.get("broken_links", 0)
    total = result.get("total_links", 0)
    status = "OK" if broken == 0 else f"BROKEN ({broken})"
    print(f"{status}: {site} — {total} links checked")
```
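With more than a handful of sites, sequential checks add up. A small concurrency wrapper — my own helper, not an API feature — runs a few requests in parallel while the bounded worker pool keeps you from hammering the endpoint:

```python
from concurrent.futures import ThreadPoolExecutor

def check_many(sites, check, max_workers=4):
    """Apply a check function to each site concurrently; results keyed by site."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(sites, pool.map(check, sites)))
```

Pair it with the requests call above, e.g. `check_many(sites, lambda s: requests.get(API, params={"url": s, "quick": "true"}).json())`.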
## Output Formats
The API supports multiple output formats for different workflows:
```bash
API="https://hermesforge.dev/api/deadlinks"

# JSON (default)
curl -s "${API}?url=https://yoursite.com&quick=true"

# CSV for spreadsheets
curl -s "${API}?url=https://yoursite.com&quick=true&format=csv"

# Markdown for reports
curl -s "${API}?url=https://yoursite.com&quick=true&format=markdown"

# GitHub annotations (for CI/CD)
curl -s "${API}?url=https://yoursite.com&quick=true&format=github"
```
## Large Sites: Timeout Control
For large sites, use `max_duration` to get partial results instead of a timeout error:
```bash
API="https://hermesforge.dev/api/deadlinks"

# Crawl for up to 30 seconds, return whatever was found
curl -s "${API}?url=https://large-site.com&max_duration=30"
```
## Threshold-Based Checks
Set a maximum number of allowed broken links:
```bash
API="https://hermesforge.dev/api/deadlinks"

# Fail only if more than 5 broken links found
curl -s "${API}?url=https://yoursite.com&threshold=5&check_only=true"
```
Returns a simple pass/fail response for pipeline integration.
## Why Broken Links Matter for SEO
- **Crawl budget waste** — Googlebot spends time on 404 pages instead of indexing new content
- **Link equity loss** — PageRank flows through links; broken outbound links leak authority
- **User experience** — bounce rate increases and time on site drops
- **Trust signals** — a site full of 404s looks unmaintained to both users and search engines
Regular automated checking catches problems before they accumulate.
## Getting Started
No signup needed:
# Check any page right now
```bash
# Check any page right now
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&quick=true" | python3 -m json.tool
```
For higher rate limits in automated monitoring, get a free API key at /api. Full documentation at /tools/deadlinks.