Find Broken Links Before Your Users Do
Every website accumulates broken links over time. Pages get moved, external sites go down, typos slip through code review. The result: users hitting 404 pages, search engines downranking your site, and a general sense that nobody's maintaining things.
Most teams find broken links when a user reports them. Here's how to find them first.
Quick Scan: One Command
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com" | python3 -m json.tool
This crawls the page at the given URL and checks every link it finds. The response includes:
{
  "url": "https://example.com",
  "total_links": 47,
  "broken_links": [
    {
      "url": "https://example.com/old-page",
      "status": 404,
      "source": "https://example.com",
      "anchor_text": "Read more"
    }
  ],
  "working_links": 46,
  "broken_count": 1,
  "crawl_time_seconds": 3.2
}
No API key needed. No installation. Just curl.
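If you'd rather script this than shell out to curl, here's a minimal Python sketch using only the standard library. The field names follow the example response above; `fetch_scan` and `summarize` are illustrative helpers, not an official client.

```python
import json
import urllib.request
from urllib.parse import urlencode

API = "https://hermesforge.dev/api/deadlinks"

def fetch_scan(target_url, timeout=30):
    """GET a scan result for target_url and decode the JSON body."""
    query = urlencode({"url": target_url})
    with urllib.request.urlopen(f"{API}?{query}", timeout=timeout) as resp:
        return json.load(resp)

def summarize(scan):
    """One-line summary of a scan result dict."""
    broken = scan.get("broken_count", 0)
    total = scan.get("total_links", 0)
    if broken == 0:
        return f"OK: {total} links checked, all working"
    return f"FAIL: {broken}/{total} links broken"
```

`summarize(fetch_scan("https://example.com"))` gives you the same pass/fail signal the shell scripts below build by hand.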
Deep Crawl: Follow Internal Links
To scan beyond a single page, use the depth parameter:
# Crawl up to 2 levels deep
curl -s "https://hermesforge.dev/api/deadlinks?url=https://mysite.com&depth=2"
- depth=1 (default): Check links on the target page only
- depth=2: Follow internal links one level and check those pages too
- depth=3: Two levels of following — covers most small-to-medium sites
For large sites, add max_duration to prevent timeouts:
# Crawl with a 30-second time limit
curl -s "https://hermesforge.dev/api/deadlinks?url=https://bigsite.com&depth=3&max_duration=30"
If the crawl hits the time limit, you'll get partial results with whatever was checked so far — better than no results.
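One gotcha with the curl one-liners above: the target URL goes into the query string unencoded, which breaks as soon as it contains its own `?` or `&`. A small stdlib helper (the function name is mine, not part of the API) that encodes parameters safely:

```python
from urllib.parse import urlencode

API = "https://hermesforge.dev/api/deadlinks"

def build_scan_url(target_url, depth=1, max_duration=None):
    """Build a deadlinks API URL with safely encoded query parameters."""
    params = {"url": target_url, "depth": depth}
    if max_duration is not None:
        params["max_duration"] = max_duration
    return f"{API}?{urlencode(params)}"
```

For example, `build_scan_url("https://bigsite.com", depth=3, max_duration=30)` reproduces the deep-crawl request above, but with the target URL percent-encoded.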
CI/CD Integration
Add broken link checking to your deployment pipeline:
#!/bin/bash
# check-links.sh — Exit 1 if broken links found
RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=${DEPLOY_URL}&depth=2")
BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_count', 0))")
if [ "$BROKEN" -gt 0 ]; then
  echo "FAILED: Found $BROKEN broken links"
  echo "$RESULT" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for link in data.get('broken_links', []):
    print(f\"  {link['status']} {link['url']} (from {link.get('source', '?')})\")
"
  exit 1
fi
echo "PASSED: No broken links found"
GitHub Actions
# .github/workflows/link-check.yml
name: Link Check
on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 9am
  workflow_dispatch:
jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        run: |
          RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=https://mysite.com&depth=2")
          BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_count', 0))")
          if [ "$BROKEN" -gt 0 ]; then
            echo "::error::Found $BROKEN broken links"
            echo "$RESULT" | python3 -m json.tool
            exit 1
          fi
          echo "No broken links found"
Weekly Monitoring Script
#!/bin/bash
# weekly-link-audit.sh
SITES=(
  "https://mysite.com"
  "https://docs.mysite.com"
  "https://blog.mysite.com"
)
echo "=== Weekly Link Audit — $(date -u) ==="
for SITE in "${SITES[@]}"; do
  RESULT=$(curl -s "https://hermesforge.dev/api/deadlinks?url=${SITE}&depth=2&max_duration=60")
  BROKEN=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('broken_count', 0))")
  TOTAL=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total_links', 0))")
  if [ "$BROKEN" -gt 0 ]; then
    echo "WARNING: $SITE — $BROKEN/$TOTAL links broken"
    echo "$RESULT" | python3 -c "
import sys, json
for link in json.load(sys.stdin).get('broken_links', []):
    print(f\"  [{link['status']}] {link['url']}\")
"
  else
    echo "OK: $SITE — $TOTAL links checked, all working"
  fi
  echo ""
done
Output Formats
The API supports multiple output formats for different workflows:
# JSON (default)
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&format=json"
# CSV — import into spreadsheets
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&format=csv"
# Markdown — paste into issues or docs
curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com&format=markdown"
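If you'd rather post-process the default JSON yourself, say, to control the table layout before pasting into an issue, here's a small sketch that renders a scan result as Markdown. The field names match the example response earlier in this post; the function name is mine.

```python
def to_markdown(scan):
    """Render a scan result dict as a Markdown report with a broken-links table."""
    lines = [
        f"## Link check: {scan['url']}",
        f"{scan['broken_count']} of {scan['total_links']} links broken",
        "",
        "| Status | URL | Found on |",
        "| --- | --- | --- |",
    ]
    for link in scan.get("broken_links", []):
        lines.append(
            f"| {link['status']} | {link['url']} | {link.get('source', '?')} |"
        )
    return "\n".join(lines)
```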
Why Broken Links Matter
- SEO impact: Google treats broken links as a signal of poor site quality. A site with many 404s will rank lower than a well-maintained competitor.
- User trust: Clicking a link and getting a 404 breaks the user's flow. Do it twice and they leave.
- Link rot is inevitable: Studies show that about 2% of web links break per year. A site with 500 links will accumulate ~10 broken links per year without maintenance.
- External links are unpredictable: You control your own pages, but external sites can change URLs, shut down, or block requests at any time.
Rate Limits
- Without API key: 5 requests per minute
- With free API key: 20 requests per minute
- Deep crawls (depth 2+) count as one request regardless of pages crawled
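If a script scans many sites in a loop, it's worth pacing requests on the client side so you stay under the limit. A minimal sketch (sleep-based spacing, not an official SDK feature):

```python
import time

class RateLimiter:
    """Enforce a minimum interval of 60/per_minute seconds between calls."""

    def __init__(self, per_minute=5):
        self.min_interval = 60.0 / per_minute
        self._last = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call `limiter.wait()` before each curl-equivalent request; at the keyless tier (5/minute) that means one request every 12 seconds, which is plenty for a weekly audit.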
For most monitoring setups, the free tier is more than enough.
Try it now: curl -s "https://hermesforge.dev/api/deadlinks?url=https://example.com" | python3 -m json.tool