How to Add a Screenshot API to Your ChatGPT Custom GPT (Step-by-Step)
ChatGPT Custom GPTs can call external APIs through "Actions" — OpenAPI-defined endpoints your GPT invokes automatically when a user asks for something. One of the most useful capabilities you can add is live website screenshots: your GPT can visually inspect any URL on demand, without the user leaving the conversation.
This walkthrough shows you exactly how to wire the Hermes Screenshot API into a Custom GPT or OpenAI Assistant. No authentication headaches — the API accepts a simple key via query parameter.
What Your GPT Will Be Able to Do
Once configured, users can say things like:
- "Can you screenshot the homepage of acme.com?"
- "Show me what example.com looks like on mobile"
- "Take a screenshot of this URL and tell me if the layout looks broken"
The GPT calls your action, receives the screenshot, and renders the image inline in the conversation. No browser extension, no copy-paste.
Step 1: Get a Free API Key
Go to hermesforge.dev/api/keys and create a free key. The free tier gives you 10 screenshots per day — enough to prototype and test your GPT.
Step 2: Open Your Custom GPT Editor
In ChatGPT, click your profile → My GPTs → Create a GPT (or edit an existing one). Navigate to the Configure tab, then scroll to Actions → Add Action.
Step 3: Paste the OpenAPI Schema
In the schema editor, paste the following. Replace YOUR_KEY_HERE with your actual API key:
openapi: 3.1.0
info:
  title: Hermes Screenshot API
  version: 1.0.0
  description: Capture full-page screenshots of any public URL
servers:
  - url: https://hermesforge.dev
paths:
  /api/screenshot:
    get:
      operationId: captureScreenshot
      summary: Capture a screenshot of a webpage
      parameters:
        - name: url
          in: query
          required: true
          schema:
            type: string
          description: The full URL to screenshot (include https://)
        - name: key
          in: query
          required: true
          schema:
            type: string
            default: YOUR_KEY_HERE
          description: Your Hermes API key
        - name: width
          in: query
          required: false
          schema:
            type: integer
            default: 1280
          description: Viewport width in pixels (375 for mobile)
        - name: height
          in: query
          required: false
          schema:
            type: integer
            default: 800
          description: Viewport height in pixels
        - name: full_page
          in: query
          required: false
          schema:
            type: boolean
            default: false
          description: Capture full scrollable page (not just viewport)
      responses:
        "200":
          description: Screenshot image (PNG)
          content:
            image/png:
              schema:
                type: string
                format: binary
ChatGPT will validate the schema automatically. If there are no errors, you'll see your captureScreenshot operation listed.
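Before testing inside ChatGPT, it can help to see the request the schema describes. The following is a minimal sketch, not an official client: the helper names `build_screenshot_url` and `save_screenshot` are hypothetical, and it assumes the API accepts lowercase `true`/`false` for `full_page`. The endpoint, parameter names, and defaults come from the schema above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Server URL plus path, taken from the schema above.
BASE_URL = "https://hermesforge.dev/api/screenshot"


def build_screenshot_url(url, key, width=1280, height=800, full_page=False):
    """Build the GET request URL from the schema's query parameters."""
    params = {
        "url": url,
        "key": key,
        "width": width,
        "height": height,
        # Assumption: booleans are sent as lowercase "true"/"false".
        "full_page": "true" if full_page else "false",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def save_screenshot(request_url, path="screenshot.png"):
    """Fetch the PNG body of a 200 response and write it to disk."""
    with urlopen(request_url, timeout=30) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


if __name__ == "__main__":
    request_url = build_screenshot_url("https://example.com", "YOUR_KEY_HERE")
    print(request_url)
    # Uncomment to make a real request (this counts against your daily quota):
    # save_screenshot(request_url)
```

Passing `width=375, height=812` produces the mobile-viewport request, and `full_page=True` requests the full scrollable page.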
Step 4: Update Your System Prompt
Add a short instruction block to your GPT's system prompt so it knows when and how to use the action:
You have access to a screenshot tool via the captureScreenshot action.
When a user asks to see, check, inspect, or screenshot any website or URL:
1. Call captureScreenshot with the full URL (always include https://)
2. Display the returned image inline
3. Describe what you observe in the screenshot
For mobile layout checks, use width=375, height=812.
For full-page captures, add full_page=true.
This keeps the GPT's behavior predictable — it won't try to guess what a page looks like or refuse because it "can't browse the web."
Step 5: Test It
In the GPT preview panel, type:
Screenshot https://example.com
You should see a screenshot appear inline. Try a mobile view:
Show me example.com on a mobile screen
The GPT should call captureScreenshot with width=375 and height=812, as instructed in the system prompt, and return the mobile viewport.
Common Use Cases
- Competitive analysis GPT: "Screenshot the pricing pages of these 5 competitors and summarize what's different."
- QA assistant: "Take a screenshot of our staging URL and flag any obvious layout issues."
- Content research: "Screenshot the top 3 results for this search and tell me what patterns you see."
- Client reporting: "Capture screenshots of these 10 pages for my monthly report."
Rate Limits
| Tier | Requests/day | Notes |
|---|---|---|
| Free (no key) | 10/day per IP | For testing only |
| Free key | 10/day | Sufficient for low-volume GPTs |
| Paid | Higher limits | Contact for details |
Responses are cached for 1 hour — repeated screenshots of the same URL within that window return immediately without consuming quota.
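To get a rough feel for how the cache interacts with the daily limit, the sketch below counts how many requests in a sequence would consume quota. It is a simplified model under stated assumptions, not a description of the service's internals: it assumes the cache is keyed by URL alone and that the hour is measured from the last uncached capture. The real service may also key on viewport size or other parameters. The function name `count_quota_used` is hypothetical.

```python
CACHE_TTL_SECONDS = 60 * 60  # responses are cached for 1 hour
DAILY_QUOTA = 10             # free-tier limit from the table above


def count_quota_used(requests):
    """Count the requests that would consume quota.

    `requests` is an iterable of (timestamp_seconds, url) pairs sorted by time.
    Assumption: the cache is keyed by URL only.
    """
    cached_at = {}  # url -> timestamp of the last uncached capture
    used = 0
    for ts, url in requests:
        last = cached_at.get(url)
        if last is not None and ts - last < CACHE_TTL_SECONDS:
            continue  # served from cache, no quota consumed
        cached_at[url] = ts
        used += 1
    return used


if __name__ == "__main__":
    session = [
        (0, "https://example.com"),
        (600, "https://example.com"),   # within the hour: cached
        (1200, "https://acme.com"),
        (3600, "https://example.com"),  # cache expired: counts again
    ]
    used = count_quota_used(session)
    print(f"{used} of {DAILY_QUOTA} daily requests used")
```

Under this model, a GPT that repeatedly inspects the same handful of pages in one session uses far less quota than the raw request count suggests.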
Troubleshooting
"API error: 429" — You've hit the rate limit. Your GPT will see this as an error; add a note to your system prompt: "If captureScreenshot returns 429, tell the user they've hit the daily limit and suggest they get a free API key at hermesforge.dev/api/keys."
"API error: 502" — The target URL failed to load (invalid URL, site blocked external bots, or TLS issue). Ask the user to verify the URL is publicly accessible.
Screenshot looks wrong — Try adding full_page=false explicitly, or adjust the viewport dimensions.
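If you also call the API from your own code, the same two error cases can be handled explicitly. This is a minimal sketch: the helper names `explain_screenshot_error` and `fetch_or_explain` are hypothetical, and the messages simply restate the troubleshooting advice above.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def explain_screenshot_error(status_code):
    """Map a non-200 status from the screenshot endpoint to a user-facing hint."""
    messages = {
        429: "Daily screenshot limit reached. Try again tomorrow or get a key "
             "at hermesforge.dev/api/keys.",
        502: "The target URL failed to load. Check that it is valid and "
             "publicly accessible.",
    }
    return messages.get(status_code, f"Unexpected API error: {status_code}")


def fetch_or_explain(request_url):
    """Return (png_bytes, None) on success or (None, hint) on an HTTP error."""
    try:
        with urlopen(request_url, timeout=30) as response:
            return response.read(), None
    except HTTPError as err:
        # HTTPError.code carries the HTTP status returned by the server.
        return None, explain_screenshot_error(err.code)
```

Returning a hint instead of raising keeps the caller's logic simple: show the image when bytes come back, otherwise show the message.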
Full Integration Guide
For a complete reference including the OpenAPI spec, async endpoint documentation, and more system prompt templates, see the ChatGPT Action integration guide.
The entire setup takes about 5 minutes. Once it's running, your GPT gains a visual perception layer — it can see the web the same way a human would, on demand, inline in the conversation.