How to Add a Screenshot API to Your ChatGPT Custom GPT (Step-by-Step)

2026-04-16 | Tags: [chatgpt, custom-gpt, openai-assistants, screenshot-api, tutorial]


ChatGPT Custom GPTs can call external APIs through "Actions" — OpenAPI-defined endpoints your GPT invokes automatically when a user asks for something. One of the most useful capabilities you can add is live website screenshots: your GPT can visually inspect any URL on demand, without the user leaving the conversation.

This walkthrough shows how to wire the Hermes Screenshot API into a Custom GPT or OpenAI Assistant. Authentication is simple: the API accepts a key as a query parameter, so there is no separate authentication flow to configure.

What Your GPT Will Be Able to Do

Once configured, users can say things like:

"Can you screenshot the homepage of acme.com?"
"Show me what example.com looks like on mobile"
"Take a screenshot of this URL and tell me if the layout looks broken"

The GPT calls your action, receives the screenshot, and shows it inline in the conversation. No browser extension, no copy-paste.

Step 1: Get a Free API Key

Go to hermesforge.dev/api/keys and create a free key. The free tier gives you 10 screenshots per day — enough to prototype and test your GPT.
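Before opening the GPT editor, it can help to confirm the key works. The sketch below uses only Python's standard library. It assumes the endpoint and the `url`/`key` query parameters from the schema in Step 3, and that a successful response body is a PNG image; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def looks_like_png(data: bytes) -> bool:
    """Return True if the bytes start with the standard PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def smoke_test_key(key: str, target: str = "https://example.com",
                   timeout: float = 60.0) -> bool:
    """Request one screenshot and report whether a PNG came back.

    Assumes the endpoint and query parameters from the OpenAPI schema.
    urlopen raises urllib.error.HTTPError for non-2xx responses (e.g. 429).
    """
    query = urllib.parse.urlencode({"url": target, "key": key})
    endpoint = "https://hermesforge.dev/api/screenshot?" + query
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        body = resp.read()
    return looks_like_png(body)
```

Calling `smoke_test_key("YOUR_KEY_HERE")` should return `True` for a working key; a rejected request typically surfaces as `urllib.error.HTTPError`. Each uncached call may count toward the daily quota.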

Step 2: Open Your Custom GPT Editor

In ChatGPT, click your profile → My GPTs → Create a GPT (or edit an existing one). Navigate to the Configure tab, then scroll to Actions → Add Action.

Step 3: Paste the OpenAPI Schema

In the schema editor, paste the following. Replace YOUR_KEY_HERE with your actual API key:

openapi: 3.1.0
info:
  title: Hermes Screenshot API
  version: 1.0.0
  description: Capture full-page screenshots of any public URL
servers:
  - url: https://hermesforge.dev
paths:
  /api/screenshot:
    get:
      operationId: captureScreenshot
      summary: Capture a screenshot of a webpage
      parameters:
        - name: url
          in: query
          required: true
          schema:
            type: string
          description: The full URL to screenshot (include https://)
        - name: key
          in: query
          required: true
          schema:
            type: string
            default: YOUR_KEY_HERE
          description: Your Hermes API key
        - name: width
          in: query
          required: false
          schema:
            type: integer
            default: 1280
          description: Viewport width in pixels (375 for mobile)
        - name: height
          in: query
          required: false
          schema:
            type: integer
            default: 800
          description: Viewport height in pixels
        - name: full_page
          in: query
          required: false
          schema:
            type: boolean
            default: false
          description: Capture full scrollable page (not just viewport)
      responses:
        "200":
          description: Screenshot image (PNG)
          content:
            image/png:
              schema:
                type: string
                format: binary

ChatGPT will validate the schema automatically. If there are no errors, you'll see your captureScreenshot operation listed.
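ChatGPT turns each captureScreenshot call into one GET request against the server and path in the schema. When debugging, it can help to build that same request yourself. The sketch below uses a hypothetical helper (`build_screenshot_url` is not part of any published client) and assumes the API accepts lowercase `true`/`false` for `full_page`. Unset optional parameters are left out so the API's documented defaults (1280 by 800, viewport only) apply.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://hermesforge.dev/api/screenshot"  # server + path from the schema


def build_screenshot_url(target: str, key: str,
                         width: Optional[int] = None,
                         height: Optional[int] = None,
                         full_page: Optional[bool] = None) -> str:
    """Compose the GET URL for one captureScreenshot call (illustrative helper)."""
    params = {"url": target, "key": key}
    if width is not None:
        params["width"] = str(width)
    if height is not None:
        params["height"] = str(height)
    if full_page is not None:
        # Assumption: the API parses lowercase boolean strings
        params["full_page"] = "true" if full_page else "false"
    # urlencode percent-encodes the target URL so it survives as one query value
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_screenshot_url("https://example.com", "YOUR_KEY_HERE", width=375, height=812)` yields a mobile-viewport request you can open directly in a browser.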

Step 4: Update Your System Prompt

Add a short instruction block to your GPT's Instructions field (its system prompt) so it knows when and how to use the action:

You have access to a screenshot tool via the captureScreenshot action.

When a user asks to see, check, inspect, or screenshot any website or URL:
1. Call captureScreenshot with the full URL (always include https://)
2. Display the returned image inline
3. Describe what you observe in the screenshot

For mobile layout checks, use width=375, height=812.
For full-page captures, add full_page=true.

This keeps the GPT's behavior predictable — it won't try to guess what a page looks like or refuse because it "can't browse the web."
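The instruction block is essentially a small routing rule from a user's request to action arguments. A hypothetical Python rendering of the same rule is shown below. It is only a way to reason about which arguments the GPT should send; the model itself interprets natural language rather than boolean flags.

```python
def screenshot_args(url: str, mobile: bool = False, full_page: bool = False) -> dict:
    """Mirror the Step 4 instructions as explicit arguments (illustrative only)."""
    if not url.startswith(("http://", "https://")):
        # The instructions require a full URL, so add the scheme if it is missing
        url = "https://" + url
    args = {"url": url}
    if mobile:
        args.update(width=375, height=812)  # phone-sized viewport from the instructions
    if full_page:
        args["full_page"] = True
    return args
```

For instance, "Show me example.com on a mobile screen" corresponds to `screenshot_args("example.com", mobile=True)`.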

Step 5: Test It

In the GPT preview panel, type:

Screenshot https://example.com

You should see a screenshot appear inline. Try a mobile view:

Show me example.com on a mobile screen

The GPT should call captureScreenshot with width=375 and height=812, as specified in its instructions, and return the mobile viewport.

Common Use Cases

Competitive analysis GPT: "Screenshot the pricing pages of these 5 competitors and summarize what's different."

QA assistant: "Take a screenshot of our staging URL and flag any obvious layout issues."

Content research: "Screenshot the top 3 results for this search and tell me what patterns you see."

Client reporting: "Capture screenshots of these 10 pages for my monthly report."
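Batch requests like the competitor and client-reporting examples can use up a small daily quota quickly. The minimal sketch below splits a URL list into one batch per day of quota. It assumes the free key's limit of 10 per day from the Rate Limits table; `daily_batches` is a hypothetical helper, not part of the API.

```python
from typing import Iterator, List


def daily_batches(urls: List[str], per_day: int = 10) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most per_day URLs, one chunk per day of quota."""
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    # Drop duplicates while keeping order; a repeated URL would only waste a request
    unique = list(dict.fromkeys(urls))
    for start in range(0, len(unique), per_day):
        yield unique[start:start + per_day]
```

With 12 report pages and the default limit, this produces one batch of 10 and a second batch of 2 for the following day.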

Rate Limits

Tier	Requests/day	Notes
Free (no key)	10/day per IP	For testing only
Free key	10/day	Sufficient for low-volume GPTs
Paid	Higher limits	Contact for details

Responses are cached for 1 hour — repeated screenshots of the same URL within that window return immediately without consuming quota.
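The interaction between the one-hour cache and a 10-per-day quota can be modeled on the client side. The simulation below is not the service's implementation. It assumes the cache is keyed on the exact request parameters and that cache hits are free, as described above; `QuotaModel` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

CACHE_TTL_SECONDS = 3600  # responses stay cached for 1 hour


@dataclass
class QuotaModel:
    daily_limit: int = 10
    used: int = 0
    # Maps request parameters to the time (in seconds) they were last fetched fresh
    cache: Dict[Tuple, float] = field(default_factory=dict)

    def request(self, params: Tuple, now: float) -> str:
        """Return 'cached', 'fresh', or 'rate_limited' for one simulated call."""
        fetched_at = self.cache.get(params)
        if fetched_at is not None and now - fetched_at < CACHE_TTL_SECONDS:
            return "cached"  # served from cache, no quota consumed
        if self.used >= self.daily_limit:
            return "rate_limited"  # the real API would answer HTTP 429
        self.used += 1
        self.cache[params] = now
        return "fresh"
```

Under this model, asking for the same URL twice within an hour costs one request, while asking again after the hour has passed costs a second one.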

Troubleshooting

"API error: 429" — You've hit the rate limit. Your GPT will see this as an error; add a note to your system prompt: "If captureScreenshot returns 429, tell the user they've hit the daily limit and suggest they get a free API key at hermesforge.dev/api/keys."

"API error: 502" — The target URL failed to load (the URL is invalid, the site blocks automated browsers, or a TLS error occurred). Ask the user to verify the URL is publicly accessible.

Screenshot looks wrong — If the GPT requested a full-page capture, ask it to retry with full_page=false explicitly, or adjust the viewport width and height.
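If you call the API from your own code, for example while debugging, the troubleshooting advice above maps directly onto HTTP status codes. The sketch below uses the standard library. The message wording and helper names are hypothetical; only the meanings of 429 and 502 come from this guide.

```python
import urllib.error
import urllib.request


def explain_status(status: int) -> str:
    """Translate the status codes described in this guide into user-facing advice."""
    if status == 429:
        return "Daily screenshot limit reached; try again tomorrow or get an API key."
    if status == 502:
        return "The target page failed to load; check that the URL is public and correct."
    if 200 <= status < 300:
        return "OK"
    return f"Unexpected API error: HTTP {status}"


def fetch_or_explain(request_url: str, timeout: float = 60.0):
    """Return (png_bytes, None) on success or (None, advice) on an HTTP error."""
    try:
        with urllib.request.urlopen(request_url, timeout=timeout) as resp:
            return resp.read(), None
    except urllib.error.HTTPError as err:
        # HTTPError carries the response status code in err.code
        return None, explain_status(err.code)
```

The same messages can be pasted into the GPT's instructions so that users see consistent wording whether the error appears in ChatGPT or in your own scripts.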

Full Integration Guide

For a complete reference including the OpenAPI spec, async endpoint documentation, and more system prompt templates, see the ChatGPT Action integration guide.

The entire setup takes about 5 minutes. Once it's running, your GPT gains a visual perception layer: it can load any public URL and inspect the rendered page on demand, inline in the conversation.