How to Add Screenshot Capability to a Custom GPT
Custom GPTs can call external APIs using GPT Actions. This makes it possible to give your GPT new capabilities that go beyond its training data: real-time web content, live screenshots, current pricing, rendered UI states.
Here's how to add screenshot capability to a Custom GPT.
What You're Building
A Custom GPT that can:
- Accept a URL from the user
- Capture a live screenshot of that page
- Return the image for visual analysis
Use cases: competitor monitoring, web accessibility checks, content verification, visual QA, link previews.
Step 1: Get an API Key
Sign up at hermesforge.dev/api/keys and create a key. The free tier gives you 10 screenshots per day.
Step 2: Create the Action Schema
In your Custom GPT editor, go to Configure → Actions → Create new action. Paste this OpenAPI schema:
openapi: 3.1.0
info:
  title: Screenshot API
  description: Capture screenshots of web pages
  version: 1.0.0
servers:
  - url: https://hermesforge.dev
paths:
  /api/screenshot:
    get:
      operationId: captureScreenshot
      summary: Capture a screenshot of a URL
      parameters:
        - name: url
          in: query
          required: true
          schema:
            type: string
          description: The URL to screenshot
        - name: format
          in: query
          required: false
          schema:
            type: string
            enum: [png, jpeg, webp]
            default: png
          description: Image format
        - name: width
          in: query
          required: false
          schema:
            type: integer
            default: 1280
          description: Viewport width in pixels
      responses:
        '200':
          description: Screenshot image
          content:
            image/png:
              schema:
                type: string
                format: binary
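Before wiring the schema into a GPT, it can help to see the exact request the action will make. The following is a minimal Python sketch, not part of the API's own tooling: the helper name build_screenshot_url is hypothetical, and the only facts it relies on are the server, path, and three query parameters declared in the schema.

```python
from urllib.parse import urlencode

# Server and path as declared in the OpenAPI schema.
BASE_URL = "https://hermesforge.dev"
ALLOWED_FORMATS = ("png", "jpeg", "webp")


def build_screenshot_url(page_url: str, fmt: str = "png", width: int = 1280) -> str:
    """Return the GET URL for /api/screenshot (hypothetical helper).

    Mirrors the schema: `url` is required, `format` and `width` are
    optional and default to png and 1280.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # urlencode percent-encodes the target URL so it survives as a query value.
    query = urlencode({"url": page_url, "format": fmt, "width": width})
    return f"{BASE_URL}/api/screenshot?{query}"
```

Calling build_screenshot_url("https://example.com") produces a URL whose url parameter is percent-encoded, which is also what the GPT has to send when it fills in the action's parameters.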
Step 3: Configure Authentication
Under Authentication, select API Key and configure:
- Auth Type: Bearer
- API Key: your key from step 1
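To check a key outside the GPT editor, the same Bearer header can be attached to a direct request. This sketch uses only Python's standard library; the function names are hypothetical, and it assumes the API accepts the key in the Authorization header exactly as the GPT would send it.

```python
import urllib.request


def build_authorized_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build a GET request carrying the key as a Bearer token (hypothetical helper)."""
    req = urllib.request.Request(endpoint, method="GET")
    # Same header format the GPT sends when Auth Type is set to Bearer.
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


def fetch_bytes(req: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    which is how an invalid key would show up here.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Passing the result of build_authorized_request to fetch_bytes should return image bytes if the key is valid; an HTTPError at this stage points at the key or header rather than at the GPT configuration.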
Step 4: Write the System Prompt
This is the part most tutorials skip. The system prompt determines how your GPT uses the action. Here's a template:
You have access to a screenshot tool that can capture any web page.
When a user asks you to:
- Analyze a website or web page
- Check what a URL looks like
- Compare two sites visually
- Verify that a page is working
...use the captureScreenshot action to get a current image, then describe what you see in detail.
When analyzing screenshots:
- Note the layout, visual hierarchy, and key UI elements
- Identify any errors, broken elements, or accessibility issues
- Describe text content that's visible
- Note the apparent purpose of the page
Always capture a fresh screenshot rather than relying on training data about what a site looks like — sites change.
Step 5: Test It
In the GPT preview pane, try:
"Take a screenshot of example.com and describe what you see."
The GPT should call the action, receive an image, and describe the rendered page.
Common Issues
The GPT describes the schema instead of calling the action: Your system prompt needs to explicitly instruct it to use the action. Add: "Always use the captureScreenshot action when asked about a URL."
Authentication fails: Check that you're using Bearer token format. The header should be Authorization: Bearer YOUR_KEY.
Images aren't displaying: The action returns binary image data, which goes to the GPT's vision model rather than being rendered in the chat. Expect a description of the screenshot, not the raw image.
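When debugging image problems, one quick check is whether the bytes coming back from the API are an image at all, or an error body. The sketch below is a hypothetical helper that inspects the standard file signatures of the three formats the schema allows; it assumes nothing about the API beyond that list of formats.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'png', 'jpeg', or 'webp' based on leading bytes, else None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    # WebP is a RIFF container: 'RIFF', a 4-byte size, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # Anything else (for example a JSON error body) is not a supported image.
    return None
```

If this returns None for a response fetched with your key, the problem is likely upstream of the GPT, such as an error response being returned in place of the screenshot.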
Rate Limits
The free tier allows 10 screenshots per day; paid keys have higher limits for production use. When you hit the limit, the API returns HTTP 429 with a JSON body, and the GPT will usually report the error to the user.
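If you call the API from your own code as well as from a GPT, the 429 case is worth handling explicitly. This is a minimal sketch with a hypothetical function name. The fields inside the JSON body are not specified here, so the code returns the parsed body as-is instead of assuming any particular key.

```python
import json
from typing import Any, Optional, Tuple


def classify_response(status: int, body: bytes) -> Tuple[str, Optional[Any]]:
    """Sort a response into 'image', 'rate_limited', or 'error'.

    For 429 the second element is the parsed JSON body, or None if the
    body could not be parsed. Its field names are not assumed.
    """
    if status == 200:
        return ("image", None)
    if status == 429:
        try:
            detail = json.loads(body)
        except ValueError:
            # Covers both invalid JSON and undecodable bytes.
            detail = None
        return ("rate_limited", detail)
    return ("error", None)
```

A caller can then back off or surface the limit to the user when the first element is "rate_limited", and inspect the detail to see what the API actually reports.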
Beyond Screenshots
The same pattern works for the Chart Rendering API: POST /api/charts/render accepts a Chart.js config and returns a chart image. You can build a Custom GPT that generates data visualizations on demand.
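As a rough illustration of that pattern, the sketch below builds a POST request for the chart endpoint. Several details are assumptions and not confirmed behaviour: that the Chart.js config is sent as the raw JSON body, that the same Bearer key applies, and the helper name itself. Check the API documentation for the actual request shape.

```python
import json
import urllib.request

CHART_ENDPOINT = "https://hermesforge.dev/api/charts/render"


def build_chart_request(chart_config: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying a Chart.js config (hypothetical helper).

    Assumes the config is the entire JSON body and that the endpoint
    uses the same Bearer authentication as the screenshot endpoint.
    """
    body = json.dumps(chart_config).encode("utf-8")
    return urllib.request.Request(
        CHART_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# A small Chart.js-style config: one bar dataset with three labels.
EXAMPLE_CONFIG = {
    "type": "bar",
    "data": {
        "labels": ["Jan", "Feb", "Mar"],
        "datasets": [{"label": "Signups", "data": [12, 19, 7]}],
    },
}
```

Sending build_chart_request(EXAMPLE_CONFIG, "YOUR_KEY") with urllib.request.urlopen would, under those assumptions, return the rendered chart image as bytes.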
Full API documentation and the OpenAPI spec are at hermesforge.dev/docs.