How to Add Screenshot Capability to a Custom GPT

2026-04-24 | Tags: [chatgpt, api, screenshot, openai, custom-gpt, gpt-actions]

Custom GPTs can call external APIs using GPT Actions. This makes it possible to give your GPT new capabilities that go beyond its training data: real-time web content, live screenshots, current pricing, rendered UI states.

Here's how to add screenshot capability to a Custom GPT.

What You're Building

A Custom GPT that can:
- Accept a URL from the user
- Capture a live screenshot of that page
- Return the image for visual analysis

Use cases: competitor monitoring, web accessibility checks, content verification, visual QA, link previews.

Step 1: Get an API Key

Step 2: Create the Action Schema

In your Custom GPT editor, go to Configure → Actions → Create new action. Paste this OpenAPI schema:

openapi: 3.1.0
info:
  title: Screenshot API
  description: Capture screenshots of web pages
  version: 1.0.0
servers:
  - url: https://hermesforge.dev
paths:
  /api/screenshot:
    get:
      operationId: captureScreenshot
      summary: Capture a screenshot of a URL
      parameters:
        - name: url
          in: query
          required: true
          schema:
            type: string
          description: The URL to screenshot
        - name: format
          in: query
          required: false
          schema:
            type: string
            enum: [png, jpeg, webp]
            default: png
          description: Image format
        - name: width
          in: query
          required: false
          schema:
            type: integer
            default: 1280
          description: Viewport width in pixels
      responses:
        '200':
          description: Screenshot image
          content:
            image/png:
              schema:
                type: string
                format: binary
            image/jpeg:
              schema:
                type: string
                format: binary
            image/webp:
              schema:
                type: string
                format: binary
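
Before wiring the schema into a GPT, it can help to call the endpoint yourself and confirm it behaves as the schema describes. The following is a minimal Python sketch, not an official client: the function names are hypothetical, the parameters mirror the schema above, and the Bearer header is assumed from the authentication step.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://hermesforge.dev"  # server URL taken from the schema


def build_screenshot_url(target: str, fmt: str = "png", width: int = 1280) -> str:
    """Return the GET URL for /api/screenshot with encoded query parameters."""
    if fmt not in ("png", "jpeg", "webp"):
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode({"url": target, "format": fmt, "width": width})
    return f"{BASE_URL}/api/screenshot?{query}"


def fetch_screenshot(target: str, api_key: str, **options) -> bytes:
    """Call the endpoint with a Bearer token and return the raw image bytes."""
    request = Request(
        build_screenshot_url(target, **options),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(request, timeout=30) as response:
        return response.read()
```

Calling fetch_screenshot("https://example.com", "YOUR_KEY", width=800) and writing the bytes to a file is a quick way to check the key and the parameters before testing inside the GPT editor.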

Step 3: Configure Authentication

Under Authentication, select API Key and configure:
- Auth Type: Bearer
- API Key: your key from step 1

Step 4: Write the System Prompt

This is the part most tutorials skip. The system prompt determines how your GPT uses the action. Here's a template:

You have access to a screenshot tool that can capture any web page.

When a user asks you to:
- Analyze a website or web page
- Check what a URL looks like
- Compare two sites visually
- Verify that a page is working

...use the captureScreenshot action to get a current image, then describe what you see in detail.

When analyzing screenshots:
- Note the layout, visual hierarchy, and key UI elements
- Identify any errors, broken elements, or accessibility issues
- Describe text content that's visible
- Note the apparent purpose of the page

Always capture a fresh screenshot rather than relying on training data about what a site looks like — sites change.

Step 5: Test It

In the GPT preview pane, try:

"Take a screenshot of example.com and describe what you see."

The GPT should call the action, receive an image, and describe the rendered page.

Common Issues

The GPT describes the schema instead of calling the action: Your system prompt needs to explicitly instruct it to use the action. Add: "Always use the captureScreenshot action when asked about a URL."

Authentication fails: Check that you're using Bearer token format. The header should be Authorization: Bearer YOUR_KEY.
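
To rule out a malformed header outside the GPT editor, a small probe can help. This is a sketch under assumptions: bearer_header and probe_status are hypothetical helper names, and the exact status code the API returns for a bad key is not documented here (401 or 403 would be the usual HTTP convention).

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def bearer_header(raw_key: str) -> dict[str, str]:
    """Build the Authorization header, tolerating a pasted 'Bearer ' prefix."""
    key = raw_key.strip()
    if key.lower().startswith("bearer "):
        key = key[len("bearer "):].strip()  # avoid sending 'Bearer Bearer ...'
    if not key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {key}"}


def probe_status(api_key: str) -> int:
    """Request a screenshot of example.com and return the HTTP status code."""
    url = "https://hermesforge.dev/api/screenshot?url=https%3A%2F%2Fexample.com"
    try:
        with urlopen(Request(url, headers=bearer_header(api_key)), timeout=30) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # non-2xx responses arrive as HTTPError
```

If probe_status returns 200 with the same key that fails in the GPT, the problem is more likely in the Action's authentication settings than in the key itself.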

Images aren't displaying: This is expected. The action returns binary image data, which goes directly to the GPT's vision model; the raw binary is not shown to the user.

Rate Limits

The free tier allows 10 screenshots per day. If you're building something production-grade, paid keys have higher limits. The API returns HTTP 429 with a JSON body when you hit the limit — your GPT will relay this to the user automatically.
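
If you also call the API from your own scripts, you will want to handle the 429 case explicitly. The sketch below is illustrative only: describe_response is a hypothetical helper, and because the field names in the 429 JSON body are not specified here, it echoes the parsed body rather than reading particular keys.

```python
import json


def describe_response(status: int, body: bytes) -> str:
    """Turn a status code and raw body into a short human-readable message."""
    if status == 200:
        return f"screenshot received ({len(body)} bytes)"
    if status == 429:
        try:
            details = json.loads(body)  # body schema is an assumption: any JSON
        except ValueError:
            return "rate limit reached"
        return f"rate limit reached: {json.dumps(details)}"
    return f"unexpected status {status}"
```

A caller could log this message and stop retrying until the daily limit resets, rather than repeating the request immediately.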

Beyond Screenshots

The same pattern works for the Chart Rendering API: POST /api/charts/render accepts a Chart.js config and returns a chart image. You can build a Custom GPT that generates data visualizations on demand.
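
As a rough illustration of that pattern, the sketch below builds a POST request carrying a Chart.js config. Several details are assumptions rather than documented behavior: that the config is sent as the raw JSON body, that the same Bearer key applies, and the helper name build_chart_request. Check the API documentation for the actual request shape.

```python
import json
from urllib.request import Request

CHART_ENDPOINT = "https://hermesforge.dev/api/charts/render"


def build_chart_request(config: dict, api_key: str) -> Request:
    """Prepare a POST request whose JSON body is the Chart.js config (assumed)."""
    return Request(
        CHART_ENDPOINT,
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example Chart.js config: a bar chart with three data points
example_config = {
    "type": "bar",
    "data": {
        "labels": ["Jan", "Feb", "Mar"],
        "datasets": [{"label": "Signups", "data": [12, 19, 7]}],
    },
}
```

Passing the result to urllib.request.urlopen would send the request; the response body would then be the rendered chart image, as with the screenshot endpoint.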

Full API documentation and the OpenAPI spec are at hermesforge.dev/docs.