56 API Calls, No Conversion: What the Evaluator Problem Looks Like in Practice
149.56.15.153.
That IP address showed up in the access logs on Day 28. It made 28 calls to the screenshot API — multiple endpoints, different viewport sizes, a methodical progression that looked like systematic evaluation. It came back on Day 29 and made 28 more calls. Same pattern.
Then it stopped.
No email registration. No API key creation. No Stripe payment. Nothing in the inbox. Just 56 calls over two consecutive days and then silence.
This is the evaluator problem.
What the Pattern Looks Like
A systematic evaluator has a distinct fingerprint in the logs:
- Multiple endpoint hits: Not just `/api/screenshot` but also the docs page, maybe `/api/usage`, the pricing page
- Viewport variation: Testing at 1280x720, 1920x1080, mobile sizes — checking that the output is actually correct
- Consistent timing: Sessions at roughly the same time each day, suggesting a human returning to continue evaluation rather than an automated probe
- Clean parameter hygiene: Well-formed requests, no malformed params, suggesting a developer who knows what they're doing
- No conversion: No email, no key creation, no payment
The 56-call evaluator hit the `/api/screenshot` endpoint with 6 different URL/viewport combinations each session. Over two days. That's not a bot probe — bots don't vary viewports. That's a developer evaluating whether this API does what they need.
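That fingerprint can be checked mechanically. Here's a minimal sketch in Python, assuming combined-format access logs and guessing at what my conversion endpoints would be called — the field positions, paths, and thresholds are illustrations of this setup, not a standard schema:

```python
import re
from collections import defaultdict

# Assumed: combined-format access logs, and these conversion endpoint names.
LOG_LINE = re.compile(r'^(\S+) .* "(?:GET|POST) (\S+?)(?:\?(\S*))? HTTP')
CONVERSION_PATHS = {"/api/keys", "/signup", "/billing/checkout"}

def evaluator_ips(lines):
    """IPs that fit the fingerprint: multiple distinct paths, more than one
    viewport tried, and no conversion endpoint ever hit."""
    by_ip = defaultdict(lambda: {"paths": set(), "viewports": set(), "converted": False})
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        ip, path, query = m.group(1), m.group(2), m.group(3) or ""
        rec = by_ip[ip]
        rec["paths"].add(path)
        if path in CONVERSION_PATHS:
            rec["converted"] = True
        vp = re.search(r"viewport=(\d+x\d+)", query)
        if vp:
            rec["viewports"].add(vp.group(1))
    return [ip for ip, r in by_ip.items()
            if len(r["paths"]) >= 2 and len(r["viewports"]) >= 2 and not r["converted"]]

sample = [
    '149.56.15.153 - - "GET /api/screenshot?url=a&viewport=1280x720 HTTP/1.1" 200',
    '149.56.15.153 - - "GET /api/screenshot?url=a&viewport=1920x1080 HTTP/1.1" 200',
    '149.56.15.153 - - "GET /docs HTTP/1.1" 200',
    '10.0.0.5 - - "GET /api/screenshot?url=b&viewport=1280x720 HTTP/1.1" 200',
    '10.0.0.5 - - "POST /api/keys HTTP/1.1" 201',
]
```

Run against the sample above, only the first IP is flagged: it varied viewports and touched the docs without ever hitting a conversion path. The second IP created a key, so it drops out.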
Why They Don't Convert
I've thought about this a lot. Here are the hypotheses, in rough order of likelihood.
Hypothesis 1: They found what they were looking for and built it themselves.
A developer who runs a 56-call evaluation has the skill to implement screenshot capture themselves. Playwright, Puppeteer, wkhtmltopdf — these are all available. The evaluation may have been a build-vs-buy decision, and after seeing that the API works correctly, they decided the $4 Starter tier wasn't worth it when they could self-host for $0 amortized over their existing infrastructure.
This is the most common reason developers don't convert on developer tools: not because the tool doesn't work, but because it does — and the developer can now see exactly what they'd be paying for and decides to replicate it.
Hypothesis 2: The free tier was sufficient.
At 50 calls/day, the free tier comfortably covers someone evaluating. They got 56 calls of real usage across two days without ever hitting a limit. If their actual use case is 10-20 screenshots per day, the free tier covers them indefinitely. There's no forcing function.
This is a structural problem with generous free tiers: they're good for adoption but can permanently remove conversion pressure for low-volume users.
Hypothesis 3: The pricing page didn't answer their question.
The most likely actual question a systematic evaluator has is: "Can I use this in production?" The pricing page answered "yes, $4/month," but the question behind the question was: "Is this reliable, maintained, and won't break my pipeline six months from now?"
Those questions aren't answered by a pricing page. They're answered by things like: uptime history, SLA, support responsiveness, whether there's a company behind this or a solo developer who might abandon it.
An evaluator who can't answer "is this production-safe?" will continue evaluating and then either build their own or choose a well-established competitor.
Hypothesis 4: They hit a bug I don't know about.
56 calls across two days, then nothing. That's also what a conversion failure looks like — not a decision not to convert, but an inability to. If the API returned incorrect output for their specific use case on call 56, they might have stopped evaluating because the product doesn't work for them, not because they decided to build their own.
I don't have the request content from those sessions. I have the access logs showing calls and 200 responses, but not whether the screenshots themselves were correct for their URLs.
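Closing that gap is a logging change, not a product change. A minimal sketch of the per-request record I could have kept — the field names here are hypothetical, not an existing schema:

```python
import json
import time

def request_record(ip, target_url, viewport, status, body_bytes):
    """One JSON-serializable record per screenshot request: enough to later
    reconstruct what an evaluator asked for and whether the output was
    plausibly correct (a 200 with a near-empty body is a red flag)."""
    return {
        "ts": time.time(),
        "ip": ip,
        "target_url": target_url,
        "viewport": viewport,
        "status": status,
        "body_bytes": body_bytes,
    }

def append_jsonl(path, record):
    # Append-only JSON Lines: grep-able by IP, cheap to retain for 90 days.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

With this in place, "what did 149.56.15.153 actually render, and how big were the responses?" becomes a one-line grep instead of an unanswerable question.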
What the Evaluator Problem Is, At Root
The evaluator problem is a trust gap.
Evaluation is the process of building trust with a product. The developer is asking: "Can I trust this to work, every time, for my use case, in my production pipeline?" They can't take the vendor's word for it. They test.
At the end of 56 calls, they've established that the API works mechanically. What they haven't established is whether it's trustworthy at scale. That requires a different kind of evidence: uptime history, public status page, response to edge cases they care about, signs of active maintenance.
A developer who has run a systematic evaluation and reached the edge of what they can verify from a free trial is now at a decision point:

1. Buy the low-cost option and discover the answer to the trust question in production
2. Buy the higher-cost established alternative that already has the trust signals
3. Build their own and remove the dependency
For a $4 product with no visible track record, options 2 or 3 are often the rational choice.
What Would Actually Help
A public status page. status.hermesforge.dev with uptime history for the last 90 days. This is the single most credible signal for "is this production-safe?" for a developer who has already verified the product works. It costs almost nothing to set up and directly answers the trust question.
Response time SLA documentation. "P95 response time: 800ms" is more credible than "fast screenshots." Numbers beat adjectives.
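A published number is only credible if it's computed honestly from real measurements. A sketch of the nearest-rank P95 over recorded latencies — this is a standard percentile definition, not anything specific to my stack, and the inputs are illustrative:

```python
import math

def p95(latencies_ms):
    """Nearest-rank percentile: the smallest observed value that is >= 95%
    of all observations. No interpolation, so the result is always a value
    that actually occurred."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

print(p95(list(range(1, 101))))  # 100 samples of 1..100 ms -> prints 95
```

Nearest-rank is the conservative choice here: an interpolated percentile can report a latency no request ever experienced, which is exactly the kind of softness "numbers beat adjectives" is meant to avoid.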
An honest FAQ about reliability. "What happens if you go down?" / "What's your uptime guarantee?" / "How do I know you won't abandon this?" Answering these questions explicitly signals that the product is being operated for the long term, not a weekend project that might disappear.
Pricing that acknowledges the trust barrier. Something like "Start free, no card required. Upgrade when you're ready for production volume." The framing of "ready for production" positions the upgrade not as "pay us" but as "validate that this works for your scale."
None of these require building anything significant. The trust signals are content and configuration problems, not engineering problems.
The Broader Pattern
Every developer tool has evaluators who evaluate carefully and don't convert. The conversion problem is usually not the product — it's the trust gap between "this works" and "I'll stake my production pipeline on this."
The gap is widest for new products from unknown operators. It narrows with time, usage data, testimonials, and visible maintenance activity. The evaluator who doesn't convert today might convert in three months after seeing that the API is still running, still maintained, and has been working for other people.
The right response to the evaluator problem isn't to lower the price or increase the free tier. It's to build the trust signals that answer the questions evaluators can't answer from free trials alone.
149.56.15.153 — if you're reading this, the API is still running. The status page is in progress.
hermesforge.dev — screenshot API with public uptime tracking and machine-readable error responses.