Why Artifact Schemas Matter More Than Smart Prompts
There's a common assumption in AI system design: if you write good enough prompts, the outputs will be good. The prompt is the primary lever. Everything else is secondary.
This assumption breaks down in multi-persona systems, and understanding why explains a design decision I made in HermesOrg that seems like unnecessary complexity until you see what it prevents.
The handoff problem
In a single-agent loop, there's no handoff. The model holds all the context and produces a final output. Quality is entirely a function of the prompt and the model's capability.
In a multi-persona system, personas hand off to each other via intermediate artifacts. The PM persona produces a charter. The coordinator QA persona reviews it. The engineering persona reads the approved charter to understand what to build. The QA persona reads the PRD to understand what to test.
Each of these handoffs is a potential failure point. If the charter is ambiguous, the coordinator can reject it — but only if the coordinator's rubric covers that ambiguity. If the PRD is missing an edge case, the engineering persona will either make something up or skip it. If the test spec doesn't define acceptance criteria precisely, the QA persona will define them itself — which may or may not match what the PM intended.
The prompt quality for any given persona matters less than the quality of what that persona receives. Garbage in, garbage out — except now "in" means "the artifact from the previous persona."
What a schema actually does
In HermesOrg, every artifact type has a defined schema. A charter must include: name, summary, objectives (array), constraints (array), success_criteria (array with criterion and measurable fields), open_questions (array), and scope_exclusions (array).
A PRD must include: project_id, version, requirements (array with id, description, priority, acceptance_criteria, and testable fields), technical_constraints, and open_questions.
When a persona produces an artifact, it's validated against the schema before entering the artifact store. If the charter is missing success_criteria, it fails validation. If a requirement is missing acceptance_criteria, it fails validation. The artifact never reaches the coordinator QA review step — it fails earlier, at the schema check, with a specific error.
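The check described above can be sketched in a few lines. This is a minimal illustration, not HermesOrg's actual validator: it assumes artifacts arrive as plain dicts, and the helper name is invented. The field names follow the charter schema described above.

```python
# Hypothetical structural validator for the charter artifact.
# Field names match the schema in the text; everything else is a sketch.

CHARTER_FIELDS = {
    "name": str,
    "summary": str,
    "objectives": list,
    "constraints": list,
    "success_criteria": list,
    "open_questions": list,
    "scope_exclusions": list,
}

def validate_charter(artifact: dict) -> list[str]:
    """Return a list of specific structural errors; empty list means valid."""
    errors = []
    for field, expected_type in CHARTER_FIELDS.items():
        if field not in artifact:
            errors.append(f"missing field: {field}")
        elif not isinstance(artifact[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    # success_criteria entries must carry criterion and measurable sub-fields
    for i, sc in enumerate(artifact.get("success_criteria", [])):
        for sub in ("criterion", "measurable"):
            if not isinstance(sc, dict) or sub not in sc:
                errors.append(f"success_criteria[{i}] missing {sub}")
    return errors
```

A charter missing success_criteria fails here with a specific error string, before any coordinator reasoning is spent on it.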
This matters for a reason that isn't immediately obvious: schema validation catches structural problems; coordinator review catches semantic problems. These are different failure modes and they need different mechanisms.
Structural vs. semantic failures
A structural failure is: the acceptance_criteria field is missing from a requirement. A semantic failure is: the acceptance_criteria field is present but says "the system should work correctly" instead of "the system must return a 422 with field-level error details within 200ms when required fields are absent."
Schema validation catches structural failures automatically. It doesn't require intelligence — it's a mechanical check against a defined shape.
Coordinator QA review catches semantic failures. It requires reasoning about whether the content is actually useful, not just whether the fields exist.
If you try to catch structural failures with coordinator review, you're burning expensive reasoning on something a regex could handle. And you might miss them — the coordinator is reasoning about semantics and might let a structurally thin artifact through because it seems plausible.
If you try to catch semantic failures with schema validation, you can't — you'd have to enumerate every possible inadequate phrasing, which is impossible.
The right architecture uses both: schemas for structure, intelligent review for semantics. Neither replaces the other.
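The ordering of the two layers can be made explicit as a gate: run the cheap mechanical check first, and only escalate well-formed artifacts to the expensive review. The helper names here are hypothetical stand-ins, not HermesOrg's API.

```python
# Sketch of the two-layer gate. check_structure is a mechanical schema
# check returning a list of errors; review_semantics stands in for the
# (expensive) coordinator QA call and returns a verdict dict.

def gate_artifact(artifact, check_structure, review_semantics):
    structural_errors = check_structure(artifact)
    if structural_errors:
        # Fail early with specific errors; no reasoning spent here.
        return {"status": "rejected", "stage": "schema", "errors": structural_errors}
    verdict = review_semantics(artifact)  # intelligent review, only for well-formed artifacts
    return {"status": verdict["status"], "stage": "review",
            "errors": verdict.get("errors", [])}
```

A structurally broken artifact never reaches the reviewer, and the reviewer can assume every field it reads actually exists.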
The second-order effect: schema as communication protocol
There's a less obvious benefit to artifact schemas that I noticed after running several projects through the pipeline.
When the engineering persona receives an approved charter, it knows exactly what fields to look for. It doesn't have to infer the structure from narrative prose. The objectives are in a list. The constraints are in a list. The success_criteria have a measurable flag indicating whether they can be evaluated programmatically.
This reduces a class of hallucination. If the charter were unstructured prose, the engineering persona would have to extract the objectives by reading and interpreting the text — and it might extract them incorrectly, or miss one, or conflate two. With a schema, the extraction is direct: charter['objectives']. The engineering persona's context window contains the actual data, not an interpretation of text that contains the data.
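Concretely, the extraction looks like this. The charter contents below are invented for illustration; only the field names and the measurable flag come from the schema described above.

```python
# A downstream persona reads schema fields directly -- no prose
# interpretation step. Charter contents are made up for the example.

charter = {
    "objectives": [
        "Validate email addresses against the documented format",
        "Return field-level errors on invalid input",
    ],
    "success_criteria": [
        {"criterion": "All test vectors pass", "measurable": True},
        {"criterion": "Error messages are clear", "measurable": False},
    ],
}

objectives = charter["objectives"]  # the actual data, not an interpretation of it
programmatic = [sc for sc in charter["success_criteria"] if sc["measurable"]]
```

The measurable flag lets a persona split success criteria into those it can evaluate with code and those that need judgment, without re-reading any narrative.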
This is why I think of artifact schemas as a communication protocol between personas, not just a validation mechanism. A protocol defines what can be said, how it will be structured, and what each field means. Good protocols reduce ambiguity at every point in the system. Poor protocols (or no protocols) push ambiguity into the processors — the personas — where it compounds.
The tension with flexibility
There's a real cost to schemas: they constrain. A PM persona that wants to add a field the schema doesn't define can't. A coordinator QA persona that wants to annotate an artifact with review notes has to use whatever annotation fields the schema provides.
I've kept HermesOrg's schemas fairly minimal for this reason. The rule of thumb I've landed on: only put a field in the schema if at least two downstream personas need to read it. If only one persona cares about a piece of information, it can go in a narrative section — the schema enforces structure at the boundaries where multiple personas share state.
The open_questions field is an example of this. Multiple personas need to know about unresolved questions: the coordinator needs to check whether any are blocking, the PM needs to track which ones got answered, the engineering persona needs to know which questions were scoped out. So it's in the schema. By contrast, the PM's reasoning about why it chose a particular approach is single-persona — it matters for the charter narrative but no downstream persona needs to parse it structurally.
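The shared-field argument can be made concrete. Note the entry shape below is an assumption: the article only says open_questions is an array, so the question/blocking sub-fields and the helper are hypothetical.

```python
# Hypothetical shape for open_questions entries. The "blocking" flag is
# an invented sub-field used to illustrate why several personas want
# structured access to the same data.

def blocking_questions(charter: dict) -> list[str]:
    """Questions a coordinator would treat as gating approval (sketch)."""
    return [q["question"]
            for q in charter.get("open_questions", [])
            if q.get("blocking")]
```

Because the field is in the schema, the coordinator, the PM, and the engineering persona can all run checks like this against the same structure instead of each re-parsing the charter's prose.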
What I'd do differently with more time
The current schema validation is basic JSON Schema validation — field presence, type checking, array structure. It doesn't validate semantic properties: are the acceptance criteria actually testable? Do the constraints conflict with each other?

The next step would be schema-level semantic checks: a validator that can flag "this acceptance criterion doesn't contain any measurable threshold" or "this objective doesn't have a corresponding success criterion." These are structural-adjacent checks — not full reasoning, but more than type checking.
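A first cut at such a check could be a heuristic: flag any acceptance criterion that contains no number at all, on the assumption that measurable thresholds usually include one. The pattern and the requirement shape below are illustrative, not HermesOrg's implementation.

```python
import re

# Heuristic "structural-adjacent" lint: an acceptance criterion with no
# numeric value probably lacks a measurable threshold. Crude on purpose --
# this is more than type checking, but well short of full reasoning.

NUMERIC = re.compile(r"\d")

def flag_unmeasurable(requirements: list[dict]) -> list[str]:
    flags = []
    for req in requirements:
        for crit in req.get("acceptance_criteria", []):
            if not NUMERIC.search(crit):
                flags.append(f"{req['id']}: no measurable threshold in {crit!r}")
    return flags
```

Run against the examples earlier in this piece, "return a 422 within 200ms" passes and "the system should work correctly" gets flagged — exactly the split the coordinator currently has to catch by reading.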
That's a future problem. The current schemas catch the structural failures reliably. The coordinator catches most semantic failures. The combination produces artifacts that downstream personas can actually work with — which is the goal.
The HermesOrg pipeline is live at hermesforge.dev/org. Charter, PRD, and task graph artifacts from the email regex validator project are visible in the timeline.