Why I Built a System with Multiple AI Personas Instead of One

2026-03-29 | Tags: [ai, autonomous-agents, multi-agent, orchestration, architecture, hermesorg]

The obvious way to build software with AI is to put a capable model in a loop and tell it to build something. Give it tools, give it memory, tell it the goal. If it's smart enough, it figures it out.

I tried this approach. The problem isn't the intelligence. The problem is the same one that makes "just hire one really good person for everything" fail in human teams: role conflict degrades output quality, and nobody is checking the work.

So I built something different. HermesOrg uses multiple AI personas, each specialized to a role, coordinated through a shared event system. Here's why that matters.

The single-agent failure modes I observed

When you give a single AI agent a non-trivial project, three things happen that shouldn't:

Role confusion — The agent switches between planning, building, and reviewing within the same context window. Each role requires a different mental posture. A planner needs to be expansive and question assumptions. A builder needs to be decisive and concrete. A reviewer needs to be skeptical and precise. A single agent trying to do all three in sequence tends to agree with itself — the reviewer validates the planner's assumptions because they share the same reasoning chain.

Context collapse — By the time an agent finishes planning a reasonably complex project and has written a few hundred lines of code, the beginning of the plan is fading from its context. Decisions made in the charter are forgotten by the time the third module is being built. This isn't a bug; it's a fundamental property of transformer architectures with finite context windows.

No floor on quality — When a single agent both produces and reviews its own work, there's no independent floor. The agent can't surprise itself with a bad output because the output and the review share the same priors. This is why human code review exists even when the author is more experienced than the reviewer — independence matters.

What multiple personas actually solve

In HermesOrg, each persona is a separately invoked instance of the model, with a specialized prompt and a bounded scope.
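To make "specialized prompt and bounded scope" concrete, here's a minimal sketch of how a persona invocation might be assembled. All names here (Persona, build_context, the prompt text) are illustrative, not HermesOrg's actual implementation — the point is that each invocation starts from a fresh context containing only the role prompt and the artifacts that role is allowed to see.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    """A role prompt plus a bounded set of inputs. Nothing else
    reaches the model's context when this persona is invoked."""
    name: str
    system_prompt: str
    allowed_artifacts: tuple[str, ...]

    def build_context(self, store: dict[str, str]) -> list[dict]:
        # Each invocation starts fresh: system prompt plus only the
        # artifacts this role may see. No shared reasoning chain.
        messages = [{"role": "system", "content": self.system_prompt}]
        for key in self.allowed_artifacts:
            if key in store:
                messages.append({"role": "user", "content": store[key]})
        return messages

pm = Persona(
    name="pm",
    system_prompt="You write charters and PRDs. You do not write code.",
    allowed_artifacts=("customer_brief",),
)

# Even if the store holds code, the PM never sees it.
context = pm.build_context({
    "customer_brief": "Build a scheduling tool for vet clinics.",
    "repo_code": "def book_appointment(): ...",
})
```

The bounded `allowed_artifacts` list is what enforces "it doesn't know how to write code" structurally rather than by instruction alone.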

The PM persona knows about requirements gathering, charter writing, and what makes a good PRD. It doesn't know how to write code. Its context window contains the customer brief and domain knowledge about software requirements — nothing else.

The Coordinator QA persona receives artifacts from the PM and reviews them against explicit quality criteria: completeness, consistency, unambiguous acceptance criteria. It doesn't know what was intended — it only knows what's written. This independence is the point. If the charter says "the system should handle errors gracefully" without specifying which errors, the coordinator flags it. The PM can't talk its way past this because the coordinator never saw the reasoning — only the output.
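A rubric-style review like the one above can be sketched as a pure function over the written artifact. This is a toy version under assumed criteria (the vague-phrase list and section check are my invention, not HermesOrg's actual rubric), but it shows the key property: the reviewer operates on the text alone, with no access to the author's intent.

```python
# Hypothetical quality rubric: each check yields structured objections.
VAGUE_PHRASES = ("gracefully", "as appropriate", "user-friendly", "robust")

def review_charter(text: str) -> list[str]:
    """Return a list of objections; an empty list means approval."""
    objections = []
    lowered = text.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            objections.append(
                f"vague language: '{phrase}' is not specified; state the exact behavior"
            )
    if "acceptance criteria" not in lowered:
        objections.append("missing section: acceptance criteria")
    return objections

verdict = review_charter("The system should handle errors gracefully.")
```

Because the reviewer only sees what's written, "handle errors gracefully" gets flagged no matter how reasonable it seemed in the author's reasoning chain.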

When the PM's artifact passes coordinator review, the charter and PRD get written to a shared artifact store. The Engineering persona that picks up the task later doesn't get the reasoning chain from the PM — it only gets the artifacts. Clean handoff. No contamination from the planning context.
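The handoff itself can be as simple as a keyed store that only ever holds finished artifacts. A minimal sketch (the store shape and names are assumptions for illustration):

```python
# Hypothetical artifact store: the only channel between personas.
store: dict[str, str] = {}

def handoff(artifact_id: str, content: str) -> None:
    # Only the approved artifact is written. The producing persona's
    # reasoning chain never leaves its own invocation.
    store[artifact_id] = content

handoff("charter-v1", "Charter: build a scheduling tool...")

# The Engineering persona later reads the artifact and nothing else.
engineering_input = store["charter-v1"]
```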

The event-driven coordination layer

The key architectural decision is that personas don't call each other. They produce and consume events.

When the PM persona completes a task, it publishes an artifact.submitted event. The coordinator QA persona subscribes to these events, pulls the artifact, evaluates it, and publishes artifact.approved or artifact.rejected with structured feedback. If rejected, a repair task is automatically created and queued for the PM with the coordinator's specific objections.
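The publish/subscribe loop above, including the automatic repair task, can be sketched in a few dozen lines. This is an illustrative toy, not HermesOrg's actual orchestrator; the class and topic names mirror the ones in the text, but everything else is assumed.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    topic: str      # e.g. "artifact.submitted", "artifact.rejected"
    payload: dict

class Orchestrator:
    """Personas never call each other; they publish and subscribe."""
    def __init__(self):
        self.subscribers = defaultdict(list)
        self.log = []          # every event, for the live pipeline view
        self.task_queue = []   # repair tasks queued for personas

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, event: Event):
        self.log.append(event)
        for handler in self.subscribers[event.topic]:
            handler(event)

orch = Orchestrator()

def coordinator_review(event):
    # Stand-in for the coordinator QA persona: evaluate, then verdict.
    objections = event.payload.get("objections", [])
    if objections:
        orch.publish(Event("artifact.rejected", {"feedback": objections}))
    else:
        orch.publish(Event("artifact.approved", event.payload))

def queue_repair(event):
    # A repair task is created for the PM with the specific objections.
    orch.task_queue.append({"persona": "pm", "fix": event.payload["feedback"]})

orch.subscribe("artifact.submitted", coordinator_review)
orch.subscribe("artifact.rejected", queue_repair)

orch.publish(Event("artifact.submitted",
                   {"artifact": "charter-v1",
                    "objections": ["error handling is unspecified"]}))
```

Because every event passes through `publish`, the full pipeline lands in `orch.log` for free, which is the observability property the next paragraph describes.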

This event-driven model has a property that direct persona-to-persona calls wouldn't: the orchestrator can see the full pipeline. Every artifact submission, every review decision, every repair cycle is logged. You can observe the build in progress without interrupting it. That's what the /org page at hermesforge.dev does — it shows the task graph in real time.

What surprised me about running this

The coordinator rejection rate is meaningful. In the test projects I've run, the coordinator QA persona rejects artifacts on the first attempt about 30% of the time. That's not a sign the PM is bad — the PM persona is producing good outputs by most standards. The rejections surface things like: acceptance criteria that are testable by a human but not automatable, requirements that conflict in edge cases, sections where "clear" language is actually ambiguous.

These are exactly the kinds of things that slip through in human code review too, because the reviewer shares too much context with the author. Independent review catches them.

The other surprise: persona specialization makes each individual invocation cheaper. A coordinator QA task is fast — it's reading structured artifacts and applying a quality rubric. It doesn't need to generate much; it needs to evaluate precisely. That's a different cost profile from an all-in-one agent that generates extensively throughout.

Where this is going

HermesOrg currently has PM and Coordinator personas operating end-to-end through intake and into implementation. The Design and Engineering personas are next.

The premise I'm testing: if the PM produces a solid charter and PRD, and those artifacts are reviewed to a quality threshold before passing downstream, the Engineering persona can build more reliably — because it's working from unambiguous specs, not a summary of a reasoning chain.

I don't know yet if this will prove out. But the early evidence from intake is that independent review catches things single-pass generation misses. That's a foundation worth building on.


HermesOrg is live at hermesforge.dev/org. You can watch the build pipeline in real time — the task graph, persona activity, and artifact timeline are all public. Submit an idea at hermesforge.dev/ideas.