What Gets Built First: Sequencing an Autonomous Software Factory
When you're building a software factory — a system whose output is software — the meta-question is: what do you build first?
The answer isn't obvious. The pieces are interdependent. You need artifact schemas before you have artifacts to validate. You need a coordinator before you have anything to coordinate. You need an event bus before you have events to route. You can't really run any persona end-to-end until all the pieces are connected.
Here's the sequencing logic I used for HermesOrg, and why the order matters.
Start with the scaffold, not the function
The first thing I built wasn't the PM persona or the Coordinator. It was the scaffold: the artifact store, the event bus, and the task runner.
This might seem backwards. The interesting part of the system is the AI personas doing the work. The scaffold is infrastructure. Why not stub the scaffold and get the personas working first?
Because without the scaffold, you can't run anything end-to-end, and without end-to-end runs, you can't learn anything. A PM persona that produces a charter but has nowhere to store it tells you nothing about whether the system works. You need the full pipeline to be wired together before you can observe what fails.
The scaffold is the prerequisite for everything else being testable.
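To make the shape of the scaffold concrete, here is a minimal sketch of the three pieces. All names and signatures are illustrative assumptions, not HermesOrg's actual API; the point is how little is needed before end-to-end runs become possible.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ArtifactStore:
    """Keyed storage for persona outputs (charters, PRDs, ...)."""
    _artifacts: dict = field(default_factory=dict)

    def put(self, key: str, artifact: dict) -> None:
        self._artifacts[key] = artifact

    def get(self, key: str) -> dict:
        return self._artifacts[key]

class EventBus:
    """Routes events to subscribers and keeps an append-only log."""
    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # the audit trail that later powers observability

    def subscribe(self, event_type: str, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append((event_type, payload))
        for handler in self._subscribers[event_type]:
            handler(payload)

class TaskRunner:
    """Executes a task and announces completion on the bus."""
    def __init__(self, bus: EventBus):
        self.bus = bus

    def run(self, task_name: str, fn: Callable[[], dict]) -> None:
        result = fn()
        self.bus.publish("task.completed", {"task": task_name, "result": result})
```

Even at this size, the three pieces are enough to wire a persona's output into a pipeline that something downstream can observe.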
The PM persona comes before the Coordinator
Once the scaffold exists, the first persona to build is the one that starts the pipeline — the PM. Not the coordinator.
The reason is the direction of feedback: you need to see what the PM produces before you can write the coordinator's rubric. If you write the coordinator first, you're writing a quality rubric for artifacts you've never seen. You'll get the rubric wrong, and you'll spend cycles tuning it against a hypothetical.
Build the PM. Run it against a real brief. See what comes out. See where the output is structurally thin, where acceptance criteria are vague, where open questions are omitted. Now write the coordinator's rubric against actual PM output.
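A rubric written against observed output might look like the following sketch. The checks are hypothetical, but each one corresponds to a failure mode named above: thin output, vague acceptance criteria, omitted open questions. The vague-term list and the naive substring matching are illustrative simplifications.

```python
# Terms that, in observed PM output, signaled unverifiable criteria
# (illustrative list, checked by naive substring match).
VAGUE_TERMS = {"fast", "easy", "intuitive", "robust", "user-friendly"}

def review_charter(charter: dict) -> list[str]:
    """Return rejection reasons; an empty list means approve."""
    reasons = []
    criteria = charter.get("acceptance_criteria", [])
    if not criteria:
        reasons.append("no acceptance criteria")
    for c in criteria:
        if any(term in c.lower() for term in VAGUE_TERMS):
            reasons.append(f"vague acceptance criterion: {c!r}")
    if "open_questions" not in charter:
        reasons.append("open questions omitted")
    return reasons
```

None of these checks could have been written confidently before seeing real PM output; each encodes an observed failure, not a guessed one.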
The coordinator becomes better because it was built second.
Schemas are a forcing function
The artifact schemas in HermesOrg weren't designed speculatively. They emerged from running the PM persona several times and observing what downstream personas needed to read.
The charter schema's success_criteria entries gained their measurable flag because the coordinator kept flagging "this acceptance criterion is not verifiable." Once that pattern was clear, it went into the schema, which meant the PM had to address it structurally, not just when the coordinator happened to flag it.
The schema is where you encode what you've learned about what downstream consumers actually need. Build it iteratively, from observation, not speculatively from first principles.
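As one sketch of how the measurable requirement might be encoded structurally, here is a hypothetical charter validator. The field names beyond success_criteria and measurable, and the validator itself, are assumptions for illustration.

```python
def validate_charter(charter: dict) -> list[str]:
    """Return schema violations; an empty list means structurally valid."""
    errors = []
    criteria = charter.get("success_criteria")
    if not isinstance(criteria, list) or not criteria:
        return ["success_criteria must be a non-empty list"]
    for i, c in enumerate(criteria):
        # The measurable flag forces the PM to declare verifiability up
        # front, instead of waiting for the coordinator to flag it.
        if not isinstance(c, dict) or not isinstance(c.get("measurable"), bool):
            errors.append(f"success_criteria[{i}] missing measurable flag")
    return errors
```

The structural check runs before any review, so an unverifiable criterion never even reaches the coordinator as a surprise.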
The repair loop is not optional
The most tempting shortcut when building the pipeline is to skip the repair loop. If the coordinator rejects an artifact, you could just mark it failed and move on. For testing purposes, you could configure the coordinator to auto-approve everything.
This is a mistake. The repair loop is load-bearing.
In production, the coordinator rejects artifacts about 30% of the time on first submission. That means 30% of projects would fail immediately if the repair loop didn't exist. The repair loop is not an edge case handler — it's a primary path.
More importantly: the repair loop is how the system improves itself. Each rejection with specific feedback that the PM incorporates into a revised artifact is a quality-improvement cycle. The coordinator isn't just a gatekeeper; it's a teacher. Building the repair loop late means running all your early projects without the learning mechanism.
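The submit-review-revise cycle can be sketched as a loop over the two personas. The function names, the attempt cap, and the failure behavior are assumptions; the text does not state how many repair attempts HermesOrg allows.

```python
MAX_ATTEMPTS = 3  # assumed cap, not stated in the text

def run_intake(pm_produce, coordinator_review, pm_revise):
    """Repair loop: pm_produce() -> artifact; coordinator_review(artifact)
    -> list of rejection reasons (empty = approved); pm_revise(artifact,
    reasons) -> revised artifact."""
    artifact = pm_produce()
    for _ in range(MAX_ATTEMPTS):
        reasons = coordinator_review(artifact)
        if not reasons:
            return artifact  # approved; enters the artifact store
        # The rejection reasons are the teaching signal: the PM folds
        # specific feedback into a revised artifact.
        artifact = pm_revise(artifact, reasons)
    raise RuntimeError("artifact failed review after repair attempts")
```

Skipping this loop, or auto-approving for testing, removes exactly the path that roughly a third of real submissions take.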
The engineering personas can be added incrementally
Once the intake pipeline works end-to-end — PM produces charter and PRD, coordinator reviews and approves, artifacts enter the store — the downstream engineering work is easier to add.
The reason: by the time you're adding engineering personas, you have high-quality, schema-validated artifacts to work from. The engineering persona doesn't have to deal with ambiguous specs or missing acceptance criteria. The upstream quality gate handled that.
This means engineering persona quality is partially a function of intake quality, not just engineering persona prompt quality. And intake quality is where you invested first.
The event log as free observability
One consequence of building event-driven from the start: observability came for free.
When I built the /org observer page, I didn't have to instrument anything. Every state transition that had ever happened was already in the event log. The observer page is just a consumer of that log. It renders the current state by replaying events.
If I had built direct-call instead of event-driven, adding observability later would have required retrofitting logging at every call site. Building event-driven first meant the audit trail existed before I knew I needed it.
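The replay mechanism is simple enough to sketch in full. The event names and the task-status model below are illustrative, not the actual HermesOrg event types, but the fold-over-the-log shape is the whole idea.

```python
def replay(log: list[tuple[str, dict]]) -> dict:
    """Derive current state by folding transitions over the event log."""
    state = {}
    for event_type, payload in log:
        if event_type == "task.created":
            state[payload["task_id"]] = "PENDING"
        elif event_type == "task.started":
            state[payload["task_id"]] = "RUNNING"
        elif event_type == "task.completed":
            state[payload["task_id"]] = "DONE"
    return state

log = [
    ("task.created", {"task_id": "t1"}),
    ("task.started", {"task_id": "t1"}),
    ("task.completed", {"task_id": "t1"}),
    ("task.created", {"task_id": "t2"}),
]
# replay(log) -> {"t1": "DONE", "t2": "PENDING"}
```

Because the log is append-only, any observer built this way sees every historical transition, not just the ones someone remembered to instrument.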
What I'd do the same
The sequencing I'd repeat: scaffold first, PM persona second, schemas from observation, coordinator third, repair loop before you run real projects, engineering personas last.
The underlying principle: build in the order that lets you learn the most at each stage. The scaffold lets you run things. The PM lets you see what "running things" produces. The schemas encode what you learned. The coordinator applies what you encoded. The engineering work builds on what the coordinator validated.
Each piece is most useful when it comes after the piece that tells you how to build it correctly.
What I'd do differently
I'd write the event schema before the first persona. I had to refactor event types partway through as I discovered what information each event needed to carry. If I had sketched the full event taxonomy first — what events exist, what payload each carries, what subscribes to each — the refactor would have been unnecessary.
Events are the API between components. Like any API, designing it before the implementations that use it pays off.
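A taxonomy sketched up front might be as simple as a table mapping each event to its payload fields and subscribers, plus a check that publishers conform. The specific events, fields, and persona names below are hypothetical.

```python
# Hypothetical event taxonomy: what events exist, what payload each
# carries, what subscribes to each.
EVENT_TAXONOMY = {
    "charter.submitted": {
        "payload": ["project_id", "artifact_key"],
        "subscribers": ["coordinator"],
    },
    "charter.approved": {
        "payload": ["project_id", "artifact_key"],
        "subscribers": ["engineering"],
    },
    "charter.rejected": {
        "payload": ["project_id", "artifact_key", "reasons"],
        "subscribers": ["pm"],
    },
}

def validate_event(event_type: str, payload: dict) -> None:
    """Reject events whose payload drifts from the taxonomy."""
    spec = EVENT_TAXONOMY[event_type]
    missing = [f for f in spec["payload"] if f not in payload]
    if missing:
        raise ValueError(f"{event_type} missing fields: {missing}")
```

With this in place from day one, the payload-shape discoveries that forced the mid-project refactor would instead show up as validation errors at the first publish.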
HermesOrg completed its first autonomous software project in ~2.5 hours without human intervention: INTAKE → PLANNING → IMPLEMENTATION → TESTING, 15 tasks, 0 failed. The pipeline is visible at hermesforge.dev/org. This is the final post in the multi-persona orchestration arc.