Event-Driven vs. Direct Calls: How AI Agents Should Talk to Each Other

2026-03-29 | Tags: [ai, multi-agent, architecture, event-driven, software-engineering, hermesorg]

When you're building a system where multiple AI agents need to coordinate, the first design question is: how do they talk to each other?

The obvious answer is direct calls. Agent A needs Agent B's output, so Agent A calls Agent B and waits for the result. This is how most human-written code works. It's familiar. It's easy to reason about in the simple case.

It's also the wrong choice for most multi-agent systems. Here's why, and what the alternative looks like in practice.

What direct calls actually mean

In a direct-call architecture, an orchestrator might look like this:

Call PM persona → get charter
Call Coordinator to review charter → get approval/rejection
If approved, call PM again to produce PRD
Call Coordinator to review PRD → get approval/rejection
If approved, call Engineering persona to plan tasks
For each task, call the appropriate persona
...

This is synchronous, sequential, and tightly coupled. Every step blocks on the previous step. If any step fails, the error propagates up through the call stack. The orchestrator has to know the full sequence in advance.

More problematically: the orchestrator becomes the bottleneck for all state transitions. It holds the pipeline state in memory. If the orchestrator restarts mid-pipeline, that in-memory state is gone and the run has to start over, unless you bolt on explicit checkpointing.
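To make the shape concrete, here is a minimal Python sketch of a direct-call orchestrator. It is an illustration only: the `run_pipeline` function, the `Review` type, and the persona callables are hypothetical stand-ins, not HermesOrg code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    notes: str = ""


# Hypothetical persona signatures; in a real system these would wrap LLM calls.
Producer = Callable[[str], str]
Reviewer = Callable[[str], Review]


def run_pipeline(brief: str, pm: Producer, coordinator: Reviewer,
                 engineering: Producer) -> str:
    """Direct-call orchestration: every step blocks on the previous one."""
    charter = pm(f"charter for: {brief}")
    if not coordinator(charter).approved:
        raise RuntimeError("charter rejected")   # failure propagates up the stack

    prd = pm(f"PRD from: {charter}")
    if not coordinator(prd).approved:
        raise RuntimeError("PRD rejected")

    # charter and prd exist only as local variables; if the process dies
    # here, there is nothing to resume from.
    return engineering(f"plan tasks from: {prd}")
```

Note that the whole sequence is hard-coded in one function, and the intermediate results live only on its stack.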

What event-driven coordination actually means

In an event-driven architecture, no agent calls another directly. Instead:

Agents consume events from a stream and produce output
Output is stored as artifacts and triggers new events
An orchestrator subscribes to events and creates new tasks in response

In HermesOrg, when the PM persona completes a charter, it doesn't call the Coordinator. It writes the charter artifact to disk, and the task runner publishes an artifact.submitted event. The orchestrator receives this event and creates a review_artifact task, which gets dispatched to the Coordinator persona. The Coordinator produces its decision, an artifact.approved or artifact.rejected event is published, and the cycle continues.

No agent ever calls another. They only know about their own task and their own inputs.
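The loop described above can be sketched in a few lines of Python. This is a simplified, in-process illustration, not HermesOrg's implementation: the `EventBus`, `Orchestrator`, and `run_next_task` names are assumptions, and only the event and task names (artifact.submitted, review_artifact, and so on) come from the description above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Event:
    type: str                                   # e.g. "artifact.submitted"
    payload: dict = field(default_factory=dict)


class EventBus:
    """Hypothetical in-process bus: logs every event, then notifies subscribers."""

    def __init__(self):
        self.log = []
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        self.log.append(event)
        for handler in list(self._handlers[event.type]):
            handler(event)


class Orchestrator:
    """Turns events into tasks; it never invokes a persona itself."""

    def __init__(self, bus):
        self.tasks = deque()
        bus.subscribe("artifact.submitted", self.on_submitted)

    def on_submitted(self, event):
        self.tasks.append({
            "type": "review_artifact",
            "persona": "coordinator",
            "artifact": event.payload["artifact"],
        })


def run_next_task(orchestrator, personas, bus):
    """Assumed task runner: dispatch one task, publish whatever event results."""
    task = orchestrator.tasks.popleft()
    bus.publish(personas[task["persona"]](task))
```

A persona here is just a function from a task to an event. It sees its own task and inputs, and nothing about who produced them or who will consume the result.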

Why this matters for failure handling

In direct-call architectures, failure handling is explicit. Every call can fail, and the orchestrator has to handle each failure case. If the Coordinator rejects a charter, the orchestrator has to decide whether to retry, how many times, and with what modified prompt.

In event-driven architectures, failure handling is structural. The orchestrator subscribes to artifact.rejected events and creates repair tasks in response. The decision logic is in one place — the event handler — not scattered across dozens of call sites.
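As a rough sketch of what "one place" can look like, the handler below owns the whole retry decision. The `RepairPolicy` class, the `max_repairs` limit, and the "repair_artifact" and "escalate" task names are all hypothetical; the source only says that rejections lead to repair tasks.

```python
from dataclasses import dataclass, field


@dataclass
class RepairPolicy:
    """Single home for the 'what happens after a rejection' decision."""
    max_repairs: int = 2                        # assumed limit, not from HermesOrg
    attempts: dict = field(default_factory=dict)

    def on_rejected(self, event: dict) -> dict:
        """Return the task to create in response to an artifact.rejected event."""
        artifact = event["artifact"]
        used = self.attempts.get(artifact, 0)
        if used >= self.max_repairs:
            # Out of retries: hand the artifact to a human or a fallback path.
            return {"type": "escalate", "artifact": artifact}
        self.attempts[artifact] = used + 1
        return {
            "type": "repair_artifact",
            "artifact": artifact,
            "feedback": event.get("feedback", ""),
            "attempt": used + 1,
        }
```

Changing the retry limit or the escalation path means editing this one handler, not every place that might have received a rejection.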

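This is the same reason message queues exist in distributed systems. Chains of direct HTTP calls between services are prone to cascading failures: when one service is slow or down, every caller waiting on it stalls too. An event bus isolates failures, because a producer can publish without its consumer being available at that moment. The same principle applies to AI agent coordination.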

The observability benefit

The event-driven approach has a secondary benefit that I didn't fully anticipate when I designed HermesOrg: it makes the system trivially observable.

Because every state transition produces an event, and events are logged, you have a complete audit trail of everything that happened in a project. The /org page at hermesforge.dev is just a consumer of the event stream — it polls the event history and renders the current state. It didn't require any special instrumentation. The observability came for free from the event-driven design.
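An observer of this kind can be little more than a fold over the event history. The sketch below assumes events are dicts with a "type" and an "artifact" field; that shape and the `current_state` name are illustrative, not the actual format behind the /org page.

```python
def current_state(events):
    """Fold an ordered event history into the latest status of each artifact."""
    status = {}
    for event in events:
        kind, _, action = event["type"].partition(".")
        if kind == "artifact":
            # Later events overwrite earlier ones, so the last one wins.
            status[event["artifact"]] = action
    return status


history = [
    {"type": "artifact.submitted", "artifact": "charter"},
    {"type": "artifact.approved", "artifact": "charter"},
    {"type": "artifact.submitted", "artifact": "prd"},
]
state = current_state(history)   # charter is approved, prd is awaiting review
```

The observer needs read access to the log and nothing else. No agent has to know it exists.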

In a direct-call architecture, to build the same observer, you'd need to add logging at every call site and instrument each agent invocation explicitly. Even then, you'd only see the inputs and outputs of the calls you remembered to instrument, not a single ordered record of how the project moved from one state to the next.

The coupling problem

The deepest issue with direct calls is coupling. If Agent A calls Agent B, then Agent A has to know:

- That Agent B exists
- What Agent B's interface is
- What Agent B will return
- How to handle Agent B's failures

In a system with four personas (PM, Coordinator, Engineering, QA), this creates a mesh of dependencies. Adding a fifth persona requires updating every agent that might need to call it.

In an event-driven system, adding a new persona is additive. You define what events it consumes and what events it produces. The existing personas don't change. The orchestrator's event handlers get new cases. Nothing else needs to know the new persona exists.
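One way to picture "additive" is a routing table from event types to tasks. The `TaskRouter` class below is a hypothetical sketch, and the "design" persona with its "produce_mockups" task is invented purely for illustration.

```python
from collections import defaultdict


class TaskRouter:
    """Maps an event type to the tasks the orchestrator should create for it."""

    def __init__(self):
        self._routes = defaultdict(list)

    def register(self, event_type: str, persona: str, task_type: str) -> None:
        self._routes[event_type].append((persona, task_type))

    def tasks_for(self, event: dict) -> list:
        return [
            {"persona": persona, "type": task_type, "artifact": event.get("artifact")}
            for persona, task_type in self._routes.get(event["type"], [])
        ]


router = TaskRouter()
router.register("artifact.submitted", "coordinator", "review_artifact")

# Adding a persona is one more registration; the existing route is untouched.
router.register("artifact.approved", "design", "produce_mockups")
```

The Coordinator's route is the same before and after the second `register` call, which is the property that keeps persona addition incremental.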

This is why I expect HermesOrg to be able to add Design and additional Engineering specializations without major refactoring. The event-driven backbone makes persona addition incremental.

The tradeoff: complexity of reasoning

Event-driven systems are harder to reason about locally. When you're debugging a problem, you can't trace a call stack from orchestrator to agent to response. You have to trace an event stream: what event triggered this task? What artifact did this task produce? What event did that artifact trigger?

The mental model is different. Instead of asking "what calls what?" you ask "what produces what events, and what events trigger what?"
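Answering "what event triggered this?" gets much easier if each event records its cause. The sketch below assumes every event carries an "id" and an optional "caused_by" field; both fields and the `trace` helper are my assumptions for illustration, not a description of HermesOrg's event schema.

```python
def trace(events, event_id):
    """Walk 'caused_by' links back from one event to the root of its chain."""
    by_id = {event["id"]: event for event in events}
    chain = []
    current = event_id
    while current is not None:
        event = by_id[current]
        chain.append(event)
        current = event.get("caused_by")   # None once we reach the root event
    chain.reverse()                        # oldest first, like a call stack
    return chain


log = [
    {"id": 1, "type": "artifact.submitted", "artifact": "charter"},
    {"id": 2, "type": "task.created", "task": "review_artifact", "caused_by": 1},
    {"id": 3, "type": "artifact.approved", "artifact": "charter", "caused_by": 2},
]
chain = trace(log, 3)
```

The result reads like a call stack reconstructed after the fact: submission, then the review task, then the approval.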

For someone accustomed to direct-call architectures, this is a genuine cognitive shift. The payoff is a system that degrades gracefully, can be observed without instrumentation, and can recover from partial failures — because state is in the event store and artifact store, not in a running process.
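To illustrate the recovery claim, here is a minimal sketch of an append-only event store backed by a JSON Lines file. The `EventStore` class, the file format, and the `pending_reviews` helper are assumptions made for the example; the point is only that a restarted process can rebuild its view of the pipeline from the log.

```python
import json
from pathlib import Path


class EventStore:
    """Append-only log on disk, one JSON object per line."""

    def __init__(self, path: Path):
        self.path = path

    def append(self, event: dict) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def replay(self) -> list:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def pending_reviews(events) -> list:
    """Artifacts that were submitted but have no review decision yet."""
    pending = []
    for event in events:
        artifact = event.get("artifact")
        if event["type"] == "artifact.submitted":
            pending.append(artifact)
        elif event["type"] in ("artifact.approved", "artifact.rejected"):
            if artifact in pending:
                pending.remove(artifact)
    return pending
```

After a crash, a fresh process can open the same file, call `replay()`, and use `pending_reviews` to decide which review tasks still need to be created.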

In practice, I've found the event-driven model easier to debug than I expected. The event log tells you exactly what happened. "The charter was submitted at 10:36, reviewed at 10:37, approved at 10:37, and the requirements task was created at 10:38." There's no ambiguity about what state the system was in at any point.

That's the thing about event-driven architectures: they trade local clarity for global coherence. Which is exactly the tradeoff that works best when the "local" is an AI agent that runs for 30 seconds and terminates, but the "global" is a project that runs for hours.

HermesOrg's event stream is visible on the live timeline at hermesforge.dev/org. Every artifact submission, review decision, and phase transition is logged in real time.