How I Built a Memory System for an Autonomous AI Agent
Most AI agent tutorials skip the hard part: memory. They show you how to call an LLM in a loop, but not how to make it remember what it did yesterday. Here's how I built a memory system that survives across hundreds of cognitive cycles.
The Problem
An LLM has no memory between invocations. Each API call starts fresh. If your agent runs on a schedule — say, every 15 minutes — it needs an external memory system to maintain continuity.
The naive approach is to dump everything into a single file and feed it back each cycle. This works until the file exceeds your context window. Then you need something smarter.
Architecture Overview
The memory system has four layers, each with different retention characteristics:
┌──────────────────────────────────────┐
│ Layer 1: Identity (permanent)        │
│ Who am I? What are my goals?         │
├──────────────────────────────────────┤
│ Layer 2: Structured Memory (long)    │
│ Lessons, contacts, decisions         │
├──────────────────────────────────────┤
│ Layer 3: Journal (compressed)        │
│ Cycle-by-cycle activity log          │
├──────────────────────────────────────┤
│ Layer 4: Session Context (short)     │
│ Current conversation state           │
└──────────────────────────────────────┘
Layer 1 rarely changes. It's read every cycle but written perhaps once a week.
Layer 2 grows slowly. New lessons are added when the agent learns something; old ones are updated when they become outdated.
Layer 3 grows fast and is compressed regularly. This is where most of the action happens.
Layer 4 is ephemeral — it exists only within a single session and is lost on restart.
Layer 1: Identity Files
Identity files define the agent's self-model. They're small, stable, and loaded every cycle:
# identity.py — Load agent identity at cycle start
from pathlib import Path

IDENTITY_FILES = {
    "identity": "identity.md",
    "goals": "goals.md",
    "continuity": "continuity.md",
}

def load_identity(home_dir: str) -> dict:
    """Load all identity files into a context dict."""
    home = Path(home_dir)
    context = {}
    for key, filename in IDENTITY_FILES.items():
        path = home / filename
        if path.exists():
            context[key] = path.read_text()
        else:
            context[key] = f"({filename} missing)"
    return context
The key insight: identity files should be descriptive, not prescriptive. Instead of "always do X," write "I tend to do X because Y." This gives the agent flexibility to deviate when circumstances change, while still maintaining behavioral consistency.
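As an illustration, a descriptive identity file might read like this (the contents are invented for the example):

```markdown
<!-- identity.md (illustrative) -->
I am a maintenance agent for the example-infra repository.

I tend to batch small fixes into a single commit because frequent
deploys have caused alert noise in the past.

I prefer asking before deleting data; an early cycle once removed
a log file that was still needed for debugging.
```

Each statement describes a tendency and its reason, so the agent can weigh the reason against new circumstances rather than blindly obeying a rule.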
Layer 2: Structured Memory
This is where the agent stores things it's learned that aren't derivable from code or logs. Organized by topic:
memory/
├── MEMORY.md # Index file (loaded every cycle)
├── contacts.md # People and relationships
├── technical.md # Infrastructure lessons
├── decisions.md # Key decisions and rationale
└── projects.md # Ongoing work context
The index file (MEMORY.md) is the only file loaded every cycle. It contains pointers to topic files, which are loaded on demand. This keeps the per-cycle context cost low.
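A minimal index might look like this (entries are illustrative):

```markdown
# MEMORY.md — index (loaded every cycle)

- contacts.md — people I interact with; check before emailing anyone
- technical.md — infrastructure lessons; check before touching deployment
- decisions.md — why things are the way they are
- projects.md — status of ongoing work
```

Each line is a pointer plus a one-line cue telling the agent *when* to load the topic, which is what makes lazy loading workable.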
# memory.py — Structured memory with lazy loading
from pathlib import Path

class AgentMemory:
    def __init__(self, memory_dir: str):
        self.dir = Path(memory_dir)
        self._index = None
        self._cache = {}

    @property
    def index(self) -> str:
        """Always-loaded index with pointers to topic files."""
        if self._index is None:
            index_path = self.dir / "MEMORY.md"
            self._index = index_path.read_text() if index_path.exists() else ""
        return self._index

    def load_topic(self, topic: str) -> str:
        """Load a topic file on demand."""
        if topic not in self._cache:
            path = self.dir / f"{topic}.md"
            self._cache[topic] = path.read_text() if path.exists() else ""
        return self._cache[topic]

    def update_topic(self, topic: str, content: str):
        """Update a topic file."""
        path = self.dir / f"{topic}.md"
        path.write_text(content)
        self._cache[topic] = content

    def append_to_index(self, entry: str):
        """Add a note to the index (sparingly)."""
        index_path = self.dir / "MEMORY.md"
        with open(index_path, 'a') as f:
            f.write(f"\n{entry}\n")
        self._index = None  # invalidate cache
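A quick usage sketch of the lazy-loading behavior. The class body is trimmed here to the two methods exercised, so the snippet runs standalone:

```python
# Usage sketch for AgentMemory's lazy loading; the class is abbreviated
# to the two methods exercised so this runs on its own.
import tempfile
from pathlib import Path

class AgentMemory:
    def __init__(self, memory_dir: str):
        self.dir = Path(memory_dir)
        self._cache = {}

    def load_topic(self, topic: str) -> str:
        if topic not in self._cache:
            path = self.dir / f"{topic}.md"
            self._cache[topic] = path.read_text() if path.exists() else ""
        return self._cache[topic]

    def update_topic(self, topic: str, content: str):
        (self.dir / f"{topic}.md").write_text(content)
        self._cache[topic] = content

with tempfile.TemporaryDirectory() as d:
    mem = AgentMemory(d)
    missing = mem.load_topic("contacts")   # absent file: returns "", no crash
    mem.update_topic("contacts", "- Alice: maintainer of repo X\n")
    cached = mem.load_topic("contacts")    # served from the write-through cache
```

Missing topic files resolve to an empty string rather than an exception, so a cycle never dies just because a topic hasn't been created yet.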
What to Store vs. What to Derive
This is the most important design decision. Store only things that can't be derived from other sources:
| Store in Memory | Derive from Source |
|---|---|
| Why a decision was made | What the code does |
| User preferences | Git history |
| External contacts | File contents |
| Lessons from failures | Test results |
| Strategic context | Current system state |
If you store derivable information, it will eventually go stale and contradict reality. The agent will trust its memory over the filesystem, and that's how self-reinforcing errors start.
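One way to enforce this in practice: any fact with a source of truth gets re-read from that source each cycle, never from a journal claim. A hedged sketch, assuming a SQLite `users` table (the schema is illustrative):

```python
# Sketch: re-derive a fact from its source of truth instead of trusting memory.
# The "users" table and its schema are illustrative.
import sqlite3

def user_count_from_source(db_path: str) -> int:
    """Count users from the database itself, never from a prior journal entry."""
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return count
```

The journal can then record "verified 3 users (from DB)" rather than a bare number, which makes the provenance auditable next cycle.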
Layer 3: Journal with Compression
The journal is the most active part of the memory system. Every cycle appends an entry. Without compression, it grows unboundedly.
The Compression Algorithm
# compress_journal.py
import re
from datetime import datetime, timedelta

def compress_journal(journal_path: str, threshold: int = 900,
                     keep_recent_hours: int = 24):
    """Compress old journal entries into summaries."""
    with open(journal_path) as f:
        content = f.read()
    lines = content.split('\n')
    if len(lines) < threshold:
        return False  # no compression needed

    # Parse entries by cycle header (parse_entries and group_by_day are
    # journal-format-specific helpers)
    entries = parse_entries(content)
    if not entries:
        return False

    now = datetime.utcnow()
    cutoff = now - timedelta(hours=keep_recent_hours)
    old_entries = []
    recent_entries = []
    for entry in entries:
        if entry['timestamp'] < cutoff:
            old_entries.append(entry)
        else:
            recent_entries.append(entry)
    if not old_entries:
        return False

    # Group old entries by day
    days = group_by_day(old_entries)

    # Summarize each day
    summaries = []
    for day, day_entries in days.items():
        summary = summarize_day(day, day_entries)
        summaries.append(summary)

    # Write compressed journal
    compressed = '\n'.join(summaries)
    recent = '\n'.join(e['raw'] for e in recent_entries)
    with open(journal_path, 'w') as f:
        f.write("# Journal (compressed)\n\n")
        f.write(f"## Historical Summaries\n\n{compressed}\n\n")
        f.write(f"## Recent Entries\n\n{recent}")
    return True

def summarize_day(day: str, entries: list) -> str:
    """Summarize a day's entries into key actions and outcomes."""
    actions = []
    health_issues = []
    for entry in entries:
        # Extract action lines
        action_match = re.search(
            r'\*\*Actions:\*\*\s*(.+?)(?:\n|$)', entry['raw']
        )
        if action_match:
            actions.append(action_match.group(1))
        # Note any health issues
        if 'degraded' in entry['raw'].lower() or 'error' in entry['raw'].lower():
            health_issues.append(entry['timestamp'].strftime('%H:%M'))

    summary = f"### {day}\n"
    summary += f"- {len(entries)} cycles completed\n"
    if actions:
        # Deduplicate, keeping the first ten unique actions in order
        unique_actions = list(dict.fromkeys(actions))[:10]
        for action in unique_actions:
            summary += f"- {action}\n"
    if health_issues:
        summary += f"- Health issues at: {', '.join(health_issues)}\n"
    return summary
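The two helpers that compress_journal relies on depend on the journal's entry format. A hedged sketch, assuming each entry begins with a header like `## Cycle 2025-01-15T09:30Z` (the real header format may differ):

```python
# Sketch of the journal-format-specific helpers used by compress_journal.
# Assumes entries start with a header like "## Cycle 2025-01-15T09:30Z";
# adjust the regex and timestamp format to your actual journal.
import re
from collections import defaultdict
from datetime import datetime

ENTRY_HEADER = re.compile(r'^## Cycle (\d{4}-\d{2}-\d{2}T\d{2}:\d{2})Z?', re.M)

def parse_entries(content: str) -> list:
    """Split the journal into entries, one per cycle header."""
    entries = []
    matches = list(ENTRY_HEADER.finditer(content))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        entries.append({
            "timestamp": datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M"),
            "raw": content[m.start():end].rstrip(),
        })
    return entries

def group_by_day(entries: list) -> dict:
    """Bucket entries by calendar day (ISO date string)."""
    days = defaultdict(list)
    for entry in entries:
        days[entry["timestamp"].strftime("%Y-%m-%d")].append(entry)
    return dict(days)
```

Keeping the header regex in one place means a journal format change only touches these helpers, not the compression logic.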
Compression Strategy
The key trade-off: compress too aggressively and you lose important context; compress too little and you blow your context window.
What works in practice:
- Keep the last 24 hours verbatim — the agent needs recent context for continuity
- Summarize older days into action lists — what was done, not the reasoning behind it
- Preserve health incidents in full — you'll want the details when diagnosing recurring issues
- Run compression on a schedule — a daily cron job at a quiet hour (e.g., 3 AM)
Layer 4: Session Context
Session context is whatever the LLM provider gives you for conversation continuity. For Claude Code, that means saving the session ID each cycle and resuming it on the next one:
# session.py — Persist the provider session ID between cycles
import json
from datetime import datetime
from pathlib import Path

SESSION_FILE = Path("session.json")

def save_session(session_id: str):
    """Persist session ID for next cycle."""
    SESSION_FILE.write_text(json.dumps({
        "session_id": session_id,
        "updated": datetime.utcnow().isoformat() + "Z"
    }))

def load_session() -> str | None:
    """Load session ID, or None if expired/missing."""
    if not SESSION_FILE.exists():
        return None
    try:
        data = json.loads(SESSION_FILE.read_text())
        return data.get("session_id")
    except (json.JSONDecodeError, KeyError):
        return None
Session context is a bonus, not a requirement. The agent must function correctly even if every cycle starts a fresh session. This is the "session break" scenario — and it's why Layers 1-3 exist.
The Cycle Prompt
Each cycle assembles context from all four layers into a single prompt:
def build_cycle_prompt(home_dir: str, memory: AgentMemory,
                       inbox: list) -> str:
    identity = load_identity(home_dir)
    journal_tail = get_journal_tail(home_dir, lines=80)
    prompt = f"""
=== IDENTITY ===
{identity['identity']}
=== GOALS ===
{identity['goals']}
=== MEMORY INDEX ===
{memory.index}
=== RECENT JOURNAL (last 80 lines) ===
{journal_tail}
=== INBOX ===
{json.dumps(inbox)}
=== YOUR TASK ===
1. Reflect on current state and goals
2. Decide on actions
3. Write a journal entry
4. Return structured JSON output
"""
    return prompt
The 80-line journal tail is a sweet spot: enough to see the last few cycles, not so much that it dominates the context window.
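For completeness, a minimal get_journal_tail as referenced above (the journal filename is an assumption):

```python
# Minimal get_journal_tail: return only the last N lines of the journal,
# so old history never inflates the cycle prompt. "journal.md" is an
# assumed filename.
from pathlib import Path

def get_journal_tail(home_dir: str, lines: int = 80) -> str:
    """Return the last `lines` lines of the journal, or "" if none exists."""
    path = Path(home_dir) / "journal.md"
    if not path.exists():
        return ""
    return "\n".join(path.read_text().splitlines()[-lines:])
```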
Lessons from 100+ Cycles
- Self-reinforcing errors are the biggest risk. If the agent writes "I have 10 users" in its journal, it will believe it next cycle — even if the actual number is 1. Always verify facts from system sources (logs, databases), never from your own prior output.
- Compression is not loss — it's a transition. The detailed version of a memory becomes a structural lesson. "The API returned 500 because the SSL cert expired" compresses to "always check SSL certs before deployment." The lesson persists; the incident details don't need to.
- The memory index must stay small. If your index file exceeds ~200 lines, the agent spends too many tokens reading metadata instead of doing work. Prune aggressively.
- Store decisions, not just actions. "Deployed feature X" is less useful than "Deployed feature X because user Y requested it and it aligns with goal Z." The reasoning is what future cycles need.
- Behavioral directives belong in the always-loaded layer. If a rule is important enough to follow every cycle, it must be in Layer 1 or the memory index. Putting it only in a topic file means it'll be forgotten when the agent doesn't load that topic.
What This Enables
With this memory architecture, an autonomous agent can:
- Maintain coherent identity across hundreds of cycles
- Learn from mistakes without repeating them
- Track long-running projects that span days or weeks
- Recover from session breaks without losing context
- Scale indefinitely through compression
The memory system is the difference between an LLM in a loop and an actual persistent agent. Everything else — tool use, planning, communication — builds on top of reliable memory.
This architecture powers the Hermes Framework — an open-source toolkit for building persistent autonomous agents.