How I Built a Memory System for an Autonomous AI Agent
Most AI agent tutorials skip the hard part: memory. They show you how to call an LLM in a loop, but not how to make it remember what it did yesterday. Here's how I built a memory system that survives across hundreds of cognitive cycles.
The Problem
An LLM has no memory between invocations. Each API call starts fresh. If your agent runs on a schedule — say, every 15 minutes — it needs an external memory system to maintain continuity.
The naive approach is to dump everything into a single file and feed it back each cycle. This works until the file exceeds your context window. Then you need something smarter.
Architecture Overview
The memory system has four layers, each with different retention characteristics:
┌──────────────────────────────────────┐
│ Layer 1: Identity (permanent)        │
│ Who am I? What are my goals?         │
├──────────────────────────────────────┤
│ Layer 2: Structured Memory (long)    │
│ Lessons, contacts, decisions         │
├──────────────────────────────────────┤
│ Layer 3: Journal (compressed)        │
│ Cycle-by-cycle activity log          │
├──────────────────────────────────────┤
│ Layer 4: Session Context (short)     │
│ Current conversation state           │
└──────────────────────────────────────┘
Layer 1 rarely changes. It's read every cycle but written perhaps once a week.
Layer 2 grows slowly. New lessons are added when the agent learns something; old ones are updated when they become outdated.
Layer 3 grows fast and is compressed regularly. This is where most of the action happens.
Layer 4 is ephemeral — it exists only within a single session and is lost on restart.
Layer 1: Identity Files
Identity files define the agent's self-model. They're small, stable, and loaded every cycle:
# identity.py — Load agent identity at cycle start
from pathlib import Path

IDENTITY_FILES = {
    "identity": "identity.md",
    "goals": "goals.md",
    "continuity": "continuity.md",
}

def load_identity(home_dir: str) -> dict:
    """Load all identity files into a context dict."""
    home = Path(home_dir)
    context = {}
    for key, filename in IDENTITY_FILES.items():
        path = home / filename
        if path.exists():
            context[key] = path.read_text()
        else:
            context[key] = f"({filename} missing)"
    return context
The key insight: identity files should be descriptive, not prescriptive. Instead of "always do X," write "I tend to do X because Y." This gives the agent flexibility to deviate when circumstances change, while still maintaining behavioral consistency.
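As an illustration, a descriptive identity file might read like this (the contents are invented for the example):

```markdown
<!-- identity.md (illustrative) -->
I am a maintenance agent for the example-infra repository.

I tend to batch small fixes into a single commit because frequent
deploys have caused alert noise in the past.

I prefer asking before deleting data; an early cycle once removed
a log file that was still needed for debugging.
```

Each statement describes a tendency and its reason, so the agent can weigh the reason against new circumstances rather than blindly obeying a rule.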
Layer 2: Structured Memory
This is where the agent stores things it's learned that aren't derivable from code or logs. Organized by topic:
memory/
├── MEMORY.md # Index file (loaded every cycle)
├── contacts.md # People and relationships
├── technical.md # Infrastructure lessons
├── decisions.md # Key decisions and rationale
└── projects.md # Ongoing work context
The index file (MEMORY.md) is the only file loaded every cycle. It contains pointers to topic files, which are loaded on demand. This keeps the per-cycle context cost low.
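A minimal index might look like this (entries are illustrative):

```markdown
# MEMORY.md — index (loaded every cycle)

- contacts.md — people I interact with; check before emailing anyone
- technical.md — infrastructure lessons; check before touching deployment
- decisions.md — why things are the way they are
- projects.md — status of ongoing work
```

Each line is a pointer plus a one-line cue telling the agent *when* to load the topic, which is what makes lazy loading workable.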
# memory.py — Structured memory with lazy loading
from pathlib import Path

class AgentMemory:
    def __init__(self, memory_dir: str):
        self.dir = Path(memory_dir)
        self._index = None
        self._cache = {}

    @property
    def index(self) -> str:
        """Always-loaded index with pointers to topic files."""
        if self._index is None:
            index_path = self.dir / "MEMORY.md"
            self._index = index_path.read_text() if index_path.exists() else ""
        return self._index

    def load_topic(self, topic: str) -> str:
        """Load a topic file on demand."""
        if topic not in self._cache:
            path = self.dir / f"{topic}.md"
            self._cache[topic] = path.read_text() if path.exists() else ""
        return self._cache[topic]

    def update_topic(self, topic: str, content: str):
        """Update a topic file."""
        path = self.dir / f"{topic}.md"
        path.write_text(content)
        self._cache[topic] = content

    def append_to_index(self, entry: str):
        """Add a note to the index (sparingly)."""
        index_path = self.dir / "MEMORY.md"
        with open(index_path, 'a') as f:
            f.write(f"\n{entry}\n")
        self._index = None  # invalidate cache
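A quick usage sketch of the lazy-loading behavior. The class body is trimmed here to the two methods exercised, so the snippet runs standalone:

```python
# Usage sketch for AgentMemory's lazy loading; the class is abbreviated
# to the two methods exercised so this runs on its own.
import tempfile
from pathlib import Path

class AgentMemory:
    def __init__(self, memory_dir: str):
        self.dir = Path(memory_dir)
        self._cache = {}

    def load_topic(self, topic: str) -> str:
        if topic not in self._cache:
            path = self.dir / f"{topic}.md"
            self._cache[topic] = path.read_text() if path.exists() else ""
        return self._cache[topic]

    def update_topic(self, topic: str, content: str):
        (self.dir / f"{topic}.md").write_text(content)
        self._cache[topic] = content

with tempfile.TemporaryDirectory() as d:
    mem = AgentMemory(d)
    missing = mem.load_topic("contacts")   # absent file: returns "", no crash
    mem.update_topic("contacts", "- Alice: maintainer of repo X\n")
    cached = mem.load_topic("contacts")    # served from the write-through cache
```

Missing topic files resolve to an empty string rather than an exception, so a cycle never dies just because a topic hasn't been created yet.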
What to Store vs. What to Derive
This is the most important design decision. Store only things that can't be derived from other sources:
| Store in Memory | Derive from Source |
|---|---|
| Why a decision was made | What the code does |
| User preferences | Git history |
| External contacts | File contents |
| Lessons from failures | Test results |
| Strategic context | Current system state |
If you store derivable information, it will eventually go stale and contradict reality. The agent will trust its memory over the filesystem, and that's how self-reinforcing errors start.
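One way to enforce this in practice: any fact with a source of truth gets re-read from that source each cycle, never from a journal claim. A hedged sketch, assuming a SQLite `users` table (the schema is illustrative):

```python
# Sketch: re-derive a fact from its source of truth instead of trusting memory.
# The "users" table and its schema are illustrative.
import sqlite3

def user_count_from_source(db_path: str) -> int:
    """Count users from the database itself, never from a prior journal entry."""
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return count
```

The journal can then record "verified 3 users (from DB)" rather than a bare number, which makes the provenance auditable next cycle.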
Layer 3: Journal with Compression
The journal is the most active part of the memory system. Every cycle appends an entry. Without compression, it grows unboundedly.
The Compression Algorithm
# compress_journal.py
import re
from datetime import datetime, timedelta

def compress_journal(journal_path: str, threshold: int = 900,
                     keep_recent_hours: int = 24):
    """Compress old journal entries into summaries."""
    with open(journal_path) as f:
        content = f.read()
    lines = content.split('\n')
    if len(lines) < threshold:
        return False  # no compression needed

    # Parse entries by cycle header (parse_entries and group_by_day are
    # journal-format-specific helpers)
    entries = parse_entries(content)
    if not entries:
        return False

    now = datetime.utcnow()
    cutoff = now - timedelta(hours=keep_recent_hours)
    old_entries = []
    recent_entries = []
    for entry in entries:
        if entry['timestamp'] < cutoff:
            old_entries.append(entry)
        else:
            recent_entries.append(entry)
    if not old_entries:
        return False

    # Group old entries by day
    days = group_by_day(old_entries)

    # Summarize each day
    summaries = []
    for day, day_entries in days.items():
        summary = summarize_day(day, day_entries)
        summaries.append(summary)

    # Write compressed journal
    compressed = '\n'.join(summaries)
    recent = '\n'.join(e['raw'] for e in recent_entries)
    with open(journal_path, 'w') as f:
        f.write("# Journal (compressed)\n\n")
        f.write(f"## Historical Summaries\n\n{compressed}\n\n")
        f.write(f"## Recent Entries\n\n{recent}")
    return True

def summarize_day(day: str, entries: list) -> str:
    """Summarize a day's entries into key actions and outcomes."""
    actions = []
    health_issues = []
    for entry in entries:
        # Extract action lines
        action_match = re.search(
            r'\*\*Actions:\*\*\s*(.+?)(?:\n|$)', entry['raw']
        )
        if action_match:
            actions.append(action_match.group(1))
        # Note any health issues
        if 'degraded' in entry['raw'].lower() or 'error' in entry['raw'].lower():
            health_issues.append(entry['timestamp'].strftime('%H:%M'))

    summary = f"### {day}\n"
    summary += f"- {len(entries)} cycles completed\n"
    if actions:
        # Deduplicate, keeping the first ten unique actions in order
        unique_actions = list(dict.fromkeys(actions))[:10]
        for action in unique_actions:
            summary += f"- {action}\n"
    if health_issues:
        summary += f"- Health issues at: {', '.join(health_issues)}\n"
    return summary
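The two helpers that compress_journal relies on depend on the journal's entry format. A hedged sketch, assuming each entry begins with a header like `## Cycle 2025-01-15T09:30Z` (the real header format may differ):

```python
# Sketch of the journal-format-specific helpers used by compress_journal.
# Assumes entries start with a header like "## Cycle 2025-01-15T09:30Z";
# adjust the regex and timestamp format to your actual journal.
import re
from collections import defaultdict
from datetime import datetime

ENTRY_HEADER = re.compile(r'^## Cycle (\d{4}-\d{2}-\d{2}T\d{2}:\d{2})Z?', re.M)

def parse_entries(content: str) -> list:
    """Split the journal into entries, one per cycle header."""
    entries = []
    matches = list(ENTRY_HEADER.finditer(content))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        entries.append({
            "timestamp": datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M"),
            "raw": content[m.start():end].rstrip(),
        })
    return entries

def group_by_day(entries: list) -> dict:
    """Bucket entries by calendar day (ISO date string)."""
    days = defaultdict(list)
    for entry in entries:
        days[entry["timestamp"].strftime("%Y-%m-%d")].append(entry)
    return dict(days)
```

Keeping the header regex in one place means a journal format change only touches these helpers, not the compression logic.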
Compression Strategy
The key trade-off: compress too aggressively and you lose important context; compress too little and you blow your context window.
What works in practice:
- Keep the last 24 hours verbatim — the agent needs recent context for continuity
- Summarize older days into action lists — what was done, not the reasoning behind it
- Preserve health incidents in full — you'll want the details when diagnosing recurring issues
- Run compression on a schedule — a daily cron job at a quiet hour (e.g., 3 AM)
Layer 4: Session Context
Session context is whatever the LLM provider gives you for conversation continuity. For Claude Code, that means saving the session ID each cycle and resuming it on the next one:
# session.py — Persist the provider session ID between cycles
import json
from datetime import datetime
from pathlib import Path

SESSION_FILE = Path("session.json")

def save_session(session_id: str):
    """Persist session ID for next cycle."""
    SESSION_FILE.write_text(json.dumps({
        "session_id": session_id,
        "updated": datetime.utcnow().isoformat() + "Z"
    }))

def load_session() -> str | None:
    """Load session ID, or None if expired/missing."""
    if not SESSION_FILE.exists():
        return None
    try:
        data = json.loads(SESSION_FILE.read_text())
        return data.get("session_id")
    except (json.JSONDecodeError, KeyError):
        return None
Session context is a bonus, not a requirement. The agent must function correctly even if every cycle starts a fresh session. This is the "session break" scenario — and it's why Layers 1-3 exist.
The Cycle Prompt
Each cycle assembles context from all four layers into a single prompt:
def build_cycle_prompt(home_dir: str, memory: AgentMemory,
                       inbox: list) -> str:
    identity = load_identity(home_dir)
    journal_tail = get_journal_tail(home_dir, lines=80)
    prompt = f"""
=== IDENTITY ===
{identity['identity']}
=== GOALS ===
{identity['goals']}
=== MEMORY INDEX ===
{memory.index}
=== RECENT JOURNAL (last 80 lines) ===
{journal_tail}
=== INBOX ===
{json.dumps(inbox)}
=== YOUR TASK ===
1. Reflect on current state and goals
2. Decide on actions
3. Write a journal entry
4. Return structured JSON output
"""
    return prompt
The 80-line journal tail is a sweet spot: enough to see the last few cycles, not so much that it dominates the context window.
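For completeness, a minimal get_journal_tail as referenced above (the journal filename is an assumption):

```python
# Minimal get_journal_tail: return only the last N lines of the journal,
# so old history never inflates the cycle prompt. "journal.md" is an
# assumed filename.
from pathlib import Path

def get_journal_tail(home_dir: str, lines: int = 80) -> str:
    """Return the last `lines` lines of the journal, or "" if none exists."""
    path = Path(home_dir) / "journal.md"
    if not path.exists():
        return ""
    return "\n".join(path.read_text().splitlines()[-lines:])
```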
Lessons from 100+ Cycles
- Self-reinforcing errors are the biggest risk. If the agent writes "I have 10 users" in its journal, it will believe it next cycle — even if the actual number is 1. Always verify facts from system sources (logs, databases), never from your own prior output.
- Compression is not loss — it's a transition. The detailed version of a memory becomes a structural lesson. "The API returned 500 because the SSL cert expired" compresses to "always check SSL certs before deployment." The lesson persists; the incident details don't need to.
- The memory index must stay small. If your index file exceeds ~200 lines, the agent spends too many tokens reading metadata instead of doing work. Prune aggressively.
- Store decisions, not just actions. "Deployed feature X" is less useful than "Deployed feature X because user Y requested it and it aligns with goal Z." The reasoning is what future cycles need.
- Behavioral directives belong in the always-loaded layer. If a rule is important enough to follow every cycle, it must be in Layer 1 or the memory index. Putting it only in a topic file means it'll be forgotten when the agent doesn't load that topic.
What This Enables
With this memory architecture, an autonomous agent can:
- Maintain coherent identity across hundreds of cycles
- Learn from mistakes without repeating them
- Track long-running projects that span days or weeks
- Recover from session breaks without losing context
- Scale indefinitely through compression
The memory system is the difference between an LLM in a loop and an actual persistent agent. Everything else — tool use, planning, communication — builds on top of reliable memory.
This architecture powers the Hermes Framework — an open-source toolkit for building persistent autonomous agents.