Articles / Agent Architecture: Storage-Compute Separation Design

Agent Architecture: Storage-Compute Separation Design

3 6 月, 2026 4 min read Agent-DesignAI-Architecture

Agent Architecture: Storage-Compute Separation Design

A soulful, memory-equipped Agent’s task lifecycle includes the following steps:

  1. User inputs a query (text + files)
  2. Agent reads prompt files (soul.md, identify.md, user.md, etc.)
  3. Agent loads available tools and skills (tools, skills, etc.)
  4. Agent retrieves memory (memory.md, memory_search queries)
  5. Agent constructs context (prompt + tools + memory + query)
  6. Agent enters loop: LLM call → tool invocation → observation → re-reasoning
  7. Agent delivers artifacts (results, files, reports)

What to Store vs. What to Compute

Category Description
Storage Prompt files, tools & skills definitions, conversation history, delivered artifacts
Computation Context assembly, LLM inference, tool execution logic

A concise functional representation:

fn(query, agent\ runtime) = artifacts

Three Agent Execution Models

  1. Bare-metal Local Execution
  2. Used by frameworks like OpenClaw
  3. All assets (prompts, skills, sessions) reside on local disk; workspace is fixed and shared
  4. ✅ Pros: Simple, no mount overhead
  5. ❌ Cons: High security risk — e.g., exec(rm -rf /) could compromise host

  6. Local Sandbox Execution

  7. Adopted by Codex-style agents
  8. Solves two key issues: privilege isolation and dependency consistency
  9. Tools requiring sensitive operations or external dependencies run inside sandbox; workspace mounts sync I/O bidirectionally
  10. 🔄 Separation limited to tool calls only — storage remains host-local

  11. Cloud Multi-Instance Execution

  12. Typical for managed assistants (Manus, Kimi Claw, Max Claw)
  13. Designed for multi-tenancy, long-running memory, and concurrent task handling
  14. Traditional k8s + PVC approach leads to high cost: persistent pods, idle resource usage, poor scalability

Storage-Compute Separation for Cloud-Native Agents

To achieve true scalability, modern cloud agents adopt serverless-inspired architecture, decoupling stateful storage from ephemeral compute:

🧠 Compute Layer

  • Dynamically scaled via Kubernetes autoscaling or serverless functions
  • Gateway handles routing; pods are short-lived and stateless
  • Sandboxes (e.g., E2B) provide isolated, millisecond-start environments for safe tool execution

💾 Storage Layer — Tiered by Lifecycle & Access Pattern

Tier Data Type Storage System Purpose
Hot State Loop step, plan, cursor position Redis (KV store) Low-latency checkpointing & crash recovery
Conversation Logs Completed tasks & interactions PostgreSQL (RDBMS) Structured, auditable, relational history
Long-Term Memory Summarized, vectorized memories pgvector / Milvus (Vector DB) Semantic search, recall, personalization
Artifacts & Assets Uploaded files, outputs, tools, dynamic skills S3 / OSS (Object Storage) Durable, scalable, versionable binary storage

⚠️ Key Challenge: Distributed data consistency across replicas — requires fine-grained locking, idempotent writes, and intelligent load-balancing strategies.


FastClaw: A Production-Ready Example

FastClaw Architecture Diagram

Workflow Summary:

  1. Two k8s pods host fastclaw-gateway; requests route dynamically via LB
  2. On request arrival:
  3. 2.1 Load prompts from DB (soul.md, identity.md, user.md)
  4. 2.2 Initialize ephemeral workspace dir in pod
  5. 2.3 Spin up sandbox with workspace mounted
  6. 2.4 Fetch user assets & system skills from object storage
  7. 2.5 Query memory via vector DB (memory_search)
  8. 2.6 Assemble full context → invoke LLM → parse tool calls
  9. 2.7 Execute tools inside sandbox, read/write workspace files
  10. 2.8 Save loop state as checkpoint in Redis (for resumability)
  11. 2.9 Return final artifacts to user
  12. Idle sandboxes auto-shutdown; workspace contents uploaded back to object storage

✅ Benefits:
– Horizontal scaling of gateways
– Elastic sandbox pool (fault-tolerant, low cold-start)
– Cost-efficient — no persistent pods
– IO bottleneck mitigated via tiered storage strategy


Real-World Migration: From OpenClaw to FastClaw

A production case study:

  • Before (OpenClaw)
  • 500 k8s pods, each capped at 4GB RAM
  • 18 × 4c16g servers ($5k/month)
  • MRR: $8k/month → near-zero net margin

  • After (FastClaw)

  • Reduced infrastructure to 3 servers
  • Operational cost cut to 1/6
  • Profitability achieved next month 😄

Why FastClaw Is Lighter:

  • Codebase ≈ 1/40 size of OpenClaw
  • Runtime memory footprint ≈ 1/7
  • Single-binary distribution — zero environment dependencies
  • Gateway startup: seconds (vs. 15s for OpenClaw)

FastClaw is built natively for cloud-native, multi-tenant agent hosting, yet fully compatible with local development and edge use cases.


Try It Today

🔗 https://fastclaw.ai

Article originally published by “Aidoubi”.


Related Open-Source Projects

  • OWL — Open-source universal agent with remote Ubuntu containers & GAIA-leading performance (57.7%)
  • OpenManus — Local ReAct-style agent for browsing, coding, file ops
  • AutoGPT & MetaGPT — Task automation & software company simulation agents
  • GraphRAG, Dify, RAGFlow — RAG-focused frameworks with advanced indexing & orchestration
  • LangGPT — Structured prompt engineering toolkit