How big does an AI memory system need to be for a 200-person company?
Not big — smart. 200 employees × 5 years of decisions, docs, and projects fits comfortably under 10GB of structured memory when stored as a graph with pointers to content. Most queries touch <1% of the graph. bRRAIn runs happily on a single VM with an entry-level GPU for handler inference.
Memory is smaller than you think
Organizational memory is mostly metadata. 200 employees × 5 years of work looks large when you imagine every doc, email, and message stored inline — but you don't need to. The bRRAIn Vault stores a POPE graph with pointers to content, not the content itself. Node records are kilobytes, not megabytes. Even generous estimates for decision records, relationships, and metadata land a full mid-market company under 10GB of structured memory. That's a rounding error for modern infrastructure, and it fits on a single VM.
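The arithmetic behind the 10GB claim can be sketched in a few lines. The per-record sizes and event rates below are illustrative assumptions, not bRRAIn-measured figures; the point is that even generous metadata-only estimates stay well under the threshold:

```python
# Back-of-envelope storage sizing for a pointer-based memory graph.
# All constants are illustrative assumptions, not measured bRRAIn figures.

EMPLOYEES = 200
YEARS = 5
WORKING_DAYS = 250       # working days per year
EVENTS_PER_DAY = 10      # decisions, doc references, project updates captured per person

# A node holds metadata plus a pointer to external content; edges link nodes.
NODE_BYTES = 2 * 1024    # 2 KB per node record
EDGES_PER_NODE = 4
EDGE_BYTES = 256         # 256 B per edge record

nodes = EMPLOYEES * YEARS * WORKING_DAYS * EVENTS_PER_DAY
total_bytes = nodes * (NODE_BYTES + EDGES_PER_NODE * EDGE_BYTES)

print(f"nodes: {nodes:,}")                    # → nodes: 2,500,000
print(f"total: {total_bytes / 1e9:.1f} GB")   # → total: 7.7 GB
```

Even at 2.5 million nodes with generous per-record sizes, the graph lands under 8GB — comfortably inside the 10GB envelope, with the bulky content living outside the graph behind pointers.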
Single-VM deployments are realistic
Most 200-person bRRAIn deployments run on a single mid-tier VM: 8-16 vCPUs, 32-64GB RAM, 500GB SSD, and an entry-level GPU (an L4 or RTX A4000) for the Handler's inference needs. The Memory Engine indexes the graph in memory for sub-100ms queries. The Consolidator runs as a background job. That's the whole stack. You don't need a 20-node Kubernetes cluster to serve 200 people — you need a well-designed architecture and decent hardware. The Managed Install delivers exactly this.
Query volume is the real sizing constraint
Sizing AI memory by total content is the wrong mental model. Query volume matters more. A 200-person company typically generates 2,000-5,000 queries per day across all users, peaking at maybe 20-50 per minute. Each query touches under 1% of the graph because the Memory Engine retrieves narrow slices — this Person's decisions in this Project last quarter. Modern graph databases handle that load with CPU cycles to spare. The GPU is active only during Handler inference, and that parallelizes well on a single card for typical team sizes.
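To see how light this load really is, plug the numbers from the text into a peak-throughput estimate. The graph size here is an assumption (roughly 2.5 million nodes for a mid-market company); the query figures come straight from the paragraph above:

```python
# Rough peak-load estimate for the query path.
# GRAPH_NODES is an assumed figure; the rest comes from the text:
# 2,000-5,000 queries/day, peaking at 20-50/minute, each touching <1% of the graph.

GRAPH_NODES = 2_500_000
PEAK_QPM = 50              # worst-case peak queries per minute
TOUCH_FRACTION = 0.01      # each query reads under 1% of the graph

peak_qps = PEAK_QPM / 60
nodes_read_per_sec = peak_qps * GRAPH_NODES * TOUCH_FRACTION

print(f"peak QPS: {peak_qps:.2f}")                         # → peak QPS: 0.83
print(f"node reads/sec at peak: {nodes_read_per_sec:,.0f}")  # → node reads/sec at peak: 20,833
```

Roughly 21,000 node reads per second against an in-memory index is trivial work — orders of magnitude below what a single core can sustain, which is why the query path never becomes the sizing constraint.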
When to scale out, and what to add
You scale out when you cross roughly 1,000 users, when query latency matters for real-time agents, or when regulatory isolation requires per-tenant infrastructure. At that point you add read replicas for the graph, GPU capacity for the Handler, and region-local vaults. The bRRAIn architecture scales horizontally, but most companies never need to. Starting small on a single VM is the right move; you can always move up a tier via the pricing plans when workload demands it. Premature scale is its own form of waste.
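The scale-out triggers above reduce to a short checklist. This sketch is purely illustrative — the function name and return shape are not a bRRAIn API, just the decision logic from the paragraph written down:

```python
# A sketch of the scale-out triggers as a checklist.
# scale_out_additions is a hypothetical helper, not a bRRAIn API;
# the thresholds mirror the text (~1,000 users, real-time latency,
# regulatory per-tenant isolation).

def scale_out_additions(users: int,
                        realtime_agents: bool,
                        per_tenant_isolation: bool) -> list[str]:
    """Return the infrastructure additions the workload warrants."""
    additions = []
    if users > 1_000:
        additions.append("graph read replicas")
    if realtime_agents:
        additions.append("extra GPU capacity for the Handler")
    if per_tenant_isolation:
        additions.append("region-local vaults")
    return additions

# A 200-person company with no special requirements needs nothing extra.
print(scale_out_additions(200, False, False))   # → []
```

An empty list is the expected answer for most mid-market deployments — which is the section's point: the single-VM baseline holds until a concrete trigger fires.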
Relevant bRRAIn products and services
- bRRAIn Vault — compact graph storage that fits a mid-market company on one disk.
- Memory Engine / Handler — runs on a single mid-tier GPU for most 200-person workloads.
- Pricing and Managed Install — tiered offerings that match actual query volume, not vendor FUD.
- Architecture overview — how the 8 zones fit on modest infrastructure.
- ROI calculator — model your actual cost of ownership for a 200-person deployment.