Engram

Documentation

Introduction

Engram is a local-first AI operating system. Inference, embedding, vector storage, and scheduled agents all run inside Docker on your own hardware. No data leaves your machine during core operation.

Overview

Engram follows a headless host pattern: a Dockerized FastAPI kernel (core/brain.py) acts as the single authoritative backend. The Streamlit dashboard, CLI tools, and IDE extensions are all thin clients that speak to it over HTTP. The inference engine is Ollama — running on your own hardware, never proxied through any Engram-controlled endpoint.

Vector memory is stored in two Qdrant collections. Personal memories (second_brain) are Fernet-encrypted at the application layer before being written to Qdrant — compensating for the absence of at-rest encryption in the open-source Qdrant build. Documentation knowledge (doc_knowledge) is unencrypted for fast RAG retrieval; it contains only public content ingested explicitly by the user.

Scheduled agents (calendar sync, email triage) run via AsyncIOScheduler inside the FastAPI process. There is no separate worker container, no broker, and no result backend. Third-party integrations (Google Calendar, Gmail, Linear, Jira) are opt-in and communicate directly with their respective APIs; Engram is not a proxy and does not relay credentials externally.


Design Decisions

These are the non-obvious architectural choices and the reasoning behind them.

Why local-first, not hybrid?

Cloud AI creates an unavoidable egress path at the infrastructure level, regardless of the provider's data retention policy. For regulated industries, the egress event itself is the compliance risk — not what happens to data after it arrives. A hybrid architecture that routes sensitive context through a cloud LLM for "performance" reintroduces the exact risk the deployment model is meant to eliminate.

Engram eliminates egress at the architecture level. There is no fallback inference endpoint, no telemetry SDK, and no analytics call home. Once Ollama model weights are downloaded, the core chat and RAG pipeline has zero network requirements.

Why Qdrant?

Qdrant provides persistent on-disk HNSW indexes with cosine similarity search, a Docker-native deployment with a healthcheck-compatible HTTP API, and — critically — no managed cloud tier that could accidentally become a data egress path. Both collections use the same 768-dimensional embedding space (nomic-embed-text:latest), which means retrieval scores from second_brain and doc_knowledge are directly comparable — enabling the unified POST /api/search/unified endpoint.

Collection separation

second_brain holds personal memories (encrypted). doc_knowledge holds externally ingested documentation (unencrypted). They are queried separately and merged at the application layer, not at the Qdrant level.
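Because both collections share the same 768-dimensional cosine space, their raw scores can be merged without rescaling. A minimal sketch of this application-layer merge; the `merge_hits` helper is illustrative, not Engram's actual code:

```python
def merge_hits(personal_hits: list[dict], doc_hits: list[dict],
               top_k: int = 5) -> list[dict]:
    """Merge results from second_brain and doc_knowledge by raw cosine score.

    Scores are directly comparable because both collections use the same
    768-dim nomic-embed-text embedding space with cosine distance.
    """
    tagged = (
        [{**h, "source": "second_brain"} for h in personal_hits]
        + [{**h, "source": "doc_knowledge"} for h in doc_hits]
    )
    # Single ranked list, highest similarity first.
    return sorted(tagged, key=lambda h: h["score"], reverse=True)[:top_k]
```

Merging in application code (rather than in Qdrant) is what lets encrypted and plaintext collections keep independent storage policies while still serving one unified result list.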

Why APScheduler over Celery?

For a single-user local deployment, Celery requires a Redis broker, a worker container, a beat container, and optionally a result backend — four additional processes for jobs that run every 15 to 60 minutes. AsyncIOScheduler from APScheduler runs in the FastAPI process, starts on the lifespan startup event, and needs zero additional infrastructure.

Single-process tradeoff

If os_layer crashes, APScheduler stops with it. Scheduled agents will not run until the container restarts. This is acceptable for local-first single-user deployments but would be inadequate for multi-tenant or high-availability requirements.

Why application-layer encryption?

The open-source Qdrant build does not provide at-rest encryption for vector payloads. EncryptedMemoryClient wraps QdrantClient and applies Fernet (AES-128-CBC + HMAC-SHA256) before every write and after every read. Payload fields listed in PLAINTEXT_KEYS — specifically user_id, type, and classification — remain unencrypted so Qdrant can evaluate filter conditions without decrypting the full payload.
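The wrapping pattern can be sketched with the cryptography library. The PLAINTEXT_KEYS values come from the text above; the function names are illustrative, not EncryptedMemoryClient's actual internals:

```python
from cryptography.fernet import Fernet

# Filterable fields stay plaintext so Qdrant can evaluate filter conditions.
PLAINTEXT_KEYS = {"user_id", "type", "classification"}


def encrypt_payload(payload: dict, f: Fernet) -> dict:
    """Fernet-encrypt every field except those Qdrant must filter on."""
    return {
        k: v if k in PLAINTEXT_KEYS else f.encrypt(str(v).encode()).decode()
        for k, v in payload.items()
    }


def decrypt_payload(payload: dict, f: Fernet) -> dict:
    """Inverse of encrypt_payload, applied to payloads read back from Qdrant."""
    return {
        k: v if k in PLAINTEXT_KEYS else f.decrypt(v.encode()).decode()
        for k, v in payload.items()
    }
```

Qdrant only ever sees ciphertext for the sensitive fields; the vector itself remains usable for similarity search because only the payload, not the embedding, is encrypted.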

ENGRAM_ENCRYPTION_KEY must be shared across containers

All containers that read or write second_brain (currently os_layer) must use an identical key. Divergent keys produce silent decryption failures — records will read as garbage rather than throwing an explicit error. Set ENGRAM_ENCRYPTION_KEY once in .env and let Docker Compose inject it uniformly.
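A sketch of the uniform injection, assuming a standard Docker Compose layout; only the os_layer service name comes from the text above, the rest of the fragment is illustrative:

```yaml
# docker-compose.yml (fragment) — the key is defined once in .env and
# injected identically into every container that touches second_brain.
services:
  os_layer:
    environment:
      ENGRAM_ENCRYPTION_KEY: ${ENGRAM_ENCRYPTION_KEY}
```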

Request Lifecycle

What happens, in order, when a message reaches POST /chat.

  1. POST /chat: FastAPI validates the UserInput schema (text, optional user_id, optional matter_id).

  2. classify(): Input is classified via keyword + embedding heuristics into a Classification IntEnum (GENERAL through CLASSIFIED). Higher classifications may trigger content sanitization before embedding.

  3. get_identity(): Resolves the stable user UUID from ~/.engram/identity.json, auto-generated on first run and persisted. The ENGRAM_USER_ID env var overrides it for multi-container consistency.

  4. POST /api/embeddings: A 768-dim float vector is generated by Ollama (nomic-embed-text:latest), called directly over HTTP with no LangChain and no SDK.

  5. EncryptedMemoryClient.search(): Qdrant cosine similarity query on second_brain, filtered by user_id and optionally matter_id. Returns the top-K payload objects, each Fernet-decrypted before use.

  6. RAG assembly: Decrypted context memories are assembled into a structured prompt string and injected into the system message of the chat completion request.

  7. POST /api/chat: Ollama streaming completion (llama3.1:latest). The full conversation history plus RAG context is sent, and the response is streamed back to the client.

  8. EncryptedMemoryClient.write(): The exchange is stored as a new vector point in second_brain, with the payload Fernet-encrypted before upsert. The point ID is a SHA-256 hash of user_id + content hash.

  9. audit_writer: An audit log entry is written via Unix socket to the audit_writer service. Each entry is HMAC-SHA256 chained to the previous entry, so any tampering breaks the chain.
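The tamper-evidence property in the final step can be sketched with the standard library's hmac module. This is an illustrative model of hash chaining, not audit_writer's actual wire format:

```python
import hashlib
import hmac
import json


def append_entry(log: list[dict], entry: dict, key: bytes) -> list[dict]:
    """Append an audit entry whose MAC covers the previous entry's MAC."""
    prev_mac = log[-1]["mac"] if log else ""
    body = json.dumps(entry, sort_keys=True)
    mac = hmac.new(key, (prev_mac + body).encode(), hashlib.sha256).hexdigest()
    return log + [{"entry": entry, "mac": mac}]


def verify_chain(log: list[dict], key: bytes) -> bool:
    """Recompute every MAC in order; an edited entry breaks the chain."""
    prev_mac = ""
    for record in log:
        body = json.dumps(record["entry"], sort_keys=True)
        expected = hmac.new(key, (prev_mac + body).encode(),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["mac"]):
            return False
        prev_mac = record["mac"]
    return True
```

Because each MAC keys on its predecessor, modifying, reordering, or deleting any historical entry invalidates every MAC after it, which is what makes the log tamper-evident rather than merely append-only.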


System Requirements

| Component | Minimum | Recommended | Notes |
| --- | --- | --- | --- |
| RAM | 8 GB | 16 GB | Qdrant ≈2 GB · API + Scheduler ≈2 GB · Dashboard ≈512 MB · Ollama model weights vary |
| Storage | 20 GB SSD | 40 GB SSD | llama3.1:8b ≈ 4.7 GB · nomic-embed-text ≈ 270 MB · vector index grows with usage |
| CPU | 4 cores | 8+ cores | CPU-only inference is functional but slow (~10–30 s/response). GPU acceleration strongly recommended. |
| GPU | None required | Apple MPS / NVIDIA CUDA | Metal acceleration on M1+ is automatic via Ollama. CUDA requires nvidia-container-toolkit. |
| OS | macOS 14+ / Ubuntu 22.04 / Win 10+ (WSL2) | | Docker Desktop required on macOS and Windows. Native Docker Engine on Linux. |

Apple Silicon

Ollama uses Metal Performance Shaders (MPS) on M1/M2/M3 automatically — no configuration needed. Inference speed on Apple Silicon is typically 10–20× faster than CPU-only mode.

Quick Start

Clone the repository, run the one-time setup script (generates secrets, installs deps), then start the full stack. See the Installation page for platform-specific notes and Google OAuth setup.

```bash
git clone https://github.com/engram-os/engram-os.git
cd engram-os
chmod +x scripts/setup.sh && ./scripts/setup.sh
./scripts/start.sh
```

After startup: localhost:8000 (API) · localhost:8501 (Dashboard) · APScheduler agents start automatically in-process.