Local AI via Ollama
Run ACE's background agents locally with zero API costs, zero configuration, and complete privacy
ACE's 10 background agents use LLMs to reason about your knowledge — finding contradictions, discovering connections, and surfacing insights. By default, this requires cloud API keys (Anthropic, OpenAI, or Google).
Local AI adds Ollama as a built-in fallback. When no cloud API keys are configured, agents automatically use a local model (DeepSeek R1) running on your machine. Zero cost, zero config, fully private.
LLM Fallback Chain
Cloud providers are tried first (better quality). Ollama is the safety net.
Local models run on your hardware. No API fees, no usage limits, no billing surprises.
Your data never leaves your machine. No cloud inference, no third-party processing.
Install Ollama and pull a model. ACE detects it automatically and routes agents to it.
No internet required for agent intelligence. Perfect for air-gapped or restricted environments.
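The fallback order described above can be sketched as a simple selection function. This is an illustrative sketch, not ACE's actual implementation; the cloud key variable names below are common conventions assumed for the example.

```shell
# Sketch of the provider fallback order: cloud keys first, then local Ollama.
# Key names (ANTHROPIC_API_KEY, etc.) are assumed conventions, not ACE internals.
pick_provider() {
  if   [ -n "$ANTHROPIC_API_KEY" ]; then echo "anthropic"
  elif [ -n "$OPENAI_API_KEY" ];    then echo "openai"
  elif [ -n "$GOOGLE_API_KEY" ];    then echo "google"
  elif curl -sf http://localhost:11434/api/tags > /dev/null 2>&1; then
    echo "ollama"          # no cloud keys, but the local daemon is reachable
  else
    echo "none"            # agents skip the LLM step entirely
  fi
}
```

With a cloud key present, the function returns that provider; with no keys and Ollama running, it returns ollama.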
1. Install Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
2. Pull the recommended models
# Chat model for agent reasoning (4.7 GB)
ollama pull deepseek-r1:7b
# Embedding model for semantic search (670 MB)
ollama pull mxbai-embed-large
3. Verify it works
# Check Ollama is running
curl http://localhost:11434/api/tags
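Beyond listing models, you can exercise the chat and embedding endpoints directly. This is a sketch using Ollama's standard /api/generate and /api/embeddings HTTP APIs, guarded so it also runs cleanly if the daemon is not up yet:

```shell
# Smoke-test the two models ACE uses: one generation, one embedding.
BASE="http://localhost:11434"
if curl -sf "$BASE/api/tags" > /dev/null 2>&1; then
  # Non-streaming completion from the reasoning model
  curl -s "$BASE/api/generate" \
    -d '{"model": "deepseek-r1:7b", "prompt": "Reply with one word.", "stream": false}'
  # Embedding vector from the semantic-search model
  curl -s "$BASE/api/embeddings" \
    -d '{"model": "mxbai-embed-large", "prompt": "vector search"}'
  STATUS="ok"
else
  STATUS="offline"
fi
echo "ollama: $STATUS"
```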
ACE detects Ollama automatically. No additional configuration needed.
The intelligence loop that Local AI enables:
1. You code with Claude, Cursor, Codex, or any AI tool (unchanged)
2. Session hooks capture what happened (decisions, issues, code changes)
3. Between sessions, ACE's embedded DeepSeek brain processes everything — the Observer watches, agents reason, genomes evolve
4. Next session starts with a briefing of everything discovered
| | Cloud (BYOK) | Local (Ollama) |
|---|---|---|
| Quality | Best (Claude, GPT-5) | Good (DeepSeek R1) |
| Speed | Fast (cloud GPUs) | Slower (local CPU/GPU) |
| Cost | Pay per token | Free |
| Privacy | Data sent to provider | Fully private |
| Offline | Requires internet | Works offline |
| Setup | API key required | Install + pull model |
You can use both: cloud providers are tried first (better quality), and Ollama is the fallback when no cloud keys are configured or a cloud call fails.
| Model | Purpose | Size | RAM |
|---|---|---|---|
| deepseek-r1:7b | Agent reasoning | 4.7 GB | 8 GB+ |
| deepseek-r1:14b | Higher quality reasoning | 9 GB | 16 GB+ |
| mxbai-embed-large | Semantic search | 670 MB | 4 GB+ |
Quick setup (recommended)
# Guided setup: checks Ollama, pulls models
ace ollama setup
This checks if Ollama is installed and running, then pulls the recommended models (DeepSeek R1 7B + mxbai-embed-large).
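For reference, the manual equivalent of what the guided setup automates might look like this. This is a sketch under the assumption that the default Ollama CLI and port are in use:

```shell
# Manual equivalent of the guided setup (sketch): check the install,
# check the daemon, then pull the two recommended models.
if command -v ollama > /dev/null 2>&1; then
  INSTALLED="yes"
  if curl -sf http://localhost:11434/api/tags > /dev/null 2>&1; then
    ollama pull deepseek-r1:7b
    ollama pull mxbai-embed-large
  fi
else
  INSTALLED="no"
fi
echo "ollama installed: $INSTALLED"
```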
Status & models
# Check Local AI status
ace ollama status
# List installed models
ace ollama models
# Overall ACE status (includes Local AI section)
ace status
Ollama not detected
Make sure Ollama is running: ollama serve (or it may already be running as a system service). Verify with curl http://localhost:11434/api/tags.
Model not found
Pull the required models: ollama pull deepseek-r1:7b and ollama pull mxbai-embed-large. Run ace ollama models to verify.
Port conflict (11434 in use)
If another service uses port 11434, set a custom URL via OLLAMA_HOST environment variable and update the Ollama URL in ACE Settings.
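A sketch of the workaround, assuming the stock ollama CLI (OLLAMA_HOST is Ollama's own environment variable, read by both the server and the client); the alternate port below is an arbitrary example:

```shell
# Start the server on an alternate port when 11434 is occupied (sketch;
# guarded so it is a no-op on machines without Ollama installed).
ALT_HOST="127.0.0.1:11500"
if command -v ollama > /dev/null 2>&1; then
  OLLAMA_HOST="$ALT_HOST" ollama serve &
fi
# CLI clients pick up the same variable:
export OLLAMA_HOST="$ALT_HOST"
echo "Ollama clients will use $OLLAMA_HOST"
```

Remember to update the Ollama URL in ACE Settings to match the new address.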
Out of memory / slow responses
The 7B model needs ~8 GB RAM. If your machine is constrained, close other applications or consider a smaller model. A dedicated GPU (8 GB+ VRAM) significantly improves speed.
Agents not using Ollama
Ollama is the last fallback. If you have cloud API keys configured, agents will use those first (higher quality). To force Ollama, remove cloud API keys from Settings.
Ollama works out of the box with zero config. For advanced use:
- Custom URL: Change the default localhost:11434 in Dashboard → Settings → Local AI
- Model selection: Choose different chat/embedding models from the dropdowns in Settings
- Per-agent override: Set individual agents to use Ollama in the Agent Config dialog
- Disable: Turn off Local AI fallback entirely in Settings if you only want cloud providers
Does this replace my cloud AI provider?
No. If you have API keys configured, cloud providers are used first (they produce higher quality results). Ollama is a fallback for when no cloud keys are available.
Can I use both cloud and local?
Yes. The fallback chain tries Anthropic → OpenAI → Google → Ollama. If cloud fails, Ollama catches it.
Do I need a GPU?
No. Ollama runs on CPU (slower, but it works). A GPU with 8 GB+ VRAM significantly speeds up inference.
What happens if Ollama isn't running?
Agents gracefully skip the LLM step, same as before Local AI existed. No errors, no crashes.