Local AI via Ollama
Run ACE's background agents locally with zero API costs, zero configuration, and complete privacy
ACE's 10 background agents use LLMs to reason about your knowledge — finding contradictions, discovering connections, and surfacing insights. By default, this requires cloud API keys (Anthropic, OpenAI, or Google).
Local AI adds Ollama as a built-in fallback. When no cloud API keys are configured, agents automatically use a local model (DeepSeek R1) running on your machine. Zero cost, zero config, fully private.
LLM Fallback Chain
Cloud providers are tried first (better quality). Ollama is the safety net.
Local models run on your hardware. No API fees, no usage limits, no billing surprises.
Your data never leaves your machine. No cloud inference, no third-party processing.
Install Ollama and pull a model. ACE detects it automatically and routes agents to it.
No internet required for agent intelligence. Perfect for air-gapped or restricted environments.
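The fallback order described above can be sketched as a simple selection function. This is an illustrative sketch, not ACE's actual implementation; the cloud key variable names below are common conventions assumed for the example.

```shell
# Sketch of the provider fallback order: cloud keys first, then local Ollama.
# Key names (ANTHROPIC_API_KEY, etc.) are assumed conventions, not ACE internals.
pick_provider() {
  if   [ -n "$ANTHROPIC_API_KEY" ]; then echo "anthropic"
  elif [ -n "$OPENAI_API_KEY" ];    then echo "openai"
  elif [ -n "$GOOGLE_API_KEY" ];    then echo "google"
  elif curl -sf http://localhost:11434/api/tags > /dev/null 2>&1; then
    echo "ollama"          # no cloud keys, but the local daemon is reachable
  else
    echo "none"            # agents skip the LLM step entirely
  fi
}
```

With a cloud key present, the function returns that provider; with no keys and Ollama running, it returns ollama.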
1. Install Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
2. Pull the recommended models
# Chat model for agent reasoning (4.7 GB)
ollama pull deepseek-r1:7b
# Embedding model for semantic search (670 MB)
ollama pull mxbai-embed-large
3. Verify it works
# Check Ollama is running
curl http://localhost:11434/api/tags
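Beyond listing models, you can exercise the chat and embedding endpoints directly. This is a sketch using Ollama's standard /api/generate and /api/embeddings HTTP APIs, guarded so it also runs cleanly if the daemon is not up yet:

```shell
# Smoke-test the two models ACE uses: one generation, one embedding.
BASE="http://localhost:11434"
if curl -sf "$BASE/api/tags" > /dev/null 2>&1; then
  # Non-streaming completion from the reasoning model
  curl -s "$BASE/api/generate" \
    -d '{"model": "deepseek-r1:7b", "prompt": "Reply with one word.", "stream": false}'
  # Embedding vector from the semantic-search model
  curl -s "$BASE/api/embeddings" \
    -d '{"model": "mxbai-embed-large", "prompt": "vector search"}'
  STATUS="ok"
else
  STATUS="offline"
fi
echo "ollama: $STATUS"
```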
ACE detects Ollama automatically. No additional configuration needed.
The intelligence loop that Local AI enables:
1. You code with Claude, Cursor, Codex, or any AI tool (unchanged)
2. Session hooks capture what happened (decisions, issues, code changes)
3. Between sessions, ACE's embedded DeepSeek brain processes everything — the Observer watches, agents reason, genomes evolve
4. Next session starts with a briefing of everything discovered
| | Cloud (BYOK) | Local (Ollama) |
|---|---|---|
| Quality | Best (Claude, GPT-5) | Good (DeepSeek R1) |
| Speed | Fast (cloud GPUs) | Slower (local CPU/GPU) |
| Cost | Pay per token | Free |
| Privacy | Data sent to provider | Fully private |
| Offline | Requires internet | Works offline |
| Setup | API key required | Install + pull model |
You can use both: cloud providers are tried first (better quality), and Ollama is the fallback when no cloud keys are configured or a cloud call fails.
| Model | Purpose | Size | RAM |
|---|---|---|---|
| deepseek-r1:7b | Agent reasoning | 4.7 GB | 8 GB+ |
| deepseek-r1:14b | Higher quality reasoning | 9 GB | 16 GB+ |
| mxbai-embed-large | Semantic search | 670 MB | 4 GB+ |
Quick setup (recommended)
# Guided setup: checks Ollama, pulls models
ace ollama setup
This checks if Ollama is installed and running, then pulls the recommended models (DeepSeek R1 7B + mxbai-embed-large).
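For reference, the manual equivalent of what the guided setup automates might look like this. This is a sketch under the assumption that the default Ollama CLI and port are in use:

```shell
# Manual equivalent of the guided setup (sketch): check the install,
# check the daemon, then pull the two recommended models.
if command -v ollama > /dev/null 2>&1; then
  INSTALLED="yes"
  if curl -sf http://localhost:11434/api/tags > /dev/null 2>&1; then
    ollama pull deepseek-r1:7b
    ollama pull mxbai-embed-large
  fi
else
  INSTALLED="no"
fi
echo "ollama installed: $INSTALLED"
```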
Status & models
# Check Local AI status
ace ollama status
# List installed models
ace ollama models
# Overall ACE status (includes Local AI section)
ace status
Ollama not detected
Make sure Ollama is running: ollama serve (or it may already be running as a system service). Verify with curl http://localhost:11434/api/tags.
Model not found
Pull the required models: ollama pull deepseek-r1:7b and ollama pull mxbai-embed-large. Run ace ollama models to verify.
Port conflict (11434 in use)
If another service uses port 11434, set a custom URL via OLLAMA_HOST environment variable and update the Ollama URL in ACE Settings.
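A sketch of the workaround, assuming the stock ollama CLI (OLLAMA_HOST is Ollama's own environment variable, read by both the server and the client); the alternate port below is an arbitrary example:

```shell
# Start the server on an alternate port when 11434 is occupied (sketch;
# guarded so it is a no-op on machines without Ollama installed).
ALT_HOST="127.0.0.1:11500"
if command -v ollama > /dev/null 2>&1; then
  OLLAMA_HOST="$ALT_HOST" ollama serve &
fi
# CLI clients pick up the same variable:
export OLLAMA_HOST="$ALT_HOST"
echo "Ollama clients will use $OLLAMA_HOST"
```

Remember to update the Ollama URL in ACE Settings to match the new address.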
Out of memory / slow responses
The 7B model needs ~8 GB RAM. If your machine is constrained, close other applications or consider a smaller model. A dedicated GPU (8 GB+ VRAM) significantly improves speed.
Agents not using Ollama
Ollama is the last fallback. If you have cloud API keys configured, agents will use those first (higher quality). To force Ollama, remove cloud API keys from Settings.
Ollama works out of the box with zero config. For advanced use:
- Custom URL: Change the default localhost:11434 in Dashboard → Settings → Local AI
- Model selection: Choose different chat/embedding models from the dropdowns in Settings
- Per-agent override: Set individual agents to use Ollama in the Agent Config dialog
- Disable: Turn off Local AI fallback entirely in Settings if you only want cloud providers
Does this replace my cloud AI provider?
No. If you have API keys configured, cloud providers are used first (they produce higher quality results). Ollama is a fallback for when no cloud keys are available.
Can I use both cloud and local?
Yes. The fallback chain tries Anthropic → OpenAI → Google → Ollama. If cloud fails, Ollama catches it.
Do I need a GPU?
No. Ollama runs on CPU (slower, but it works). A GPU with 8 GB+ VRAM significantly speeds up inference.
What happens if Ollama isn't running?
Agents gracefully skip the LLM step, same as before Local AI existed. No errors, no crashes.