Local AI via Ollama

Free

Run ACE's background agents locally with zero API costs, zero configuration, and complete privacy

What is Local AI?

ACE's 10 background agents use LLMs to reason about your knowledge — finding contradictions, discovering connections, and surfacing insights. By default, this requires cloud API keys (Anthropic, OpenAI, or Google).

Local AI adds Ollama as a built-in fallback. When no cloud API keys are configured, agents automatically use a local model (DeepSeek R1) running on your machine. Zero cost, zero config, fully private.

LLM Fallback Chain

Anthropic → OpenAI → Google → Ollama (Local)

Cloud providers are tried first (better quality). Ollama is the safety net.
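The chain above can be sketched as a simple ordered lookup. This is an illustrative sketch, not ACE's actual internals; the function and provider names are ours:

```python
# Minimal sketch of the LLM fallback chain: cloud providers first, Ollama last.
def pick_provider(configured: dict) -> str:
    """Return the first cloud provider with a configured API key,
    falling back to the local Ollama runtime when none is set."""
    for name in ("anthropic", "openai", "google"):
        if configured.get(name):   # cloud API key present?
            return name
    return "ollama"                # zero-config local safety net

print(pick_provider({}))                  # -> ollama
print(pick_provider({"openai": True}))    # -> openai
```

With no keys configured, every agent call routes to the local model; adding any cloud key moves it back to the front of the chain.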

Zero Cost

Local models run on your hardware. No API fees, no usage limits, no billing surprises.

Complete Privacy

Your data never leaves your machine. No cloud inference, no third-party processing.

Zero Configuration

Install Ollama and pull a model. ACE detects it automatically and routes agents to it.

Works Offline

No internet required for agent intelligence. Perfect for air-gapped or restricted environments.

Installation
Get up and running in 2 minutes

1. Install Ollama

macOS:
brew install ollama

Linux:
curl -fsSL https://ollama.com/install.sh | sh

Windows:
Download the installer from ollama.com/download

2. Pull the recommended models

# Chat model for agent reasoning (4.7 GB)
ollama pull deepseek-r1:7b

# Embedding model for semantic search (670 MB)
ollama pull mxbai-embed-large
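To give a sense of what the embedding model is for: semantic search works by turning text into vectors via Ollama's HTTP API. The helper below is a hypothetical sketch (the function is ours, not ACE's); it builds a request for the documented /api/embeddings endpoint without sending it:

```python
import json
from urllib.request import Request

def embed_request(text: str, base_url: str = "http://localhost:11434") -> Request:
    """Build a POST to Ollama's /api/embeddings endpoint for mxbai-embed-large.
    The JSON body shape ({"model", "prompt"}) follows the Ollama API docs."""
    payload = json.dumps({"model": "mxbai-embed-large", "prompt": text}).encode()
    return Request(f"{base_url}/api/embeddings", data=payload,
                   headers={"Content-Type": "application/json"})

req = embed_request("semantic search over my notes")
print(req.full_url)  # -> http://localhost:11434/api/embeddings
```

Sending this request to a running Ollama instance returns an embedding vector that can be compared against stored note vectors.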

3. Verify it works

# Check Ollama is running
curl http://localhost:11434/api/tags

ACE detects Ollama automatically. No additional configuration needed.
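The detection check amounts to probing that same /api/tags endpoint. A minimal sketch (our own helper, not ACE's code) in Python's standard library:

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

def ollama_available(base_url: str = "http://localhost:11434") -> bool:
    """True if an Ollama server answers on /api/tags (its model-list endpoint)."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            return "models" in json.load(resp)
    except (URLError, OSError, ValueError):
        return False
```

If the probe fails, the helper returns False instead of raising, which mirrors the graceful-skip behavior described in the FAQ below.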

How It Works

The intelligence loop that Local AI enables:

  1. You code with Claude, Cursor, Codex, or any AI tool (unchanged)
  2. Session hooks capture what happened (decisions, issues, code changes)
  3. Between sessions, ACE's embedded DeepSeek brain processes everything: the Observer watches, agents reason, genomes evolve
  4. Next session starts with a briefing of everything discovered

Cloud vs Local
Both work great. Use cloud for quality, local for cost and privacy.
|          | Cloud (BYOK)           | Local (Ollama)          |
|----------|------------------------|-------------------------|
| Quality  | Best (Claude, GPT-5)   | Good (DeepSeek R1)      |
| Speed    | Fast (cloud GPUs)      | Slower (local CPU/GPU)  |
| Cost     | Pay per token          | Free                    |
| Privacy  | Data sent to provider  | Fully private           |
| Offline  | Requires internet      | Works offline           |
| Setup    | API key required       | Install + pull model    |

You can use both. Cloud providers are tried first; Ollama is the fallback. If you have API keys configured, cloud models will be preferred for better quality.

Model Requirements
| Model             | Purpose                  | Size   | RAM    |
|-------------------|--------------------------|--------|--------|
| deepseek-r1:7b    | Agent reasoning          | 4.7 GB | 8 GB+  |
| deepseek-r1:14b   | Higher-quality reasoning | 9 GB   | 16 GB+ |
| mxbai-embed-large | Semantic search          | 670 MB | 4 GB+  |

ACE CLI Integration
Manage Local AI from the command line

Quick setup (recommended)

# Guided setup: checks Ollama, pulls models
ace ollama setup

This checks if Ollama is installed and running, then pulls the recommended models (DeepSeek R1 7B + mxbai-embed-large).

Status & models

# Check Local AI status
ace ollama status

# List installed models
ace ollama models

# Overall ACE status (includes Local AI section)
ace status

Troubleshooting

Ollama not detected

Make sure Ollama is running: ollama serve (or it may already be running as a system service). Verify with curl http://localhost:11434/api/tags.

Model not found

Pull the required models: ollama pull deepseek-r1:7b and ollama pull mxbai-embed-large. Run ace ollama models to verify.

Port conflict (11434 in use)

If another service uses port 11434, set a custom URL via OLLAMA_HOST environment variable and update the Ollama URL in ACE Settings.
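Resolving the custom URL can be sketched as follows; this helper is illustrative (ACE's Settings UI handles it for you), but OLLAMA_HOST is the environment variable Ollama itself honors:

```python
import os

def ollama_base_url() -> str:
    """Resolve the Ollama base URL, honoring OLLAMA_HOST when set.
    Accepts either a bare host:port or a full http(s) URL."""
    host = os.environ.get("OLLAMA_HOST", "localhost:11434")
    return host if host.startswith("http") else f"http://{host}"
```

For example, exporting OLLAMA_HOST=127.0.0.1:11500 would route all requests to port 11500 instead of the default 11434.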

Out of memory / slow responses

The 7B model needs ~8 GB RAM. If your machine is constrained, close other applications or consider a smaller model. A dedicated GPU (8 GB+ VRAM) significantly improves speed.

Agents not using Ollama

Ollama is the last fallback. If you have cloud API keys configured, agents will use those first (higher quality). To force Ollama, remove cloud API keys from Settings.

Configuration

Ollama works out of the box with zero config. For advanced use:

  • Custom URL: Change from localhost:11434 in Dashboard → Settings → Local AI
  • Model selection: Choose different chat/embedding models from the dropdowns in Settings
  • Per-agent override: Set individual agents to use Ollama in the Agent Config dialog
  • Disable: Turn off Local AI fallback entirely in Settings if you only want cloud providers
FAQ

Does this replace my cloud AI provider?

No. If you have API keys configured, cloud providers are used first (they produce higher quality results). Ollama is a fallback for when no cloud keys are available.

Can I use both cloud and local?

Yes. The fallback chain tries Anthropic → OpenAI → Google → Ollama. If cloud fails, Ollama catches it.

Do I need a GPU?

No. Ollama runs on CPU (slower but works). A GPU with 8 GB+ VRAM significantly speeds up inference.

What happens if Ollama isn't running?

Agents gracefully skip the LLM step, same as before Local AI existed. No errors, no crashes.