Complete Guide: Setting Up Ollama with OpenClaw (Free Local AI)
Step-by-step guide to running Ollama with OpenClaw for free, private, local AI. Covers installation, model selection by hardware, configuration, performance tuning, and troubleshooting.
TL;DR: Install Ollama, pull a model that fits your hardware, point OpenClaw at localhost:11434, and you have a free, private AI assistant. Takes about 15 minutes. Zero API costs, ever.
Estimated time: 15-30 minutes
Difficulty: Beginner (if you're comfortable with a terminal)
Why Ollama + OpenClaw?
Ollama is the easiest way to run AI models on your own hardware. No cloud API keys, no monthly bills, no data leaving your machine. Combined with OpenClaw, you get a full AI assistant over Telegram, WhatsApp, or Discord — powered entirely by your own computer.
The tradeoff is quality. Local models are good and getting better fast, but they're not Claude or GPT-5.2. For many tasks — answering questions, drafting text, brainstorming, summarizing — they're more than enough. For complex reasoning or long code generation, you'll notice the gap.
Step 1: Install Ollama
Linux / macOS
curl -fsSL https://ollama.com/install.sh | sh
macOS (Homebrew)
brew install ollama
Windows
Download the installer from ollama.com/download.
Verify Installation
ollama --version
You should see something like ollama version 0.7.x. Ollama starts a background service automatically after install.
Step 2: Pick a Model for Your Hardware
This is where people overthink it. Here's the simple version:
Hardware → Model Matrix
| Your RAM | Your GPU VRAM | Recommended Model | Performance |
|---|---|---|---|
| 8 GB | None / integrated | glm4:9b | Usable, 5-15 tok/s |
| 8 GB | 6-8 GB NVIDIA | llama3.1:8b | Good, 20-40 tok/s |
| 16 GB | None | qwen3:14b | Good, 8-15 tok/s |
| 16 GB | 8-12 GB NVIDIA | deepseek-r1:14b | Good, 15-30 tok/s |
| 32 GB | 16+ GB NVIDIA | qwen3:32b | Very good, 15-25 tok/s |
| 32 GB+ | Apple Silicon M1-M4 | qwen3:32b | Very good, 20-40 tok/s |
| 64 GB+ | 24+ GB NVIDIA | deepseek-r1:70b | Near-cloud quality, 5-15 tok/s |
If you're not sure, start with glm4:9b. It's fast, capable for its size, and runs on basically anything.
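For a quick sanity check before downloading, you can estimate whether a model fits. The 0.6 bytes-per-parameter figure below is a rough approximation for Q4_K_M quantization (not an official number), and you should leave a few extra GB of headroom for the context cache:

```shell
# Rough download-size estimate for a Q4_K_M-quantized model:
# ~0.6 bytes per parameter (about 4.8 bits). Leave headroom beyond this.
estimate_gb() {
  awk -v params_b="$1" 'BEGIN { printf "%.1f\n", params_b * 0.6 }'
}

estimate_gb 9    # a 9B model -> prints 5.4 (GB)
estimate_gb 32   # a 32B model -> prints 19.2 (GB)
```

If the estimate plus a few GB of headroom exceeds your RAM (or VRAM, for GPU inference), pick a smaller model from the table above.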
Model Recommendations by Use Case
GLM-4 9B — The all-rounder. Fast, follows instructions well, handles multiple languages. Best bang for buck on modest hardware.
ollama pull glm4:9b
DeepSeek-R1 (14B / 32B / 70B) — Best for reasoning tasks. It "thinks" before answering (chain-of-thought), which produces notably better results for math, logic, and analysis. Slower because of the thinking step, but the quality jump is real.
ollama pull deepseek-r1:14b # 16 GB RAM minimum
ollama pull deepseek-r1:32b # 32 GB RAM minimum
ollama pull deepseek-r1:70b # 64 GB RAM minimum
Llama 3.1 (8B / 70B) — Meta's workhorse. Well-rounded, excellent instruction following, huge community support. The 8B version is the classic choice for lightweight setups.
ollama pull llama3.1:8b # 8 GB RAM minimum
ollama pull llama3.1:70b # 64 GB RAM minimum
Qwen 3 (8B / 14B / 32B) — Alibaba's latest. Excellent multilingual support (especially CJK languages), strong coding ability, and good instruction following. The 32B version punches above its weight.
ollama pull qwen3:8b # 8 GB RAM minimum
ollama pull qwen3:14b # 16 GB RAM minimum
ollama pull qwen3:32b # 32 GB RAM minimum
Pull Your Chosen Model
# Example: pulling GLM-4 9B
ollama pull glm4:9b
This downloads the model (several GB depending on size). First pull takes a few minutes on a fast connection.
Verify it's ready:
ollama list
You should see your model listed with its size and modification date.
Step 3: Test the Model Locally
Before configuring OpenClaw, make sure the model actually runs:
ollama run glm4:9b "What's the capital of France?"
You should get a response within a few seconds. If this works, the hard part is done.
Check the API is accessible:
curl http://localhost:11434/v1/models
This should return a JSON list of your downloaded models.
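If you have jq installed, you can extract just the model IDs. These are the exact strings you'll reference after `ollama/` in the OpenClaw config; the `.data[].id` path assumes the standard OpenAI-style list response that Ollama's compatibility endpoint returns:

```shell
# List just the model IDs from the OpenAI-compatible endpoint.
# Each ID must match what you put after "ollama/" in your config.
curl -s http://localhost:11434/v1/models | jq -r '.data[].id'
```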
Step 4: Configure OpenClaw
Open your OpenClaw config file and add Ollama as a provider.
config.json
{
"providers": {
"ollama": {
"type": "openai",
"baseUrl": "http://localhost:11434/v1",
"apiKey": "ollama"
}
},
"defaultModel": "ollama/glm4:9b"
}
config.yaml
providers:
ollama:
type: openai
baseUrl: "http://localhost:11434/v1"
apiKey: "ollama"
defaultModel: "ollama/glm4:9b"
Important notes:
- The `type` must be `openai` — Ollama exposes an OpenAI-compatible API
- The `apiKey` must be present and non-empty, even though Ollama doesn't check it. Use any string. This is the #1 setup issue people hit
- The `baseUrl` must include `/v1` at the end
- The `defaultModel` format is `provider/model-name` — the model name must exactly match what `ollama list` shows
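Before touching OpenClaw, you can hand-test the exact kind of request it will send: an OpenAI-style chat completion against the compatibility endpoint. The `Authorization` header mirrors the dummy `apiKey` and can be any string:

```shell
# Simulate the request OpenClaw will make. A JSON response containing a
# "choices" array means baseUrl, model name, and API shape all line up.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{
    "model": "glm4:9b",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```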
Multiple Models
You can configure multiple Ollama models and switch between them:
{
"providers": {
"ollama": {
"type": "openai",
"baseUrl": "http://localhost:11434/v1",
"apiKey": "ollama",
"models": [
{ "id": "glm4:9b", "name": "GLM-4 Fast" },
{ "id": "deepseek-r1:14b", "name": "DeepSeek Reasoning" },
{ "id": "qwen3:14b", "name": "Qwen 3 14B" }
]
}
},
"defaultModel": "ollama/glm4:9b"
}
Step 5: Restart OpenClaw and Test
clawdbot gateway restart
Check the logs:
clawdbot gateway logs
Look for confirmation that the provider loaded without errors. Then send a message to your bot through Telegram, WhatsApp, or whatever channel you have configured.
Performance Tuning
GPU Offloading
If you have an NVIDIA GPU, Ollama automatically uses it. Check GPU usage:
nvidia-smi
If your model is running on CPU despite having a GPU:
# Make sure NVIDIA drivers and CUDA are installed
nvidia-smi # Should show your GPU
# Ollama auto-detects NVIDIA GPUs, but you can force GPU layers
# Create a custom Modelfile
cat > Modelfile << 'EOF'
FROM glm4:9b
PARAMETER num_gpu 999
EOF
ollama create glm4-gpu -f Modelfile
Apple Silicon (M1/M2/M3/M4): Ollama uses Metal acceleration automatically. No configuration needed. Apple Silicon is excellent for local inference — the unified memory means even large models run well.
Context Window
Default context is often 2048 tokens, which is too short for a conversation. Increase it:
cat > Modelfile << 'EOF'
FROM glm4:9b
PARAMETER num_ctx 8192
EOF
ollama create glm4-8k -f Modelfile
Then update your OpenClaw config to use glm4-8k instead.
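If you'd rather not maintain a separate model name, Ollama's native API also accepts a per-request context size via `options.num_ctx`. Whether OpenClaw forwards custom options depends on its provider implementation, so the baked-in Modelfile above is the safer route; this is mainly useful for testing:

```shell
# Set the context window per request via the native /api/generate endpoint.
# "stream": false returns one JSON object instead of a token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "glm4:9b",
  "prompt": "hello",
  "stream": false,
  "options": { "num_ctx": 8192 }
}'
```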
Context window options:
| Setting | RAM Impact | Good For |
|---|---|---|
| 2048 | Minimal | Quick Q&A, not conversations |
| 4096 | Low | Short conversations |
| 8192 | Moderate | Normal chat sessions |
| 16384 | High | Long conversations with memory |
| 32768 | Very high | Document analysis |
Warning: Larger context windows use significantly more RAM. A 7B model with 32K context can easily use 12-16 GB.
Quantization Levels
When you pull a model with ollama pull, you get the default quantization (usually Q4_K_M). You can choose different levels:
| Quantization | Quality | Size | Speed |
|---|---|---|---|
| Q2_K | ⭐⭐ | Smallest | Fastest |
| Q4_K_M | ⭐⭐⭐⭐ | Medium | Good balance |
| Q5_K_M | ⭐⭐⭐⭐ | Larger | Slightly slower |
| Q6_K | ⭐⭐⭐⭐⭐ | Large | Slower |
| Q8_0 | ⭐⭐⭐⭐⭐ | Largest | Slowest |
| fp16 | ⭐⭐⭐⭐⭐ | Huge | Slowest |
Recommendation: Stick with Q4_K_M (the default). It's the sweet spot. Only go lower if your hardware can't handle it, and only go higher if you have RAM to spare and want marginal quality improvement.
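To pull a specific quantization, append it as part of the tag. The tag names below follow the common pattern but are illustrative — available tags differ per model, so check the model's page on ollama.com before pulling:

```shell
# Tag names vary by model; verify on the model's ollama.com page first.
ollama pull llama3.1:8b-instruct-q8_0   # higher quality, bigger download
ollama pull llama3.1:8b-instruct-q2_K   # smallest, noticeable quality loss
```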
Keep Models Loaded
By default, Ollama unloads models after 5 minutes of inactivity. For an always-ready assistant, keep the model loaded:
# Set keep-alive to 24 hours
curl http://localhost:11434/api/generate -d '{
"model": "glm4:9b",
"keep_alive": "24h"
}'
Or set it system-wide in your Ollama environment:
# Add to ~/.bashrc or /etc/environment
export OLLAMA_KEEP_ALIVE=24h
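On Linux installs that run Ollama as a systemd service, variables in your shell profile never reach the service process; set the variable on the unit instead. The unit name `ollama` assumes the default installer setup:

```shell
# ~/.bashrc exports don't apply to the systemd service.
# Add the variable to the unit itself via a drop-in override:
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_KEEP_ALIVE=24h"
sudo systemctl restart ollama
```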
Local vs Cloud: Honest Comparison
| Aspect | Ollama (7-9B) | Ollama (32-70B) | Claude Sonnet | GPT-5.2 |
|---|---|---|---|---|
| Monthly cost | $0 | $0 | $15-40 | $15-40 |
| Response quality | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Speed | 10-40 tok/s | 5-20 tok/s | 50-100 tok/s | 50-100 tok/s |
| Complex reasoning | Fair | Good | Excellent | Excellent |
| Tool/function calling | Limited | Decent | Excellent | Excellent |
| Privacy | ✅ Total | ✅ Total | ❌ Cloud | ❌ Cloud |
| Needs internet | ❌ No | ❌ No | ✅ Yes | ✅ Yes |
| Hardware needed | 8 GB RAM | 32-64 GB RAM | None | None |
Where local wins
- You value privacy above everything
- You want zero ongoing costs
- You need offline access
- You're experimenting and learning
Where cloud wins
- Complex multi-step tasks and coding
- Long context windows (200K+ tokens)
- Tool use and function calling
- Consistent speed regardless of your hardware
The hybrid approach
Run Ollama for everyday questions and switch to a cloud model when you need heavy lifting. Configure both providers in OpenClaw and switch as needed.
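A sketch of what that might look like, using the same config shape as Step 4. Both provider blocks use the OpenAI-compatible `type`; the cloud entry's `baseUrl` and the placeholder API key are illustrative, so check your OpenClaw docs for the exact provider schema your version supports:

```json
{
  "providers": {
    "ollama": {
      "type": "openai",
      "baseUrl": "http://localhost:11434/v1",
      "apiKey": "ollama"
    },
    "openai": {
      "type": "openai",
      "baseUrl": "https://api.openai.com/v1",
      "apiKey": "sk-your-real-key"
    }
  },
  "defaultModel": "ollama/glm4:9b"
}
```

Everyday traffic goes to the local default; switch the active model to the cloud provider when a task needs it.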
Common Issues and Fixes
"Connection refused" when OpenClaw tries to reach Ollama
# Check if Ollama is running
systemctl status ollama # Linux
ollama ps # Any platform
# If not running, start it
ollama serve # Foreground
systemctl start ollama # Linux, background
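A quick way to confirm the port itself is reachable — current Ollama versions answer their root URL with a plain-text status and HTTP 200 (older versions may word it differently):

```shell
# "200" plus an "Ollama is running" body means the server is up and listening.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:11434/
curl -s http://localhost:11434/
```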
Responses are extremely slow
- No GPU detected: Check `ollama ps` — it shows whether GPU layers are being used
- Model too large for your RAM: If the model doesn't fit in RAM, it pages to disk, which is 10-100x slower. Use a smaller model
- High context window: Reduce `num_ctx` if you set it very high
- Other programs using GPU: Close GPU-intensive apps (games, video editors)
"Out of memory" or system freezes
The model is too big for your hardware. Either:
- Use a smaller model (`8b` instead of `14b`)
- Use a more aggressive quantization (`Q2_K` instead of `Q4_K_M`)
- Reduce the context window
- Close other memory-heavy applications
"apiKey is required" error in OpenClaw
Set apiKey to any non-empty string in your provider config. Ollama doesn't check it, but OpenClaw requires the field to exist.
Model outputs garbage or stops mid-sentence
- Wrong chat template: Pull the model again — Ollama usually handles templates automatically
- Q2 quantization: Too aggressive. Pull a Q4_K_M version instead
- Context overflow: The conversation exceeded the model's context window. Start a new session
Ollama works locally but not from OpenClaw in Docker
If OpenClaw runs in Docker, localhost inside the container doesn't reach the host machine. Use:
{
"providers": {
"ollama": {
"type": "openai",
"baseUrl": "http://host.docker.internal:11434/v1",
"apiKey": "ollama"
}
}
}
Or use the host's actual IP address.
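Two things usually need to change for the Docker case. Ollama binds to loopback by default, so the container can't reach it even with the right hostname; and on Linux, `host.docker.internal` doesn't resolve unless you map it yourself. The image name below is a placeholder for whatever you actually run:

```shell
# 1) Make Ollama listen on all interfaces, not just 127.0.0.1,
#    or connections from the container will still be refused:
export OLLAMA_HOST=0.0.0.0
ollama serve

# 2) On Linux, map host.docker.internal to the host gateway when
#    starting the OpenClaw container (image name is a placeholder):
docker run --add-host=host.docker.internal:host-gateway openclaw-image
```

On Docker Desktop (macOS/Windows), `host.docker.internal` already resolves, so only step 1 applies.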
The Easy Way
Setting up Ollama is straightforward, but you're trading cloud model quality for privacy and zero cost. If that tradeoff works for you, this is a great setup.
If you'd rather not manage server infrastructure, lobsterfarm provides managed OpenClaw hosting — deployment, updates, and support handled for you.
Skip the setup. Start using your AI assistant today.
lobsterfarm gives you a fully managed OpenClaw instance — one click, your own server, running 24/7.