Using Local Models (LMStudio, Ollama) with OpenClaw
How to run OpenClaw with local AI models using LMStudio or Ollama. Covers setup, configuration, common errors, and performance expectations.
TL;DR: You can run OpenClaw with local models via LMStudio or Ollama using their OpenAI-compatible APIs. It works, but expect slower responses and less capable output than cloud models. Great for privacy and experimentation.
Why Use Local Models?
Most people run OpenClaw with cloud models — Claude, GPT-4, Gemini. But there are legitimate reasons to run models locally:
Privacy
Your conversations never leave your machine. No data sent to Anthropic, OpenAI, or Google. For sensitive work — medical records, legal documents, financial data, personal journals — this matters.
Cost
After the upfront hardware investment, local inference is essentially free. No per-token charges, no surprise $50 bills at the end of the month. If you already have a decent GPU, this is appealing.
Offline Access
Cloud APIs require internet. Local models don't. If you're on a plane, at a cabin, or just have unreliable internet, local models keep working.
Learning & Experimentation
Running models locally teaches you how LLMs actually work. You can try different models, adjust parameters, and understand what's happening under the hood. It's genuinely educational.
The Tradeoff
Let's be honest: local models are significantly less capable than the best cloud models. A 7B parameter model running on your laptop is not going to match Claude Sonnet or GPT-4o. But for many tasks — drafting emails, answering questions, brainstorming, summarizing — they're more than adequate.
Option 1: LMStudio Setup
LMStudio is the easiest way to run local models. It has a nice GUI, handles model downloads, and exposes an OpenAI-compatible API.
Step 1: Install LMStudio
Download from lmstudio.ai (available for macOS, Windows, and Linux).
Step 2: Download a Model
Open LMStudio and browse or search for a model. Good starting choices:
| Model | Size | RAM Needed | Good For |
|---|---|---|---|
| Llama 3.1 8B | ~5 GB | 8+ GB | General chat, fast responses |
| Mistral 7B Instruct | ~4.5 GB | 8+ GB | Following instructions |
| Llama 3.1 70B (Q4) | ~40 GB | 48+ GB | Near-cloud quality |
| Qwen 2.5 14B | ~9 GB | 16+ GB | Good balance of speed/quality |
| DeepSeek R1 Distill 14B | ~9 GB | 16+ GB | Reasoning tasks |
Click "Download" next to the model you want. GGUF quantized versions (Q4_K_M or Q5_K_M) offer the best speed/quality balance.
Step 3: Start the Local Server
- Go to the Local Server tab (left sidebar, server icon)
- Select your downloaded model from the dropdown
- Click Start Server
- Note the URL — by default it's `http://localhost:1234`

You should see:

```
Server started on http://localhost:1234
```
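Before wiring up OpenClaw, you can sanity-check the endpoint from a terminal (assuming the default port):

```shell
# List the models the LMStudio server exposes via its OpenAI-compatible API.
# Falls back to a message if the server isn't running.
curl -s http://localhost:1234/v1/models || echo "LMStudio server not reachable"
```

If this returns a JSON list of models, the server side is working and any remaining problem is in the OpenClaw config.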
Step 4: Configure OpenClaw
Add a custom provider to your OpenClaw config:
```yaml
providers:
  lmstudio:
    type: openai
    baseUrl: "http://localhost:1234/v1"
    apiKey: "lm-studio"   # LMStudio doesn't require a real key, but the field needs a value
    models:
      - id: "local-model"
        name: "LMStudio Local"
defaultModel: "lmstudio/local-model"
```
Important: The apiKey field must be present even though LMStudio doesn't check it. If you leave it blank or omit it, OpenClaw may throw an "API key undefined" error. Use any placeholder string.
Step 5: Restart OpenClaw
```shell
clawdbot gateway restart
```
Send a message to your bot. It should now respond using your local model.
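If the bot doesn't respond, test the server directly to take OpenClaw out of the loop. A minimal OpenAI-style request, using the `local-model` id from the config above:

```shell
# Send a minimal chat completion to the LMStudio server.
# "local-model" matches the model id used in the provider config.
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hello in five words."}]
      }' || echo "request failed"
```

A JSON response here means the model and server are fine; an error points at the OpenClaw side.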
Option 2: Ollama Setup
Ollama is a command-line tool for running local models. It's lighter than LMStudio and great for servers or headless setups.
Step 1: Install Ollama
```shell
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or on macOS with Homebrew
brew install ollama
```
For Windows, download from ollama.com.
Step 2: Pull a Model
```shell
# Good all-around model
ollama pull llama3.1:8b

# Smaller and faster
ollama pull mistral:7b

# If you have the RAM (48GB+)
ollama pull llama3.1:70b
```
Step 3: Start the Ollama Server
Ollama runs as a service automatically after install. Verify it's running:
```shell
ollama list   # Shows downloaded models
ollama ps     # Shows running models
```
The API is available at `http://localhost:11434` by default.
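A quick way to confirm the API is responding (and see which models are installed):

```shell
# Ollama's native API lists installed models at /api/tags.
curl -s http://localhost:11434/api/tags || echo "Ollama not reachable"
```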
Step 4: Configure OpenClaw
```yaml
providers:
  ollama:
    type: openai
    baseUrl: "http://localhost:11434/v1"
    apiKey: "ollama"   # Placeholder — Ollama doesn't check this
    models:
      - id: "llama3.1:8b"
        name: "Llama 3.1 8B"
defaultModel: "ollama/llama3.1:8b"
```
Step 5: Restart and Test
```shell
clawdbot gateway restart
```

Then send a message to your bot to confirm it responds using the local model.
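To check the endpoint independently of OpenClaw, hit Ollama's OpenAI-compatible API directly. A minimal request, assuming you pulled `llama3.1:8b`:

```shell
# Minimal chat completion against Ollama's OpenAI-compatible /v1 endpoint.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Reply with one short sentence."}]
      }' || echo "request failed"
```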
Common Issues (and Fixes)
"API key undefined" or "apiKey is required"
Problem: OpenClaw expects an API key in the provider config, even for local models that don't use one.
Fix: Set the apiKey field to any non-empty string:
apiKey: "not-needed"
This is the #1 issue people hit when setting up local models. It's a known pain point.
"Connection refused" on localhost
Problem: OpenClaw can't reach the local model server.
Checklist:

- Is the server actually running?
  - LMStudio: Check the Server tab — is it started?
  - Ollama: Run `ollama ps` — is a model loaded?
- Correct port?
  - LMStudio default: `1234`
  - Ollama default: `11434`
  - Check whether you changed it
- Firewall blocking localhost? (rare, but it happens on some Linux setups)

  ```shell
  curl http://localhost:1234/v1/models    # LMStudio
  curl http://localhost:11434/v1/models   # Ollama
  ```

- Running in Docker? If OpenClaw is in a container, `localhost` inside the container isn't the host machine. Use `host.docker.internal` instead:

  ```yaml
  baseUrl: "http://host.docker.internal:1234/v1"
  ```
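On Linux hosts, `host.docker.internal` isn't defined by default. With Docker 20.10+, you can map it to the host gateway via `extra_hosts`; a docker-compose sketch, where the `openclaw` service name and image are placeholders for however you actually run OpenClaw:

```yaml
# Hypothetical compose service; the key line is extra_hosts.
services:
  openclaw:
    image: openclaw   # placeholder image name
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

The equivalent `docker run` flag is `--add-host=host.docker.internal:host-gateway`.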
Model loads but responses are garbage
Possible causes:
- Wrong chat template: Some models need specific prompt formatting. LMStudio usually handles this automatically, but Ollama might need the right Modelfile
- Quantization too aggressive: Q2 or Q3 quantizations sacrifice too much quality. Stick with Q4_K_M or higher
- Context window too small: Some configs default to 2048 tokens. Increase it:

  ```shell
  # Ollama
  ollama run llama3.1:8b --num-ctx 8192
  ```
Extremely slow responses
Local model speed depends on your hardware:
- CPU only: Expect 2-10 tokens/second. Usable but sluggish
- Apple Silicon (M1/M2/M3): 15-40 tokens/second for 7B models. Quite good
- NVIDIA GPU (8GB+ VRAM): 30-80 tokens/second for 7B models. Fast
- Multiple GPUs / high VRAM: Can run 70B models at reasonable speeds
If it's painfully slow, try a smaller model or a more aggressive quantization.
Ollama uses too much RAM/VRAM
Ollama keeps models loaded in memory by default. To unload:
```shell
ollama stop llama3.1:8b
```
Or set an auto-unload timeout in your Ollama config.
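If you need to free memory without stopping the service, the Ollama API accepts a `keep_alive` parameter; sending `0` unloads the model immediately:

```shell
# keep_alive: 0 tells Ollama to evict the model right away.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "keep_alive": 0}' || echo "Ollama not reachable"
```

To change the default idle timeout instead, set the `OLLAMA_KEEP_ALIVE` environment variable (e.g. `5m`) in the environment where the Ollama server runs.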
Performance Expectations: Local vs Cloud
Let's set realistic expectations. This table compares what you'll get:
| Aspect | Local 7-8B | Local 70B | Claude Sonnet | GPT-4o |
|---|---|---|---|---|
| Response quality | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Speed (tokens/sec) | 10-40 | 3-15 | 50-100 | 50-100 |
| Following complex instructions | Fair | Good | Excellent | Excellent |
| Code generation | Decent | Good | Excellent | Excellent |
| Creative writing | Decent | Good | Excellent | Good |
| Cost per month | $0* | $0* | $20-100 | $20-100 |
| Privacy | ✅ Total | ✅ Total | ❌ Cloud | ❌ Cloud |
| Internet required | ❌ No | ❌ No | ✅ Yes | ✅ Yes |
*After hardware costs. Running a 70B model requires significant hardware (roughly $1,000+ for an appropriate GPU).
Where Local Models Shine
- Quick Q&A and chat
- Drafting and editing text
- Summarizing documents
- Private/sensitive conversations
- Offline use
- Learning how LLMs work
Where Cloud Models Still Win
- Complex multi-step reasoning
- Long context windows (200K+ tokens)
- Code generation for large projects
- Following nuanced instructions
- Tool use and function calling
- Speed and consistency
Hybrid Approach: Best of Both
Some people run local models for routine tasks and switch to cloud models for complex ones. You can configure multiple providers in OpenClaw and switch between them:
```yaml
providers:
  ollama:
    type: openai
    baseUrl: "http://localhost:11434/v1"
    apiKey: "ollama"
    models:
      - id: "llama3.1:8b"
  anthropic:
    type: anthropic
    apiKey: "sk-ant-..."
    models:
      - id: "claude-sonnet-4-20250514"
defaultModel: "ollama/llama3.1:8b"   # Use local by default
```
Then switch models when you need more power, either through config or by telling your assistant to switch.
The Easy Way
Local models are great for specific use cases, but configuring providers, debugging connection issues, and managing model downloads adds friction that most people don't want.
If you want to experiment with local models, go for it — it's a great learning experience. But if you just want a working AI assistant without managing infrastructure, lobsterfarm provides managed OpenClaw hosting with deployment, updates, and support.
Skip the setup. Start using your AI assistant today.
lobsterfarm gives you a fully managed OpenClaw instance — one click, your own server, running 24/7.