
Using Local Models (LMStudio, Ollama) with OpenClaw

How to run OpenClaw with local AI models using LMStudio or Ollama. Covers setup, configuration, common errors, and performance expectations.


TL;DR: You can run OpenClaw with local models via LMStudio or Ollama using their OpenAI-compatible APIs. It works, but expect slower responses and less capable output than cloud models. Great for privacy and experimentation.


Why Use Local Models?

Most people run OpenClaw with cloud models — Claude, GPT-4, Gemini. But there are legitimate reasons to run models locally:

Privacy

Your conversations never leave your machine. No data sent to Anthropic, OpenAI, or Google. For sensitive work — medical records, legal documents, financial data, personal journals — this matters.

Cost

After the upfront hardware investment, local inference is essentially free. No per-token charges, no surprise $50 bills at the end of the month. If you already have a decent GPU, this is appealing.

Offline Access

Cloud APIs require internet. Local models don't. If you're on a plane, at a cabin, or just have unreliable internet, local models keep working.

Learning & Experimentation

Running models locally teaches you how LLMs actually work. You can try different models, adjust parameters, and understand what's happening under the hood. It's genuinely educational.

The Tradeoff

Let's be honest: local models are significantly less capable than the best cloud models. A 7B parameter model running on your laptop is not going to match Claude Sonnet or GPT-4o. But for many tasks — drafting emails, answering questions, brainstorming, summarizing — they're more than adequate.


Option 1: LMStudio Setup

LMStudio is the easiest way to run local models. It has a nice GUI, handles model downloads, and exposes an OpenAI-compatible API.

Step 1: Install LMStudio

Download from lmstudio.ai (available for macOS, Windows, and Linux).

Step 2: Download a Model

Open LMStudio and browse or search for a model. Good starting choices:

| Model | Size | RAM Needed | Good For |
|---|---|---|---|
| Llama 3.1 8B | ~5 GB | 8+ GB | General chat, fast responses |
| Mistral 7B Instruct | ~4.5 GB | 8+ GB | Following instructions |
| Llama 3.1 70B (Q4) | ~40 GB | 48+ GB | Near-cloud quality |
| Qwen 2.5 14B | ~9 GB | 16+ GB | Good balance of speed/quality |
| DeepSeek R1 Distill 14B | ~9 GB | 16+ GB | Reasoning tasks |

Click "Download" next to the model you want. GGUF quantized versions (Q4_K_M or Q5_K_M) offer the best speed/quality balance.

Step 3: Start the Local Server

  1. Go to the Local Server tab (left sidebar, server icon)
  2. Select your downloaded model from the dropdown
  3. Click Start Server
  4. Note the URL — by default it's http://localhost:1234

You should see:

Server started on http://localhost:1234
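Before pointing OpenClaw at it, you can confirm the endpoint is actually answering. A minimal Python sketch (standard library only; assumes the default port from above) that lists the models the server exposes:

```python
import json
import urllib.request

def list_models(base_url: str) -> list[str]:
    """Return model IDs from an OpenAI-compatible /v1/models endpoint."""
    url = base_url.rstrip("/") + "/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except OSError:
        # Connection refused, timeout, etc. -- server isn't reachable
        return []

if __name__ == "__main__":
    models = list_models("http://localhost:1234")  # LMStudio default port
    print(models or "No server on :1234 -- is the LMStudio server started?")
```

If this prints an empty result, fix the server before touching the OpenClaw config.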

Step 4: Configure OpenClaw

Add a custom provider to your OpenClaw config:

providers:
  lmstudio:
    type: openai
    baseUrl: "http://localhost:1234/v1"
    apiKey: "lm-studio"  # LMStudio doesn't require a real key, but the field needs a value
    models:
      - id: "local-model"
        name: "LMStudio Local"

defaultModel: "lmstudio/local-model"

Important: The apiKey field must be present even though LMStudio doesn't check it. If you leave it blank or omit it, OpenClaw may throw an "API key undefined" error. Use any placeholder string.

Step 5: Restart OpenClaw

clawdbot gateway restart

Send a message to your bot. It should now respond using your local model.
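If the bot doesn't answer, it helps to test the server directly, bypassing OpenClaw entirely. A hedged sketch of a one-shot chat request against the OpenAI-compatible endpoint (port and `local-model` id match the config above; the Bearer token is a placeholder LMStudio ignores):

```python
import json
import urllib.request

def build_chat_payload(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """Send one message to an OpenAI-compatible server and return the reply text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer lm-studio",  # placeholder -- LMStudio doesn't check it
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    try:
        print(chat("http://localhost:1234", "local-model", "Say hello in five words."))
    except OSError as exc:
        print(f"Could not reach LMStudio on :1234 -- {exc}")
```

If this works but OpenClaw doesn't, the problem is in your provider config, not the model server.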


Option 2: Ollama Setup

Ollama is a command-line tool for running local models. It's lighter than LMStudio and great for servers or headless setups.

Step 1: Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or on macOS with Homebrew
brew install ollama

For Windows, download from ollama.com.

Step 2: Pull a Model

# Good all-around model
ollama pull llama3.1:8b

# Smaller and faster
ollama pull mistral:7b

# If you have the RAM (48GB+)
ollama pull llama3.1:70b

Step 3: Start the Ollama Server

Ollama runs as a service automatically after install. Verify it's running:

ollama list    # Shows downloaded models
ollama ps      # Shows running models

The API is available at http://localhost:11434 by default.

Step 4: Configure OpenClaw

providers:
  ollama:
    type: openai
    baseUrl: "http://localhost:11434/v1"
    apiKey: "ollama"  # Placeholder — Ollama doesn't check this
    models:
      - id: "llama3.1:8b"
        name: "Llama 3.1 8B"

defaultModel: "ollama/llama3.1:8b"

Step 5: Restart and Test

clawdbot gateway restart
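A common failure mode here is a model id in the OpenClaw config that doesn't match what Ollama actually serves. A small sketch that checks the id against Ollama's OpenAI-compatible /v1/models endpoint (default port and the model id from the config above assumed):

```python
import json
import urllib.request

def model_available(base_url: str, model_id: str) -> bool:
    """Check that a model id is listed by an OpenAI-compatible /v1/models endpoint."""
    url = base_url.rstrip("/") + "/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.load(resp)
    except OSError:
        return False  # server not reachable
    return any(m.get("id") == model_id for m in data.get("data", []))

if __name__ == "__main__":
    ok = model_available("http://localhost:11434", "llama3.1:8b")
    print("model found" if ok else "model missing -- try `ollama pull llama3.1:8b`")
```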

Common Issues (and Fixes)

"API key undefined" or "apiKey is required"

Problem: OpenClaw expects an API key in the provider config, even for local models that don't use one.

Fix: Set the apiKey field to any non-empty string:

apiKey: "not-needed"

This is the #1 issue people hit when setting up local models. It's a known pain point.

"Connection refused" on localhost

Problem: OpenClaw can't reach the local model server.

Checklist:

  1. Is the server actually running?

    • LMStudio: Check the Server tab — is it started?
    • Ollama: Run ollama ps — is a model loaded?
  2. Correct port?

    • LMStudio default: 1234
    • Ollama default: 11434
    • Check if you changed it
  3. Firewall blocking localhost? (rare but happens on some Linux setups)

    curl http://localhost:1234/v1/models  # LMStudio
    curl http://localhost:11434/v1/models  # Ollama
    
  4. Running in Docker? If OpenClaw is in a container, localhost inside the container isn't the host machine. Use host.docker.internal instead:

    baseUrl: "http://host.docker.internal:1234/v1"
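Note that on Linux, host.docker.internal is not defined by default; Docker 20.10+ lets you map it to the host gateway yourself. A hypothetical docker-compose fragment (service and image names are placeholders, not OpenClaw's actual image):

```yaml
# Sketch only -- service/image names are placeholders
services:
  openclaw:
    image: openclaw:latest
    extra_hosts:
      - "host.docker.internal:host-gateway"  # needed on Linux; built in on Docker Desktop
```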
    

Model loads but responses are garbage

Possible causes:

  • Wrong chat template: Some models need specific prompt formatting. LMStudio usually handles this automatically, but Ollama might need the right Modelfile
  • Quantization too aggressive: Q2 or Q3 quantizations sacrifice too much quality. Stick with Q4_K_M or higher
  • Context window too small: Some configs default to 2048 tokens. For Ollama, raise it by building a variant with a Modelfile:
    # Modelfile
    FROM llama3.1:8b
    PARAMETER num_ctx 8192
    Then: ollama create llama3.1-8k -f Modelfile
    

Extremely slow responses

Local model speed depends on your hardware:

  • CPU only: Expect 2-10 tokens/second. Usable but sluggish
  • Apple Silicon (M1/M2/M3): 15-40 tokens/second for 7B models. Quite good
  • NVIDIA GPU (8GB+ VRAM): 30-80 tokens/second for 7B models. Fast
  • Multiple GPUs / high VRAM: Can run 70B models at reasonable speeds

If it's painfully slow, try a smaller model or a more aggressive quantization.
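To find out where your hardware lands in the ranges above, you can measure throughput yourself. A rough sketch that times one completion and divides the reported completion tokens by wall-clock time (assumes the server returns an OpenAI-style usage field; port and model id are the Ollama defaults from earlier):

```python
import json
import time
import urllib.request

def tokens_per_second(tokens: int, seconds: float) -> float:
    """Rough throughput: completion tokens divided by wall-clock time."""
    return tokens / seconds if seconds > 0 else 0.0

def measure(base_url: str, model: str) -> float:
    """Time one completion against an OpenAI-compatible local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Write one short paragraph about lobsters."}],
        "max_tokens": 200,
    }
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    completion = body.get("usage", {}).get("completion_tokens", 0)
    return tokens_per_second(completion, time.monotonic() - start)

if __name__ == "__main__":
    try:
        print(f"{measure('http://localhost:11434', 'llama3.1:8b'):.1f} tokens/sec")
    except OSError as exc:
        print(f"Could not reach Ollama on :11434 -- {exc}")
```

Note this undercounts slightly (it includes prompt-processing time), but it's good enough to tell a 5 tokens/sec setup from a 50 tokens/sec one.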

Ollama uses too much RAM/VRAM

Ollama keeps models loaded in memory by default. To unload:

ollama stop llama3.1:8b

Or set the OLLAMA_KEEP_ALIVE environment variable (for example, OLLAMA_KEEP_ALIVE=5m) to control how long a model stays loaded after the last request.


Performance Expectations: Local vs Cloud

Let's set realistic expectations. This table compares what you'll get:

| Aspect | Local 7-8B | Local 70B | Claude Sonnet | GPT-4o |
|---|---|---|---|---|
| Response quality | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Speed (tokens/sec) | 10-40 | 3-15 | 50-100 | 50-100 |
| Following complex instructions | Fair | Good | Excellent | Excellent |
| Code generation | Decent | Good | Excellent | Excellent |
| Creative writing | Decent | Good | Excellent | Good |
| Cost per month | $0* | $0* | $20-100 | $20-100 |
| Privacy | ✅ Total | ✅ Total | ❌ Cloud | ❌ Cloud |
| Internet required | ❌ No | ❌ No | ✅ Yes | ✅ Yes |

*After hardware costs. Running a 70B model requires significant hardware (roughly $1,000+ for a suitable GPU).

Where Local Models Shine

  • Quick Q&A and chat
  • Drafting and editing text
  • Summarizing documents
  • Private/sensitive conversations
  • Offline use
  • Learning how LLMs work

Where Cloud Models Still Win

  • Complex multi-step reasoning
  • Long context windows (200K+ tokens)
  • Code generation for large projects
  • Following nuanced instructions
  • Tool use and function calling
  • Speed and consistency

Hybrid Approach: Best of Both

Some people run local models for routine tasks and switch to cloud models for complex ones. You can configure multiple providers in OpenClaw and switch between them:

providers:
  ollama:
    type: openai
    baseUrl: "http://localhost:11434/v1"
    apiKey: "ollama"
    models:
      - id: "llama3.1:8b"
  anthropic:
    type: anthropic
    apiKey: "sk-ant-..."
    models:
      - id: "claude-sonnet-4-20250514"

defaultModel: "ollama/llama3.1:8b"  # Use local by default

Then switch models when you need more power, either through config or by telling your assistant to switch.


The Easy Way

Local models are great for specific use cases, but configuring providers, debugging connection issues, and managing model downloads adds friction that most people don't want.

If you want to experiment with local models, go for it — it's a great learning experience. But if you just want a working AI assistant without managing infrastructure, lobsterfarm provides managed OpenClaw hosting with deployment, updates, and support.

Get started with lobsterfarm →

Skip the setup. Start using your AI assistant today.

lobsterfarm gives you a fully managed OpenClaw instance — one click, your own server, running 24/7.