Fix: OpenClaw Gives Bad Responses with Ollama (Local AI Acting Weird)
Your OpenClaw bot acts dumb with Ollama? It's not the bot — it's the model config. Here's how to fix context windows, model selection, and settings.
TL;DR: Your local model's context window is probably too small, you're using a model that's too lightweight, or your config is wrong. Here's how to fix all three.
The Error
There's no single error message here. Instead, you get symptoms:
- Bot gives nonsensical, unrelated, or repetitive answers
- Bot ignores your system prompt / personality
- Bot "forgets" what you said 2 messages ago
- Bot responds with fragments or half-sentences
- Bot works great for simple questions but falls apart in real conversations
As one GitHub user put it:
> Why does my bot act like an idiot with Ollama?
Yeah. That was literally the issue title. (GitHub #4333)
Why This Happens
Local models through Ollama are fundamentally different from cloud APIs like Claude or GPT-5. Three things usually go wrong:
1. Context window is too small
OpenClaw sends the entire conversation history (system prompt + all messages) with each request. Cloud models handle 128K-200K tokens easily. Most local models default to 2048-4096 tokens — which fills up after just a few exchanges.
When the context overflows, the model either:
- Ignores older messages (including your system prompt)
- Generates garbage because it can't "see" the full conversation
- Repeats itself because it's lost track of what was already said
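To see how fast a default window fills up, here's a back-of-envelope sketch using the common ~4 characters per token heuristic (real tokenizers vary, and the message sizes here are made up for illustration):

```python
# Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer).
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

system_prompt = "You are a helpful AI assistant. " * 50      # ~1600 chars, ~400 tokens
exchange = ("user question " * 40, "assistant answer " * 80)  # one back-and-forth

num_ctx = 4096                      # a typical local-model default
used = estimate_tokens(system_prompt)
turns = 0
while used < num_ctx:               # count exchanges until the window overflows
    used += sum(estimate_tokens(m) for m in exchange)
    turns += 1

print(f"A {num_ctx}-token window overflows after ~{turns} exchanges")
```

With a modest system prompt and medium-length messages, the default window is gone in well under ten exchanges, which is exactly when the "forgets what you said" symptom appears.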
2. Model is too small
Running a 3B or 7B parameter model and expecting Claude-level intelligence is like putting a bicycle engine in a truck. Small models are great for simple Q&A but fall apart with:
- Complex instructions
- Multi-turn conversations
- Tool use
- Maintaining a consistent persona
3. Config is wrong or missing
Ollama has specific settings that OpenClaw needs. Missing or wrong config means the model runs with bad defaults.
How to Fix It
Step 1: Choose the right model
Not all models are created equal. Here's a realistic guide:
| Your Hardware | Recommended Model | VRAM Needed |
|---|---|---|
| 8GB VRAM (RTX 3070) | llama3.1:8b-instruct-q4_K_M | ~5GB |
| 12GB VRAM (RTX 3080) | mistral-nemo:12b-instruct | ~8GB |
| 16GB VRAM (RTX 4080) | llama3.1:70b-instruct-q4_K_M | ~40GB* |
| 24GB+ VRAM (RTX 4090) | qwen2.5:72b-instruct-q4_K_M | ~40GB* |
| CPU only (no GPU) | llama3.2:3b | N/A (slow) |
*70B+ models need multiple GPUs or will offload to CPU (very slow).
Pull your model:
ollama pull llama3.1:8b-instruct-q4_K_M
Rule of thumb: Use the largest instruct/chat model your hardware can run at reasonable speed. If it takes more than 5 seconds per response, go smaller.
Step 2: Increase the context window
This is the #1 fix for "dumb" behavior. Tell Ollama to use a larger context:
# Create a custom model with more context
cat > ~/Modelfile << 'EOF'
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_ctx 8192
EOF
ollama create llama3.1-8k -f ~/Modelfile
For even longer conversations:
cat > ~/Modelfile << 'EOF'
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_ctx 16384
EOF
ollama create llama3.1-16k -f ~/Modelfile
Warning: Larger context windows use more VRAM. If you set num_ctx too high, Ollama will either crash or fall back to CPU (which is painfully slow). Start with 8192 and increase if your GPU can handle it.
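If you want to experiment with several context sizes, the Modelfile is simple enough to template. A small sketch (the `render_modelfile` helper is mine, not an Ollama API):

```python
def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Render an Ollama Modelfile that overrides the default context window."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"

# One Modelfile per context size you want to try.
variants = {
    f"llama3.1-{ctx // 1024}k": render_modelfile("llama3.1:8b-instruct-q4_K_M", ctx)
    for ctx in (8192, 16384)
}
# Write each string to a file, then register it, e.g.:
#   ollama create llama3.1-8k -f Modelfile.llama3.1-8k
```

Keeping both variants registered lets you drop back to the 8K model quickly if the 16K one exhausts your VRAM.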
Step 3: Configure OpenClaw correctly
Here's a config that actually works:
{
  "providers": {
    "ollama": {
      "type": "ollama",
      "baseUrl": "http://localhost:11434",
      "model": "llama3.1-8k",
      "options": {
        "temperature": 0.7,
        "num_ctx": 8192,
        "num_predict": 1024,
        "top_p": 0.9,
        "repeat_penalty": 1.1
      }
    }
  },
  "ai": {
    "provider": "ollama",
    "maxTokens": 1024,
    "contextStrategy": "trim"
  }
}
Key settings:
- `num_ctx`: Must match what you set in the Modelfile (or Ollama ignores it)
- `num_predict`: Max tokens per response. Don't set this too high or the model will ramble
- `repeat_penalty`: Set to 1.1 to reduce repetitive responses (a common issue with small models)
- `contextStrategy: "trim"`: Tells OpenClaw to trim old messages when approaching the context limit, instead of sending everything and hoping for the best
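The trim behavior can be pictured with a simplified sketch: keep the system prompt, then keep the newest messages that still fit the token budget. The helper names and the ~4 chars/token estimate are illustrative, not OpenClaw internals:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def trim_history(messages, num_ctx, reserve=1024):
    """Keep the system prompt; drop the oldest turns until the rest fits.

    `reserve` leaves room for the model's reply (cf. num_predict)."""
    budget = num_ctx - reserve
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for msg in reversed(rest):          # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                        # everything older gets dropped
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

Without a strategy like this, an overflowing request silently pushes the system prompt out of the window, which is the "ignores its personality" symptom above.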
Step 4: Simplify your system prompt
Cloud models can handle a 2000-word system prompt with detailed personality, rules, and examples. Local models can't — that system prompt eats your context window.
Keep your system prompt under 500 tokens for 8B models:
You are a helpful AI assistant. Be concise and clear in your responses.
vs. the 2000-token personality document you'd use with Claude. Save the detailed stuff for bigger models.
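A quick way to sanity-check a prompt against that budget, again with the rough ~4 chars/token estimate (this checker is a hypothetical helper, not part of OpenClaw):

```python
def approx_tokens(prompt: str) -> int:
    """Rough token count: ~4 characters per token (real tokenizers vary)."""
    return len(prompt) // 4

def check_system_prompt(prompt: str, limit: int = 500) -> bool:
    """Return True if the prompt fits a small local model's budget."""
    n = approx_tokens(prompt)
    print(f"~{n} tokens (limit {limit})")
    return n <= limit

check_system_prompt(
    "You are a helpful AI assistant. Be concise and clear in your responses."
)
```

If the check fails, cut examples and rules first; a short persona line survives trimming far better than a long one.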
Step 5: Test it
openclaw gateway restart
# Quick test
openclaw chat --provider ollama "What is 2+2?"
If the response is coherent and fast, you're good. Try a multi-turn conversation to make sure context handling works.
How to Prevent It
- Match expectations to hardware. A 7B model on a consumer GPU is impressive for what it is, but it won't match Claude Opus or GPT-5. Adjust your system prompt complexity and conversation length accordingly.
- Monitor VRAM usage. Run `nvidia-smi` while chatting. If VRAM is maxed out, reduce `num_ctx` or use a smaller quantization.
- Clear sessions regularly. Long conversations degrade quality with small models. Reset sessions when they get long.
- Keep Ollama updated. `ollama pull <model>` also updates the model. Run it periodically.
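For the VRAM check, `nvidia-smi` can emit machine-readable output. A small sketch that parses it (the `--query-gpu`/`--format` flags are standard nvidia-smi options; the 95% warning threshold is arbitrary):

```python
import subprocess

# Expected line format, one line per GPU, from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
def parse_vram(line: str) -> tuple[int, int]:
    """Parse 'used, total' MiB values from one CSV line."""
    used, total = (int(v.strip()) for v in line.split(","))
    return used, total

def check_vram(threshold: float = 0.95) -> None:
    """Warn when any GPU is nearly out of memory."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.strip().splitlines():
        used, total = parse_vram(line)
        status = ("nearly full: reduce num_ctx or use a smaller quant"
                  if used / total > threshold else "OK")
        print(f"GPU memory {used}/{total} MiB: {status}")
```

Run `check_vram()` while a conversation is in flight; a maxed-out GPU is usually the point where Ollama silently spills to CPU and responses slow to a crawl.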
The Easy Way
lobsterfarm is a managed hosting service for OpenClaw — deployment, updates, and support handled for you.
Skip the setup. lobsterfarm gives you a fully managed OpenClaw instance: one click, your own server, running 24/7.