
Fix: OpenClaw Gives Bad Responses with Ollama (Local AI Acting Weird)

Does your OpenClaw bot act dumb with Ollama? It's not the bot, it's the model config. Here's how to fix context windows, model selection, and settings.


TL;DR: Your local model's context window is probably too small, you're using a model that's too lightweight, or your config is wrong. Here's how to fix all three.

The Error

There's no single error message here. Instead, you get symptoms:

  • Bot gives nonsensical, unrelated, or repetitive answers
  • Bot ignores your system prompt / personality
  • Bot "forgets" what you said 2 messages ago
  • Bot responds with fragments or half-sentences
  • Bot works great for simple questions but falls apart in real conversations

As one GitHub user put it:

Why does my bot act like an idiot with Ollama?

Yeah. That was literally the issue title. (GitHub #4333)

Why This Happens

Local models served through Ollama behave fundamentally differently from cloud models like Claude or GPT-5. Three things usually go wrong:

1. Context window is too small

OpenClaw sends the entire conversation history (system prompt + all messages) with each request. Cloud models handle 128K-200K tokens easily. Most local models default to 2048-4096 tokens — which fills up after just a few exchanges.

When the context overflows, the model either:

  • Ignores older messages (including your system prompt)
  • Generates garbage because it can't "see" the full conversation
  • Repeats itself because it's lost track of what was already said
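To see how quickly a 2048-token default fills up, assume the common rule of thumb of roughly 4 characters per token for English text (a heuristic, not a real tokenizer):

```shell
# Rough token budget: ~4 characters per token (heuristic, not a real tokenizer).
system_chars=3600                      # a ~600-word system prompt
chat_chars=$(( 10 * 2 * 400 ))         # 10 exchanges, user + bot, ~400 chars each
total_tokens=$(( (system_chars + chat_chars) / 4 ))
echo "${total_tokens} tokens"          # already well past a 2048-token default
```

Ten ordinary exchanges plus a modest system prompt land near 2900 tokens, so a 2048-token window has already dropped your oldest messages.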

2. Model is too small

Running a 3B- or 7B-parameter model and expecting Claude-level intelligence is like putting a bicycle engine in a truck. Small models are great for simple Q&A but fall apart with:

  • Complex instructions
  • Multi-turn conversations
  • Tool use
  • Maintaining a consistent persona

3. Config is wrong or missing

Ollama has specific settings that OpenClaw needs. Missing or wrong config means the model runs with bad defaults.

(GitHub #2425)

How to Fix It

Step 1: Choose the right model

Not all models are created equal. Here's a realistic guide:

Your Hardware            Recommended Model               VRAM Needed
8GB VRAM (RTX 3070)      llama3.1:8b-instruct-q4_K_M     ~5GB
12GB VRAM (RTX 3080)     mistral-nemo:12b-instruct       ~8GB
16GB VRAM (RTX 4080)     qwen2.5:14b-instruct-q4_K_M     ~9GB
24GB+ VRAM (RTX 4090)    qwen2.5:72b-instruct-q4_K_M     ~40GB*
CPU only (no GPU)        llama3.2:3b                     N/A (slow)

*70B+ models need multiple GPUs or will offload to CPU (very slow).

Pull your model:

ollama pull llama3.1:8b-instruct-q4_K_M

Rule of thumb: Use the largest instruct/chat model your hardware can run at reasonable speed. If it takes more than 5 seconds per response, go smaller.
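The VRAM column can be sanity-checked with back-of-envelope math: q4 quantization stores roughly half a byte per weight, and the KV cache plus runtime overhead come on top. A rough sketch for an 8B model:

```shell
# Raw weight memory for an 8B model at q4 quantization (~0.5 bytes per weight).
# Actual VRAM use is higher: add the KV cache and runtime overhead on top.
params=8000000000
weights_gb=$(( params / 2 / 1000000000 ))
echo "~${weights_gb}GB of weights"
```

About 4GB of raw weights, which is why the table lists ~5GB once the cache and overhead are included.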

Step 2: Increase the context window

This is the #1 fix for "dumb" behavior. Tell Ollama to use a larger context:

# Create a custom model with more context
cat > ~/Modelfile << 'EOF'
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_ctx 8192
EOF

ollama create llama3.1-8k -f ~/Modelfile

For even longer conversations:

cat > ~/Modelfile << 'EOF'
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_ctx 16384
EOF

ollama create llama3.1-16k -f ~/Modelfile

Warning: Larger context windows use more VRAM. If you set num_ctx too high, Ollama will either crash or fall back to CPU (which is painfully slow). Start with 8192 and increase if your GPU can handle it.
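To see why, consider the KV cache, which grows linearly with num_ctx. Using llama3.1 8B's published architecture (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and an fp16 cache, a rough sketch of the math:

```shell
# KV-cache bytes per token: K and V, per layer, per KV head, per head dim, fp16.
layers=32; kv_heads=8; head_dim=128; fp16_bytes=2
per_token=$(( 2 * layers * kv_heads * head_dim * fp16_bytes ))
echo "$(( per_token * 8192  / 1024 / 1024 )) MiB of KV cache at num_ctx 8192"
echo "$(( per_token * 16384 / 1024 / 1024 )) MiB of KV cache at num_ctx 16384"
```

Doubling num_ctx from 8192 to 16384 adds roughly another gigabyte of VRAM on top of the model weights, which is why it's best to step up gradually.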

Step 3: Configure OpenClaw correctly

Here's a config that actually works:

{
  "providers": {
    "ollama": {
      "type": "ollama",
      "baseUrl": "http://localhost:11434",
      "model": "llama3.1-8k",
      "options": {
        "temperature": 0.7,
        "num_ctx": 8192,
        "num_predict": 1024,
        "top_p": 0.9,
        "repeat_penalty": 1.1
      }
    }
  },
  "ai": {
    "provider": "ollama",
    "maxTokens": 1024,
    "contextStrategy": "trim"
  }
}

Key settings:

  • num_ctx: Keep this in sync with the value in your Modelfile. Request-time options override the Modelfile in current Ollama versions, but a mismatch makes the effective context size unpredictable
  • num_predict: Max tokens per response. Don't set this too high or the model will ramble
  • repeat_penalty: Set to 1.1 to reduce repetitive responses (a common issue with small models)
  • contextStrategy: "trim": Tells OpenClaw to trim old messages when approaching the context limit, instead of sending everything and hoping for the best
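If responses still look wrong after this, it can help to take OpenClaw out of the loop and send the same options directly to Ollama's /api/generate endpoint. A sketch of the request (the curl line is commented out because it needs a running Ollama daemon):

```shell
# Build a request payload whose options mirror the OpenClaw config above.
read -r -d '' payload << 'EOF' || true
{"model": "llama3.1-8k",
 "prompt": "What is 2+2?",
 "stream": false,
 "options": {"num_ctx": 8192, "temperature": 0.7, "repeat_penalty": 1.1}}
EOF
echo "$payload"
# curl -s http://localhost:11434/api/generate -d "$payload"
```

If the raw API response is also incoherent, the problem is the model or its settings, not OpenClaw.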

Step 4: Simplify your system prompt

Cloud models can handle a 2000-word system prompt with detailed personality, rules, and examples. Local models can't — that system prompt eats your context window.

Keep your system prompt under 500 tokens for 8B models:

You are a helpful AI assistant. Be concise and clear in your responses.

vs. the 2000-word personality document you'd use with Claude. Save the detailed stuff for bigger models.
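A quick way to estimate where your prompt lands, assuming the rough rule of about 1.3 tokens per English word (a heuristic, not a real tokenizer):

```shell
# Estimate token count from word count (~1.3 tokens per English word, rough).
prompt="You are a helpful AI assistant. Be concise and clear in your responses."
words=$(echo "$prompt" | wc -w)
tokens=$(( words * 4 / 3 ))
echo "${tokens} tokens (keep under ~500 for an 8B model)"
```

The short prompt above is well under budget; a 2000-word document would blow past 500 tokens several times over.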

Step 5: Test it

openclaw gateway restart

# Quick test
openclaw chat --provider ollama "What is 2+2?"

If the response is coherent and fast, you're good. Try a multi-turn conversation to make sure context handling works.

How to Prevent It

  • Match expectations to hardware. A 7B model on a consumer GPU is impressive for what it is, but it won't match Claude Opus or GPT-5. Adjust your system prompt complexity and conversation length accordingly.
  • Monitor VRAM usage. Run nvidia-smi while chatting. If VRAM is maxed out, reduce num_ctx or use a smaller quantization.
  • Clear sessions regularly. Long conversations degrade quality with small models. Reset sessions when they get long.
  • Keep Ollama updated. ollama pull <model> also updates the model. Run it periodically.

The Easy Way

lobsterfarm is a managed hosting service for OpenClaw — deployment, updates, and support handled for you.

Get started with lobsterfarm →

Skip the setup. Start using your AI assistant today.

lobsterfarm gives you a fully managed OpenClaw instance — one click, your own server, running 24/7.