
Fix: Context Overflow & Token Burn from Large Tool Outputs in OpenClaw

OpenClaw burning through tokens or crashing from context overflow? Large tool outputs are bloating your session. Here's how compaction works and when to reset.


TL;DR: Large tool outputs (web scrapes, file reads, API responses) bloat your context window, burning tokens and degrading responses. Manage tool output size, use compaction, and reset sessions when needed.

The Error

Error: context_length_exceeded — This model's maximum context length is 200000 tokens. 
However, your messages resulted in 234,521 tokens.

You might also see:

Error: Request too large — reduce the number of messages or tool outputs
Warning: Context approaching limit (187,432 / 200,000 tokens). Compaction triggered.

Or the subtler version — no error at all, but:

  • Responses get slower and more expensive
  • The AI starts "forgetting" earlier parts of the conversation
  • Your API bill spikes for no obvious reason
  • Responses become generic or ignore your instructions

Why This Happens

Every message in an OpenClaw session is part of the context — the full conversation history sent to the AI model with each request. This includes:

  • Your system prompt (personality, instructions)
  • All user messages
  • All AI responses
  • All tool outputs — and this is where things blow up

When the AI uses tools (browsing a webpage, reading a file, calling an API, searching your emails), the tool's output is added to the context. A single web scrape can be 10,000-50,000 tokens. Read a large file? That's thousands more. And it compounds:

Turn 1: "Summarize this webpage" → +15,000 tokens (page content)
Turn 2: "Now check this other page" → +22,000 tokens
Turn 3: "Compare them" → both pages are still in context = 37,000+ tokens
Turn 4: "Also check my email" → +8,000 tokens
...you're at 80,000 tokens and climbing

Each subsequent message sends ALL of that back to the API. You're paying for the full context every single turn.

(GitHub #1594, #8196)
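The compounding arithmetic above can be sketched in a few lines (illustrative numbers only; real counts depend on the model's tokenizer):

```python
# Illustrative sketch of how per-turn context cost compounds.
# All numbers are hypothetical; they mirror the Turn 1-4 example above.
turn_additions = [15_000, 22_000, 0, 8_000]  # new tokens added each turn
base = 2_000  # system prompt + short user/AI messages (assumed)

context = base
total_billed = 0
for added in turn_additions:
    context += added
    total_billed += context  # the FULL context is sent (and billed) every turn

print(context)       # → 47000 (context size after turn 4)
print(total_billed)  # → 142000 (cumulative input tokens billed over 4 turns)
```

Four turns of conversation, and you have already paid for 142,000 input tokens: the price of each turn includes every turn before it.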

How to Fix It

Step 1: Understand what's eating your tokens

Check your current session size:

openclaw session info

Look for the token count. If it's above 100,000, you've got bloat.

You can also check the logs to see which tool calls are generating the most output:

openclaw logs --tail 200 | grep -i "tool\|tokens\|context"
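If you want a rough sense of how many tokens a given tool output will cost before it lands in context, the usual back-of-envelope heuristic is about 4 characters per token for English text. A minimal sketch (a real count requires the model's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    Only a gauge; an exact count needs the model's actual tokenizer."""
    return max(1, len(text) // 4)

page = "word " * 10_000  # a ~50,000-character scrape
print(estimate_tokens(page))  # → 12500
```

A 50 KB HTML scrape is therefore on the order of 12,000+ tokens before the model has said a single word about it.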

Step 2: Reset the session (immediate relief)

If your context is already overflowing, the fastest fix is a clean slate:

openclaw session reset

Or for a specific channel/conversation:

openclaw session reset --channel telegram
openclaw session reset --channel telegram --user 123456789

This clears the conversation history. The AI "forgets" the current conversation but will work normally again.

Step 3: Configure compaction

Compaction is OpenClaw's built-in mechanism for managing context size. When the context approaches the model's limit, compaction summarizes older messages to free up space.

{
  "ai": {
    "compaction": {
      "enabled": true,
      "threshold": 0.75,
      "strategy": "summarize",
      "preserveSystemPrompt": true,
      "preserveRecentMessages": 10
    }
  }
}

  • threshold: 0.75 — Trigger compaction when context reaches 75% of the model's limit
  • strategy: "summarize" — Summarize old messages instead of deleting them
  • preserveRecentMessages: 10 — Always keep the last 10 messages intact

Important: Compaction itself costs tokens (it asks the model to summarize the conversation). But it's much cheaper than sending a 200K-token context with every message.
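Conceptually, threshold-based compaction looks something like this minimal sketch. It is not OpenClaw's actual implementation: `summarize()` stands in for a model call, and each message is assumed to carry a precomputed token count.

```python
CONTEXT_LIMIT = 200_000
THRESHOLD = 0.75      # mirrors "threshold": 0.75 in the config above
PRESERVE_RECENT = 10  # mirrors "preserveRecentMessages": 10

def count_tokens(messages):
    # Stand-in: assume each message carries a precomputed token count.
    return sum(m["tokens"] for m in messages)

def summarize(messages):
    # Stand-in for an LLM summarization call; assume ~10:1 compression.
    return {"role": "system", "content": "[summary of earlier turns]",
            "tokens": max(1, count_tokens(messages) // 10)}

def maybe_compact(messages):
    if count_tokens(messages) < THRESHOLD * CONTEXT_LIMIT:
        return messages  # under threshold: leave the history alone
    head, tail = messages[:-PRESERVE_RECENT], messages[-PRESERVE_RECENT:]
    return [summarize(head)] + tail  # older turns collapse into one summary
```

For example, 30 messages of 6,000 tokens each (180,000 total) would trip the 150,000-token threshold: the first 20 collapse into a ~12,000-token summary, the last 10 stay intact, and the context drops to ~72,000 tokens.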

Step 4: Limit tool output size

Prevent the bloat in the first place by capping how much data tools can inject into context:

{
  "ai": {
    "maxToolOutputTokens": 4000,
    "truncateToolOutput": true
  }
}

This caps each tool's output to ~4,000 tokens. If a web scrape returns 50,000 tokens, only the first 4,000 are kept in context.
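What that cap does is easy to sketch. This is an illustration of the truncation behavior, not OpenClaw's code, and it uses the rough 4-characters-per-token heuristic in place of a real tokenizer:

```python
MAX_TOOL_OUTPUT_TOKENS = 4_000  # mirrors "maxToolOutputTokens" above
CHARS_PER_TOKEN = 4             # rough heuristic, not a real tokenizer

def truncate_tool_output(output: str) -> str:
    """Keep at most ~MAX_TOOL_OUTPUT_TOKENS worth of a tool's output."""
    limit = MAX_TOOL_OUTPUT_TOKENS * CHARS_PER_TOKEN
    if len(output) <= limit:
        return output
    return output[:limit] + "\n[... output truncated at ~4,000 tokens ...]"

scrape = "x" * 200_000  # roughly a 50,000-token page
kept = truncate_tool_output(scrape)  # only the first ~16,000 chars survive
```

The truncation marker matters: it tells the model (and you, reading the logs) that the output was cut, rather than silently ending mid-sentence.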

Step 5: Use smarter prompting

Instead of asking the AI to "read this webpage" (which dumps the entire page into context), ask it to extract specific information:

Bad (dumps everything into context):

Read https://example.com/long-report and tell me about it

Better (AI extracts only what's needed):

From https://example.com/long-report, extract only the key statistics and conclusions

The tool still fetches the full page, but with output limits in place only the extracted details survive into the response, and the response is what stays in context for future turns.

Step 6: Monitor token usage

Add token tracking to spot problems early:

# Check recent API costs
openclaw usage --last 24h

# Check per-session token counts
openclaw session list --verbose

If you see a session with 150K+ tokens, it's time for a reset or compaction review.
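You can automate that check. A minimal sketch that flags bloated sessions from an exported session listing (the record fields here are hypothetical; adapt them to whatever your session list actually emits):

```python
# Flag sessions whose context has grown past a reset threshold.
# The session records below are hypothetical examples, not real output.
RESET_THRESHOLD = 150_000

def flag_bloated(sessions):
    """Return the IDs of sessions at or above the reset threshold."""
    return [s["id"] for s in sessions if s["tokens"] >= RESET_THRESHOLD]

sessions = [
    {"id": "telegram:123", "tokens": 187_000},
    {"id": "discord:456", "tokens": 42_000},
]
print(flag_bloated(sessions))  # → ['telegram:123']
```

Run something like this on a schedule and you catch runaway sessions before the bill does.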

When Compaction Fails

Sometimes compaction doesn't help:

  • Tool outputs that can't be summarized — binary data, code snippets, structured JSON. Compaction tries to summarize these and produces garbage.
  • Compaction loops — the context is so large that the compaction call itself exceeds the limit. You're stuck and need a manual reset.
  • Critical context gets lost — compaction summarizes away details the AI needed. The conversation quality drops after compaction.

If you're hitting these edge cases, session resets are your friend. Think of conversations as disposable — start a new session for each distinct task instead of running everything in one never-ending thread.

How to Prevent It

  • Start new sessions for new tasks. Don't use one session for "summarize a webpage, then check my email, then write code." Each task should be its own session.
  • Set maxToolOutputTokens from day one. 4,000-8,000 tokens is a good range. You rarely need more than that in context.
  • Prefer concise tools. If you have a choice between a tool that returns raw HTML and one that returns extracted text, use the extracted version.
  • Set a compaction threshold of 0.6-0.75. Lower thresholds mean more frequent compaction, but far less risk of ever hitting the hard limit.
  • Monitor your API spending. A sudden spike in token usage almost always means context bloat.

The Easy Way

lobsterfarm is a managed hosting service for OpenClaw — deployment, updates, and support handled for you.

Get started with lobsterfarm →

Skip the setup. Start using your AI assistant today.

lobsterfarm gives you a fully managed OpenClaw instance — one click, your own server, running 24/7.