The Complete Guide to AI API Pricing in 2026
Every major AI API's pricing in one place. Claude, GPT, Gemini — input, output, cached tokens, real monthly costs, and why BYO-key beats subscription models.
TL;DR: AI APIs charge per token (a token is roughly three-quarters of a word). Costs range from $0.10 to $75 per million tokens depending on the model. Most people spend $5-50/month. Cached tokens are up to 90% cheaper and change the math dramatically. Bringing your own API key is almost always better than paying a subscription markup.
How AI API Pricing Works
Every AI provider charges the same basic way:
- Input tokens: What you send to the model (your message + conversation history + system prompt)
- Output tokens: What the model generates (the response)
- Output is always more expensive than input (3-5x typically)
- Cached input tokens: Repeated parts of your input (system prompt, memory) that the provider has seen recently — dramatically cheaper
Prices are quoted per million tokens. One million tokens is roughly 750,000 words, or about 2,500 pages of text.
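The arithmetic above can be sketched in a few lines. This is an illustrative calculation, not a billing tool; the rates are example Sonnet-class figures from the table below, and the token counts are assumptions:

```python
# Cost of a single API call under per-million-token pricing.
# Rates are example figures (Claude Sonnet-class), not live quotes.
PRICE_PER_M_INPUT = 3.00    # $ per million input tokens
PRICE_PER_M_OUTPUT = 15.00  # $ per million output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the rates above."""
    return (input_tokens * PRICE_PER_M_INPUT +
            output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# A typical chat turn: 2,000 tokens of context in, 500 tokens out.
print(f"${call_cost(2_000, 500):.4f}")  # prints $0.0135
```

Note that output tokens dominate here even though there are far fewer of them, which is why output-heavy tasks (long answers, code generation) cost more than the raw token counts suggest.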
Claude (Anthropic) — 2026 Pricing
| Model | Input | Output | Cached Input | Context Window |
|---|---|---|---|---|
| Haiku 3.5 | $0.80 | $4.00 | $0.08 | 200K |
| Sonnet 4 | $3.00 | $15.00 | $0.30 | 200K |
| Opus 4 | $15.00 | $75.00 | $1.50 | 200K |
Prices per million tokens
Haiku is the budget workhorse. Fast, cheap, surprisingly capable for everyday tasks. Great for background jobs, simple questions, and high-volume use cases.
Sonnet is the sweet spot. Best balance of quality, speed, and cost. This is what most people should use as their daily driver.
Opus is the heavy hitter. Best reasoning, best writing quality, but 5x the cost of Sonnet. Use it when you need the absolute best output — complex analysis, nuanced writing, difficult code.
Extended Thinking (Opus/Sonnet)
Both Opus and Sonnet support extended thinking — the model "thinks" before responding, producing better results for complex problems. Thinking tokens count as output tokens, which means complex reasoning tasks cost more.
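Because thinking tokens bill at the output rate, hidden reasoning can dominate the cost of a call. A rough sketch, using the Sonnet output rate from the table above; the token counts are assumptions:

```python
# Thinking tokens are billed as output, so a reasoning-heavy call
# costs far more than the visible answer alone would suggest.
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token (Sonnet-class rate)

visible_answer = 600      # tokens the user actually sees (assumed)
thinking_tokens = 4_000   # hidden reasoning, also billed as output (assumed)

plain_cost = visible_answer * OUTPUT_RATE
with_thinking = (visible_answer + thinking_tokens) * OUTPUT_RATE
print(f"${plain_cost:.4f} vs ${with_thinking:.4f}")  # $0.0090 vs $0.0690
```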
OpenAI — 2026 Pricing
| Model | Input | Output | Cached Input | Context Window |
|---|---|---|---|---|
| GPT-4o Mini | $0.15 | $0.60 | $0.075 | 128K |
| GPT-4o | $2.50 | $10.00 | $1.25 | 128K |
| GPT-4.1 | $2.00 | $8.00 | $0.50 | 1M |
| GPT-4.1 Mini | $0.40 | $1.60 | $0.10 | 1M |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.025 | 1M |
| o3 | $2.00 | $8.00 | $0.50 | 200K |
| o3 Mini | $1.10 | $4.40 | $0.275 | 200K |
| o4-mini | $1.10 | $4.40 | $0.275 | 200K |
Prices per million tokens
GPT-4.1 Nano is absurdly cheap — $0.10/million input tokens. For simple tasks, you can run hundreds of queries for pennies.
GPT-4o remains solid and well-rounded. The 128K context window is enough for most use cases.
GPT-4.1 gives you a 1M token context window at reasonable prices. Good for document-heavy workflows.
o3/o4-mini are OpenAI's reasoning models. They think through problems step-by-step, similar to Claude's extended thinking.
Google Gemini — 2026 Pricing
| Model | Input | Output | Context Window |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Gemini 2.5 Pro | $1.25-$2.50 | $10.00-$15.00 | 1M |
Prices per million tokens. Gemini 2.5 Pro pricing varies by prompt length.
Google's Gemini pricing is competitive, especially Flash. The massive context windows are the main draw.
Why Cached Tokens Matter (A Lot)
Here's the thing most people miss: cached token pricing changes the math entirely.
When you use an AI assistant like OpenClaw, every message sends the same system prompt, personality config (SOUL.md), and memory files. That's typically 3,000-10,000 tokens that are identical across every single API call.
Without caching, you pay full price for those tokens every time. With caching, you pay roughly 10% of the regular input price (Anthropic) or 25-50% (OpenAI, depending on the model).
Real example: You send 50 messages in a day. Your system prompt is 5,000 tokens.
| | Without Caching | With Caching (Claude) |
|---|---|---|
| System prompt cost (50 calls) | 250K tokens × $3/M = $0.75 | 250K tokens × $0.30/M = $0.075 |
| Savings | — | $0.675/day = ~$20/month |
That's just the system prompt. Memory files, personality, and skill definitions all benefit from caching too. In practice, caching reduces total API costs by 30-50% for typical OpenClaw usage.
OpenClaw structures its prompts to maximize cache hits.
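The worked example above reduces to a few lines of arithmetic, using the Sonnet-class rates from the Claude table (cached input at 10% of the base rate):

```python
# Reproduces the caching example: a 5,000-token system prompt
# re-sent across 50 calls per day, at example Sonnet-class rates.
INPUT_RATE = 3.00 / 1_000_000    # $ per regular input token
CACHED_RATE = 0.30 / 1_000_000   # $ per cached input token (10% of base)

calls_per_day = 50
prompt_tokens = 5_000

uncached = calls_per_day * prompt_tokens * INPUT_RATE   # $0.75/day
cached = calls_per_day * prompt_tokens * CACHED_RATE    # $0.075/day
print(f"daily savings: ${uncached - cached:.3f}")       # $0.675, ~$20/month
```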
Real Monthly Cost Scenarios
Based on actual usage patterns, here's what people spend:
Light User ($5-15/month)
- 10-20 messages per day
- Quick questions, short conversations
- Model: Claude Haiku or GPT-4o Mini
- No heavy background tasks
- Typical: $0.15-0.50/day
Moderate User ($20-50/month)
- 30-60 messages per day
- Mix of quick chats and longer sessions
- Model: Claude Sonnet or GPT-4o
- Some cron jobs (email check, calendar)
- Typical: $0.70-1.70/day
Heavy User ($50-100+/month)
- 80-150+ messages per day
- Long sessions with code, documents, research
- Model: Claude Sonnet, occasionally Opus
- Active cron jobs, web browsing, file analysis
- Typical: $1.70-3.50/day
Budget Optimized ($2-8/month)
- Moderate message volume
- Model: GPT-4.1 Nano or Claude Haiku
- Compaction enabled, efficient cron scheduling
- Proof that AI assistants don't have to be expensive
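A rough estimator for the tiers above. The per-message token counts are illustrative assumptions (real numbers depend on conversation history, caching, and cron activity), but the shape of the math holds:

```python
# Rough monthly-cost estimator for the usage tiers above.
# Token counts per message are assumptions, not measured values.
def monthly_cost(msgs_per_day: int, in_tok: int, out_tok: int,
                 in_rate: float, out_rate: float, days: int = 30) -> float:
    """Estimated dollars per month, given per-million-token rates."""
    per_msg = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return msgs_per_day * per_msg * days

# Light user on Haiku ($0.80 in / $4.00 out), ~10K tokens of context:
print(f"${monthly_cost(15, 10_000, 600, 0.80, 4.00):.2f}")   # $4.68
# Moderate user on Sonnet ($3 in / $15 out), ~8K tokens of context:
print(f"${monthly_cost(45, 8_000, 600, 3.00, 15.00):.2f}")   # $44.55
```

Both estimates land inside the ranges quoted above, and caching (not modeled here) would pull them lower.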
Cost Optimization Tips
1. Match the Model to the Task
Don't use Opus to check the weather. Don't use Haiku to write your business plan. Most AI assistant platforms (including OpenClaw) let you configure a default model and switch for specific tasks.
A good setup: Haiku for cron jobs and simple queries, Sonnet for daily conversation, Opus when you explicitly need the best reasoning.
2. Enable Compaction
Long conversations are expensive because you re-send the entire history with every message, so cumulative token costs grow quadratically with conversation length. Context compaction summarizes older messages, keeping per-message costs roughly flat.
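A sketch of why this matters: each turn re-sends the accumulated history, so without a cap the total input tokens grow quadratically with the number of messages. The per-message token count and cap are assumptions:

```python
# Total input tokens over a conversation where every turn re-sends
# the full history, optionally capped by compaction.
def total_input_tokens(n_msgs: int, per_msg: int = 500, cap=None) -> int:
    total = 0
    history = 0
    for _ in range(n_msgs):
        history += per_msg                              # history keeps growing
        sent = history if cap is None else min(history, cap)
        total += sent                                   # tokens billed this turn
    return total

print(total_input_tokens(40))             # 410000 tokens, full history
print(total_input_tokens(40, cap=4_000))  # 146000 tokens, compacted to ~4K
```

At Sonnet-class input rates that is the difference between roughly $1.23 and $0.44 of input cost for one long conversation.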
3. Space Out Background Tasks
Cron jobs and heartbeat checks add up. Running email checks every 15 minutes instead of every hour costs 4x more. Find the right frequency — most people don't need real-time monitoring.
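The interval math is easy to check. Per-check token counts here are illustrative assumptions, priced at Haiku-class rates from the Claude table:

```python
# Daily cost of a background cron job at different check intervals,
# assuming ~3,000 input + 200 output tokens per check (illustrative)
# at Haiku-class rates ($0.80 in / $4.00 out per million tokens).
def cron_daily_cost(interval_minutes: int) -> float:
    checks = 24 * 60 // interval_minutes
    per_check = (3_000 * 0.80 + 200 * 4.00) / 1_000_000  # $0.0032/check
    return checks * per_check

print(f"every 15 min: ${cron_daily_cost(15):.2f}/day")  # 96 checks, $0.31/day
print(f"every hour:   ${cron_daily_cost(60):.2f}/day")  # 24 checks, $0.08/day
```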
4. Start Fresh Sessions
After 30+ exchanges, the context is getting large and expensive. Starting a new conversation resets the token cost while preserving persistent memory.
5. Set Billing Alerts
Both Anthropic and OpenAI let you set monthly budget limits:
- Anthropic: console.anthropic.com → Settings → Billing
- OpenAI: platform.openai.com → Settings → Limits
Set a limit below your comfort threshold. You'd rather your AI pause for a day than surprise you with a $200 bill.
BYO-Key vs Subscription Models: Why It Matters
Most AI products fall into two pricing categories:
Subscription models (ChatGPT Plus, Gemini Advanced): You pay $20/month flat. Simple, predictable. But you're paying whether you use it or not, and you hit usage limits during peak times.
BYO-key models (OpenClaw, lobsterfarm): You bring your own API key from Anthropic/OpenAI and pay exactly what you use. No markup on tokens, no usage limits beyond what the provider imposes.
Why BYO-Key Usually Wins
- Light users save money: If you only use $5/month in API calls, you pay $5 — not $20
- No artificial limits: Subscription models throttle heavy users. With your own API key, you get the full rate limits
- Model flexibility: Use any model from any provider. Switch between Claude and GPT freely
- Transparency: You see exactly what every message costs. No mystery pricing
- No lock-in: Your API key works with any compatible tool, not just one product
The only time a subscription wins is when your usage would cost more than the flat fee in raw API calls and you can live with its rate limits. For most people who use AI seriously, BYO-key is cheaper and more flexible.
The Pricing Trend
AI API prices have dropped dramatically over the past two years and continue to fall. GPT-4 launched in 2023 at $30/M input and $60/M output tokens. GPT-4o does the same quality work for $10/M output. Haiku costs $0.80/M input, nearly 40x cheaper than the original GPT-4's input rate.
This trend will continue. Expect prices to halve roughly every 12-18 months as hardware improves and competition intensifies. What costs $30/month today will likely cost $10-15/month in a year.
This is why BYO-key matters: you automatically benefit from price drops. Subscription models rarely reduce their fees when underlying costs decrease.
The Bottom Line
AI APIs are priced per token, with output costing 3-5x more than input. Cached tokens save 50-90%. Most people spend $5-50/month depending on model choice and usage patterns.
The key insight: you have more control over costs than you think. Model selection, compaction, conversation length, and cron frequency all significantly impact your bill.
lobsterfarm uses BYO-key pricing — you bring your Anthropic or OpenAI key, pay only what you use, and we optimize prompt caching and compaction to keep your costs as low as possible.
Get started with lobsterfarm → · Detailed cost breakdown → · Tips to reduce spend →
Skip the setup. Start using your AI assistant today.
lobsterfarm gives you a fully managed OpenClaw instance — one click, your own server, running 24/7.