The Complete Guide to AI API Pricing in 2026
Every major AI API's pricing in one place. Claude, GPT, Gemini — input, output, cached tokens, real monthly costs, and why BYO-key beats subscription models.
TL;DR: AI APIs charge per token (a token is roughly three-quarters of a word). Costs range from $0.10 to $75 per million tokens depending on the model. Most people spend $5-50/month. Cached tokens are up to 90% cheaper and change the math dramatically. Bringing your own API key is almost always better than paying a subscription markup.
How AI API Pricing Works
Every AI provider charges the same basic way:
- Input tokens: What you send to the model (your message + conversation history + system prompt)
- Output tokens: What the model generates (the response)
- Output is always more expensive than input (3-5x typically)
- Cached input tokens: Repeated parts of your input (system prompt, memory) that the provider has seen recently — dramatically cheaper
Prices are quoted per million tokens. One million tokens is roughly 750,000 words, or about 2,500 pages of text.
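The arithmetic above can be sketched in a few lines. This is an illustrative calculation, not a billing tool; the rates are example Sonnet-class figures from the table below, and the token counts are assumptions:

```python
# Cost of a single API call under per-million-token pricing.
# Rates are example figures (Claude Sonnet-class), not live quotes.
PRICE_PER_M_INPUT = 3.00    # $ per million input tokens
PRICE_PER_M_OUTPUT = 15.00  # $ per million output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the rates above."""
    return (input_tokens * PRICE_PER_M_INPUT +
            output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# A typical chat turn: 2,000 tokens of context in, 500 tokens out.
print(f"${call_cost(2_000, 500):.4f}")  # prints $0.0135
```

Note that output tokens dominate here even though there are far fewer of them, which is why output-heavy tasks (long answers, code generation) cost more than the raw token counts suggest.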
Claude (Anthropic) — 2026 Pricing
| Model | Input | Output | Cached Input | Context Window |
|---|---|---|---|---|
| Haiku 3.5 | $0.80 | $4.00 | $0.08 | 200K |
| Sonnet 4 | $3.00 | $15.00 | $0.30 | 200K |
| Opus 4 | $15.00 | $75.00 | $1.50 | 200K |
Prices per million tokens
Haiku is the budget workhorse. Fast, cheap, surprisingly capable for everyday tasks. Great for background jobs, simple questions, and high-volume use cases.
Sonnet is the sweet spot. Best balance of quality, speed, and cost. This is what most people should use as their daily driver.
Opus is the heavy hitter. Best reasoning, best writing quality, but 5x the cost of Sonnet. Use it when you need the absolute best output — complex analysis, nuanced writing, difficult code.
Extended Thinking (Opus/Sonnet)
Both Opus and Sonnet support extended thinking — the model "thinks" before responding, producing better results for complex problems. Thinking tokens count as output tokens, which means complex reasoning tasks cost more.
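Because thinking tokens bill at the output rate, hidden reasoning can dominate the cost of a call. A rough sketch, using the Sonnet output rate from the table above; the token counts are assumptions:

```python
# Thinking tokens are billed as output, so a reasoning-heavy call
# costs far more than the visible answer alone would suggest.
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token (Sonnet-class rate)

visible_answer = 600      # tokens the user actually sees (assumed)
thinking_tokens = 4_000   # hidden reasoning, also billed as output (assumed)

plain_cost = visible_answer * OUTPUT_RATE
with_thinking = (visible_answer + thinking_tokens) * OUTPUT_RATE
print(f"${plain_cost:.4f} vs ${with_thinking:.4f}")  # $0.0090 vs $0.0690
```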
OpenAI — 2026 Pricing
| Model | Input | Output | Cached Input | Context Window |
|---|---|---|---|---|
| GPT-4o Mini | $0.15 | $0.60 | $0.075 | 128K |
| GPT-4o | $2.50 | $10.00 | $1.25 | 128K |
| GPT-4.1 | $2.00 | $8.00 | $0.50 | 1M |
| GPT-4.1 Mini | $0.40 | $1.60 | $0.10 | 1M |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.025 | 1M |
| o3 | $2.00 | $8.00 | $0.50 | 200K |
| o3 Mini | $1.10 | $4.40 | $0.275 | 200K |
| o4-mini | $1.10 | $4.40 | $0.275 | 200K |
Prices per million tokens
GPT-4.1 Nano is absurdly cheap — $0.10/million input tokens. For simple tasks, you can run hundreds of queries for pennies.
GPT-4o remains solid and well-rounded. The 128K context window is enough for most use cases.
GPT-4.1 gives you a 1M token context window at reasonable prices. Good for document-heavy workflows.
o3/o4-mini are OpenAI's reasoning models. They think through problems step-by-step, similar to Claude's extended thinking.
Google Gemini — 2026 Pricing
| Model | Input | Output | Context Window |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Gemini 2.5 Pro | $1.25-$2.50 | $10.00-$15.00 | 1M |
Prices per million tokens. Gemini 2.5 Pro pricing varies by prompt length.
Google's Gemini pricing is competitive, especially Flash. The massive context windows are the main draw.
Why Cached Tokens Matter (A Lot)
Here's the thing most people miss: cached token pricing changes the math entirely.
When you use an AI assistant like OpenClaw, every message sends the same system prompt, personality config (SOUL.md), and memory files. That's typically 3,000-10,000 tokens that are identical across every single API call.
Without caching, you pay full price for those tokens every time. With caching, you pay roughly 10% of the regular input price (Anthropic) or 25-50% (OpenAI, depending on the model).
Real example: You send 50 messages in a day. Your system prompt is 5,000 tokens.
| | Without Caching | With Caching (Claude) |
|---|---|---|
| System prompt cost (50 calls) | 250K tokens × $3/M = $0.75 | 250K tokens × $0.30/M = $0.075 |
| Savings | — | $0.675/day = ~$20/month |
That's just the system prompt. Memory files, personality, and skill definitions all benefit from caching too. In practice, caching reduces total API costs by 30-50% for typical OpenClaw usage.
OpenClaw structures its prompts to maximize cache hits.
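The worked example above reduces to a few lines of arithmetic, using the Sonnet-class rates from the Claude table (cached input at 10% of the base rate):

```python
# Reproduces the caching example: a 5,000-token system prompt
# re-sent across 50 calls per day, at example Sonnet-class rates.
INPUT_RATE = 3.00 / 1_000_000    # $ per regular input token
CACHED_RATE = 0.30 / 1_000_000   # $ per cached input token (10% of base)

calls_per_day = 50
prompt_tokens = 5_000

uncached = calls_per_day * prompt_tokens * INPUT_RATE   # $0.75/day
cached = calls_per_day * prompt_tokens * CACHED_RATE    # $0.075/day
print(f"daily savings: ${uncached - cached:.3f}")       # $0.675, ~$20/month
```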
Real Monthly Cost Scenarios
Based on actual usage patterns, here's what people spend:
Light User ($5-15/month)
- 10-20 messages per day
- Quick questions, short conversations
- Model: Claude Haiku or GPT-4o Mini
- No heavy background tasks
- Typical: $0.15-0.50/day
Moderate User ($20-50/month)
- 30-60 messages per day
- Mix of quick chats and longer sessions
- Model: Claude Sonnet or GPT-4o
- Some cron jobs (email check, calendar)
- Typical: $0.70-1.70/day
Heavy User ($50-100+/month)
- 80-150+ messages per day
- Long sessions with code, documents, research
- Model: Claude Sonnet, occasionally Opus
- Active cron jobs, web browsing, file analysis
- Typical: $1.70-3.50/day
Budget Optimized ($2-8/month)
- Moderate message volume
- Model: GPT-4.1 Nano or Claude Haiku
- Compaction enabled, efficient cron scheduling
- Proof that AI assistants don't have to be expensive
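A rough estimator for the tiers above. The per-message token counts are illustrative assumptions (real numbers depend on conversation history, caching, and cron activity), but the shape of the math holds:

```python
# Rough monthly-cost estimator for the usage tiers above.
# Token counts per message are assumptions, not measured values.
def monthly_cost(msgs_per_day: int, in_tok: int, out_tok: int,
                 in_rate: float, out_rate: float, days: int = 30) -> float:
    """Estimated dollars per month, given per-million-token rates."""
    per_msg = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return msgs_per_day * per_msg * days

# Light user on Haiku ($0.80 in / $4.00 out), ~10K tokens of context:
print(f"${monthly_cost(15, 10_000, 600, 0.80, 4.00):.2f}")   # $4.68
# Moderate user on Sonnet ($3 in / $15 out), ~8K tokens of context:
print(f"${monthly_cost(45, 8_000, 600, 3.00, 15.00):.2f}")   # $44.55
```

Both estimates land inside the ranges quoted above, and caching (not modeled here) would pull them lower.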
Cost Optimization Tips
1. Match the Model to the Task
Don't use Opus to check the weather. Don't use Haiku to write your business plan. Most AI assistant platforms (including OpenClaw) let you configure a default model and switch for specific tasks.
A good setup: Haiku for cron jobs and simple queries, Sonnet for daily conversation, Opus when you explicitly need the best reasoning.
2. Enable Compaction
Long conversations are expensive because you re-send the entire history with every message, so cumulative token costs grow quadratically with conversation length. Context compaction summarizes older messages, keeping per-message costs roughly flat.
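A sketch of why this matters: each turn re-sends the accumulated history, so without a cap the total input tokens grow quadratically with the number of messages. The per-message token count and cap are assumptions:

```python
# Total input tokens over a conversation where every turn re-sends
# the full history, optionally capped by compaction.
def total_input_tokens(n_msgs: int, per_msg: int = 500, cap=None) -> int:
    total = 0
    history = 0
    for _ in range(n_msgs):
        history += per_msg                              # history keeps growing
        sent = history if cap is None else min(history, cap)
        total += sent                                   # tokens billed this turn
    return total

print(total_input_tokens(40))             # 410000 tokens, full history
print(total_input_tokens(40, cap=4_000))  # 146000 tokens, compacted to ~4K
```

At Sonnet-class input rates that is the difference between roughly $1.23 and $0.44 of input cost for one long conversation.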
3. Space Out Background Tasks
Cron jobs and heartbeat checks add up. Running email checks every 15 minutes instead of every hour costs 4x more. Find the right frequency — most people don't need real-time monitoring.
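The interval math is easy to check. Per-check token counts here are illustrative assumptions, priced at Haiku-class rates from the Claude table:

```python
# Daily cost of a background cron job at different check intervals,
# assuming ~3,000 input + 200 output tokens per check (illustrative)
# at Haiku-class rates ($0.80 in / $4.00 out per million tokens).
def cron_daily_cost(interval_minutes: int) -> float:
    checks = 24 * 60 // interval_minutes
    per_check = (3_000 * 0.80 + 200 * 4.00) / 1_000_000  # $0.0032/check
    return checks * per_check

print(f"every 15 min: ${cron_daily_cost(15):.2f}/day")  # 96 checks, $0.31/day
print(f"every hour:   ${cron_daily_cost(60):.2f}/day")  # 24 checks, $0.08/day
```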
4. Start Fresh Sessions
After 30+ exchanges, the context is getting large and expensive. Starting a new conversation resets the token cost while preserving persistent memory.
5. Set Billing Alerts
Both Anthropic and OpenAI let you set monthly budget limits:
- Anthropic: console.anthropic.com → Settings → Billing
- OpenAI: platform.openai.com → Settings → Limits
Set a limit below your comfort threshold. You'd rather your AI pause for a day than surprise you with a $200 bill.
BYO-Key vs Subscription Models: Why It Matters
Most AI products fall into two pricing categories:
Subscription models (ChatGPT Plus, Gemini Advanced): You pay $20/month flat. Simple, predictable. But you're paying whether you use it or not, and you hit usage limits during peak times.
BYO-key models (OpenClaw, lobsterfarm): You bring your own API key from Anthropic/OpenAI and pay exactly what you use. No markup on tokens, no usage limits beyond what the provider imposes.
Why BYO-Key Usually Wins
- Light users save money: If you only use $5/month in API calls, you pay $5 — not $20
- No artificial limits: Subscription models throttle heavy users. With your own API key, you get the full rate limits
- Model flexibility: Use any model from any provider. Switch between Claude and GPT freely
- Transparency: You see exactly what every message costs. No mystery pricing
- No lock-in: Your API key works with any compatible tool, not just one product
The only time a subscription wins is when your usage would cost more than the flat fee in raw API calls and you can live with its rate limits. For most people who use AI seriously, BYO-key is cheaper and more flexible.
The Pricing Trend
AI API prices have dropped dramatically over the past two years and continue to fall. GPT-4 launched in 2023 at $30/M input and $60/M output tokens. GPT-4o does the same quality work for $10/M output. Haiku costs $0.80/M input, nearly 40x cheaper than the original GPT-4's input rate.
This trend will continue. Expect prices to halve roughly every 12-18 months as hardware improves and competition intensifies. What costs $30/month today will likely cost $10-15/month in a year.
This is why BYO-key matters: you automatically benefit from price drops. Subscription models rarely reduce their fees when underlying costs decrease.
The Bottom Line
AI APIs are priced per token, with output costing 3-5x more than input. Cached tokens save 50-90%. Most people spend $5-50/month depending on model choice and usage patterns.
The key insight: you have more control over costs than you think. Model selection, compaction, conversation length, and cron frequency all significantly impact your bill.
lobsterfarm uses BYO-key pricing — you bring your Anthropic or OpenAI key, pay only what you use, and we optimize prompt caching and compaction to keep your costs as low as possible.
Get started with lobsterfarm → · Detailed cost breakdown → · Tips to reduce spend →
Skip the setup. Start using your AI assistant today.
lobsterfarm gives you a fully managed OpenClaw instance — one click, your own server, running 24/7.