Roman Peschke @roman.peschke

15 Ways to Stop Burning Claude Code Tokens

Cut your token usage by 10-30x with configuration and workflow changes. No extra tools. Just better habits.

People are burning through their Claude Code limits in under an hour. Max 20 users ($200/month) have reported hitting 100% in 70 minutes. Max 5 users are getting one hour of work before they're locked out. Anthropic acknowledged it publicly and called it their "top priority."

Some of that was bugs on their end (an autocompact loop was silently retrying thousands of times per session). They patched it. But the bigger problem is how most people use Claude Code. Bloated context, compounding chat history, unused MCP servers loading on every single message. That's all on you.

These 15 changes target the actual causes. Most take under a minute to implement.

Cut the bloat
#1

Start fresh

Claude rereads your entire conversation history on every single message. Message 1 costs ~500 tokens. Message 30 costs ~15,000. The longer the chat, the more every message costs.

What to do
  • Run /clear between unrelated tasks
  • One conversation = one task
  • If you need context from the last task, paste a summary
Before
30 messages, 1 chat
15,000 tokens
After
30 messages, 3 chats
500 tokens each
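The before/after math compounds because every new message re-reads everything before it. A quick sketch (the ~500 tokens per message is this guide's rough estimate, not a measured constant):

```python
# Rough sketch of compounding chat history (assumed numbers, not measured).
# Assume each exchange adds ~500 tokens that get re-read on every later message.
PER_MESSAGE = 500

def total_tokens(messages: int) -> int:
    # Message n re-reads all prior history, so it costs roughly n * 500 tokens.
    return sum(PER_MESSAGE * n for n in range(1, messages + 1))

one_long_chat = total_tokens(30)          # 30 messages in a single chat
three_short_chats = 3 * total_tokens(10)  # same 30 messages split across 3 chats

print(one_long_chat)      # 232500
print(three_short_chats)  # 82500
```

Same work, same number of messages; splitting into three chats cuts the total by almost 3x, because no single chat ever accumulates a long history.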
#2

Kill unused MCPs

Every connected MCP server loads its full tool definitions into your context on every message, whether you use it or not. Unused servers = bloat on every interaction.

What to do
  • Run /mcp at the start of each session
  • Review what's connected, disconnect anything you're not using
  • If a CLI exists for the same thing, use that instead
Example

GitHub (~8K) + Playwright (~13K) + Gmail (~2.6K) = ~23,600 wasted tokens per message. Over 10 messages, that's more than 230,000 tokens gone.

#3

Combine prompts

Three separate back-and-forth messages cost three times as much as one combined message because chat history compounds.

What to do
  • Batch multiple requests into one prompt
  • If Claude gets it wrong, edit your original message and regenerate
  • Edits replace. Follow-ups stack.
3 messages
"Summarize this." 2K
"Extract issues." 4K
"Suggest fixes." 6K
Total: 12K tokens
1 message
"Summarize this log, extract the errors, and suggest a fix."
Total: 2K tokens
Control the context
#4

Start with a plan

The fastest way to burn through a limit is letting Claude write hundreds of lines of code down the wrong path.

What to do
  • Hit Shift+Tab twice to enter plan mode
  • Let Claude map the approach and ask questions before writing a single line
  • Add explicit instructions to your CLAUDE.md
Add this to CLAUDE.md

"Do not make any changes until you have 95% confidence in what you need to build. Ask me follow-up questions until you reach that confidence level."

#5

/context and /cost

Token bleeds are invisible. You might be losing 50,000 tokens per message to a bloated CLAUDE.md, skills, memory, and loaded files without realizing it.

What to do
  • Run /context in a fresh session to see what's eating your context window
  • Run /cost to track actual token usage and dollar spend
What they do

/context = full breakdown of what's loaded into your context window right now.

/cost = actual token usage and dollar spend for the current session.

#6

Paste less

Pasting entire documents when Claude only needs a fraction is a massive waste. Every line gets reprocessed on every message.

What to do
  • Before you drop a PDF, error log, or anything large, ask: "Does Claude need this entire thing, or just a section?"
500 lines pasted
500 lines x 30 messages
15,000 lines of token cost
20 lines pasted
20 lines x 30 messages
600 lines of token cost
Stay in control
#7

Watch for 30 seconds

Sometimes Claude heads in the wrong direction or gets stuck in an internal loop. You send a prompt, switch tabs, and come back to AI slop.

What to do
  • Watch Claude for 30 seconds after every prompt
  • If it starts opening files you didn't expect, or re-reading things it already looked at, hit Escape and redirect
Example

You ask Claude to fix copy on your landing page. It starts reading every subpage. Hit Escape. "No, just focus on /home."

#8

Trim CLAUDE.md

CLAUDE.md gets loaded into context on every single message. A 1,000-line file with inline docs, style guides, and API references means 1,000 lines of tokens on every message.

What to do
  • Keep CLAUDE.md under 200 lines
  • Move detailed docs into separate files
  • Use CLAUDE.md as an index that points to them
Example

Instead of pasting your entire style guide inline, add one line:

references/sales-messaging.md -- Copy frameworks, pain points. Read when writing ads or emails.
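Put together, an index-style CLAUDE.md might look like the sketch below. The project name and file paths are made up for illustration; the point is that the file stays short and tells Claude when to go read the details:

```markdown
# Project: Acme Landing Pages

## Stack
Next.js, Tailwind, Postgres. Deploy via Vercel.

## Reference index (read only when relevant)
- references/sales-messaging.md -- Copy frameworks, pain points. Read when writing ads or emails.
- references/api-conventions.md -- Endpoint naming and error format. Read before touching /api.
- references/style-guide.md -- Component and naming rules. Read before new UI work.
```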

#9

Point, don't search

A vague prompt like "find the bug and fix pls" forces Claude to freely explore, open, and read dozens of irrelevant files, burning tokens the whole time.

What to do
  • Be specific: "Check verifyUser in src/auth.js, line 42."
  • Use @filename to reference specific files
  • If you know the function name, give it. If you know the line, give it.
Example

Come up with 4 variations for our landing page headline; reference @offer.md and @customer-pain-points.md. Current headline: "[HEADLINE]"

Manage the session
#10

Compact at 60%

Claude's auto-compaction triggers at 95% capacity. By that point, the context is already suffering from "lost in the middle," where the model ignores information buried deep in a large context window.

What to do
  • Run /compact at 60% with specific instructions
  • After 3-4 compactions, the summary drifts. Ask for a session summary, run /clear, paste into a fresh chat.
Example

/compact Keep the final database schema and the updated user authentication logic

#11

The 5-minute clear

Claude's prompt cache expires after ~5 minutes of inactivity. If you step away and come back, your next message reprocesses the entire session from scratch at full token cost.

What to do
  • Don't start a large project and walk away
  • Use /compact, handoffs, or session summaries before leaving
  • When you sit back down, start a fresh session and use @ to reference the handoff
Example

You need to make lunch. Tell Claude to make a detailed handoff summary. When you come back, start fresh and reference it with @.

#12

Model = Task

Different models have wildly different token costs. Using the most expensive model for simple tasks drains usage for no reason.

What to do
  • Switch models with /model
  • Sonnet: 80% of coding work
  • Haiku: Simple formatting and summaries
  • Opus: Deep, complex architectural planning
Example

You have a massive block of raw text that needs to be formatted into JSON. Switch to Haiku, format the text, switch back to Sonnet for coding.

System-level savings
#13

Use sub-agents less

Agent workflows use 7-10x more tokens than single-agent sessions. Every sub-agent runs its own full context window as a separate instance. If it needs full context, it loads everything from scratch.

What to do
  • Use sub-agents for one-off isolated tasks (research, file scanning, test runs) that can use Haiku
Add this to CLAUDE.md

"If a task requires multi-file analysis, spawn a sub-agent using Haiku to do the reading, and return only the summarized insights to the main session."
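If your version of Claude Code supports custom sub-agents (defined as markdown files with YAML frontmatter, typically under .claude/agents/), a minimal Haiku reader could look like the sketch below. Treat the exact field names as an assumption and check the current docs for your version:

```markdown
---
name: repo-reader
description: Reads and summarizes files so the main session doesn't have to. Use for multi-file analysis.
tools: Read, Grep, Glob
model: haiku
---

You are a read-only research agent. Scan the files you are asked about
and return ONLY a short summary of the relevant findings. Never paste
whole files back; the point is to keep the main session's context small.
```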

#14

Time your sessions

Anthropic throttles how fast your session limit drains based on global server demand. Peak hours (8 AM to 2 PM EST on weekdays) will drain your limit faster than off-peak hours.

What to do
  • Save your most token-heavy tasks for afternoons, evenings, or weekends
Example

You need to run a massive, repository-wide code refactor. Instead of doing it at 10 AM on a Tuesday, wait until 4 PM so your token limit stretches further.

#15

Make CLAUDE.md learn

Your CLAUDE.md should contain stable decisions, architecture rules, and progress summaries. Think of it as the source of truth that makes every future prompt shorter.

What to do
  • Build a dynamic constitution inside your CLAUDE.md
  • Check weekly and trim anything that no longer applies
  • Keep it under 200 lines
Add this section to CLAUDE.md

## Applied Learning

When something fails repeatedly, when I have to re-explain, or when a workaround is found, add a one-line bullet here. Keep each bullet under 15 words. No explanations.

FAQ

Does /clear delete my CLAUDE.md or memory?
No. /clear only wipes your conversation history. Your CLAUDE.md, memory files, and project settings are untouched. It's safe to use between tasks.
What's the difference between /compact and /clear?
/compact summarizes your conversation into a shorter version and keeps going. /clear wipes the conversation entirely and starts fresh. Use /compact when you need to preserve context. Use /clear when you're switching tasks.
Do sub-agents share the same token limit?
Yes. Sub-agents draw from the same session budget as your main conversation. Each one spins up its own full context window, which is why they use 7-10x more tokens than doing the same work in a single session.
Does switching models reset my conversation?
No. Your conversation context carries over when you switch models with /model. You can switch to Haiku for a formatting task and back to Sonnet for coding without losing anything.