15 Ways to Stop Burning Claude Code Tokens
Cut your token usage by 10-30x with configuration and workflow changes. No extra tools. Just better habits.
People are burning through their Claude Code limits in under an hour. Max 20 users ($200/month) have reported hitting 100% in 70 minutes. Max 5 users are getting one hour of work before they're locked out. Anthropic acknowledged it publicly and called it their "top priority."
Some of that was bugs on their end (an autocompact loop was silently retrying thousands of times per session). They patched it. But the bigger problem is how most people use Claude Code. Bloated context, compounding chat history, unused MCP servers loading on every single message. That's all on you.
These 15 changes target the actual causes. Most take under a minute to implement.
Start fresh
Claude rereads your entire conversation history on every single message. Message 1 costs ~500 tokens. Message 30 costs ~15,000. The longer the chat, the more every message costs.
- Run /clear between unrelated tasks
- One conversation = one task
- If you need context from the last task, paste a summary
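The compounding is easy to underestimate. A toy model (the 500-token figure is illustrative, not Anthropic's pricing) shows why one long chat costs far more than several short ones:

```python
def session_cost(messages, tokens_per_message=500):
    """Total input tokens when every message re-reads all prior history."""
    total = 0
    history = 0
    for _ in range(messages):
        history += tokens_per_message  # this turn's new tokens join the history
        total += history               # Claude reprocesses the whole history
    return total

# One 30-message conversation vs. /clear after each 10-message task:
one_long = session_cost(30)        # 232,500 tokens
three_short = 3 * session_cost(10) # 82,500 tokens
```

Same number of messages, roughly a third of the tokens, just by clearing between tasks.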
Kill unused MCPs
Every connected MCP server loads its full tool definitions into your context on every message, whether you use it or not. Unused servers = bloat on every interaction.
- Run /mcp at the start of each session
- Review what's connected, disconnect anything you're not using
- If a CLI exists for the same thing, use that instead
GitHub (~8K) + Playwright (~13K) + Gmail (~2.6K) = ~23,600 wasted tokens per message. Over 10 messages, that's 236,000 tokens gone.
Combine prompts
Three separate back-and-forth messages cost three times as much as one combined message because chat history compounds.
- Batch multiple requests into one prompt
- If Claude gets it wrong, edit your original message and regenerate
- Edits replace. Follow-ups stack.
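A quick sketch of the math (toy numbers: assume each request is ~1,000 tokens of prompt plus response, and every turn re-reads all prior turns):

```python
def conversation_cost(turn_sizes):
    """Input tokens when each turn reprocesses every earlier turn."""
    total, history = 0, 0
    for size in turn_sizes:
        history += size
        total += history
    return total

# Three requests sent one at a time vs. the same content in one message:
separate = conversation_cost([1_000, 1_000, 1_000])  # 6,000 tokens
batched = conversation_cost([3_000])                 # 3,000 tokens
```

Same content, half the tokens, and the gap widens with every extra follow-up.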
Start with a plan
The fastest way to burn through a limit is letting Claude write hundreds of lines of code down the wrong path.
- Hit Shift+Tab twice to enter plan mode
- Let Claude map the approach and ask questions before writing a single line
- Add explicit instructions to your CLAUDE.md
"Do not make any changes until you have 95% confidence in what you need to build. Ask me follow-up questions until you reach that confidence level."
/context and /cost
Token bleeds are invisible. You might be losing 50,000 tokens per message to a bloated CLAUDE.md, skills, memory, and loaded files without realizing it.
- Run /context in a fresh session to see what's eating your context window
- Run /cost to track actual token usage and dollar spend
/context = full breakdown of what's loaded into your context window right now.
/cost = actual token usage and dollar spend for the current session.
Paste less
Pasting entire documents when Claude only needs a fraction is a massive waste. Every line gets reprocessed on every message.
- Before you drop a PDF, error log, or anything large, ask: "Does Claude need this entire thing, or just a section?"
Watch for 30 seconds
Sometimes Claude gets stuck in an internal loop. You send a prompt and switch tabs. Claude goes the wrong direction. You come back to AI slop.
- Watch Claude for 30 seconds after every prompt
- If it starts opening files you didn't expect, or re-reading things it already looked at, hit Escape and redirect
You ask Claude to fix copy on your landing page. It starts reading every subpage. Hit Escape. "No, just focus on /home."
Trim CLAUDE.md
CLAUDE.md gets loaded into context on every single message. A 1,000-line file with inline docs, style guides, and API references means 1,000 lines of tokens on every message.
- Keep CLAUDE.md under 200 lines
- Move detailed docs into separate files
- Use CLAUDE.md as an index that points to them
Instead of pasting your entire style guide inline, add one line:
references/sales-messaging.md -- Copy frameworks, pain points. Read when writing ads or emails.
Point, don't search
A vague prompt like "find the bug and fix pls" forces Claude to freely explore, open, and read dozens of irrelevant files, burning tokens the whole time.
- Be specific: "Check verifyUser in src/auth.js, line 42."
- Use @filename to reference specific files
- If you know the function name, give it. If you know the line, give it.
Come up with 4 variations for our landing page headline, reference @offer.md and @customer-pain-points.md. Current headline is: "[HEADLINE]"
Compact at 60%
Claude's auto-compaction triggers at 95% capacity. By that point, the context is already suffering from "lost in the middle," where the model ignores information buried deep in a massive context window.
- Run /compact at 60% with specific instructions
- After 3-4 compactions, the summary drifts. Ask for a session summary, run /clear, and paste it into a fresh chat.
/compact Keep the final database schema and the updated user authentication logic
The 5-minute clear
Claude's prompt cache expires after ~5 minutes of inactivity. If you step away and come back, your next message reprocesses the entire session from scratch at full token cost.
- Don't start a large project and walk away
- Use /compact, handoffs, or session summaries before leaving
- When you sit back down, start a fresh session and use @ to reference the handoff
You need to make lunch. Tell Claude to make a detailed handoff summary. When you come back, start fresh and reference it with @.
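A rough sketch of why the break is expensive. Assume cached history is billed at a tenth of the base input rate (an illustrative ratio, not a quoted price):

```python
def next_message_cost(session_tokens, new_tokens, cache_warm, cache_read_ratio=0.1):
    """Effective input-token cost of the next message."""
    if cache_warm:
        # history served from the prompt cache at a discount; only new tokens at full rate
        return session_tokens * cache_read_ratio + new_tokens
    # cache expired: the entire session is reprocessed at full rate
    return session_tokens + new_tokens

warm = next_message_cost(50_000, 500, cache_warm=True)   # 5,500 effective tokens
cold = next_message_cost(50_000, 500, cache_warm=False)  # 50,500 effective tokens
```

One lunch break, and the same 500-token question costs roughly 9x more.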
Model = Task
Different models have wildly different token costs. Using the most expensive model for simple tasks drains usage for no reason.
- Switch models with /model
- Sonnet: 80% of coding work
- Haiku: Simple formatting and summaries
- Opus: Deep, complex architectural planning
You have a massive block of raw text that needs to be formatted into JSON. Switch to Haiku, format the text, switch back to Sonnet for coding.
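The gap is bigger than it feels. A sketch with illustrative per-million-token input rates (check Anthropic's current pricing page before relying on these numbers):

```python
# Assumed input prices per million tokens -- illustrative, not official
PRICE_PER_M = {"haiku": 0.80, "sonnet": 3.00, "opus": 15.00}

def input_dollars(model, tokens):
    """Dollar cost of feeding `tokens` input tokens to `model`."""
    return tokens / 1_000_000 * PRICE_PER_M[model]

# Formatting a 100K-token blob of raw text into JSON:
opus_cost = input_dollars("opus", 100_000)   # $1.50
haiku_cost = input_dollars("haiku", 100_000) # $0.08
```

Under these assumed rates, routing the formatting job to Haiku is nearly 20x cheaper for a task that doesn't need deep reasoning.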
Use sub-agents less
Agent workflows use 7-10x more tokens than single-agent sessions. Every sub-agent runs its own full context window as a separate instance. If it needs full context, it loads everything from scratch.
- Use sub-agents for one-off isolated tasks (research, file scanning, test runs) that can use Haiku
"If a task requires multi-file analysis, spawn a sub-agent using Haiku to do the reading, and return only the summarized insights to the main session."
Time your sessions
Anthropic throttles how fast your session limit drains based on global server demand. Peak hours (8 AM to 2 PM EST on weekdays) will drain your limit faster than off-peak hours.
- Save your most token-heavy tasks for afternoons, evenings, or weekends
You need to run a massive, repository-wide code refactor. Instead of doing it at 10 AM on a Tuesday, wait until 4 PM so your token limit stretches further.
Make CLAUDE.md learn
Your CLAUDE.md should contain stable decisions, architecture rules, and progress summaries. Think of it as the source of truth that makes every future prompt shorter.
- Build a dynamic constitution inside your CLAUDE.md
- Check weekly and trim anything that no longer applies
- Keep it under 200 lines
## Applied Learning
When something fails repeatedly, when I have to re-explain, or when a workaround is found, add a one-line bullet here. Keep each bullet under 15 words. No explanations.