15 Ways to Stop Burning Claude Code Tokens
Cut your token usage by 10-30x with configuration and workflow changes. No extra tools. Just better habits.
People are burning through their Claude Code limits in under an hour. Max 20 users ($200/month) have reported hitting 100% in 70 minutes. Max 5 users are getting one hour of work before they're locked out. Anthropic acknowledged it publicly and called it their "top priority."
Some of that was bugs on their end (an autocompact loop was silently retrying thousands of times per session). They patched it. But the bigger problem is how most people use Claude Code. Bloated context, compounding chat history, unused MCP servers loading on every single message. That's all on you.
These 15 changes target the actual causes. Most take under a minute to implement.
Start fresh
Claude rereads your entire conversation history on every single message. Message 1 costs ~500 tokens. Message 30 costs ~15,000. The longer the chat, the more every message costs.
- Run /clear between unrelated tasks
- One conversation = one task
- If you need context from the last task, paste a summary
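The compounding is easy to underestimate. A toy model (the 500-token figure is illustrative, not Anthropic's pricing) shows why one long chat costs far more than several short ones:

```python
def session_cost(messages, tokens_per_message=500):
    """Total input tokens when every message re-reads all prior history."""
    total = 0
    history = 0
    for _ in range(messages):
        history += tokens_per_message  # this turn's new tokens join the history
        total += history               # Claude reprocesses the whole history
    return total

# One 30-message conversation vs. /clear after each 10-message task:
one_long = session_cost(30)        # 232,500 tokens
three_short = 3 * session_cost(10) # 82,500 tokens
```

Same number of messages, roughly a third of the tokens, just by clearing between tasks.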
Kill unused MCPs
Every connected MCP server loads its full tool definitions into your context on every message, whether you use it or not. Unused servers = bloat on every interaction.
- Run /mcp at the start of each session
- Review what's connected, disconnect anything you're not using
- If a CLI exists for the same thing, use that instead
GitHub (~8K) + Playwright (~13K) + Gmail (~2.6K) = ~23,600 wasted tokens per message. Over 10 messages, that's 236,000 tokens gone.
Combine prompts
Three separate back-and-forth messages cost three times as much as one combined message because chat history compounds.
- Batch multiple requests into one prompt
- If Claude gets it wrong, edit your original message and regenerate
- Edits replace. Follow-ups stack.
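A quick sketch of the math (toy numbers: assume each request is ~1,000 tokens of prompt plus response, and every turn re-reads all prior turns):

```python
def conversation_cost(turn_sizes):
    """Input tokens when each turn reprocesses every earlier turn."""
    total, history = 0, 0
    for size in turn_sizes:
        history += size
        total += history
    return total

# Three requests sent one at a time vs. the same content in one message:
separate = conversation_cost([1_000, 1_000, 1_000])  # 6,000 tokens
batched = conversation_cost([3_000])                 # 3,000 tokens
```

Same content, half the tokens, and the gap widens with every extra follow-up.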
Start with a plan
The fastest way to burn through a limit is letting Claude write hundreds of lines of code down the wrong path.
- Hit Shift+Tab twice to enter plan mode
- Let Claude map the approach and ask questions before writing a single line
- Add explicit instructions to your CLAUDE.md
"Do not make any changes until you have 95% confidence in what you need to build. Ask me follow-up questions until you reach that confidence level."
/context and /cost
Token bleeds are invisible. You might be losing 50,000 tokens per message to a bloated CLAUDE.md, skills, memory, and loaded files without realizing it.
- Run /context in a fresh session to see what's eating your context window
- Run /cost to track actual token usage and dollar spend
/context = full breakdown of what's loaded into your context window right now.
/cost = actual token usage and dollar spend for the current session.
Paste less
Pasting entire documents when Claude only needs a fraction is a massive waste. Every line gets reprocessed on every message.
- Before you drop a PDF, error log, or anything large, ask: "Does Claude need this entire thing, or just a section?"
Watch for 30 seconds
Sometimes Claude gets stuck in an internal loop. You send a prompt and switch tabs. Claude goes the wrong direction. You come back to AI slop.
- Watch Claude for 30 seconds after every prompt
- If it starts opening files you didn't expect, or re-reading things it already looked at, hit Escape and redirect
You ask Claude to fix copy on your landing page. It starts reading every subpage. Hit Escape. "No, just focus on /home."
Trim CLAUDE.md
CLAUDE.md gets loaded into context on every single message. A 1,000-line file with inline docs, style guides, and API references means 1,000 lines of tokens on every message.
- Keep CLAUDE.md under 200 lines
- Move detailed docs into separate files
- Use CLAUDE.md as an index that points to them
Instead of pasting your entire style guide inline, add one line:
references/sales-messaging.md -- Copy frameworks, pain points. Read when writing ads or emails.
Point, don't search
A vague prompt like "find the bug and fix pls" forces Claude to freely explore, open, and read dozens of irrelevant files, burning tokens the whole time.
- Be specific: "Check verifyUser in src/auth.js, line 42."
- Use @filename to reference specific files
- If you know the function name, give it. If you know the line, give it.
Come up with 4 variations for our landing page headline, reference @offer.md and @customer-pain-points.md. Current headline is: "[HEADLINE]"
Compact at 60%
Claude's auto-compaction triggers at 95% capacity. By that point, the context is already suffering from "lost in the middle," where the model ignores information buried deep in a massive context window.
- Run /compact at 60% with specific instructions
- After 3-4 compactions, the summary drifts. Ask for a session summary, run /clear, and paste it into a fresh chat.
/compact Keep the final database schema and the updated user authentication logic
The 5-minute clear
Claude's prompt cache expires after ~5 minutes of inactivity. If you step away and come back, your next message reprocesses the entire session from scratch at full token cost.
- Don't start a large project and walk away
- Use /compact, handoffs, or session summaries before leaving
- When you sit back down, start a fresh session and use @ to reference the handoff
You need to make lunch. Tell Claude to make a detailed handoff summary. When you come back, start fresh and reference it with @.
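A rough sketch of why the break is expensive. Assume cached history is billed at a tenth of the base input rate (an illustrative ratio, not a quoted price):

```python
def next_message_cost(session_tokens, new_tokens, cache_warm, cache_read_ratio=0.1):
    """Effective input-token cost of the next message."""
    if cache_warm:
        # history served from the prompt cache at a discount; only new tokens at full rate
        return session_tokens * cache_read_ratio + new_tokens
    # cache expired: the entire session is reprocessed at full rate
    return session_tokens + new_tokens

warm = next_message_cost(50_000, 500, cache_warm=True)   # 5,500 effective tokens
cold = next_message_cost(50_000, 500, cache_warm=False)  # 50,500 effective tokens
```

One lunch break, and the same 500-token question costs roughly 9x more.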
Model = Task
Different models have wildly different token costs. Using the most expensive model for simple tasks drains usage for no reason.
- Switch models with /model
- Sonnet: 80% of coding work
- Haiku: Simple formatting and summaries
- Opus: Deep, complex architectural planning
You have a massive block of raw text that needs to be formatted into JSON. Switch to Haiku, format the text, switch back to Sonnet for coding.
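The gap is bigger than it feels. A sketch with illustrative per-million-token input rates (check Anthropic's current pricing page before relying on these numbers):

```python
# Assumed input prices per million tokens -- illustrative, not official
PRICE_PER_M = {"haiku": 0.80, "sonnet": 3.00, "opus": 15.00}

def input_dollars(model, tokens):
    """Dollar cost of feeding `tokens` input tokens to `model`."""
    return tokens / 1_000_000 * PRICE_PER_M[model]

# Formatting a 100K-token blob of raw text into JSON:
opus_cost = input_dollars("opus", 100_000)   # $1.50
haiku_cost = input_dollars("haiku", 100_000) # $0.08
```

Under these assumed rates, routing the formatting job to Haiku is nearly 20x cheaper for a task that doesn't need deep reasoning.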
Use sub-agents less
Agent workflows use 7-10x more tokens than single-agent sessions. Every sub-agent runs its own full context window as a separate instance. If it needs full context, it loads everything from scratch.
- Use sub-agents for one-off isolated tasks (research, file scanning, test runs) that can use Haiku
"If a task requires multi-file analysis, spawn a sub-agent using Haiku to do the reading, and return only the summarized insights to the main session."
Time your sessions
Anthropic throttles how fast your session limit drains based on global server demand. Peak hours (8 AM to 2 PM EST on weekdays) will drain your limit faster than off-peak hours.
- Save your most token-heavy tasks for afternoons, evenings, or weekends
You need to run a massive, repository-wide code refactor. Instead of doing it at 10 AM on a Tuesday, wait until 4 PM so your token limit stretches further.
Make CLAUDE.md learn
Your CLAUDE.md should contain stable decisions, architecture rules, and progress summaries. Think of it as the source of truth that makes every future prompt shorter.
- Build a dynamic constitution inside your CLAUDE.md
- Check weekly and trim anything that no longer applies
- Keep it under 200 lines
## Applied Learning
When something fails repeatedly, when I have to re-explain, or when a workaround is found, add a one-line bullet here. Keep each bullet under 15 words. No explanations.