Roman Peschke @roman.peschke
Free Guide

How to Stop Claude Opus 4.7 from Burning Your Tokens in Claude Code

Every change, the two features that eat tokens, and how the Claude Code team says to fix it.

Related video
15 Ways to Stop Burning Claude Code Tokens
Session management, context control, prompt efficiency, and CLAUDE.md optimization. The full breakdown.

Opus 4.7 is the most capable model in Claude Code today, but two changes silently increase your token usage. An updated tokenizer maps the same prompts to up to 35% more tokens. And the new default effort level (xhigh) can bump you up two levels of reasoning without asking. Here's every change, what actually burns tokens, and how to fix it.

What's new in Opus 4.7
1

Instruction following got way more literal

Opus 4.7 takes your instructions at face value. Where older models interpreted prompts loosely or skipped parts entirely, 4.7 follows them exactly. This means old prompts written for 4.6 can produce unexpected results if they were vaguely worded. Re-test any prompts you rely on.

2

3x better vision

The model now accepts images up to 2,576 pixels on the long edge (~3.75 megapixels), more than 3x the previous limit. Better for reading dense screenshots, extracting data from diagrams, and anything that needs pixel-level detail. Higher-res images consume more tokens, so downsample if you don't need the extra fidelity.
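If you want to downsample before sending, the arithmetic is simple. A minimal sketch in Python, using the 2,576-pixel long-edge limit quoted above (the function name and the example sizes are illustrative, not part of any official tooling):

```python
def fit_long_edge(width: int, height: int, max_edge: int = 2576) -> tuple[int, int]:
    """Scale (width, height) down so the longer edge is at most max_edge.

    Returns the original size unchanged if it already fits.
    """
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    scale = max_edge / long_edge
    # Round to the nearest pixel; aspect ratio is preserved.
    return round(width * scale), round(height * scale)

# A 4K screenshot (3840x2160) exceeds the limit, so it gets scaled down.
print(fit_long_edge(3840, 2160))  # -> (2576, 1449)
```

Feed the resulting dimensions to whatever image tool you already use (Pillow, ImageMagick, Preview) before attaching the screenshot.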

3

Better memory across sessions

Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work and uses them to pick up new tasks with less context needed up front.

4

Response length scales to complexity

4.7 isn't as default-verbose as 4.6. You get shorter answers on simple lookups and longer ones on open-ended analysis. If you need a specific length or style, state it explicitly in your prompt. Positive examples of the voice you want work better than "Don't do this" instructions.

5

It calls tools less and reasons more

The model thinks before reaching for tools. This produces better results in many cases, but if you want more aggressive file reading or searching, tell it explicitly when and why to use tools.

6

Fewer subagents by default

Opus 4.7 is more selective about delegating to subagents. If your workflow benefits from parallel subagents (fanning out across files or independent items), spell that out in your prompt.

7

New /ultrareview command

A dedicated review session that reads through your changes and flags bugs and design issues a careful human reviewer would catch. Pro and Max Claude Code users get three free ultrareviews to try it out.

8

Auto mode expanded to Max users

A permissions option where Claude makes decisions without frequent check-ins. Toggle with Shift+Tab. Good for long-running tasks where you've given full context up front. Available in research preview for Claude Code Max users.

The two things that burn tokens
9

The updated tokenizer

Same pricing ($5 per million input tokens, $25 per million output tokens). But the same text now maps to 1.0-1.35x as many tokens, depending on content type. Your prompts didn't change, but they cost more to process.

10

xhigh effort is now the default

A new effort level between high and max, turned on for all plans automatically. If you had medium set before, you jumped two levels without doing anything. The model also thinks more at higher effort levels on later turns in longer sessions, so token usage compounds over time.

How to fix it
11

Start a new session for each task

Every user turn adds reasoning overhead. The model reasons more after each turn, especially at xhigh. Longer sessions compound the 35% tokenizer increase. Clean context per task = less waste.
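Why long sessions compound: the model reprocesses the whole conversation on every turn. A toy model in Python (the 2,000-tokens-per-turn figure is an assumption for illustration, and this counts only resent context, not the extra reasoning overhead at xhigh):

```python
def session_input_tokens(turns: int, tokens_per_turn: int = 2_000) -> int:
    """Total input tokens when the full history is resent on every turn.

    Toy model: each turn adds tokens_per_turn of new context, and turn n
    reprocesses everything that came before it.
    """
    return sum(n * tokens_per_turn for n in range(1, turns + 1))

one_long_session = session_input_tokens(10)        # 110,000 input tokens
five_fresh_sessions = 5 * session_input_tokens(2)  # 30,000 input tokens
```

Same ten turns of work, but splitting them into fresh two-turn sessions processes less than a third of the input tokens in this toy model. That's the "clean context per task" payoff before reasoning overhead is even counted.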

12

Match effort to the task

  • low / medium: Drafting a social caption. Summarizing meeting notes. Writing a product description. Cleaning up a CSV before import. Anything you could explain in one sentence.
  • high: Writing a full blog post. Building a content calendar for the month. Turning a podcast transcript into a newsletter. Repurposing a webinar into social clips with scripts. Solid projects with a clear deliverable.
  • max: Auditing your entire marketing stack for leaks. Rebuilding a broken attribution system across Meta, your CRM, and Stripe. Researching a new market and producing a full competitive analysis. Deep work where you don't know the answer yet and need Claude to go find it.

Drop a level from what you'd normally use. Low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6. Same quality, fewer tokens.

13

Specify the full task in turn 1

Treat Claude like an engineer you're delegating to, not a pair programmer you're guiding line by line. Include the intent, constraints, acceptance criteria, and relevant file locations up front. Ambiguous prompts spread across many turns reduce both token efficiency and output quality.
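One way to make that concrete is a first-turn template covering all four pieces. A sketch in Python; the helper name, section labels, and the retry-logic example task are all hypothetical, not a Claude Code feature:

```python
def turn_one_prompt(intent: str, constraints: list[str],
                    acceptance: list[str], files: list[str]) -> str:
    """Assemble a single self-contained first turn for delegation."""
    lines = [intent, "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "Acceptance criteria:"]
    lines += [f"- {a}" for a in acceptance]
    lines += ["", "Relevant files:"]
    lines += [f"- {f}" for f in files]
    return "\n".join(lines)

prompt = turn_one_prompt(
    "Add retry logic to our webhook sender.",
    ["Max 3 retries with exponential backoff", "No new dependencies"],
    ["Unit tests pass", "Failed deliveries are logged"],
    ["src/webhooks/sender.py", "tests/test_sender.py"],
)
```

Paste the assembled prompt as your first message. One well-specified turn replaces the five clarifying turns that would each have resent the whole conversation.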

14

Use adaptive thinking prompts

Fixed thinking budgets are gone. The model now decides when to think deeply and when to respond quickly. You can steer it:

  • Want more thinking: "Think carefully and step-by-step before responding; this problem is harder than it looks."
  • Want less thinking: "Prioritize responding quickly rather than thinking deeply. When in doubt, respond directly."

You save tokens on the less-thinking side but may lose accuracy on harder steps.

15

Toggle effort mid-task

You can switch between effort levels during the same task. Use xhigh for the hard part (architecture, debugging, analysis), then drop to high or medium for cleanup and simple follow-ups.

Watch out for

X
Don't port old prompts without re-testing. 4.7 follows instructions so literally that loosely written prompts break in new ways. What 4.6 interpreted loosely, 4.7 takes at face value.
X
Don't use max effort for everyday work. It overthinks and burns tokens with diminishing returns. xhigh is the sweet spot for most projects.
X
Don't assume Extended Thinking works the same. Fixed thinking budgets are not supported in 4.7. Adaptive thinking replaced it. The model decides when and how much to think based on context.
X
Higher-res images cost more tokens. 4.7 processes images at up to 3.75 megapixels by default. If you're sending screenshots and don't need pixel-level detail, downsample before sending.
Start here

Open Claude Code, check your effort level. If it's max, switch to xhigh. If you've been running one long session all day, start a fresh one for your next task. Two changes, instant savings.

Tools covered in this guide

Claude Code
Anthropic's official CLI for Claude. Available as a terminal CLI, desktop app (Mac/Windows), web app, and IDE extensions for VS Code and JetBrains.
claude.ai/code
Claude Opus 4.7
Anthropic's most capable generally available model. State-of-the-art on advanced software engineering, with particular gains on the most difficult tasks. Same pricing as Opus 4.6.
anthropic.com/news/claude-opus-4-7

FAQ

Does Claude Opus 4.7 cost more per token?
No. Same pricing as Opus 4.6: $5 per million input tokens and $25 per million output tokens. But the new tokenizer means the same text maps to up to 35% more tokens, so the effective cost per prompt can be higher.
What is xhigh effort in Claude Code?
A new effort level between high and max, introduced with Opus 4.7. It's the recommended default for most coding and agentic work. It provides strong autonomy and intelligence without the runaway token usage that max can produce on long runs.
Can I still use Extended Thinking with a fixed budget?
No. Opus 4.7 uses adaptive thinking only. The model decides when and how much to think based on context. You can prompt it to think more or less, but you can't set a fixed token budget for thinking.
Is Opus 4.7 better than Opus 4.6 at every effort level?
Yes. According to Anthropic's internal evals, Opus 4.7 outperforms 4.6 across the full effort range, with the largest gains at high and above. Low-effort 4.7 is roughly equivalent to medium-effort 4.6.