How to Stop Claude Opus 4.7 from Burning Your Tokens in Claude Code
Every change, the two features that eat tokens, and how the Claude Code team says to fix it.
Opus 4.7 is the most capable model in Claude Code today, but two changes silently increase your token usage. An updated tokenizer maps the same prompts to up to 35% more tokens. And the new default effort level (xhigh) raised your reasoning level automatically; if you had medium set, you jumped two levels without asking. Here's every change, what actually burns tokens, and how to fix it.
Instruction following got way more literal
Opus 4.7 takes your instructions at face value. Where older models interpreted prompts loosely or skipped parts entirely, 4.7 follows them exactly. This means old prompts written for 4.6 can produce unexpected results if they were vaguely worded. Re-test any prompts you rely on.
3x better vision
The model now accepts images up to 2,576 pixels on the long edge (~3.75 megapixels), more than 3x the previous limit. Better for reading dense screenshots, extracting data from diagrams, and anything that needs pixel-level detail. Higher-res images consume more tokens, so downsample if you don't need the extra fidelity.
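If you want to downsample before attaching, the target dimensions are simple to compute. This is a minimal pure-math sketch using the 2,576px long-edge cap from above; pass the result to whatever image library you use for the actual resize:

```python
def fit_long_edge(width: int, height: int, max_long_edge: int = 2576) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_long_edge.

    Preserves aspect ratio; images already within the limit are unchanged.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    # round() keeps the result close to the true aspect ratio
    return round(width * scale), round(height * scale)

# A 4000x3000 screenshot exceeds the long-edge limit:
print(fit_long_edge(4000, 3000))  # (2576, 1932)
```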
Better memory across sessions
Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work and uses them to pick up new tasks with less context needed up front.
Response length scales to complexity
4.7 isn't as default-verbose as 4.6. You get shorter answers on simple lookups and longer ones on open-ended analysis. If you need a specific length or style, state it explicitly in your prompt. Positive examples of the voice you want work better than "Don't do this" instructions.
It calls tools less and reasons more
The model thinks before reaching for tools. This produces better results in many cases, but if you want more aggressive file reading or searching, tell it explicitly when and why to use tools.
Fewer subagents by default
Opus 4.7 is more selective about delegating to subagents. If your workflow benefits from parallel subagents (fanning out across files or independent items), spell that out in your prompt.
New /ultrareview command
A dedicated review session that reads through your changes and flags bugs and design issues a careful human reviewer would catch. Pro and Max Claude Code users get three free ultrareviews to try it out.
Auto mode expanded to Max users
A permissions option where Claude makes decisions without frequent check-ins. Toggle with Shift+Tab. Good for long-running tasks where you've given full context up front. Available in research preview for Claude Code Max users.
The updated tokenizer
Same pricing ($5 per million input tokens, $25 per million output tokens). But the same text now maps to 1.0-1.35x as many tokens, depending on content type. Your prompts didn't change, but they cost more to process.
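The arithmetic is worth seeing once. A quick sketch using the input pricing above, with the 1.35x worst-case inflation factor:

```python
INPUT_PRICE_PER_TOKEN = 5.00 / 1_000_000  # $5 per million input tokens

def input_cost(tokens: int, inflation: float = 1.0) -> float:
    """Dollar cost of a prompt's input tokens, scaled by tokenizer inflation (1.0-1.35x)."""
    return tokens * inflation * INPUT_PRICE_PER_TOKEN

# A 100k-token prompt, before and at worst-case inflation:
print(input_cost(100_000))        # 0.5   ($0.50)
print(input_cost(100_000, 1.35))  # 0.675 ($0.675, 35% more)
```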
xhigh effort is now the default
A new effort level between high and max, turned on for all plans automatically. If you had medium set before, you jumped two levels without doing anything. The model also thinks more at higher effort levels on later turns in longer sessions, so token usage compounds over time.
Start a new session for each task
Every user turn adds reasoning overhead, and the model reasons more on later turns, especially at xhigh. Longer sessions also compound the up-to-35% tokenizer increase. Clean context per task means less waste.
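To see why long sessions cost more, consider that each turn's request carries the full conversation history as input. This is an illustrative model only (real Claude Code accounting includes caching and reasoning overhead the sketch ignores), but it shows the shape of the compounding:

```python
def session_input_tokens(turn_sizes: list[int]) -> int:
    """Total input tokens when every turn re-sends the full history.

    Illustrative model: turn N's request includes all earlier turns plus
    the new one, so input cost grows quadratically with session length.
    """
    total = 0
    history = 0
    for size in turn_sizes:
        history += size
        total += history
    return total

# Five 2k-token tasks in one session vs. five fresh sessions:
print(session_input_tokens([2_000] * 5))                      # 30000
print(sum(session_input_tokens([2_000]) for _ in range(5)))   # 10000
```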
Match effort to the task
| Effort level | Use it for |
| --- | --- |
| low / medium | Drafting a social caption. Summarizing meeting notes. Writing a product description. Cleaning up a CSV before import. Anything you could explain in one sentence. |
| high | Writing a full blog post. Building a content calendar for the month. Turning a podcast transcript into a newsletter. Repurposing a webinar into social clips with scripts. Solid projects with a clear deliverable. |
| xhigh | Building a landing page from a sales call transcript. Creating a multi-step onboarding automation. Analyzing a month of ad data and recommending budget shifts. Writing and deploying a lead magnet. Projects where Claude needs to read, think, and make judgment calls. |
| max | Auditing your entire marketing stack for leaks. Rebuilding a broken attribution system across Meta, your CRM, and Stripe. Researching a new market and producing a full competitive analysis. Deep work where you don't know the answer yet and need Claude to go find it. |
Drop a level from what you'd normally use. Low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6. Same quality, fewer tokens.
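The "drop a level" rule is mechanical enough to write down. A small sketch of the effort ladder as described in this article (the level names come from the table above; the helper itself is hypothetical, not a Claude Code API):

```python
# Effort ladder from lowest to highest, per the table above
EFFORT_LADDER = ["low", "medium", "high", "xhigh", "max"]

def drop_a_level(effort: str) -> str:
    """Return one effort level below the given one; 'low' has no lower level."""
    i = EFFORT_LADDER.index(effort)
    return EFFORT_LADDER[max(i - 1, 0)]

print(drop_a_level("xhigh"))  # high
print(drop_a_level("low"))    # low
```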
Specify the full task in turn 1
Treat Claude like an engineer you're delegating to, not a pair programmer you're guiding line by line. Include the intent, constraints, acceptance criteria, and relevant file locations up front. Ambiguous prompts spread across many turns reduce both token efficiency and output quality.
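One way to make turn-1 completeness a habit is to fill the same four slots every time. A sketch of that structure (the function name and example values are hypothetical; the slots are the ones listed above):

```python
def delegation_prompt(intent: str, constraints: list[str],
                      acceptance: list[str], files: list[str]) -> str:
    """Assemble a single turn-1 prompt with intent, constraints,
    acceptance criteria, and file locations up front."""
    return "\n".join([
        f"Goal: {intent}",
        "Constraints:",
        *[f"- {c}" for c in constraints],
        "Acceptance criteria:",
        *[f"- {a}" for a in acceptance],
        f"Relevant files: {', '.join(files)}",
    ])

prompt = delegation_prompt(
    "Add retry logic to the webhook handler",
    ["no new dependencies", "keep the public API unchanged"],
    ["existing tests pass", "a failed delivery retries 3 times"],
    ["src/webhooks.ts", "tests/webhooks.test.ts"],
)
print(prompt)
```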
Use adaptive thinking prompts
Fixed thinking budgets are gone. The model now decides when to think deeply and when to respond quickly. You can steer it:
- Want more thinking: "Think carefully and step-by-step before responding; this problem is harder than it looks."
- Want less thinking: "Prioritize responding quickly rather than thinking deeply. When in doubt, respond directly."
You save tokens on the less-thinking side but may lose accuracy on harder steps.
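If you send prompts programmatically, the steering lines above are easy to standardize. A trivial sketch (the helper is hypothetical; the two steering strings are quoted verbatim from the bullets above):

```python
MORE_THINKING = ("Think carefully and step-by-step before responding; "
                 "this problem is harder than it looks.")
LESS_THINKING = ("Prioritize responding quickly rather than thinking deeply. "
                 "When in doubt, respond directly.")

def steer(prompt: str, deep: bool) -> str:
    """Prepend the appropriate thinking-steering line to a task prompt."""
    prefix = MORE_THINKING if deep else LESS_THINKING
    return f"{prefix}\n\n{prompt}"

print(steer("Refactor the auth module.", deep=True))
```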
Toggle effort mid-task
You can switch between effort levels during the same task. Use xhigh for the hard part (architecture, debugging, analysis), then drop to high or medium for cleanup and simple follow-ups.
Two quick fixes
Open Claude Code and check your effort level. If it's max, switch to xhigh. If you've been running one long session all day, start a fresh one for your next task. Two changes, instant savings.