Claude's Usage Limits, Explained: Weekly Quotas, Extended Thinking, and the Opacity Engine Behind the Max Plan
In February 2026, a prompt caching bug forced Anthropic to reset every Claude Code user's weekly limits, exposing the opacity of its dual-cap token system. This piece dissects weekly quotas, Extended Thinking costs, Max plan capture, and Extra Usage pricing through legal and political economy lenses.
On February 27, 2026, Anthropic reset the weekly usage limits for every Claude Code user on the planet. A prompt caching bug had been burning through tokens at two to three times the normal rate — that much was officially confirmed. Even subscribers on the Max 20x plan, shelling out $200 a month, hit their ceiling within days. Some were locked out for an entire week. The apology came in the form of a reset. But what that reset revealed was far more structural than any apology could address.
A year ago, RayLogue published "The Truth About Claude's Usage Limits (2025)," calling out Anthropic's opaque usage policies. In 2026, those policies haven't become more transparent — they've become more sophisticated. A dual-limit system built on token consumption rather than message counts. Extended Thinking, a powerful but expensive new capability. And "Extra Usage," a pay-as-you-go layer stacked on top of your subscription. One question cuts through all of it: Who designs the price of an AI tool, and by what logic?
August 2025: The Rules of the Game Changed
The story begins on August 28, 2025. That's when Anthropic rolled out weekly rate limits across all paid plans — Pro and Max alike. On top of the existing five-hour rolling window, a second barrier emerged: a seven-day cap tracking total token consumption.
Anthropic's official explanation was straightforward. A small number of power users were running Claude Code "24 hours a day in the background," and policy violations like account sharing and reselling were straining system capacity. The restriction, they said, would affect less than 5 percent of all subscribers.
But that 5 percent figure creates a statistical illusion. Anthropic's denominator is all subscribers — a pool that includes casual users who log in once a month alongside full-time developers who live inside Claude Code. When you set the denominator that wide, any restriction looks like it only hits a fringe minority. Narrow the frame to power users who actually depend on AI as a daily work tool, and that 5 percent could easily exceed 50. Data from the South Korean outlet Sisajournal backs this up: 46 percent of Max $200 subscribers were using less than 20 percent of their allotted capacity. Flip that around, and 54 percent were using more — the exact cohort most likely to slam into the ceiling.
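To see how much work the denominator is doing, run the arithmetic with illustrative figures. The subscriber mix below is an assumption for the sake of the example; Anthropic has published no such breakdown:

```python
# Back-of-the-envelope sketch of the denominator effect.
# The subscriber mix is an illustrative assumption, not Anthropic's data.

total_subscribers = 1_000_000
power_user_share = 0.08           # assume 8% are daily heavy users
restricted_share_of_total = 0.05  # Anthropic's "less than 5%" framing

power_users = total_subscribers * power_user_share
restricted = total_subscribers * restricted_share_of_total

# If the restriction falls almost entirely on heavy users, the share
# of *power users* affected tells a very different story:
print(f"as share of all subscribers: {restricted / total_subscribers:.0%}")  # 5%
print(f"as share of power users:     {restricted / power_users:.0%}")        # 62%
```

Same restriction, same people, two headlines: one says 5 percent, the other says 62.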
The deeper issue is structural. As models grow more powerful and Extended Thinking becomes a default workflow, token consumption will inevitably rise. Today's 5 percent is tomorrow's 15. Anthropic knows this trajectory. They published a snapshot, not a trend line.
This is the digital divide, remixed for the AI era. The old digital divide was about whether you could get online at all. The new one is about how deeply you can use the same tool. A $20 Pro subscriber and a $200 Max subscriber access the same models, but with a 20x gap in usage capacity. In an age where AI is the productivity engine, that gap translates directly into a productivity gap — and ultimately, an opportunity gap.
The Dual-Limit Architecture: Where the 5-Hour Window Meets the Weekly Quota
As of March 2026, Claude's usage limits operate on two simultaneous time axes.
The first is the five-hour rolling window. It tracks token consumption from the moment you send your first message. According to Anthropic's support documentation, this window doesn't reset at a fixed time — it slides continuously. As your oldest messages age past the five-hour mark, capacity frees up incrementally. Rough ceilings for this window: about 45 messages on Pro with Sonnet, 225 on Max 5x, and 900 on Max 20x.
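The mechanics matter because a sliding window has no fixed reset moment to plan around. Here is a minimal sketch of the behavior, assuming each request's tokens age out individually; Anthropic has not published its actual accounting, so treat this as illustrative only:

```python
from collections import deque
import time

class SlidingWindowLimiter:
    """Illustrative 5-hour rolling token window.

    Assumption: each request's tokens expire individually once that
    request is older than the window. Anthropic's real accounting is
    unpublished; this sketch only demonstrates the sliding behavior.
    """

    def __init__(self, budget_tokens: int, window_seconds: int = 5 * 3600):
        self.budget = budget_tokens
        self.window = window_seconds
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)

    def _expire(self, now: float) -> None:
        # Capacity frees up incrementally as the oldest requests age out.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def try_spend(self, tokens: int) -> bool:
        now = time.time()
        self._expire(now)
        used = sum(t for _, t in self.events)
        if used + tokens > self.budget:
            return False  # would exceed the rolling cap
        self.events.append((now, tokens))
        return True
```

Note what falls out of the design: when capacity returns depends entirely on when you spent it, which is why no two users hit the wall, or escape it, at the same time.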
The second is the weekly quota. It monitors total usage over a seven-day cycle. Exceed it, and you're locked out until the quota resets. The critical detail: Anthropic tracks tokens, not messages. A single message sent on the 50th turn of a conversation carries the entire conversation history, consuming orders of magnitude more tokens than the first message in a fresh thread.
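The arithmetic of resent history is worth making concrete. A rough model, assuming each turn adds about 800 tokens of context (an arbitrary figure) and ignoring prompt caching entirely:

```python
# Rough model of why late-conversation messages cost so much.
# Assumes ~800 tokens of history per turn (arbitrary illustrative
# figure); ignores prompt caching, system prompts, and tool output.

TOKENS_PER_TURN = 800

def input_tokens_at_turn(n: int) -> int:
    # Turn n resends all n-1 prior turns plus the new message.
    return n * TOKENS_PER_TURN

for turn in (1, 10, 50):
    print(f"turn {turn:>2}: ~{input_tokens_at_turn(turn):,} input tokens")
# turn  1: ~800 input tokens
# turn 10: ~8,000 input tokens
# turn 50: ~40,000 input tokens

# The drain on a weekly quota grows quadratically with thread length:
total = sum(input_tokens_at_turn(k) for k in range(1, 51))
print(f"whole 50-turn thread: ~{total:,} input tokens")  # ~1,020,000
```

Under this toy model, one long-running thread bills over a million input tokens, while fifty fresh one-off questions would bill forty thousand. Same number of answers, roughly a 25x difference in consumption.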
The combined effect is anything but simple. You can have headroom in the five-hour window but hit the weekly quota, or vice versa. Users are forced to constantly estimate their own consumption against two invisible thresholds. There's no token-level usage dashboard. A warning appears as you approach the limit, but there's no way to see your current consumption or remaining balance in real time.
Can we call this a transparent policy?
The answer is obvious. The opacity isn't a bug — it's a feature. Unpredictable usage ceilings give Anthropic operational flexibility: the ability to dynamically adjust limits based on server load without any obligation to show users the math.
From a legal standpoint, this is a duty-of-disclosure problem. In a paid subscription contract, the service provider is expected to clearly communicate the core terms of the agreement. Operating a token-based dual-limit system without publishing the specific numerical thresholds brushes up against consumers' right to know — a principle enshrined in Korean consumer protection law, and one with analogs in virtually every jurisdiction. The civil law doctrine of good faith requires contracting parties to respect each other's reasonable expectations. Charging $20 or $200 a month while refusing to provide detailed usage breakdowns doesn't meet that standard.
The Extended Thinking Paradox: The Smarter It Gets, the Faster You Run Out
Extended Thinking, first introduced in February 2025 alongside Claude 3.7 Sonnet and later expanded to all models, allows Claude to run a lengthy internal reasoning process before generating a response. The quality leap for complex coding, mathematical proofs, and multi-step analysis is dramatic. According to Anthropic's API documentation, the thinking token budget starts at a minimum of 1,024 tokens, and Claude Opus 4.6 introduced an adaptive mode where the model autonomously decides how many thinking tokens to burn.
The problem is that every one of those thinking tokens counts against your usage limit. A single Extended Thinking query can consume the token equivalent of dozens of regular messages. Firing up /think high in Claude Code to make a complex architectural decision is powerful — but that one interaction can blow through a significant chunk of your weekly quota.
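For readers who want to see where those tokens go, here is a minimal sketch using the public Messages API. The thinking parameter and its 1,024-token minimum budget are documented; the model id is a placeholder of mine, so verify it against Anthropic's current model list before running anything:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",  # placeholder id; check the current model list
    max_tokens=16000,
    # Extended Thinking: budget_tokens has a documented minimum of 1,024.
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "Weigh the trade-offs of event sourcing here."}
    ],
)

# Thinking tokens are billed as output tokens, so they draw down the
# same allowance as the visible answer does.
print(response.usage.input_tokens, response.usage.output_tokens)
```

A 4,096-token thinking budget on top of a long answer can make one query cost what a dozen plain ones would, which is exactly the trade described below.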
Here's the structural paradox. Anthropic markets Extended Thinking as an innovation in "deeper reasoning." But the cost of that innovation is deducted from the user's allowance. The smarter the model gets, the fewer times you can afford to use that intelligence. This isn't technological progress — it's a business model that locks technological progress behind a price gate.
There's a deeper philosophical layer. This is a textbook case of instrumental inversion. AI was supposed to be a tool that expands human thinking. But the token economics of Extended Thinking reverse that relationship. Users find themselves asking, "Is this question worth using Extended Thinking on?" every single time. The tool isn't expanding the user's cognition — the tool's cost structure is pre-censoring it. Technology becomes the master; the human petitions the tool for permission. The real paradox Extended Thinking introduced isn't about performance. It's about agency.
Anthropic's recommended workaround is revealing. "Use /think low or /think off for simple tasks, default to Sonnet, and save Opus for truly complex decisions." In other words: here's the most powerful model we've ever built — please don't use it freely. It's like a restaurant putting a tasting menu on the card and whispering, "We'd really recommend the lunch special."
December's Gift, January's Rage: The Cognitive Illusion of the Holiday Bonus
From December 25 to 31, 2025, Anthropic doubled everyone's usage limits as a holiday promotion. When limits snapped back to normal in January 2026, the reaction was volcanic. The Register reported that developers flooded Anthropic's Discord claiming "token usage limits dropped by roughly 60 percent," with some alleging their complaints were deleted by channel moderators. Anthropic countered that it was simply the expiration of a temporary bonus, not a reduction. Users didn't buy it.
GitHub Issue #17084 presented more specific evidence. After the weekly limit reset on January 8, 2026, Opus 4.5's usage ceiling reportedly fell to its lowest level since the model's November 2025 launch. Whether that was a residual effect of the holiday bonus or an actual downward adjustment is something only Anthropic knows for certain. And that's precisely the problem.
The holiday promotion was effective marketing, but it was poison for policy transparency. Users who experienced temporarily elevated limits internalized them as the new baseline. The reversion was numerically identical to the pre-holiday state, but psychologically it registered as a cut. Anthropic couldn't have been unaware of this cognitive mechanism. If they proceeded anyway, a reasonable inference is that the promotion's real purpose wasn't gratitude — it was upsell. Users frustrated by post-holiday limits were structurally more likely to upgrade to Max. The February prompt caching reset follows the same playbook: temporary boost, reversion, perceived loss. Each cycle ratchets up the psychological pressure to migrate to a more expensive plan.
February's Bug Illuminates the System's Blind Spots
Then came February. A race condition bug in Claude Code versions 2.1.59 through 2.1.61 broke prompt caching. A collision between Auto Memory and Context Compaction caused identical tasks to consume two to three times the normal tokens. Anthropic patched it in v2.1.62 and reset everyone's weekly limits.
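To see how a broken cache lands in the two-to-three-times range, an illustrative model helps. The multipliers below follow Anthropic's published prompt-caching prices (cache writes at roughly 1.25x base input, cache reads at roughly 0.1x); how the weekly quota maps onto those multipliers is an assumption, since that accounting isn't public:

```python
# Illustrative model of a cache failure inflating token consumption.
# Cache write/read multipliers (1.25x / 0.1x) follow Anthropic's
# published prompt-caching prices; mapping them onto quota accounting
# is an assumption, not a documented fact.

PREFIX = 10_000   # shared context: CLAUDE.md, file contents, history
FRESH = 2_000     # genuinely new tokens per request
REQUESTS = 10     # identical tasks in one session

with_cache = (PREFIX * 1.25 + FRESH) + (REQUESTS - 1) * (PREFIX * 0.1 + FRESH)
without_cache = REQUESTS * (PREFIX + FRESH)

print(f"cache working: ~{with_cache:,.0f} effective input tokens")    # ~41,500
print(f"cache broken:  ~{without_cache:,.0f} effective input tokens") # ~120,000
print(f"inflation:     ~{without_cache / with_cache:.1f}x")           # ~2.9x
```

A roughly 2.9x multiplier on identical work, invisible to the person doing it.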
Bugs happen. The quick fix and compensatory reset were appropriate. But the deeper issue this incident exposed is more fundamental: users had no way to tell whether their tokens were being consumed normally or abnormally. Without a usage dashboard, without a real-time interface showing remaining tokens, the difference between a bug-induced drain and legitimately heavy use is invisible to the end user.
The cost of that invisibility falls on users, not on Anthropic. During the days before the bug was identified, users who hit their limits were likely blaming their own habits — or weighing an upgrade to Max. A system flaw gets internalized as self-censorship. That's how an opaque usage policy actually works in practice.
The Max Plan: Solution or Structural Capture?
Anthropic's official answer to all of this is the Max plan. The $100/month 5x tier and $200/month 20x tier offer five and twenty times the Pro plan's usage, respectively. For Claude Code, Anthropic claims the Max $200 plan provides 240 to 480 hours per week on Sonnet 4, or 24 to 40 hours on Opus 4.
But the trap is in the baseline model. Those 240-to-480-hour numbers are for Sonnet. Switch to Opus, and you're down to 24 to 40 hours per week. The ceiling — 40 hours — is exactly one full-time work week. The floor — 24 hours — is three days. The width of that range is itself a problem. Users can't predict whether they'll land closer to 24 or 40. Turn on Extended Thinking, and the number shrinks further.
Then there's Extra Usage. Once you exceed your weekly limit, you can purchase additional tokens at standard API rates: $5 per million input tokens and $25 per million output tokens for Opus 4.6. This is a hybrid model — a subscription fee topped with metered overages — which means Max was never "unlimited." It's a higher base rate plus pay-per-use on top.
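At those rates, the overage math is easy to run and uncomfortable to read. The token volumes below are assumptions for illustration:

```python
# One heavy day beyond the weekly cap, at the Opus rates quoted above.
# Token volumes are illustrative assumptions.

INPUT_RATE = 5 / 1_000_000    # USD per input token
OUTPUT_RATE = 25 / 1_000_000  # USD per output token

input_tokens = 4_000_000   # long Claude Code session resending large contexts
output_tokens = 300_000    # generated code plus Extended Thinking output

overage = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"Extra Usage for the day: ${overage:,.2f}")  # $27.50
```

One assumed day of overage costs more than an entire month of Pro.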
Anthropic's product lead Scott White, in a TechCrunch interview, responded to a question about a potential $500+ premium tier with: "We'll always keep a number of exploratory options available to us." Not a commitment, but not a denial. The non-answer itself signals that the price curve hasn't peaked.
From a political economy perspective, the combination of Max and Extra Usage is a mechanism of structural capture. The deeper users integrate Claude into their workflows, the higher their switching costs climb. Once your codebase is optimized for Claude Code, your prompt library is built up, and your team's habits are reorganized around Claude, you can't easily leave — even if you're unhappy with the limits. In that context, "buy more tokens at API rates" isn't liberation. It's deeper lock-in. The hybrid pricing model gradually converts subscribers from flat-rate customers into metered customers, and that conversion always points toward Anthropic's revenue maximization.
Zoom out further, and the price of AI tools isn't driven by users' willingness to pay — it's driven by the cost-pass-through structure of the GPU arms race. Anthropic, OpenAI, and Google are pouring staggering sums into securing Nvidia's H100 and H200 chips. The pressure to recoup those investments flows directly into end-user pricing. The political economy behind the Max plan is this: the cost of the GPU power struggle is being transferred to individual subscribers through subscription fees and token prices.
The Korean Community Reacts: 'I'm Paying $200 a Month for This?'
The response from South Korean user communities and media illustrates the structural problem at the level of lived experience. On Clien, one of Korea's major tech forums, critical discussions about Max plan policy changes have been ongoing. The complaint is specific: "I'm paying $200 — roughly 300,000 Korean won — and if I use it in short bursts throughout the day, I hit the limit by mid-month." News1 ran the headline: "'I paid and you're telling me I can only use it for 30 minutes' — Claude Token Controversy Tanks Anthropic's Trust."
The data is more damning than the anecdotes. According to Sisajournal's analysis, 26 percent of Max $200 subscribers were using only 0 to 10 percent of their allotted capacity, and another 20 percent sat in the 10 to 20 percent range. Nearly half of the highest-tier subscribers weren't utilizing even a fifth of what they were paying for. This isn't emotional hyperbole from a forum — it's a structural reality backed by numbers.
Some users have gone further, claiming that Pro plan ($20/month) limits have felt lower since the Max plan launched, raising suspicions of a deliberate upsell strategy. Anthropic denies it. But the gap between perceived limits and official limits is itself evidence of policy opacity. In a system where users can't verify their own experience against hard data, distrust doesn't require conspiracy — it's generated structurally.
Who Designs the Price of a Tool?
In last year's column, I wrote: "AI is our partner, not our master." That proposition still holds, but 2026 demands more complex questions.
Anthropic brands itself as the safe AI company. But safety and usage limits are different conversations. To be fair, Anthropic's technical constraints are real. The GPU cluster costs for running large language models are astronomical. Server loads spike unpredictably by time zone and usage pattern. Without dynamic throttling, the entire service could destabilize. Blocking 24-hour unattended runs and shutting down account sharing are reasonable measures for system integrity. The argument in this piece was never about whether limits should exist. It's about how those limits operate. Managing paid subscribers' usage opaquely while layering metered charges on top isn't a safety logic — it's a revenue logic. Technical constraints don't justify opacity.
The real issue isn't the absolute price level. Opus 4.6 is expensive to run — understood. The real issue is structural opacity. Users can't precisely measure how much they've used, can't predict how much they have left, and can't verify why they hit the wall when they do. The prompt caching bug proved that this opacity isn't just inconvenient — it can cause tangible harm.
Synthesize the layers of analysis in this piece, and a single architecture emerges. Legally, it's a subscription contract that fails basic disclosure obligations and good-faith principles. Philosophically, it's an instrumental inversion where the tool disciplines the user. Socially, a "less than 5 percent" statistical illusion masks a new digital divide forming along AI usage depth. And in political economy terms, the costs of the GPU arms race are being passed through to end users via a capture structure. Four layers of analysis converge on the same point: opacity isn't a UX problem. It's a structural design that cuts across legal, ethical, social, and economic dimensions.
What we need isn't just a "multi-AI strategy" — that's a personal-level response. At the structural level, what we need are transparency standards for AI usage policies. Real-time token balance displays. Detailed consumption logs. Advance notice of limit changes. None of this is technically impossible. AWS, GCP, and Azure have offered usage dashboards and billing alerts for years.
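To underline how mundane the ask is, here is a purely hypothetical sketch of what a token-balance response could look like. No such Anthropic endpoint exists; every field name below is invented:

```python
# Purely hypothetical: the kind of payload a usage endpoint could
# return. No such Anthropic API exists; all field names are invented.

usage_snapshot = {
    "five_hour_window": {
        "tokens_used": 812_440,
        "tokens_remaining": 387_560,
        "oldest_request_expires_at": "2026-03-02T14:05:00Z",
    },
    "weekly_quota": {
        "tokens_used": 9_450_000,
        "tokens_remaining": 2_550_000,
        "resets_at": "2026-03-05T00:00:00Z",
    },
    "alerts": [
        {"threshold": 0.8, "channel": "email"},  # warn at 80% consumption
    ],
}
```

Two counters, two timestamps, an alert threshold. Cloud platforms have shipped equivalents for years.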
If AI is a tool that extends our thinking, then its price and limits become barriers that constrain it. The authority to set those barriers sits with a single company's infrastructure team, and the rationale behind their decisions isn't shared with users. That's the stubborn fact of the 2026 AI tool market.
And stubborn facts can't be wished away for interpretive convenience. Given that fact, we can do two things. First, maintain the habit of thinking without the tool. The moment AI stops and your thinking stops with it, the instrumental inversion is complete. The ability to think without AI, to treat a rate limit not as anxiety but as a window for your own cognition, is intellectual self-reliance in the age of AI. Second, demand transparency from the people who build the tools. Not as individual grievance, but as a fairness standard for digital services. Let's be honest: we can't afford to give up on either.