Context Window Limits

The hero shows a tube filling with tokens: safe (clear output), limit (quality drops), overflow (ignored instructions). Teams treat the context window like storage. It is working memory for one run — unevenly attended, expensive, and the first place early constraints fall off when crowded.

Mechanics companion — for vendor myths see Context Window Myths.

Northline B2B combined research, draft, and policy check in one call for support-reply-v3 until overflow dropped checker instructions on long tickets. Splitting into three steps with explicit budgets raised pass rate seven points and made finance’s token report legible per step — not one opaque blob.

Tokens are fuel, not strategy

A token is roughly a word fragment. Counting tokens helps finance and capacity planning. It does not tell you which paragraphs belong in a customer email draft. Strategy is: what is allowed, what is forbidden, what must be retrieved on demand instead of pasted — see Prompt Engineering vs AI Workflow Engineering when the gap is workflow design, not window size.

Teams that only monitor total token spend miss the operational signal: which step blew the budget and which instruction fell out of the window. Finance should see per-step lines on the canvas—research, draft, checker—not one lump sum that hides overflow in the checker call.

The three zones matter operationally

Context windows degrade unevenly. Early system instructions and edge-case rules drop first when you crowd the tube—while the model still produces fluent prose. That is why overflow feels like “the model ignored policy” even though the answer reads confidently.

Safe zone — room for task framing, policy slice, retrieved snippets, and output schema without crowding. Reviewers see stable citations and format adherence.

Limit zone — model still responds, but adherence softens. “Almost right” drift shows up in regulated fields first — dates, tiers, jurisdiction phrases.

Overflow zone — early instructions and edge constraints fall off first. Refund rules disappear while prose still sounds confident. This is the failure mode that fuels metaphors in Tokens as Fuel for AI Output — measure it with pack version hashes in logs.

Northline reproduced overflow in staging: a twelve-page ticket thread plus full policy pack pushed the checker step into overflow. The draft looked compliant; the checker never saw the VIP escalation clause. After split-call budgets, eval case #17 (long thread + VIP flag) became a permanent gate before prod promotion.

If your workflow “needs” thirty pages in one prompt, you need retrieval and summarization steps, not a bigger window. Oversized single-call context also drives context rot in multi-step agents—relevant facts dilute even when technically “in window.”

Split-call pattern — research, draft, checker

One job per call. Each step gets a token budget and a policy pack version — not a growing mega-message.

Step	Job	Max tokens (example)	Policy pack	Output
Research	Retrieve approved snippets	8,000	`retrieval-v2`	Chunk IDs + summary
Draft	Compose customer reply	12,000	`policy-v4` + research summary	Draft JSON
Checker	Verify tiers, dates, denials	6,000	`checker-rules-v1` + draft	Pass/fail + cites

Northline’s budgets are illustrative — your limits depend on model and schema. The discipline is fixed ceilings per step recorded on the AI Workflow Canvas, with eval cases that fail when any step exceeds budget and drops instructions.

Staying in the safe zone

Version policy packs; do not append ad hoc rules in chat. Measure quality vs token count on held-out cases — if pass rate falls while tokens rise, you are in limit or overflow territory. When humans must sign off, freeze context before review per Handoff Rules Between Humans and AI.

Windows sit inside architecture. Grounding AI Outputs unifies retrieval and verification; What Is Context Architecture explains layers; Context Rot covers when bigger windows degrade agent quality across many steps.

Context Window Limits

Tokens are fuel, not strategy

The three zones matter operationally

Split-call pattern — research, draft, checker

Staying in the safe zone

Move from pilot to program

Continue learning

Grounding AI Outputs

Evaluating Agents with CLEAR

RAG in Production