The default pattern: same system prompt, same context payload, same tool definitions, every model.

That’s expensive on the cheap models and under-utilized on the expensive ones.

What I Started Doing

Scale the context depth with the model tier:

TierWhat gets injected
HaikuJust the request. No project context. No history. No file paths. The system prompt is one paragraph.
SonnetRequest + project name + current status file + the last session’s summary. About 2K tokens of context.
OpusFull workspace context. CLAUDE.md, MEMORY.md, current status, recent logs, relevant goals. About 8K tokens.

The shape is task-driven. Haiku gets a single-shot job that doesn’t need to know who the operator is. Opus gets a multi-turn task that needs to understand the workspace.

Why It Works

Haiku is fast and cheap. You want it to do one thing without thinking too hard about it. Adding 8K tokens of context to a “summarize this email” task is wasteful. The model doesn’t need to know what your goals are to summarize an email.

Opus is slower and more expensive. It’s worth the cost because it can hold complex context and make nuanced decisions. Giving it 8K tokens of project state pays off because it’ll use that context to choose better actions.

The wrong move is using one prompt and one context for both. You get expensive Haiku calls (paying for context the model can’t fully leverage) and under-informed Opus calls (the model is capable of more but you didn’t give it the runway).

How to Implement

Two small changes:

  1. Per-task model selection. Pick the model when you queue the task, not at runtime. For batch operations (enrich 50 contacts), use Haiku. For decision-making (which contacts should we follow up with), use Opus.
  2. Per-tier context bundle. Define three context “tiers” in your prompting layer: minimal, medium, full. Map each tier to a context-assembly function. Call the function before sending the request.
def context_for(tier: str) -> str:
    if tier == "minimal":
        return ""
    if tier == "medium":
        return f"Project: {project}\nStatus: {status}\n"
    if tier == "full":
        return load_full_workspace_context()

That’s it. The model selection determines the tier, the tier determines the context bundle.

What I’d Skip

A per-task complexity classifier. The GSD project uses one (rolling-history heuristic to pick light/standard/heavy). For most workspaces, the human knows which model fits the task. Just declare it. Don’t build a classifier until you have enough volume to make it worth the engineering.

The Larger Pattern

Cheap models exist because most work doesn’t need premium reasoning. Expensive models exist because some work does. Treating them as interchangeable wastes both. Match the context depth to the model’s capacity, and your cost curve flattens without losing quality on the work that matters.