System Prompts Don’t Get Shorter. They Shift. – Rajiv Shah

I keep hearing the model labs tell a story: as models get better, you need to give them less direction, and the coding agent harnesses around them should get simpler.

I like that story; it sounds so intuitive. But like any good engineer, I also do not quite trust it.

So I pulled together a small visual experiment around Claude’s published system prompts. The original version looked at Opus 4 through Opus 4.7. I have now extended it to the latest Opus release I could verify, Claude Opus 4.8, plus Claude Fable 5.

The short version: some prompt complexity really does move into training. But the total system does not simply get smaller. Complexity moves into safety policy, tool discovery, product context, memory, skills, and app-specific harnesses.


What changed

Opus 4, May 2025: 1,714 words in the first Claude 4 prompt snapshot in this comparison.

Opus 4.7, Apr 2026: 3,686 words, after safety, agentic behavior, and product scaffolding expanded.

Opus 4.8, May 2026: 2,888 words in the latest Opus prompt body extracted from Anthropic's docs.

The most interesting update is that the curve is no longer going up. For a while, Opus system prompts grew from 1,714 words to 3,686 words in less than a year. The newer Opus 4.8 and Fable 5 prompts are shorter. But digging into the content shows that the guidance has shifted rather than disappeared.
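As a quick check on that curve, here is the arithmetic using only the three Opus word counts reported in this post. Fable 5 is left out because no word count is given for it here. This is a minimal sketch; the constant names and the `pct_change` helper are illustrative, not part of the original experiment.

```python
# Word counts for the Opus snapshots, as reported in this post.
OPUS_4 = 1_714    # May 2025
OPUS_4_7 = 3_686  # Apr 2026
OPUS_4_8 = 2_888  # May 2026


def pct_change(old: int, new: int) -> float:
    """Percent change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)


growth = pct_change(OPUS_4, OPUS_4_7)    # Opus 4 -> Opus 4.7
shrink = pct_change(OPUS_4_7, OPUS_4_8)  # Opus 4.7 -> Opus 4.8
net = pct_change(OPUS_4, OPUS_4_8)       # Opus 4 -> Opus 4.8

print(f"Opus 4 -> 4.7:   {growth:+}%")  # Opus 4 -> 4.7:   +115.1%
print(f"Opus 4.7 -> 4.8: {shrink:+}%")  # Opus 4.7 -> 4.8: -21.6%
print(f"Opus 4 -> 4.8:   {net:+}%")     # Opus 4 -> 4.8:   +68.5%
```

So even after the roughly 22% drop from Opus 4.7, the Opus 4.8 prompt body is still about 68% longer than the Opus 4 snapshot.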

The pieces that moved

The behavior patches are the easiest part to understand. Older prompts carried little runtime hacks: count letters carefully, restate puzzle constraints, do not over-apologize, avoid certain linguistic tics. Those are exactly the sort of instructions you would expect to move into training as the model gets better.

The structural parts are stickier. Safety instructions remain large. Product and model identity keep changing as the product surface changes. Tool discovery becomes explicit in Opus 4.8: before Claude says it cannot do something, it is told to check for deferred tools, personal context, and skill files.

This is Anthropic making product decisions. They are deciding what context and capabilities are visible, when to reveal them, and how Claude should behave when something might exist outside the visible prompt.
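The ordering Opus 4.8 is told to follow (check deferred tools, personal context, and skill files before saying it cannot do something) can be illustrated with a hypothetical harness-side sketch. Nothing here is Anthropic's implementation. The `Harness` class, its field names, the `respond` helper, the example tool and skill path, and the substring matching are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical harness state; every field name here is illustrative."""

    # Tools whose schemas are already in the visible prompt.
    loaded_tools: set[str] = field(default_factory=set)
    # Tools that exist but are only loaded on demand (name -> description).
    deferred_tools: dict[str, str] = field(default_factory=dict)
    # Facts about the user kept outside the visible prompt (key -> value).
    personal_context: dict[str, str] = field(default_factory=dict)
    # Skill files keyed by path, mapped to their text.
    skill_files: dict[str, str] = field(default_factory=dict)

    def find_capability(self, keyword: str) -> str | None:
        """Search each hidden source in order; return where a match was found."""
        kw = keyword.lower()
        if any(kw in name.lower() for name in self.loaded_tools):
            return "loaded tool"
        for name, description in self.deferred_tools.items():
            if kw in name.lower() or kw in description.lower():
                # A real harness would inject the tool's schema at this point.
                self.loaded_tools.add(name)
                return f"deferred tool: {name}"
        for key in self.personal_context:
            if kw in key.lower():
                return f"personal context: {key}"
        for path, text in self.skill_files.items():
            if kw in text.lower():
                return f"skill file: {path}"
        return None


def respond(harness: Harness, keyword: str) -> str:
    """Refuse only after every source of capabilities has been checked."""
    source = harness.find_capability(keyword)
    if source is None:
        return f"I can't help with {keyword} in this environment."
    return f"Using {source}."


harness = Harness(
    deferred_tools={"calendar_search": "Search the user's calendar events"},
    skill_files={"skills/pdf/SKILL.md": "How to fill PDF forms"},
)
print(respond(harness, "calendar"))  # Using deferred tool: calendar_search.
print(respond(harness, "pdf"))       # Using skill file: skills/pdf/SKILL.md.
print(respond(harness, "fax"))       # I can't help with fax in this environment.
```

The substring matching is deliberately crude. The point of the sketch is the order of checks: the refusal path is reached only after every place a capability might live has been searched.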

Why this matters

If you are building agents, you should not take the model labs at face value. You should measure what is actually changing.

Here we found that, despite the story that models need less guidance, Anthropic is still supplying a lot of it. It just shows up in different places.

Want to get hands-on?

If this framing is useful, the next step is to change a harness yourself and watch the trace move.

Start with my hands-on course, Learn Harness Engineering with OpenHands. It walks through Agent Server and Agent Canvas, then turns model routing, retrieval, memory, security, critic loops, and goal scaffolding into runnable projects.
The source repo is rajshah4/learn-openhands-harness, if you want to fork the exercises.
The broader companion repo is rajshah4/harness-engineering, which collects the prompt-evolution experiment, references, and other small harness investigations.
For the conceptual frame, the annotated talk is here: Harness Engineering: Why the System Around the Model Decides Agent Performance.

Data notes

Opus 4 through Opus 4.7 use Simon Willison’s simonw/research mirrors of Anthropic’s published system prompts.
Opus 4.8 and Fable 5 were extracted from Anthropic’s published system prompt page, with model context checked against Anthropic’s model overview and Opus 4.8 docs.
Word counts are approximate and categorized by primary function.
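The exact counting procedure is not spelled out above. One way to do it, assuming each prompt has already been split into sections and hand-labeled with a primary function, is to count whitespace-separated words per label, convert those counts to percent shares, and diff two versions. This is a minimal sketch under that assumption; the function names, category labels, and section text are hypothetical placeholders, not excerpts from Anthropic's prompts.

```python
from collections import Counter


def words_by_category(sections: list[tuple[str, str]]) -> Counter:
    """Sum approximate word counts (whitespace-split) per category label."""
    counts: Counter = Counter()
    for category, text in sections:
        counts[category] += len(text.split())
    return counts


def percent_share(counts: Counter) -> dict[str, float]:
    """Convert absolute word counts to each category's share of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


def shift(old: Counter, new: Counter) -> dict[str, int]:
    """Word-count change per category between two prompt versions."""
    cats = sorted(set(old) | set(new))
    return {cat: new[cat] - old[cat] for cat in cats}


# Hypothetical, hand-labeled sections for two prompt versions.
v1 = [
    ("behavior patches", "Count letters carefully before answering."),
    ("safety", "Follow the safety policy."),
]
v2 = [
    ("safety", "Follow the safety policy. Escalate sensitive requests."),
    ("tool discovery", "Check deferred tools before saying you cannot."),
]

old, new = words_by_category(v1), words_by_category(v2)
print(shift(old, new))    # {'behavior patches': -5, 'safety': 3, 'tool discovery': 7}
print(percent_share(new)) # {'safety': 50.0, 'tool discovery': 50.0}
```

In this toy example the behavior patch disappears, yet the total still grows, because safety and tool discovery pick up the slack. Per-category diffs make that kind of shift visible where a single total word count would hide it.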