You're the GIL of your own agent fleet

Spawning AI agents costs a keystroke. Reviewing what they ship costs everything you've got. Addy Osmani argues you're a single-threaded resource bolted into a concurrent system, and grinding harder won't fix it. Architecture will.

Key Findings

  • The orchestration tax is structural, not a discipline failure: your cognitive bandwidth is single-threaded, and every agent's output must route through that one serial processor before it can ship.
  • Amdahl's Law applies directly to agentic workflows. Adding agents doesn't raise throughput when human judgment is the bottleneck; it only deepens the queue feeding into it.
  • The right agent count is set by how many you can actually code-review properly, not by what the tool allows. For most developers that's a low single digit.
  • Cognitive surrender is the hidden cost of an overloaded review step: you accept agent output not because it's correct but because forming your own judgment costs attention you no longer have.
  • Sorting tasks before delegation is the highest-leverage move: isolated, async-safe work can run in parallel with little lock contention, while judgment-heavy work should run serially or not at all.

Twenty agents on the dashboard feels like leverage. Everything moves, nothing sits idle, you're never bored. But your attention doesn't parallelize. Every agent's output still routes through one serial processor before it can ship: you. The gap between what your agents produce and what you can actually review, understand, and merge is the orchestration tax. The only way to shrink it is to architect your own attention the way you'd architect any concurrent system.

I was on a panel at Google I/O with Richard Seroter, Aja Hammerly, and Ciera Jaspan, talking about where software engineering is right now and where it's going. Near the end, Richard asked each of us for one thing developers should walk away and do differently. I said the thing I'd been circling for months: feeling busy is not the same as being productive. You can run 20 agents and feel completely slammed, and that is still not 20 agents' worth of shipped work.

Richard named the pattern in that same conversation. "You talked about the orchestration tax," he said. "You can't manage twenty agents successfully in your own brain." He's right, and I want to take it apart properly. This isn't a discipline problem. It's an architecture problem.

One line from the panel keeps coming back to me, something I said almost by accident: running multiple agents does not mean there is more of you.

Starting an agent is cheap, closing the loop isn't

Agentic workflows hide an asymmetry. Starting an agent costs a keystroke and a sentence. Closing the loop on one costs real work: someone has to check whether the output is correct and reconcile it with whatever the other agents touched. That someone is you. There is exactly one of you.

I wrote about a slice of this in an earlier post, "Your parallel agent limit," which was mostly about the ambient dread of not knowing which thread is quietly failing. This is about the shape underneath that dread. Once you see agent development as a concurrent system, the human stops being the orchestrator and becomes a component inside it.

The slow, serial component.

You're the GIL

If you've written concurrent code, you already have the right intuition. You've just been aiming it at the wrong part of the system.

CPython has the Global Interpreter Lock. Spawn as many threads as you like; only one executes Python bytecode at a time, because each has to hold the lock to run. You are the GIL of your AI agents. They can all run at once, but the moment any of their work needs genuine architectural judgment or conflict resolution, that work has to acquire the lock. There's one lock. You're holding it.
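Here is a minimal Python sketch of the analogy. It does not demonstrate the GIL itself. Instead, an ordinary `threading.Lock` stands in for your attention, and the function `run_fleet` and its timing parameters are hypothetical. Generation overlaps freely across threads. The review step, however, never holds more than one agent at a time, no matter how many you spawn.

```python
import threading
import time


def run_fleet(n_agents: int, gen_s: float = 0.01, review_s: float = 0.005) -> int:
    """Simulate n_agents working concurrently; return the peak number
    of agents ever inside the review step at the same moment."""
    reviewer = threading.Lock()  # hypothetical stand-in for you: exactly one lock
    state = {"active": 0, "peak": 0}

    def agent() -> None:
        time.sleep(gen_s)  # generation: overlaps freely across threads
        with reviewer:  # review: every agent must acquire the same lock
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
            time.sleep(review_s)  # the time you spend actually reading the output
            state["active"] -= 1

    threads = [threading.Thread(target=agent) for _ in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


print(run_fleet(20))  # 1: twenty agents, still one review at a time
```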

Amdahl's Law makes the limit exact. The speedup from parallelizing is capped by the fraction of work that stays serial: if a fraction s of the work can't be split, the speedup can never exceed 1/s, no matter how many cores you throw at it. In agent development, the serial fraction is judgment. Spawning eight agents does nothing for your judgment time. It just deepens the queue feeding into it.
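Amdahl's Law is usually written S(N) = 1 / ((1 - p) + p / N). Here p is the fraction of the work that can run in parallel and N is the number of workers. The Python sketch below plugs in a made-up split for illustration: half the end-to-end time is agent generation (parallel) and half is your judgment (serial). Under that assumption, no number of agents gets you past a 2x speedup.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup over one worker when only `parallel_fraction` of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction  # here: your review and judgment time
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumption for illustration: 50% of the time is generation, 50% is judgment.
p = 0.5
for agents in (1, 2, 8, 20):
    print(agents, round(amdahl_speedup(p, agents), 2))
print("ceiling:", 1.0 / (1.0 - p))  # the limit as the agent count grows without bound
```

Under this split, going from 8 agents to 20 moves the speedup from about 1.78 to about 1.90. The serial half caps it at 2.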

Here's the old performance-engineering fact people still trip over: optimizing a non-bottleneck doesn't raise throughput. It grows the pile of unfinished work sitting in front of the bottleneck. Adding agents optimizes the part that was never the constraint. The constraint is the review step, and your system's throughput equals the throughput of that one step. Full stop.
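A toy queue model shows the same effect. The numbers are hypothetical: each agent produces one reviewable change per hour, and you can properly review three per hour. The function `backlog_after` is a name made up for this sketch. It tracks what gets merged versus what piles up in front of you.

```python
def backlog_after(hours: int, agents: int, reviews_per_hour: int) -> tuple[int, int]:
    """Return (merged, waiting) after `hours`, assuming each agent yields one change per hour."""
    waiting = 0
    merged = 0
    for _ in range(hours):
        waiting += agents  # producers: cheap, and they scale with agent count
        done = min(waiting, reviews_per_hour)  # consumer: capped by your review rate
        waiting -= done
        merged += done
    return merged, waiting


for agents in (3, 8, 20):
    print(agents, backlog_after(hours=8, agents=agents, reviews_per_hour=3))
```

Over an eight-hour day, merged output stays at 24 changes whether you run 3, 8, or 20 agents. Only the waiting pile grows: 0, 40, and 136 respectively.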

The orchestration tax is the structural gap between what your agents produce and what you can merge. It's what you get when a single-threaded resource runs a concurrent system.

Grinding harder doesn't move a structural limit

I told the panel I've never felt more productive with my tools, and I've never been more tired. Both are true, and they share a cause.

The tiredness has a specific shape. It's what a serial processor running at 100% with zero slack feels like. Every time you return to an agent you've been away from, you pay a context-switch cost: flush your mental state, reload a cold context. CPUs do this in microseconds and architects still work hard to avoid it. You do it in minutes, and you never reload cleanly. Five agents isn't one workload done five times. It's five cold reloads plus a background process in your head tracking which agent needs you next.

You can't grind your way past a structural limit. The tax gets paid regardless. Push harder and it surfaces as shallow code reviews, or as what I'd call cognitive surrender: accepting the agent's output because forming your own opinion costs attention you no longer have. Either you pay deliberately, by designing around the limit, or you let the tax quietly erode your understanding of your own system.

Architect your attention like a bottleneck

Your attention is a scarce serial resource. You wouldn't design a distributed system without thinking hard about the bottleneck. Your brain deserves the same respect.

A few things that have held up for me:

Scale the fleet to your review rate, not the UI. A good concurrent system uses backpressure so the queue can't grow without bound. The producer slows to match the consumer. Your agent count is the producer; your review rate is the consumer. The right number of parallel agents is however many you can actually code-review properly. For most people that's a low single digit. The tool will happily spawn 20. That's a UI feature, not a throughput recommendation.
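Python's standard `queue.Queue` gives you backpressure when you pass `maxsize`: `put()` blocks while the queue is full, so the producer can only run as fast as the consumer drains it. The sketch below assumes one producer thread standing in for the whole fleet and a single reviewer loop. The function `review_with_backpressure` and its parameters are hypothetical.

```python
import queue
import threading
import time


def review_with_backpressure(n_tasks: int, max_pending: int, review_s: float = 0.001) -> tuple[int, int]:
    """Return (reviewed, peak_backlog); the backlog can never exceed max_pending."""
    pending: queue.Queue = queue.Queue(maxsize=max_pending)  # the bound is the backpressure

    def fleet() -> None:
        for task in range(n_tasks):
            pending.put(task)  # blocks while full: agents wait for you, not the other way round
        pending.put(None)  # sentinel: no more work coming

    producer = threading.Thread(target=fleet)
    producer.start()
    reviewed, peak = 0, 0
    while True:
        peak = max(peak, pending.qsize())
        item = pending.get()
        if item is None:
            break
        time.sleep(review_s)  # the slow, serial review step
        reviewed += 1
    producer.join()
    return reviewed, peak


print(review_with_backpressure(n_tasks=50, max_pending=3))  # reviewed is 50; peak is at most 3
```

The exact peak depends on thread scheduling, but the bound holds because `put()` cannot add past `maxsize`.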

Sort the work before you delegate it. I keep two piles. One is isolated work I'm happy to hand to background agents in the cloud, async, needing me only at the final gate. The other is work where the judgment is the task: a weird bug, an architectural call, a refactor with non-obvious tradeoffs. The mistake is parallelizing the second pile. Running multiple complex tasks at once doesn't scale your output. It thrashes the lock and everything comes out worse.
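One way to make the two piles explicit is a small triage step before anything gets delegated. This is a hypothetical sketch. The `Task` fields and the rule that anything not clearly isolated and routine stays with you are assumptions, not a prescribed process.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    isolated: bool  # touches nothing another agent is changing
    judgment_heavy: bool  # the decision itself is the work


def sort_piles(tasks: list[Task]) -> tuple[list[str], list[str]]:
    """Split tasks into (background, serial) piles by name."""
    background, serial = [], []
    for task in tasks:
        if task.isolated and not task.judgment_heavy:
            background.append(task.name)  # async agent; you only see the final gate
        else:
            serial.append(task.name)  # keep the lock held: one at a time, or not at all
    return background, serial


tasks = [
    Task("bump lint config", isolated=True, judgment_heavy=False),
    Task("add tests for date parser", isolated=True, judgment_heavy=False),
    Task("weird race in checkout flow", isolated=False, judgment_heavy=True),
    Task("split the billing service", isolated=False, judgment_heavy=True),
]
print(sort_piles(tasks))
```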

Batch your reviews. Context switching costs you every single time. Reviewing four agents in one sitting costs far less than checking one, wandering off, and coming back cold. Give the agents a longer leash. Let work accumulate, then process the batch.
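A back-of-the-envelope cost model shows why batching helps. Assume, purely for illustration, that every cold return to your review queue costs a fixed reload time, and each agent's output then takes a fixed time to read. Checking outputs one at a time pays the reload on every output; a batch pays it once. The function name and the 10- and 5-minute figures are hypothetical.

```python
def review_minutes(outputs: int, reload_min: float, read_min: float, batched: bool) -> float:
    """Total attention spent reviewing `outputs` agent results."""
    cold_reloads = 1 if batched else outputs  # each separate sitting starts cold
    return cold_reloads * reload_min + outputs * read_min


# Hypothetical numbers: 10 minutes to reload context, 5 minutes to read each output.
print(review_minutes(4, reload_min=10, read_min=5, batched=False))  # 60
print(review_minutes(4, reload_min=10, read_min=5, batched=True))  # 30
```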

Spend the lock only on judgment. Don't burn attention on what the machine can verify itself. Have the agent write a passing test or generate a screenshot. Let it prove the straightforward 80% on its own, so your scarce focus lands on the 20% that genuinely needs a human.
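Here is a sketch of that gate in Python, under two assumptions. First, an agent's change is represented as a plain dict. Second, the checks (`tests_pass`, `has_screenshot`) are hypothetical placeholders for whatever your CI or harness actually verifies. Anything that fails a machine check goes back to the agent without touching your attention. Only changes that pass every check reach your queue.

```python
from typing import Callable

Change = dict
Check = Callable[[Change], bool]


def route(change: Change, checks: dict[str, Check]) -> str:
    """Decide whether a change needs you yet."""
    failed = [name for name, check in checks.items() if not check(change)]
    if failed:
        return "back to agent: " + ", ".join(failed)  # machine-verifiable, so no human time spent
    return "human review"  # only the judgment part is left


# Hypothetical checks standing in for a real test run and a UI screenshot step.
checks: dict[str, Check] = {
    "tests_pass": lambda c: c.get("tests_passed", False),
    "has_screenshot": lambda c: bool(c.get("screenshot")),
}

print(route({"tests_passed": True, "screenshot": "after.png"}, checks))  # human review
print(route({"tests_passed": False, "screenshot": None}, checks))  # back to agent: tests_pass, has_screenshot
```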

Protect your serial time. The bottleneck needs your best hours, not the leftover minutes between check-ins. Sometimes the highest-leverage move is to stop orchestrating: close the laptop full of running agents and think hard about one problem with the lock held the whole time. Orchestrating is overhead. It isn't the work.

Aja made the point that architecture is the urgent skill right now: knowing what belongs inside one agent and what's too much for it. I'd add that you're a component in that system too. Your attention has a known, low serial throughput. The system either respects that number or routes around it by quietly lowering your standards.

The failure mode is invisible from the inside

Twenty running agents produce a sensation of enormous productivity. The dashboard is full, everything moves. That sensation is decoupled from actually shipping good code to main. You can be maximally busy and barely produce anything, and from the inside it feels identical to real work.

Ciera brought up Margaret-Anne Storey's research on debt, and we talked about technical debt and cognitive debt together. An unpaid orchestration tax is how you rack up both at once. You merge things you didn't read carefully. Your mental model of the codebase goes stale. None of this shows up on a dashboard. It shows up when production breaks, you open the system, and you realize you no longer understand how it works.

Spawning agents isn't the skill. Anyone can run 20.

The skill is designing the system around the one resource that can't be cloned or parallelized. Treat your attention the way you'd treat anything else you depend on in production.
