Will agentic engineering kill deep work? (part 1)

15 May

Banner image showing a puppeteer with a grim face, wearing a turtleneck

Summary

There’s a pernicious belief that agentic engineering diminishes the importance of deep work, and that engineers and knowledge workers must now embrace a distracted, interrupted and context-switched workflow. This theory is flawed for many reasons, three of which I discuss in part one of a three-part blogpost series:

It doesn’t recognise the limitations of generative AI.
It falls prey to marketing hype.
It oversimplifies the marketed workflow.

Most executives won’t read my posts about executives. At least, the pointy-haired ones won’t. But I believe the world is also full of well-intentioned executives who don’t always have time to reflect on every issue that concerns their people. This series is for those executives, and about how they must think about agentic engineering and deep work.

In an earlier post, I noted the output-centric mindset, in which executives equate the work of different roles with the outputs they produce. This mindset leads us to kick AI-workslop cans from one person to another, with no time for deep work.

There’s another pernicious mindset that’s fast becoming pervasive as AI labs claim they’ve “solved” coding. If coding is solved, perhaps technologists must context-switch between agents all day and oversee their work. By association, this theory further implies that future builders must do less deep work and be ready for more interruptions as they orchestrate their agent army. If senior developers resist this idea, they’ll have to toe the line or find the exit. Some executives even believe their entire workforce, developer or not, must now be resilient to interruptions and be less precious about deep work.

I’m not a developer, but I am a technologist, and I find this push toward shallow work and context switching problematic on many levels. In this three-part series, I’d like to explain why I feel that way. Here are the first three flaws with the “death of deep work” theory.

1. It doesn’t recognise the limitations of generative AI

While generative AI uptake has been strong and the press has cheered the labs along, none of them has found the product-market fit to justify their huge capex. Adobe’s growth has slowed despite its Firefly push. OpenAI had to kill Sora and show ads in ChatGPT. Despite all the hype about Agentforce, Salesforce’s growth has declined from 19% in 2023 to 10% in 2026. Stochastic parrots sure aren’t making anyone profitable yet.

Meanwhile, it’s fair to say that modern coding harnesses have helped LLMs become effective at generating code. These advances don’t come from LLM technology improving by orders of magnitude. They come from three conditions that are specific to coding.

Coding is a constrained-syntax activity, unlike writing prose. You can organise commands on a terminal, or code in an IDE, only in a finite set of ways. It’s much easier to post-train and fine-tune an LLM to produce code than to produce something as broad as spoken and written language.
Between open-source projects, GitHub repositories, vulnerability databases, and whatnot, there are many high-quality software engineering datasets for LLMs to train on. The richer the quality of data, the better the training for that specific task.
Coders build coding harnesses. They’re building a product for themselves. They understand patterns, anti-patterns, and the desired user experience, so the product evolves faster than other use cases that coders don’t understand as well.

Those three conditions aren’t true for any other use case yet. Now, finding product-market fit through coding is the ray of light at the end of the tunnel that the AI labs have been hoping for. That’s great! But we can’t extrapolate LLMs’ coding progress to other knowledge-work use cases and expect token-prediction machines to become sentient in the near future. Moreover, finding product-market fit doesn’t mean they’ve found profitability.

2. It falls prey to marketing hype

The labs have no path to profitability with their current pricing. Even after the recent bait-and-switch manoeuvres from Anthropic and GitHub Copilot, no lab looks set to break even within the next three years. Only enterprises will pay for the most expensive plans or for API usage. On occasion, some intrepid consumers may buy an expensive plan, but you can’t rely on them to make trillions of dollars.

So, of course, the labs want enterprises to consume tokens. Lots of them. They’ve even said the quiet part out loud! Peter Steinberger, founder of OpenClaw and now an OpenAI hotshot, runs 100 Codex instances in the cloud, racking up a 1.3 million dollar monthly bill across three people! He claims to be on a quest to find out “how software would be built if token costs didn’t matter.” I find that quest to be rather self-serving, if you know what I mean.

It’s one thing if you’ve built an agentic workflow where token costs don’t matter. Perhaps you’re running a fine-tuned, open-source model on your infrastructure. However, most people use Claude, Cursor, GitHub Copilot, or Codex. If they follow what Boris Cherny or Andrej Karpathy says, and base their entire workflow on it, it’s like asking the crack dealer how much crack you should be smoking!

Even if you take the crack dealer’s advice, though, it’s worth paying attention to the details of what they’re saying.

3. It oversimplifies the marketed workflow

Boris Cherny and Andrej Karpathy have the same intent but different workflows. Let’s start by examining Karpathy’s workflow. He writes detailed specs, loads the context into the agent, delegates execution to the agent and then takes charge of the verification process. Before he signs off, he applies his judgment when inspecting diffs or looking for brittle abstractions. Andrej is as big an AI booster as they come these days, and here’s what even he says:

"When you actually look at the code, sometimes I get a little bit of a heart attack because it's not super amazing code necessarily all the time. It's very bloated. There's a lot of copy-paste. There are awkward abstractions that are brittle. It works, but it's just really gross."

So yeah, judgment matters. And it’s not the abstract, pie-in-the-sky style judgment that we see in boardrooms, but the taste to spot bad design when you see it. I’ll return to the topic of judgment in part two of this series.

In the interest of completeness, let’s look at Boris Cherny’s workflow. You can call the steps: spec → plan → auto → verify.

Boris starts by writing a half-page spec and validating the plan with an agent, much like he’d delegate work to an engineer. Once he’s happy with the plan, he lets Claude work on autopilot without intervention. And before he verifies the outputs, he lets Anthropic’s managed code review handle each pull request. He examines the pre-triaged diff and, if Claude misses something, he adds rules so Claude doesn’t miss the same problem next time.

Boris caught people’s attention by claiming to ship 20-30 pull requests each day using this process. He keeps five worktrees running in parallel, and he claims to round-robin through tabs, like a supervisor. Of course, such supervision needs context switching.
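A rough back-of-the-envelope sketch makes that supervision load concrete. It uses the 20-30 pull requests and five worktrees quoted above. The eight-hour working day is my assumption, not a figure from Cherny, and the function name is purely illustrative.

```python
# Figures quoted in the post: 20-30 pull requests a day, five parallel worktrees.
PRS_PER_DAY = (20, 30)
WORKTREES = 5
# Assumption (not from the post): an eight-hour working day.
WORKDAY_MINUTES = 8 * 60


def review_cadence(prs_per_day, worktrees=WORKTREES, workday_minutes=WORKDAY_MINUTES):
    """Return (PRs per worktree per day, minutes between PRs reaching the supervisor)."""
    per_worktree = prs_per_day / worktrees
    minutes_between = workday_minutes / prs_per_day
    return per_worktree, minutes_between


for prs in PRS_PER_DAY:
    per_worktree, gap = review_cadence(prs)
    print(f"{prs} PRs/day: {per_worktree:.0f} per worktree, one every {gap:.0f} minutes")
```

Under that assumption, a pull request needs attention roughly every 16 to 24 minutes. A longer or shorter working day shifts the interval, but not the basic picture of frequent switching.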

Now that you know the origin story of why some leaders expect future developers to be agent orchestrators, let’s find the deep work in this workflow.

Writing a spec is deep work. Karpathy spends more time on specs than Cherny.
Verification, too, is deep work. If software breaks in production, you can’t blame the AI. If a human must take the fall, a human must engage in some level of verification. If that verification means updating the user-side harness, then that’s deep work too.
Cherny engages with the AI to plan the work before sending it on autopilot. Delegation may not be cognitively demanding, but it requires attention to detail and focus.

By my estimate, even when you examine the most hyped-up workflows for agentic engineering, deep work ranges from about 40% of the time in Cherny’s workflow to about 75% in Karpathy’s workflow. Yes, coding is moving to a higher level of abstraction. Yes, if you believe Cherny, he’s context-switching a lot. But it sure doesn’t sound like deep work is dead, does it?
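If you’d like to run a similar estimate on your own day, here’s a minimal sketch. The phase names follow the spec → plan → auto → verify steps, and I’ve treated spec, plan and verify as deep work, in line with the argument above. The minutes in the example log are invented for illustration. They aren’t measurements of Cherny’s or Karpathy’s time, and they aren’t meant to reproduce my 40% and 75% figures.

```python
# Phases treated as deep work, following the argument in the post.
DEEP_PHASES = {"spec", "plan", "verify"}


def deep_work_share(minutes_by_phase):
    """Return the fraction of logged time spent in deep-work phases."""
    total = sum(minutes_by_phase.values())
    if total == 0:
        raise ValueError("time log is empty")
    deep = sum(m for phase, m in minutes_by_phase.items() if phase in DEEP_PHASES)
    return deep / total


# Hypothetical day, in minutes. These numbers are made up for illustration.
example_log = {"spec": 90, "plan": 60, "auto": 180, "verify": 150}
print(f"deep work share: {deep_work_share(example_log):.1%}")
```

Swap in your own time log and your own classification of phases. The result is only as good as the honesty of the log.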

And then, when you consider that the above advice comes from the boosters, shouldn’t you be sceptical about it?

Let’s imagine, though, that Boris and Andrej don’t want to sell you tokens. Let’s imagine that they have your best interests at heart. Even then, you’d like to know the other side of the story, wouldn’t you? That’s what we’ll get into in the next post of this three-part series.

AI, deep work, leadership

Sumeet Moghe
