Agentic Engineering: How I Delegate Entire Dev Phases to Claude Code

The first time a phase delegation collapsed on me, it happened quietly. Claude was mid-implementation, 40 minutes in, and I realized the output had nothing to do with the spec anymore. Not a hallucination. Not a bug. The spec had simply drifted out of the active context window. By the time the agent needed to reference it, the model was working from a degraded reconstruction of my original intent.
That's context rot: as the session grows, compaction events quietly erode the model's hold on the spec. It's not a prompt engineering problem; it's a structural property of how large language models work. That incident changed how I architect my agentic coding systems entirely.
The Mental Model That Actually Works
Most developers using Claude Code are doing task-level delegation: "write this function", "fix this test", "refactor this component". Useful. But that's not agentic engineering.
The shift (and it took me longer than I'd like to admit) is from task to phase. A phase is a unit of work with a defined final state, machine-verifiable exit criteria, and a bounded context window. The developer's job becomes writing the brief. The agent's job is to own the phase.
Research on human supervision patterns in agentic systems has found something counterintuitive: the more complex the task, the less feasible step-by-step oversight becomes. Not because developers trust agents more on hard problems, but because supervising a complex phase at that granularity defeats the purpose of delegation. You can't watch every file write in a 40-step implementation. The system has to be designed so you don't need to.
That's the empirical case for phase-level ownership. Not a philosophy, just a consequence of how complex agentic work actually runs.
What a Phase Brief Actually Contains
A phase brief is not a Jira ticket. It's not a list of tasks. It's a description of the world after the phase completes, plus the conditions a machine can check to confirm it got there.
Four components, non-negotiable:
Final state description. What does the codebase look like when this phase is done? Not "implement auth", but "the auth module exports signIn, signOut, and refreshToken, all covered by integration tests hitting a real database."
Machine-verifiable exit criteria. TypeScript compiles. Tests pass. No lint errors. These aren't nice-to-haves, they're the agent's stop condition. Without them, the loop doesn't know when it's done.
Constraints. What the agent should explicitly not touch. File access scope. Which packages are already installed. What interfaces it must not break.
Access level. Read-only for exploration phases. Read-write for implementation. Orchestration agents receive structured reports from sub-agents, they don't write files directly.
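The exit criteria above are easiest to make real as a script the phase can run. A minimal sketch in bash; the function name and the example commands (tsc, npm test, eslint) are illustrative assumptions about a TypeScript repo, not a fixed template:

```shell
# check_exit_criteria: run each exit-criterion command in turn;
# succeed only if every one passes. This is the phase's stop condition.
# Usage: check_exit_criteria "npx tsc --noEmit" "npm test" "npx eslint ."
check_exit_criteria() {
  local cmd
  for cmd in "$@"; do
    if ! bash -c "$cmd" > /dev/null 2>&1; then
      echo "FAIL: $cmd"
      return 1
    fi
  done
  echo "PASS: all exit criteria met"
}
```

The point is that the criteria are commands, not prose: either they exit zero or the phase is not done.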
In the NEXUS-Micro pipeline, this separation is explicit. PM agents produce a structured spec; Dev agents implement from that spec, scoped to specific modules; QA agents verify without touching what they're validating. Three roles, zero overlap.
The spec is what steers the agent. A tight spec runs cleanly. A vague one costs hours of back-and-forth, or worse, produces output that looks correct until it hits the edge case you forgot to define.
Structuring your own phase-delegation pipeline and hitting walls? I consult with engineering teams on agentic workflows that actually ship. The brief architecture is usually where it breaks. Let's talk.
The Three Failure Modes in Agentic Coding
Every phase delegation failure I've seen traces back to the brief, not the model.
Spec drift. The spec changes mid-implementation: a requirement gets clarified in conversation, an edge case surfaces, a dependency breaks. With no checkpoint to update the brief, the agent keeps running against a spec that no longer reflects intent. Fix: treat the brief as immutable within a phase. If the spec needs to change, stop the phase, update the brief, restart.
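Brief immutability can be enforced mechanically rather than by discipline: hash the brief at phase start and abort the run if the file changes mid-phase. A sketch; the path is hypothetical and any stable hash tool works:

```shell
# Sketch: enforce brief immutability within a phase via a checksum.
# PHASE_BRIEF is a hypothetical path; adjust to your repo layout.
PHASE_BRIEF="briefs/phase-03-auth.md"

start_phase() {
  # Record the brief's hash when the phase begins.
  BRIEF_HASH=$(sha256sum "$PHASE_BRIEF" | cut -d' ' -f1)
}

assert_brief_unchanged() {
  # Call before each iteration; fail loudly if the spec drifted.
  local now
  now=$(sha256sum "$PHASE_BRIEF" | cut -d' ' -f1)
  if [ "$now" != "$BRIEF_HASH" ]; then
    echo "Brief changed mid-phase: stop, update the brief, restart the phase." >&2
    return 1
  fi
}
```

A failed check is not an error to suppress; it is the signal to end the phase and start a new one from the updated brief.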
Context collapse. This is the context rot problem. Long sessions degrade the model's working memory of the spec. The solution is architectural: independent context windows per phase. Not a single session where you hand off the whole project, but a fresh context per phase, with the brief injected at the start. In my setup, the bash-based Ralph Loop approach works better than the plugin for exactly this reason. Each iteration gets a clean window. No session bleed. The plugin-based approach introduces session overhead that accumulates over long runs. In practice, bash-based independent contexts have proven more reliable than "run forever in one session."
Missing exit criteria. The agent doesn't know when to stop, so it either terminates too early (partial implementation, no signal that anything is wrong) or loops indefinitely. Machine-verifiable criteria are the stop condition. If npm test passes and TypeScript compiles, the phase is done.
The Ralph Loop pattern formalizes all three: the Generator produces output, the Judge validates against explicit criteria, the loop terminates when criteria pass. The critical detail is that the Judge is external tooling, not a self-assessment prompt. An agent that checks its own work will find it acceptable. That's not a flaw in the model; it's a predictable property of any system asked to grade itself.
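The shape of that loop can be sketched in a few lines of bash. This is a minimal sketch, not the canonical Ralph Loop: the generator and judge are passed in as commands, each generator run happens in a fresh subshell (standing in for a fresh agent invocation with its own context), and the judge is external tooling that decides termination:

```shell
# ralph_loop GENERATOR_CMD JUDGE_CMD MAX_ITERS
# Generator runs in a fresh subshell each iteration (independent context);
# the Judge is an external command, never the model grading itself.
ralph_loop() {
  local gen="$1" judge="$2" max="$3" i
  for i in $(seq 1 "$max"); do
    bash -c "$gen"                          # Generator: fresh invocation, no session bleed
    if bash -c "$judge" > /dev/null 2>&1; then
      echo "Phase complete after $i iteration(s)."
      return 0
    fi
  done
  echo "Exit criteria never passed after $max iterations; revisit the brief." >&2
  return 1
}

# Illustrative wiring (assumed commands, shown but not run here):
#   ralph_loop 'claude -p "$(cat briefs/phase-03-auth.md)"' \
#              'npx tsc --noEmit && npm test --silent' 10
```

The bounded iteration count matters: a loop that cannot pass its criteria should surface that as a brief problem, not spin forever.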
What Agentic Coding Actually Demands From You
The developer's daily output shifts almost entirely to brief writing. That sounds like a demotion. In practice, it's the opposite.
Writing a tight brief requires understanding the problem deeply enough to describe the final state precisely. You have to know what machine-verifiable criteria exist. You have to anticipate what the agent will need and what it should stay away from. You can't write a good phase brief for work you don't understand.
What goes away: mechanical implementation. Boilerplate. The fourth iteration of the same CRUD pattern. TypeScript generics you've written a hundred times. The agent owns that. You own the architecture of what gets built, the phase graph, the dependency order, the exit conditions.
The mistake I see teams make is trying to delegate phases without restructuring the project first. The repo structure determines most of the agent's effectiveness. Consistent naming, logical module boundaries, clean interfaces between packages, a CLAUDE.md that gives the agent its entry point to the codebase. A confused codebase produces confused output. No amount of brief writing fixes a repo where the agent keeps hitting dead ends and asking for constant clarification.
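For concreteness, here is what a minimal CLAUDE.md entry point might contain, written as a shell heredoc so the structure is explicit. The section names, package paths, and conventions are illustrative assumptions, not a canonical template:

```shell
# Sketch of a minimal CLAUDE.md entry point; all names and paths are
# hypothetical examples, not a prescribed layout.
cat > CLAUDE.md <<'EOF'
# Project entry point for the agent

## Layout
- packages/api: HTTP layer; do not change public route signatures
- packages/core: domain logic; new business rules go here
- packages/db: migrations and queries; schema changes require a migration

## Conventions
- TypeScript strict mode; no `any`
- Tests live next to the module they cover

## Exit criteria for any phase
- `npx tsc --noEmit`, `npm test`, and `npx eslint .` all pass
EOF
```

The file earns its keep by answering the questions the agent would otherwise burn iterations rediscovering: where things live, what not to break, and what "done" means.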
The work is upfront. Structure the codebase. Write the brief. Define the exit criteria. Then step back and let the phase run.
The Part Nobody Mentions
The shift from AI-assisted to AI-driven development isn't a tool upgrade. It's a restructuring of how a project is designed. The dependency graph between phases, the boundaries between what an agent can own and what requires a human decision, the criteria that terminate a phase, all of this has to be designed explicitly.
Most phase delegation failures aren't model failures. They're design failures. The brief was vague. The context window wasn't isolated. The QA agent was the same session as the implementation agent. The exit criteria were subjective.
The model is ready. The question is whether the pipeline is.
I'm opening a few consulting slots for teams ready to move from AI-assisted to AI-driven development. If you want to build the kind of pipeline described here, let's talk.
Learn the agentic coding workflow I use in production
A text-based course on building AI coding workflows that actually work. From CLAUDE.md setup to full autonomous pipelines.
See the full curriculum