I started with Aider over a year and a half ago, when "agentic coding" wasn't a thing. You did everything manually: todos, subtasks, executing each step by hand. When something didn't work, the question was always: why? How do I make it work? Then Cursor. Now Claude Code (last 10 months). The hacks we were doing back then are button clicks today.
Every feature built into Claude Code today started as something the community figured out by hand. Plan mode, explore, subagents—all generalizations of what power users were already doing. The built-in versions are great defaults. But they're made to fit every codebase, which means they're optimized for none.
This is what I've learned being obsessed with AI coding (even before it actually worked). Not a tutorial. Those exist, and they're good. This is about understanding WHY those features work, so you can adapt them, customize them, or build your own when the defaults don't quite fit.
The goal is to avoid hell until you end up there anyway.
Generated Debt
https://preview.redd.it/hdzleraefoig1.jpg?width=1376&format=pjpg&auto=webp&s=7865c2d3aa3bc52ef14cb847dd35e0dc3968e6e8
As a dad of two, I've seen the IRL version of this once or twice. If you're an AI dad? You used to deal with this shit 20 times a day. But Opus grew up. GPT moved out from $HOME.
A year ago this was the default outcome of AI coding. "Keep it DRY" in the system prompt, five date formatters in the codebase three days later. The brilliant, hyperactive intern with amnesia. That version of the problem is mostly over. The models got better. Plan mode, explore, subagents, automatic codebase search. The tools catch most of it now.
The AI doesn't write five date formatters anymore. It writes clean code for the part it's looking at and misses the five other files that need to change too. It nails the 80% it can see and is blind to the 20% that requires knowing the whole codebase.
I recently refactored an entire file format across my codebase. 267 files, 20k lines. Planning took a day. Implementation took a day. The first day is why the second day worked. Without that research, the agent would have nailed the new parser and broken half the system.
The AI isn't a toddler with a firehose anymore. More like a coworker that just started. With a firehose. It can build. What it can't do on its own is see the forest; it's too busy grep-globbing the trees. Instead of destroying small things often, it destroys large things rarely. The old problems were obvious. These aren't.
Better prompts don't fix this. Better systems do, and the defaults keep getting better. But for hard codebases and complex tasks, you'll want your own.
Your Difficulty Setting
https://preview.redd.it/7aw1pfvnfoig1.jpg?width=5504&format=pjpg&auto=webp&s=dc84d09b959ddc08618bf3b8fc368b30489034c8
You don't get to choose your difficulty. Whether you tried AI coding yesterday and it didn't work, or you've been at it for months and think it always works, eventually you hit the wall.
Custom codebase, not in the training data? That's harder. The AI has never seen your component library, your internal frameworks, your patterns. Everything has to be taught from scratch.
Big, complex codebase? Harder. More research before you start, more planning, more architectural decisions that the AI can't make for you.
Complex task? Harder. Better specs, more verification, more careful decisions. And if the task doesn't fit in a single context window, you need systems, or it's game over.
The more of these you're dealing with, the worse everything I just showed you gets. The AI doesn't just write duplicate code—it writes code that doesn't fit at all. It doesn't just go in circles—it confidently builds the wrong thing.
This isn't about tomorrow. It's about when shit hits the fan.
So how do you avoid going to hell? By building the machine before you need it. And if you're already there, stay a while and listen.
Why Now Is Different
https://preview.redd.it/duql18tpfoig1.jpg?width=1376&format=pjpg&auto=webp&s=5e6509b21a7d4c6dd198931b7c9008fd0f63e97c
Good news. The tools have caught up.
If you tried AI coding six months ago, or even three months ago, and it didn't work, I believe you. But what you tried is not what exists today.
Three things are compounding. It feels exponential, and maybe it is.
First, the models. The leap in the last year isn't 10% better. It's night and day. Things that were impossible are now routine.
Second, the tools. Cursor, Claude Code. They're shipping features every week. Their teams use their own products to build the products, so every pain point becomes a fix.
Third, the community is discovering what works, and the tools are absorbing it, fast. Planning mode was a manual workflow people invented. Now it's built in. Todo tracking? Same thing. These were manual workarounds six months ago. Now they're native features.
And this has changed how I work. Before, it was all about finding what the AI couldn't do, and building systems around those gaps. Breaking tasks down until it succeeded. Now I'm shifting to exploring what it can do, with minimal guidance. Right direction, right guardrails, then let go. The systems I'm showing you today are simpler than they would have been six months ago. You build the machine, then you learn to trust it.
One thing kept proving true: build for what AI can do in six months, not what it can do today. Every limitation I've built around has eventually disappeared, or become a button.
So I'm not asking you to white-knuckle through a bad experience. The experience is genuinely different now. And in six months, it'll be different again—better.
The Gardener
https://preview.redd.it/h5vt5gxrfoig1.jpg?width=1376&format=pjpg&auto=webp&s=fdc8a3d2e64544ee08081345deba80ef076aa491
This might be the new normal. Who knows?
When the AI makes a mistake, the first instinct is to just say "fix it." But a Gardener stops, opens their CLAUDE.md, and adds a rule. Or, after adding a feature, they update the relevant CLAUDE.md in that subdirectory. They fix the machine, not just the product.
Your most important code isn't the application logic—it's your instructions. CLAUDE.md, .cursorrules, whatever you use. That's the DNA that makes the AI generate YOUR code, not generic code.
The feedback loop: AI makes a mistake → you fix the code AND update the instructions. Next time, it doesn't make that mistake. If it does, you missed pulling the whole root.
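To make that concrete: a rule is usually just a line or two of markdown in CLAUDE.md. A hypothetical example of what a Gardener might add after the date-formatter incident from earlier (the paths and function names are made up):

```markdown
## Conventions the agent keeps getting wrong

- Dates: always use `formatDate()` from `src/utils/date.ts`. Never write a new formatter.
- API errors: return a `Result` object from service-layer functions, never throw.
- New components go in `src/components/<Feature>/`, one directory per feature.
```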
Do this every day. After a month, you have an AI that knows your preferences, or your team's preferences, perfectly.
And in teams, this compounds. The AI makes a mistake, one person fixes the instruction. Everyone else avoids that mistake. Every error only happens once. In principle, at least.
You might feel like your job is becoming project management for AI. It's not. You're building the machine that builds. The hierarchy is shifting: architecture matters more than code, engineering matters more than coding. And that's exactly where your experience lives.
Universal Architect
https://preview.redd.it/3gwsuowsfoig1.jpg?width=1376&format=pjpg&auto=webp&s=c99f3c6f9d78c799b66af64d65ac0620f32325af
You don't need to know the syntax to taste that something is wrong.
We used to be defined by our stack. "I'm a Java dev." "I'm a React dev." That's probably over soon.
The AI handles the syntax layer perfectly. It can write Rust, Go, Python all day. What it can't do is the logic layer. It will happily write a mathematically correct function that creates a massive security hole or a subtle race condition.
Your value is no longer typing the brackets correctly. It's looking at AI-generated code and saying: "That logic isn't right. Here we need a reusable component. We need to rethink the architecture entirely." You don't need to know the syntax to see that. You need ten years of experience reading and writing code.
And here's the thing: it's much faster to read code than to write it. The AI writes, you review. Your job is to verify the logic, not check the spelling.
Personal example: I used to say I was a frontend developer. Eight months ago I started building pflow, a Python CLI. A language I barely know. A hundred thousand lines later (26k source code, 69k tests), I've never manually debugged Python. Not once. I understand the architecture, I guide the direction, I review the logic. When I needed debugging capabilities, I built a trace system that the agent could use to debug itself. 80 epics later, adding features actually works better now than when I started.
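The trace system itself is nothing exotic. A minimal sketch of the idea (the decorator name and log path are hypothetical): record every call, result, and exception to a file the agent can read back when a run fails, instead of me attaching a debugger.

```python
import functools
import json
import time
from pathlib import Path

TRACE_FILE = Path("traces/last_run.jsonl")  # hypothetical location

def traced(fn):
    """Append one JSON line per call so the agent can inspect a failed run."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        event = {"fn": fn.__qualname__, "args": repr(args)[:200],
                 "kwargs": repr(kwargs)[:200], "ts": time.time()}
        try:
            result = fn(*args, **kwargs)
            event["result"] = repr(result)[:200]
            return result
        except Exception as exc:
            event["error"] = repr(exc)
            raise
        finally:
            TRACE_FILE.parent.mkdir(exist_ok=True)
            with TRACE_FILE.open("a") as f:
                f.write(json.dumps(event) + "\n")
    return wrapper
```

The point isn't the decorator. The point is that failure data lands somewhere the agent can grep, so "debug yourself" becomes a prompt instead of a manual session.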
Your 10,000 hours aren't obsolete. They're what lets you taste that something is wrong. In any language.
As developers, we're no longer stuck always choosing what we know. We can choose the technology that best fits the problem. You'll always need deep experts. But this gives all developers capabilities that regular vibe coders—people who just accept whatever the AI outputs—can only dream of.
Code is Disposable
https://preview.redd.it/f5i4kl3ufoig1.jpg?width=1376&format=pjpg&auto=webp&s=3a881dfb8860c2842ee15a6cd1a6a53ef04215ab
CLAUDE.md got the rose. The JavaScript file is crying in the limo. They do that.
How many times have we spent hours polishing a turd that should have been deleted from the start? If the implementation is bad, delete it. Write tests first if you need to, then delete and regenerate.
The code is disposable now. But the context—the CLAUDE.md, the architectural rules, the specs—that's the real asset. That's what lets a project survive when the team changes.
We're moving from "code maintainers" to "context maintainers." If you're a consultant, this changes what you deliver. Not just code, but the system of instructions that lets the next team maintain and extend with AI. That's the real value now.
I've been saying "CLAUDE.md" as shorthand, but the full toolkit is richer. Tests that catch when the agent goes off track. Progress logs that survive context resets. Subagents specialized for your codebase. That's the toolkit.
Before We Continue
https://preview.redd.it/qiseinhvfoig1.png?width=1326&format=png&auto=webp&s=f74cddc874b14a26b3038d51670aacd1659c256a
That's the /context command in Claude Code. If you are looking at it often (and you should be), you've probably already created a context bar for the status line. Good!
All of this—the obsession with what's in the context window, the systems and habits to keep it clean—has a name. Most call it "context engineering." Forget "prompt engineering." Magic words don't matter. What matters is WHAT you give the AI to work with.
Now you know what you probably already knew. Welcome to hell.
Part 2: Your Survival Guide
The Toolkit
https://preview.redd.it/v87kh5amhoig1.jpg?width=1376&format=pjpg&auto=webp&s=aa3bcd85d281ebe272e4bf88e093bded6f0ccd8e
Conversations die. What survives is what you extract from the process and commit. Managing that is the game.
Even if the rest of your code is chaos, get this right, and at least it won't get worse. It's a floor, not a ceiling.
This is my structure. Yours will look different, and it will eventually include Agent Skills and Hooks too. The point is to have one and to start somewhere.
---
pflow/
---
You don't need all of that on day one. Start with a CLAUDE.md. Add a tasks folder when you have specs to store. Build the rest as you discover what you need.
But if you don't tend the garden...
What Not to Do
https://preview.redd.it/w75gfhwthoig1.png?width=816&format=png&auto=webp&s=af7a932ef6314b2d16d47345c2de2d4bf4d1d986
...you get this instead.
Twenty markdown files nobody asked for. AI doesn't just generate code. It spits out documentation too.
The Gardener job from Part 1 applies here. Good documentation is compressed: right information about the right things. It helps humans AND agents navigate. This? This confuses everyone. Including the AI that wrote it.
Why Agents Derail
https://preview.redd.it/ejv6m6dwhoig1.jpg?width=1376&format=pjpg&auto=webp&s=18117b877b445d65fb52e85dc51695f96a95f768
The robot on the right is at 15% effectiveness. Context window is 85% full. We've all been that robot.
Think of every 10% of context as a shot of Jägermeister. AI "researchers" call this "The Dumb Zone": the threshold where the agent stops being helpful and starts spewing legacy code at light speed.
This is how we all use AI coding at first. You have a conversation, maybe a long one, where you figured out what to build. You explored options, rejected some ideas, clarified requirements. Then you asked it to build. And it either wrote code that completely didn't fit your codebase, or it went in circles trying to fix its own bugs.
Sound familiar?
You might be thinking: "Yeah, I've heard you shouldn't implement two features in one chat." True. But this is different. This isn't about building multiple features. It's about trying to EXPLORE and BUILD a single feature in one conversation.
Here's what actually went wrong: you had ONE conversation trying to do TWO jobs. First, figuring out what's even possible: does that API exist? What are the constraints? What should we even build? That part can be messy, full of brilliant AND idiotic ideas. Then you tried to BUILD in that same conversation.
The AI is now confused. Like having 47 browser tabs open and wondering why your computer is slow. "But I'm not USING them!" Doesn't matter. They're taking memory.
And it gets worse. Those rejected ideas don't disappear. They exert gravitational pull on everything the agent generates afterward. Plus: as context fills up, effectiveness drops. A year ago this was dramatic. Models got noticeably worse after 20k tokens. Today it's more subtle, but still real. And for complex features, you're more likely to run out of space entirely. That's the "game over" scenario we'll cover later.
Simple rule to start with: check how much of the context window you've used before building. If it's more than 50%, split it into two conversations. Explore first, then build with a fresh agent.
The 50% threshold isn't magic. Dexhorthy draws the line at 40%. If you're religiously curating your context and everything in there is highly relevant, you might push to 60-70% with a strict plan. But less is always better. When in doubt, split earlier.
Claude Code added automatic plan -> implementation split two weeks ago. The tools are always catching up. Next week, Cursor might have it too.
Explore, Then Build
https://preview.redd.it/h8f0ziyxhoig1.jpg?width=1376&format=pjpg&auto=webp&s=f34a3175992dab7454b6b6d84202063dbb4ea788
Pirates documented everything. They had to. Otherwise the next crew couldn't find the treasure.
Many call this Research, Plan, Implement (RPI). It works.
First conversation: explore and argue until you both understand what needs to be built. This SHOULD be messy. You're figuring things out. Rejected ideas are part of the process. The output is a spec: what you're building, destination and constraints.
Second conversation (if needed): fresh agent, clean context, full capacity. It's not carrying your exploratory baggage. It reads the spec and proves it understood. Then it writes the plan.
Then STOP. Spec and plan are files in your codebase now. This is your checkpoint. Your save point before the boss fight.
Then it builds.
The documents are the answer. They let knowledge survive the transition from messy exploration to clean implementation. And they join your Context Stack, persistent artifacts that make the next task easier.
The Research Phase
https://preview.redd.it/samsr2azhoig1.jpg?width=1376&format=pjpg&auto=webp&s=7231a57375e318e7a04e54a1c100774b7bfc3eb7
Phase 1 is the investigation. You're not giving orders—you're having a discussion.
"Does that API actually exist?" "Can we modify that without breaking tests?" "What are our options here?"
That last question matters. "What are our options?" forces the agent (and you) to consider alternatives instead of tunnel-visioning on the first solution. You might find a better path.
Use subagents to verify things against the actual codebase ("Verify all your assumptions"), in parallel if your tool supports it. For large codebases you can run 8+ subagents simultaneously. Eight times faster.
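In Claude Code, a specialized subagent is just a markdown file under .claude/agents/ with a YAML header (check the current docs for the exact fields; this is a sketch). A hypothetical assumption-checker for exactly this research step:

```markdown
---
name: assumption-checker
description: Verifies a single claim about the codebase and reports evidence.
tools: Read, Grep, Glob
---

You verify one assumption against the actual codebase. Find concrete evidence:
file paths, function signatures, call sites. Answer CONFIRMED or WRONG, with
the evidence. If you cannot find evidence either way, say so. Never speculate.
```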
Before the agent writes the spec: have it explain what you're building. Research is messy; there are decisions, tangents, changed directions. The agent might have lost the big picture. A quick "describe what the end result looks like" makes sure the spec it writes reflects what you actually agreed on, not just the last thing you discussed.
During the conversation, you can be abstract. That's fine for exploration. But the spec the agent writes must be grounded. Real file names, real function names, real patterns from your codebase. "Integrate with AuthService" not "integrate with the auth system." Abstract language leads to hallucination. Let the agent explore the codebase thoroughly, and the grounding happens automatically.
When you've explored enough, the agent writes the spec. Your job is making sure it captures what you agreed on.
What Matters Most
https://preview.redd.it/q6bfokh0ioig1.png?width=733&format=png&auto=webp&s=93f46f49f4a85aa1ddf3cecdae38d92f47e9c0b2
Seven subagents, all searching simultaneously. Only thing missing is the propellers.
One bad line of code is one bad line of code. That's how we work today. One bad line in the plan becomes 10-100 bad lines of code. But one bad critical line in the research? That's potentially game over before you begin.
That's why we spend time here. The research phase isn't just about making the agent smarter. It's about making YOU understand the problem better. You're not outsourcing the thinking—you're doing the thinking together.
Spec and Plan
https://preview.redd.it/e4oe2gr2ioig1.jpg?width=1376&format=pjpg&auto=webp&s=30cbb46b4cff24525be20ab04738f406470ad889
The baby's asleep because the spec said so. Works every time. Almost.
The spec is where you're going and what matters. The plan is how you get there. That's the agent's job.
But first, the fresh agent reads the spec and looks for gaps. What does it need to assume about the codebase that the spec doesn't say? Have it surface those assumptions and verify them against the actual code. You were there for the research, so you don't notice what's missing from the document. A cold reader does.
Then: "Describe what this looks like when it's done." If the agent can articulate the destination, the spec did its job. If it can't, you caught a misunderstanding before a single line of code.
Then the agent writes the plan. Not reads a pre-written one — writes its own. Following a pre-written plan is like a GPS: turn left, turn right, arrive. If one step is wrong, it won't notice because it never understood the destination.
Writing the plan is forced chain-of-thought: it's "show your work," not "trust this." Each step constrains the next. And checking whether the agent understood is faster than reading 500 lines — with experience, you can tell in seconds whether it's aligned. The plan takes up the same context whether the agent reads it or writes it. But only one version proves understanding. I might be anthropomorphizing, but it works.
You verify the spec. The plan? You might not even need to read it. The spec doesn't always need to be a formal document. A task description or a bug report + a braindump from the research agent works too.
Loose and RIGHT beats specific and WRONG. A vague spec that captures the actual goal is better than a detailed spec that precisely describes the wrong thing. Get the destination right. The route can adapt. Or watch the agent force a square peg into a round hole.
If something matters enough to care about, it belongs in the spec. "Baby must NOT wake up." That's a constraint. Tests belong in the spec too. They're acceptance criteria made executable. Everything else is the driver's problem.
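What "acceptance criteria made executable" can look like: one test per constraint in the spec, named after the constraint rather than the implementation. Everything here (module, fixtures, field names) is hypothetical, loosely modeled on the file-format refactor from Part 1:

```python
# Hypothetical acceptance tests copied from a spec's constraints section.
from myapp.formats import load_document  # assumed API, for illustration only

def test_legacy_v1_files_still_load():
    """Constraint: existing .v1 files must keep working after the format change."""
    doc = load_document("tests/fixtures/report.v1")
    assert doc.version == 1
    assert doc.records  # parsed and non-empty

def test_new_format_round_trips(tmp_path):
    """Constraint: save then load must preserve every field in the new format."""
    doc = load_document("tests/fixtures/report.v1")
    out = tmp_path / "report.v2"
    doc.save(out)
    assert load_document(out) == doc
```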
During research, if you catch yourself worrying about HOW something will be built, ask: does that actually matter to the outcome? If yes, make sure it ends up as a constraint in the spec. If no, let it go. That's the agent's problem now.
This connects back to Part 1. Your value as a Universal Architect is in the big decisions: the constraints, the things that matter. The spec is where that expertise lives. The plan is implementation details.
Where you're going, that's what matters. The route? The driver handles it. But the driver repeats the destination first. You don't hand them a route and hope they understood where you're going. If it's important you DON'T take the highway, say it. Otherwise, lean back.
The Checkpoint
https://preview.redd.it/nx0vnn44ioig1.jpg?width=1376&format=pjpg&auto=webp&s=985401a555bcd15624420c7d523fda79ff42d558
STOP. Before you enter the final boss...
Before any code is written, tell the agent to stop, or hit escape twice. You've verified the spec. Created the plan. Now create your save point.
"Review" doesn't mean reading every line. If the fresh agent explained the goal back correctly and the plan makes sense, you're good. You don't need to proofread every line.
I'll be honest: I don't always read my specs in detail. If I did everything else right, it almost always works.
So what IS the checkpoint for? It's your recovery point. The files exist outside the conversation. They're persistent. If implementation goes to hell at 80% context, you reset here with a fresh agent and try again. Skip this step and everything only exists in the conversation. You can't recover.
When should you use a fresh agent versus continuing? Two things. One: was the research phase messy? Changed direction a lot, considered many options? Context is polluted. Fresh start. Two: what's your context budget? If implementation will be large and you've already used a lot, fresh start.
Unsure? Go fresh. The cost of an unnecessary handoff is a few minutes. The cost of a polluted agent can be hours.
One more thing: write the spec and plan even for simple tasks. It's like 500 tokens. Worth it every time for grounding and recovery.
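For a sense of scale, a small-task spec really can be that short. A hypothetical example (the feature, names, and headings are made up; use whatever structure fits your project):

```markdown
# Spec: rate-limit the export endpoint

## Goal
Exports are hammering the DB. Limit /export to 10 requests per minute per user.

## Constraints
- Must NOT change the response format; existing clients depend on it.
- Reuse the existing RateLimiter middleware. No new dependencies.

## Acceptance
- The 11th request within a minute returns 429 with a Retry-After header.
- Existing export tests still pass.
```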
Purge, Don't Correct
https://preview.redd.it/9od18s25ioig1.jpg?width=1376&format=pjpg&auto=webp&s=29a1e0bfef404d8216848b82f510e71d7a1d3c8e
The AI makes a mistake. You yell at it and explain more—at best. The AI responds "You are absolutely right..." You get even more annoyed.
Sound familiar?
Every time you explain, you add a sticky note. You don't remove anything. The misunderstanding is still there. You're just talking over it.
And it gets worse. The AI looks at the conversation history and thinks: "Last time I said something, I was told I was wrong. Next token is probably... me being wrong again." You're creating a negative spiral. The trajectory bends toward failure.
Simple question after each message: does this help going forward? Yes or no. If no, go back. Rewrite. Don't try to explain.
And if you see "You are absolutely right," that's the signal. Go back.
Here's what people miss: you found a bug and asked the agent to create a GitHub issue with the CLI. Perfect! It worked. But now bug details and CLI commands are sitting in the context. Does that help the main task? No. Go back.
Success isn't the criterion. Relevance is. A successful tangent is still a tangent.
In Claude Code: escape twice. Use it. In ChatGPT, the Edit message button has existed for over two years. Most underutilized feature in AI tools.
This is the tactical version of Part 1. In Part 1, you fix CLAUDE.md so the mistake never happens again. You're fixing the system. Here, you fix the conversation so the pollution doesn't derail THIS task. Same principle, different timescale.
With time, this becomes intuitive. You'll feel when an exchange doesn't serve the task.
Bonus: going back is the best way to learn prompting. You try again, see what works. The feedback loop is immediate.
Game Over
https://preview.redd.it/sqrgtr86ioig1.jpg?width=1376&format=pjpg&auto=webp&s=b864ef3b82aa28433e00c9141392bc5b1c9501ce
I've killed thousands of agents. But none of them died in vain.
Sooner or later you'll have a task that doesn't fit in one context window. The context fills up. What do you do?
Modern agents naturally break work into phases. Use that. After each phase: run tests, verify it works, BEFORE you move on. Those are your save points. Good place to commit too. External recovery points in case both context AND code go wrong.
Phases aren't just about surviving context limits. They make the whole workflow better: smaller diffs you can actually review, git commits at natural boundaries, verification that each piece works before you build on top of it.
The progress log is your flight recorder. It documents what was tried, what worked, what failed, any deviations from the plan. This is what lets the next agent continue where the last one stopped. Memory that survives the reset.
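A progress log entry doesn't need to be fancy. A hypothetical example of what one phase might leave behind (file names and the commit hash are made up):

```markdown
## Phase 3: migrate readers to the new parser (DONE)
- Replaced load_v1() call sites in 12 files; full list in commit 4f2a91c.
- Tests: full suite green, plus 3 new round-trip tests.
- Deviation from plan: kept the legacy parser behind a feature flag instead of
  deleting it; two plugins still import it directly.
- Next (Phase 4): remove the flag once the plugins are updated.
```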
At 80% MAX: start a fresh agent. But reset to WHERE? You have options. If context is still clean, reset to the checkpoint after the last completed phase. If context got polluted with tangents and corrections, go all the way back to checkpoint zero: post-plan, pre-implementation. The fresh agent can also review git diffs to understand what changed.
The handoff: "Phases 1-3 complete. Read the progress log. Run tests to verify. Continue phase 4." Full capacity. The mission continues.
But without the checkpoint and progress log? Start from scratch, or struggle to get the next agent to understand what's been done.
Reset before you run out, not after. Proactive context management isn't optional.
There's a cost angle too. Every tool call includes your entire conversation as input. Running at 50k tokens versus 150k means every file read, every edit, every search is cheaper. Better output AND lower cost.
One warning: auto-compaction does not work for complex coding. The key here is keeping the full spec/plan in context when you continue, not a summarized version of it.
The End?
https://preview.redd.it/paujb8w8ioig1.jpg?width=1376&format=pjpg&auto=webp&s=d6ab78d1d9a0e709e938844695934aa1abfd9694
Each step earns trust for the next. Solid research? Trust the spec. Fresh agent gets it? Trust the plan. You're not micromanaging every line. You're verifying at key moments.
These are fundamentals. You've probably heard most of them before. The techniques will evolve. Fully autonomous agents are already here (almost), and they make these patterns more important, not less. When there's no human in the loop, the research, the spec, and the guardrails are all you've got. Get those wrong and nothing catches it.
And the next time Claude Code ships a button, you'll know the tradeoffs of pressing it.
---
Written together with AI, the same way I code with it. Not to go faster—to go better.