From Vibe Coding to Disciplined AI Development
None of this is new. Plan before you build. Test before you ship. Review before you merge. Engineers have known this for decades.
But it's not reflected enough in how our community actually uses AI tools. I see it every day. People skip the planning, skip the review, skip the tests, and let AI generate whatever it wants. Then they wonder why things break.
This article is my thinking on it. Not a research paper. Not a tutorial. Just what I've seen work and what I've seen fail.
And yes, if you're reading this, you're probably already doing it right. The people who need this most aren't reading blog posts about process. They're five prompts deep in a broken auth flow. But maybe you work with someone like that. Maybe you're hiring someone like that. Maybe this gives you language to explain why the fast way isn't always the fast way.
AI coding tools promised less grunt work. Somewhere along the way, "less work" became "less thinking."
That has a name now. Vibe coding. Andrej Karpathy coined it in February 2025. "Fully give in to the vibes, embrace exponentials, forget that the code even exists."
A grey literature review on arXiv surveyed how developers actually use these tools in practice. It's not rigorous proof of anything, but it maps the patterns: the speed-versus-quality tradeoff, weak QA, skipped reviews. Worth reading if you want the full picture beyond opinions. TechCrunch reported that senior devs using these tools described themselves as "AI babysitters."
You've seen it. Maybe you've done it. Open Claude Code or Cursor. Type a prompt. Accept whatever comes out. Hope it deploys. When it breaks, paste the error back in. Do that five more times. Ship it.
The Problem
Vibe coding outsources the thinking. You become a typist. Tab, accept, google the error, tab again.
But AI doesn't know your business. It doesn't know your users. It doesn't know that one edge case that will page you at 3 AM. It generates code that looks right. Looking right and being right are different things.
The goal was never to think less. It was to type less and think more.
The Fix
Disciplined AI development. You stop treating AI like autocomplete. You start treating it like a fast, capable tool that has no judgment.
That's the key difference from the "junior dev" analogy everyone uses. A junior dev asks questions when something feels off. A junior dev has social pressure to not ship garbage. An LLM does neither. It will confidently generate a SQL injection vulnerability and move on to the next file. The failure modes are different. An LLM won't flag its own mistakes unless you build a process that forces the check.
So you build the process.
1. Know What You're Building
Write it down. If you can't explain it, you don't understand it.
"Export data as CSV" is not a spec. "Export filtered, paginated results as CSV with progress indication for up to 100k rows, handling special characters and large downloads" is.
AI can't close that gap. You can.
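What "handling special characters and large downloads" means in practice can be sketched in a few lines. This is a minimal, hypothetical helper (the name `stream_csv` is mine, not from any framework): it streams the export chunk by chunk so a 100k-row result never sits in memory, and it leans on the standard library's `csv` module for quoting.

```python
import csv
import io

def stream_csv(rows, fieldnames):
    """Stream rows as CSV chunks so a 100k-row export never sits in memory.

    `rows` is any iterable of dicts; `fieldnames` fixes the column order.
    csv.DictWriter handles quoting of embedded commas, quotes, and newlines.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
        yield buf.getvalue()   # hand this chunk to the HTTP response
        buf.seek(0)
        buf.truncate(0)        # reset the buffer for the next chunk

# A value with an embedded comma and quote survives the round trip.
chunks = list(stream_csv(
    [{"name": 'Smith, "Bob"', "count": 3}],
    fieldnames=["name", "count"],
))
```

The point isn't this exact code. It's that the vague spec never would have surfaced the quoting question at all.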
2. Plan First
All three major tools have plan mode now:
- Claude Code: /plan. It reads your codebase, finds the right files, maps out the steps.
- Cursor: Agent mode with planning. Maps changes across files before touching anything.
- GitHub Copilot: Coding agent plans multi-step tasks, creates branches, proposes implementations.
The AI makes a plan. You read it. You find the missing validation, the race condition, the query that won't scale. Then you fix the plan before a single line of code gets written.
This matters. A plan that's missing retry logic for flaky API calls will cause failed deployments. A plan that uses imagePullPolicy: Always when you're building images locally will crash every pod with ImagePullBackOff. You catch these things in review, not in production.
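The imagePullPolicy pitfall is concrete enough to show in a manifest. With an image that was built locally and never pushed to a registry, the pull policy decides everything (the image name below is hypothetical; the field semantics are standard Kubernetes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: myapp:dev               # built locally, never pushed
      # imagePullPolicy: Always  ->  kubelet tries the registry and gets
      #                              ImagePullBackOff on every restart
      imagePullPolicy: IfNotPresent  # use the local image if it exists
```

One line in a plan. One line in a review comment. Or one crashed deployment.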
3. Poke Holes
Read the plan like you're reviewing someone else's PR.
- API goes down. What happens?
- User has 10 million records. Still work?
- Transaction boundary. Is there one?
- Permissions. Checked?
AI writes plausible code. Your job is to make it correct.
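"Transaction boundary. Is there one?" is the kind of question that's easier to feel with a worked example. A minimal sketch with sqlite3: two writes that must succeed or fail together, with a simulated failure between them. The `transfer` function and its limit check are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Both writes succeed or neither does: one transaction boundary."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if amount > 50:  # simulated failure between the two writes
            raise ValueError("transfer limit exceeded")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

try:
    transfer(conn, "alice", "bob", 80)
except ValueError:
    pass

# The failed transfer was rolled back: no money vanished.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Plausible AI-generated code often does the two UPDATEs with no boundary at all. It passes the happy-path test and silently loses money on the failure path.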
4. Test First
Clear inputs, clear outputs, known edge cases? Write the test. Tell the AI to make it pass.
"Hope this works" becomes "proven to work." That's a big difference when it's 2 AM and something breaks.
Write a rate limiter? Hit it with 65 requests. First 60 should return 200. Next 5 should return 429 with a Retry-After header. If it does, it works. If it doesn't, you know before your users do.
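That 65-request check is easy to make executable. A sketch, assuming a fixed-window limiter (one of several possible designs; the class and its return shape are hypothetical, simulating status codes and headers without a real HTTP server):

```python
import time

class FixedWindowLimiter:
    """Hypothetical rate limiter: 60 requests per 60-second window."""

    def __init__(self, limit=60, window=60.0):
        self.limit, self.window = limit, window
        self.start, self.count = time.monotonic(), 0

    def hit(self):
        now = time.monotonic()
        if now - self.start >= self.window:   # window expired, start fresh
            self.start, self.count = now, 0
        self.count += 1
        if self.count <= self.limit:
            return 200, {}
        retry_after = int(self.window - (now - self.start)) + 1
        return 429, {"Retry-After": str(retry_after)}

limiter = FixedWindowLimiter()
responses = [limiter.hit() for _ in range(65)]
```

Write the assertions first, hand them to the AI, and "hope this works" becomes a pass/fail answer.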
5. Smoke Test in a Real Browser
Unit tests check logic. They don't check if the button actually works.
Agents can drive Playwright now. Claude Code launches a browser, clicks through your app, takes screenshots, catches console errors. No manual QA for the happy path.
- Write the feature
- Run unit tests
- Agent opens a browser, walks through the flow
- Screenshots confirm it works
- Console errors caught before production
The agent doesn't just write code. It checks that the code works.
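The flow above can be sketched as a script the agent (or you) runs. This is a hedged outline using Playwright's Python sync API: the URL, button selector, and function name are hypothetical, and it requires `pip install playwright` plus `playwright install` before it will drive a browser.

```python
def smoke_test(url="http://localhost:3000"):
    """Happy-path smoke test: load the app, click through one flow,
    take a screenshot, and fail on any console error.
    URL and selectors are placeholders for your own app."""
    from playwright.sync_api import sync_playwright  # imported lazily

    errors = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Collect console errors while the flow runs.
        page.on("console",
                lambda msg: errors.append(msg.text)
                if msg.type == "error" else None)
        page.goto(url)
        page.click("text=Export CSV")        # hypothetical button
        page.screenshot(path="smoke-export.png")
        browser.close()

    assert not errors, f"console errors: {errors}"
```

Unit tests plus one script like this cover both questions: is the logic right, and does the button actually work.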
6. Then Build
Now you write code. But you have a spec, a reviewed plan, tests, and browser verification before you start. The AI handles boilerplate. You handle the parts that matter.
Superpowers
obra/superpowers is an open-source skills framework for Claude Code. It puts guardrails on this workflow:
- Brainstorming: Requirements and design before code. Back-and-forth until edge cases are covered.
- Writing Plans: Spec in, step-by-step plan out. File paths, code, test commands, commit points. A reviewer agent checks it before you start.
- Executing Plans: Step by step with checkpoints. Subagents for parallel work.
- TDD: RED-GREEN-REFACTOR. Failing test first. Implement. Clean up.
- Debugging: Investigate before fixing. No guessing.
- Code Review: Real review, not rubber stamping. The skill warns against blindly agreeing.
- Verification: Run the commands, check the output, then say it's done.
And it goes beyond code. Browser automation through Playwright. MCP servers connecting to databases, CMS platforms, APIs. The agent works across your whole stack, not just your editor.
The Difference
| Vibe Coding | Disciplined AI Development |
|---|---|
| Prompt, Accept, Hope | Plan, Review, Verify |
| "Make it work" | "Make it correct" |
| Fix after it breaks | Catch it before it ships |
| AI guesses | You specify |
| Debt piles up | Things last |
| Deploy and pray | Test then deploy |
When to Vibe Code (and When to Stop)
There are times for it. One-off scripts. Learning a new API. Generating test data.
Prototypes are where it gets dangerous. Every production disaster has an origin story, and a lot of them start with "it was just a prototype." The prototype works. Someone shows it to a stakeholder. Now it has a deadline. Now it's in production. Nobody rewrites it because it already "works."
Here's a simple rule: if it might still be running in 30 days, it's not a prototype. Treat it like production from the start, or accept the rewrite cost later.
Even TechCrunch's reporting shows senior devs still find value in vibe coding for scaffolding and early exploration. The future is hybrid. But the line between throwaway and durable needs to be drawn early, not after the thing is already in production.
The Bottom Line
AI isn't replacing developers. But it is accelerating us.
I'm not going to pretend AI companies don't want you to spend more tokens. Of course they do. That's their business. But how much value you get from those tokens is up to you.
These tools can generate large chunks of code fast. That's a fact. The question is what you do with that speed. Without control, it creates technical debt faster than any human ever could. With the right process, it builds production software in a fraction of the time.
Whoever uses these tools with bad habits will fail faster. Whoever uses them with discipline will succeed faster. Same tool. The difference is you.
Resources
- Andrej Karpathy on Vibe Coding - The post that started it
- Vibe Coding in Practice (arXiv) - Grey literature review on how developers actually use AI coding tools
- Senior Devs as AI Babysitters (TechCrunch) - Industry reality check
- Claude Code - Anthropic's agentic coding CLI
- obra/superpowers - Skills framework for structured AI development
- Playwright - Browser automation for testing
- Model Context Protocol - Standard for connecting agents to tools
- Cursor - AI code editor