From Vibe Coding to Disciplined AI Development
None of this is new. Plan before you build. Test before you ship. Review before you merge. Engineers have known this for decades.
But it's not reflected enough in how our community actually uses AI tools. I see it every day. People skip the planning, skip the review, skip the tests, and let AI generate whatever it wants. Then they wonder why things break.
This article is my thinking on it. Not a research paper. Not a tutorial. Just what I've seen work and what I've seen fail.
And yes, if you're reading this, you're probably already doing it right. The people who need this most aren't reading blog posts about process. They're five prompts deep in a broken auth flow. But maybe you work with someone like that. Maybe you're hiring someone like that. Maybe this gives you language to explain why the fast way isn't always the fast way.
AI coding tools promised less grunt work. Somewhere along the way, "less work" became "less thinking."
That has a name now. Vibe coding. Andrej Karpathy coined it in February 2025. "Fully give in to the vibes, embrace exponentials, forget that the code even exists."
A grey literature review on arXiv surveyed how developers actually use these tools in practice. It's not rigorous proof of anything, but it maps the patterns: the speed-versus-quality tradeoff, weak QA, skipped reviews. Worth reading if you want the full picture beyond opinions. TechCrunch reported that senior devs using these tools described themselves as "AI babysitters."
You've seen it. Maybe you've done it. Open Claude Code or Cursor. Type a prompt. Accept whatever comes out. Hope it deploys. When it breaks, paste the error back in. Do that five more times. Ship it.
The Problem
Vibe coding outsources the thinking. You become a typist. Tab, accept, google the error, tab again.
But AI doesn't know your business. It doesn't know your users. It doesn't know that one edge case that will page you at 3 AM. It generates code that looks right. Looking right and being right are different things.
The goal was never to think less. It was to type less and think more.
The Fix
Disciplined AI development. You stop treating AI like autocomplete. You start treating it like a fast, capable tool that has no judgment.
That's the key difference from the "junior dev" analogy everyone uses. A junior dev asks questions when something feels off. A junior dev has social pressure to not ship garbage. An LLM does neither. It will confidently generate a SQL injection vulnerability and move on to the next file. The failure modes are different. An LLM won't flag its own mistakes unless you build a process that forces the check.
So you build the process.
1. Know What You're Building
Write it down. If you can't explain it, you don't understand it.
"Export data as CSV" is not a spec. "Export filtered, paginated results as CSV with progress indication for up to 100k rows, handling special characters and large downloads" is.
AI can't close that gap. You can.
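What "handling special characters and large downloads" means in practice can be sketched in a few lines. This is a minimal, hypothetical helper (the name `stream_csv` is mine, not from any framework): it streams the export chunk by chunk so a 100k-row result never sits in memory, and it leans on the standard library's `csv` module for quoting.

```python
import csv
import io

def stream_csv(rows, fieldnames):
    """Stream rows as CSV chunks so a 100k-row export never sits in memory.

    `rows` is any iterable of dicts; `fieldnames` fixes the column order.
    csv.DictWriter handles quoting of embedded commas, quotes, and newlines.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
        yield buf.getvalue()   # hand this chunk to the HTTP response
        buf.seek(0)
        buf.truncate(0)        # reset the buffer for the next chunk

# A value with an embedded comma and quote survives the round trip.
chunks = list(stream_csv(
    [{"name": 'Smith, "Bob"', "count": 3}],
    fieldnames=["name", "count"],
))
```

The point isn't this exact code. It's that the vague spec never would have surfaced the quoting question at all.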
2. Plan First
All three major tools have plan mode now:
- Claude Code: /plan. It reads your codebase, finds the right files, maps out the steps.
- Cursor: Agent mode with planning. Maps changes across files before touching anything.
- GitHub Copilot: Coding agent plans multi-step tasks, creates branches, proposes implementations.
The AI makes a plan. You read it. You find the missing validation, the race condition, the query that won't scale. Then you fix the plan before a single line of code gets written.
This matters. A plan that's missing retry logic for flaky API calls will cause failed deployments. A plan that uses imagePullPolicy: Always when you're building images locally will crash every pod with ImagePullBackOff. You catch these things in review, not in production.
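The imagePullPolicy pitfall is concrete enough to show in a manifest. With an image that was built locally and never pushed to a registry, the pull policy decides everything (the image name below is hypothetical; the field semantics are standard Kubernetes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: myapp:dev               # built locally, never pushed
      # imagePullPolicy: Always  ->  kubelet tries the registry and gets
      #                              ImagePullBackOff on every restart
      imagePullPolicy: IfNotPresent  # use the local image if it exists
```

One line in a plan. One line in a review comment. Or one crashed deployment.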
3. Poke Holes
Read the plan like you're reviewing someone else's PR.
- API goes down. What happens?
- User has 10 million records. Still work?
- Transaction boundary. Is there one?
- Permissions. Checked?
AI writes plausible code. Your job is to make it correct.
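"Transaction boundary. Is there one?" is the kind of question that's easier to feel with a worked example. A minimal sketch with sqlite3: two writes that must succeed or fail together, with a simulated failure between them. The `transfer` function and its limit check are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Both writes succeed or neither does: one transaction boundary."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if amount > 50:  # simulated failure between the two writes
            raise ValueError("transfer limit exceeded")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

try:
    transfer(conn, "alice", "bob", 80)
except ValueError:
    pass

# The failed transfer was rolled back: no money vanished.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Plausible AI-generated code often does the two UPDATEs with no boundary at all. It passes the happy-path test and silently loses money on the failure path.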
4. Test First
Clear inputs, clear outputs, known edge cases? Write the test. Tell the AI to make it pass.
"Hope this works" becomes "proven to work." That's a big difference when it's 2 AM and something breaks.
Write a rate limiter? Hit it with 65 requests. First 60 should return 200. Next 5 should return 429 with a Retry-After header. If it does, it works. If it doesn't, you know before your users do.
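That 65-request check is easy to make executable. A sketch, assuming a fixed-window limiter (one of several possible designs; the class and its return shape are hypothetical, simulating status codes and headers without a real HTTP server):

```python
import time

class FixedWindowLimiter:
    """Hypothetical rate limiter: 60 requests per 60-second window."""

    def __init__(self, limit=60, window=60.0):
        self.limit, self.window = limit, window
        self.start, self.count = time.monotonic(), 0

    def hit(self):
        now = time.monotonic()
        if now - self.start >= self.window:   # window expired, start fresh
            self.start, self.count = now, 0
        self.count += 1
        if self.count <= self.limit:
            return 200, {}
        retry_after = int(self.window - (now - self.start)) + 1
        return 429, {"Retry-After": str(retry_after)}

limiter = FixedWindowLimiter()
responses = [limiter.hit() for _ in range(65)]
```

Write the assertions first, hand them to the AI, and "hope this works" becomes a pass/fail answer.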
5. Smoke Test in a Real Browser
Unit tests check logic. They don't check if the button actually works.
Agents can drive Playwright now. Claude Code launches a browser, clicks through your app, takes screenshots, catches console errors. No manual QA for the happy path.
- Write the feature
- Run unit tests
- Agent opens a browser, walks through the flow
- Screenshots confirm it works
- Console errors caught before production
The agent doesn't just write code. It checks that the code works.
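The flow above can be sketched as a script the agent (or you) runs. This is a hedged outline using Playwright's Python sync API: the URL, button selector, and function name are hypothetical, and it requires `pip install playwright` plus `playwright install` before it will drive a browser.

```python
def smoke_test(url="http://localhost:3000"):
    """Happy-path smoke test: load the app, click through one flow,
    take a screenshot, and fail on any console error.
    URL and selectors are placeholders for your own app."""
    from playwright.sync_api import sync_playwright  # imported lazily

    errors = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Collect console errors while the flow runs.
        page.on("console",
                lambda msg: errors.append(msg.text)
                if msg.type == "error" else None)
        page.goto(url)
        page.click("text=Export CSV")        # hypothetical button
        page.screenshot(path="smoke-export.png")
        browser.close()

    assert not errors, f"console errors: {errors}"
```

Unit tests plus one script like this cover both questions: is the logic right, and does the button actually work.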
6. Then Build
Now you write code. But you have a spec, a reviewed plan, tests, and browser verification before you start. The AI handles boilerplate. You handle the parts that matter.
Superpowers
obra/superpowers is an open-source skills framework for Claude Code. It puts guardrails on this workflow:
- Brainstorming: Requirements and design before code. Back-and-forth until edge cases are covered.
- Writing Plans: Spec in, step-by-step plan out. File paths, code, test commands, commit points. A reviewer agent checks it before you start.
- Executing Plans: Step by step with checkpoints. Subagents for parallel work.
- TDD: RED-GREEN-REFACTOR. Failing test first. Implement. Clean up.
- Debugging: Investigate before fixing. No guessing.
- Code Review: Real review, not rubber stamping. The skill warns against blindly agreeing.
- Verification: Run the commands, check the output, then say it's done.
And it goes beyond code. Browser automation through Playwright. MCP servers connecting to databases, CMS platforms, APIs. The agent works across your whole stack, not just your editor.
The Difference
| Vibe Coding | Disciplined AI Development |
|---|---|
| Prompt, Accept, Hope | Plan, Review, Verify |
| "Make it work" | "Make it correct" |
| Fix after it breaks | Catch it before it ships |
| AI guesses | You specify |
| Debt piles up | Things last |
| Deploy and pray | Test then deploy |
When to Vibe Code (and When to Stop)
There are times for it. One-off scripts. Learning a new API. Generating test data.
Prototypes are where it gets dangerous. Every production disaster has an origin story, and a lot of them start with "it was just a prototype." The prototype works. Someone shows it to a stakeholder. Now it has a deadline. Now it's in production. Nobody rewrites it because it already "works."
Here's a simple rule: if it might still be running in 30 days, it's not a prototype. Treat it like production from the start, or accept the rewrite cost later.
Even TechCrunch's reporting shows senior devs still find value in vibe coding for scaffolding and early exploration. The future is hybrid. But the line between throwaway and durable needs to be drawn early, not after the thing is already in production.
The Bottom Line
AI isn't replacing developers. But it is accelerating us.
I'm not going to pretend AI companies don't want you to spend more tokens. Of course they do. That's their business. But how much value you get from those tokens is up to you.
These tools can generate large chunks of code fast. That's a fact. The question is what you do with that speed. Without control, it creates technical debt faster than any human ever could. With the right process, it builds production software in a fraction of the time.
Whoever uses these tools with bad habits will fail faster. Whoever uses them with discipline will succeed faster. Same tool. The difference is you.
Resources
- Andrej Karpathy on Vibe Coding - The post that started it
- Vibe Coding in Practice (arXiv) - Grey literature review on how developers actually use AI coding tools
- Senior Devs as AI Babysitters (TechCrunch) - Industry reality check
- Claude Code - Anthropic's agentic coding CLI
- obra/superpowers - Skills framework for structured AI development
- Playwright - Browser automation for testing
- Model Context Protocol - Standard for connecting agents to tools
- Cursor - AI code editor