I spend most of my coding hours (and there aren't many of them) inside Claude Code on the Max plan. It's not just my primary tool; it's the conductor of an entire orchestra of specialized agents, skills, and cross-model debates. Over time, I've refined a highly structured, spec-driven workflow that consistently delivers clean, production-ready features with surprisingly few bugs.

The secret isn't any single trick. It's the deliberate layering of human taste, rigorous process, and AI specialization — all tuned for maximum quality through multiple defense layers while keeping my focus narrow and high-leverage.

what makes this setup distinctive

After diving into the 2026 Claude Code community, I realized many pieces of my workflow are popular — but the full orchestration is quite personal.

Two open-source skill packs dominate among serious Claude Code users. For those new to this, these are essentially plugin libraries you install to give Claude new specialized commands.

gstack — by YC CEO Garry Tan

This is basically my virtual engineering team. It includes the exact commands I rely on daily: /office-hours (product interrogation), /plan-ceo-review, /plan-eng-review, /qa (with real browser testing), /cso (security), /review, /codex (OpenAI second opinion), /ship, /browse (Playwright-powered), and even /canary. It turns Claude into a full sprint process: Think → Plan → Build → Review → Test → Ship → Reflect, with strong guardrails and user sovereignty.

Superpowers — by Jesse Vincent

This powers my spec-first discipline with /superpowers:brainstorming, writing-plan, parallel agents, and TDD enforcement. It also includes powerful visual tools that evolved into what I use as /visual-explainer — turning complex plans and terminal output into clean HTML diagrams and architecture visuals.

Other distinctive elements in my flow:

  • Early Grok vs. Gemini pitting to surface clarity gaps before any planning begins.
  • Parallel adversarial + multi-skill reviews at multiple gates.
  • Scheduled agents running every 3 hours that test APIs and UX flows, log issues, and update TODO.critical.md + TODO.md (features stay 100% human-driven).
  • Greptile review loops + ongoing experimentation with /ultrareview (cloud multi-agent deep reviews with sandbox verification).
  • Isolated worktrees with submodules + Figma MCP for true component-driven development.
  • Heavy reliance on /visual-explainer during architecture, security, and design phases.
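
The scheduled-agent ritual above can be sketched as a cron-driven script. This is a minimal sketch, not my actual setup: the `run_maintenance` function name, the prompt wording, and the `agent-maintenance.log` path are all illustrative, and it assumes the `claude` CLI's headless `-p`/`--print` mode.

```shell
#!/usr/bin/env bash
# Hypothetical maintenance pass: invoke Claude headlessly to smoke-test
# the app and append findings to the TODO files. Install via crontab:
#   0 */3 * * * /path/to/maintenance-agent.sh /path/to/repo
set -euo pipefail

run_maintenance() {
  local repo_dir="$1"
  cd "$repo_dir"

  # Bug hunting only: feature work stays 100% human-driven.
  local prompt='Smoke-test the API endpoints and main UX flows.
Append critical regressions to TODO.critical.md and minor ones to TODO.md.
Do not implement new features.'

  if command -v claude >/dev/null 2>&1; then
    # -p / --print runs a single non-interactive prompt and exits.
    claude -p "$prompt" >> agent-maintenance.log 2>&1
  else
    echo "claude CLI not found; skipping this pass" >&2
  fi
}

run_maintenance "${1:-.}"
```

Keeping the cadence in cron (or a systemd timer) rather than in the script itself makes it easy to pause the agent without touching the repo.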

While many developers use pieces of this (spec-driven development is now mainstream, with tools like GitHub Spec Kit and plan mode in Claude Code), the complete "human conductor + multiple defense layers + automated bug maintenance" system feels like my evolved personal operating system.
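
The isolated worktrees with submodules mentioned earlier come down to two git commands. A sketch, with an illustrative naming scheme (`feat/...` branches, `wt-...` directories):

```shell
#!/usr/bin/env bash
# One disposable worktree per feature, so parallel agent sessions never
# share a working directory. The naming scheme here is illustrative.
set -euo pipefail

new_feature_worktree() {
  local feature="$1"
  local dir="../wt-${feature}"

  # A separate checkout on its own branch, next to the main clone.
  git worktree add -b "feat/${feature}" "$dir"

  # Fresh worktrees start with unpopulated submodule directories;
  # initialize them so the tree builds standalone.
  git -C "$dir" submodule update --init --recursive

  echo "worktree ready: ${dir}"
}
```

Calling `new_feature_worktree payments-refactor` from the main clone yields `../wt-payments-refactor` on branch `feat/payments-refactor`; `git worktree remove` cleans it up when the feature ships.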

the quality engine: multiple defense layers

This is where the real power shows up. I don't chase breadth. I deliberately narrow my focus to what only a human can do well: product taste, architecture decisions, security thinking, and user experience. Everything else gets filtered through rigorous layers so I can spend the majority of my time reading and understanding code rather than writing or debugging it.

Here's exactly how an idea travels through the defense layers in practice:

  1. The Crucible: Detailed idea → Grok/Gemini debate → /office-hours or brainstorming.
  2. The Blueprint: /plan-ceo-review + /plan-eng-review + writing-plan, then multi-review (plan-eng + codex + adversarial).
  3. The Assembly: Parallel-agent build.
  4. The Gauntlet: Parallel post-build reviews (codex + review + plan-eng-review + adversarial).
  5. The User Test: /qa + /browse (Playwright) + /design-review when UI is involved.
  6. The Final Polish: /ship → Greptile loop → /ultrareview experimentation.
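
Conceptually, the six layers behave like a fail-fast pipeline: no stage runs until the previous gate passes. This toy sketch shows that control flow only; the `true` placeholders stand in for the real skill and agent invocations.

```shell
#!/usr/bin/env bash
# Toy control-flow sketch of the defense layers: each gate must pass
# before the next one runs, and the first failure stops the pipeline.
set -euo pipefail

gate() {
  local name="$1"; shift
  echo "gate: ${name}"
  "$@" || { echo "FAILED at ${name}" >&2; return 1; }
}

run_pipeline() {
  gate "01 crucible"   true   # idea debate / brainstorming
  gate "02 blueprint"  true   # plan + multi-review
  gate "03 assembly"   true   # parallel-agent build
  gate "04 gauntlet"   true   # adversarial post-build reviews
  gate "05 user-test"  true   # qa + browser runs
  gate "06 polish"     true   # ship + external review loop
  echo "shipped"
}
```

Swapping any `true` for `false` demonstrates the fail-fast property: that gate reports a failure and none of the later gates run.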

The result: dramatically fewer bugs reach me or production. Because the filters are so strong, I get to operate with a narrower scope and deeper attention. I'm not context-switching across a thousand micro-issues. Instead, I'm deeply engaged with UX flows, component architecture, and security implications — often visualized cleanly via /visual-explainer.

This multi-layer approach is what lets me treat features as high-craft work while the system handles the volume of validation and maintenance.

the supporting cast

  • Claude Max — Extended context and high limits are non-negotiable for long-running agent sessions and large codebases.
  • claude mem + persistent CLAUDE.md patterns for cross-session continuity.
  • Gemini CLI for documentation and occasional fresh perspective (I still jump into Cursor sometimes too).
  • Ubuntu VM QA environment + push to Canary (where Claude involvement stops).

what's next

In my next post, I'll go deeper into the monitoring side: how I set up continuous system observation, feed real signals back into the TODO system, and let scheduled agents constantly hunt and fix issues — all while I stay focused on the high-value creative and architectural work.

This whole setup has turned Claude Code from a powerful coding assistant into something closer to a small, high-discipline engineering organization that I personally conduct. The compound effect on quality and focus has been massive.

If you're on Claude Code Max, start with gstack and Superpowers — they're free and transformative. Then layer in your own model-pitting rituals and maintenance agents. The quality compounds fast.

What's one defense layer or ritual you've added to your own flow? I'd love to hear about it.