One Person. Twenty Engineers. No Hype.

I'm a cloud architect by day and a solo founder by night. I run Heldcraft — a one-person software studio where I'm building something worth building, carefully and without shortcuts. I write about what I'm learning in the process: the tools, the trade-offs, the honest gaps between what sounds good and what actually works. No hype. Just the build.

Garry Tan is President & CEO of Y Combinator. He is also, apparently, shipping 10,000 to 20,000 lines of production code per day as a side activity.

That number sounds like marketing copy. It might not be.

Tan open-sourced his setup last week: gstack, a collection of 15 Claude Code slash commands that turn a single AI session into something that behaves like a staffed engineering team. The README reports his most recent 7-day stats: 140,751 lines added, 362 commits, and roughly 115k net LOC, all while running YC.

I spent time with it this week. Here's what it actually is.


What gstack does

The core idea is role-based cognitive specialisation. Instead of prompting Claude Code generically, gstack gives it structured personas — each a slash command, each with a specific mode of thinking:

  • /plan-ceo-review — reframes your feature request. Asks "is this the right problem?" before you write a line of code. Four modes: expand the scope, selectively expand, hold scope, or reduce.

  • /plan-eng-review — produces architecture, ASCII data flow diagrams, a test matrix, edge cases, failure modes. Forces hidden assumptions into the open before you build.

  • /review — the paranoid staff engineer. Finds bugs that pass CI but fail in production. Auto-fixes the obvious ones. Flags the judgment calls.

  • /qa — opens a real Chromium browser, clicks through your actual app, finds bugs, fixes them with atomic commits, generates regression tests.

  • /ship — syncs main, runs tests, audits coverage, pushes, opens a PR. Bootstraps test frameworks from scratch if you don't have one.

The full list is 15 commands. They're designed to chain: plan with CEO, eng, and design reviews, then build, review, QA, and ship. One feature, seven steps, complete.


What actually matters here

The interesting thing about gstack isn't any individual command. It's the governance layer.

Running Claude Code without structure gets you fast iteration but unpredictable quality. The failure mode is vibe coding — moving quickly, making decisions in the moment, shipping things that work until they don't.

gstack solves this by introducing review gates. The CEO command rejects 3-star features before you build them. The eng review locks in architecture before you write code. The design review catches AI slop before it ships. The QA step verifies fixes actually work.
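One way to picture the review gates described above is as a pipeline that halts at the first stage that refuses to pass the work along. The sketch below is a minimal illustration in Python, not how gstack is implemented. The `Gate` type, the `run_gates` function, and the toy checks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Gate:
    """A named review step; `check` returns True when the work may proceed."""
    name: str
    check: Callable[[dict], bool]


def run_gates(feature: dict, gates: List[Gate]) -> Tuple[bool, List[str]]:
    """Run gates in order and stop at the first one that rejects the feature."""
    passed: List[str] = []
    for gate in gates:
        if not gate.check(feature):
            return False, passed
        passed.append(gate.name)
    return True, passed


# Hypothetical gates loosely mirroring the stages named in the post.
GATES = [
    Gate("ceo-review", lambda f: f.get("solves_real_problem", False)),
    Gate("eng-review", lambda f: f.get("architecture_agreed", False)),
    Gate("design-review", lambda f: not f.get("looks_generic", True)),
    Gate("qa", lambda f: f.get("fix_verified", False)),
]

feature = {
    "solves_real_problem": True,
    "architecture_agreed": True,
    "looks_generic": True,  # the design gate should stop this one
    "fix_verified": True,
}

ok, passed = run_gates(feature, GATES)
print(ok, passed)
```

The point of the ordering is that a rejection is cheapest at the front: a feature stopped at the first gate costs no build time at all.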

That's not a coding tool. That's a process. And it's a process that scales with parallel sessions — Tan runs 10+ simultaneous Claude Code instances, each in its own branch, each with the right cognitive mode for the task.

One person, ten parallel agents, structured roles, real review gates. That's the actual innovation.


What I'm taking from it

I run a small agent pipeline at Heldcraft: six specialist agents (product, architecture, dev, QA, security, release). The pipeline design predates gstack, but the philosophy is the same: specialisation beats generalism for structured work.

A few things from gstack I'm actively looking at:

/qa and real browser automation. The ability to open a real browser, interact with a staging URL, find a bug, fix it, generate a regression test, and verify the fix — in one command — is a genuine capability unlock. Right now my QA infrastructure doesn't have browser access on the server. That's a concrete gap to close.

/review with auto-fix + flag-for-human. The pattern of "fix the obvious, flag the judgment calls" is exactly what automated PR review should look like. Currently my security review step flags everything and asks for human review.
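The "fix the obvious, flag the judgment calls" split can be sketched as a simple triage over review findings. This is an illustration under stated assumptions, not gstack's logic: the `Finding` type, the `mechanical` flag, and the example rules are hypothetical, and deciding what counts as mechanical is the hard part that this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Finding:
    rule: str         # short identifier for the kind of issue
    message: str      # human-readable description
    mechanical: bool  # True when the fix has exactly one sensible form


def triage(findings: List[Finding]) -> Dict[str, List[Finding]]:
    """Split findings into those safe to auto-fix and those needing a human."""
    buckets: Dict[str, List[Finding]] = {"auto_fix": [], "needs_human": []}
    for finding in findings:
        key = "auto_fix" if finding.mechanical else "needs_human"
        buckets[key].append(finding)
    return buckets


findings = [
    Finding("unused-import", "os is imported but never used", True),
    Finding("missing-await", "coroutine result is discarded", True),
    Finding("retry-policy", "no backoff on payment retries", False),
]

result = triage(findings)
print([f.rule for f in result["auto_fix"]])
print([f.rule for f in result["needs_human"]])
```

A review step that flags everything is effectively this function with every finding marked as non-mechanical.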

/document-release. A command that reads every doc file in the project, cross-references the diff, and updates everything that drifted. This is the unglamorous work that never gets done. Automating it would close a real gap.
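To make "cross-references the diff" concrete, here is a rough sketch of one way to spot docs that may have drifted. It is my own simplification, not how /document-release works: it only looks at Python `def` lines touched by a unified diff and reports doc files that mention those function names. The function names `changed_functions` and `stale_docs` and the sample data are hypothetical.

```python
import re
from typing import Dict, List, Set

# Matches added or removed lines in a unified diff that define a function.
DEF_RE = re.compile(r"^[+-]\s*def\s+([A-Za-z_]\w*)")


def changed_functions(diff_text: str) -> Set[str]:
    """Return names of functions whose `def` line was added or removed."""
    names: Set[str] = set()
    for line in diff_text.splitlines():
        match = DEF_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def stale_docs(diff_text: str, docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each doc path to the changed function names it mentions."""
    names = changed_functions(diff_text)
    report: Dict[str, List[str]] = {}
    for path, text in docs.items():
        hits = sorted(n for n in names if re.search(rf"\b{re.escape(n)}\b", text))
        if hits:
            report[path] = hits
    return report


diff = """\
--- a/billing.py
+++ b/billing.py
@@ -1,2 +1,2 @@
-def charge_card(amount):
+def charge_card(amount, currency):
     ...
"""

docs = {
    "README.md": "Call charge_card(amount) to bill the customer.",
    "CHANGELOG.md": "Initial release.",
}

print(stale_docs(diff, docs))
```

This only finds candidates for review. Rewriting the stale passage correctly is the part that needs a model or a person.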


The honest caveat

140k lines in 7 days is an extraordinary claim. I have no way to verify it, and lines of code are a notoriously poor measure of productivity. What matters is whether the output is useful and correct.
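For a sense of scale, the quoted totals can be broken down into daily and per-commit rates. The sketch below uses only the figures already cited from the README. The one assumption is that "net LOC" means lines added minus lines removed, and the net figure is approximate because it is quoted as "~115k".

```python
# Figures quoted from the README, as reported in this post.
LINES_ADDED = 140_751
COMMITS = 362
NET_LOC = 115_000  # quoted as "~115k", so this is approximate
DAYS = 7


def summarize() -> dict:
    """Derive per-day and per-commit rates from the quoted totals."""
    return {
        "lines_per_day": LINES_ADDED / DAYS,
        "commits_per_day": COMMITS / DAYS,
        "lines_per_commit": LINES_ADDED / COMMITS,
        # Assumption: net LOC = lines added - lines removed.
        "approx_lines_removed": LINES_ADDED - NET_LOC,
    }


stats = summarize()
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

That works out to roughly 20,100 lines and 52 commits per day, or about 390 lines per commit, which sits at the top of the 10,000 to 20,000 lines-per-day range quoted earlier.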

What I can verify: the tooling exists, it's well-designed, it's MIT licensed, and the pattern of structured AI-assisted development it represents is real. Whether any specific individual is hitting those numbers is less interesting than whether the ceiling is higher than most people think.

The answer to that second question seems to be yes.


I'm building Heldcraft — a software studio focused on lean, agent-assisted development. This is part of my ongoing experiment with agentic systems.