

Stop Apologizing for Big PRs. They're the New Starting Point.

AI inverts the economics of decomposition. The massive prototype PR is no longer a planning failure — it is a work blueprint you stage backward into reviewable sub-PRs.

By Nicolas Bouvrette · 18 min read

You start the agent on a feature. Ninety minutes later it comes back with a working prototype: thirty files touched, two drive-by bug fixes you did not ask for, a passing test suite, a correct diff. The change is right. It is also too large to review.

The reflex, in 2026, is still the reflex of 2018: apologize for the size, promise to decompose differently next time, and start carving the diff into smaller PRs. Don't apologize. The prototype is doing exactly what it should be doing, and the small-PR rule that is making you wince was written for a constraint that no longer holds.

This article makes the case that AI inverts the economics of decomposition. The massive prototype PR is not a planning failure but a work blueprint. The way to ship it well is to build backward from the prototype into a sequence of reviewable sub-PRs — and the result is more reviewable than the small-PR discipline it replaces, not less.

The argument in 5 steps

  1. The small-PR rule was an optimization for reviewer attention, not a principle
  2. AI removed that constraint by collapsing the cost of producing the code
  3. The massive prototype PR is now a work blueprint, not a planning failure
  4. Stage it backward into a sequence of sub-PRs that sum to the prototype
  5. The result is the fix to the Velocity Paradox, not a contributor to it

The small-PR rule was an optimization, not a principle

The argument for small PRs has been gospel for fifteen years for a good reason: reviewer attention was the constraint. A 30-file PR was a planning failure because it asked a human reviewer to load a context they could not hold, against a backlog of other PRs they also could not hold. The discipline of "small, focused PRs" was an optimization for human scanning speed.

Every piece of advice from that era — keep PRs under 400 lines, single-concern, single-commit-able, reviewable in thirty minutes — descended from that constraint. Style guides, eng-productivity research, and code-review playbooks all converged on it, and for the world they were written in, they were right. GitHub's own canonical guidance on the subject reads like an exact statement of the orthodoxy:

"Aim to create small, focused pull requests that fulfill a single purpose."
— GitHub Docs, Best Practices for Pull Requests

The constraint changed. AI flips two assumptions at once: code production speed has jumped by roughly an order of magnitude, so the cost of writing the code is no longer the limiting factor; and decomposition can now be performed by a fresh agent reading a working prototype, which means the staging work no longer has to happen before the code exists. Both shifts invalidate the original case for keeping PRs small at the moment of authoring.

The case for keeping PRs reviewable at the moment of merging is unchanged. That is the distinction the small-PR rule was never explicit about — and it is the one that matters now.

Build backward, not forward

Traditional decomposition runs forward. Understand the problem, design the change, sequence the steps, submit each step as a small PR. Each PR is a guess about what the next step will need; some guesses turn out wrong, and the cleanup either lands as a follow-up or never lands at all.

Building backward inverts the order. The prototype runs first. The agent solves the problem end to end, surfaces drive-by fixes the original scope did not anticipate, and produces a complete working reference. Then a second agent — a fresh session, no shared context with the first — reads the prototype and proposes a sequence of logically independent sub-PRs that stage the same change.
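Concretely, the hand-off to the fresh session can be as small as a short prompt file. A sketch — the branch name proto/checkout-flow and the file name are invented for illustration:

```shell
# Hypothetical staging prompt for the fresh agent session; the branch
# name proto/checkout-flow is invented for illustration.
cat > staging-prompt.txt <<'EOF'
Read the full diff of branch proto/checkout-flow against main.
Propose a sequence of logically independent sub-PRs that together
reproduce that diff exactly:
  1. an enabling refactor (same behavior, new structure),
  2. the feature itself, built on the refactor,
  3. one PR per drive-by fix the prototype surfaced.
For each sub-PR, list the files it touches, so the sub-PR diffs
sum to the prototype diff.
EOF
```

The only context the staging agent needs is the prototype diff itself — deliberately no shared session state with the agent that wrote it.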

The sequence is almost always the same shape:

  1. Enabling refactor. The structural change the feature depends on, landed on its own. Reviewable as a refactor: same behavior, different shape. This is the PR a senior reviewer would have asked for as "step zero" before the feature, except now it is informed by what the feature actually needed instead of guessed at.
  2. The feature. The user-visible change, on top of the prepared structure. Reviewable as a feature: new behavior, contained surface area, the diff focused on intent rather than scaffolding.
  3. Drive-by fixes. The bugs and incidental cleanups the prototype surfaced, each as their own focused PR. Reviewable as fixes: independent of the feature, often valuable to a different reviewer entirely.
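The branch mechanics behind that sequence are ordinary git. Here is a self-contained toy sketch — the repo contents, branch names, and paths are all hypothetical; in practice the staging agent proposes the split and these commands carry it out:

```shell
# Toy repo standing in for a real project; everything here is invented.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email you@example.com && git config user.name "you"
mkdir -p src && echo 'base' > src/app && git add . && git commit -qm 'init'

# The prototype touches three concerns at once.
git checkout -qb proto/checkout-flow
mkdir -p src/pricing src/checkout
echo 'pricing module' > src/pricing/mod    # structural change
echo 'checkout flow'  > src/checkout/flow  # user-visible feature
echo 'rounding fix'   > src/round          # drive-by fix
git add . && git commit -qm 'prototype: everything at once'

# 1. Enabling refactor: branch from main, take only the structural files.
git checkout -qb refactor/extract-pricing main
git checkout proto/checkout-flow -- src/pricing/
git commit -qm 'extract pricing module (enabling refactor)'

# 2. Feature: stacked on the refactor, take the user-visible change.
git checkout -qb feat/checkout refactor/extract-pricing
git checkout proto/checkout-flow -- src/checkout/
git commit -qm 'add checkout flow'

# 3. Drive-by fix: independent, branched straight from main.
git checkout -qb fix/rounding main
git checkout proto/checkout-flow -- src/round
git commit -qm 'fix rounding in price display'
```

When a single file spans two sub-PRs, `git checkout -p proto/checkout-flow -- path` takes hunks selectively instead of whole files.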

Each sub-PR is reviewable on its own. The prototype tells you the staging is correct — not because you trust the agent's plan, but because the prototype is a working reference that the staged sub-PRs reproduce. The discoveries the prototype made survive into the merges instead of getting lost as scope creep. Drive-by fixes ship instead of getting deferred to a "we should clean this up sometime" backlog item that nobody owns.

It is also normal — and a feature, not a bug — for the staging pass to surface flaws the prototype itself glossed over: a missing test, an edge case the implementation does not handle, a naming choice that reads cleanly inside the original session but is confusing in isolation. Capture those findings in the appropriate sub-PR (the enabling refactor, the feature, or the drive-by-fix slot) rather than discarding them. The merge sequence ends up qualitatively better than the prototype, because the staging step doubles as the first review pass before any human reviewer sees the code.

Open the prototype as a PR even when it will not merge

A reasonable question: if the prototype is not the artifact you intend to merge, why open a PR at all? Not every prototype needs one, but the PR machinery does real work regardless of whether the prototype itself ever merges:

  • CI surfaces what local can't. Anything you'd run locally — unit tests, type-checks, lints, formatters — is best validated locally first. The PR earns its keep on the validations that only exist in CI: cross-platform matrix builds (Node/Python/Go × Linux/macOS/Windows), preview deployments (Vercel, Cloudflare, Netlify) that give you a shareable URL for the prototype, SAST and supply-chain scans (CodeQL, GitHub Advanced Security, Snyk, dependency-review actions), visual-regression diffs (Chromatic, Percy), and end-to-end tests that need staging credentials no laptop has. None of those run on your machine, and the prototype catching one of them early is exactly the kind of cheap finding the staging pass would otherwise have to chase down.
  • The diff is easy to read. GitHub's diff view lets you, the staging agent, and anyone curious for context navigate the change at a file level. Local git diff is fine for small changes; it falls apart at thirty files.
  • Size becomes visible. Lines added, lines removed, files touched — the PR header gives an at-a-glance signal of how much work the staging pass needs to break down.

Mark the status explicitly so colleagues are not confused. GitHub's draft PR is the ideal default — it disables the merge button outright, signals "work in progress" in the UI, and reviewers see "Draft" next to the title. The one thing to check first: some CI setups skip drafts, which would silently strip the CI benefit above. If yours does, fall back to a regular PR with [DO NOT MERGE] in the title.
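Whether your CI runs on drafts is visible in the workflow files themselves. A minimal GitHub Actions sketch — workflow and job names are hypothetical — of the gating pattern to look for:

```yaml
# .github/workflows/ci.yml (sketch; names are illustrative)
name: ci
on:
  pull_request:   # by default this fires for draft PRs too
jobs:
  test:
    # This condition is what silently strips CI from drafts; if your
    # workflows carry it, fall back to a regular [DO NOT MERGE] PR.
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
```

A quick `grep -r "pull_request.draft" .github/workflows/` answers the question for a whole repo.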

There is a quieter benefit too: transparency. Opening the prototype as a draft PR — even one nobody plans to read — lets the team see what ideas are being explored and what is currently in motion. Some of the highest-leverage technical conversations of any given week happen because someone scrolled past a draft title and asked a question. And if the prototype works, the staged sub-PRs that ship a few hours later land on roughly the same change anyway — the prototype was never the throwaway it pretended to be.

This is not a new workflow. AI just made it cheap.

The shape of building backward predates AI by half a century. Software engineering has called variations of it iterative prototyping, design-by-building, iterative and incremental development — the practice of starting with a simplified working version and letting the structure emerge through construction rather than fixing it up front. Craig Larman and Victor Basili surveyed the lineage for IEEE Computer in 2003:

"Although many view iterative and incremental development as a modern practice, its application dates as far back as the mid-1950s."
— Larman & Basili, Iterative and Incremental Development: A Brief History, IEEE Computer (2003)

The reason iterative development stayed niche for so long is cost. Producing a complete working prototype meant burning real engineering hours on a path the team would then redo properly — sometimes weeks of throw-away labor. Frederick Brooks named the dynamic in 1975 in The Mythical Man-Month, in a chapter titled "Plan to Throw One Away":

"Hence plan to throw one away; you will, anyhow."
— Frederick P. Brooks, Jr., The Mythical Man-Month, Ch. 11 (1975)

Brooks's argument was that the first version of any non-trivial system is a learning artifact — you build it to discover what the system actually needs to be, not because it is the version you ship. The discipline he was prescribing — expect to throw the first one away, plan around it — was correct then and remains correct now. What has changed is the price of admission. When the prototype takes ninety minutes instead of three weeks, the throw-away assumption inverts: you do not throw it away; you stage it backward into reviewable sub-PRs. The same workflow that was a "sometimes" choice for genuinely uncertain greenfield work becomes the default for anything that touches more than one or two files, because the cost of being wrong about the decomposition has collapsed.

This article is not claiming AI invented building backward. It is claiming AI made an old, well-validated workflow affordable enough to be everyday — and the small-PR discipline that compensated for the absence of cheap prototyping has aged accordingly.

The same tools that prototype can review

The cost collapse cuts both ways. The same agentic tools that turn a one-line task into a working prototype can run a fresh-session pass on the staged sub-PRs before any human reviewer sees them. Open the sub-PR, point the agent at the diff, prompt it to be ruthless, and iterate until the noise outweighs the signal. What surfaces is qualitatively similar to what a senior reviewer would catch on a careful read — pattern errors, missing tests, contract violations, surface bugs, the things pre-AI code-review culture was largely about.

A whole market of dedicated AI-review tools has grown up around this — CodeRabbit, Greptile, Cursor BugBot, Graphite Diamond, Qodo, Ellipsis, and others. They work well. The simpler reality is that the agent already on the author's laptop, run in a clean session against the open PR with a sharp prompt, produces work that is competitive with any of them, and the author keeps full control over the prompt, the iteration count, and which findings to act on.

The point is not to replace the human reviewer. It is to reserve their attention for the kinds of judgment AI cannot reliably exercise — product intent, architectural fit, customer impact, domain-specific risk — by handling the pattern-matching catches before they reach the human. Pre-reviewing each sub-PR with AI is the second cost-collapse the workflow benefits from: where staging-with-AI changed the cost of producing reviewable code, reviewing-with-AI changes the cost of providing the first careful read.

The objections, addressed

The pattern triggers three predictable objections.

"Reviewer bandwidth is still the constraint." True — and that is the case for staged sub-PRs, not against them. A 30-file prototype is unreviewable; a refactor PR scoped to the enabling structure, a feature PR scoped to the user-visible behavior, and a drive-by-fix PR scoped to the surfaced bugs are easier to review than the small-PR discipline they replace. The pre-validation matters: a reviewer is reading code whose end state is already known to work, not a slice of code that may or may not lead somewhere coherent. Reviewer hours go further when the staging itself encodes information the reviewer would otherwise have to reconstruct.

"The agent will hallucinate the staging." The prototype is the source of truth, not the agent. The agent's job is to explain the prototype as a sequence of independent steps, not to invent steps. When the staging is wrong, the diff against the prototype shows it — sub-PRs sum to the prototype, or they do not, and any developer can run the comparison. This is qualitatively different from asking an agent to decompose a feature that does not yet exist; the verification is mechanical.
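That mechanical check is a tree-level diff between two branch tips. A self-contained toy version — repo and branch names invented, with two staged commits reproducing a one-commit prototype:

```shell
# Toy repo; "proto" is the prototype branch, "staged" is the tip of the
# stacked sub-PR branches. All names are invented for illustration.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q -b main
git config user.email you@example.com && git config user.name "you"
echo 'v1' > app && git add . && git commit -qm 'init'

# Prototype: one big change.
git checkout -qb proto
echo 'v2' > app && echo 'lib' > lib && git add . && git commit -qm 'prototype'

# Staged sub-PRs: two small commits that should reproduce it.
git checkout -qb staged main
echo 'lib' > lib && git add . && git commit -qm 'sub-PR 1: enabling refactor'
echo 'v2' > app && git add . && git commit -qm 'sub-PR 2: feature'

# The mechanical check: diff the two tips. Empty output means the
# sub-PRs sum to the prototype exactly.
git diff proto staged
git diff --quiet proto staged && echo 'staged sub-PRs sum to the prototype'
```

Any developer can run this against the real branches; no trust in the staging agent is required.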

"The drive-by discoveries will get lost." The opposite. Drive-by discoveries are exactly what the prototype produces and what staging captures. The third PR in the sequence — the drive-by-fix PR — is where they live. Without the prototype, those discoveries either ride along inside an oversized feature PR (where reviewers cannot isolate them) or never get made at all because the developer never got far enough to notice them. Building backward is, if anything, the only workflow that reliably captures incidental quality work.

This is the fix to the Velocity Paradox, not a contributor to it

Harness coined the term Velocity Paradox for the pattern where AI accelerates code generation but engineering organizations ship slower or with higher risk because downstream processes — testing, review, deployment — were designed for the previous speed. Code review queues are the canonical Velocity Paradox bottleneck: more code, same reviewers, longer queues, slower merges, frustrated authors, lower-quality reviews.

The naive response is to cap every PR at the team level: enforce a 200-line maximum, require single-concern PRs, mandate that every change land as a sequence of micro-PRs. That does not solve the Velocity Paradox. It multiplies it. Unstructured micro-PRs scatter context across many reviews, each carrying the same coordination cost. Reviewers face more PRs with no less context to load per PR, and the team's ratio of useful review time to context-switching time gets worse.

Building backward from a working prototype is structurally different. The staging is informed by the prototype rather than guessed at. Sub-PRs sum to a known-working end state. Reviewers see staged code with pre-validated behavior, not slices of a plan. The same work surfaces fewer review cycles, not more, because each cycle terminates cleanly.

Small PRs were never the goal. Reviewable PRs were the goal, and the size cap was a proxy for reviewability that worked when execution was the bottleneck. Under AI acceleration that proxy stops working, and the right move is to optimize for the underlying property — reviewability — directly.

The agreement this approach requires

Building backward only works if the team agrees on three things.

The prototype is real. Not a sketch, not a half-implemented stub, not "what we would do if we had the time." A working, tested, end-to-end implementation. Authors commit to producing prototypes that pass their own tests; reviewers commit to trusting that the prototype is the actual specification when they review the staged sub-PRs.

Drive-by fixes are valued, not punished. The drive-by-fix PR is where the prototype's incidental discoveries land. If team culture treats those as "scope creep" or "out-of-band work that broke our planning," authors will stop surfacing them — and the team loses one of the largest benefits of the pattern. Engineering leadership has to treat surfaced incidental work as a feature of the workflow, not a bug.

Reviewers review the staged PRs, not the prototype. The prototype's purpose is to inform staging, not to be merged itself. Some teams keep prototypes in draft branches for reviewer context; some delete them after staging is complete. Either is fine. What is not fine is reviewers asking authors to "just merge the prototype" — that defeats the entire purpose of staging and reintroduces the unreviewability problem the pattern was designed to solve.

Without that agreement, the failure mode is predictable. Authors avoid prototyping because the social cost of presenting a 30-file diff outweighs the gain from staging it, the team falls back to forward decomposition, drive-by fixes get dropped, and reviews stay bottlenecked. The Velocity Paradox metastasizes inside the review process — and the team blames AI for it, when the actual cause is a workflow that has not been redesigned for the new constraint.

When this advice does not apply (yet)

Everything above assumes you have customers, or you are about to. If you are a solopreneur prototyping an idea before you have a single user, a non-technical founder building a v1 with AI tools, or an internal tool with a small and forgiving audience — ignore the staging discipline. Ship the prototype. Move fast. Staging is overhead, and at zero customers the overhead has no payoff.

The trap is forgetting the moment your audience changes. The OpenClaw incident from early 2026 is the recent canonical cautionary tale: an open-source AI agent framework reached 180,000 GitHub stars in a matter of weeks, then promptly suffered a critical remote-code-execution vulnerability, a 35,000-email data leak from its companion social-network platform, and a supply-chain compromise of its skills marketplace that delivered macOS infostealer malware to thousands of users. Code that took its developer a few weekends to ship had to be hardened by a security industry response in real time. Most of us would have shipped with the same defenses given the same starting point — that is the point. The discipline this article describes becomes load-bearing the moment the audience scales from "me" to "everyone," and retrofitting it after a breach is a strictly worse experience than building it in from the first paying customer.

Massive PRs are no longer a planning failure

The instinct to apologize for a big PR comes from a real constraint that genuinely existed. It is worth honoring the developers who built that discipline — code-review culture is one of the things the field got right in the 2010s, and the practice of small, reviewable PRs is part of why working in any well-run codebase is bearable today.

The instinct has not aged with the tooling, though. Apologizing for a 30-file prototype in 2026 is the equivalent of apologizing for using a compiler instead of writing assembly. The thing that made the constraint real is no longer the constraint.

The next time the agent hands you a working prototype across thirty files, do not apologize. Stage it. Build backward. Ship the enabling refactor, then the feature, then the drive-by fixes — three reviewable PRs that sum to a known-working end state, with the drive-by discoveries captured instead of dropped. Reviewers and authors are both happier, the merge queue moves faster, and the Velocity Paradox loses one of its main pressure points.

Massive PRs are no longer a planning failure. They are a starting point.

Frequently asked questions

Are big PRs always better under AI-assisted development?
No. The argument is that a massive prototype PR is a valuable starting point — a work blueprint — but that it should be staged backward into smaller, logically independent sub-PRs for review. Big PRs are not the final artifact; they are the origin of the final artifacts.
How do you break down a big prototype PR?
Feed the prototype to a fresh, independent AI agent and ask it to produce a sequence of logically independent sub-PRs — typically an enabling refactor, then the feature, then any drive-by fixes the prototype surfaced. Each sub-PR is reviewable on its own, and together they sum to the prototype.
Should the prototype itself be merged?
No. The prototype's purpose is to inform the staging, not to be merged. Some teams keep it on a draft branch as reviewer context; others discard it once the staged sub-PRs are open. What matters is that reviewers review the staged PRs — not the prototype.
Won't the AI agent hallucinate the decomposition?
The prototype is the source of truth, not the agent. The agent's job is to explain the prototype as a sequence of independent steps — and any mismatch is detectable because the staged sub-PRs must sum to the prototype's diff. This is qualitatively different from asking an agent to decompose a feature that doesn't exist yet.
What does this have to do with the Velocity Paradox?
The Velocity Paradox describes faster coding colliding with unchanged downstream processes. Forcing every change into unstructured micro-PRs at the team level multiplies that bottleneck rather than easing it. Staged decomposition — where the prototype informs the staging — is what actually relieves it.