
capacity planning · engineering leadership · portfolio management

The Last Capacity Planning Sheet You'll Ever Need

Capacity planning is the last honest conversation left in the quarter — and most teams skip it. Here's the one-tab sheet that replaces story-point fiction with real units.

By Nicolas Bouvrette · 22 min read

It's the last week of the quarter. The team is going to miss two of the five committed epics — they already know it. The retro that follows will use the word "velocity" eleven times and the word "capacity" zero. The plan they committed to was a target with a spreadsheet stapled to it. The capacity model — the actual arithmetic of who would be available for how many days, doing what kind of work — never existed.

Frederick Brooks wrote the line that explains why, fifty years ago, in a chapter that has aged better than most:

"How does a project get to be a year late? … One day at a time."
— Frederick P. Brooks, Jr., The Mythical Man-Month (1975)

Quarters get late the same way. A holiday gets missed in the math. Two devs become one when the senior gets pulled into a hiring loop. Support load eats six weeks nobody planned for. By the time you're building the plan for the next quarter, the previous one's arithmetic is so quietly wrong that you stop trusting arithmetic at all — and switch to story points, velocity, and the reassuring vagueness of relative sizing.

This article is about what to do instead. The argument is that capacity planning, done in real units with a real allocation split, on a single sheet, is the last honest conversation left in the quarter — and most teams don't have it. The ones that do, ship.

At the end of the article: a public, read-only Google Sheet you can copy into your own Drive, fill in, and use this Friday. Free. Argument first; link last; the agreement the model requires in between.

The argument in six steps

  1. Capacity planning was invented to be honest. Software made it dishonest.
  2. Yes, one framework formally plans capacity in points. It does not work.
  3. The unit you plan in is the truth you will or will not face.
  4. Most capacity plans miss the same six things.
  5. The allocation split is where the actual fight is.
  6. AI didn't change the model. It made the model matter more.

Capacity planning was invented to be honest. Software made it dishonest.

Capacity planning is older than software by half a century. Henry Gantt designed his eponymous chart for the U.S. Army Ordnance Department during the First World War, and the version that ended up coordinating shipbuilding and munitions logistics had a property modern Gantt charts have lost: every bar represented a person's available hours, and the bar was either filled with work or it wasn't. The point of the chart was that you could not pretend a person had eighty hours of capacity in a forty-hour week. The unit was real. The math forced reality on the conversation.

Software adopted the form and dropped the unit. We kept the bar charts and the swim lanes, then replaced "hours" with "story points" — an abstract sizing scale that began life in late-1990s Extreme Programming as a renaming of "ideal days." Ron Jeffries, widely credited with the invention, later wrote that the rename was a response to stakeholder confusion: "the result was that our stakeholders were often confused by how it could keep taking three days to get a day's work done." Allen Holub, writing from the longer-running #NoEstimates tradition, reads the same move less charitably:

"Story Points were invented to obfuscate duration so that certain managers would not pressure the team over estimates."
— Allen Holub, #NoEstimates, An Introduction

Both readings can coexist. Whatever the original intent, the effect was the same: by the early 2000s, "story points" had become a planning currency that nobody could convert back into anything physical even when they wanted to. You cannot subtract a holiday from a story point. You cannot add up nine team members' story points and get a meaningful total because their points don't mean the same thing. (For the deeper case on why story points were never part of Scrum and what the original Scrum Guide actually mandates, see Your Scrum Isn't Scrum.)

Jeffries himself, in his 2019 piece Story Points Revisited, publicly walked back the practice his own term enabled — "I like to say that I may have invented story points, and if I did, I'm sorry now." Holub goes further in the same #NoEstimates piece: "Point values are irrelevant," he argues, with the right response being throughput measurement of completed work rather than relative sizing of work that has not yet started.

Yes, one framework formally plans capacity in points. It does not work.

The honest answer to "does anyone actually plan capacity in story points?" is: outside of one framework, almost no one. Inside that framework — the Scaled Agile Framework, better known as SAFe — it is mandated. SAFe's PI Planning event prescribes "Team PI Capacity," computed in normalized story points via the canonical rule of eight points per full-time developer per iteration, summed into a Program Increment of typically eight to twelve weeks. This is the only place I am aware of where capacity planning in story points exists as a documented, named, official practice.

It does not work, and the documentation of why does not come from outside SAFe — it comes from inside the SAFe consultancy ecosystem itself. The Wibas analysis of the canonical PI Planning capacity slide, written by SAFe-certified coaches, is direct about the failure mode:

"This is an important but also tricky slide that leads many people to misunderstand estimation in SAFe. … It would be a big mistake to calculate Capacity based on '8 points for every full-time developer' in every Iteration/Sprint."
— Wibas, A Tricky Slide About Story Points and Capacity in SAFe

What the Wibas piece, the Mercedes-Benz Tech Innovation case study, and similar inside-SAFe critiques converge on is the same set of failures the rest of this article will name in any capacity model — holidays not modeled per person, support load not deducted, the "8 points per developer" formula treated as a recurring truth rather than a one-time baseline — wrapped in an additional translation layer (the "normalized" velocity) that obscures rather than reveals where the math is wrong. The unit makes the trade-offs harder to see, not easier.

Outside SAFe, in everything I have personally seen over more than a decade and multiple organizations, capacity planning in points is not a real practice. What is a real practice — pervasive, almost universal in points-using teams — is the inverse: teams that estimate in time and then launder the estimate back into points to satisfy a tooling or reporting requirement. Santiago Valdarrama named the pattern bluntly in 2024:

"Everyone I've seen—and I mean everyone—uses points as a measure of time: 1 point = 2 hours, 3 points = 1 day, 5 points = 3 days."
— Santiago Valdarrama, on the lived practice of story-point estimation (2024)

The laundering is bitterly ironic given the origin story. Points were invented to encrypt time away from managers — on Holub's reading, at least; teams in 2026 spend cognitive effort decrypting time back out of points to make any planning conversation work. The result is the worst of both: the dysfunction tax of points (the rituals, the planning poker, the Fibonacci arguments, the velocity charts) without any of the originally intended benefit (genuine abstraction from time-as-contract). The unit is doing no work that hours or developer-weeks would not do better.

The fix is not to use story points more carefully or to find the right velocity-normalization formula. The fix is to plan capacity in time, at every level, and stop translating.

The unit you plan in is the truth you will or will not face.

Plan in developer-weeks. One developer, one week of pure coding time, average. Whole numbers when possible, halves when not. Treat the developer-week as the only unit that crosses the boundary between "work to be done" and "people who will do it."

Two things change when you switch.

First, the math becomes auditable. A team of six people running an eleven-week quarter has a maximum theoretical capacity of sixty-six developer-weeks. From that you subtract holidays, vacation days, on-call rotations, hiring panels, mentoring overhead, and the percentage of time each person realistically spends writing code rather than reviewing, attending meetings, or unblocking peers. The result — usually somewhere between thirty-five and forty-five percent of the theoretical maximum — is a number you can defend out loud. It survives a one-on-one with a skeptical CEO because every line beneath it is a real subtraction from a real calendar.

Second, the conversation about commitment becomes precise. "Can we fit a fifth epic?" stops being a vibe-check and becomes "Do we have eight more developer-weeks of business capacity than we've already allocated?" Either you do or you don't. If you do, name them. If you don't, name what's getting cut. The unit makes the negotiation sharper for everybody at the table — engineering and product alike — and the sharpness is the point.

The objection lands every time: but we don't know how long any given epic will take in developer-weeks. Correct. You will be wrong, often badly. You will become less wrong faster than you expect, because the unit gives you a calibration target other than a vibes-based velocity number. Three quarters of dev-week estimation produces an engineering team measurably better at it than three years of story-point velocity tracking, because the unit doesn't shift between teams or quarters or framework redesigns. It is the same unit Brooks used in 1975 and Gantt used in 1917. There is a reason it has outlived its replacements.
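The auditable math described above fits in a few lines. Here is a minimal sketch — every name, percentage, and weeks-off figure below is invented for illustration, not taken from any real team or from the sheet itself:

```python
# Auditable capacity math: six made-up people, an eleven-week quarter.
# All figures are illustrative assumptions.

WEEKS_IN_QUARTER = 11
SUPPORT_PCT = 0.20  # steady operational load, set from history

# Per person: weeks off (holidays + vacation) and the fraction of a
# working week genuinely available for writing code.
team = [
    {"name": "dev_a", "weeks_off": 1.0, "coding_pct": 0.70},
    {"name": "dev_b", "weeks_off": 2.0, "coding_pct": 0.70},
    {"name": "dev_c", "weeks_off": 0.5, "coding_pct": 0.60},
    {"name": "dev_d", "weeks_off": 1.5, "coding_pct": 0.70},
    {"name": "lead",  "weeks_off": 1.0, "coding_pct": 0.30},  # tech lead
    {"name": "dev_e", "weeks_off": 1.0, "coding_pct": 0.70},
]

def person_capacity(p):
    """Developer-weeks of pure coding time for one person."""
    return (WEEKS_IN_QUARTER - p["weeks_off"]) * p["coding_pct"]

theoretical_max = len(team) * WEEKS_IN_QUARTER          # 66 dev-weeks
gross_capacity = sum(person_capacity(p) for p in team)  # 36.25
usable_capacity = gross_capacity * (1 - SUPPORT_PCT)    # 29.0

print(f"{usable_capacity:.1f} of {theoretical_max} dev-weeks "
      f"({usable_capacity / theoretical_max:.0%} of theoretical)")
```

With these particular assumptions the team lands at 29 of 66 developer-weeks — about 44 percent of theoretical maximum, squarely in the range the text describes. Every line is a real subtraction from a real calendar, which is what makes the final number defensible.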

Most capacity plans miss the same six things.

The failure modes are unglamorous and predictable. Every team I've audited has hit at least three of them at the same time. The first one is the foundational mistake — the rule that, when broken, makes every other failure on this list inevitable:

  1. Conflating effort with duration. The capacity sheet contains exactly one duration: the period's start and end dates. Everything else — every developer-week on every row, every estimate on every epic — is pure dev effort at one hundred percent focus. Vacations, holidays, meetings, the percentage of any given week that is genuinely available for development work: those have already been deducted on the capacity side of the math. Re-deducting them on the estimate side double-counts the deduction and produces estimates that bear no relationship to reality. The sheet's job is not to predict the date each line item ships. It is to confirm the work fits inside the period. Inside the period, the team will continue to juggle priorities, vacations, support rotations, and unplanned events — that is a runtime concern, not a planning concern. The moment the sheet starts trying to predict per-item dates, it stops being a capacity model and starts becoming a Gantt chart, and you have rebuilt the heavy project management apparatus this approach was meant to replace.

  2. Holidays modeled as a single team-level number. A team distributed across three time zones has three different holiday calendars. Treating them as a single average undercounts the absences and lands the plan in the red by week six. Model holidays per person, not per team, and let the per-person rows roll up to the team total.

  3. New hires counted at full capacity from day one. The first month is onboarding, environment setup, codebase orientation, and the steady drip of "I just need to ask one question" interruptions to the rest of the team. Realistically, a new senior is at twenty percent the first month, fifty the second, eighty the third. Anything more optimistic is the plan lying to itself, and the lie compounds because their notional capacity is pulling roadmap commitments forward.

  4. Tech leads counted as developers. A staff engineer or tech lead who reviews half the team's PRs and runs architecture office hours is, charitably, a thirty-percent contributor to roadmap effort. Their high-leverage work shows up as everybody else's velocity, not their own. Model their personal coding capacity at thirty percent and stop pretending — the multiplier they create across the team is real, and it doesn't need to be double-counted on their row.

  5. No support deduction. Most teams have a steady operational load — escalations, customer-specific bugs, on-call investigations, the recurring asks that never quite become a project. If your team historically spends twenty percent of any given week on this work, twenty percent of capacity is gone before any roadmap planning starts. Deduct it explicitly. If support load consistently exceeds the deduction, that is signal — usually a product-investment signal, not a "the team should work harder" signal.

  6. No allocation gut-check. A capacity plan that lands the math on developer-weeks but treats them as a single bucket misses the most useful question the model is built to answer: are we investing where we said we wanted to invest? Most teams want some level of technical work — refactors, tech-debt paydown, internal tooling, observability — funded each quarter, and most teams discover, when they actually run the math, that they are not. The audit ends on this question because everything else is arithmetic; this is the only step where the math meets strategic intent. The next section is about that step in full.

Each of these is arithmetic, not strategy — except the last, which is strategy hiding inside arithmetic. The reason teams miss them is that nobody is responsible for the math. The capacity model is owned by the engineering manager, validated with the team, and reviewed with product — not delegated to a tool that auto-calculates a velocity from last quarter's points and renders it as a confident green number on a dashboard.
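Because each failure mode is arithmetic, each one is also a one-line deduction once the unit is time. A hedged sketch of modes 2 through 5 — the location names, ramp percentages, and support figure are all illustrative assumptions, not prescriptions:

```python
QUARTER_WEEKS = 11  # illustrative quarter length

# 2. Holidays per person, not per team: different locations, different
#    calendars. These roll up into the team total row by row.
holiday_weeks = {"dev_in_fr": 1.2, "dev_in_ca": 0.8, "dev_in_us": 1.0}

# 3. New hires ramp: month-by-month effectiveness, not 100% on day one.
ramp = [0.20, 0.50, 0.80]
new_hire_weeks = sum(pct * QUARTER_WEEKS / len(ramp) for pct in ramp)
# ~5.5 dev-weeks in the first quarter, not the naive 11

# 4. A tech lead who reviews half the team's PRs codes ~30% of the time.
lead_weeks = QUARTER_WEEKS * 0.30  # ~3.3 dev-weeks, not 11

# 5. Support comes off the top, before any roadmap work is funded.
SUPPORT_PCT = 0.20  # set from historical load, not optimism
```

The point of writing it out is the contrast: a naive plan counts the new hire and the tech lead as 22 developer-weeks between them; the honest count is closer to 9.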

A note on the buffer question, since it always comes up: I do not recommend adding a padding buffer to a capacity plan. Estimates already represent the midpoint between the aggressive and the conservative read of each piece of work; padding on top of that compounds the conservatism and produces plans that succeed by accomplishing less than the team could have. It is materially better to fail a plan estimated honestly than to succeed a plan that was over-padded. A plan that lands above one hundred percent of capacity is allowed — provided the over-commit is visible on the summary row and the success rate is understood to be lower. Capacity planning is a forecast, not a contract. The goal is predictability, not invulnerability.

The allocation split is where the actual fight is.

Most teams that survive the unit problem still misplan the quarter, because they treat capacity as a single bucket. It isn't. The single most useful thing a capacity model can make explicit is what kind of work the capacity is funding.

Three categories cover most of it:

  • Feature work — the user-facing work the product or commercial side of the company is asking for.
  • Technical work — refactors, tech-debt paydown, platform investment, internal tooling, the tests and observability the team will regret not having.
  • Support — escalations, ops work, recurring customer-specific work, the unplanned tax on every quarter.

The fight is over the split, and the fight is healthy. Engineering will argue for more technical work; product will argue for more feature work; support load is whatever it actually is, and pretending otherwise is what got the team here. The point of the sheet is not to dictate the split. The point is to make the split visible so the conversation is about the percentages instead of the individual line items.

A team I worked with had been running, for over a year, at roughly seventy percent feature work and ten percent technical work. Nobody had decided that. It had emerged, the way most allocation does — through the path of least leadership resistance. The capacity sheet, when we finally built it, made the proportion legible for the first time, and the leadership conversation that followed was the first one in eighteen months that actually engaged with allocation. It set a target of sixty / twenty-five / fifteen across feature, technical, and support, and moved the team measurably toward it the next quarter. The numbers themselves are not what mattered. The visibility was the unlock.

If you do not define an allocation target, you cannot later evaluate whether you respected it. If you consistently underinvest in technical work — and most teams do — you are accumulating structural debt whether you've named the trade or not. The same is true in the other direction: a team that lands one hundred percent technical work for two quarters running has misread the strategic moment, and the sheet will tell you that just as readily as it will tell you the inverse.
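The roll-up that makes the proportion legible is trivial once every row carries a type. A sketch with invented epics and an assumed 60 / 25 / 15 target — none of these names or efforts come from a real plan:

```python
# Invented epics: (name, type, effort in developer-weeks).
epics = [
    ("checkout revamp",    "feature",   8.0),
    ("search filters",     "feature",   6.0),
    ("billing migration",  "technical", 5.0),
    ("observability pass", "technical", 3.0),
]
support_weeks = 6.0  # steady operational load, from history
target = {"feature": 0.60, "technical": 0.25, "support": 0.15}

def allocation_split(epics, support_weeks):
    """Share of total funded capacity going to each work type."""
    totals = {"feature": 0.0, "technical": 0.0, "support": support_weeks}
    for _name, kind, effort in epics:
        totals[kind] += effort
    grand = sum(totals.values())
    return {kind: weeks / grand for kind, weeks in totals.items()}

actual = allocation_split(epics, support_weeks)
for kind, share in actual.items():
    drift = share - target[kind]
    print(f"{kind:9s} {share:5.0%}  (target {target[kind]:.0%}, {drift:+.0%})")
```

With these numbers, feature work lands at 50 percent against a 60 percent target, and the drift column is the leadership conversation — about percentages, not individual line items.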

AI didn't change the model. It made the model matter more.

When AI compresses execution, the constraint moves. Writing the code stops being the bottleneck. What gets written, in what order, against what trade-offs — the planning layer — becomes the constraint instead. Bad allocation in an AI-multiplied org isn't a small drag, the way it was in the pre-AI world. It is the dominant drag, because the multiplier is now applied to the wrong work faster than ever.

This is the version of capacity planning that survives the shift: a plan that knows the unit, knows the allocation, and is honest about both.

The sheet doesn't change shape because of AI. The thing that changes is what's on the right-hand side of the table. AI absorbs more of the operational tax — recurring report generation, customer-specific configuration work, repeat escalations — and frees real capacity. That reclaimed capacity has to land somewhere on the allocation split, and most teams discover, when they actually measure it, that the savings get silently consumed by more feature work rather than the technical investment that created the leverage in the first place. Without the sheet, that consumption is invisible. With the sheet, it is a line you can defend or a line you can change. Both options beat the alternative of pretending the capacity reclaim happened uniformly across categories, because it never does.

There is a second AI shift that matters for capacity planning specifically: estimating itself has gotten faster. The historical pain of capacity planning was that the estimate column required real engineering attention — reading code, modeling integrations, sketching solutions — for every line on the roadmap, including the ones that would never get prioritized. AI removes most of that cost. A senior engineer with a current AI loadout can produce a credible developer-week estimate on an unfamiliar epic in minutes rather than hours, and a directional one in seconds. Use that.

The corollary matters more than it sounds: match estimation depth to prioritization confidence. A line item that has not been triaged through intake yet does not need a forty-five-minute estimation conversation. A directional pass — small / medium / large in developer-weeks, produced cheaply with AI assistance — is enough to support the prioritization decision. Tighter estimates earn their cost only after the work has cleared the bar to enter capacity planning. Estimating everything to the same precision is a habit from when estimation was expensive; it now produces planning theatre.

This is also where spikes earn their place back. Some work is genuinely under-specified — research-heavy investigation, integrations into systems nobody on the team has touched, requirements that need contact with reality before they can be sized. The right answer is not to estimate harder. The right answer is to fund a small exploration window — a spike — out of capacity, take its findings into the next planning conversation, and re-estimate with the new information. Spikes used to be expensive enough that teams skipped them and ate the estimation error downstream. AI compresses spike cost the same way it compresses estimation cost, which means the right-shaped capacity plan now has more spikes in it, not fewer.

The sheet

Why a spreadsheet, of all things? Because the single most important property of a capacity plan is that the numbers balance — and balancing numbers is the one thing spreadsheets exist to do. The team total ties to the per-person rows, the planned-effort total ties to the epic list, every edit propagates instantly. Slides, decks, and Notion pages cannot do that, no matter how nicely they present the result.

Most teams will not enjoy being presented a spreadsheet at a leadership review, and that is fine — build whatever presentation layer your audience needs on top of it. Just minimize the copy-paste between source and presentation, because plans move often and any part of the plan that lives outside the spreadsheet will be stale by the second week. Find the recipe that works for your org. The presentation is negotiable. The math has to balance.

The structure is intentionally simple — six sections on a single tab, in this order:

  • Period. Start date, end date, number of weeks computed.
  • Team. One row per person — name, location, role, coding-time %, time off, holidays — with calculated developer-week capacity per person rolled up into a team total.
  • Support allocation. A single percentage deducted from team capacity to reflect the steady operational load before any roadmap work gets funded. Set it from historical data, not optimism.
  • Planned Work. One row per epic — name, type (Feature work / Technical work), effort in developer-weeks, notes.
  • Capacity Allocation by Type. Roll-up of effort across Feature / Technical / Support / Unallocated, with a pie chart for at-a-glance visibility. This is where "are we investing where we said we wanted to invest?" gets answered without needing a separate review meeting.
  • Capacity Summary. Total capacity, total planned effort, remaining capacity, capacity used %. The decision surface — not a recommendation to pad.
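The summary section is four formulas. A sketch of the same math outside the sheet — the input numbers are assumptions chosen to show a visible over-commit, not figures from the template:

```python
def capacity_summary(team_capacity, support_pct, planned_effort):
    """The sheet's bottom rows, in developer-weeks.

    team_capacity:  rolled-up per-person capacity for the period
    support_pct:    steady operational load deducted off the top
    planned_effort: sum of the Planned Work rows
    """
    support = team_capacity * support_pct
    available = team_capacity - support
    return {
        "support": support,
        "available": available,
        "remaining": available - planned_effort,
        "used_pct": planned_effort / available,
    }

# An over-committed quarter stays visible: remaining goes negative
# and used_pct crosses 100%, instead of hiding behind padding.
summary = capacity_summary(team_capacity=36.0, support_pct=0.20,
                           planned_effort=30.0)
```

Here 36 dev-weeks of team capacity minus a 20 percent support deduction leaves 28.8 available; planning 30 puts the summary row at roughly 104 percent — allowed, per the buffer discussion earlier, as long as it stays visible.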

Copy it, replace the example data with your team's, and run a planning conversation against it tomorrow. Resist the urge to add tabs — the point of one sheet is that the whole quarter fits on one screen and the trade-offs cannot hide behind a tab nobody clicks.

If you want the deeper case for why the model is shaped this way — and the rest of the operating model that surrounds it (intake discipline, planning cadence, the continuous learning loop on estimation accuracy) — that's Chapter 11 of the book, Portfolio Discipline in the AI Era. The sheet is free; the chapter is the chapter.

The agreement this approach requires

The sheet only works if a few preconditions hold. None of them are technical:

  • The engineering manager owns the math. Not the tool, not the PMO. The EM builds the capacity model, defends the numbers in the room where commitments are made, and updates them when reality moves. Numerical correctness is non-delegable — the math is wrong or right, not negotiated. Allocation is the next bullet, and a different question entirely.
  • Allocation is a collaboration that evolves. The feature / technical / support split is built jointly between the EM and the product manager, shaped by leadership and stakeholder feedback, and revisited mid-period when reality moves — which it does, often. The sheet keeps the conversation honest as the plan iterates; it does not pre-decide the answer. When the same imbalance shows up period after period — support load consistently exceeding budget, technical work consistently slipping below target — read it as a structural signal worth raising at the next planning cycle, not a one-period anomaly.
  • Product accepts the unit. Capacity is in developer-weeks; commitments will be expressed in developer-weeks; can we fit it? gets answered in developer-weeks. Translating in or out of story points to satisfy a parallel reporting workflow is the dysfunction this model is replacing, not a parallel track to keep running alongside it.
  • Everyone treats the plan as a forecast, not a contract. No estimation is failproof, and the sheet does not pretend otherwise. The point is to predict what the next period will most likely deliver — and to be more predictable over time, not perfect any single quarter. A team that lands above one hundred percent capacity with that over-commit clearly visible is doing capacity planning correctly. A team that sandbags below capacity to guarantee a green dashboard is not.

Without those, no spreadsheet survives contact with the politics. With them, this one is enough.

Close

Capacity planning is not the most exciting part of running an engineering team. Quietly, it is the part that decides whether anything else you do this quarter actually ships. Plan in real units. Make the allocation visible. Skip the padding buffer and let any over-commit sit openly on the summary row. Then the rest of your work — the architecture choices, the AI leverage, the team rituals — gets to operate against a true picture of what's possible, instead of a fictional one held together by a velocity number nobody owns.

The sheet is here. Take it. Make it yours. And the next time someone asks "can we fit one more?" — you'll have an answer that survives the quarter, not a hope dressed up as a plan.

Frequently asked questions

Can capacity planning be done in story points?
Formally, only inside SAFe — its PI Planning model prescribes capacity in 'normalized story points,' computed via the canonical rule of 8 points per full-time developer per iteration. The Scaled Agile Framework's own consultancy ecosystem documents that this approach fails predictably: overcommitment, normalized-velocity gaming, and the 8-points rule treated as truth rather than a starting baseline. Outside of SAFe-shop enterprises, capacity planning in points is rare — and the much more common pattern is teams estimating in time, then laundering the result back into points to satisfy reporting requirements.
Why developer-weeks instead of story points?
Developer-weeks are a real unit anchored in time, which means holidays, vacations, and partial allocations actually subtract from them. Story points cannot be summed across teams or converted into quarter-level math without distortion — and Ron Jeffries, who coined them, has since publicly disowned their misuse. Estimate in time at every level: hours at the sprint level if you must, developer-weeks at the quarter level. Skip the translation layer story points were invented to provide.
Isn't a one-tab capacity sheet overkill for a small team?
It's the opposite — for a six-person team, the entire model fits on one screen and takes about fifteen minutes to populate per quarter. The overhead is comparable to a single sprint planning meeting, run quarterly. The savings come from the conversations the sheet replaces — the ones nobody had because nobody knew how to start them.
What happens when leadership demands more than the sheet shows is possible?
Make the trade explicit. Show the buffer going negative on the summary row, and ask which line gets cut to fund the new ask. If the answer is 'none of them', document the overcommitment in writing. Ambition is acceptable; invisible overcommitment is the failure mode the sheet is designed to prevent.
How often should I update the capacity model?
Follow your planning cycle, whatever it is — quarterly, trimester, or any other rhythm, with or without mid-cycle reviews. The capacity sheet is not a status report; it is the artifact behind your planning process, and it should refresh when that process refreshes. Updating capacity more often than your planning cadence creates churn without adding insight.
Does this work alongside Scrum, SAFe, Shape Up, or any other framework?
It works upstream of any execution framework. Scrum operates within a quarter's allocation, not against it; Shape Up's six-week cycles fit into a quarter's developer-week budget. SAFe is the awkward case — its prescribed unit for PI capacity is normalized story points, not time. If you are inside a SAFe org politically, the practical move is to run the developer-week sheet alongside as your private ground-truth model, and let the points layer continue as the political ritual it has become.
What about AI-multiplied developers — do they count as more than one person in the capacity sheet?
No. AI uplift belongs on the estimate side of the math, not the capacity side. Capacity stays anchored in real calendar time — a developer is one person in a forty-hour week, full stop. What AI changes is how much work fits inside that real time: a feature that used to take four developer-weeks may now take two, so estimate it at two. Pushing individual coding-time percentages above 100% to reflect AI multipliers reintroduces the same problem story points had — the unit stops mapping to time, forecasts drift, and you end up tracking AI uplift per person, which is operationally messy and creates awkward performance-management dynamics. Treat AI as a force that compresses estimates over time. Let everyone on the team benefit from that compression equally.