On Measuring Strategic Work

March 30, 2026

KPIs are simple. That is their virtue and their failure mode. A single number, a single axis, a clear direction: up is good or down is good. For operational work that is decomposable and repeatable, this works. For strategic work, simplicity becomes simplistic. The KPI flattens a multi-dimensional system into a scalar, and the scalar distorts behavior along every axis it doesn’t measure.

But the alternative has its own failure mode. Strategic work is complex: competing constraints, dependencies that cross domains, tradeoffs that shift over time. Trying to capture that complexity can easily become complicated: layers of process, frameworks, scorecards, strategy maps, cascaded objective trees. More numbers on the dashboard is not the same as understanding the system the numbers are supposed to represent.

The tension is real. Simple becomes simplistic. Complex becomes complicated. The discipline is staying in the useful middle of both.

Two Axes

There are two axes that matter here, and they are easy to confuse.

Simple vs. simplistic. Simple means capturing the essential dynamics with the minimum necessary structure. Simplistic means flattening the dynamics until the representation is easy to look at but no longer corresponds to reality. A KPI is simple when it measures something that is genuinely one-dimensional (units shipped, uptime percentage). It becomes simplistic when it’s applied to something that isn’t (strategic position, decision quality, ecosystem health).

Complex vs. complicated. Complex means the system has irreducible dimensionality: multiple competing tensions, dependencies across domains, nonlinear interactions. Complicated means you’ve added layers of abstraction that don’t correspond to anything real. A constraint map that traces the actual binding constraints on the business is complex. A balanced scorecard with 47 metrics organized into four “perspectives” is complicated. The first helps you think. The second helps you feel like you’re thinking.

The progression mirrors the arc from confidence to arrogance to hubris. Confidence is earned by doing the work. Arrogance is confidence that has outrun its evidence. Hubris is arrogance that has stopped checking. KPIs follow the same arc: useful measurement becomes target-chasing, and target-chasing becomes Goodhart’s Law.

Where KPIs Work and Where They Don’t

KPIs work for operational layers where work is decomposable and repeatable. Manufacturing throughput. Call center response time. Server uptime. These are genuinely one-dimensional enough that a scalar captures the thing you care about.

KPIs fail for strategic work, and the failure modes are specific.

They agree with your framing. A KPI picks one axis and declares it the thing that matters. Every other axis becomes invisible. Speed vs. quality, growth vs. margin, throughput vs. safety: the KPI picks one side of the tension and hides the tradeoff.

They measure outcomes, not the behaviors that produce outcomes. Revenue, churn, defect rate: by the time the KPI moves, the damage is done or the opportunity is past. “Leading indicators” are just more KPIs with the same problems, one layer removed.

They create misaligned incentives across organizational boundaries. Give each executive their own KPIs and you’ve structurally incentivized them to think about their piece instead of the whole. The CTO optimizes the technology stack, the GC optimizes risk posture, the COO optimizes infrastructure, and the strategic failures live in the seams between those domains. Nobody’s KPI covers the seam. Every executive reports green; the system-level problem goes unaddressed.

They create a false floor for dissent. If everyone’s hitting their KPIs, there’s social pressure not to raise the uncomfortable question. A measurement system that signals “green means don’t make waves” is directly hostile to the obligation to dissent.

And gaming is a rational response, not a pathology. If someone’s compensation or career depends on a number, they will find the cheapest way to move that number. At the executive level, the gaming is more sophisticated: it looks like risk avoidance, like not taking on ambiguous cross-functional work that doesn’t map to your scorecard, like managing to the metric when the strategically correct move would hurt your number short-term.

The obvious objection is: “we already solved this with OKRs.” OKRs are KPIs with a narrative layer. The “objective” is the qualitative direction, the “key results” are the measurable evidence. In theory this addresses the scalar problem by pairing a number with a purpose. In practice, the key results become the KPIs and the objectives become decorative. “Objective: Become the market leader in X. KR1: Hit $Y revenue. KR2: Achieve Z% market share. KR3: Launch N features.” The objective is a sentence on a slide. The key results are what people are measured on. Goodhart’s Law applies to the key results the same way it applies to any other target. OKRs also cascade down the org chart, which means they decompose strategic work into individual scorecards, which recreates the same structural incentive misalignment. The seams still go unowned.

None of this means KPIs or OKRs are useless. It means they are operational tools, and applying operational tools to strategic work creates the feeling of rigor while degrading the quality of strategic thinking.

The Alternative (and Its Own Failure Mode)

The alternative is not “no measurement.” It is richer situational awareness. But it has to stay on the right side of the complex/complicated line, or it fails the same way KPIs fail (people stop engaging with it).

Constraint maps instead of scorecards. The binding constraints on the business (delivery timelines, customer concentration, capital structure flexibility, talent pipeline, regulatory exposure) cross every executive’s domain. No one owns a binding constraint individually. The operating review asks: which constraints loosened this quarter, which tightened, which ones are about to bind? A constraint that is about to bind is more strategically important than any KPI that is tracking green. The failure mode: the constraint map becomes a 200-line spreadsheet that nobody reads.

Scenario narratives instead of targets. Instead of “hit $X revenue this quarter,” the executive team maintains 3-5 named scenarios with explicit assumptions. The operating review asks: which scenario are we tracking closest to, what changed, and what would have to be true for us to shift to a different one? This preserves the multi-dimensionality that a target destroys. It also forces the team to name their assumptions rather than hide them inside a number. The failure mode: the scenarios become stale narratives that nobody updates, and the team defaults to tracking the one closest to the budget.

Decision logs instead of dashboards. The most valuable executive output is decisions under ambiguity. Track the decisions, the information available at the time, and the reasoning. Review them periodically against what actually happened. This is how you develop institutional judgment. The failure mode: the decision log becomes a CYA archive rather than a learning tool.

Tension management instead of optimization. Name the competing tensions explicitly: speed vs. quality, growth vs. margin, build vs. buy, short-term revenue vs. long-term positioning. These are not problems to solve. They are tensions to manage. The operating review asks: where did we make a tradeoff this quarter, was it the right one given what we knew, and has anything changed that should shift where we sit? The failure mode: the tensions get named once, put on a slide, and never revisited.

Seam ownership. The strategic failures live between executive domains, not within them. Name the seams: the handoff between sales and services, the interaction between capital structure and growth rate, the dependency between technology decisions and regulatory exposure. Assign seams to the executive team collectively. The operating review asks: what happened in the seams? The failure mode: collective ownership becomes no ownership.

Exception-based narrative reporting. Replace the dashboard review (“here are all my numbers, they’re all green”) with exception-based reporting: here’s what changed, here’s what’s at risk, here’s what I need from the team. If nothing changed and nothing is at risk, the report is short. The time goes to the exceptions, which is where the strategic risk lives. The failure mode: the narrative becomes performative (“everything is fine, here is my one carefully chosen exception that makes me look proactive”).

Every one of these alternatives is better than a KPI for strategic work. Every one of them can degrade into its own version of the same problem if the organization doesn’t maintain the discipline to use it honestly.
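The constraint-map review described above (which constraints loosened, which tightened, which are about to bind) can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `Constraint` class, the idea of expressing slack as a "headroom" fraction, the threshold values, and the sample numbers are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Constraint:
    """One binding constraint on the business (hypothetical structure)."""
    name: str
    previous_headroom: float  # slack last quarter; 0.0 means fully binding
    current_headroom: float   # slack this quarter


def review_constraints(
    constraints: List[Constraint],
    bind_threshold: float = 0.10,  # assumed cutoff for "about to bind"
    tolerance: float = 0.02,       # assumed noise band for "no real change"
) -> Dict[str, List[str]]:
    """Answer the three operating-review questions for a constraint map."""
    report: Dict[str, List[str]] = {
        "loosened": [],
        "tightened": [],
        "about_to_bind": [],
    }
    for c in constraints:
        delta = c.current_headroom - c.previous_headroom
        if delta > tolerance:
            report["loosened"].append(c.name)
        elif delta < -tolerance:
            report["tightened"].append(c.name)
        # A constraint can be both tightening and about to bind.
        if c.current_headroom <= bind_threshold:
            report["about_to_bind"].append(c.name)
    return report


# Illustrative numbers only; not drawn from any real business.
constraint_map = [
    Constraint("delivery timelines", previous_headroom=0.30, current_headroom=0.08),
    Constraint("customer concentration", previous_headroom=0.20, current_headroom=0.21),
    Constraint("talent pipeline", previous_headroom=0.15, current_headroom=0.40),
]

report = review_constraints(constraint_map)
for question, names in report.items():
    print(f"{question}: {names}")
```

The map here has three entries on purpose. The review only stays useful if the list stays short enough that every line is read.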

Computational Strategy

What makes this operationally feasible now is that you can model the actual complexity without it becoming complicated. You can run constraint analysis, scenario simulations, sensitivity tests, and dependency traces with tools that produce deterministic output. The simulation gives you numbers you can trust (because they come from code, not from a model’s interpolation). The frontier model gives you interpretation (what do the numbers mean, what’s inconsistent, what should we run next). The executive gives you judgment (given all of this, what do we do).

That is the pipe pattern: computation produces deterministic output, reasoning produces interpretation, humans produce decisions. None of those layers replaces the others.
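A rough sketch of that layering, under stated assumptions: the computation stage below is a one-at-a-time sensitivity test over a toy `runway_months` model, and the interpretation stage is a trivial stand-in (`most_sensitive_input`) for what a frontier model would do. The model, the input names, and the figures are hypothetical; the decision stage is deliberately left out of the code because it belongs to people.

```python
from typing import Callable, Dict


def runway_months(cash: float, monthly_burn: float) -> float:
    """Computation layer: a toy deterministic model (hypothetical)."""
    return cash / monthly_burn


def sensitivity(
    model: Callable[..., float],
    base: Dict[str, float],
    bump: float = 0.10,
) -> Dict[str, float]:
    """Bump each input by `bump` (10% by default), one at a time,
    and record how far the model output moves from the baseline."""
    baseline = model(**base)
    deltas: Dict[str, float] = {}
    for name, value in base.items():
        bumped = dict(base)
        bumped[name] = value * (1 + bump)
        deltas[name] = model(**bumped) - baseline
    return deltas


def most_sensitive_input(deltas: Dict[str, float]) -> str:
    """Interpretation layer stand-in: name the input that moves the output
    most. In practice this is where a model would read the numbers."""
    return max(deltas, key=lambda name: abs(deltas[name]))


# Illustrative inputs only.
base_inputs = {"cash": 12_000_000.0, "monthly_burn": 1_000_000.0}
deltas = sensitivity(runway_months, base_inputs)
flagged = most_sensitive_input(deltas)

print(deltas)
print("Most sensitive input:", flagged)
# Decision layer: what to do about it is a human call, not a function.
```

The point of the split is that the same inputs always produce the same numbers, so any disagreement is about interpretation or judgment, not about arithmetic.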

The discipline is keeping it simple rather than simplistic, complex rather than complicated. A constraint map with 5 binding constraints is useful. A constraint map with 50 is a balanced scorecard by another name. Three named scenarios with explicit assumptions are useful. Fifteen scenarios covering every possibility are a planning exercise that produces no decisions. The same Goodhart’s Law that corrupts KPIs will corrupt any alternative if you let the measurement system grow until it becomes the point rather than the instrument.

Where Multi-Model AI Fits

The historical problem with richer situational awareness is that it degrades over time. The constraint map goes stale because nobody updates it. The scenarios drift because nobody checks the assumptions against incoming data. The decision log becomes an archive because nobody reviews it. The framework works for two quarters and then becomes performative, which is the same failure mode as KPIs.

Multi-model AI changes the maintenance economics.

Constraint monitoring. Models connected to data feeds can track which constraints are tightening or loosening without someone manually updating a spreadsheet. The constraint map stays current because the system is watching the inputs. A constraint that was slack last quarter and is now approaching its bound gets flagged before an executive notices it in a meeting.

Scenario tracking. Given named scenarios with explicit assumptions, models can flag when incoming data shifts which scenario the business is tracking closest to. “The assumptions behind Scenario B just broke: here’s what changed.” This is exception-based by design. No signal when nothing changed. A clear signal when something did.

Decision log analysis. A model reviewing a decision log against subsequent outcomes can surface patterns in decision quality over time. Where does the team systematically underweight risk? Where do they consistently make good calls under ambiguity? This is the institutional judgment development that no KPI captures, and it is the kind of longitudinal analysis that humans do poorly because the feedback loop is too long.

Seam detection. Models reading across domains (the financial model, the delivery pipeline, the regulatory filings, the customer data) can identify interactions between domains that no individual executive is watching. The dependency between capital structure decisions and delivery timelines is a seam that lives in data across multiple systems. No one person holds all of it in their head. A model connected to all of the systems can.

Stress-testing the framework itself. Point a model at your constraint map or scenario set and ask: where is this framework becoming complicated without being useful? Which constraints haven’t changed in 6 months and might be stale? Which scenario has assumptions that contradict each other? The measurement system needs to be measured too, and models are good at this kind of internal consistency check.
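The scenario-tracking behavior above (no signal when nothing changed, a clear signal when an assumption breaks) can be sketched without any model at all, which is a useful way to pin down what the model would be asked to maintain. Everything here is an assumption for illustration: the `Scenario` structure, the choice to express each assumption as a numeric range, the "fewest broken assumptions" rule for picking the closest scenario, and the sample figures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Scenario:
    """A named scenario with explicit assumptions (hypothetical structure).
    Each assumption is an allowed (low, high) range for an observed metric."""
    name: str
    assumptions: Dict[str, Tuple[float, float]]


def broken_assumptions(scenario: Scenario, observed: Dict[str, float]) -> List[str]:
    """Return the metrics whose observed value falls outside the assumed range."""
    return [
        metric
        for metric, (low, high) in scenario.assumptions.items()
        if metric in observed and not (low <= observed[metric] <= high)
    ]


def closest_scenario(scenarios: List[Scenario], observed: Dict[str, float]) -> str:
    """Pick the scenario with the fewest broken assumptions (ties: first listed)."""
    return min(scenarios, key=lambda s: len(broken_assumptions(s, observed))).name


def exception_report(
    scenarios: List[Scenario],
    observed: Dict[str, float],
    tracking: str,
) -> Optional[str]:
    """Exception-based by design: return None when nothing broke."""
    current = next(s for s in scenarios if s.name == tracking)
    broken = broken_assumptions(current, observed)
    if not broken:
        return None
    return (
        f"Assumptions behind '{tracking}' broke: {', '.join(broken)}. "
        f"Closest scenario now: '{closest_scenario(scenarios, observed)}'."
    )


# Illustrative scenarios and data only.
scenarios = [
    Scenario("steady growth", {"growth": (0.06, 0.15), "churn": (0.00, 0.03)}),
    Scenario("slowdown", {"growth": (-0.05, 0.05), "churn": (0.04, 0.08)}),
]

quiet = exception_report(scenarios, {"growth": 0.10, "churn": 0.02}, "steady growth")
alert = exception_report(scenarios, {"growth": 0.02, "churn": 0.05}, "steady growth")

print(quiet)
print(alert)
```

Writing the assumptions as explicit ranges is the part that does the work. A model watching data feeds can only flag a broken assumption if the assumption was named in a checkable form to begin with.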

I should be honest about this: what I just described is very hard to do. Wiring up constraint monitoring across data feeds, scenario tracking against incoming data, decision log analysis over time, seam detection across organizational systems: that is a significant system to build, to maintain, and to keep from becoming its own version of complicated. It requires someone who can design the system, keep it honest, and resist the organizational gravity that turns every framework into a ritual. Nobody does this well right now. I have built pieces of it. I have not seen anyone do the whole thing at organizational scale.

But this is also where the trajectory of AI capability starts to matter. If you think about what OpenAI calls “L5” (AI doing the work of organizations, not just individuals), then the constraint monitoring, scenario tracking, seam detection, and framework stress-testing described above are exactly the kind of cross-domain, longitudinal, exception-based work that organizational-level AI would need to do. The reason nobody does this well today is that it requires sustained attention across many systems over long time horizons, which is what organizations are bad at and what AI systems are getting better at. The question is not whether this is feasible today at full scale (it is not). The question is whether you start building the pieces now so that as the capability arrives, your organization has the structure to use it.

The models don’t replace executive judgment. They keep the situational awareness current, surface the exceptions, and stress-test the framework so it doesn’t degrade into the same performative ritual that KPIs become. The executive still makes the call. The system makes sure the call is informed by the actual situation rather than by a dashboard that stopped being updated three months ago. But building that system is itself strategic work, and it is early.

The Discipline

The most valuable thing an executive does is make judgment calls under ambiguity. No KPI captures decision quality. No dashboard captures strategic position. No framework captures the irreducible complexity of running a business. But you can build tools that make the complexity legible without flattening it, keep them current with models that watch the inputs, and run operating reviews that engage with the actual situation rather than performing engagement with a set of green lights.

The question is whether the organization has the discipline to stay in the useful middle: simple enough to actually use, complex enough to actually represent the business, and honest enough to surface the things that are going wrong.