On Computational Strategy: The Transition from Narrative to Computation

Part 9 in a series. Previously: Model Eats the Software (Part 8), The Confident Incompetence Problem (Part 6), Confidence All the Way Down (Part 6b), On Keeping AI in the Critical Path (Part 7), The Disintermediation Principle (Part 5), Zen of Unix Tools (Part 4).


The Transition

Strategy has been narrative. Consultants build slide decks. Analysts build spreadsheets. Assumptions are buried in cell formulas. The model is a black box. Strategy is a narrative defended in a boardroom, and when new information arrives it takes weeks to update because the entire artifact chain — data collection, spreadsheet, PowerPoint — has to be rebuilt by hand.

Computational strategy replaces the narrative with computation:

Narrative Strategy | Computational Strategy
Consultants build slide decks | Every assumption is a cited claim with a range
Analysts build spreadsheets | Every relationship is an executable equation
Assumptions buried in cell formulas | Every scenario is a parameter sweep
Strategy is a narrative defended in a boardroom | Frontier models formulate, solve, and explain
Model is a black box | Every layer is auditable and traceable
New information → weeks to update | New information → minutes to recompute
Sensitivity: manual, if at all | Sensitivity: automated across all inputs
Scenarios: 3 tabs labeled Base/Bull/Bear | Scenarios: continuous, Monte Carlo-sampled
Data: analysts manually research, compile, and reconcile | Data: hundreds of external APIs, MCP servers, open-source datasets queried programmatically
Humans buried in data collection and artifact creation | Humans do enhanced oversight because one person can see it all

This stands on three legs: Quantitative + Data + Design. Quantitative alone is a spreadsheet with better inputs. Add data and you have a model. Add design and you have a system that reasons, recomputes, and adapts. This is the same transition that quantitative finance made in capital markets — the firms that got there first captured outsized returns.


The Pipe

The working pattern is simple: write a simulation in Python, run it, pipe the output into a reasoning engine, reason about the results, adjust, run again.

Human + Agent --flags --options --arguments | \
python sim.py --scenario base | reasoning | \
python sim.py --adjust | reasoning

If you squint, this is a Unix pipeline where the pipe between programs is a frontier model.

The Python produces deterministic output. Monte Carlo samples from Tri(low, mid, high), propagated through financial algebra, reported as percentiles. The model produces probabilistic reasoning — interpretation, cross-validation, identification of what the numbers mean and what to run next. Neither replaces the other. The simulation can’t interpret its own output. The model can’t produce consistent numerical results. The pipe between them is where the work happens.


Why Python, Not the Model

When I need 50,000 Monte Carlo samples propagated through a Leontief production function, I don’t ask a model to reason about probability distributions. I write numpy code that samples from scipy.stats.triang, propagates through vectorized algebra, and reports P5/P25/P50/P75/P95.
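The pattern is compact enough to sketch. A minimal version of the forward Monte Carlo loop: sample each assumption from Tri(low, mid, high) via scipy.stats.triang, propagate through vectorized algebra, report percentiles. The unit-economics formula and all numbers here are illustrative, not the actual model.

```python
import numpy as np
from scipy.stats import triang

def tri(low, mode, high, size, rng):
    # scipy parameterizes the triangular distribution as (c, loc, scale),
    # with c = (mode - low) / (high - low), loc = low, scale = high - low.
    return triang.rvs((mode - low) / (high - low), loc=low,
                      scale=high - low, size=size, random_state=rng)

rng = np.random.default_rng(42)
n = 50_000
capex = tri(800, 1000, 1400, n, rng)   # $/kW construction cost
price = tri(45, 55, 70, n, rng)        # $/MWh power price

# Toy margin per kW-year, purely to show vectorized propagation:
# revenue at 80% utilization minus a 10%-of-CAPEX annual charge.
margin = price * 8.76 * 0.8 - capex * 0.1

p5, p25, p50, p75, p95 = np.percentile(margin, [5, 25, 50, 75, 95])
print(f"P5={p5:.0f}  P25={p25:.0f}  P50={p50:.0f}  P75={p75:.0f}  P95={p95:.0f}")
```

Fifty thousand samples of a two-input toy run in milliseconds; the real version swaps in the full financial algebra but keeps exactly this shape.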

The model cannot do this. Not because it’s bad at math — because deterministic, reproducible numerical computation requires deterministic, reproducible tools. scipy.optimize.brentq finds the IRR. QuantLib builds the yield curve. sympy proves the leverage identity algebraically. These produce answers that are verifiable, not probable.

This is the ground truth layer. When the model says “the IRR looks wrong,” the response isn’t to argue — it’s to run brentq and check. When the model says “the instability condition holds,” the response is to compute α_eff / (η_eff + β_eff) and see if the ratio exceeds 1.1. Reality is the axiom.
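The brentq check is one-liner territory, which is the point. A sketch with made-up cash flows: IRR is the rate where NPV crosses zero, found by a bracketing root-finder rather than asserted by a model.

```python
from scipy.optimize import brentq

# Hypothetical cash flows: $1,000 out at t=0, four years of inflows.
cash_flows = [-1000.0, 300.0, 400.0, 500.0, 200.0]

def npv(rate, cfs):
    # Net present value at a given discount rate.
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cfs))

# brentq needs a sign change: NPV is positive at rate=0 for these flows
# and negative at 100%, so [0, 1] brackets the root.
irr = brentq(npv, 0.0, 1.0, args=(cash_flows,))
print(f"IRR = {irr:.2%}")
```

If the model claims a different IRR for the same cash flows, the model is wrong, not brentq. That asymmetry is what "reality is the axiom" means in practice.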

The full quantitative stack exists because each package fills a specific role:

  • numpy/scipy: Foundation. Array math, distributions, optimization, root-finding.
  • QuantLib: Yield curves, bond pricing, levered cash flow modeling. The kind of financial computation that has to be exactly right.
  • sympy: Algebraic verification. Prove that the equations are consistent, not just that the numbers look plausible.
  • networkx: Equation dependency DAGs. When you have 166 equations across 7 modules, you need topological sort to know what feeds what.
  • matplotlib: The output. Histograms with KDE overlays, fan charts with P5-P95 shaded bands, tornado sensitivity charts.
  • PyMC (when ready): Bayesian inverse inference. The opposite direction from everything above.

These are forward generative models — sample from distributions, propagate through algebra, report percentiles. The model’s job is to reason about the output, not replicate it.
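The networkx role is worth making concrete. A toy dependency DAG of five equations (the real registry has 166 across 7 modules), topologically sorted so upstream equations always compute first; the equation names and edges are invented for illustration.

```python
import networkx as nx

# Hypothetical registry: each equation lists the equations it consumes.
deps = {
    "power_draw": [],
    "capacity": ["power_draw"],
    "revenue": ["capacity"],
    "ebitda": ["revenue"],
    "levered_npv": ["ebitda", "power_draw"],
}

G = nx.DiGraph()
for eq, inputs in deps.items():
    G.add_node(eq)
    for upstream in inputs:
        # Edge upstream -> eq means eq consumes upstream's output.
        G.add_edge(upstream, eq)

order = list(nx.topological_sort(G))
print(order)  # every equation appears after everything it depends on
```

The same graph answers the inverse question — nx.ancestors(G, "levered_npv") is the full set of equations a change to levered NPV assumptions touches — which is what "trace what feeds what" means mechanically.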

The Forward-to-Inverse Transition

Every computational strategy project starts forward. You don’t have data yet. You have assumptions — expert judgment, industry benchmarks, published ranges. You encode these as triangular distributions: construction cost is Tri(800, 1000, 1400) $/kW, power price is Tri(45, 55, 70) $/MWh. You propagate forward through the equations and get output distributions. This is honest about what you know: you’re exploring the consequences of your assumptions, not estimating ground truth.

The forward model answers: given what I assume, what happens?

As the project runs, data accumulates. The first building completes — actual CAPEX was $1,150/kW. A second comes in at $980/kW. Quarters of EBITDA arrive. Deals close with known entry and exit multiples. Each observation is a data point that the forward model didn’t have.

This is where the direction reverses. Bayesian inference asks the opposite question: given what I observed, what should I believe about the parameters? Instead of sampling construction cost from Tri(800, 1000, 1400) and propagating forward, you feed the observed costs into PyMC and infer a posterior distribution for what construction cost actually looks like in your portfolio. The prior (your original assumption) gets updated by the likelihood (what the data says). The posterior is sharper than either alone.

The shift matters because the forward model’s uncertainty is bounded by the width of your assumptions. If you assumed Tri(800, 1000, 1400), the output uncertainty reflects that full range no matter how many simulations you run — 50,000 Monte Carlo samples don’t make a wide assumption narrower. Bayesian inference with real data actually reduces the uncertainty. Ten buildings with realized CAPEX produce a posterior that’s tighter than any expert’s triangular distribution.
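The narrowing can be shown without any MCMC machinery. PyMC does this by sampling; a conjugate normal model gives the same prior-to-posterior update in closed form. The prior spread, observation noise, and realized CAPEX figures below are all illustrative.

```python
import numpy as np

# Prior belief about CAPEX ($/kW), roughly matching the spread of the
# original Tri(800, 1000, 1400) assumption.
prior_mean, prior_sd = 1050.0, 150.0
obs_sd = 100.0  # assumed building-to-building noise

# Realized costs from completed buildings (hypothetical).
observed = np.array([1150.0, 980.0, 1040.0, 1110.0, 1005.0])

# Conjugate normal update with known observation variance: precisions add,
# and the posterior mean is a precision-weighted blend of prior and data.
n = len(observed)
post_var = 1.0 / (1.0 / prior_sd**2 + n / obs_sd**2)
post_mean = post_var * (prior_mean / prior_sd**2 + observed.sum() / obs_sd**2)
post_sd = post_var**0.5

print(f"prior:     {prior_mean:.0f} ± {prior_sd:.0f}")
print(f"posterior: {post_mean:.0f} ± {post_sd:.0f}")  # tighter than the prior
```

Five observations already cut the standard deviation by more than half — the reduction the forward model can never deliver, because its width is fixed by assumption.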

But the transition has a trigger condition. Bayesian inference with two data points is worse than a well-reasoned forward model — the posterior is dominated by the prior anyway, and you’ve added complexity without information. The triggers are specific:

Trigger | What changes | Why that threshold
10+ buildings with realized CAPEX | Hierarchical cost model replaces Tri assumptions | Enough variation to estimate building-level vs portfolio-level effects
8+ quarters of EBITDA | Bayesian state-space model for revenue dynamics | Enough time series to separate trend from noise
5+ portfolio deals with entry/exit | Hierarchical return model | Enough deals to estimate return distribution, not just point outcomes
20+ DC M&A transactions | Calibrated valuation multiple posterior | Enough comps to build a meaningful market distribution

Until those thresholds are met, the forward generative model is the correct tool. Saying “not yet” to Bayesian inference is a feature, not a limitation — it prevents you from building a sophisticated statistical model on top of insufficient data, which is the quantitative equivalent of the confident incompetence problem.
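The "not yet" discipline can be made executable. A minimal gate encoding the thresholds from the table above, checked before any Bayesian machinery runs; the metric names and data counts are hypothetical.

```python
# Threshold registry: model name -> (data metric, minimum count).
TRIGGERS = {
    "hierarchical_cost_model": ("buildings_with_capex", 10),
    "state_space_revenue_model": ("ebitda_quarters", 8),
    "hierarchical_return_model": ("deals_with_entry_exit", 5),
    "valuation_multiple_posterior": ("dc_ma_transactions", 20),
}

def ready(model_name: str, data_counts: dict) -> bool:
    """True only when enough real observations exist to calibrate the model."""
    metric, threshold = TRIGGERS[model_name]
    return data_counts.get(metric, 0) >= threshold

counts = {"buildings_with_capex": 2, "ebitda_quarters": 9}
print(ready("hierarchical_cost_model", counts))    # two buildings: not yet
print(ready("state_space_revenue_model", counts))  # nine quarters: go
```

Making the gate code rather than judgment means the model can check it too — a confident proposal to "upgrade to a hierarchical Bayesian model" fails a deterministic test instead of winning an argument.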


The L2/L3/L4 Framework

I work with a constraint model for AI infrastructure that classifies AI capability levels, along the lines of OpenAI's L1-L5 and DeepMind's L1-L5 scales. When I apply these levels to the human-AI working relationship itself, something clarifying falls out.

The model is L2 and L3. L2: it reasons. It can hold the entire equation system in context, trace dependency chains across modules, check whether conditions hold, identify structural patterns. L3: it acts. It writes Python, runs simulations, calls other models for cross-validation, submits batch jobs, reads results.

The human is L4. The human innovates. The human decides that AI infrastructure is a Leontief production function. The human identifies the 1,000x ratio between compute energy and data movement energy as the fundamental physical fact driving chip architecture. The human writes the instability theorem. The human structures the epistemological layers so that L1 constraints (physics) can’t be overridden because physics doesn’t negotiate.

None of that comes from a model. It comes from understanding data centers, power systems, capital markets, and thermodynamics deeply enough to see the structure.

The division of labor:

The human provides constraints — the axioms, the judgment about what matters, the innovation about how to model it. The model provides reasoning and execution at scale — holding the full corpus in context, running simulations, tracing every dependency chain, cross-validating across multiple models.

This maps directly to the confident incompetence problem. The failure mode is when the model tries to be L4 — proposes a new modeling framework, suggests replacing the instability theorem with something “more sophisticated,” recommends a Bayesian approach before there’s enough data to calibrate it. That’s the model architecting itself out of the critical path.

The discipline is keeping the model at L2/L3. Reason about my model, don’t replace it. Execute against my axioms, don’t propose new ones. When something needs to be deterministic, write Python. When you’re uncertain, say so. When you need a second opinion, call another model.


Everything is Multi-Model

The reasoning engine in the pipe isn’t a single model. It’s a primary model (Claude) with tool access to other frontier models (GPT-5.2-Pro, Gemini 3.1 Pro, Voyage) via MCP.

This is not an ensemble. Not a consensus system. Not a pipeline. Claude decides when it needs a second perspective. GPT-5.2-Pro reasons differently — it might catch an assumption Claude missed. Gemini 3.1 Pro might validate a result from a different angle. Voyage produces embeddings for semantic search over the equation corpus. The parallel_query tool queries multiple models simultaneously and Claude synthesizes.

The key: there’s no orchestration layer. No routing logic. No pre-wired pipeline. Claude uses other models the way a human calls a colleague — when it judges it needs to, not because a workflow mandates it. When a new model drops, Claude can use it immediately. Swap one model ID in the config. The infrastructure is plumbing. The reasoning stays in the critical path.
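The parallel second-opinion pattern is thin enough to sketch. Two model calls run concurrently and both answers come back for synthesis; the model-calling functions are stubs standing in for MCP tool calls to frontier-model APIs, since the actual tool interface isn't shown in this essay.

```python
from concurrent.futures import ThreadPoolExecutor

def ask_model_a(prompt: str) -> str:
    return f"model-a answer to: {prompt}"  # stub for a real API call

def ask_model_b(prompt: str) -> str:
    return f"model-b answer to: {prompt}"  # stub for a real API call

def parallel_query(prompt: str) -> dict:
    """Run both model calls simultaneously; the caller synthesizes."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(ask_model_a, prompt)
        fb = pool.submit(ask_model_b, prompt)
        return {"model_a": fa.result(), "model_b": fb.result()}

answers = parallel_query("Does the IRR distribution's bimodality look right?")
for name, answer in answers.items():
    print(f"{name}: {answer}")
```

Note what isn't here: no router, no voting, no workflow graph. The calling model decides when to invoke this, which is the whole design.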

Vertex AI extends this to compute. Claude can submit batch prediction jobs, launch Docker containers on GPU/TPU, check on running work. The model doesn’t just reason about what to run — it runs it. The “architectural cure is execution” from the confident incompetence essay, implemented as infrastructure.


MCP is the Architecture

MCP (Model Context Protocol) is what makes this composable.

Traditional app: human → UI → business logic → database. The business logic is frozen reasoning. When the business changes, the software doesn’t.

MCP app: human → model → tools (via MCP) → data/compute/other models. The model IS the application. MCP servers provide capabilities — access to data, access to compute, access to other models. They contain no logic. The reasoning about when and how to use that access lives in the model.

My setup is six MCP servers: three Google official (gcloud, observability, storage) for GCP infrastructure, three custom (multimodel for AI APIs, serverless for Supabase edge functions, vertex for Vertex AI jobs). Same .mcp.json works on my laptop, in WSL, or on a headless GCP VM.
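For orientation, the declaration is a single JSON file. A sketch of the general .mcp.json shape — the server names, commands, and paths below are placeholders, not the actual six-server configuration:

```json
{
  "mcpServers": {
    "multimodel": {
      "command": "node",
      "args": ["servers/multimodel/index.js"],
      "env": { "OPENAI_API_KEY": "${OPENAI_API_KEY}" }
    },
    "vertex": {
      "command": "python",
      "args": ["-m", "servers.vertex"]
    }
  }
}
```

Because the file only declares commands and environment, it carries no logic to port — which is why the same file works on a laptop, in WSL, and on a headless VM.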

The Claude Agent SDK makes this programmable. Claude Code is the interactive version — I’m in the loop. The Agent SDK lets me spin up purpose-built agents: “Claude with exactly these tools, this context, this task, budget limit, run until done.” Multiple agents, different tool sets, different jobs. The headless GCP VM is the compute surface — SSH in from any machine, tmux keeps agents running while I travel.

The philosophy at every layer: the model reasons, infrastructure provides capabilities, nothing intermediates. When models get better, everything built this way gets better automatically.

There’s a security implication worth stating. Traditional enterprise security is syntactic — WAFs pattern-match request shapes, ACLs check role membership, DLP scans for regex. An attacker who knows the patterns can craft requests that pass every check while being obviously malicious to a human reading them. A reasoning engine mediating every data access through MCP understands what the request means. Every capability is typed and bounded. Every tool call carries context about why it’s being made. The audit trail includes reasoning, not just access logs. And when models improve, threat detection improves automatically — no manual rule updates, no security team playing catch-up with novel attack patterns.

The honest caveat: prompt injection is the inverse of this. The same reasoning engine that catches semantic threats can itself be manipulated by adversarial inputs. The model that understands “this query doesn’t make sense for this user’s role” can also be tricked into believing a malicious instruction is legitimate. You still need the traditional layers underneath — this is defense in depth, not replacement. But the semantic layer is the one that’s been missing from enterprise security, and it’s the one that scales with model capability rather than with headcount.


The Workbench

The practical structure for computational strategy work:

work/
├── analysis/          # Forward generative models — scenario analysis
│   ├── input/         # Scenario definitions, assumptions
│   ├── output/        # Results, figures, reports
│   └── sim/           # Simulation code
├── deals/             # Deal-specific analysis
│   ├── input/         # Deal terms, projections
│   ├── output/        # Deal memos, comparisons
│   └── sim/           # Deal-specific models
├── portfolio/         # Portfolio-level analysis (PyMC when justified)
├── optimizer/         # Deterministic optimization (cvxpy, scipy)
│   ├── data.py
│   ├── model.py
│   ├── precompute.py
│   └── solve.py
├── equations/         # Knowledge accumulates as you go
│   └── equations/
│       ├── physics/
│       ├── dynamics/
│       ├── value/
│       ├── instability/
│       ├── project/
│       ├── reliability/
│       ├── topology/
│       ├── registry.py
│       └── types.py
└── data/
    ├── public/        # Market data, public filings, benchmarks
    └── proprietary/   # Internal data, deal data, portfolio data

Each sim/ directory contains forward generative models. Sample from distributions, propagate through algebra, report percentiles. The model reasons about the output. Adjust. Rerun.

equations/ is the knowledge layer. Equations are collected and codified as the work progresses. Each has typed inputs, epistemological layer tags (L1 constraint through L5 scenario), dependency chains. The equation registry provides topological sort so the model can trace what feeds what.

portfolio/ stays empty until the triggers justify Bayesian inference: 10+ buildings with realized CAPEX, 8+ quarters of EBITDA, 5+ deals with entry/exit. Until then, saying “not yet” is the correct answer. This is the L4 judgment that prevents the model from confidently proposing a hierarchical Bayesian model you can’t calibrate.


The Test

For any computational strategy workflow, ask:

  1. Is the deterministic work in Python? If the model is producing numbers instead of numpy, something is wrong.
  2. Is the model reasoning or performing? If it’s proposing to replace your model with something more sophisticated, it’s trying to be L4. Keep it at L2/L3.
  3. Can you verify the output? If the model says the IRR is 18%, can you run brentq and check? If not, the ground truth layer is missing.
  4. Does a better model make this better? If yes, the model is in the critical path. If no, you’ve frozen reasoning into code and the model is decoration.
  5. Are multiple models available? If the reasoning engine is a single model with no cross-validation, you’re trusting one perspective on a probabilistic question.

The goal isn’t to build systems that use AI. It’s to build a working practice where the model reasons honestly, Python computes deterministically, and the human provides the constraints that make the whole thing worth running.


What This Actually Looks Like

Monday morning. I need to evaluate whether a 200 MW campus in Texas makes sense at current power prices given a 36-month construction timeline.

I describe the scenario. Claude sets up the Monte Carlo — power cost as Tri(45, 55, 70) $/MWh, construction timeline as Tri(30, 36, 48) months, GPU refresh cycle as Tri(18, 24, 30) months. Propagates through the master production function, the capacity PV, the levered NPV. 50,000 samples.

The simulation runs. Python produces P5/P25/P50/P75/P95 for IRR, MOIC, NPV. A tornado chart shows sensitivity. Grid interconnection delay dominates.

Claude looks at the output. “The IRR distribution is bimodal — there’s a cluster around 12% and another around 22%. The split correlates with whether GPU refresh happens before or after the construction timeline crosses 40 months. The delay scenarios where both grid interconnection and GPU refresh slip are the tail risk — P5 IRR goes to 4%.”

I ask: “What does the instability theorem say about the demand side at this scale?”

Claude traces through the dynamics module: α_eff (latent demand growth), η_eff (efficiency gains), β_eff (buildout rate). At 200 MW, we’re in R1 territory. The gap is currently diverging — α_eff > 1.1 × (η_eff + β_eff). Pricing power persists through the construction timeline.

I adjust: “Run it with the R2 transition happening at month 24 instead of month 36.”

Claude adjusts the scenario, reruns. The IRR distribution shifts — the bimodality disappears because the GPU refresh timing no longer creates a cliff.

This is the pipe. Python → reasoning → Python → reasoning. Each cycle tightens the analysis. The sim produces the numbers. The model produces the interpretation. I provide the constraints and decide when the analysis is done.

That’s computational strategy.
