February 24, 2026
Two recent pieces frame the AI infrastructure debate from opposite poles. Citadel Securities published “The 2026 Global Intelligence Crisis”, arguing that AI displacement risk is overstated and adoption follows predictable S-curves. Citrini Research published “The 2028 Global Intelligence Crisis”, modeling a structured bear case where rapid white-collar displacement triggers a deflationary spiral through consumer spending, real estate, and private credit.
They’re both partially right. The interesting question is where the binding constraints actually sit.
The Citadel Argument: Constraints as Governor
Citadel’s core claim is empirical: AI capex is running at 2% of GDP (~$650 billion), software engineering job postings are up 11% year-over-year, unemployment sits at 4.28%, and ~2,800 data centers are planned across the US. The data, they argue, shows acceleration of investment without acceleration of displacement.
Their theoretical frame is that displacing white-collar work “would require orders of magnitude more compute intensity than current level utilization,” and that as automation expands, marginal compute costs rise — potentially exceeding labor costs. This is a supply-side argument: physical infrastructure is the governor on how fast AI can displace anything.
This is correct as far as it goes. The 20-year compute-vs-memory scaling divergence tells the story quantitatively: compute has improved 60,000x while memory bandwidth improved only 100x. That 600:1 gap means that even as FLOPS get cheaper, the system’s ability to use them is increasingly constrained by how fast data moves. During memory-bound inference, GPUs sit 97-99% idle. The compute is there; the data path isn’t.
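The memory-bound regime can be made concrete with simple roofline arithmetic: in single-stream decoding, every generated token streams the full weight set from HBM once (ignoring KV-cache traffic), so bandwidth divided by model size caps tokens per second. A sketch with rough public numbers, assuming an H100-class part (~3.35 TB/s of HBM3 bandwidth) serving a hypothetical 70B-parameter model quantized to 4 bits:

```python
# Roofline sketch for memory-bound decode (batch size 1, KV-cache traffic ignored).
def decode_tokens_per_sec_cap(bandwidth_gb_s: float, model_bytes: float) -> float:
    """Each token must read all weights from HBM once, so bandwidth / model size
    upper-bounds single-stream throughput regardless of available FLOPS."""
    return bandwidth_gb_s * 1e9 / model_bytes

weights_int4 = 70e9 * 0.5   # hypothetical 70B-parameter model at 4 bits/param = 35 GB
print(round(decode_tokens_per_sec_cap(3350, weights_int4), 1))  # ~95.7 tokens/s cap
```

No amount of extra FLOPS lifts this cap; only more bandwidth or fewer bytes per token does, which is why batching and the quantization workarounds matter so much in practice.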
The memory wall shows up in three dimensions:
Capacity: HBM per GPU has grown from 80GB (H100) to 192GB (B200), but model sizes and context windows are growing faster. The KV cache ceiling equation — N_concurrent = (HBM - W_model - overhead) / KV_per_session — shows that long-context inference directly trades concurrent users against memory. More context means fewer simultaneous sessions per GPU.
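Plugging hypothetical numbers into that ceiling equation makes the trade-off concrete. The figures below (a B200-class 192 GB part, a 35 GB quantized model, 10 GB of runtime overhead, two assumed per-session KV-cache sizes) are illustrative only:

```python
# KV-cache ceiling from the equation above; all inputs in GB, all hypothetical.
def concurrent_sessions(hbm_gb, model_gb, overhead_gb, kv_gb_per_session):
    """N_concurrent = (HBM - W_model - overhead) / KV_per_session."""
    free_gb = max(hbm_gb - model_gb - overhead_gb, 0)
    return int(free_gb // kv_gb_per_session)

# B200-class 192 GB part, 35 GB quantized model, 10 GB runtime overhead:
print(concurrent_sessions(192, 35, 10, kv_gb_per_session=2))    # short context -> 73
print(concurrent_sessions(192, 35, 10, kv_gb_per_session=16))   # long context  -> 9
```

Growing the context window (a larger KV cache per session) divides directly into the user count, which is exactly the capacity-versus-concurrency trade the equation expresses.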
Bandwidth: HBM3 delivers 819 GB/s per stack, HBM3e hits 1,200 GB/s, and HBM4 (mass production just started at Samsung on February 12, 2026) targets 1,600-2,000 GB/s. Memory bandwidth grows ~1.5x per generation while model parameter counts grow 4-10x per cycle. The gap factor is ~3x and widening.
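The divergence between those growth rates compounds per generation. A quick sketch, using only the ~1.5x and 4-10x figures above:

```python
# Per-generation bandwidth-vs-model-size gap implied by the growth rates above.
hbm_gain = 1.5                          # HBM bandwidth improvement per generation
model_low, model_high = 4.0, 10.0       # model parameter growth per cycle
for gens in (1, 2, 3):
    low_gap = (model_low / hbm_gain) ** gens
    high_gap = (model_high / hbm_gain) ** gens
    print(gens, round(low_gap, 1), round(high_gap, 1))  # gen 1 prints: 1 2.7 6.7
```

One generation of divergence is roughly the ~3x gap factor the text cites; three generations of it, at the high end, is nearly 300x.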
Supply concentration: SK Hynix holds 57-62% of the HBM market, Samsung 33%, Micron 5%. Roughly 80% of global HBM supply originates in South Korea. HBM demand is growing 110-150% year-over-year. SK Hynix is already pricing HBM4 at $500-550/unit (50%+ premium over HBM3e) and hiking HBM3e prices 20% for 2026 orders. Memory now accounts for nearly 80% of GPU manufacturing cost.
So Citadel is right that physical constraints govern the pace. But “governed” is not “prevented” — it means the timeline is a function of infrastructure buildout, not an indefinite deferral.
The Citrini Argument: Concentration Creates Fragility
Citrini’s scenario analysis operates on the demand side. Their key structural observation: white-collar workers represent 50% of US employment but drive ~75% of discretionary consumer spending. The top 10% of earners account for over 50% of all consumer spending; the top 20% account for ~65%.
This concentration means AI displacement has an asymmetric macro impact. Displacing a $180,000/year product manager who drops to $45,000 driving for Uber doesn’t just reduce one person’s spending — it hits the consumption tier that supports services, real estate, and discretionary goods disproportionately.
Citrini traces the transmission channels:
- Labor share of GDP has already declined from 64% (1974) to 56% (2024). Their scenario projects 46% by 2028 — a 10-percentage-point drop in 4 years that would be unprecedented (the prior 8pp drop took 50 years).
- Real estate: Home values fall 8-11% YoY in tech-heavy metros (SF, Seattle, Austin). Early-stage mortgage delinquencies rise in ZIP codes with >40% tech/finance employment. The US residential mortgage market is ~$13 trillion.
- Private credit: The market grew from under $1 trillion (2015) to over $2.5 trillion (2026). PE-backed software companies are the exact cohort most exposed to AI displacement. Their scenario has Moody’s downgrading $18 billion in PE-backed software debt across 14 issuers.
- Fiscal: Federal receipts run 12% below CBO baseline as payroll tax revenue falls, unemployment benefits surge, and capital gains evaporate.
The scenario’s most provocative data point: median US individual token consumption reaches 400,000 tokens per day by March 2027 — a 10x increase from end of 2026. This implies agentic AI has moved from developer tooling to mass consumer adoption (agentic commerce, automated financial planning, AI-driven real estate transactions compressing buy-side commissions from 2.5-3% to under 1%).
India’s IT services sector, exporting over $200 billion annually from companies like TCS, Infosys, and Wipro, represents the international transmission channel. These firms are heavily exposed to AI displacement of routine software development, QA, and business process outsourcing. Citrini’s scenario has the rupee falling 18% in four months.
Where the Arguments Intersect
Read together, the two pieces don't actually contradict each other; they're arguing about timing.
Citadel says: the constraints are real, adoption is slow, and the historical pattern is that productivity shocks expand consumption rather than collapse it (Keynes predicted a 15-hour workweek by now; instead, wants expanded).
Citrini says: if adoption does inflect — specifically, if agentic AI crosses from augmentation to displacement for white-collar work — the macro transmission channels are concentrated enough to create correlated losses across asset classes.
Both positions are consistent with the same underlying physics. The question is whether the S-curve inflects before or after the economy can absorb it.
What the Constraints Model Shows
Working through the infrastructure math reveals several things both pieces understate:
The workarounds are real and compounding. Quantization (FP16 to INT4 gives ~4x effective memory capacity), Multi-Latent Attention (4-8x KV cache compression), Grouped-Query Attention (4-8x fewer KV heads), PagedAttention (60-80% waste reduction), and speculative decoding (~5x effective batch size) compound to roughly 15-30x effective memory efficiency improvement vs. naive FP16 multi-head attention. These aren’t theoretical — they’re in production. They buy time that neither article fully accounts for.
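To first order, the memory-side multipliers stack. A sketch of the compounding, counting quantization alongside one KV-compression scheme (MLA and GQA address the same cache, so only one is stacked; PagedAttention and speculative decoding improve utilization and throughput rather than raw capacity, so they are left out here):

```python
# Compounding the two memory-capacity multipliers quoted above (illustrative).
quant = 4.0                   # FP16 -> INT4 effective capacity gain
kv_low, kv_high = 4.0, 8.0    # MLA or GQA KV-cache compression range
low, high = quant * kv_low, quant * kv_high
print(low, high)              # 16.0 32.0 -- roughly the ~15-30x range in the text
```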
The economic constraint may bind before the physics constraint. Memory is already 80% of GPU cost with concentrated supply and rising prices. The constraint may manifest less as “we can’t build enough” and more as “inference costs hit a floor set by memory economics.” This determines who can afford to deploy AI at scale, not whether deployment is physically possible.
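One way to see the cost floor: amortize just the memory share of an accelerator's cost over its lifetime token output. Every input below is hypothetical except the ~80% memory cost share quoted above; the point is the shape of the calculation, not the specific figure.

```python
# Back-of-envelope inference cost floor from memory alone (inputs hypothetical).
gpu_price = 35_000                      # assumed accelerator price, USD
memory_share = 0.80                     # memory fraction of cost (from the text)
lifetime_s = 3 * 365 * 24 * 3600        # 3-year amortization at full utilization
tokens_per_s = 5_000                    # assumed aggregate batched throughput

memory_cost = gpu_price * memory_share
floor_per_million = memory_cost / (lifetime_s * tokens_per_s) * 1e6
print(round(floor_per_million, 4))      # ~$0.059 per million tokens, memory only
```

Whatever the exact inputs, a rising memory share with concentrated suppliers means this floor is set by HBM pricing power, not by FLOPS economics.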
The 400,000 tokens/day median projection deserves scrutiny. Current observed knowledge worker consumption is ~100,000-200,000 tokens/day, with power users hitting 130-175 million tokens/day at rate limits. Token consumption follows a power law — the top 1% consume 50-90% of total tokens. A 4x jump in the median in ~3 months would require agentic AI to move from early adopters to mass market essentially overnight. Possible at an inflection point, but aggressive.
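That power law means the median and the aggregate can move almost independently. A toy simulation with hypothetical Pareto parameters (not calibrated to any real usage data) shows a modest median coexisting with a top 1% that takes a large share of all tokens:

```python
import random

random.seed(0)
# Hypothetical heavy-tailed daily token use: Pareto via inverse-CDF sampling.
alpha, x_min = 1.1, 20_000              # assumed tail index and usage floor
draws = sorted(x_min * random.random() ** (-1 / alpha) for _ in range(100_000))
median = draws[len(draws) // 2]
top1_share = sum(draws[-1_000:]) / sum(draws)
print(round(median), round(top1_share, 2))
```

Under a distribution like this, aggregate token demand can explode while the median barely moves, which is why a fast jump in the median, not in total consumption, is the aggressive part of the projection.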
The displacement-as-deflation argument has a missing term. Citrini models the demand destruction from displacement but understates the supply-side response: cheaper intelligence creates new categories of demand. Every previous general-purpose technology (electricity, computing, internet) followed this pattern. The question isn’t whether new demand emerges — it’s whether the transition period creates enough correlated stress to trigger a financial crisis before equilibrium is reached. Citrini’s argument is really about transition dynamics, not steady state.
Hyperscaler capex at $150-200 billion per quarter creates its own demand floor. Even in a displacement scenario, the infrastructure buildout itself is a massive economic input — construction labor, electrical equipment, land acquisition, cooling systems. This partially offsets white-collar displacement with blue-collar and skilled-trades demand, though the geographic and demographic overlap is limited.
The Binding Constraint Hierarchy
If you stack the constraints in order of which actually binds first:
1. Power delivery and grid interconnection — 18-36 month lead times, limited by transmission capacity and utility planning cycles. This is the hardest physical constraint and the one least amenable to software workarounds.
2. Memory economics — HBM supply concentration, rising prices, and 80% cost share create a price floor on inference. Software efficiency gains (quantization, MLA) relax this but can’t eliminate it.
3. Memory bandwidth — The 600:1 compute-vs-memory divergence means bandwidth constrains token throughput regardless of available FLOPS. Each HBM generation improves ~1.5x; each model generation demands 4-10x more.
4. Organizational adoption speed — Citadel’s S-curve argument. Enterprises move slowly. Regulatory, compliance, and integration costs create friction independent of technical capability.
5. Labor market adjustment speed — The variable Citrini is modeling. If constraints 1-4 relax faster than the labor market can absorb displaced workers, you get their scenario.
The current equilibrium has constraints 1-3 binding hard enough to keep displacement gradual. The risk case is that efficiency improvements (quantization, architectural innovation, next-gen HBM) relax constraints 2-3 faster than expected while power buildout (constraint 1) continues at pace — creating a window where capability outpaces absorption.
Bottom Line
Citadel is right that today’s data shows no crisis. Citrini is right that the structural concentration of consumer spending makes white-collar displacement uniquely dangerous if it accelerates. The memory wall is real but being actively eroded by multiple compounding workarounds. The binding constraint hierarchy suggests power delivery, not memory, is the ultimate governor — but memory economics (not memory physics) may determine who can afford to deploy at scale.
The answer to “is DRAM the real blocking point?” is: it’s a blocking point, it’s being worked around faster than raw hardware specs suggest, and the scarier version of the argument isn’t about capacity or bandwidth at all — it’s about supply concentration and pricing power determining the cost floor of intelligence.
Watch HBM pricing and SK Hynix’s capacity announcements more closely than FLOPS benchmarks. The economics of memory, not the physics of compute, increasingly set the terms.