Pareto Frontiers and Lock-In: When Mathematical Sub-Optimisation Is the Right Strategy
What the inference market and Nvidia's no-surge-pricing tell us about strategic decisions in multi-dimensional spaces.
- Part 0 · Series overview
- Part 1 · Nvidia, or the Repricing of a Watt
- Part 2 · The Moat That Cannot Be Coded
- Part 3 · Pareto Frontiers and Lock-In (you are here)
- Part 4 · The Return of the Real World
- Part 5 · The Job and the Task
- Part 6 · A Note on China
- Part 7 · Drawing the AI Stack
A recurring puzzle in the analysis of successful companies is that some of their decisions look, by the standard tools of microeconomics, irrational. Apple refuses to license iOS, leaving billions in licensing revenue on the table. Costco caps its margins at 14% on most categories, refusing the price discrimination its market position would allow1. Berkshire Hathaway holds cash for years rather than deploying it into available targets. And Nvidia, currently sitting on a generational shortage of GPUs, refuses to do surge pricing.
Each of these decisions, examined in isolation, looks like a transfer of wealth from the company to its counterparties. Apple gives up licensing fees. Costco gives up margin. Berkshire gives up returns. Nvidia gives up the rent that scarcity would normally allow it to extract. By the static maximisation logic that underpins standard microeconomics, all of these are mistakes.
And yet, when these companies are studied across decades, the same decisions reappear as the source of their durable advantage. They are not mistakes. They are the visible signs of a different optimisation problem being solved, one in which the function being maximised has more dimensions, a longer time horizon, and a different game structure than the local transaction suggests.
This entry tries to make that distinction explicit, using two observations from the recent Jensen Huang interview as illustrations: (1) the segmentation of the inference market into Pareto-frontier endpoints2, and (2) Nvidia’s refusal to do surge pricing despite running a structural shortage. Examined naively, both look like mathematical sub-optimisations. Examined with the right tools, both are the rational answer to a richer problem. Together they make a broader point: the difficulty of strategy is almost never in the optimisation. It is in correctly specifying the function being optimised3. This connects to a separate thread we have been developing on Daniel Andler’s distinction between Problem and Situation: a Problem is a question already posed and ready to be optimised; a Situation is the upstream act of formulating which question to pose at all.
In this part
- What a Pareto frontier is, and why it shows up everywhere
- When local optimisation fails, and why no-surge-pricing is the perfect example
- The general lesson on strategy
- Conceptual Toolbox
What a Pareto frontier is, and why it shows up everywhere
The starting point is one of the most useful tools for thinking about decisions that involve more than one dimension at once. It comes from economics, has spread to engineering, machine learning, and operations research, and has come back into strategic thinking as the canonical way to handle trade-offs.
The concept comes from Vilfredo Pareto, an Italian economist writing at the end of the 19th century. In his Manuale di economia politica (1906), Pareto defined what he called an optimum (now called a Pareto optimum): an allocation such that no further improvement is possible in any dimension without degrading at least one other dimension. The set of all such optimal points is the Pareto frontier4. Every point on the frontier is, in some sense, “the best you can do” given a particular trade-off; what is unambiguously bad is any point below the frontier, because you could move towards the frontier without giving anything up.
The idea is simple, and its power is that it works for any set of dimensions, in any domain.
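To make the dominance test concrete, here is a minimal sketch in Python (an illustration with invented points, not the article’s underlying chart): it keeps only the candidates that no other candidate beats on every dimension at once.

```python
def pareto_frontier(points):
    """Keep only the points that no other point dominates.

    A point p dominates q when p is at least as good on every dimension
    and strictly better on at least one. "Better" here means larger, so
    negate any dimension you want to minimise before calling this.
    """
    def dominates(p, q):
        return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Invented 2D candidates: (benefit, -cost), with cost negated so larger is better on both axes.
candidates = [(0.04, -0.05), (0.06, -0.10), (0.08, -0.20), (0.05, -0.18), (0.07, -0.25)]
print(pareto_frontier(candidates))
# -> [(0.04, -0.05), (0.06, -0.1), (0.08, -0.2)]: the three non-dominated trade-offs
```

The same filter works unchanged in any number of dimensions, which is the property the rest of this entry leans on.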
One of the most famous applications is in finance. Harry Markowitz, in his 1952 paper Portfolio Selection, applied the concept to the choice of an investment portfolio. The two dimensions are expected return and risk (measured as the variance of returns). For a given universe of assets, there exists a curve in the return-risk plane that traces all possible efficient portfolios: those for which you cannot increase return without increasing risk, or decrease risk without decreasing return. This is the efficient frontier. It is at the foundation of all modern portfolio theory. Markowitz received the Nobel Prize in 1990 for the work. William Sharpe (Nobel 1990) extended the framework with the Capital Asset Pricing Model in 1964. The vocabulary became canonical in finance, then in economics, then in everything else. Practical consequence everyone has lived with: this is why every financial advisor begins by asking how much risk you are willing to take before recommending anything. The advisor is locating you on the frontier; the recommendation follows.
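In the standard mean-variance restatement (a textbook formulation, not a quotation from the 1952 paper), each point on the efficient frontier solves, for a chosen target return $r$:

$$
\min_{w}\; w^{\top} \Sigma\, w
\qquad \text{s.t.} \qquad
w^{\top} \mu = r, \quad w^{\top} \mathbf{1} = 1,
$$

where $w$ holds the portfolio weights, $\mu$ the expected returns, and $\Sigma$ the covariance matrix of returns. Sweeping $r$ across its feasible range traces the frontier; choosing which $r$ to target is the investor’s decision, not the model’s.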
The lesson Markowitz forced into financial reasoning deserves attention. Once you accept the frontier, no portfolio is “better than another” in the absolute. It depends on your preference for risk. A retired pensioner sits on a different point of the frontier than a 30-year-old saver. The two are both rationally optimised, with different trade-off weights. Calling one of them “wrong” reveals a confusion between the optimum (which depends on the optimiser) and the frontier (which does not).
Now bring this back to AI inference. The recent Huang interview reveals an interesting fact: Groq5 has been folded into the CUDA6 ecosystem. The reason is that the inference market, until recently priced and served as a single undifferentiated market, has spread itself along a continuum with two extreme regimes at its endpoints. The reality is a gradient, not a binary, and the gradient is what creates the Pareto frontier.
On one end, premium agents that emit tokens worth real wages: a token generated for an autonomous coding agent that operates over hours, an autonomous lawyer assistant that drafts contracts, a research agent that synthesises terabytes of documents. These tokens are worth dollars individually. They need low latency and tolerate high cost per token. On the other end, bulk inference workloads: chatbots at scale, classification tasks, simple generations across millions of users. Those tokens are worth fractions of a cent. They need high throughput and tolerate higher latency. Between the endpoints, the entire continuum of mid-value workloads sits at intermediate points on the curve.
This is exactly a Pareto frontier. The two axes are throughput and latency. On the frontier, you cannot simultaneously maximise both: lower latency requires sacrificing throughput, and vice versa. What changed in the past year is not the shape of the frontier; it is that both ends of the curve became commercially viable at the same time, because the value of a token now spans four orders of magnitude depending on what work it is doing.
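A stylised way to see why no single serving configuration covers both endpoints (the constants below are invented for illustration, not measured on any real chip): in this toy batching model, every increase in batch size buys aggregate throughput and pays for it in per-request latency, so each batch size is a different point on the frontier.

```python
# Toy decode-step model: the constants are invented for illustration only.
# One step processes the whole batch; step time grows with batch size.
FIXED_OVERHEAD_S = 0.005    # per-step cost paid regardless of batch size
PER_SEQUENCE_S = 0.0004     # marginal cost of one extra sequence in the batch

def serving_point(batch_size):
    step_time = FIXED_OVERHEAD_S + PER_SEQUENCE_S * batch_size
    throughput_tok_s = batch_size / step_time   # tokens per second across all requests
    latency_ms_per_tok = step_time * 1000       # what each individual request experiences
    return throughput_tok_s, latency_ms_per_tok

for batch in (1, 8, 64, 512):
    tput, lat = serving_point(batch)
    print(f"batch={batch:>4}  throughput={tput:7.0f} tok/s  latency={lat:6.1f} ms/token")
# Small batches sit at the low-latency, low-throughput end (premium agents);
# large batches sit at the high-throughput, high-latency end (bulk inference).
```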
By integrating Groq, Nvidia commits to investing across the entire frontier rather than picking a single point on it, claiming a wider territory now that the underlying market has spread out.
The actual decision space is higher-dimensional than the two-axis diagram. Real choices on inference silicon involve at least: latency at the median, latency at the 99th percentile (which is what determines user experience for agentic workflows), throughput per watt (the new key metric, as we discussed in the previous paper of this series), throughput per dollar of capex, energy consumption, memory bandwidth, reliability under sustained load, and the ability to mix different model sizes on the same hardware. The 2D Pareto representation is a useful simplification. The real problem is in n dimensions, and serious strategic optimisation across these dimensions is exactly what dynamic programming, multi-criteria decision analysis, and modern reinforcement learning try to formalise.
This is also why Michael Porter, in his foundational 1996 essay What Is Strategy?, defined strategy as “the deliberate choice of a different set of activities to deliver a unique mix of value”. No concept recurs more often in that essay than the trade-off. For Porter, the essence of strategy is not optimisation; it is the choice of which trade-offs to accept and which to refuse. A company without trade-offs has no strategy. It has operations. Choosing where to sit on a Pareto frontier is exactly what Porter was asking strategists to do. Richard Rumelt, in Good Strategy / Bad Strategy (2011), drives the same point harder: he argues that what most organisations call strategy is actually a list of objectives, ambitions, or financial targets, none of which qualify. Asserting goals is not a strategy. Maximising returns is not a strategy. A strategy is a coherent diagnosis of the situation, a guiding policy for handling it, and a coordinated set of actions that follow. The Pareto framing operationalises Rumelt’s point: the strategy is the position chosen on the frontier, not the desire to be everywhere on it at once.
When local optimisation fails, and why no-surge-pricing is the perfect example
If multi-dimensional reasoning is one source of apparent sub-optimisation, the second is even more interesting. It comes from situations where the time dimension itself changes the structure of the optimal decision.
Standard microeconomics, in its purest form, assumes static utility maximisation. A seller facing a buyer with high willingness to pay should charge a high price. A seller in a shortage should charge whatever the market will bear. Marshall, Hicks, and the entire mainstream tradition predict surge pricing as the rational response to scarcity. The buyer captures the consumer surplus only when supply is abundant; the seller captures it when supply is scarce. This is, as a matter of pure local optimisation, correct.
Yet Huang explicitly refuses to surge price. He runs a multi-year shortage, with hyperscalers and neoclouds queuing for GPUs, and he allocates first-in-first-out. His own words, from the Dwarkesh interview:
“I prefer to be dependable, to be the foundation of the industry. You don’t need to second-guess. If I quoted you a price, we quoted you a price. That’s it. If demand goes through the roof, so be it.”
By the static optimisation framework, Nvidia is leaving billions on the table. So why?
The answer requires expanding the framework. Two structural conditions in Nvidia’s environment change the calculation entirely.
First, lock-in. A hyperscaler making a $50B capex commitment over three years, betting on Nvidia silicon, is not in a relationship that can be renegotiated cycle by cycle. To name what is happening here precisely, we need a concept from Oliver Williamson (Nobel 2009): asset specificity7.
An asset is specific when its value is primarily realised inside a particular relationship, and would drop substantially outside it. Williamson distinguished several kinds, and the AI infrastructure stack hits most of them at once: site specificity (TSMC’s most advanced fabs are physically located near Nvidia’s reference designs), physical asset specificity (a fab tooled for one customer’s process node is not easily redeployed), human asset specificity (engineers carrying twenty years of joint roadmap memory), dedicated assets (capacity reserved years in advance for a single counterparty), temporal specificity (HBM allocations valued only if delivered in synchrony with GPU shipments). Each of these makes the asset valuable here, with this counterparty, on this schedule, and progressively less valuable anywhere else.
The relationship is not the asset. The relationship is what keeps the assets in their high-value configuration. TSMC’s reserved fab capacity for Nvidia is itself the specific asset; the multi-decade trust between the two firms is what prevents either side from defecting and collapsing that asset’s value. Same on the other side: Nvidia’s Hopper-Blackwell-Rubin supply commitments are the specific asset; the relationship is the structure that keeps them economic.
You know you are in this configuration when three signs co-occur. First, exit costs are highly asymmetric: walking away destroys more value for the leaver than for the counterparty, but both lose meaningful surplus. Second, contracts cannot fully specify the relationship, because too many contingencies are unknowable upfront, so trust does the work that paper would do in a spot transaction. Third, the relationship itself starts to be valued and protected as an asset, with dedicated relationship managers, joint roadmaps, multi-year reviews. The thirty-year Nvidia / TSMC relationship has all three signatures.
The implications follow directly. In a relationship with specific assets, opportunism is not just rude; it is value-destructive on both sides simultaneously. Surge pricing inside such a relationship would shrink the relational surplus that finances both companies’ growth: the hyperscaler reduces its commitment volume next cycle, TSMC re-balances toward other customers, the joint roadmap stops compounding. Williamson formalised this as the central insight of transaction cost economics, for which he received the Nobel Prize. This connects directly to the institutional knowledge thesis we developed in the previous paper: trust accumulated over decades inside relationship-specific investment is itself a form of institutional capital that cannot be reproduced by any third party, however well capitalised.
Second, repeated interaction. The hyperscaler does not interact with Nvidia once. It interacts every cycle, every product generation, with a memory of every prior interaction. Robert Axelrod, in The Evolution of Cooperation (1984), ran the famous tournament that demonstrated empirically how repeated games change the equilibrium8. The strategy that wins, against all comers, is not maximum extraction. It is tit-for-tat: cooperate by default, retaliate against defection, forgive once retaliation has corrected behaviour. The result that generalises beyond the tournament is that cooperation emerges spontaneously in repeated games where players have memory and where the future matters relative to the present. The optimisation horizon drives the cooperative equilibrium.
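A minimal sketch of the tournament logic (stylised payoffs and a tiny strategy pool, not Axelrod’s original entries): tit-for-tat never out-scores its opponent within a single match, yet it tops the aggregate table because it cooperates with every strategy willing to cooperate back.

```python
# Iterated prisoner's dilemma round-robin in the spirit of Axelrod's tournament.
# Payoffs and the strategy pool are illustrative choices, not his originals.
T, R, P, S = 5, 3, 1, 0          # temptation, reward, punishment, sucker's payoff
ROUNDS = 200

def tit_for_tat(my_hist, opp_hist):
    return opp_hist[-1] if opp_hist else "C"

def grim_trigger(my_hist, opp_hist):
    return "D" if "D" in opp_hist else "C"

def always_defect(my_hist, opp_hist):
    return "D"

def always_cooperate(my_hist, opp_hist):
    return "C"

def payoff(a, b):
    if a == "C" and b == "C":
        return R, R
    if a == "D" and b == "D":
        return P, P
    return (T, S) if a == "D" else (S, T)

def match(s1, s2):
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(ROUNDS):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = payoff(m1, m2)
        h1.append(m1); h2.append(m2)
        score1 += p1; score2 += p2
    return score1, score2

strategies = {"tit_for_tat": tit_for_tat, "grim_trigger": grim_trigger,
              "always_cooperate": always_cooperate, "always_defect": always_defect}
totals = {name: 0 for name in strategies}
for n1, s1 in strategies.items():
    for n2, s2 in strategies.items():
        if n1 < n2:                      # each distinct pairing once
            a, b = match(s1, s2)
            totals[n1] += a; totals[n2] += b
        elif n1 == n2:                   # self-play, counted once
            a, _ = match(s1, s2)
            totals[n1] += a

print(totals)
# With these payoffs: tit_for_tat and grim_trigger finish on top (1999 each),
# always_cooperate gets exploited (1800), always_defect trails (1608).
```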
Apply both insights to Nvidia, and the no-surge-pricing decision is no longer a sub-optimisation. It is the answer to a different problem. The function Nvidia is optimising is not the margin on this cycle. It is the present value of cumulative payoff over many cycles, in an environment where assets are specific and interaction is repeated. Once the function is correctly specified, the static answer (surge price) becomes wrong, and the apparent paradox dissolves.
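The textbook way to write the horizon effect down (a stylisation of the repeated prisoner’s dilemma, not Nvidia’s actual arithmetic): with a per-cycle cooperation payoff $R$, a one-shot extraction payoff $T > R$, a post-defection payoff $P < R$, and a discount factor $\delta$, forgoing extraction is the optimum whenever

$$
\frac{R}{1-\delta} \;\ge\; T + \frac{\delta P}{1-\delta}
\quad\Longleftrightarrow\quad
\delta \;\ge\; \frac{T - R}{T - P}.
$$

Lock-in lowers $P$ (the value left once the relationship is damaged) and repetition with memory pushes $\delta$ towards one; both move the inequality towards cooperation, which is exactly the environment described above.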
We are back at a Pareto frontier, but in a different dimensional space. Nvidia’s strategic frontier has at least three dimensions: immediate margin, depth of supply chain trust, and predictability of revenue for public-market analysts. The no-surge-pricing decision sacrifices on the first axis to gain on the other two. Mathematically optimal, in the dimensional space that matters.
The same logic explains a second decision Huang mentions. The thirty-year relationship with TSMC has been conducted, in his telling, without a formal legal contract. By contractarian logic this is incomprehensible: how can two of the largest companies in the world coordinate $200B+ in supply chain commitments without binding legal documents? The answer is the same. A formal contract is the strategy of one-shot transactions. Repeated cooperation builds an asset (mutual predictability) that no contract can replace, and that any contract attempt would diminish by signalling that the relationship needs external enforcement. The absence of a contract is itself the signal that the trust is total. Williamson would recognise it as the textbook outcome of long-run relationship-specific investment.
The general lesson on strategy
Two threads run through everything above, and they form the explicit lesson of the paper.
First, most apparent strategic irrationalities come from misspecified objective functions. When a successful company makes a decision that looks irrational, the most productive response is not to assume the company is wrong. It is to ask: what dimensions did I forget to include? what time horizon am I implicitly assuming? what game structure am I attributing? Almost always, the apparently irrational decision is the rational answer to a richer problem9. Costco caps margins because customer trust compounds over decades and surge-priced trust dissipates instantly. Apple refuses to license because product quality requires control of the whole stack10. Berkshire holds cash because optionality is itself an asset whose value rises in a world of correlated bubbles11. Each looks like a sub-optimisation only when you have specified too few dimensions. Each is, in the correct space, an optimum.
Second, the great trap of strategy is the belief that local efficiency aggregates into global efficiency. Markowitz demonstrated this for portfolios: a portfolio of individually excellent stocks can be inferior to a less ambitious but better diversified one, because correlation matters. Williamson demonstrated it for transactions: each transaction maximised locally destroys the relational asset that allows future transactions to compound. And Huang, without formalising it explicitly, demonstrates it for pricing: each cycle maximised locally would destroy the trust ecosystem that allows Nvidia to capture the structural surplus. In none of these cases is the local optimum the global optimum. The reason is structural: dynamics, lock-in, and asset specificity introduce correlations between successive decisions that make naive aggregation wrong.
These two lessons are not new. They are, in some form, the entire content of mature strategy thinking from Porter onwards, and they sit on the deeper foundation laid by game theory: John von Neumann and Oskar Morgenstern in Theory of Games and Economic Behavior (1944), then John Nash with the Nash equilibrium (Nobel 1994), and Thomas Schelling on credible commitment and tacit coordination (Nobel 2005). Nash’s insight intersects directly with the Pareto frontier: a Nash equilibrium is a state where no player can improve unilaterally, but it is not necessarily Pareto-optimal. The two concepts together explain why competitive players acting rationally can collectively land below the frontier (the prisoner’s dilemma is the canonical example), and why getting onto the frontier requires the kind of repeated-game cooperation Axelrod identified.
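A minimal check of that claim on the canonical example (standard textbook payoffs, nothing specific to the companies discussed here): in the one-shot prisoner’s dilemma below, mutual defection is the unique Nash equilibrium and also the only outcome sitting below the Pareto frontier.

```python
from itertools import product

# One-shot prisoner's dilemma, standard textbook payoffs (row player listed first).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")

def is_nash(profile):
    """Nash: neither player can gain by deviating unilaterally."""
    a, b = profile
    row_ok = all(PAYOFF[(a, b)][0] >= PAYOFF[(alt, b)][0] for alt in MOVES)
    col_ok = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, alt)][1] for alt in MOVES)
    return row_ok and col_ok

def pareto_dominated(profile):
    """Dominated: some other profile is at least as good for both, better for one."""
    p = PAYOFF[profile]
    return any(
        all(q[i] >= p[i] for i in (0, 1)) and any(q[i] > p[i] for i in (0, 1))
        for other, q in PAYOFF.items() if other != profile
    )

for profile in product(MOVES, MOVES):
    print(profile, PAYOFF[profile],
          "Nash" if is_nash(profile) else "not Nash",
          "Pareto-dominated" if pareto_dominated(profile) else "on the frontier")
# Only ('D', 'D') is a Nash equilibrium, and only ('D', 'D') is Pareto-dominated:
# rational unilateral play lands the players below the frontier.
```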
Yet these foundations are continuously forgotten, because static optimisation is taught everywhere and dynamic reasoning almost nowhere. The Pareto frontier in n dimensions is a mental discipline more than a technical tool. Thinking correctly about decisions requires correctly identifying the dimensions, the horizon, the game structure, and the lock-in. Most strategic errors come from having truncated one of these four things.
That is what Huang has internalised, probably from running Nvidia long enough to have made the mistakes that come from the simpler frame. It is what makes apparent sub-optimisations like no-surge-pricing into the foundations of long-term advantage. It is the reason the same handful of companies keep showing up, decade after decade, as the durable winners: not because they are smarter at the optimisation, but because they have correctly specified the function being optimised.
The next paper of this series turns to a different angle on the same broader transformation: how the boundaries between operators and investors are eroding, with actors that used to be financial becoming industrial, and actors that used to be industrial becoming financial. The Williamson lens we just used returns there with a vengeance.
Conceptual Toolbox
| Concept | Definition | Why it matters |
|---|---|---|
| Pareto frontier | The set of points in a multi-dimensional space such that no point dominates another on all dimensions simultaneously. | Reframes any choice involving more than one dimension. There is no single best option, only a curve of optimal trade-offs. |
| Efficient frontier (Markowitz) | The Pareto frontier applied to investment portfolios, with risk and return as axes. | Demonstrates that “better in absolute” dissolves the moment you accept multiple dimensions. The optimum depends on the optimiser. |
| Asset specificity (Williamson) | Investments whose value is primarily realised within a specific relationship rather than on the open market. | In specific relationships, opportunism is value-destructive on both sides. Explains why Nvidia refuses to surge price its hyperscalers. |
| Repeated games (Axelrod) | When the same parties interact across many cycles with memory, cooperation emerges spontaneously, and tit-for-tat beats maximum extraction. | The optimisation horizon, not the moral position, drives the cooperative equilibrium. Same logic applies to Nvidia/TSMC and Costco/customers. |
| Trade-off as strategy (Porter) | Strategy is the deliberate choice of which trade-offs to accept and which to refuse, on a Pareto frontier. | A company without trade-offs has no strategy, it has operations. Choosing where to sit on the frontier is the strategic act. |
“Strategy is the deliberate choice of a different set of activities to deliver a unique mix of value.”
Michael Porter, What Is Strategy?, 1996.
Footnotes
1. [Source] Costco’s published policy is to cap markups at 14% on branded items and 15% on its private label Kirkland, against an industry average of 25 to 50%. The discipline is enforced top-down by management as a deliberate brand-and-trust accumulation mechanism, not as a market-imposed constraint. The result is a membership renewal rate above 90% in the United States, which is what actually carries the economic model.
2. [Definition] By Pareto-frontier endpoints we mean the two extreme regimes of the inference market: ultra-low-latency, high-cost-per-token serving for premium agents, and ultra-high-throughput, latency-tolerant serving for bulk inference. Each endpoint sits at one extreme of the trade-off curve. We develop the geometry of the frontier, with the underlying chart, in the section What a Pareto frontier is below.
3. [Aside] The framing borrows from the statistician George Box’s formulation that “all models are wrong, but some are useful”. Strategy, similarly, never operates on the right model of the world; it operates on the most useful approximation given what is being decided. The ambition is not to find the true objective function but to specify one rich enough that the local answer aligns with the long-run one.
4. [Definition] A Pareto frontier (or Pareto front) is the set of points in a multi-dimensional space such that no point dominates another on all dimensions simultaneously. Originating in welfare economics, the concept now anchors multi-objective optimisation in engineering, machine learning, and decision theory.
5. [Context] Groq (founded 2016, Mountain View) designs the LPU (Language Processing Unit), a custom inference chip built around a single sequential execution model rather than the parallel architectures used by GPUs. The trade-off is extreme: very low latency (sub-millisecond first-token times) at the cost of higher per-token serving cost. Groq is the canonical hardware bet on the premium agents end of the inference frontier.
6. [Context] CUDA (Compute Unified Device Architecture) is Nvidia’s proprietary parallel computing platform and programming model, launched in 2006. Twenty years of cumulative software, libraries, and developer expertise have made CUDA the default substrate of AI training and inference. “Folding into the CUDA ecosystem” means a competing chip becomes accessible through the same toolchain, libraries, and developer interface as Nvidia GPUs, rather than requiring a separate stack.
7. [Source] Oliver Williamson developed the concept across Markets and Hierarchies (1975) and the foundational article Transaction-Cost Economics: The Governance of Contractual Relations (Journal of Law and Economics, 1979). The Nobel Committee awarded him the 2009 Prize for “his analysis of economic governance, especially the boundaries of the firm”. The asset-specificity typology used here (site, physical, human, dedicated, brand-name, temporal) was finalised in his 1985 book The Economic Institutions of Capitalism.
8. [Source] Axelrod’s tournament invited researchers to submit strategies for the Iterated Prisoner’s Dilemma, then ran them against each other in round-robin. Anatol Rapoport’s tit-for-tat (cooperate first, then mirror the opponent’s last move) won both the original 1979 tournament and a larger 1980 rerun against entries that had been written specifically to defeat it. The robustness of cooperation under repeated interaction has been the foundational empirical result of evolutionary game theory ever since.
9. [Aside] This methodological move is the strategic equivalent of philosophy’s Principle of Charity: when interpreting an opponent’s argument, assume the strongest plausible version before critiquing it. Applied to strategy, the principle becomes: when a successful organisation makes a choice that looks irrational, assume the strongest plausible objective function before concluding the choice is wrong. The result is the same kind of intellectual hygiene Daniel Dennett calls steelmanning.
10. [Context] Apple declined to license iOS to third parties for nearly two decades, despite repeated analyst pressure. The strategic logic is that operating system value depends on hardware-software co-design (battery management, sensor fusion, neural engine, security enclave), all of which require fine-grained control of the silicon. Licensing would have unlocked short-term licensing revenue at the cost of the long-term differentiation that has driven Apple’s services attach rate.
11. [Context] Berkshire Hathaway held over $300B in cash and short-term Treasuries by mid-2025, a record level. Buffett has consistently framed this as “optionality preservation” in correspondence to shareholders, not as inability to find deals. The position is large enough that Berkshire’s interest income alone generated more than the GDP of several mid-sized countries during the high-rate cycle of 2023 to 2025.
