I came across this article from a week ago regarding AI monetization models and their sustainability.
…
Gartner’s Sommer studies long-term economic market trends related to generative AI, including calculating just how much money is at stake. Between 2024 and 2029, he said, Gartner estimates that capital investment in AI data centers will reach about $6.3 trillion — a “massive amount of money.”
To avoid a write-down of these assets, major AI model providers would ideally generate a return on invested capital (ROIC) of about 25 percent, Sommer said. (That’s about what Amazon, Microsoft, and Google tend to earn on their overall capital investments.) On the other hand, if the returns fall below 12 percent, institutional capital loses interest — there’s better money elsewhere, Sommer said. Below 7 percent, you’re in write-down territory, which is “an unmitigated disaster for all of the investors in this technology,” Sommer said.
To reach that bare minimum of 7 percent, Gartner forecasts that large AI companies would need to earn cumulatively close to $7 trillion in AI-driven revenue through 2029, which is close to $2 trillion per year by the end of the period. In order to achieve “historic returns,” the providers would need to earn nearly $8.2 trillion in the same period.
OpenAI has already made $600 billion in spending commitments through 2030, the company said in February, which Sommer says is already a “massive step down” from the $1.4 trillion it had planned before. Based on OpenAI’s revenue forecasts and potential compound annual growth, Sommer said that even in the best-case scenario, he predicts that the lab would only hit a fraction of the overall spend required to hit that 7 percent ROIC.
…
To hit investors’ revenue expectations, providers would need to process a “mind-bending” number of tokens, Sommer said.
By most measures, companies’ numbers are already pretty big. Google announced it was processing 1.3 quadrillion tokens in October, for instance. If you add all the providers’ estimates up, Sommer said, you get 100 to 200 quadrillion tokens a year. But to achieve the $2 trillion in annual spend Gartner calculated, providers would need to be generating, by conservative estimates, a cumulative 10 sextillion tokens per year. (To make that slightly less abstract, a quadrillion has 15 zeros, and a sextillion has 21.) Even assuming a very generous profit margin of 10 percent per token, that would mean that token consumption between now and 2030 would need to grow by 50,000–100,000x.
…
There is plenty of other interesting content in the article, but the part I bolded particularly caught my eye. Unless the Gartner analyst’s calculations are off by several orders of magnitude, achieving a 50,000x – 100,000x increase in token consumption by 2030 seems all but impossible.
As for how much additional computing capacity is expected for AI inference, JLL’s 2026 Global Data Center Outlook gives one estimate of growth between 2025 and 2030:
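As a quick sanity check, the multiplier in the quote can be reproduced from the token counts it mentions. This is only a sketch of the arithmetic: the variable names are mine, and the figures are the rounded ones from the article (100 to 200 quadrillion tokens a year today, 10 sextillion required).

```python
# Reproduce the article's growth multiplier from its own rounded figures.
QUADRILLION = 10**15  # 15 zeros
SEXTILLION = 10**21   # 21 zeros

current_low = 100 * QUADRILLION   # low estimate of today's yearly tokens
current_high = 200 * QUADRILLION  # high estimate of today's yearly tokens
required = 10 * SEXTILLION        # yearly tokens Gartner says are needed

growth_min = required // current_high  # smallest multiplier needed
growth_max = required // current_low   # largest multiplier needed

print(f"{growth_min:,}x to {growth_max:,}x")
```

So the 50,000x to 100,000x range is internally consistent with the token counts in the quote; the open question is whether those inputs are realistic.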
The data center sector is projected to increase by 97 GW between 2025 and 2030, effectively doubling in size over a five-year period. By 2030, global data center capacity could reach 200 GW. This rapid growth will be driven largely by hyperscale cloud expansion and AI demand.
In the report, the share of AI inference is estimated as follows:
From this graph and the more detailed figures presented above, one can quickly calculate that, measured in watts, the inference capacity available in data centers in 2025 was approx. 9% * 103 GW = 9.3 GW (taking the 2025 base as 200 GW - 97 GW = 103 GW), and in 2030 it is expected to be approx. 37% * 200 GW = 74 GW. Thus, the growth factor is 74 GW / 9.3 GW = approx. 8. We could perhaps optimistically round this up to a 10x figure.
This, of course, only measures growth through power consumption. To work out how many tokens 2030 hardware can produce, we also have to estimate the growth in compute efficiency (flops/W) and how the compute required by models (flops/token) will develop.
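The power-based growth factor can be sketched in a few lines. The 9% and 37% inference shares and the 97 GW and 200 GW figures are the ones used in this post; the function and variable names are mine. Because the quoted passage gives 97 GW as the increase, I show two candidate 2025 bases: 97 GW taken directly, and the 200 - 97 = 103 GW implied by the quote.

```python
def inference_gw(total_gw: float, inference_share: float) -> float:
    """Inference capacity in GW, given total capacity and inference share."""
    return total_gw * inference_share

TOTAL_2030_GW = 200.0  # projected global capacity in 2030
ADDED_GW = 97.0        # projected increase between 2025 and 2030
SHARE_2025 = 0.09      # inference share in 2025
SHARE_2030 = 0.37      # inference share in 2030

inference_2030 = inference_gw(TOTAL_2030_GW, SHARE_2030)

# Two assumptions for the 2025 base capacity.
base_direct = ADDED_GW                   # 97 GW treated as the 2025 base
base_implied = TOTAL_2030_GW - ADDED_GW  # 103 GW implied by the quote

factor_direct = inference_2030 / inference_gw(base_direct, SHARE_2025)
factor_implied = inference_2030 / inference_gw(base_implied, SHARE_2025)

print(f"2030 inference capacity: {inference_2030:.0f} GW")
print(f"growth factor (97 GW base):  {factor_direct:.1f}x")
print(f"growth factor (103 GW base): {factor_implied:.1f}x")
```

Either way the factor lands between 8x and 8.5x, so rounding up to 10x remains the optimistic choice.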
To support this estimation, NVIDIA provides the following rough figures for the last couple of GPU generations:
NVIDIA Blackwell is the correct choice when electricity dominates TCO because Blackwell Ultra (GB300 NVL72) delivers up to 50x higher throughput per megawatt versus Hopper resulting in 35x lower cost per million tokens, while Blackwell (GB200 NVL72) delivers 10x throughput per megawatt and 15x lower cost per million tokens versus the prior generation, together producing the highest revenue per kilowatt-hour of any independently benchmarked inference platform.
One significant component in the achieved efficiency improvements, alongside others, has been the shift to lower-precision quantization (fp4), which allows for computation on simpler hardware. Regarding quantization, we’ve likely reached the end of the road for the most part, but perhaps other similar architectural improvements can be found and implemented over the coming years. Let’s assume, then, that the compute capacity installed in 2030 will be approx. 50x more energy-efficient (measured in flops/W for AI computation) than what is currently available.
The amount of compute required by models (flops/token) seems to have grown sharply in recent years as model sizes have increased. However, if we optimistically assume that despite the expected growth in model size, the required amount of compute per token remains the same even in 2030 (as model optimization might allow this despite size increases), we can set the multiplier for this to 1.
So, with the 2030 compute capacity, data centers could process approximately 10 * 50 * 1 = 500 times as many tokens as they can process for current AI models today.
At first glance, a 500-fold increase sounds like a lot, but then again, the Gartner analyst assumed that token volume would have to grow 50,000x – 100,000x for the cost calculations behind data centers to be on a sustainable footing. Compared with this quickly calculated 500x capacity, there is still a shortfall equivalent to an additional 100x–200x multiplier; in other words, we are off by a couple of orders of magnitude. To get the numbers to line up, we would need significantly more data center capacity than predicted, significantly more energy-efficient hardware, or much more optimized models, and most likely a combination of all three. Alternatively, the token multipliers behind those cost calculations would need to be brought down significantly.
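The whole estimate fits in a few lines, which makes it easy to plug in other assumptions. This is a rough sketch using only the rounded figures from this post; the variable names are mine, and I divide by the flops/token factor because a model that needs more compute per token would reduce the number of tokens produced.

```python
import math

# Rounded assumptions from the text above.
power_growth = 10.0            # inference power capacity, 2025 -> 2030
efficiency_growth = 50.0       # flops/W improvement of 2030 hardware
compute_per_token_growth = 1.0 # flops/token change (1 = unchanged)

# How many times more tokens the 2030 capacity could process.
capacity_multiplier = power_growth * efficiency_growth / compute_per_token_growth

# Token growth range implied by the Gartner figures in the article.
required_low, required_high = 50_000, 100_000

gap_low = required_low / capacity_multiplier
gap_high = required_high / capacity_multiplier

print(f"capacity multiplier: {capacity_multiplier:.0f}x")
print(f"remaining gap: {gap_low:.0f}x to {gap_high:.0f}x")
print(f"orders of magnitude: {math.log10(gap_low):.1f} to {math.log10(gap_high):.1f}")
```

Changing any single input shows how far it would have to move on its own: closing the gap through efficiency alone, for example, would require 5,000x to 10,000x instead of 50x.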
Well, it’s possible that while drinking my morning coffee, I missed something essential, as I didn’t even manage to get into the right ballpark.