
The AI Compute Currency Problem

While strides are being made in model efficiency, model complexity is outpacing efficiency gains. Recent comments by industry leaders suggest this will be the status quo for the time being. Newer models such as o3, o4-mini, and Gemini 2.5 Pro are getting smarter and handling increasingly complex tasks (Deep Research, agentic workflows, etc.). That means charging $20/mo, even with usage limits, does not cut it, because the limits would be too low for frontier models. Naturally, AI service providers are introducing higher-limit tiers for their latest and greatest models. OpenAI and Anthropic seem to be taking two different approaches to this problem. In this post, I want to highlight some of the challenges in explaining "AI usage" to users. I think this model places trust in the service provider in a way that is fairly unique to AI. Thinking further out, I believe advanced cryptographic techniques could help mitigate these issues.

From Flat Fees to Usage Tiers

Let us recap how we got here.

The ideal would have been charging users a flat monthly fee for unlimited queries (perhaps with soft limits, as with unlimited mobile data plans). However, inference costs have exploded with "thinking" models and Deep Research capabilities. There is a clear need to charge users more for access to these frontier models.

With the internet, users knew exactly what they paid for, e.g., $X for 100 GB of data at a bandwidth of up to 100 Mbps. Granted, users might not have known whether that 100 GB would be enough for their needs, but they did have a good idea of what their activities online would consume. Casual browsing of Facebook and the like varies in usage, but it's possible to get a reasonably accurate estimate. When it came to downloads, users knew exactly how much of their data cap a file would consume because its size was displayed beforehand. This estimate is also not controlled by the internet service provider (ISP), so users do not need to trust the ISP. Because of this clarity, there is an intentionality in the user's choice to spend this currency.
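
To make the contrast concrete, here is a trivial sketch of the kind of a priori estimate an internet user can do on their own, without trusting the ISP. The file size and data cap are made-up numbers for illustration:

```python
# Back-of-the-envelope estimate a user can do before downloading a file:
# the file size is shown up front and the arithmetic is entirely under the
# user's control, not the ISP's. Numbers are illustrative.
data_cap_gb = 100       # monthly data cap from the plan
file_size_gb = 4.7      # size displayed on the download page

fraction_of_cap = file_size_gb / data_cap_gb
print(f"This download uses {fraction_of_cap:.1%} of the monthly cap")
# -> This download uses 4.7% of the monthly cap
```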

Bandwidth can and does fluctuate, and ISPs technically only promise "up to" 100 Mbps. But a user could measure the speed they were getting at different points in time and hold the ISP accountable if it was frequently low. Besides, bandwidth issues usually affect all users in a given area, so the onus is not entirely on each individual user.

Below is a comparison of current AI pricing tiers[1][2]:

| Provider  | Plan        | Cost    | Usage Limit                | Notes                                 |
|-----------|-------------|---------|----------------------------|---------------------------------------|
| OpenAI    | ChatGPT Pro | $200/mo | Unlimited*                 | All reasoning models, GPT-4o, GPT-4.1 |
| Anthropic | Pro         | $20/mo  | 5× free-tier usage         | Reasoning models                      |
| Anthropic | Max (5×)    | $100/mo | 5× Pro usage               | Expanded access vs. Pro tier          |
| Anthropic | Max (20×)   | $200/mo | 20× Pro usage              | Maximum flexibility                   |
| GitHub    | Copilot Pro+| $39/mo  | 1,500 premium requests/mo  | Additional requests at $0.04 each     |

OpenAI's and GitHub's solution is to introduce tiers with higher rate limits, using the query as the unit of measurement and putting a concrete number on how many queries are offered in a given time period. Anthropic instead uses the more nebulous notion of "usage".
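
When the unit is a concrete query count, the arithmetic is something a user can do themselves. Here is a small sketch using the GitHub Copilot Pro+ numbers from the table above; the monthly request count is a made-up usage figure for illustration:

```python
# Effective cost per request on a query-denominated plan (Copilot Pro+ figures
# from the table above); "monthly_requests" is a hypothetical usage level.
plan_cost = 39.00            # $/month
included_requests = 1_500    # premium requests included per month
overage_rate = 0.04          # $ per additional request

monthly_requests = 2_000     # hypothetical usage
overage = max(0, monthly_requests - included_requests)
total = plan_cost + overage * overage_rate

print(f"Included cost per request: ${plan_cost / included_requests:.3f}")
print(f"Total for {monthly_requests} requests: ${total:.2f}")
# -> Included cost per request: $0.026
# -> Total for 2000 requests: $59.00
```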

The pricing dynamics here are analogous to what internet service providers do. As bandwidth got cheaper and infrastructure improved with things like fibre-optic cables, higher download limits and browsing speeds were offered to customers at no (or marginally higher) cost, increasing the value per dollar customers received. Normally the story would end here, and we could say AI model providers will follow a similar pattern. But I think there is an interesting catch, one that might not be so straightforward to tackle.

The Fluidity of Compute Units

With AI queries, compute costs vary wildly by prompt, making any "unit" hard to predict. Maybe some day we'll get there. Given this, AI companies are considering alternatives such as giving users a set of "compute units" every month. This seems like a good idea at first. But from the user's point of view, they still have no idea how many compute units a query will take; estimating that a priori is highly nontrivial. So we end up in a place where the company provides users with credits in a wallet that the company itself manages. When the user clicks submit on a query, they are placing trust in the company to charge them appropriately. Even if query cost estimates exist, presumably these would be proprietary numbers provided by the company.
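
To illustrate where the trust sits, here is a toy sketch of such a provider-managed credit wallet. All names (Wallet, price_query, submit) are hypothetical; the point is simply that the amount deducted is computed by opaque, provider-side logic the user cannot predict or audit:

```python
# Toy provider-managed credit wallet. The user only sees the balance go down;
# how much a query "costs" is decided entirely by the provider's opaque logic.
class Wallet:
    def __init__(self, credits: float):
        self.credits = credits

    def deduct(self, amount: float) -> None:
        if amount > self.credits:
            raise RuntimeError("Out of credits")
        self.credits -= amount


def price_query(prompt: str) -> float:
    # Placeholder for the provider's proprietary pricing: in reality it would
    # depend on the model chosen, tokens generated, tool calls, etc., none of
    # which the user can estimate up front.
    return 1.0 + 0.01 * len(prompt)


def submit(prompt: str, wallet: Wallet) -> str:
    cost = price_query(prompt)   # computed provider-side
    wallet.deduct(cost)          # the user trusts this number
    return f"(response to: {prompt!r})"


wallet = Wallet(credits=100.0)
submit("Plan a 3-day trip to Kyoto", wallet)
print(f"Credits left: {wallet.credits:.2f}")  # the user learns the cost only after the fact
```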

Thinking adversarially for a second, an AI company could notice that a user's usage is very high and quietly swap out the model for something cheaper to run. Given that none of the models are public, it would be very hard for the user to detect this (assuming the models are not wildly different). This might sound like a conspiracy theory, but it's meant as a thought exercise to figure out where the gaps are, so that we can build systems that require less trust.

Third-party audits do not close the trust gap, because an adversarial model swap as described above is still possible. Having users inspect model weights is not desirable either, as the weights are usually the company's IP. Fortunately, cryptography offers a fix: a Zero-Knowledge Proof (ZKP) can prevent an AI provider from pulling off this sort of model-swap attack.

In the interest of keeping things light, here's a bird's-eye view of what a zero-knowledge proof for this scenario could look like (a minimal code sketch follows the list):

  • Commitment: The AI provider publishes a “commitment” to its model weights (and perhaps some metadata). This commitment, like a cryptographic hash, reveals nothing about the model itself.
  • Query Proof: Each time the user submits a prompt, the provider runs the model and generates a proof that the response came from the committed weights.
  • Verification: The user checks the proof against the commitment, which outputs accept or reject.
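
Below is a minimal sketch of this commit/prove/verify flow. The commitment is a real salted hash, but the per-query "proof" is a placeholder stub; producing an actual zero-knowledge proof of inference requires a zkML proving system, and all function names here are hypothetical:

```python
import hashlib
import os

# Minimal sketch of the commit / prove / verify flow from the list above.
# The commitment is a real salted hash; the per-query "proof" is a stub,
# since generating an actual ZK proof of inference requires a zkML proving
# system and is far beyond a blog-sized example.

def commit_to_model(weights: bytes) -> tuple[str, bytes]:
    """Provider side: publish a commitment that binds to the weights
    without revealing them. The salt stays private with the provider."""
    salt = os.urandom(32)
    commitment = hashlib.sha256(salt + weights).hexdigest()
    return commitment, salt

def prove_inference(weights: bytes, salt: bytes, prompt: str, response: str) -> dict:
    """Provider side: attach a proof that `response` came from the committed
    weights on `prompt`. A real system would emit a succinct ZK proof here;
    weights and salt would feed the prover but are unused in this stub."""
    return {
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_hash": hashlib.sha256(response.encode()).hexdigest(),
        "zk_proof": b"<placeholder: succinct proof of correct inference>",
    }

def verify_inference(commitment: str, prompt: str, response: str, proof: dict) -> bool:
    """User side: accept or reject. With the stub proof we can only re-check
    the hashes; a real verifier would also check the ZK proof against the
    public commitment, which a swapped model could not satisfy."""
    ok_prompt = proof["prompt_hash"] == hashlib.sha256(prompt.encode()).hexdigest()
    ok_response = proof["response_hash"] == hashlib.sha256(response.encode()).hexdigest()
    return ok_prompt and ok_response  # plus a zk_verify(proof, commitment) check in a real system
```

The structural point is that the user only ever handles the public commitment and the per-query proof; the weights and the salt never leave the provider.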

If the provider cheats and swaps the model out for a different one, the guarantee is that they will not be able to generate a proof that verifies correctly. If you want to learn more about ZKPs, here's a fantastic video by one of the pioneers in the field.

Note: An eagle-eyed reader might point out that it is still possible for the provider to cheat (say they advertise o4 as a service, but commit to o3 from the get-go). This is indeed possible! Establishing that the committed model is in fact o4 cannot be done in a vacuum; we would have to rely on additional information, such as a particular training dataset, specific training code, or benchmark results.
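
One way to anchor the commitment to such external evidence is to fold it into the committed payload. Here is a hypothetical extension of the earlier sketch; the field names and benchmark numbers are purely illustrative:

```python
import hashlib
import json
import os

# Hypothetical extension of commit_to_model from the sketch above: bind the
# commitment to external evidence about *which* model this is, not just the
# raw weights. Field names and benchmark figures are illustrative.
def commit_to_model_with_metadata(weights: bytes, metadata: dict) -> tuple[str, bytes]:
    salt = os.urandom(32)
    payload = salt + weights + json.dumps(metadata, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest(), salt

metadata = {
    "training_data_sha256": "<hash of a published training dataset>",
    "training_code_commit": "<git commit of the training code>",
    "benchmarks": {"MMLU": 0.92, "GPQA": 0.71},
}
commitment, salt = commit_to_model_with_metadata(b"<model weights>", metadata)
```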

Although ZKPs have flourished in blockchain over the past decade, their application to LLM inference remains experimental, with many research challenges to overcome. Yet, with the AI inference market already a multibillion-dollar industry, developing transparent billing is imperative. Deployment of trust-minimizing tools such as ZKPs will lay the groundwork for transparent, auditable AI services as this market continues to scale.


  1. “Unlimited” subject to abuse guardrails. ↩︎
  2. As of 16th May, 2025. ↩︎