
Why your LLM bill is unpredictable (and how to fix it)

Most teams shipping AI features can tell you their total LLM bill to the dollar and almost nothing about where it came from. The invoice says 14,000 dollars. It does not say that one underused export feature is responsible for 40 percent of it, or that a single power user is quietly running up a four-figure tab. Without per-feature attribution, every cost conversation turns into guesswork, and every optimization is a shot in the dark.

The fix is not a smarter model or a cheaper provider. It is a tagging layer that ties every token back to the feature, customer, and workflow that triggered it. Here is how to build it and what to do once you can finally see the numbers.

Why the provider invoice is not enough

A raw provider bill gives you one number per model per month. That is fine for accounting and useless for engineering. It cannot answer the questions that actually drive decisions: which feature is the most expensive to run, which customers cost more than they pay, and whether last week's prompt change made things better or worse.

The result is a familiar failure mode. Spend creeps up month over month, someone notices, and the team spends a week reverse-engineering the increase from logs that were never designed to answer the question. By the time the culprit is found, another feature has shipped and the baseline has moved again. We see this pattern in nearly every cost engagement, and it is the same root issue behind the real cost of maintaining AI products: the spend is visible only in aggregate, so it is managed only in aggregate.

The five views you actually need

Effective LLM cost attribution comes down to slicing the same token data five ways. If your tracking can produce these five views, you can answer almost any cost question that comes up.

Spend by feature

The most important view, and the one teams most often lack. Every LLM call should carry a feature tag so you can rank features by cost. This is what turns a vague claim that the bill is too high into a concrete finding: the document summarizer is 38 percent of spend, so look there first.
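As a rough sketch of this view, assume each logged call is a dict with `feature` and `cost_usd` keys. Both names are hypothetical and stand in for whatever your log schema uses. With that in place, ranking features is a plain group-by:

```python
from collections import defaultdict


def spend_by_feature(records):
    """Rank features by total cost and report each one's share of spend.

    `records` is an iterable of dicts with the hypothetical keys
    `feature` and `cost_usd`; adapt the names to your own log schema.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record["feature"]] += record["cost_usd"]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        {
            "feature": feature,
            "cost_usd": cost,
            "share": cost / grand_total if grand_total else 0.0,
        }
        for feature, cost in ranked
    ]


# Illustrative data only, not real measurements.
calls = [
    {"feature": "summarizer", "cost_usd": 3.0},
    {"feature": "chat", "cost_usd": 2.5},
    {"feature": "summarizer", "cost_usd": 1.0},
    {"feature": "export", "cost_usd": 1.5},
]
ranking = spend_by_feature(calls)
```

The first row of `ranking` is the feature to investigate first.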

Spend by model and provider

Which models carry the load and what each costs. This is the view that tells you whether an expensive model is doing work a cheaper one could handle, which feeds directly into a routing decision.

Spend by customer or team

Per-customer cost is the difference between a healthy gross margin and a silent loss. In any usage-heavy product, a small number of accounts drive a large share of inference, and you want to know that before renewal, not after.
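A minimal sketch of that margin check follows. It assumes you already have per-customer LLM cost (from tagged calls) and per-customer revenue (from billing) as two dicts keyed by customer ID. The function name, field names, and figures are illustrative assumptions.

```python
def customer_margins(cost_by_customer, revenue_by_customer):
    """Return per-customer margin on inference, worst margin first."""
    rows = []
    # Include customers that appear in either dict, so an account with
    # cost but no recorded revenue still shows up.
    for customer in set(cost_by_customer) | set(revenue_by_customer):
        revenue = revenue_by_customer.get(customer, 0.0)
        cost = cost_by_customer.get(customer, 0.0)
        margin = revenue - cost
        rows.append({
            "customer": customer,
            "revenue_usd": revenue,
            "llm_cost_usd": cost,
            "margin_usd": margin,
            # Margin as a fraction of revenue; undefined when revenue is 0.
            "margin_pct": margin / revenue if revenue else None,
        })
    return sorted(rows, key=lambda row: row["margin_usd"])


# Illustrative monthly figures only.
llm_cost = {"acme": 1200.0, "globex": 40.0}
revenue = {"acme": 500.0, "globex": 200.0, "initech": 100.0}
margins = customer_margins(llm_cost, revenue)
underwater = [row["customer"] for row in margins if row["margin_usd"] < 0]
```

This only counts inference cost. A real margin figure would also include hosting, support, and other costs of serving the account.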

Spend by endpoint or workflow

An agentic workflow can fan out into dozens of model calls per user action. Attributing cost to the workflow, not just the individual call, is the only way to see the true price of a feature that loops or chains.
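To make the fan-out concrete, here is a small sketch that rolls individual calls up to the workflow that triggered them. It assumes every call record carries a shared `trace_id` and a `cost_usd` value; both key names are hypothetical.

```python
from collections import defaultdict


def cost_by_workflow(records):
    """Sum call count and cost per trace ID."""
    totals = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0})
    for record in records:
        entry = totals[record["trace_id"]]
        entry["calls"] += 1
        entry["cost_usd"] += record["cost_usd"]
    return dict(totals)


# Illustrative data: one user action that fanned out into three calls,
# and a second action that needed only one.
calls = [
    {"trace_id": "t1", "cost_usd": 0.25},
    {"trace_id": "t1", "cost_usd": 0.5},
    {"trace_id": "t1", "cost_usd": 0.25},
    {"trace_id": "t2", "cost_usd": 0.125},
]
workflows = cost_by_workflow(calls)
```

Looking at the per-call numbers alone, no single call in `t1` costs more than 0.5. The workflow total is what shows the action actually cost 1.0.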

Token trends over time

Input, output, and cache-read tokens tracked over time, so a regression shows up as a slope change the day it ships rather than a surprise at the end of the month.
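One simple way to surface a slope change is to total tokens per day and flag any day that jumps sharply against the previous one. The sketch below assumes records with an ISO-format `day` string and three token counts. The key names and the 1.5x threshold are arbitrary assumptions to tune against your own traffic.

```python
from collections import defaultdict


def daily_token_totals(records):
    """Sum input, output, and cache-read tokens per day, oldest day first."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "cache_read": 0})
    for record in records:
        day = totals[record["day"]]
        day["input"] += record["input_tokens"]
        day["output"] += record["output_tokens"]
        day["cache_read"] += record["cache_read_tokens"]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return dict(sorted(totals.items()))


def flag_jumps(daily, kind="input", ratio=1.5):
    """Return days whose `kind` total is at least `ratio` times the prior day's."""
    days = list(daily)
    flagged = []
    for previous, current in zip(days, days[1:]):
        before, after = daily[previous][kind], daily[current][kind]
        if before > 0 and after >= ratio * before:
            flagged.append(current)
    return flagged


# Illustrative data only.
records = [
    {"day": "2024-01-02", "input_tokens": 1100, "output_tokens": 210, "cache_read_tokens": 0},
    {"day": "2024-01-01", "input_tokens": 1000, "output_tokens": 200, "cache_read_tokens": 0},
    {"day": "2024-01-03", "input_tokens": 2400, "output_tokens": 220, "cache_read_tokens": 0},
]
daily = daily_token_totals(records)
jumps = flag_jumps(daily, kind="input")
```

A day-over-day ratio is deliberately crude. Weekly seasonality will trigger false alarms, so comparing against the same weekday or a rolling average is a reasonable next step.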

How to build the tagging layer

The mistake most teams make is overbuilding the dashboard and underbuilding the tagging. A polished dashboard over untagged data still cannot tell you which feature is expensive. Spend your effort at the point where the request is made.

Tag at the call site

Every LLM call should attach metadata before it leaves your application: a feature identifier, the customer or tenant ID, the workflow or trace ID, and the environment. This is a small wrapper around your client, not a platform migration. Once the tags are on the request, every downstream view becomes a group-by.
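A minimal sketch of such a wrapper is below. It is provider-agnostic on purpose: `send` stands in for whatever client call you already make, and the response shape (a dict with a `usage` dict) is an assumption, as are all the names here. Map the fields to your provider's actual response object.

```python
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallTags:
    """Attribution tags attached to every call (hypothetical field names)."""
    feature: str
    customer_id: str
    trace_id: str
    environment: str


def tagged_call(send, sink, tags, model, prompt):
    """Call the model through `send` and emit one tagged usage record.

    `send` is a stand-in for your real client call and is assumed to
    return a dict containing a `usage` dict. `sink` is any callable that
    accepts one record, such as a logger or a list's `append`.
    """
    response = send(model=model, prompt=prompt)
    usage = response.get("usage", {})
    sink({
        **asdict(tags),
        "model": model,
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cache_read_tokens": usage.get("cache_read_tokens", 0),
        "timestamp": time.time(),
    })
    return response


def fake_send(model, prompt):
    """Fake provider call, present only so the sketch runs on its own."""
    return {"text": "ok", "usage": {"input_tokens": 12, "output_tokens": 3}}


log = []
tags = CallTags("summarizer", "cust_42", "trace_001", "prod")
reply = tagged_call(fake_send, log.append, tags, "example-model", "Summarize this.")
```

Making `tags` a required argument is the design choice that matters here: a call without attribution fails at the call site instead of showing up later as untagged spend.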

Standardize the fields

OpenTelemetry's GenAI semantic conventions are a sensible baseline because they already define the fields that matter: model, input tokens, output tokens, cache-read tokens, conversation or trace IDs, and provider. Adopting a standard now means your data is portable later if you change observability tools, and it keeps field names consistent across services.
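As an illustration, the mapping below renames an internal call record to convention-style attribute keys. The `gen_ai.*` names reflect the conventions as we understand them at the time of writing. The spec is still evolving (older releases used `gen_ai.system` for the provider, for example), so check every key against the version you pin. The `app.*` keys are our own hypothetical names for fields we have not matched to a convention attribute here, including cache-read tokens. Swap in the convention's own name wherever your pinned version defines one.

```python
def to_genai_attributes(record):
    """Map an internal call record to span-attribute style keys.

    The input keys (`provider`, `model`, and so on) are hypothetical
    names for your own record format.
    """
    return {
        # Convention-style names; verify against your pinned spec version.
        "gen_ai.provider.name": record["provider"],
        "gen_ai.request.model": record["model"],
        "gen_ai.usage.input_tokens": record["input_tokens"],
        "gen_ai.usage.output_tokens": record["output_tokens"],
        "gen_ai.conversation.id": record["conversation_id"],
        # Application-specific names (assumptions, not part of the spec).
        "app.llm.cache_read_tokens": record.get("cache_read_tokens", 0),
        "app.feature": record["feature"],
        "app.customer_id": record["customer_id"],
    }


# Illustrative record only.
attributes = to_genai_attributes({
    "provider": "example-provider",
    "model": "example-model",
    "input_tokens": 812,
    "output_tokens": 96,
    "conversation_id": "conv_7",
    "feature": "summarizer",
    "customer_id": "cust_42",
})
```

Keeping the mapping in one function means a renamed attribute in a later spec version is a one-line change rather than a search across services.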

Put the gateway to work

If your calls already pass through a gateway or proxy, that is the natural place to capture cost and tags before the request reaches the provider. A gateway layer gives you one chokepoint for attribution, rate limiting, and failover at once, which is why it pairs well with the kind of model routing that cuts AI costs. The router decides which model handles a call; the same layer records what that call cost and which feature asked for it.
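The cost-recording half of that layer can be as small as a price table and one function. In the sketch below the model names and per-million-token prices are invented placeholders, not real provider rates. It also assumes `input_tokens` excludes cache-read tokens, which varies by provider, so check how yours reports usage before trusting the total.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Hypothetical prices in USD per million tokens; replace with your
# provider's published rates.
PRICES = {
    "example-large": {
        "input": Decimal("3.00"),
        "output": Decimal("15.00"),
        "cache_read": Decimal("0.30"),
    },
    "example-small": {
        "input": Decimal("0.25"),
        "output": Decimal("1.25"),
        "cache_read": Decimal("0.03"),
    },
}


def price_call(model, usage):
    """Return the USD cost of one call from its token counts.

    `usage` is a dict with optional `input_tokens`, `output_tokens`,
    and `cache_read_tokens` keys. Assumes `input_tokens` does not
    already include the cache-read tokens.
    """
    rates = PRICES[model]
    return sum(
        (
            Decimal(usage.get(f"{kind}_tokens", 0)) * rate / MILLION
            for kind, rate in rates.items()
        ),
        Decimal(0),
    )


cost = price_call(
    "example-large",
    {"input_tokens": 1_000_000, "output_tokens": 100_000, "cache_read_tokens": 2_000_000},
)
```

`Decimal` avoids the rounding drift that accumulates when millions of tiny float costs are summed. An unknown model raises `KeyError` rather than silently pricing the call at zero.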

Choose a place to read it back

For most teams, three pieces cover the essentials: provider-native billing, request metadata in your logs, and an observability tool that carries cost in the trace. Open-source tracing tools can hold cost at the trace level for developer debugging, while a FinOps view handles financial governance. Mature organizations usually run both: one for engineers chasing a slow or expensive trace, one for finance tracking margin. The right level of investment scales with your spend, and choosing it deliberately is a core part of treating AI infrastructure costs as a managed line item rather than a mystery.

What attribution unlocks

Visibility is not the goal; the decisions it enables are. Once you can see spend by feature, the optimization work stops being guesswork.

You can route the expensive features to cheaper models with confidence, because you know which features are expensive. You can move the deferrable features, the ones where no one is waiting on the response, to a batch endpoint, which several major providers discount by 50 percent. You can set per-customer cost alerts so a runaway account pages you the same day instead of at invoice time. And you can prove a prompt change worked, because you have a before-and-after number. That is the same discipline that let us take one engagement from a five-figure bill to less than half of it in our LLM cost optimization breakdown.
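Once spend is broken out by feature, sizing the batch opportunity is simple arithmetic. The sketch below uses made-up monthly figures and hypothetical feature names. The 0.5 default mirrors the 50 percent batch discount mentioned above and should be replaced with your provider's actual rate.

```python
def batch_savings(monthly_cost_by_feature, deferrable, discount=0.5):
    """Estimate monthly savings from moving deferrable features to batch.

    `deferrable` is the set of feature names where no user is waiting
    on the response. `discount` is the fraction taken off batch calls.
    """
    movable = sum(
        cost
        for feature, cost in monthly_cost_by_feature.items()
        if feature in deferrable
    )
    return movable * discount


# Illustrative monthly spend in USD, not real data.
monthly = {"export": 5600.0, "chat": 6000.0, "nightly_report": 2400.0}
savings = batch_savings(monthly, deferrable={"export", "nightly_report"})
```

In this made-up case, 8,000 dollars of the 14,000 dollar total is deferrable, so the estimate is 4,000 dollars a month. It is an upper bound that assumes every call in those features can tolerate batch latency.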

Attribution is the prerequisite for every other lever. You cannot route, cache, or batch your way to savings if you do not know which feature to point the lever at. Build the tagging layer first, and the rest of the cost work becomes a series of obvious, measurable moves.

Frequently asked questions

Do I need a dedicated tool to attribute LLM costs?

Not to start. The minimum is metadata on every call plus a way to query it. A dedicated observability or FinOps tool helps as spend grows, but the tagging discipline at the call site is what makes any of those tools useful.

What metadata should every LLM call carry?

At minimum: a feature identifier, the customer or tenant ID, a trace or workflow ID, the model and provider, and token counts including cache reads. Standardizing on OpenTelemetry GenAI field names keeps this portable.

How do I attribute cost in a multi-step agent?

Carry a single trace or workflow ID through every call the agent makes, and sum cost by that ID. That converts a scatter of individual calls into one attributable workflow cost, which is the number that actually reflects the feature.

Can I track cost per customer for margin analysis?

Yes, and you should. Tag each call with the tenant ID and aggregate by customer. This surfaces the accounts whose usage outruns their plan, which is often the highest-leverage thing attribution reveals.

What is the first thing to do once I can see per-feature cost?

Rank features by spend and look at the top one or two. The biggest line item is almost always where the easiest wins are, whether that is routing it to a cheaper model, caching a shared prefix, or moving it to batch.
