Last quarter, a CFO asked her finance lead a simple question: which customer is driving our OpenAI bill? The finance lead had a tagging strategy, a FinOps tool, and a monthly invoice from OpenAI. She still couldn't answer. The invoice showed one number. The tagging system had nothing to attach to. The application code called the API directly and returned a completion. No cost center. No customer ID. No feature flag. Just tokens in, tokens out, and a line item at the end of the month.
This is the attribution gap that AI spend has created inside the FinOps Framework. Allocation, as a capability, was built for compute and storage, where every resource has a tag, a project, or an account behind it. A token call has none of that. The unit of spend, one API request, doesn't carry the metadata that tagging depends on. If FinOps teams want to answer per-customer, per-feature, or per-agent cost questions, attribution has to move from the billing layer to the traffic layer.
Why can't tagging solve AI cost attribution?
Tagging fails on AI spend because a token call has no label to attach a tag to. Traditional cloud Allocation works like this: you tag an EC2 instance with team=platform, and every hour that instance runs, the cost flows to platform. The tag lives on the resource. The resource generates the cost. The chain is unbroken.
AI API calls don't work that way. When your application calls openai.chat.completions.create(), there's no resource to tag. There's a request, a response, and a token count logged on OpenAI's side. Your cloud provider never sees it. Your tagging system never touches it. At the end of the month, OpenAI sends you one invoice per organization, sometimes broken out by model, and that's the entire universe of information you have.
The FinOps Foundation defines Allocation as a core capability, but the framework assumes a taggable unit exists. For SaaS-style AI providers, it doesn't. Anthropic and Bedrock have the same shape. Provider dashboards break spend down by API key or model, not by the customer or feature that triggered the request. Even if you split API keys by team, you can't split them per customer without spinning up thousands of keys and managing them like a directory service.
This is why FinOps practitioners who lift the tagging playbook from EC2 and apply it to OpenAI end up with the same answer their CFO started with: a total. No allocation.
Three teams, three questions, one invoice
The same AI bill triggers three different questions inside a company, and none of them can be answered from provider billing data alone.
Finance wants unit economics. What does it cost to serve each customer? If a top-tier customer generates $40 in Claude calls per month and pays $99 for the subscription, that's a very different margin story than a customer generating $2 in calls. Without per-customer attribution, gross margin calculations are guesses.
Product wants feature-level ROI. Which features are worth the token spend? A summarization feature might drive retention and cost $8k a month. An experimental agent might cost $22k a month and be used by 40 people. Product can't prioritize without knowing which feature owns which slice of the bill.
The CFO wants to know why the number moved. When the Anthropic invoice doubles month-over-month, someone has to explain it. Was it a new customer? A prompt change? A runaway agent looping on itself? The invoice doesn't say. It just shows the total.
Raw billing data ingested into a FinOps dashboard doesn't answer any of these questions. It re-presents the invoice. That's a reporting layer, not an Allocation layer. The FinOps Framework treats Automation, Tools, & Services as the software that enables Capabilities, including Allocation for emerging spend categories. For AI, that automation has to reach into the request itself, because that's where the business context lives.
What does traffic-level attribution actually do?
Traffic-level attribution reads every AI request as it happens and maps the token count to the customer, feature, and agent that triggered it. Instead of trying to reconstruct attribution from an aggregated invoice, it captures context at the moment the request is made, before that context disappears.
Here's what that looks like in practice. Your application calls Claude with a prompt. The request passes through a gateway or proxy that inspects the payload, tags it with the customer ID from the auth token, the feature name from the request path, and the agent identifier from the calling service. The request continues to Anthropic. The response comes back. The gateway logs the token counts against the tagged dimensions. At the end of the month, you don't have one number from Anthropic. You have thousands of attributed events, grouped by whatever business unit you care about.
This works across OpenAI, Anthropic, and Bedrock because the traffic pattern is the same. Every provider returns token counts in the response. Every request carries application-level context. The attribution logic sits between your code and the provider, so you don't rewrite application code to add tags. If you want per-customer cost, you get per-customer cost. If you want per-agent cost, you get per-agent cost.
This is closer to how observability works than how tagging works. It's also why the tooling looks different. A tag-based FinOps tool wants to ingest a CUR file. A traffic-level attribution tool wants to sit in the request path. Different architecture, different capability. For a deeper look at why this shift matters, see our earlier analysis in Why traditional FinOps breaks down with AI workloads.
Where this fits in the FinOps Framework
AI cost attribution isn't a new dashboard. It's a new Allocation Capability for a spend category that doesn't fit the existing Enterprise Architecture assumptions. The FinOps Framework was designed with taggable resources in mind. Compute, storage, network, database, all have identifiers that tools can group by. AI API spend doesn't. The unit of consumption is a token, and tokens live inside requests, not on resources.
That means AI attribution belongs in the Automation, Tools, & Services layer of the framework, but with a different implementation pattern than what tag-based Allocation uses. Instead of ingesting billing exports and grouping by tag, the tooling reads live traffic and generates attribution records that feed back into the Allocation Capability. The output looks the same to a finance user: a cost per team, per customer, per feature. The mechanism underneath is different.
This matters for how FinOps practitioners plan their toolchain. If you're building an AI FinOps practice on top of a tag-based tool, you'll hit the same wall the CFO's finance lead hit. If you're evaluating a tool for AI attribution, the question to ask isn't "does it ingest my OpenAI bill?" Every tool does that. The question is "does it attribute my OpenAI bill?" That requires traffic-level instrumentation, not billing ingestion.
The FinOps Framework's Capabilities page is worth re-reading with AI workloads in mind. Anomaly Management, Allocation, and Forecasting all assume attribution exists. For AI spend, attribution is the missing input.
Frequently asked
questions
How do I attribute OpenAI or Anthropic spend to a specific customer?
You have to capture the customer ID at the moment the request is made, because the provider invoice doesn't include it. This means instrumenting the traffic between your application and the provider, either through a gateway, proxy, or SDK wrapper, so every token count can be logged against a customer identifier.
Why can't tagging solve AI cost attribution?
Tagging requires a resource to attach to. Token calls to OpenAI, Anthropic, or Bedrock don't create a taggable resource in your cloud account, so there's nothing for the tag to bind to. The cost shows up as a single aggregate line item on the provider invoice.
What is the cost per feature for an AI product?
It's the sum of token costs generated by requests made from that feature, divided across the customers or sessions that used it. You can only calculate it if you're capturing the feature identifier on each request as it happens, since provider billing data doesn't include feature context.
How do FinOps teams allocate LLM spend across customers and features?
They move attribution from the billing layer to the traffic layer. Instead of trying to split an aggregate invoice after the fact, they capture per-request context, customer, feature, agent, at the time of the API call, and roll those events up into cost reports.
How do you track token usage by agent or workflow?
You need instrumentation that identifies the calling agent or workflow on each request. This can be done with request headers, service identifiers, or a gateway that inspects the call pattern, then attributes the token counts returned by the provider to the agent that triggered them.
AI spend broke the tagging model because a token call has nothing to tag. Finance, product, and the CFO are all asking questions that provider invoices can't answer, and no amount of billing ingestion will change that. Attribution has to move to the traffic layer, where every request still carries the customer, feature, and agent context that gives the number meaning.